refactor: separate .env from deployed endpoint env vars by deanq · Pull Request #257 · runpod/flash

deanq · 2026-03-05T18:26:43Z

Summary

Stop auto-carrying .env file contents to deployed endpoints
ServerlessResource.env defaults to None instead of reading .env via get_env_vars()
Delete EnvironmentVars class and environment.py
Stop stripping RUNPOD_API_KEY from explicit resource env in manifest
Fix flash run hot-reload not propagating env changes (module cache busting)
Fix CPU config_hash to include env for drift detection
Preserve platform-injected env vars (PORT, FLASH_HOST, etc.) during template updates

Motivation

.env files were implicitly dumped into every deployed endpoint, causing:

Platform-injected vars (PORT, PORT_HEALTH) overwritten on template updates
False config drift from runtime var injection into self.env
User confusion about what actually reaches deployed workers

How it works now

.env still populates os.environ via load_dotenv() for CLI and get_api_key() usage
To pass env vars to deployed endpoints, declare them explicitly: env={"HF_TOKEN": os.environ["HF_TOKEN"]}
Flash injects runtime vars (RUNPOD_API_KEY, FLASH_MODULE_PATH) into template.env automatically
On env changes, update() reads the live template via get_template and preserves platform-injected vars
flash run hot-reload properly reimports user modules with updated env values

Breaking change

ServerlessResource.env no longer defaults to .env file contents. Users relying on implicit carryover must add explicit env={} to their resource configs.

Upgrade note

Existing deployed endpoints have .env contents baked into resource.env. After upgrading, env defaults to None, so the config hash changes. flash deploy will flag all previously-deployed endpoints as drifted and trigger redeployment on the first deploy after upgrading. This is expected and harmless — the endpoints will be redeployed with only explicitly declared env vars.

Companion PR

flash-examples docs: docs: clarify .env scope for env separation flash-examples#39

Test plan

All 2533 tests pass (85.4% coverage)
Format and lint clean
Platform env preservation verified against real Runpod API (get_template query)
User-removed env vars correctly not resurrected
CPU config_hash includes env for drift detection
Module cache busting in generated server.py for hot-reload

src/runpod_flash/cli/utils/env_preview.py

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/runpod_flash/cli/utils/env_preview.py:1

The _SECRET_PATTERN regex matches any key containing TOKEN as a substring, so TOKENIZER_PATH gets masked even though it's not a secret. This is a false positive — environment variables like TOKENIZER_PATH, TOKEN_COUNT, or BUCKET_NAME (no match, but illustrative) could be misleadingly masked. The test at line 52-55 in test_env_preview.py confirms this behavior as intended, but the docstring says "TOKEN in TOKENIZER should still match" which suggests this was a deliberate choice. However, this will surprise users who have non-secret vars with these substrings. Consider using word-boundary matching (e.g., r"\b(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)\b" or r"(^|_)(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)($|_)") to avoid masking keys like TOKENIZER_PATH or TOKENIZER_CONFIG.

"""Deploy-time env preview: show what env vars go to each endpoint."""

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

src/runpod_flash/cli/utils/env_preview.py

docs/plans/2026-03-05-env-separation-plan.md

docs/plans/2026-03-05-env-separation-design.md

runpod-Henrik

Review

Clean, well-motivated change. The core fix (env defaults to None instead of implicitly loading .env) removes a real footgun. Good test coverage in test_env_separation.py. One confirmed bug and two nits below.

Bug: preview/deploy parity broken — `makes_remote_calls` default mismatch

_do_deploy calls _check_makes_remote_calls() which defaults to True in all failure cases:

# serverless.py:583
makes_remote_calls = resource_config.get("makes_remote_calls", True)
# also: "Manifest not found, assuming makes_remote_calls=True"
# also: "Resource not in manifest, assuming makes_remote_calls=True"

collect_env_for_preview uses the opposite default:

# env_preview.py
config.get("makes_remote_calls", False)  # ← should be True

On first deploy (no prior build/manifest), the preview shows no RUNPOD_API_KEY for QB endpoints but deploy injects it. The preview is wrong exactly when it matters most — the first time a user runs flash deploy.

Fix: config.get("makes_remote_calls", True) in env_preview.py to match deploy behaviour.

Nit: `env={}` explicit empty dict won't appear in manifest

manifest.py uses:

if hasattr(resource_config, "env") and resource_config.env:

An explicit env={} is falsy so it's silently skipped. Probably the right behaviour (nothing to store), but test_env_separation.py::test_serverless_resource_env_explicit_empty_dict_preserved only checks the resource object — not the manifest path. Worth a test or a comment to make the intent explicit.

Nit: `_MASK_VISIBLE_CHARS = 6` exposes key type prefix

RUNPOD_API_KEY values start with rp_ — 6 visible chars reveals the full type prefix (rp_xxx). Not a security issue since the prefix is public, but worth being intentional about if multiple key formats are ever supported.

Positives

Removing env_dict.pop("RUNPOD_API_KEY", None) in manifest.py is correct — user-explicit keys should be preserved, not silently stripped
except Exception: logger.debug(...) around render_env_preview in deploy.py is the right call — preview failure must not block deploy
test_env_separation.py covers the _configure_existing_template None case, which was the trickiest correctness question
Companion flash-examples PR (#39) is the right process for a breaking change

Verdict: PASS with fix — the makes_remote_calls default is a one-line fix but should land before merge to avoid shipping a misleading preview.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

runpod-Henrik

1. Core change — clean and correct

env default changing from Field(default_factory=get_env_vars) to Field(default=None) is the right fix. Template creation falls back to self.env or {}, so None and {} both produce an empty template env. EnvironmentVars / environment.py deletion is complete — no remaining references.

The hash behavior is correct: exclude_none=True means env=None (default) is excluded from the drift hash, and setting env={"HF_TOKEN": "..."} correctly triggers drift detection.

2. Issue: all existing deployed endpoints will drift on upgrade

This is an expected consequence of the breaking change, but it should be in the upgrade notes: any endpoint deployed with the old SDK had .env contents baked into resource.env. After upgrading, env defaults to None, so the hash changes for every endpoint that previously had env vars → flash deploy will flag all of them as drifted and redeploy. Users should be warned to expect forced redeployments on their first deploy after upgrading.

3. Question: preview's `makes_remote_calls` default vs actual injection default

In collect_env_for_preview:

if not is_lb and config.get("makes_remote_calls", False):   # default: False

In _inject_runtime_template_vars → _check_makes_remote_calls:

log.debug("Manifest not found, assuming makes_remote_calls=True")  # default: True

If a resource's manifest entry lacks the makes_remote_calls key, the preview shows no RUNPOD_API_KEY while the actual deploy injects it. Worth confirming that the scanner always writes makes_remote_calls explicitly for every resource — if so, this default never fires and the preview is accurate. If not, the preview undersells what gets deployed.

4. Testing — solid

267-line test_env_preview.py covers masking, collect_env_for_preview, and rendering. 99-line test_env_separation.py covers the field default, explicit dict preservation, and template creation. The test_exactly_six_chars_fully_masked test correctly captures that 6-char values are fully masked rather than showing abcdef...****.

One gap: no test for the "user explicitly sets RUNPOD_API_KEY in env={}" path — the preview would show it as "user" source but the old code stripped it. Not a bug (the behavior is intentional now), but a test would lock in the intent.

Verdict

PASS with one upgrade-note item (item 2) and one question to confirm (item 3). The separation is clean, the deletion is thorough, and the preview table is well-tested.

src/runpod_flash/cli/commands/deploy.py

- Remove render_env_preview call from deploy command - Delete env_preview.py module and its tests (no other consumers) Addresses PR #257 feedback: table was visual clutter

Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

src/runpod_flash/core/resources/serverless_cpu.py

docs/plans/2026-03-05-env-separation-plan.md

docs/LoadBalancer_Runtime_Architecture.md

src/runpod_flash/cli/commands/build_utils/manifest.py

src/runpod_flash/core/resources/serverless.py

- Remove render_env_preview call from deploy command - Delete env_preview.py module and its tests (no other consumers) Addresses PR #257 feedback: table was visual clutter

- Use `is not None` instead of truthiness for env checks in serverless.py, serverless_cpu.py, and manifest.py so explicit env={} clears template env - Add RUNPOD_API_KEY injection note to LB architecture doc - Remove Claude-specific directive from plan doc

runpod-Henrik

Delta review — since 2026-03-12

From the 2026-03-06 review:

Bug: makes_remote_calls default mismatch in env_preview.py → RESOLVED. The env preview table was removed entirely (c1a98e2), so the mismatch no longer exists. The bug was valid at the time, but the decision to pull the preview table sidesteps it.
Nit: env={} skipped in manifest (falsy check) → RESOLVED. Latest commit (d4ecd776) changed and resource_config.env to and resource_config.env is not None in manifest.py. Consistent with the same fix applied to _configure_existing_template in serverless.py and serverless_cpu.py.
Nit: _MASK_VISIBLE_CHARS = 6 → MOOT. Preview table removed.

From the 2026-03-12 review:

Item 2: Upgrade drift warning — Still unaddressed. Any endpoint deployed with the old SDK has .env contents baked into resource.env. After upgrading, env defaults to None, the hash changes, and flash deploy flags every pre-existing endpoint as drifted. Should be in upgrade notes or a migration note in the PR body.
Item 3: makes_remote_calls default question → MOOT. Preview removed.

New concerns on current diff:

Issue: CPU config_hash silently ignores explicit env= changes. CpuServerlessEndpoint.config_hash hashes a fixed set of CPU fields and explicitly excludes env. Under the old model (env was a dynamic .env dump) this was reasonable. Now that env is user-declared intent, a user who adds env={"HF_TOKEN": "..."} to a CPU endpoint and redeploys will get no drift detection, no update, and no error. The base class correctly includes env in the hash; the CPU override silently drops it. If this is intentional (env changes don't trigger CPU re-deploy by design), it needs a comment explaining why. As written, it looks like an oversight.
PR body references env preview table that was removed. The summary still lists "Add deploy-time env preview table with secret masking" and the checklist has items for verifying preview behavior. The feature doesn't exist in the final diff. Worth cleaning up before merge to avoid confusing future readers.

Overall: The two bugs from the 2026-03-06 review are closed. The main open item is the CPU config_hash asymmetry — silent failure for a user-facing feature after a correctness fix. The upgrade drift note is still worth adding.

🤖 Reviewed by Henrik's AI-Powered Bug Finder

KAJdev

from what im seeing that env drift detection bug is real, but only present in flash run. If you flash run, send a request, change the env, have it auto reload, then send another request, the env is not updated at all

I have verified this

2026-03-19 17:00:05,739 | INFO  | CpuLiveServerless:c6i34mfjfj9gct | API /run
2026-03-19 17:00:06,230 | INFO  | CpuLiveServerless:c6i34mfjfj9gct | Started Job:0be84094-e004-4f10-b15c-1edab200e234-u1
2026-03-19 17:00:06,330 | INFO  | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | Status: IN_QUEUE
2026-03-19 17:00:06,847 | INFO  | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | .
2026-03-19 17:00:08,425 | INFO  | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | ..
2026-03-19 17:00:14,093 | INFO  | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | ...
2026-03-19 17:00:32,335 | INFO  | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | ....
2026-03-19 17:00:53,938 | INFO  | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | .....
2026-03-19 17:01:04,267 | INFO  | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | Status: COMPLETED
2026-03-19 17:01:04,426 | INFO  | Worker:4tfsc74uch0vfj | Delay Time: 48063 ms
2026-03-19 17:01:04,427 | INFO  | Worker:4tfsc74uch0vfj | Execution Time: 42 ms
WARNING:  WatchFiles detected changes in '.flash/server.py'. Reloading...
2026-03-19 17:01:15,994 | INFO  | CpuLiveServerless:c6i34mfjfj9gct | API /run
2026-03-19 17:01:16,179 | INFO  | CpuLiveServerless:c6i34mfjfj9gct | Started Job:af0c02dd-19c1-42f7-a9ac-db3788d80435-u1
2026-03-19 17:01:16,280 | INFO  | Job:af0c02dd-19c1-42f7-a9ac-db3788d80435-u1 | Status: IN_PROGRESS
2026-03-19 17:01:16,483 | INFO  | Job:af0c02dd-19c1-42f7-a9ac-db3788d80435-u1 | Status: COMPLETED
2026-03-19 17:01:16,578 | INFO  | Worker:4tfsc74uch0vfj | Delay Time: 127 ms
2026-03-19 17:01:16,578 | INFO  | Worker:4tfsc74uch0vfj | Execution Time: 40 ms

(the edit was a change to the env)

I verified that the change didnt apply to the endpoint in the console.

Change ServerlessResource.env default from get_env_vars() (reads .env) to None. Delete EnvironmentVars class and get_env_vars(). Template creation now uses self.env or {} instead of falling back to .env file. .env still populates os.environ via load_dotenv() in __init__.py for CLI and get_api_key() usage. This change only affects what gets sent to deployed endpoints.

Replace TestServerlessResourceEnvLoading tests that imported deleted get_env_vars/EnvironmentVars with tests that verify env defaults to None and preserves explicit dicts.

With env separation, resource.env only contains user-declared vars. If a user explicitly sets RUNPOD_API_KEY in their resource env, it should be preserved. Runtime injection via _inject_runtime_template_vars() handles the automatic case.

New module renders a Rich table per resource showing all env vars that will be sent to deployed endpoints. User-declared vars shown directly; flash-injected vars (RUNPOD_API_KEY, FLASH_MODULE_PATH) labeled as 'injected by flash'. Secret values masked based on key pattern matching (KEY, TOKEN, SECRET, PASSWORD, CREDENTIAL). Wired into flash deploy: renders before provisioning so users see exactly what goes to each endpoint.

Clarify that .env populates os.environ for CLI/local dev only. Resource env vars must be explicit via env={} on resource config. RUNPOD_API_KEY injected automatically for makes_remote_calls=True.

- Fix preview/deploy mismatch: LB endpoints no longer show RUNPOD_API_KEY injection in preview (matches _do_deploy behavior) - Wrap render_env_preview in try-except so preview failures don't abort deployment - Fix stale comment referencing .env file in serverless_cpu.py - Correct drift detection doc: env is conditionally included in hash, not always excluded - Fix LB architecture doc: LB endpoints get FLASH_MODULE_PATH, not RUNPOD_API_KEY - Update config_hash docstring to reflect exclude_none behavior - Add tests for render_env_preview, LB API key exclusion, and _configure_existing_template with env=None

- Compact env preview into single table with resource column to reduce terminal clutter - Show "user" source label for user-declared vars instead of empty string - Remove local filesystem paths from plan docs - Fix design doc: env=None omits the key, not stores empty dict - Update render tests for new table format

- Remove render_env_preview call from deploy command - Delete env_preview.py module and its tests (no other consumers) Addresses PR #257 feedback: table was visual clutter

- Use `is not None` instead of truthiness for env checks in serverless.py, serverless_cpu.py, and manifest.py so explicit env={} clears template env - Add RUNPOD_API_KEY injection note to LB architecture doc - Remove Claude-specific directive from plan doc

…eservation - Include env in CPU config_hash so env-only changes trigger drift detection and redeployment (was silently excluded) - Purge user modules from sys.modules in generated server.py so uvicorn hot-reload reimports them with updated env values - Add get_template GraphQL query (podTemplate) to read live template state before updates - Add _preserve_platform_env to carry forward platform-injected vars (FLASH_HOST, FLASH_PORT, PYTHONPATH, etc.) that the platform sets once at initial deploy and does not re-inject on saveTemplate - Fix _configure_existing_template to use truthiness check so empty env={} does not clobber explicit template.env entries

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

docs/Resource_Config_Drift_Detection.md

src/runpod_flash/core/resources/serverless.py

- Update drift detection docs to reflect env included in CPU hash - Fix _preserve_platform_env to not resurrect user-removed keys by comparing against old config env (old_env parameter) - Add test_update_does_not_resurrect_user_removed_env_vars - Remove --no-verify from plan doc test command

KAJdev reviewed Mar 5, 2026

View reviewed changes

src/runpod_flash/cli/utils/env_preview.py Outdated Show resolved Hide resolved

deanq requested a review from Copilot March 5, 2026 19:13

deanq force-pushed the deanq/ae-1549-explicit-env-vars branch from 5b765f6 to be4269a Compare March 5, 2026 19:20

Copilot AI reviewed Mar 5, 2026

View reviewed changes

deanq requested a review from Copilot March 5, 2026 22:53

Copilot AI reviewed Mar 5, 2026

View reviewed changes

src/runpod_flash/cli/utils/env_preview.py Outdated Show resolved Hide resolved

docs/plans/2026-03-05-env-separation-plan.md Outdated Show resolved Hide resolved

docs/plans/2026-03-05-env-separation-design.md Outdated Show resolved Hide resolved

Copilot started reviewing on behalf of deanq March 5, 2026 23:05 View session

Copilot started reviewing on behalf of deanq March 5, 2026 23:06 View session

Copilot started reviewing on behalf of deanq March 5, 2026 23:07 View session

Copilot started reviewing on behalf of deanq March 5, 2026 23:09 View session

deanq force-pushed the deanq/ae-1549-explicit-env-vars branch from bab2b46 to 6782404 Compare March 6, 2026 06:05

runpod-Henrik reviewed Mar 6, 2026

View reviewed changes

deanq force-pushed the deanq/ae-1549-explicit-env-vars branch 2 times, most recently from 7a21f55 to 1bb160c Compare March 12, 2026 19:12

runpod-Henrik reviewed Mar 12, 2026

View reviewed changes

deanq force-pushed the deanq/ae-1549-explicit-env-vars branch from 1bb160c to e3f74d4 Compare March 13, 2026 16:42

KAJdev requested changes Mar 18, 2026

View reviewed changes

src/runpod_flash/cli/commands/deploy.py Outdated Show resolved Hide resolved

deanq force-pushed the deanq/ae-1549-explicit-env-vars branch from dd37adc to f3f1c59 Compare March 18, 2026 23:19

deanq requested review from KAJdev, Copilot and runpod-Henrik March 18, 2026 23:19

Copilot AI reviewed Mar 18, 2026

View reviewed changes

deanq force-pushed the deanq/ae-1549-explicit-env-vars branch from f3f1c59 to c1a98e2 Compare March 19, 2026 18:04

runpod-Henrik reviewed Mar 19, 2026

View reviewed changes

deanq requested a review from jhcipar March 19, 2026 22:54

KAJdev requested changes Mar 20, 2026

View reviewed changes

deanq added 9 commits March 19, 2026 17:14

test: update tests for env separation (remove get_env_vars references)

882bd69

Replace TestServerlessResourceEnvLoading tests that imported deleted get_env_vars/EnvironmentVars with tests that verify env defaults to None and preserves explicit dicts.

docs: update flash docs for env separation

4388d11

Clarify that .env populates os.environ for CLI/local dev only. Resource env vars must be explicit via env={} on resource config. RUNPOD_API_KEY injected automatically for makes_remote_calls=True.

fix(review): remove env preview table from deploy output

bc3a84e

- Remove render_env_preview call from deploy command - Delete env_preview.py module and its tests (no other consumers) Addresses PR #257 feedback: table was visual clutter

deanq force-pushed the deanq/ae-1549-explicit-env-vars branch from d4ecd77 to 5ac0bb6 Compare March 20, 2026 00:15

deanq requested review from KAJdev, Copilot and runpod-Henrik March 20, 2026 00:59

Copilot AI reviewed Mar 20, 2026

View reviewed changes

docs/Resource_Config_Drift_Detection.md Outdated Show resolved Hide resolved

src/runpod_flash/core/resources/serverless.py Outdated Show resolved Hide resolved

Conversation

deanq commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation

How it works now

Breaking change

Upgrade note

Companion PR

Test plan

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

runpod-Henrik left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Review

Bug: preview/deploy parity broken — makes_remote_calls default mismatch

Nit: env={} explicit empty dict won't appear in manifest

Nit: _MASK_VISIBLE_CHARS = 6 exposes key type prefix

Positives

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

1. Core change — clean and correct

2. Issue: all existing deployed endpoints will drift on upgrade

3. Question: preview's makes_remote_calls default vs actual injection default

4. Testing — solid

Verdict

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

Uh oh!

KAJdev left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

deanq commented Mar 5, 2026 •

edited

Loading

runpod-Henrik left a comment •

edited

Loading

Bug: preview/deploy parity broken — `makes_remote_calls` default mismatch

Nit: `env={}` explicit empty dict won't appear in manifest

Nit: `_MASK_VISIBLE_CHARS = 6` exposes key type prefix

3. Question: preview's `makes_remote_calls` default vs actual injection default

KAJdev left a comment •

edited

Loading