refactor: separate .env from deployed endpoint env vars#257
refactor: separate .env from deployed endpoint env vars#257
Conversation
5b765f6 to
be4269a
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/runpod_flash/cli/utils/env_preview.py:1
- The
_SECRET_PATTERNregex matches any key containingTOKENas a substring, soTOKENIZER_PATHgets masked even though it's not a secret. This is a false positive — environment variables likeTOKENIZER_PATH,TOKEN_COUNT, orBUCKET_NAME(no match, but illustrative) could be misleadingly masked. The test at line 52-55 intest_env_preview.pyconfirms this behavior as intended, but the docstring says "TOKEN in TOKENIZER should still match" which suggests this was a deliberate choice. However, this will surprise users who have non-secret vars with these substrings. Consider using word-boundary matching (e.g.,r"\b(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)\b"orr"(^|_)(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL)($|_)") to avoid masking keys likeTOKENIZER_PATHorTOKENIZER_CONFIG.
"""Deploy-time env preview: show what env vars go to each endpoint."""
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
bab2b46 to
6782404
Compare
There was a problem hiding this comment.
Review
Clean, well-motivated change. The core fix (env defaults to None instead of implicitly loading .env) removes a real footgun. Good test coverage in test_env_separation.py. One confirmed bug and two nits below.
Bug: preview/deploy parity broken — makes_remote_calls default mismatch
_do_deploy calls _check_makes_remote_calls() which defaults to True in all failure cases:
# serverless.py:583
makes_remote_calls = resource_config.get("makes_remote_calls", True)
# also: "Manifest not found, assuming makes_remote_calls=True"
# also: "Resource not in manifest, assuming makes_remote_calls=True"collect_env_for_preview uses the opposite default:
# env_preview.py
config.get("makes_remote_calls", False) # ← should be TrueOn first deploy (no prior build/manifest), the preview shows no RUNPOD_API_KEY for QB endpoints but deploy injects it. The preview is wrong exactly when it matters most — the first time a user runs flash deploy.
Fix: config.get("makes_remote_calls", True) in env_preview.py to match deploy behaviour.
Nit: env={} explicit empty dict won't appear in manifest
manifest.py uses:
if hasattr(resource_config, "env") and resource_config.env:An explicit env={} is falsy so it's silently skipped. Probably the right behaviour (nothing to store), but test_env_separation.py::test_serverless_resource_env_explicit_empty_dict_preserved only checks the resource object — not the manifest path. Worth a test or a comment to make the intent explicit.
Nit: _MASK_VISIBLE_CHARS = 6 exposes key type prefix
RUNPOD_API_KEY values start with rp_ — 6 visible chars reveals the full type prefix (rp_xxx). Not a security issue since the prefix is public, but worth being intentional about if multiple key formats are ever supported.
Positives
- Removing
env_dict.pop("RUNPOD_API_KEY", None)inmanifest.pyis correct — user-explicit keys should be preserved, not silently stripped except Exception: logger.debug(...)aroundrender_env_previewindeploy.pyis the right call — preview failure must not block deploytest_env_separation.pycovers the_configure_existing_templateNonecase, which was the trickiest correctness question- Companion flash-examples PR (#39) is the right process for a breaking change
Verdict: PASS with fix — the makes_remote_calls default is a one-line fix but should land before merge to avoid shipping a misleading preview.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
7a21f55 to
1bb160c
Compare
runpod-Henrik
left a comment
There was a problem hiding this comment.
1. Core change — clean and correct
env default changing from Field(default_factory=get_env_vars) to Field(default=None) is the right fix. Template creation falls back to self.env or {}, so None and {} both produce an empty template env. EnvironmentVars / environment.py deletion is complete — no remaining references.
The hash behavior is correct: exclude_none=True means env=None (default) is excluded from the drift hash, and setting env={"HF_TOKEN": "..."} correctly triggers drift detection.
2. Issue: all existing deployed endpoints will drift on upgrade
This is an expected consequence of the breaking change, but it should be in the upgrade notes: any endpoint deployed with the old SDK had .env contents baked into resource.env. After upgrading, env defaults to None, so the hash changes for every endpoint that previously had env vars → flash deploy will flag all of them as drifted and redeploy. Users should be warned to expect forced redeployments on their first deploy after upgrading.
3. Question: preview's makes_remote_calls default vs actual injection default
In collect_env_for_preview:
if not is_lb and config.get("makes_remote_calls", False): # default: FalseIn _inject_runtime_template_vars → _check_makes_remote_calls:
log.debug("Manifest not found, assuming makes_remote_calls=True") # default: TrueIf a resource's manifest entry lacks the makes_remote_calls key, the preview shows no RUNPOD_API_KEY while the actual deploy injects it. Worth confirming that the scanner always writes makes_remote_calls explicitly for every resource — if so, this default never fires and the preview is accurate. If not, the preview undersells what gets deployed.
4. Testing — solid
267-line test_env_preview.py covers masking, collect_env_for_preview, and rendering. 99-line test_env_separation.py covers the field default, explicit dict preservation, and template creation. The test_exactly_six_chars_fully_masked test correctly captures that 6-char values are fully masked rather than showing abcdef...****.
One gap: no test for the "user explicitly sets RUNPOD_API_KEY in env={}" path — the preview would show it as "user" source but the old code stripped it. Not a bug (the behavior is intentional now), but a test would lock in the intent.
Verdict
PASS with one upgrade-note item (item 2) and one question to confirm (item 3). The separation is clean, the deletion is thorough, and the preview table is well-tested.
1bb160c to
e3f74d4
Compare
- Remove render_env_preview call from deploy command - Delete env_preview.py module and its tests (no other consumers) Addresses PR #257 feedback: table was visual clutter
- Remove render_env_preview call from deploy command - Delete env_preview.py module and its tests (no other consumers) Addresses PR #257 feedback: table was visual clutter
dd37adc to
f3f1c59
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
- Remove render_env_preview call from deploy command - Delete env_preview.py module and its tests (no other consumers) Addresses PR #257 feedback: table was visual clutter
f3f1c59 to
c1a98e2
Compare
- Use `is not None` instead of truthiness for env checks in serverless.py,
serverless_cpu.py, and manifest.py so explicit env={} clears template env
- Add RUNPOD_API_KEY injection note to LB architecture doc
- Remove Claude-specific directive from plan doc
runpod-Henrik
left a comment
There was a problem hiding this comment.
Delta review — since 2026-03-12
From the 2026-03-06 review:
- Bug:
makes_remote_callsdefault mismatch inenv_preview.py→ RESOLVED. The env preview table was removed entirely (c1a98e2), so the mismatch no longer exists. The bug was valid at the time, but the decision to pull the preview table sidesteps it. - Nit:
env={}skipped in manifest (falsy check) → RESOLVED. Latest commit (d4ecd776) changedand resource_config.envtoand resource_config.env is not Noneinmanifest.py. Consistent with the same fix applied to_configure_existing_templateinserverless.pyandserverless_cpu.py. - Nit:
_MASK_VISIBLE_CHARS = 6→ MOOT. Preview table removed.
From the 2026-03-12 review:
- Item 2: Upgrade drift warning — Still unaddressed. Any endpoint deployed with the old SDK has
.envcontents baked intoresource.env. After upgrading,envdefaults toNone, the hash changes, andflash deployflags every pre-existing endpoint as drifted. Should be in upgrade notes or a migration note in the PR body. - Item 3:
makes_remote_callsdefault question → MOOT. Preview removed.
New concerns on current diff:
-
Issue: CPU
config_hashsilently ignores explicitenv=changes.CpuServerlessEndpoint.config_hashhashes a fixed set of CPU fields and explicitly excludesenv. Under the old model (env was a dynamic.envdump) this was reasonable. Now thatenvis user-declared intent, a user who addsenv={"HF_TOKEN": "..."}to a CPU endpoint and redeploys will get no drift detection, no update, and no error. The base class correctly includesenvin the hash; the CPU override silently drops it. If this is intentional (env changes don't trigger CPU re-deploy by design), it needs a comment explaining why. As written, it looks like an oversight. -
PR body references env preview table that was removed. The summary still lists "Add deploy-time env preview table with secret masking" and the checklist has items for verifying preview behavior. The feature doesn't exist in the final diff. Worth cleaning up before merge to avoid confusing future readers.
Overall: The two bugs from the 2026-03-06 review are closed. The main open item is the CPU config_hash asymmetry — silent failure for a user-facing feature after a correctness fix. The upgrade drift note is still worth adding.
🤖 Reviewed by Henrik's AI-Powered Bug Finder
There was a problem hiding this comment.
from what im seeing that env drift detection bug is real, but only present in flash run. If you flash run, send a request, change the env, have it auto reload, then send another request, the env is not updated at all
I have verified this
2026-03-19 17:00:05,739 | INFO | CpuLiveServerless:c6i34mfjfj9gct | API /run
2026-03-19 17:00:06,230 | INFO | CpuLiveServerless:c6i34mfjfj9gct | Started Job:0be84094-e004-4f10-b15c-1edab200e234-u1
2026-03-19 17:00:06,330 | INFO | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | Status: IN_QUEUE
2026-03-19 17:00:06,847 | INFO | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | .
2026-03-19 17:00:08,425 | INFO | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | ..
2026-03-19 17:00:14,093 | INFO | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | ...
2026-03-19 17:00:32,335 | INFO | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | ....
2026-03-19 17:00:53,938 | INFO | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | .....
2026-03-19 17:01:04,267 | INFO | Job:0be84094-e004-4f10-b15c-1edab200e234-u1 | Status: COMPLETED
2026-03-19 17:01:04,426 | INFO | Worker:4tfsc74uch0vfj | Delay Time: 48063 ms
2026-03-19 17:01:04,427 | INFO | Worker:4tfsc74uch0vfj | Execution Time: 42 ms
WARNING: WatchFiles detected changes in '.flash/server.py'. Reloading...
2026-03-19 17:01:15,994 | INFO | CpuLiveServerless:c6i34mfjfj9gct | API /run
2026-03-19 17:01:16,179 | INFO | CpuLiveServerless:c6i34mfjfj9gct | Started Job:af0c02dd-19c1-42f7-a9ac-db3788d80435-u1
2026-03-19 17:01:16,280 | INFO | Job:af0c02dd-19c1-42f7-a9ac-db3788d80435-u1 | Status: IN_PROGRESS
2026-03-19 17:01:16,483 | INFO | Job:af0c02dd-19c1-42f7-a9ac-db3788d80435-u1 | Status: COMPLETED
2026-03-19 17:01:16,578 | INFO | Worker:4tfsc74uch0vfj | Delay Time: 127 ms
2026-03-19 17:01:16,578 | INFO | Worker:4tfsc74uch0vfj | Execution Time: 40 ms
(the edit was a change to the env)
I verified that the change didnt apply to the endpoint in the console.
Change ServerlessResource.env default from get_env_vars() (reads .env)
to None. Delete EnvironmentVars class and get_env_vars(). Template
creation now uses self.env or {} instead of falling back to .env file.
.env still populates os.environ via load_dotenv() in __init__.py for
CLI and get_api_key() usage. This change only affects what gets sent
to deployed endpoints.
Replace TestServerlessResourceEnvLoading tests that imported deleted get_env_vars/EnvironmentVars with tests that verify env defaults to None and preserves explicit dicts.
With env separation, resource.env only contains user-declared vars. If a user explicitly sets RUNPOD_API_KEY in their resource env, it should be preserved. Runtime injection via _inject_runtime_template_vars() handles the automatic case.
New module renders a Rich table per resource showing all env vars that will be sent to deployed endpoints. User-declared vars shown directly; flash-injected vars (RUNPOD_API_KEY, FLASH_MODULE_PATH) labeled as 'injected by flash'. Secret values masked based on key pattern matching (KEY, TOKEN, SECRET, PASSWORD, CREDENTIAL). Wired into flash deploy: renders before provisioning so users see exactly what goes to each endpoint.
Clarify that .env populates os.environ for CLI/local dev only.
Resource env vars must be explicit via env={} on resource config.
RUNPOD_API_KEY injected automatically for makes_remote_calls=True.
- Fix preview/deploy mismatch: LB endpoints no longer show RUNPOD_API_KEY injection in preview (matches _do_deploy behavior) - Wrap render_env_preview in try-except so preview failures don't abort deployment - Fix stale comment referencing .env file in serverless_cpu.py - Correct drift detection doc: env is conditionally included in hash, not always excluded - Fix LB architecture doc: LB endpoints get FLASH_MODULE_PATH, not RUNPOD_API_KEY - Update config_hash docstring to reflect exclude_none behavior - Add tests for render_env_preview, LB API key exclusion, and _configure_existing_template with env=None
- Compact env preview into single table with resource column to reduce terminal clutter - Show "user" source label for user-declared vars instead of empty string - Remove local filesystem paths from plan docs - Fix design doc: env=None omits the key, not stores empty dict - Update render tests for new table format
- Remove render_env_preview call from deploy command - Delete env_preview.py module and its tests (no other consumers) Addresses PR #257 feedback: table was visual clutter
- Use `is not None` instead of truthiness for env checks in serverless.py,
serverless_cpu.py, and manifest.py so explicit env={} clears template env
- Add RUNPOD_API_KEY injection note to LB architecture doc
- Remove Claude-specific directive from plan doc
d4ecd77 to
5ac0bb6
Compare
…eservation
- Include env in CPU config_hash so env-only changes trigger drift
detection and redeployment (was silently excluded)
- Purge user modules from sys.modules in generated server.py so
uvicorn hot-reload reimports them with updated env values
- Add get_template GraphQL query (podTemplate) to read live template
state before updates
- Add _preserve_platform_env to carry forward platform-injected vars
(FLASH_HOST, FLASH_PORT, PYTHONPATH, etc.) that the platform sets
once at initial deploy and does not re-inject on saveTemplate
- Fix _configure_existing_template to use truthiness check so empty
env={} does not clobber explicit template.env entries
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Update drift detection docs to reflect env included in CPU hash - Fix _preserve_platform_env to not resurrect user-removed keys by comparing against old config env (old_env parameter) - Add test_update_does_not_resurrect_user_removed_env_vars - Remove --no-verify from plan doc test command
Summary
.envfile contents to deployed endpointsServerlessResource.envdefaults toNoneinstead of reading.envviaget_env_vars()EnvironmentVarsclass andenvironment.pyRUNPOD_API_KEYfrom explicit resource env in manifestflash runhot-reload not propagating env changes (module cache busting)config_hashto includeenvfor drift detectionMotivation
.envfiles were implicitly dumped into every deployed endpoint, causing:PORT,PORT_HEALTH) overwritten on template updatesself.envHow it works now
.envstill populatesos.environviaload_dotenv()for CLI andget_api_key()usageenv={"HF_TOKEN": os.environ["HF_TOKEN"]}RUNPOD_API_KEY,FLASH_MODULE_PATH) intotemplate.envautomaticallyupdate()reads the live template viaget_templateand preserves platform-injected varsflash runhot-reload properly reimports user modules with updated env valuesBreaking change
ServerlessResource.envno longer defaults to.envfile contents. Users relying on implicit carryover must add explicitenv={}to their resource configs.Upgrade note
Existing deployed endpoints have
.envcontents baked intoresource.env. After upgrading,envdefaults toNone, so the config hash changes.flash deploywill flag all previously-deployed endpoints as drifted and trigger redeployment on the first deploy after upgrading. This is expected and harmless — the endpoints will be redeployed with only explicitly declared env vars.Companion PR
Test plan