fix(drift): prevent false config drift on unchanged GPU endpoints by CircuitSerein · Pull Request #319 · runpod/flash

CircuitSerein · 2026-04-15T22:13:13Z

Problem

Two bugs caused every subsequent flash run to trigger a new RunPod release (cold start + billing event), even when the user's config hadn't changed at all.

Bug 1 — `instanceIds=[]` vs `None` false drift

The RunPod API returns instanceIds=[] for GPU endpoints that have no instance restriction. Locally the user never sets instanceIds, so it stays None. model_dump(exclude_none=True) drops None but keeps [], producing a different config_hash and a spurious update() call on every run.

Symptom: endpoint receives a new release on every invocation despite zero config changes.

Bug 2 — `RUNPOD_API_KEY` oscillating add/remove

_create_new_template() always passed env=[] explicitly to PodTemplate even when self.env was None. This put "env" into Pydantic's model_fields_set, which made update() evaluate has_explicit_template_env = True every time, setting env_needs_update = True.

For QB endpoints that don't make remote calls, _inject_runtime_template_vars() would not inject RUNPOD_API_KEY — so update_template was called with an empty env, removing the key. The next run added it back. Alternating add/remove on every run, visible in the RunPod web UI as repeated releases.

Fix

serverless.py — config_hash: normalize instanceIds=[] to absent, matching the treatment of None:

if not config_dict.get("instanceIds"):
    config_dict.pop("instanceIds", None)

serverless.py + serverless_cpu.py — _create_new_template(): only pass env= when self.env is truthy, keeping "env" out of model_fields_set for the default no-env case:

kwargs: dict = {"name": self.resource_id, "imageName": self.imageName}
if self.env:
    kwargs["env"] = KeyValuePair.from_dict(self.env)
return PodTemplate(**kwargs)

Tests

Regression tests added in TestInstanceIdsFalseDrift and TestCreateNewTemplateEnvFieldSet:

instanceIds=None and instanceIds=[] produce identical hash
non-empty instanceIds still changes hash (real drift still caught)
_create_new_template() with env=None does not put "env" in model_fields_set (GPU and CPU endpoints)
_create_new_template() with env populated correctly sets the field

Test run

2640 passed, 2 skipped, 1 xfailed, 4 warnings

deanq · 2026-04-16T02:37:40Z

Thanks, @CircuitSerein . Why did your changes rewrite the entire files? Look at what's removed and added. Are you using an editor that completely reformats files (tabs to spaces)? We cannot accept PR that does this. I can't even tell what changed. Please reformat it back to what it was originally and apply the actual code change you intended.

Two bugs caused every subsequent run to trigger a new RunPod release (cold start), even when the user's config hadn't changed. Bug 1 — instanceIds=[] vs None false drift: The RunPod API returns instanceIds=[] for GPU endpoints that have no instance restriction. Locally the user never sets instanceIds, so it stays None. exclude_none=True drops None but keeps [], producing a different config_hash and a spurious update on every run. Fix: normalize instanceIds=[] to absent in config_hash, matching the same treatment applied to None. Bug 2 — RUNPOD_API_KEY oscillating add/remove: _create_new_template() always passed env=[] explicitly to PodTemplate even when self.env was None. This put 'env' in Pydantic's model_fields_set, causing update() to set env_needs_update=True on every run. For QB endpoints that don't make remote calls this toggled RUNPOD_API_KEY in and out of the template on alternate runs. Fix: only pass env= to PodTemplate when self.env is truthy, keeping 'env' out of model_fields_set for the default (no-env) case. Regression tests added for both bugs.

CircuitSerein · 2026-04-16T07:21:45Z

Sorry about that - CRLF problem with Windows and WSL. Recommited clean changes

runpod-Henrik

Two real bugs, both fixes correct. One observation worth a follow-up, one test gap.

Bug 1 — instanceIds normalization (serverless.py:436)

Correct. model_dump(exclude_none=True) drops None but keeps [] — without this fix the serialized JSON diverges and triggers a spurious release on every run. if not config_dict.get("instanceIds") handles both cases cleanly.

Confirmed non-issue for CPU: CpuServerlessEndpoint.config_hash uses an explicit include=cpu_fields allowlist that doesn't include instanceIds, so the field never enters the CPU hash. No fix needed there.

Bug 2 — _create_new_template() (serverless.py:622, serverless_cpu.py:162)

Correct for the described symptom. The old env=KeyValuePair.from_dict(self.env or {}) always put "env" into model_fields_set, which made has_explicit_template_env True on every update call → env_needs_update = True → _inject_runtime_template_vars() ran → for QB endpoints without remote calls, RUNPOD_API_KEY was alternately added and removed. The fix breaks that chain.

Observation — env={} inconsistency between creation and update paths.

_create_new_template uses if self.env:, which treats env={} identically to env=None. _configure_existing_template (serverless.py:633) uses if self.env is not None:, which does not. The result: a user who sets env={} gets different template state depending on whether it's a new deploy or an update — new deploy leaves env unset, update explicitly clears it to []. Unlikely to hit in practice, but the inconsistency is a quiet trap if this area is touched later. Suggest either aligning _configure_existing_template to match, or adding a comment in _create_new_template explaining the intentional choice.

Update() mechanics

Traced through the env_needs_update guard at line 1087: after the fix, new_config.env is None → "env" not in template_fields_set → has_explicit_template_env = False, env_changed = False → env_needs_update = False → _inject_runtime_template_vars() not called, RUNPOD_API_KEY not touched. The logic chain is clean.

Tests

TestInstanceIdsFalseDrift covers all three relevant cases (None, absent, empty list) and confirms non-empty still changes the hash. TestCreateNewTemplateEnvFieldSet covers both GPU and CPU endpoints in set and unset state. Test run is clean (2640/0).

Gap: no test for env={} in _create_new_template. Given the if self.env: subtlety, a test pinning env={} → "env" not in template.model_fields_set would document the intended behavior and guard against a later "fix" to is not None.

Copilot

Pull request overview

Fixes false serverless config drift that was triggering unnecessary RunPod releases on subsequent flash run executions, even when the user’s configuration hadn’t changed.

Changes:

Normalize instanceIds=[] to be treated the same as “absent”/None when computing config_hash.
Adjust _create_new_template() to avoid marking env as explicitly set when env is unset by the user.
Add regression tests for instanceIds hashing and template env field-set behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`src/runpod_flash/core/resources/serverless.py`	Normalizes `instanceIds` for hashing; changes template creation to conditionally include `env`.
`src/runpod_flash/core/resources/serverless_cpu.py`	Mirrors the template `env`-handling change for CPU endpoints.
`tests/unit/resources/test_serverless.py`	Adds regression tests for `instanceIds` hash normalization and template `env` field-set behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    def test_create_new_template_env_not_in_fields_set_when_env_none(self):
+        """When self.env is None, 'env' must NOT appear in template.model_fields_set."""
+        resource = ServerlessEndpoint(name="test", imageName="test:latest")
+        assert resource.env is None
+
+        template = resource._create_new_template()
+
+        assert "env" not in template.model_fields_set
+
+    def test_create_new_template_env_in_fields_set_when_env_set(self):
+        """When self.env is populated, 'env' MUST appear in template.model_fields_set."""
+        resource = ServerlessEndpoint(
+            name="test",
+            imageName="test:latest",
+            env={"MY_VAR": "value"},
+        )
+
+        template = resource._create_new_template()
+
+        assert "env" in template.model_fields_set
+        assert any(kv.key == "MY_VAR" for kv in template.env)
+
+    def test_create_new_template_env_not_in_fields_set_cpu_endpoint(self):
+        """CpuServerlessEndpoint: same fix applies — env=None must not set 'env' field."""
+        resource = CpuServerlessEndpoint(name="test", imageName="test:latest")
+        assert resource.env is None
+
+        template = resource._create_new_template()
+
+        assert "env" not in template.model_fields_set


Use 'if self.env is not None' instead of 'if self.env' in _create_new_template() so that an explicitly-set empty env ({}) is still represented in template.model_fields_set, keeping it distinct from the default env=None case. Add regression test for env={} behavior per reviewer feedback.

CircuitSerein force-pushed the fix/false-drift-instance-ids-and-template-env branch from eafe09b to 6740ef4 Compare April 16, 2026 07:19

runpod-Henrik reviewed Apr 16, 2026

View reviewed changes

deanq requested a review from Copilot April 16, 2026 18:27

Copilot started reviewing on behalf of deanq April 16, 2026 18:28 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

CircuitSerein force-pushed the fix/false-drift-instance-ids-and-template-env branch from b69d9ab to f2d0080 Compare April 16, 2026 19:37

CircuitSerein and others added 2 commits April 16, 2026 22:39

Merge branch 'main' into fix/false-drift-instance-ids-and-template-env

f6ff0cb

Merge branch 'main' into fix/false-drift-instance-ids-and-template-env

aacd305

deanq approved these changes Apr 17, 2026

View reviewed changes

deanq merged commit f5884f2 into runpod:main Apr 17, 2026
4 checks passed

runpod-release-please-bot bot mentioned this pull request Apr 17, 2026

chore: release 1.14.0 #321

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(drift): prevent false config drift on unchanged GPU endpoints#319

fix(drift): prevent false config drift on unchanged GPU endpoints#319
deanq merged 4 commits intorunpod:mainfrom
CircuitSerein:fix/false-drift-instance-ids-and-template-env

CircuitSerein commented Apr 15, 2026

Uh oh!

deanq commented Apr 16, 2026 •

edited

Loading

Uh oh!

CircuitSerein commented Apr 16, 2026

Uh oh!

runpod-Henrik left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

CircuitSerein commented Apr 15, 2026

Problem

Bug 1 — instanceIds=[] vs None false drift

Bug 2 — RUNPOD_API_KEY oscillating add/remove

Fix

Tests

Test run

Uh oh!

deanq commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

CircuitSerein commented Apr 16, 2026

Uh oh!

runpod-Henrik left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Bug 1 — `instanceIds=[]` vs `None` false drift

Bug 2 — `RUNPOD_API_KEY` oscillating add/remove

deanq commented Apr 16, 2026 •

edited

Loading