Skip to content

refactor envs hashing#1230

Open
alexcos20 wants to merge 1 commit intomainfrom
feature/compute_envs_hash
Open

refactor envs hashing#1230
alexcos20 wants to merge 1 commit intomainfrom
feature/compute_envs_hash

Conversation

@alexcos20
Copy link
Member

PR: C2D robustness and distinct cluster identity

Summary

This PR improves C2D compute engine behavior in three areas: (1) cluster hashes are made unique when multiple clusters share the same URL or config, (2) environment IDs are derived only from fees so they stay stable when other env fields change, and (3) getComputeEnvironments / getUsedResources are hardened against missing data and DB failures.

Changes

1. Config builder – distinct cluster hashes (src/utils/config/builder.ts)

  • Problem: Cluster hash was create256Hash(theURL) for operator clusters and create256Hash(JSON.stringify(dockerC2d)) for Docker clusters. Multiple clusters with the same URL or identical Docker config produced the same hash, which can break anything that uses the hash as a unique cluster id (e.g. job routing, storage paths, env ids).
  • Change: A running index is included in the hash input:
    • Introduced let count = 0.
    • For each operator cluster: hash = create256Hash(String(count) + theURL), then count += 1.
    • For each Docker cluster: hash = create256Hash(String(count) + JSON.stringify(dockerC2d)), then count += 1.
  • Result: Each cluster gets a distinct hash even when URLs or Docker configs are identical.

2. Compute engine Docker – environment ID and getComputeEnvironments (src/components/c2d/compute_engine_docker.ts)

Environment ID (env id):

  • Before: this.envs[0].id = hash + '-' + create256Hash(JSON.stringify(this.envs[0])) — the whole env object was hashed, so any change (e.g. runningJobs, resources) changed the id.
  • After: this.envs[0].id = hash + '-' + create256Hash(JSON.stringify(this.envs[0].fees)) — only fees is hashed, so the id is stable across runtime updates and only changes when fee config changes.

getComputeEnvironments(chainId?):

  • Filter-then-clone: The loop now uses env from this.envs and applies the chainId filter first. The deep clone (JSON.parse(JSON.stringify(env))) is only created for envs that pass the filter, then used for getUsedResources and in-use stats. Avoids mutating this.envs and avoids unnecessary clones for filtered-out envs.
  • Optional resources: Iteration over computeEnv.resources is wrapped in if (computeEnv.resources) so envs with no resources (optional in the type) do not throw.

3. Compute engine base – getUsedResources error handling (src/components/c2d/compute_engine_base.ts)

  • Before: const jobs = await this.db.getRunningJobs(this.getC2DConfig().hash) — a DB failure threw and aborted getUsedResources, which could take down getComputeEnvironments for all envs.
  • After: let jobs: DBComputeJob[] = [] with a try/catch around getRunningJobs. On failure, the error is logged and jobs stays empty, so the method returns zeroed counts and in-use maps instead of throwing.
  • Result: A transient DB error no longer prevents returning compute environments; callers get envs with zeroed usage instead of a full failure.

Benefits

  • Stable env ids: Environment identity no longer changes when job counts or resource usage change; only fee config affects it.
  • Unique cluster hashes: Multiple clusters with the same URL or Docker config get different hashes, avoiding collisions in storage and routing.
  • Resilience: DB failures in getRunningJobs are logged and isolated; getComputeEnvironments can still return envs with zeroed usage.
  • Correctness: No crash when an env has no resources; no mutation of this.envs and fewer unnecessary clones when filtering by chainId.

@alexcos20
Copy link
Member Author

/run-security-scan

Copy link
Member Author

@alexcos20 alexcos20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
The pull request introduces improved robustness for fetching running jobs, enhances the uniqueness of C2D cluster hashes, and adds defensive checks for compute environment resources. However, a critical change in compute_engine_docker.ts alters how compute environment IDs are generated, potentially leading to non-unique IDs if environments have different configurations but identical fees. This requires careful review and potential correction.

Comments:
• [INFO][other] Adding a try...catch block around this.db.getRunningJobs is a good practice to prevent the entire function from failing due to database access issues. This improves the robustness of resource monitoring.
• [ERROR][bug] This change modifies the compute environment ID generation to only include this.envs[0].fees in the hash, rather than the entire this.envs[0] object. If two compute environments have identical fees but differ in other crucial properties (e.g., resources, queue settings, runtimes), they will now generate the same ID. This could lead to identifier collisions and incorrect behavior when trying to reference or manage specific compute environments. The ID should ideally uniquely represent the full configuration of the environment. Please confirm if this reduction in scope for the ID hash is intentional and why, or if it's an oversight.
• [INFO][performance] Using JSON.parse(JSON.stringify(env)) to deep clone the env object is a common pattern for creating a disconnected copy. This is generally acceptable for simple data structures like configuration objects. However, be aware of its limitations: it doesn't handle functions, Date objects are converted to strings, and undefined properties are dropped. If env ever contains more complex types, a dedicated deep cloning library might be more robust.
• [INFO][other] Adding the if (computeEnv.resources) check is a good defensive programming measure. It prevents potential runtime errors if resources is null or undefined, making the code more robust.
• [INFO][other] Including String(count) in the hash generation for C2D clusters is an excellent improvement. This ensures that even if multiple identical OPERATOR_SERVICE_URL or dockerC2d configurations are provided, they will each receive a unique hash. This prevents potential conflicts and ensures proper identification when these hashes are used as keys or unique identifiers.

@alexcos20
Copy link
Member Author

alexcos20 commented Feb 26, 2026

[ERROR][bug]...

Not true, since an env.id = cluster.hash + sha256(env.fees) -> unique env ids across the board, even if you they have the same fee structure

@alexcos20 alexcos20 linked an issue Feb 26, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update remove allowList flow

2 participants