Fix Truncate Sequence Length#17
Conversation
WalkthroughSwitches validator scoring to MFU, tightens streamed downloads and verification (logits/sequence/gradients), adds deterministic fatal-error handling and safe DB checks to avoid marking failed URLs processed, and introduces immediate adaptive-threshold updates when a finished submission becomes the new leader. Changes
Sequence Diagram(s)sequenceDiagram
participant Validator as Validator
participant Downloader as Downloader
participant Evaluator as Evaluator
participant DB as Database
participant Threshold as ThresholdService
Validator->>Downloader: fetch submission URL (streamed, 500KB chunk cap)
Downloader-->>Validator: streamed content (HTML/JSON detection)
Validator->>DB: check if URL already processed
DB-->>Validator: not processed
Validator->>Evaluator: run evaluation (multiple runs)
Evaluator-->>Validator: per-run results (+/− fatal_error)
alt fatal_error occurred
Validator->>DB: mark submission FAILED
else all runs complete
Evaluator-->>Validator: final MFU score
Validator->>DB: save finalized submission
Validator->>Threshold: check/update if new leader (immediate)
Threshold-->>DB: persist new leader/threshold
Validator->>DB: mark URL processed
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@neurons/validator.py`:
- Around line 550-554: The call to
_check_and_update_threshold_if_new_leader(submission_id, median_mfu) can raise
and abort the evaluation loop (propagating through _finalize_submission →
evaluate_submissions); wrap this call in a try/except that catches exceptions
(at minimum Exception), logs the error along with submission_id and relevant
context (e.g., median_mfu and hparams.adaptive_threshold), and does not re-raise
so the loop continues; ensure logging uses the existing logger and make clear
this is best-effort post-finish work since the submission is already marked
FINISHED.
- Around line 576-581: The code falls back to current_block = 0 when
self.commitment_reader is None, which then gets stored via
update_adaptive_threshold and corrupts decay calculations in
get_adaptive_threshold; change the logic in the block where current_block is
computed (the current_block assignment near commitment_reader usage) to return
early (or skip calling update_adaptive_threshold/get_adaptive_threshold) if
self.commitment_reader is None or cannot provide a valid block number instead of
assigning 0, ensuring last_update_block is not persisted with a sentinel zero
value; locate and modify the code paths that call update_adaptive_threshold and
get_adaptive_threshold so they only run when a real current_block is available.
- Around line 606-612: The float conversion of previous_winner_score_str is
unguarded and can raise ValueError; update the code around previous_winner_id /
previous_winner_score_str / previous_winner_score (the block that calls
self.db.get_validator_state) to safely parse the string: attempt to convert
previous_winner_score_str to float inside a try/except (or use a safe parse
helper), on exception or if the string is empty/non-numeric set
previous_winner_score = 0.0 and log or warn about the malformed stored value so
the evaluation loop won’t crash.
🧹 Nitpick comments (1)
neurons/validator.py (1)
614-645: Possible duplicate threshold update due to async interleaving with the weight setter.Between the reads on lines 606-611 and the writes on lines 644-645, there are multiple
awaitpoints (lines 620, 635, 644, 645). Ifmaybe_set_weightsruns during any of these yields and performs its own leader-check + threshold-update using the sameprevious_winner_id/previous_winner_scorekeys, both code paths will compute and persist the same threshold update independently.This is likely benign since both paths would compute the same result from the same inputs, but it's worth noting. If you want strict single-writer semantics, consider acquiring an async lock or performing the read-check-write in a single DB transaction.
44f27e6 to
875eda3
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/Validator.md (1)
311-324:⚠️ Potential issue | 🟡 MinorInconsistent
min_gpu_memory_gbdefault: JSON says64, table says80.The example JSON on line 311 sets
"min_gpu_memory_gb": 64, but the table on line 324 lists the default as80. Pick one value and use it consistently.
🤖 Fix all issues with AI agents
In `@environments/templar/env.py`:
- Around line 619-644: The _PROTECTED_ATTRS set itself can be overwritten by
miner code, so update the protection logic: add the string "_PROTECTED_ATTRS"
into the frozenset declared at class-level and change the check in __setattr__
to reference the class constant instead of the instance attribute (e.g. use
type(self)._PROTECTED_ATTRS or self.__class__._PROTECTED_ATTRS) so the guard
cannot be bypassed by assigning to self._PROTECTED_ATTRS; keep the rest of
__init__ behavior using object.__setattr__ unchanged.
- Around line 765-791: The default for norm_ratio_max is inconsistent with the
docstring: change the parameter default norm_ratio_max from 1.02 to 1.01 so
relative_error_threshold = norm_ratio_max - 1.0 yields 0.01 (1%), and update the
nearby example comments that reference norm_ratio_max values (e.g., the
"norm_ratio_max=1.01 -> 1% relative error" comment) to remain accurate; ensure
the docstring line "Production default: 0.01 (1% relative error)" stays as-is
and that the symbols norm_ratio_max and relative_error_threshold reflect the new
default.
In `@neurons/validator.py`:
- Around line 679-706: The threshold update is happening before persisting the
winner identity, risking duplicate threshold bumps if subsequent
set_validator_state("previous_winner_id", ...) or
set_validator_state("previous_winner_score", ...) fail; change the order so the
winner identity and score are saved to the database first (via
self.db.set_validator_state calls) before calling
self.db.update_adaptive_threshold, or, if supported, perform both actions inside
a single atomic DB transaction using self.db (e.g., a transactional method or
begin/commit API) so update_adaptive_threshold and the two set_validator_state
updates succeed or roll back together.
🧹 Nitpick comments (5)
src/crusades/__init__.py (1)
12-12: Remove the unusednoqadirective.Ruff reports
E402is not triggered here, making# noqa: E402unnecessary. The import follows only module-level assignments, which doesn't violate E402.Suggested fix
-from crusades.logging import LOKI_URL, setup_loki_logger # noqa: E402 +from crusades.logging import LOKI_URL, setup_loki_loggersrc/crusades/core/exceptions.py (1)
62-87: Consider composingis_fatalfromis_verification_failureto avoid duplicating the verification-failure set.If a new verification code is added, both
is_verification_failureandis_fatalmust be updated in lockstep. Composing avoids that risk:Suggested refactor
`@classmethod` def is_fatal(cls, code: "EvaluationErrorCode") -> bool: """Check if error is fatal/deterministic (no point retrying). Fatal errors will fail the same way on every retry because they are caused by the miner's code logic, not transient issues. """ - return code in { - # Code validation errors - code won't magically fix itself - cls.NO_CODE, - cls.SYNTAX_ERROR, - cls.MISSING_INNER_STEPS, - cls.INVALID_RETURN_TYPE, - # All verification failures - deterministic security checks - cls.INSUFFICIENT_TRAINABLE_PARAMS, - cls.INSUFFICIENT_PARAMS_CHANGED, - cls.GRADIENT_COVERAGE_FAILED, - cls.GRADIENT_NORM_RATIO_FAILED, - cls.GRADIENT_COSINE_FAILED, - cls.LOSS_MISMATCH, - cls.TOKEN_COUNT_MISMATCH, - cls.NO_GRADIENTS_CAPTURED, - cls.MISSING_LOGITS, - cls.INVALID_LOGITS_SHAPE, - cls.SEQUENCE_TRUNCATION, - } + # Code validation errors - code won't magically fix itself + code_validation = code in { + cls.NO_CODE, + cls.SYNTAX_ERROR, + cls.MISSING_INNER_STEPS, + cls.INVALID_RETURN_TYPE, + } + # All verification failures are also fatal (deterministic) + return code_validation or cls.is_verification_failure(code)neurons/validator.py (2)
255-264: Uselogger.exceptioninstead oflogger.errorto capture the traceback.When catching a broad
Exception, the traceback is valuable for debugging submission creation failures.logger.exceptionautomatically includes it.Proposed fix
except Exception as e: - logger.error(f"Failed to create submission for {code_url[:60]}: {e}") + logger.exception(f"Failed to create submission for {code_url[:60]}: {e}") # Don't record URL - will retry on next cycle
548-561: Fatal error finalization:failed_evals[-1]may pick up a failure from a different evaluator.
get_evaluations(submission_id)returns evaluations from all evaluators. If another validator previously recorded a non-fatal failure,failed_evals[-1]could surface that error message instead of the actual fatal error from the current evaluator. Consider filtering byself.hotkey:Proposed fix
if fatal_error: all_evals = await self.db.get_evaluations(submission_id) - failed_evals = [e for e in all_evals if not e.success and e.error] + failed_evals = [ + e for e in all_evals + if not e.success and e.error and e.evaluator_hotkey == self.hotkey + ] error_msg = failed_evals[-1].error if failed_evals else "Fatal error in evaluation"environments/templar/env.py (1)
848-866: Consider usingstrict=Trueinzip(ref_vecs, cand_vecs)(line 853).The length mismatch is already checked at line 842, so
strict=Truewould be redundant at runtime, but it documents the invariant and would catch bugs if the pre-check is ever refactored away. Minor nit flagged by Ruff (B905).Proposed fix
- for ref_layer, cand_layer in zip(ref_vecs, cand_vecs): + for ref_layer, cand_layer in zip(ref_vecs, cand_vecs, strict=True):
| cosine_min: float = 0.8, # Legacy, unused - kept for signature compat | ||
| norm_ratio_min: float = 0.5, # Legacy, unused - kept for signature compat | ||
| norm_ratio_max: float = 1.02, # Relative error threshold: 1.02 = 2% max error | ||
| ) -> tuple[bool, str | None, dict]: | ||
| """Verify candidate gradients match reference using cosine similarity and norm ratio. | ||
| """Verify candidate gradients match reference using relative error |g - g_truth| / |g_truth|. | ||
|
|
||
| This is a direct comparison that catches any deviation including truncation, | ||
| layer freezing, and step skipping. The threshold should be near numerical | ||
| precision (bfloat16 has ~0.8% relative error for accumulated ops). | ||
|
|
||
| Args: | ||
| reference_grad: Reference implementation gradients | ||
| candidate_grad: Miner's implementation gradients | ||
| cosine_min: Minimum cosine similarity required | ||
| norm_ratio_min: Minimum ratio of candidate/reference gradient norm | ||
| norm_ratio_max: Maximum ratio of candidate/reference gradient norm | ||
| cosine_min: (legacy, kept for signature compat) Not used in new check | ||
| norm_ratio_min: (legacy, kept for signature compat) Not used in new check | ||
| norm_ratio_max: Maximum allowed relative error |g - g_truth| / |g_truth| | ||
| Repurposed: norm_ratio_max is used as the relative error threshold. | ||
| Production default: 0.01 (1% relative error) | ||
|
|
||
| Returns: | ||
| Tuple of (success, error_message, details) | ||
| """ | ||
| # Repurpose norm_ratio_max as the relative error threshold | ||
| # This keeps backward compat with hparams without adding a new field | ||
| # norm_ratio_max=1.1 means 10% relative error allowed (old default) | ||
| # norm_ratio_max=1.01 means 1% relative error allowed (tight) | ||
| relative_error_threshold = norm_ratio_max - 1.0 # e.g., 1.01 -> 0.01 |
There was a problem hiding this comment.
Docstring says "Production default: 0.01 (1%)" but the code default is 1.02 → 2%.
Line 767 declares norm_ratio_max: float = 1.02, yielding a 2% threshold via norm_ratio_max - 1.0. The docstring at line 782 claims "Production default: 0.01 (1% relative error)". One of them is stale — update whichever is incorrect.
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 765-765: Unused function argument: cosine_min
(ARG001)
[warning] 766-766: Unused function argument: norm_ratio_min
(ARG001)
🤖 Prompt for AI Agents
In `@environments/templar/env.py` around lines 765 - 791, The default for
norm_ratio_max is inconsistent with the docstring: change the parameter default
norm_ratio_max from 1.02 to 1.01 so relative_error_threshold = norm_ratio_max -
1.0 yields 0.01 (1%), and update the nearby example comments that reference
norm_ratio_max values (e.g., the "norm_ratio_max=1.01 -> 1% relative error"
comment) to remain accurate; ensure the docstring line "Production default: 0.01
(1% relative error)" stays as-is and that the symbols norm_ratio_max and
relative_error_threshold reflect the new default.
875eda3 to
10241ea
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/Validator.md (1)
311-324:⚠️ Potential issue | 🟡 MinorInconsistent
min_gpu_memory_gbvalues across docs and code.Three different defaults appear:
- Line 311 (JSON example in this doc): 64
- Line 324 (settings table in this doc): 80
src/crusades/affinetes/runner.pyline 181 (code default): 40Pick one canonical default and align all three locations to avoid operator confusion.
🧹 Nitpick comments (7)
src/crusades/storage/database.py (1)
411-415: Good guard against division by zero.One consideration: returning
state.current_threshold(undecayed) whendecay_interval_blocks <= 0means a misconfigured parameter silently freezes the threshold at its last-set value, which could be much higher thanbase_threshold. Returningbase_thresholdinstead would be a safer fallback. Either way, consider logging a warning here to surface the bad configuration.💡 Optional: safer fallback + warning
if decay_interval_blocks <= 0: - return state.current_threshold + import logging + logging.getLogger(__name__).warning( + "decay_interval_blocks=%d is invalid, falling back to base_threshold", + decay_interval_blocks, + ) + return base_thresholdsrc/crusades/__init__.py (1)
12-12: Remove unused# noqa: E402directive.Ruff reports this
noqacomment is unnecessary (theE402rule is not being triggered here). Keeping unused suppressions adds noise and can mask future real violations.Suggested fix
-from crusades.logging import LOKI_URL, setup_loki_logger # noqa: E402 +from crusades.logging import LOKI_URL, setup_loki_loggersrc/crusades/affinetes/runner.py (1)
847-851: Consider usinglogger.exceptioninstead of manual traceback formatting.
logger.exception()automatically captures the current exception's traceback, making the explicittraceback.format_exc()call redundant.Suggested fix
except Exception as e: - logger.error("[BASILICA] Failed to deploy!") - logger.error(f" Error: {e}") - logger.error(traceback.format_exc()) + logger.exception("[BASILICA] Failed to deploy: %s", e) return Noneenvironments/templar/env.py (1)
835-897: Relative error check is a sound replacement for separate cosine + norm checks.A single
|g - g_truth| / |g_truth|metric detects all classes of gradient deviation (truncation, freezing, skipping) in one pass. The incremental layer-by-layer computation avoids large memory allocations.Minor: the
zip()at line 853 could usestrict=Truesince length equality is pre-checked at line 842, but this is a nit.local_test/verify.py (3)
204-217:batch_sizeparameter is unused.The
batch_sizeparameter at line 209 is accepted but never referenced in the function body. Either remove it or use it (e.g., for a batch-dimension check on logits).Proposed fix
def verify_all( reference, candidate, expected_tokens: int, expected_seq_len: int, - batch_size: int, initial_state: dict, model, reference_grad: GradientInfo, candidate_grad: GradientInfo, max_loss_difference: float = 0.3, min_changed_ratio: float = 0.8, gradient_norm_ratio_max: float = 1.02, ) -> bool:And update the call site at line 593:
passed = verify_all( reference=reference, candidate=candidate, expected_tokens=expected_tokens, expected_seq_len=expected_seq_len, - batch_size=batch_size, initial_state=initial_state,
41-48:GradientInfofield names diverge from production (vectorsvsgrad_vector).In
environments/templar/env.py, the field is namedgrad_vector, while here it'svectors. Though they're separate dataclasses, keeping the naming consistent would reduce confusion for developers working across both files.
51-80: Same_PROTECTED_ATTRSbypass as production code —"_PROTECTED_ATTRS"not in the set.For a local test tool the security impact is minimal, but keeping this in sync with the production fix ensures miners get accurate test results. Apply whatever fix is adopted in
environments/templar/env.py.
10241ea to
e267c35
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
scripts/view_submission.py (1)
121-128:⚠️ Potential issue | 🟡 MinorPotential
TypeErrorif numeric fields areNULL.If
mfu,tokens_per_second, orwall_time_secondsareNULLin the database, the:.2f/:.0fformat specifiers on lines 124-125 will raise aTypeError. Consider adding fallback defaults similar to howfinal_scoreis handled on line 56.Proposed fix
for i, e in enumerate(evals, 1): status = "PASS" if e["success"] else "FAIL" + mfu = e['mfu'] or 0.0 + tps = e['tokens_per_second'] or 0.0 + wall_time = e['wall_time_seconds'] or 0.0 print( - f" #{i}: MFU={e['mfu']:.2f}% TPS={e['tokens_per_second']:.0f} " - f"tokens={e['total_tokens']} time={e['wall_time_seconds']:.2f}s [{status}]" + f" #{i}: MFU={mfu:.2f}% TPS={tps:.0f} " + f"tokens={e['total_tokens']} time={wall_time:.2f}s [{status}]" )docs/Validator.md (1)
196-203:⚠️ Potential issue | 🟡 MinorInconsistency between documented hparams example and code defaults.
The example
hparams.jsonshows"gradient_norm_ratio_max": 1.1(10% relative error), but bothenvironments/templar/env.pyandlocal_test/verify.pydefault to1.02(2% relative error). Similarly,"max_loss_difference": 0.3in the example config matchesverify.pybut not the code default inenv.py(0.5). Validators using these example values will get different behavior than the code defaults suggest.Consider aligning the example config with the current code defaults (or vice versa) to avoid confusion.
🤖 Fix all issues with AI agents
In `@local_test/verify.py`:
- Around line 51-80: The __setattr__ protection can be bypassed by reassigning
the instance attribute _PROTECTED_ATTRS; change the check in
GradientCapturingOptimizer.__setattr__ to reference the class-level constant
rather than the instance (e.g., use GradientCapturingOptimizer._PROTECTED_ATTRS
or type(self)._PROTECTED_ATTRS) so assignments like optimizer._PROTECTED_ATTRS =
... no longer disable protection, and keep the existing _initialized guard
intact when performing the check.
- Around line 569-581: The check for required attributes misses final_loss
causing a possible AttributeError when constructing InnerStepsResult; update the
validation where ok is computed (currently checking miner_result.final_logits
and miner_result.total_tokens) to also require hasattr(miner_result,
"final_loss") before creating candidate, and ensure the error/exit path remains
the same so the printed failure message matches the actual checked attributes
(references: miner_result, ok, InnerStepsResult, candidate).
🧹 Nitpick comments (3)
src/crusades/__init__.py (1)
12-12: Unusednoqadirective.Ruff reports
E402is not enabled, so the# noqa: E402comment is unnecessary.Proposed fix
-from crusades.logging import LOKI_URL, setup_loki_logger # noqa: E402 +from crusades.logging import LOKI_URL, setup_loki_loggersrc/crusades/affinetes/runner.py (1)
847-851: Considerlogger.exceptioninstead of manual traceback formatting.
logger.exception()automatically includes the traceback, making line 850 redundant. This is also flagged by Ruff (TRY400).Proposed fix
except Exception as e: - logger.error("[BASILICA] Failed to deploy!") - logger.error(f" Error: {e}") - logger.error(traceback.format_exc()) + logger.exception("[BASILICA] Failed to deploy: %s", e) return Nonelocal_test/verify.py (1)
204-217: Unusedbatch_sizeparameter inverify_all.The
batch_sizeparameter (line 209) is accepted but never used insideverify_all. This was also flagged by static analysis (Ruff ARG001). Consider removing it if unneeded, or adding a_prefix if reserved for future use.
a7795dd to
2b9cfc4
Compare
Security: - Replace cosine sim + norm ratio with direct gradient relative error |g - g_truth| / |g_truth| - Protect GradientCapturingOptimizer attributes from miner tampering (__setattr__ guard) - Fix default gradient threshold from 100% to 2% relative error - Add logits None check, 3D shape check, and sequence length check - Use expected_tokens instead of miner-reported tokens for MFU calculation - Chunked download to prevent OOM from malicious large responses - Require loss to be positive (cross-entropy > 0) - Mandatory loss comparison (no silent skip when reference unavailable) Validator: - Add fatal error early termination (skip remaining eval runs on deterministic failures) - Add immediate adaptive threshold update after new leader finishes - Fix URL tracked even when DB save fails (submissions silently lost) - Fix DB error check fallthrough that could create duplicate submissions - Fix fatal error classification: only code-level checks are fatal, not data-dependent ones Verification: - Add new error codes: MISSING_LOGITS, INVALID_LOGITS_SHAPE, SEQUENCE_TRUNCATION, NO_GRADIENTS_CAPTURED - Add is_fatal() classifier (code-level vs data-dependent errors) - Add is_fatal() method on EvaluationResult - Add error_code to EvaluateResponse HTTP API Docs & cleanup: - Fix MFU formula in Validator.md (was using wrong per-token calculation) - Update verification checks table (cosine+norm -> relative error) - Fix README: TPS -> MFU, correct repo URL, add verification rules for miners - Fix pyproject.toml version mismatch (0.1.0 -> 0.2.2) - Update train.py with clear verification rules and legitimate optimizations - Rewrite verify.py to match production validator (uses GradientCapturingOptimizer) - Remove all emoji from codebase (use plain text labels) - Bump version to 0.2.2 - ruff check + format across entire repo Co-authored-by: Cursor <cursoragent@cursor.com>
2b9cfc4 to
faddb4e
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
scripts/view_submission.py (1)
121-128:⚠️ Potential issue | 🟡 MinorPotential
TypeErrorifmfuorwall_time_secondsisNULLin the database.
e['mfu']ande['wall_time_seconds']are formatted with:.2f/:.0f, which will raiseTypeErroronNone. Consider defaulting, e.g.(e['mfu'] or 0.0).🛡️ Suggested fix
- f" #{i}: MFU={e['mfu']:.2f}% TPS={e['tokens_per_second']:.0f} " - f"tokens={e['total_tokens']} time={e['wall_time_seconds']:.2f}s [{status}]" + f" #{i}: MFU={e['mfu'] or 0.0:.2f}% TPS={e['tokens_per_second'] or 0:.0f} " + f"tokens={e['total_tokens'] or 0} time={e['wall_time_seconds'] or 0.0:.2f}s [{status}]"docs/Validator.md (2)
196-211:⚠️ Potential issue | 🟡 MinorStale hparams example:
gradient_cosine_minis no longer used, andmin_gpu_memory_gbconflicts with the table.Two inconsistencies in this example JSON block:
gradient_cosine_min: 0.95(line 200) is documented here but the verification table above has replaced the cosine check with Gradient Relative Error (gradient_norm_ratio_max - 1.0). This will confuse operators.min_gpu_memory_gb: 64(line 311 in Basilica config) conflicts with the default of 80 shown in the table at line 324.Proposed fix for the verification block
"verification": { "max_loss_difference": 0.3, "min_params_changed_ratio": 0.8, - "gradient_cosine_min": 0.95, - "gradient_norm_ratio_min": 0.9, - "gradient_norm_ratio_max": 1.1 + "gradient_norm_ratio_max": 1.02 },
306-315:⚠️ Potential issue | 🟡 Minor
min_gpu_memory_gbvalue inconsistent with the defaults table below.The JSON example shows
64but the table at line 324 states the default is80.Proposed fix
- "min_gpu_memory_gb": 64, + "min_gpu_memory_gb": 80,
🧹 Nitpick comments (6)
environments/templar/env.py (1)
860-860: Consider addingstrict=Truetozip()for defense-in-depth.The length check on line 849 already guards this, but
strict=Truemakes the invariant self-documenting and catches any future refactoring that removes the earlier check.- for ref_layer, cand_layer in zip(ref_vecs, cand_vecs): + for ref_layer, cand_layer in zip(ref_vecs, cand_vecs, strict=True):src/crusades/__init__.py (1)
3-12: Unnecessarynoqa: E402directive.Ruff reports
E402is not enabled in your lint config, so this suppression is dead. Remove it to avoid confusion.🧹 Suggested fix
-from crusades.logging import LOKI_URL, setup_loki_logger # noqa: E402 +from crusades.logging import LOKI_URL, setup_loki_loggersrc/crusades/affinetes/runner.py (1)
847-851: Considerlogger.exceptioninstead oflogger.error+ manualtraceback.format_exc().Inside the
exceptblock,logger.exception(...)automatically attaches the traceback, making the explicittraceback.format_exc()call on line 850 redundant.🧹 Suggested fix
- logger.error("[BASILICA] Failed to deploy!") - logger.error(f" Error: {e}") - logger.error(traceback.format_exc()) + logger.exception("[BASILICA] Failed to deploy: %s", e)neurons/validator.py (1)
683-684: Minor atomicity gap between the twoset_validator_statecalls.If line 683 succeeds but line 684 fails (unlikely but possible), the DB will have
previous_winner_idset to this submission butprevious_winner_scorewill be stale. On the next cycle, line 676 will see the ID as "already handled" and skip, leaving the old score persisted. The threshold update would have been skipped too, so no corruption occurs — but the leader's score in the DB will be wrong until a new leader emerges.Given that both are simple upserts and failure between them is extremely unlikely, this is acceptable. A transactional batch write would be the ideal fix if the DB supports it.
local_test/verify.py (2)
209-222:batch_sizeparameter is unused inverify_all.Ruff ARG001 confirms
batch_sizeis never referenced in the function body. Either remove it or use it (e.g., to cross-check logits batch dimension).Option A: Remove unused parameter
def verify_all( reference, candidate, expected_tokens: int, expected_seq_len: int, - batch_size: int, initial_state: dict, model, reference_grad: GradientInfo, candidate_grad: GradientInfo, max_loss_difference: float = 0.3, min_changed_ratio: float = 0.8, gradient_norm_ratio_max: float = 1.02, ) -> bool:(Also remove the argument at the call site, line 602.)
Option B: Use it to validate logits batch dimension
# CHECK: Logits shape is 3D check_num += 1 print(f"\n[CHECK {check_num}] Logits shape must be 3D (batch, seq, vocab)") if candidate.final_logits is not None: shape = candidate.final_logits.shape print(f" Shape: {tuple(shape)}") if len(shape) != 3: print(f" [FAILED] Expected 3D tensor, got {len(shape)}D") all_passed = False + elif shape[0] != batch_size: + print(f" [FAILED] Batch dim is {shape[0]}, expected {batch_size}") + all_passed = False else: print(" [PASSED]")
355-392: Gradient relative error implementation is correct.The layer-by-layer accumulation of
diff_norm_sqandref_norm_sqavoids allocating a single concatenated vector — consistent with the memory optimization noted inGradientInfo's docstring.One minor nit:
zip(ref_vecs, cand_vecs)at line 360 could usestrict=Truefor extra safety, though the length check at line 357 already guards against mismatched lengths.Nitpick: add strict=True
- for ref_layer, cand_layer in zip(ref_vecs, cand_vecs): + for ref_layer, cand_layer in zip(ref_vecs, cand_vecs, strict=True):
Summary by CodeRabbit
New Features
Bug Fixes
Documentation