Skip to content

PR #577: Fix XGB training query filters + dispatch dedup return value#447

Merged
jaayslaughter-cpu merged 1 commit into
mainfrom
pr-577-xgb-training-dispatch-dedup-fix
May 18, 2026
Merged

PR #577: Fix XGB training query filters + dispatch dedup return value#447
jaayslaughter-cpu merged 1 commit into
mainfrom
pr-577-xgb-training-dispatch-dedup-fix

Conversation

@jaayslaughter-cpu
Copy link
Copy Markdown
Owner

@jaayslaughter-cpu jaayslaughter-cpu commented May 18, 2026

PR #577 — Critical Fixes Before Tonight's 2:30 AM Retrain

Problem 1: XGB training query missing model_prob filters

  • Training included 75 rows where model_prob < 59% (historical gate bypass) — causes biased models
  • Training included rows with model_prob = NULL (crash orphans from May 15-16) — causes null errors

Fix: Added AND model_prob IS NOT NULL AND model_prob >= 0.59 to the training query.

Problem 2: dispatch_date_log written even when Discord send fails

  • run_agent_tasklet() returned True unconditionally → dedup blocked same-day retries
  • May 15-16: pipeline crashed mid-dispatch, dedup blocked retries all day

Fix:

  • Added _any_discord_sent: bool = False before parlay loop
  • Set _any_discord_sent = True ONLY when Discord webhook confirmed (_flipped > 0)
  • Return _any_discord_sent — orchestrator's existing if result is True: _record_dispatch_ran_today() guard now works correctly

Summary by cubic

Fixes XGB training data selection and dispatch dedup logic to prevent biased models and unblock same‑day retries after failures. Enforces model_prob filters and only records a dispatch when a Discord send actually succeeds.

  • Bug Fixes
    • Training query: added AND model_prob IS NOT NULL AND model_prob >= 0.59 to exclude nulls and low‑confidence rows.
    • Dispatch dedup: track _any_discord_sent and return it; only mark dispatch as run when Discord confirms at least one send.

Written for commit fff78e8. Summary will update on new commits. Review in cubic

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Discord dispatch confirmations now accurately reflect whether picks were successfully sent to ensure reliable status reporting.
  • Improvements

    • Model training now applies stricter quality filters to historical data, improving the robustness of retraining cycles.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

📝 Walkthrough

Walkthrough

run_agent_tasklet() now returns an explicit boolean flag indicating whether Discord dispatch was actually confirmed in the database, replacing a hardcoded True return. Separately, run_xgboost_tasklet() tightens its training data query to exclude NULL model probabilities and require minimum confidence of 0.59.

Changes

Discord Dispatch Confirmation Signal

Layer / File(s) Summary
Discord dispatch confirmation flag lifecycle
tasklets.py
_any_discord_sent boolean is initialized before the send loop, set to True when the DB update confirms at least one row's discord_sent field flipped, and returned at the end of the function in place of the previous constant True.

XGBoost Training Data Quality Gates

Layer / File(s) Summary
Model probability filtering in training query
tasklets.py
Training data query now filters out rows where model_prob IS NULL and only includes rows where model_prob >= 0.59, raising the quality threshold for historical data used in model retraining.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • jaayslaughter-cpu/mework#303: Uses the new run_agent_tasklet() return value to guard _record_dispatch_ran_today() logic based on confirmed Discord sends.
  • jaayslaughter-cpu/mework#416: Modifies run_agent_tasklet() dispatch/send flow dedup logic to prevent opposite-direction double dispatch, directly adjacent to this PR's confirmation signal changes.
  • jaayslaughter-cpu/mework#217: Adjusts how discord_sent flag is initialized/validated in season_record.py, affecting the same DB state that this PR now explicitly tracks.

Poem

🐰 The tasklet now tells the truth at last—
Did Discord truly send, or was it past?
And models train on confidence so high,
No whispers in the dark, no NULL goodbye! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly describes the two main changes: XGB training query filters and dispatch dedup return value, matching the core objectives.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pr-577-xgb-training-dispatch-dedup-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@deepsource-io
Copy link
Copy Markdown

deepsource-io Bot commented May 18, 2026

DeepSource Code Review

We reviewed changes in 65d509d...fff78e8 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card

Overall Grade   Security  

Reliability  

Complexity  

Hygiene  

Code Review Summary

Analyzer Status Updated (UTC) Details
Docker May 18, 2026 2:22a.m. Review ↗
JavaScript May 18, 2026 2:22a.m. Review ↗
Python May 18, 2026 2:22a.m. Review ↗
SQL May 18, 2026 2:22a.m. Review ↗
Secrets May 18, 2026 2:22a.m. Review ↗

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

@codacy-production
Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

Comment thread tasklets.py
agent_name, _flipped,
sum(1 for v in _send_clv_map.values() if v is not None),
len(_send_clv_map))
if _flipped > 0:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IndentationError: unexpected indent


A Python syntax error is detected.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates tasklets.py to track Discord confirmation status, ensuring the tasklet returns True only when at least one message is sent. It also introduces a model_prob filter in the run_xgboost_tasklet SQL query. A critical issue was identified regarding the scale of the model_prob filter; the current threshold of 0.59 should be updated to 59 to correctly reflect the percentage-based storage used elsewhere in the codebase.

Comment thread tasklets.py
AND (lookahead_safe IS NULL OR lookahead_safe = TRUE)
AND agent_name NOT ILIKE '%seed%'
AND model_prob IS NOT NULL
AND model_prob >= 0.59
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The filter model_prob >= 0.59 appears to have a scale mismatch. Throughout the codebase (e.g., in _BaseAgent._model_prob and _american_to_implied), model_prob is calculated and stored as a percentage in the 0-100 range (e.g., 60.0 for 60%). Using 0.59 in the SQL query will include almost all rows (including those with 55% probability), failing to exclude the sub-59% rows mentioned in the PR description. This should be changed to 59 to correctly filter for the 59% threshold.

                  AND model_prob >= 59

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tasklets.py`:
- Around line 7871-7872: The training filter currently compares model_prob using
the wrong scale (model_prob >= 0.59) so the 59% cutoff isn't applied; update the
SQL/condition that references model_prob (the training filter clause containing
"AND model_prob IS NOT NULL" / "AND model_prob >= 0.59") to use the percent
scale (e.g., model_prob >= 59 or model_prob >= 59.0) and ensure the column is
treated as a numeric type so the comparison behaves as intended.
- Around line 6466-6467: The if-statement checking _flipped is over-indented,
causing a syntax error; unindent the line "if _flipped > 0:" so it aligns with
the surrounding block that contains "_pg3.close()" and "logger.info()", then
ensure the following line setting "_any_discord_sent = True" remains inside that
if-block. Locate the occurrences of "_flipped", "_any_discord_sent",
"_pg3.close()", and "logger.info()" in the same function and adjust the
indentation to match the current block level.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6b083747-ef52-4569-8128-41ffd3ce3c54

📥 Commits

Reviewing files that changed from the base of the PR and between 65d509d and fff78e8.

📒 Files selected for processing (1)
  • tasklets.py

Comment thread tasklets.py
Comment on lines +6466 to +6467
if _flipped > 0:
_any_discord_sent = True # PR #577
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify indentation around the failing block
nl -ba tasklets.py | sed -n '6458,6472p'

Repository: jaayslaughter-cpu/mework

Length of output: 109


🏁 Script executed:

#!/bin/bash
# View lines 6458-6475 with visible whitespace to check indentation
sed -n '6458,6475p' tasklets.py | cat -A

Repository: jaayslaughter-cpu/mework

Length of output: 1296


🏁 Script executed:

#!/bin/bash
# Also try to check Python syntax using Python's ast module
python3 <<'EOF'
import ast
try:
    with open('tasklets.py', 'r') as f:
        code = f.read()
    ast.parse(code)
    print("✓ tasklets.py has valid Python syntax")
except SyntaxError as e:
    print(f"✗ Syntax error found:")
    print(f"  Line {e.lineno}: {e.msg}")
    if e.text:
        print(f"  Code: {e.text.rstrip()}")
        print(f"  {' ' * (e.offset - 1)}^")
EOF

Repository: jaayslaughter-cpu/mework

Length of output: 189


🏁 Script executed:

#!/bin/bash
# Get exact context around lines 6466-6467 using sed with line numbers
sed -n '6460,6475p' tasklets.py | head -20

Repository: jaayslaughter-cpu/mework

Length of output: 978


Fix the indentation error causing parse failure.

Line 6466 is over-indented, which makes the module invalid Python syntax. The if _flipped > 0: statement should be aligned with the preceding _pg3.close() and logger.info() calls at the same block level, not indented further.

Proposed fix
-                    if _flipped > 0:
-                        _any_discord_sent = True   # PR `#577`
+                if _flipped > 0:
+                    _any_discord_sent = True   # PR `#577`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if _flipped > 0:
_any_discord_sent = True # PR #577
if _flipped > 0:
_any_discord_sent = True # PR `#577`
🧰 Tools
🪛 Ruff (0.15.12)

[warning] 6466-6466: Unexpected indentation

(invalid-syntax)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tasklets.py` around lines 6466 - 6467, The if-statement checking _flipped is
over-indented, causing a syntax error; unindent the line "if _flipped > 0:" so
it aligns with the surrounding block that contains "_pg3.close()" and
"logger.info()", then ensure the following line setting "_any_discord_sent =
True" remains inside that if-block. Locate the occurrences of "_flipped",
"_any_discord_sent", "_pg3.close()", and "logger.info()" in the same function
and adjust the indentation to match the current block level.

Comment thread tasklets.py
Comment on lines +7871 to +7872
AND model_prob IS NOT NULL
AND model_prob >= 0.59
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the correct model_prob scale in the training filter.

model_prob is stored as percent values (e.g., 59.0), so model_prob >= 0.59 is effectively always true for normal rows and won’t enforce the 59% gate.

Proposed fix
-                  AND model_prob >= 0.59
+                  AND model_prob >= 59.0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
AND model_prob IS NOT NULL
AND model_prob >= 0.59
AND model_prob IS NOT NULL
AND model_prob >= 59.0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tasklets.py` around lines 7871 - 7872, The training filter currently compares
model_prob using the wrong scale (model_prob >= 0.59) so the 59% cutoff isn't
applied; update the SQL/condition that references model_prob (the training
filter clause containing "AND model_prob IS NOT NULL" / "AND model_prob >=
0.59") to use the percent scale (e.g., model_prob >= 59 or model_prob >= 59.0)
and ensure the column is treated as a numeric type so the comparison behaves as
intended.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 1 file

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tasklets.py">

<violation number="1" location="tasklets.py:7872">
P1: `model_prob >= 0.59` uses the wrong unit scale; with percent-based `model_prob`, this filter does not remove sub-59% training rows.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Re-trigger cubic

Comment thread tasklets.py
AND (lookahead_safe IS NULL OR lookahead_safe = TRUE)
AND agent_name NOT ILIKE '%seed%'
AND model_prob IS NOT NULL
AND model_prob >= 0.59
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: model_prob >= 0.59 uses the wrong unit scale; with percent-based model_prob, this filter does not remove sub-59% training rows.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At tasklets.py, line 7872:

<comment>`model_prob >= 0.59` uses the wrong unit scale; with percent-based `model_prob`, this filter does not remove sub-59% training rows.</comment>

<file context>
@@ -7865,6 +7868,8 @@ def run_xgboost_tasklet() -> None:
                   AND (lookahead_safe IS NULL OR lookahead_safe = TRUE)
                   AND agent_name NOT ILIKE '%seed%'
+                  AND model_prob IS NOT NULL
+                  AND model_prob >= 0.59
                   AND prop_type NOT IN (
                       'fantasy_score', 'fantasy_hitter', 'fantasy_pitcher',
</file context>
Suggested change
AND model_prob >= 0.59
AND model_prob >= 59

@jaayslaughter-cpu jaayslaughter-cpu merged commit 10c5b95 into main May 18, 2026
8 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant