Skip to content

Refactor dataset list#286

Merged
PGijsbers merged 7 commits intomainfrom
refactor-dataset-list
Mar 23, 2026
Merged

Refactor dataset list#286
PGijsbers merged 7 commits intomainfrom
refactor-dataset-list

Conversation

@PGijsbers
Copy link
Contributor

Refactors the list dataset function to make it easier to read and check that input parameters are provided correctly.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0db0b29d-fb93-4f46-b0ad-1d8ca291c984

📥 Commits

Reviewing files that changed from the base of the PR and between 6b28375 and 03cb1c8.

📒 Files selected for processing (1)
  • src/routers/openml/datasets.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/routers/openml/datasets.py

Walkthrough

Added a new helper _quality_clause() to validate and generate SQL fragments for data quality filtering. The list_datasets query was refactored to build SQL from a clauses list with a consolidated parameters dict instead of branching string interpolation. Status filtering now uses a conditional equality check; visibility rules differ by user and admin status; optional parameterized clauses are added for uploader, data_name, data_version, and tag only when provided. data_id filtering now uses a parameterized IN :data_ids with expanding bind, pagination uses bound :limit/:offset, and expdb_db.execute() receives the unified parameters dict.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Refactor dataset list' is fully related to the main change in the changeset, which is a significant refactoring of the list_datasets function for improved readability.
Description check ✅ Passed The description is related to the changeset, explaining that the refactoring improves readability and ensures input parameters are validated correctly, which aligns with the code changes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch refactor-dataset-list

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The AND d.did IN :data_ids clause is unlikely to work as intended with text() and a list parameter; consider using an expanding bindparam (e.g., bindparam('data_ids', expanding=True)) or explicitly generating the IN (...) list to ensure the IDs are bound correctly.
  • Instead of suppressing C901 on list_datasets, consider pushing more of the condition-building into separate helper functions (e.g., one for visibility/user logic, one for ID/name/version filters) to keep this endpoint’s core query assembly simpler and easier to maintain.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `AND d.`did` IN :data_ids` clause is unlikely to work as intended with `text()` and a list parameter; consider using an expanding bindparam (e.g., `bindparam('data_ids', expanding=True)`) or explicitly generating the `IN (...)` list to ensure the IDs are bound correctly.
- Instead of suppressing C901 on `list_datasets`, consider pushing more of the condition-building into separate helper functions (e.g., one for visibility/user logic, one for ID/name/version filters) to keep this endpoint’s core query assembly simpler and easier to maintain.

## Individual Comments

### Comment 1
<location path="src/routers/openml/datasets.py" line_range="162-163" />
<code_context>
+        clauses.append("AND `version`=:data_version")
+        parameters["data_version"] = data_version
+
+    if data_id:
+        clauses.append("AND d.`did` IN :data_ids")
+        parameters["data_ids"] = data_id

</code_context>
<issue_to_address>
**issue (bug_risk):** Binding a list to `IN :data_ids` with `text()` is unlikely to work as intended

With `sqlalchemy.text()`, binding a Python list to `IN :data_ids` won’t automatically expand to `IN (..., ...)`; most dialects treat it as a single value (the list repr), causing errors or empty results when `data_id` is set.

You could instead:
- Use an expanding bindparam (e.g. `bindparam("data_ids", expanding=True)` with `d."did" IN :data_ids`), or
- Dynamically build `IN (:id_0, :id_1, ...)` with separate params, or
- Use SQLAlchemy Core/ORM with `in_()` so expansion is handled for you.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@codecov
Copy link

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 6.97674% with 40 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@5a9eb8c). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/routers/openml/datasets.py 6.97% 40 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #286   +/-   ##
=======================================
  Coverage        ?   53.50%           
=======================================
  Files           ?       34           
  Lines           ?     1497           
  Branches        ?      130           
=======================================
  Hits            ?      801           
  Misses          ?      694           
  Partials        ?        2           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 162-164: The IN-clause handling for data_id is incorrect: change
the truthy check to "data_id is not None" so empty lists are not skipped, and
bind the parameter for the "AND d.`did` IN :data_ids" clause using
SQLAlchemy.bindparam with expanding=True (i.e., set parameters["data_ids"] via
bindparam("data_ids", value=data_id, expanding=True)) so the list expands
correctly at execution; update the code that builds clauses and parameters (look
for variables named data_id, clauses, parameters and the IN :data_ids fragment)
accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3fda056e-e7b8-48b5-b810-7b238d7e40bf

📥 Commits

Reviewing files that changed from the base of the PR and between 5a9eb8c and 6b28375.

📒 Files selected for processing (1)
  • src/routers/openml/datasets.py

@PGijsbers PGijsbers merged commit a74b056 into main Mar 23, 2026
9 checks passed
@PGijsbers PGijsbers deleted the refactor-dataset-list branch March 23, 2026 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant