Refactor dataset list by PGijsbers · Pull Request #286 · openml/server-api

PGijsbers · 2026-03-23T11:26:37Z

Refactors the list dataset function to make it easier to read and check that input parameters are provided correctly.

coderabbitai · 2026-03-23T11:26:51Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0db0b29d-fb93-4f46-b0ad-1d8ca291c984

📥 Commits

Reviewing files that changed from the base of the PR and between 6b28375 and 03cb1c8.

📒 Files selected for processing (1)

src/routers/openml/datasets.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/routers/openml/datasets.py

Walkthrough

Added a new helper _quality_clause() to validate and generate SQL fragments for data quality filtering. The list_datasets query was refactored to build SQL from a clauses list with a consolidated parameters dict instead of branching string interpolation. Status filtering now uses a conditional equality check; visibility rules differ by user and admin status; optional parameterized clauses are added for uploader, data_name, data_version, and tag only when provided. data_id filtering now uses a parameterized IN :data_ids with expanding bind, pagination uses bound :limit/:offset, and expdb_db.execute() receives the unified parameters dict.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Refactor dataset list' is fully related to the main change in the changeset, which is a significant refactoring of the list_datasets function for improved readability.
Description check	✅ Passed	The description is related to the changeset, explaining that the refactoring improves readability and ensures input parameters are validated correctly, which aligns with the code changes.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refactor-dataset-list

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

The AND d.did IN :data_ids clause is unlikely to work as intended with text() and a list parameter; consider using an expanding bindparam (e.g., bindparam('data_ids', expanding=True)) or explicitly generating the IN (...) list to ensure the IDs are bound correctly.
Instead of suppressing C901 on list_datasets, consider pushing more of the condition-building into separate helper functions (e.g., one for visibility/user logic, one for ID/name/version filters) to keep this endpoint’s core query assembly simpler and easier to maintain.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `AND d.`did` IN :data_ids` clause is unlikely to work as intended with `text()` and a list parameter; consider using an expanding bindparam (e.g., `bindparam('data_ids', expanding=True)`) or explicitly generating the `IN (...)` list to ensure the IDs are bound correctly.
- Instead of suppressing C901 on `list_datasets`, consider pushing more of the condition-building into separate helper functions (e.g., one for visibility/user logic, one for ID/name/version filters) to keep this endpoint’s core query assembly simpler and easier to maintain.

## Individual Comments

### Comment 1
<location path="src/routers/openml/datasets.py" line_range="162-163" />
<code_context>
+        clauses.append("AND `version`=:data_version")
+        parameters["data_version"] = data_version
+
+    if data_id:
+        clauses.append("AND d.`did` IN :data_ids")
+        parameters["data_ids"] = data_id

</code_context>
<issue_to_address>
**issue (bug_risk):** Binding a list to `IN :data_ids` with `text()` is unlikely to work as intended

With `sqlalchemy.text()`, binding a Python list to `IN :data_ids` won’t automatically expand to `IN (..., ...)`; most dialects treat it as a single value (the list repr), causing errors or empty results when `data_id` is set.

You could instead:
- Use an expanding bindparam (e.g. `bindparam("data_ids", expanding=True)` with `d."did" IN :data_ids`), or
- Dynamically build `IN (:id_0, :id_1, ...)` with separate params, or
- Use SQLAlchemy Core/ORM with `in_()` so expansion is handled for you.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

src/routers/openml/datasets.py

codecov · 2026-03-23T11:28:30Z

Codecov Report

❌ Patch coverage is 6.97674% with 40 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@5a9eb8c). Learn more about missing BASE report.

Files with missing lines	Patch %	Lines
src/routers/openml/datasets.py	6.97%	40 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #286   +/-   ##
=======================================
  Coverage        ?   53.50%           
=======================================
  Files           ?       34           
  Lines           ?     1497           
  Branches        ?      130           
=======================================
  Hits            ?      801           
  Misses          ?      694           
  Partials        ?        2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 162-164: The IN-clause handling for data_id is incorrect: change
the truthy check to "data_id is not None" so empty lists are not skipped, and
bind the parameter for the "AND d.`did` IN :data_ids" clause using
SQLAlchemy.bindparam with expanding=True (i.e., set parameters["data_ids"] via
bindparam("data_ids", value=data_id, expanding=True)) so the list expands
correctly at execution; update the code that builds clauses and parameters (look
for variables named data_id, clauses, parameters and the IN :data_ids fragment)
accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 3fda056e-e7b8-48b5-b810-7b238d7e40bf

📥 Commits

Reviewing files that changed from the base of the PR and between 5a9eb8c and 6b28375.

📒 Files selected for processing (1)

src/routers/openml/datasets.py

src/routers/openml/datasets.py

PGijsbers added 6 commits March 18, 2026 11:54

Simplify status clause

5cd5516

Follow consistent pattern when constructing where clauses

78d15fb

Extract quality subquery creation function

60ad465

Use sqlaclhemy parametrization for pagination

5dda026

rename variable for clarity

1205f73

Ignore complexity check for now

6b28375

sourcery-ai bot reviewed Mar 23, 2026

View reviewed changes

src/routers/openml/datasets.py Show resolved Hide resolved

Fix issue with data_ids binding

03cb1c8

coderabbitai bot reviewed Mar 23, 2026

View reviewed changes

src/routers/openml/datasets.py Show resolved Hide resolved

PGijsbers merged commit a74b056 into main Mar 23, 2026
9 checks passed

PGijsbers deleted the refactor-dataset-list branch March 23, 2026 12:31

PGijsbers mentioned this pull request Mar 23, 2026

security: implement SQL parameterization and table whitelisting in da… #282

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Refactor dataset list#286

Refactor dataset list#286
PGijsbers merged 7 commits intomainfrom
refactor-dataset-list

PGijsbers commented Mar 23, 2026

Uh oh!

coderabbitai bot commented Mar 23, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai bot left a comment

Uh oh!

Uh oh!

codecov bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

PGijsbers commented Mar 23, 2026

Uh oh!

coderabbitai bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Mar 23, 2026 •

edited

Loading

codecov bot commented Mar 23, 2026 •

edited

Loading