
add extra error messages and fix response schema#253

Merged
PGijsbers merged 7 commits into openml:main from Mahhheshh:fix/114
Mar 23, 2026

Conversation

Contributor

@Mahhheshh Mahhheshh commented Feb 22, 2026

fixes #114

  • Adds additional error messages when something goes wrong.
  • Fixes the response schema.

TODO:

  • find valid dataset_id for code 362.

php ref: https://github.com/openml/OpenML/blob/b1265bda1b52780a6f10a7d239e3f54914517c2f/openml_OS/models/api/v1/Api_data.php#L1957-L2019

Contributor

coderabbitai Bot commented Feb 22, 2026

Warning

Rate limit exceeded

@PGijsbers has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 19 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6723bc06-bede-4372-ba8d-09f742e9c3a3

📥 Commits

Reviewing files that changed from the base of the PR and between 5c31d23 and 570f51f.

📒 Files selected for processing (2)
  • src/routers/openml/qualities.py
  • tests/routers/openml/qualities_test.py

Walkthrough

Added a new quality-related ProblemDetailError subclass, NoQualitiesError. The qualities endpoint now validates the dataset processing state: it raises DatasetNotProcessedError if no processing update exists, DatasetProcessingError if the latest update reports an error, and NoQualitiesError if no qualities are found; the endpoint signature and the successful return of the qualities list are unchanged. Tests expanded the data-id range, folded error-case checks into the main parameterized test, and changed comparisons to use regex/title/detail-based validations.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title partially relates to the changeset, referring to error messages and response schema fixes, but is somewhat vague and doesn't clearly convey the main change.
Description check ✅ Passed The description is related to the changeset, mentioning error messages, response schema fixes, and a TODO for finding a valid dataset_id, with a PHP reference for guidance.
Linked Issues check ✅ Passed The PR fulfills issue #114 by adding error messages for the data/qualities/{id} endpoint through three new exception classes (NoQualitiesError, DatasetNotProcessedError, DatasetProcessingError) and validation logic.
Out of Scope Changes check ✅ Passed All changes are directly related to adding error messages and fixing response schema for the data/qualities endpoint as required by issue #114; no out-of-scope changes detected.


Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • The get_qualities return type is now a deeply nested dict[Literal[...]] which is hard to read and maintain; consider introducing a dedicated Pydantic response model to describe this shape more clearly and keep the OpenAPI schema in sync.
  • The multiple HTTPException raises in get_qualities all use HTTPStatus.PRECONDITION_FAILED and a {code, message} detail; consider extracting a small helper or factory to construct these consistently and reduce duplication.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `get_qualities` return type is now a deeply nested `dict[Literal[...]]` which is hard to read and maintain; consider introducing a dedicated Pydantic response model to describe this shape more clearly and keep the OpenAPI schema in sync.
- The multiple `HTTPException` raises in `get_qualities` all use `HTTPStatus.PRECONDITION_FAILED` and a `{code, message}` detail; consider extracting a small helper or factory to construct these consistently and reduce duplication.


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/routers/openml/qualities_test.py (2)

165-278: ⚠️ Potential issue | 🔴 Critical

test_get_quality will always fail — expected is a flat list but the endpoint now returns a nested dict.

The endpoint's response shape changed from [Quality, ...] to {"data_qualities": {"quality": [Quality, ...]}}. The expected variable is still a flat list, so DeepDiff(expected, response.json(), ...) compares a list against a dict and will always report differences, causing assert not difference to fail.

🐛 Proposed fix
-    difference = deepdiff.DeepDiff(expected, response.json(), ignore_order=True)
+    difference = deepdiff.DeepDiff(expected, response.json()["data_qualities"]["quality"], ignore_order=True)
     assert not difference
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/qualities_test.py` around lines 165 - 278, The
test_get_quality currently compares the flat list expected to response.json(),
but the API now returns a nested structure {"data_qualities": {"quality":
[...]}}; update the test to either build expected as that nested dict shape or
extract the inner list before diff (e.g., assign qualities =
response.json()["data_qualities"]["quality"] and compare expected to qualities)
so DeepDiff compares two lists; adjust references to expected, response.json(),
and the test_get_quality function accordingly.

285-297: ⚠️ Potential issue | 🔴 Critical

test_get_quality_identical will always fail — comparing a flat list against the new nested dict response.

expected is built as a flat list from php_response.json()["data_qualities"]["quality"]. After this PR, python_response.json() returns {"data_qualities": {"quality": [...]}}, so assert python_response.json() == expected compares a dict to a list and will always fail.

🐛 Proposed fix
-    assert python_response.json() == expected
+    assert python_response.json()["data_qualities"]["quality"] == expected
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/qualities_test.py` around lines 285 - 297, The test
test_get_quality_identical builds expected as a flat list but
python_response.json() now returns a nested dict; change expected to mirror the
new response shape by wrapping the transformed list under the same keys as the
API: {"data_qualities": {"quality": <transformed_list>}}; keep the same
transformation logic (use php_response.json()["data_qualities"]["quality"], map
each quality to {"name": quality["name"], "value": None if quality["value"] ==
[] else float(quality["value"])}) so the final expected structure matches
python_response.json().
🧹 Nitpick comments (1)
src/routers/openml/qualities.py (1)

37-41: from None is a no-op here and inconsistent with the rest of the function.

from None suppresses an active exception chain, but this raise is not inside an except block, so there is no chained exception to suppress. It has no effect and is inconsistent with the three other raise HTTPException(...) calls in the same function (lines 45–48, 50–56, 60–63), none of which use it.

♻️ Suggested fix
-    ) from None
+    )
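To see why it is a no-op there: `from None` affects the displayed exception chain only when another exception is actively being handled. A minimal demonstration:

```python
# `raise X from None` sets __suppress_context__, hiding the implicit chain
# (__context__) in tracebacks -- but only an active `except` block gives the
# new exception a __context__ to hide in the first place.
def raised_inside_except() -> BaseException:
    try:
        try:
            raise KeyError("original")
        except KeyError:
            raise ValueError("replacement") from None  # suppresses the chain
    except ValueError as exc:
        return exc

def raised_outside_except() -> BaseException:
    try:
        raise ValueError("standalone") from None  # no active exception: no-op
    except ValueError as exc:
        return exc
```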
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/routers/openml/qualities.py` around lines 37 - 41, Remove the unnecessary
"from None" on the raise that checks dataset access so it matches the other
HTTPException raises in the function; locate the conditional that uses
_user_has_access(...) and raises HTTPException with QualityError.UNKNOWN_DATASET
and remove the "from None" suffix so the raise is a normal exception raise
without suppressing any exception chain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/routers/openml/qualities_test.py`:
- Around line 309-311: Update the test skip to use a direct equality check and a
truthful message: replace the membership test `data_id in [116]` with `data_id
== 116`, and change the pytest.skip message called in the same block to say that
dataset 116 is not a valid dataset for testing error code 362 (fixture missing)
rather than "not yet supported" so it accurately reflects the TODO note; keep
the skip invocation using pytest.skip and leave surrounding test logic
unchanged.


Comment thread tests/routers/openml/qualities_test.py Outdated
@PGijsbers
Contributor

Hi! Thanks for taking the time to contribute! We recently switched to RFC9457 style errors (#238) where we:

  • define the error in errors.py
  • preserve the original PHP "code" under the extra "code" field (for most errors, can be provided as _default_code on definition)
  • provide a more detailed error message in the "detail" field that can add relevant details such as entity identifiers or other context.

Would you be able and interested to update the PR to reflect those changes? If not, no worries, I can take over the PR.

@PGijsbers PGijsbers added the question Further information is requested label Mar 12, 2026
@Mahhheshh
Contributor Author

I'll update the PR

@PGijsbers PGijsbers removed the question Further information is requested label Mar 16, 2026

codecov Bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 12 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@5a9eb8c). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/routers/openml/qualities.py 7.69% 12 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #253   +/-   ##
=======================================
  Coverage        ?   53.27%           
=======================================
  Files           ?       34           
  Lines           ?     1511           
  Branches        ?      133           
=======================================
  Hits            ?      805           
  Misses          ?      704           
  Partials        ?        2           

☔ View full report in Codecov by Sentry.

Contributor

@coderabbitai coderabbitai Bot left a comment


♻️ Duplicate comments (1)
tests/routers/openml/qualities_test.py (1)

308-309: ⚠️ Potential issue | 🟡 Minor

Bug: Tuple created instead of formatted string.

The comma after the f-string creates a tuple rather than a single formatted string. When raised, the AssertionError will display the tuple representation instead of a clean message.

🐛 Proposed fix
-        msg = f"Dataset {data_id} response not under test:", php_response.json()
+        msg = f"Dataset {data_id} response not under test: {php_response.json()}"
         raise AssertionError(msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/qualities_test.py` around lines 308 - 309, The msg
assignment currently creates a tuple because of the comma; change it to build a
single formatted string by embedding the response into the f-string (e.g.,
include php_response.json() inside the f-string or str() it) so that msg is a
single string (refer to variable msg, data_id and php_response.json()) and then
raise AssertionError(msg) as before.
🧹 Nitpick comments (1)
tests/routers/openml/qualities_test.py (1)

289-289: Unnecessary set() wrapper around range().

range(1, 133) already produces unique values, so the set() conversion adds overhead without benefit.

♻️ Suggested simplification
-    list(set(range(1, 133))),
+    list(range(1, 133)),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/qualities_test.py` at line 289, Replace the unnecessary
set() wrapper around the range by using the range directly; change the
expression list(set(range(1, 133))) to use list(range(1, 133)) (i.e., remove set
around range) in the test containing that expression to avoid needless
conversion and overhead.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 749d551a-e994-419f-9924-9018f9dca062

📥 Commits

Reviewing files that changed from the base of the PR and between a5c8d89 and eeca93f.

📒 Files selected for processing (3)
  • src/core/errors.py
  • src/routers/openml/qualities.py
  • tests/routers/openml/qualities_test.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/core/errors.py

The used error class should in general be consistent for the cause of the error.
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
tests/routers/openml/qualities_test.py (3)

308-309: Build a string message instead of a tuple for assertion failures.

Line 308 currently creates a tuple, so the raised assertion text is less readable.

🧹 Suggested fix
-        msg = f"Dataset {data_id} response not under test:", php_response.json()
+        msg = f"Dataset {data_id} response not under test: {php_response.json()}"
         raise AssertionError(msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/qualities_test.py` around lines 308 - 309, The
AssertionError message is being constructed as a tuple (msg,
php_response.json()) which prints poorly; in
tests/routers/openml/qualities_test.py update the msg assignment to build a
single string that includes the response content (e.g., interpolate or str() the
php_response.json()) using the existing variables data_id and php_response, then
raise AssertionError(msg) as before so the failure message is readable.

302-307: Add explicit handling for additional expected error codes.

The router now emits additional quality-path errors (363/362), but this dispatcher only recognizes 361 and 364. Consider adding dedicated assertion branches (or an explicit, documented skip for the known missing 362 fixture) to keep parity coverage aligned with endpoint behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/qualities_test.py` around lines 302 - 307, The
dispatcher only handles php_error_code values 361 and 364; add explicit branches
for the newly emitted codes 363 and 362 so tests mirror endpoint behavior: after
computing php_error_code, add elif blocks for 363 and 362 that call appropriate
assertion helpers (e.g., create/dispatch to a new helper like
_assert_get_quality_error_quality_path_missing or reuse an existing helper if
applicable) and if there is a known missing fixture for 362, make that branch
call pytest.skip(...) with a clear message; update the final else to keep
failing on unexpected codes.

289-290: Use deterministic parametrization order.

list(set(range(1, 133))) is unnecessary and can randomize test execution order. Use list(range(1, 133)) for stable, reproducible runs.

♻️ Suggested fix
-    list(set(range(1, 133))),
+    list(range(1, 133)),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/routers/openml/qualities_test.py` around lines 289 - 290, Replace the
non-deterministic parametrization expression list(set(range(1, 133))) with a
deterministic list(range(1, 133)) wherever the test parameters are defined (the
parametrized argument using list(set(range(1, 133))) in the qualities_test
parametrization). This ensures stable, reproducible test execution order by
preserving numeric order instead of relying on set iteration.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/qualities.py`:
- Around line 42-47: The existing backward-compatibility comment is incorrect;
update the comment above the DatasetNotFoundError raise (the branch that builds
msg = f"Dataset with id {dataset_id} not found." and raises
DatasetNotFoundError(..., code=361)) to reflect the actual behavior: this branch
raises DatasetNotFoundError with code=361 (which uses the class's default 404
HTTP status) rather than "412 with code 113". Adjust the text to state that the
branch returns a 404 DatasetNotFoundError with code 361 for backward
compatibility (or remove the outdated PHP reference).
- Around line 54-55: The raise of DatasetProcessingError uses
processing.error.strip() which can be empty for whitespace-only errors; update
the code around the check for processing.error (the block that raises
DatasetProcessingError) to compute a non-empty detail (e.g., detail =
processing.error.strip() or "No processing error details provided") and pass
that detail into DatasetProcessingError(..., code=364) so the RFC9457 detail is
never blank.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2aea0517-fd54-4565-8d8b-b0054f92ff66

📥 Commits

Reviewing files that changed from the base of the PR and between eeca93f and 5c31d23.

📒 Files selected for processing (3)
  • src/core/errors.py
  • src/routers/openml/qualities.py
  • tests/routers/openml/qualities_test.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/core/errors.py

Comment thread src/routers/openml/qualities.py Outdated
Comment thread src/routers/openml/qualities.py Outdated
@PGijsbers
Contributor

Thanks for taking the time to update the PR :) I made a few more changes beyond what I had requested to avoid a back and forth, and included changes to things not even touched on in the PR before:

  • I reconsidered how I wanted to structure the test that compares the endpoint to PHP responses. The new setup, which doesn't split the error cases out into separate functions, will be easier to use when testing against a different database (e.g., a copy of the production database).
  • I removed a set of error classes that really referred to errors of the dataset itself, and used the existing dataset error classes instead. I think the raised error should depend on the cause (which is the same), not the endpoint (which differed).
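The second point can be illustrated with a small sketch, with all names and data shapes assumed for illustration: the same cause raises the same error class, regardless of which endpoint hit it.

```python
# Sketch: one error class per cause, reused across endpoints.
# DatasetNotFoundError matches a class name mentioned in review;
# the stores and endpoint functions below are made up.
class DatasetNotFoundError(LookupError):
    def __init__(self, dataset_id: int):
        super().__init__(f"Dataset with id {dataset_id} not found.")
        self.dataset_id = dataset_id

DATASETS = {1: {"name": "iris"}}
QUALITIES = {1: [{"name": "NumberOfInstances", "value": 150.0}]}

def get_dataset(dataset_id: int) -> dict:
    if dataset_id not in DATASETS:
        raise DatasetNotFoundError(dataset_id)  # same class here...
    return DATASETS[dataset_id]

def get_qualities(dataset_id: int) -> list:
    if dataset_id not in DATASETS:
        raise DatasetNotFoundError(dataset_id)  # ...and here: same cause
    return QUALITIES.get(dataset_id, [])
```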

@PGijsbers PGijsbers merged commit f6ce940 into openml:main Mar 23, 2026
9 checks passed


Development

Successfully merging this pull request may close these issues.

Add error messages for data/qualities/{id}

2 participants