MAINT Normalize remote dataset file types from URLs#1486

Merged
romanlutz merged 3 commits into microsoft:main from biefan:fix-remote-dataset-url-file-types
Apr 15, 2026
Conversation

@biefan
Contributor

@biefan biefan commented Mar 17, 2026

Summary

  • normalize remote dataset file types before handler lookup
  • ignore URL query strings and fragments when inferring the file extension
  • add regression coverage for signed-style URLs and uppercase extensions
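
The normalization described above could be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the function name `infer_file_type` is hypothetical, and the choice of `urllib.parse.urlsplit` plus `pathlib.PurePosixPath` is an assumption about one reasonable way to drop the query string and fragment before reading the extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def infer_file_type(source: str) -> str:
    """Return the lowercase extension of a URL or path, without the dot.

    Hypothetical helper for illustration only; the PR's own helper may differ.
    """
    # urlsplit separates the path from any "?query" and "#fragment" parts,
    # so only the path component is inspected for an extension.
    path = urlsplit(source).path
    # suffix is ".json" / ".JSON" / "" ; strip the dot and normalize case.
    return PurePosixPath(path).suffix.lstrip(".").lower()


if __name__ == "__main__":
    for example in (
        "https://example.com/data.json?download=1",
        "https://example.com/data.JSON",
        "https://example.com/data.csv#section",
        "https://example.com/data",
    ):
        print(example, "->", repr(infer_file_type(example)))
```

A source with no extension yields an empty string in this sketch, so the caller still has to decide how to report an unsupported or missing file type.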

Problem

_RemoteDatasetLoader._fetch_from_url() currently infers the file type with source.split(".")[-1]. That breaks two real input shapes:

  • public URLs with query strings, e.g. https://example.com/data.json?download=1
  • URLs or paths with uppercase extensions, e.g. https://example.com/data.JSON

Both are currently rejected with Invalid file_type before the loader even reaches the HTTP fetch path, even though the underlying data format is valid and supported.
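
To make the failure concrete, the snippet below reproduces the `source.split(".")[-1]` expression in isolation. The wrapper name `naive_file_type` and the `SUPPORTED` set are hypothetical stand-ins for the loader's handler lookup, which is not shown here.

```python
# Hypothetical stand-in for the set of file types the loader has handlers for.
SUPPORTED = {"json", "csv", "txt"}


def naive_file_type(source: str) -> str:
    """Mirror the expression quoted above: take everything after the last dot."""
    return source.split(".")[-1]


if __name__ == "__main__":
    for source in (
        "https://example.com/data.json?download=1",
        "https://example.com/data.JSON",
    ):
        file_type = naive_file_type(source)
        # Neither result matches a lowercase key, so a lookup like this fails.
        print(repr(file_type), "supported:", file_type in SUPPORTED)
```

In the first case the query string is carried along with the extension; in the second the extension is correct but its case does not match the lookup key.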

Testing

  • .venv/bin/pytest tests/unit/datasets/test_remote_dataset_loader.py -q

@romanlutz
Contributor

This needs fixes to pre-commit steps

@romanlutz romanlutz changed the title Normalize remote dataset file types from URLs MAINT Normalize remote dataset file types from URLs Apr 15, 2026
romanlutz and others added 2 commits April 15, 2026 14:59
…commit issues

- Resolve merge conflict in test_remote_dataset_loader.py (keep both new tests)
- Add missing Returns section to _get_file_type docstring (ruff DOC201)
- Fix missing newline at end of test file (end-of-file-fixer)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Cover query strings, fragments, uppercase extensions, local paths,
and no-extension edge case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz merged commit 7ee78e3 into microsoft:main Apr 15, 2026
39 checks passed
2 participants