Replace / with _ in legacy-format IDs in download filenames #118

Merged
lukasschwab merged 4 commits into master from fix-117-sanitize-paper-id-components on Jul 11, 2023

@lukasschwab lukasschwab commented Jul 11, 2023

Description

Replace / with _ in download ID components

Fixes #117. Differentiates character-escaping strategies for paper titles and paper IDs:

  • IDs: only replace / with _, to account for legacy-form IDs.
  • Titles: replace non-word (i.e. [^\w]) characters with _.
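
The two strategies above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names `sanitize_id`, `sanitize_title`, and `default_filename`, the `<id>.<title>.<extension>` layout, and the sample IDs and titles are all assumptions made for the example.

```python
import re


def sanitize_id(short_id: str) -> str:
    """Hypothetical helper: replace only '/', so modern IDs pass through unchanged."""
    return short_id.replace("/", "_")


def sanitize_title(title: str) -> str:
    """Hypothetical helper: replace every non-word character with '_'."""
    return re.sub(r"[^\w]", "_", title)


def default_filename(short_id: str, title: str, extension: str = "pdf") -> str:
    # Assumed layout: <id>.<title>.<extension>; the library's real layout may differ.
    return ".".join([sanitize_id(short_id), sanitize_title(title), extension])


# Legacy-form ID: the '/' would otherwise be read as a directory separator.
print(default_filename("hep-ex/0406020v1", "A Legacy: Paper"))
# Modern-form ID: the '.' in the ID is kept, since only '/' is replaced.
print(default_filename("1605.08386v1", "A Modern Paper"))
```

Keeping the ID rule narrower than the title rule means filenames for modern-form IDs are unaffected, which is what makes the change minimally breaking.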

Adds a regression test.

Notes

I'm not super happy with this solution, but I think it's minimally breaking. Have to retain backwards compatibility with a bad filename-sanitization scheme; too bad!

For more sophisticated (but more restricted) strategies, see e.g. Django's slugify.

Breaking changes

List any changes that break the API usage supported on master.

None

Relevant issues

List GitHub issues relevant to this change.

Checklist

  • (If appropriate) README.md example usage has been updated.

Fails: `pytest tests/test_download.py`
Fixes #117. Differentiates character-escaping strategies for paper
titles and paper IDs:

+ IDs: only replace `/` with `_`, to account for legacy-form IDs.
+ Titles: replace non-word (i.e. `[^\w]`) characters with `_`.

Extends the regression test with an expected filename.
@lukasschwab lukasschwab merged commit f535ec0 into master Jul 11, 2023
3 checks passed
@lukasschwab lukasschwab deleted the fix-117-sanitize-paper-id-components branch July 11, 2023 05:38
Development

Successfully merging this pull request may close these issues.

Error when downloading paper with '/' in short id