Require a more strict input format for specifying Project Gutenberg works #1

Open
tanius opened this issue Jul 27, 2018 · 0 comments
tanius commented Jul 27, 2018

(This issue does not have to be solved right now; it is only recorded here. We'll solve it when we need this script again.)

Currently, the script that extracts Project Gutenberg metadata interprets every number in its input as a Project Gutenberg work number. This can lead to errors, because some numbers are something else, such as publication years.

Example: in the following line, the current script would interpret both "1920" and "4924" as numbers specifying Project Gutenberg works, and proceed by extracting metadata for them.

Dry-Farming. Published 1920. Project Gutenberg text no. 4924.

Proposed solution: only accept input lines that consist of a single number and nothing else. Ignore every other line, and emit a warning for each ignored line.
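
A minimal sketch of what the stricter parsing could look like, assuming a Python implementation. The issue does not say what language the script is written in, and the names `parse_work_ids` and `WORK_ID_LINE` are hypothetical. Skipping blank lines silently is also an assumption; the proposal does not say how they should be handled.

```python
import re
import sys

# Hypothetical pattern: a valid line is one run of ASCII digits,
# optionally surrounded by whitespace, and nothing else.
WORK_ID_LINE = re.compile(r"\s*([0-9]+)\s*")


def parse_work_ids(lines, warn=sys.stderr):
    """Return the work numbers found on lines that contain only a number.

    Any other non-blank line is ignored and reported as a warning on `warn`.
    """
    ids = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are skipped without a warning
        match = WORK_ID_LINE.fullmatch(line)
        if match:
            ids.append(int(match.group(1)))
        else:
            print(f"warning: line {lineno} ignored: {line.rstrip()!r}", file=warn)
    return ids
```

With this approach, the example line above would be rejected as a whole with a warning, so neither "1920" nor "4924" would be picked up from it; the work would have to be listed as `4924` on a line of its own.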
