Adapt scraper to openzim conventions #11

Closed
benoit74 opened this issue Jul 27, 2023 · 3 comments · Fixed by #14
Labels
enhancement New feature or request

Comments

benoit74 (Collaborator) commented Jul 27, 2023

The scraper needs to be adapted to match openzim conventions.

The list below may not be exhaustive yet; this is what came to mind while working on openzim/zimfarm#804.

  • fix some issues around CLI parameters; among others, probably:

    • the outpath CLI parameter is not taken into account
    • outzim is not relative to outpath, does not fall back to name, and does not support the "{period}" placeholder
    • other dirs are not relative to outpath
    • more coherence among commands / phases of scraper operation
  • add support for a stats JSON file

  • instead of multiple commands (fetch, prebuild, zim), have multiple phases in a single command, with flags to disable some phases if needed; this will enforce more coherence among the parameters of the various phases

  • take into account the new openZIM Python conventions (see https://github.com/openzim/_python-bootstrap, including its wiki pages)
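A minimal sketch of the CLI behaviour described in the list above, assuming an argparse-based Python scraper: one command, phases that can be disabled by flags, and a ZIM path resolved relative to the output directory, falling back to the ZIM name and expanding "{period}". The flag names (`--output`, `--zim-file`, `--skip-<phase>`), the `{name}.zim` fallback and the YYYY-MM format for "{period}" are hypothetical choices for illustration, not the scraper's actual interface.

```python
import argparse
import datetime
from pathlib import Path

# Hypothetical phase names, taken from the current fetch / prebuild / zim commands.
PHASES = ("fetch", "prebuild", "zim")


def build_parser() -> argparse.ArgumentParser:
    """Single command; each phase can be disabled with a --skip-<phase> flag (hypothetical names)."""
    parser = argparse.ArgumentParser(description="Scraper sketch with phases")
    parser.add_argument("--name", required=True, help="ZIM name, used as fallback filename")
    parser.add_argument("--output", dest="output_dir", default="output",
                        help="Directory all other paths are resolved against")
    parser.add_argument("--zim-file", default=None,
                        help="ZIM filename relative to --output; may contain {period}")
    for phase in PHASES:
        # argparse stores --skip-fetch as args.skip_fetch, and so on.
        parser.add_argument(f"--skip-{phase}", action="store_true",
                            help=f"Do not run the {phase} phase")
    return parser


def resolve_zim_path(output_dir, zim_file, name, today=None) -> Path:
    """Resolve the ZIM path relative to output_dir, falling back to the ZIM name."""
    if today is None:
        today = datetime.date.today()
    filename = zim_file or f"{name}.zim"
    # Assumption: {period} expands to YYYY-MM; str.replace avoids choking on other braces.
    filename = filename.replace("{period}", today.strftime("%Y-%m"))
    return Path(output_dir) / filename


def phases_to_run(args: argparse.Namespace) -> list:
    """Return the phases that were not disabled, in execution order."""
    return [phase for phase in PHASES if not getattr(args, f"skip_{phase}")]


# Example: skip the prebuild phase and use a dated filename.
example = build_parser().parse_args(
    ["--name", "demo", "--zim-file", "demo_{period}.zim", "--skip-prebuild"]
)
print(phases_to_run(example))  # ['fetch', 'zim']
```

Deriving every path from the output directory in one helper is one way to get the coherence between phases asked for above; other directories (a build or cache dir, say) could be resolved the same way.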

PS: @mdp: no worries, this is something we will do on our own (or help you with); this is usual for new scrapers and mostly linked to ongoing work on our side to better explain our expectations (or not: consider this issue normal and ours to handle, since it is very specific to our way of working)

benoit74 added the enhancement label Jul 27, 2023

mdp (Collaborator) commented Jul 28, 2023

Hey @benoit74, yeah, this all makes sense; I can tackle most of this early next week. Thanks for the _python-bootstrap repo link, I'll try to line this repo up with it.

mdp (Collaborator) commented Aug 18, 2023

@benoit74 #12 is ready for review. Although it doesn't address all the enhancements, it DRYs up the CLI arguments and tries to match other Python projects. It should also address #13.

benoit74 (Collaborator, Author) commented:

I removed the "add support for stats JSON file" item because the scraper runs so fast that reporting progress makes little sense.

benoit74 added this to the 1.0.0 milestone Aug 29, 2023