The scraper needs to be adapted to match openzim conventions.
The list below may not be exhaustive; this is what came to mind while working on openzim/zimfarm#804.
- fix some issues around CLI parameters; among others, probably:
  - the `outpath` CLI parameter is not taken into account
  - `outzim` is not relative to `outpath` with a fallback to `name`, and does not support the "{period}" placeholder
  - other dirs are not relative to `outpath`
- more coherency among commands / phases of scraper operation
- add support for stats JSON file
- instead of multiple commands (`fetch`, `prebuild`, `zim`), have multiple phases in a single command, with flags to disable some phases if needed; this will force more coherency among the parameters of the various phases
- take into account new openZIM Python conventions (see https://github.com/openzim/_python-bootstrap, including wiki pages)
PS: @mdp: no worries, this is something we will do on our own (or help you with); this is usual for new scrapers and mostly linked to work in progress on our side to better explain our expectations (or not, in which case we consider this kind of issue normal and ours to handle, since it is very specific to our way of working)
Hey @benoit74, yeah, this all makes sense. I can tackle most of this early next week. Thanks for the _python-bootstrap repo link, I'll try to line this repo up with it.
@benoit74 #12 is ready for review. Although it doesn't address all the enhancements, it DRYs up the CLI arguments and tries to match other Python projects. It should also address #13.