Add download Wikidata dump command to CLI #517

@andrewtavis

Description

Scribe-Data will be expanding its functionality to work from Wikidata dumps. The first step is to add the ability for the CLI to download Wikidata Lexeme dumps. The following commands should be added in this issue:

# Latest dump:
scribe-data download --wikidata-dump
scribe-data d -wd

# Specific dump:
scribe-data download --wikidata-dump YYYYMMDD
scribe-data d -wd YYYYMMDD

# Specific output directory:
scribe-data download --wikidata-dump --output-dir DIRECTORY_PATH
scribe-data d -wd -od DIRECTORY_PATH

The above will download the dumps from dumps.wikimedia.org/wikidatawiki/entities/. In the first set of commands the latest .json.bz2 file will be downloaded, and in the second the URL for the given YYYYMMDD stamp will be checked and that date's .json.bz2 dump will be downloaded to the PWD. The third adds an output directory path, as is done on the get command, but let's not change the file name. We'll just allow the user to put it in a directory 😊
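As a rough sketch of the behavior described above (not the final implementation), the URL selection and output path logic could look something like the following. The function names are hypothetical, and the file naming (latest-lexemes.json.bz2, and wikidata-YYYYMMDD-lexemes.json.bz2 under a dated directory) is an assumption about the dumps site layout that should be checked against the directory listing:

```python
import re
import shutil
import urllib.request
from pathlib import Path

# Base location named in this issue.
BASE_URL = "https://dumps.wikimedia.org/wikidatawiki/entities"


def build_dump_url(dump_date=None):
    """Return the lexeme dump URL for the latest dump or a YYYYMMDD stamp.

    The file names used here are assumptions about the dumps site layout.
    """
    if dump_date is None:
        return f"{BASE_URL}/latest-lexemes.json.bz2"
    if not re.fullmatch(r"\d{8}", dump_date):
        raise ValueError(f"Expected a YYYYMMDD date, got {dump_date!r}")
    return f"{BASE_URL}/{dump_date}/wikidata-{dump_date}-lexemes.json.bz2"


def target_path(url, output_dir=None):
    """Keep the remote file name; only the directory is user-controlled."""
    directory = Path(output_dir) if output_dir else Path.cwd()
    return directory / url.rsplit("/", 1)[-1]


def download_dump(dump_date=None, output_dir=None):
    """Stream the dump to disk. Raises urllib.error.HTTPError if the URL is missing."""
    url = build_dump_url(dump_date)
    destination = target_path(url, output_dir)
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination
```

Streaming with shutil.copyfileobj avoids holding the whole dump in memory, which matters given how large these files can be.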

The functionality should be added in a file src/scribe_data/cli/download.py, with the option being added into src/scribe_data/cli/main.py :)
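For the argument wiring, a minimal argparse sketch of the flags listed above might look like this. It is only an illustration: the build_parser helper is hypothetical and the existing parser setup in main.py may be organized differently. The key idea is nargs="?" with a const value, so that -wd works both with and without a YYYYMMDD stamp:

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the parser defined in main.py.
    parser = argparse.ArgumentParser(prog="scribe-data")
    subparsers = parser.add_subparsers(dest="command")

    download = subparsers.add_parser("download", aliases=["d"])
    download.add_argument(
        "-wd",
        "--wikidata-dump",
        nargs="?",        # the date stamp is optional
        const="latest",   # value used when the flag is given without a date
        default=None,     # value used when the flag is absent
        metavar="YYYYMMDD",
        help="Download the latest Wikidata lexeme dump or the one for a given date.",
    )
    download.add_argument(
        "-od",
        "--output-dir",
        default=None,
        metavar="DIRECTORY_PATH",
        help="Directory to save the dump in (file name is unchanged).",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With this setup, a missing flag (None), a bare flag ("latest") and a dated flag can all be told apart in the download handler.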

Contribution

Being worked on by @axif0 as a part of Outreachy! 📶🚀

Metadata

Labels

-priority-: High priority
Outreachy: Available for Outreachy participants
feature: New feature or request
help wanted: Extra attention is needed

Projects

Status

Done
