Skip to content

noanabeshima/wikipedia-downloader

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Download 2020 English Wikipedia

The code in this repo downloads the 2020 Wikipedia TensorFlow dataset and converts it to a collection of JSON files, each a list of strings. Each string is a Wikipedia article.

To run, create a new conda environment with the commands

conda create --name wikipedia

conda activate wikipedia

Then install requirements

bash install_requirements.sh

and run the script

python main.py

JSON files will be saved in the 'output' folder.

About

Downloads 2020 English Wikipedia articles as plaintext

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published