gaybro8777/grab-wikipedia-abstracts

 
 

Grab all Wikipedia abstracts, in all languages

For every dump listed at:
    http://dumps.wikimedia.org/backup-index.html
find that dump's abstract.xml file and download it with wget.

USAGE:
    ./grab-wikipedia-abstracts.py

This will create a directory download.wikimedia.org/ with the abstract.xml files.

REQUIREMENTS:
    * BeautifulSoup
    * wget
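
EXAMPLE (illustrative sketch):

The crawl described above (read the dump index, follow each dump link, pick out the abstract.xml link, hand it to wget) might look roughly like the sketch below. It is not the code in grab-wikipedia-abstracts.py. To stay self-contained it uses the standard library's html.parser instead of BeautifulSoup, and the link pattern (a wiki name followed by an 8-digit date), the function names, and the wget flags are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html, page_url):
    """Return every link on the page as an absolute URL."""
    collector = LinkCollector()
    collector.feed(html)
    return [urljoin(page_url, href) for href in collector.hrefs]


# Assumption: a dump page is linked as "<wiki name>/<8-digit date>".
DUMP_LINK = re.compile(r"/[A-Za-z0-9_]+/\d{8}/?$")


def dump_page_urls(index_html, index_url):
    """Keep only the index links that look like individual dump pages."""
    return [u for u in extract_links(index_html, index_url) if DUMP_LINK.search(u)]


def abstract_urls(dump_html, dump_url):
    """Keep only the links on a dump page that mention abstract.xml."""
    return [u for u in extract_links(dump_html, dump_url) if "abstract.xml" in u]


def wget_command(url):
    """Build (but do not run) a wget call for one file.

    -x recreates the host/path directory tree locally; -c resumes
    partial downloads. Whether the original script passes these
    flags is an assumption.
    """
    return ["wget", "-x", "-c", url]
```

To actually download, each command could be passed to subprocess.run(), with the page HTML fetched through urllib.request.urlopen(). Both are left out here so the sketch has no network side effects.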
