wikiwc

A Wikipedia word frequency counter.

This project uses Wikipedia_Extractor to pre-process a full MediaWiki dump into essentially plain-text files. It then splits these files into individual words and counts the number of occurrences of each word.
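The counting step can be illustrated with a minimal Python sketch. This is not wikiwc's actual code: the function name `count_words`, the definition of a word as a run of Unicode letters, and the lowercasing are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Unicode letters (no digits, no underscores).
# wikiwc's real tokenization rule may differ.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(lines):
    """Count lowercased word occurrences across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(word.lower() for word in WORD_RE.findall(line))
    return counts


if __name__ == "__main__":
    # Hypothetical sample input standing in for the extracted plain text.
    sample = ["Der Hund sieht den Hund.", "Die Katze sieht den Hund nicht."]
    for word, n in count_words(sample).most_common(3):
        print(word, n)
```

Passing an open file object as `lines` would stream the text line by line, which matters for input the size of a full Wikipedia dump.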

Usage

By default, wikiwc downloads the German Wikipedia. To process a different language edition, set WIKILANG:

$ make WIKILANG=en