Easy Namuwiki Extractor


A simple Namuwiki extractor, built as an extension of Namu Wiki Extractor.

This module strips the NamuMark markup from Namuwiki documents and extracts only the plain text.
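To give a feel for what stripping the markup means, here is a minimal sketch. The function name `strip_namu_mark` and the handful of regex rules are hypothetical and cover only a few common constructs (links, bold, italic, strikethrough, headings); they are not the rules this module actually uses, which handle many more cases.

```python
import re

# Hypothetical, simplified rules for illustration only.
# Order matters: bold (''') must be handled before italic ('').
_RULES = [
    # [[target|label]] -> label, [[target]] -> target
    (re.compile(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]"), r"\1"),
    (re.compile(r"'''(.*?)'''"), r"\1"),  # bold
    (re.compile(r"''(.*?)''"), r"\1"),    # italic
    (re.compile(r"~~(.*?)~~"), r"\1"),    # strikethrough
    # == Heading == -> Heading (one heading per line)
    (re.compile(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.MULTILINE), r"\1"),
]


def strip_namu_mark(text):
    """Apply each rule in turn and return the remaining plain text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


marked = "== Intro ==\n'''Namu''' is a [[wiki|Wiki]] and [[tree]] ~~old~~"
plain = strip_namu_mark(marked)
```

Real documents also contain tables, macros, footnotes, and nested markup, so a production extractor needs considerably more than these five patterns.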

Environment

Usage

  • Clone this repo : git clone https://github.com/j-min/Easy-Namuwiki-Extractor

  • Download a Namuwiki JSON dump into the repo directory : wget http://file2.unofficialnis.ga/namuwiki_161031.json

  • You can find the latest dumps here

  • Run extractor: python Run_extractor.py -i input_json_file -o outputfile_name

  • Options:

--input (-i) : input filename
--output (-o) : output filename
--multiprocess (-m) : use the multiprocessing module
--title (-t) : include document titles in the extracted output
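The options above could be wired up with `argparse` roughly as follows. This is a sketch, not the actual contents of Run_extractor.py: it assumes that `-i` and `-o` are required and that `-m` and `-t` are boolean switches, and the helper name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (assumed semantics)."""
    parser = argparse.ArgumentParser(
        description="Extract plain text from a Namuwiki JSON dump"
    )
    parser.add_argument("-i", "--input", required=True, help="input filename")
    parser.add_argument("-o", "--output", required=True, help="output filename")
    # Assumed to be on/off switches that take no value.
    parser.add_argument("-m", "--multiprocess", action="store_true",
                        help="use the multiprocessing module")
    parser.add_argument("-t", "--title", action="store_true",
                        help="include document titles while extracting")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-i", "namuwiki_161031.json", "-o", "output.txt", "-t"]
)
```

Under these assumptions, `python Run_extractor.py -i namuwiki_161031.json -o output.txt -m -t` would enable both multiprocessing and title output.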

What the Namuwiki JSON looks like

[image: Namuwiki JSON structure]
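As a rough sketch of reading such a dump, the snippet below assumes the file is a JSON array of documents, each an object with at least `title` and `text` fields. That layout is an assumption and should be checked against the dump you download; the helper name `iter_documents` and the sample data are hypothetical.

```python
import json

# Hypothetical miniature dump standing in for the real (much larger) file.
sample_dump = json.dumps([
    {"title": "Example", "text": "== Heading ==\nBody text"},
    {"title": "Another", "text": "'''Bold''' text"},
])


def iter_documents(raw_json):
    """Yield (title, text) pairs, assuming a top-level JSON array of objects."""
    for doc in json.loads(raw_json):
        yield doc["title"], doc["text"]


docs = list(iter_documents(sample_dump))
```

For a real dump, the string would come from reading the downloaded file. The dumps can be large, so loading one fully into memory may require a fair amount of RAM.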

Sample Output

[image: sample output]
