j-min/Easy-Namuwiki-Extractor

Easy NamuWiki Extractor

A simple extension of Namu Wiki Extractor.

This module strips namu mark syntax from a Namuwiki document and extracts only its plain text.
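To illustrate what stripping namu mark means, here is a minimal sketch that removes a handful of common constructs (links, bold/italic quotes, strikethrough, heading markers) with regular expressions. The function name `strip_namu_mark` and the set of rules are assumptions made for illustration; this is not the repository's actual implementation, which may handle far more syntax.

```python
import re

def strip_namu_mark(text):
    """Remove a few common namu mark constructs (illustrative only)."""
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # '''bold''' and ''italic'' -> plain text
    text = re.sub(r"'{2,3}", "", text)
    # ~~strikethrough~~ -> plain text
    text = text.replace("~~", "")
    # == Heading == -> Heading (any heading level)
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.MULTILINE)
    return text

sample = "== Intro ==\n'''Namu''' is a [[wiki|wiki site]] about [[Korea]]."
plain = strip_namu_mark(sample)
print(plain)
```

Regex rules like these are easy to read but brittle on nested or malformed markup, so treat the sketch as a starting point rather than a full parser.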

Environment

Usage

  • Clone this repo : git clone https://github.com/j-min/Easy-Namuwiki-Extractor

  • Download the Namuwiki JSON dump into the repo directory : wget http://file2.unofficialnis.ga/namuwiki_161031.json

  • You can find the latest dumps here

  • Run the extractor : python Run_extractor.py -i input_json_file -o outputfile_name

  • Options:

--input (-i) : input filename
--output (-o) : output filename
--multiprocess (-m) : run with the multiprocessing module
--title (-t) : include document titles in the extracted output
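The options above could be declared with `argparse` roughly as follows. This is a sketch, not the contents of `Run_extractor.py`: it assumes `-m` and `-t` are on/off switches that take no value and that both filenames are required, neither of which the list above states. `build_parser` is a hypothetical helper name.

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented options (assumed semantics)."""
    parser = argparse.ArgumentParser(description="Extract plain text from a Namuwiki JSON dump")
    parser.add_argument("-i", "--input", required=True, help="input filename")
    parser.add_argument("-o", "--output", required=True, help="output filename")
    # Assumption: these two are boolean switches that default to off.
    parser.add_argument("-m", "--multiprocess", action="store_true",
                        help="run with the multiprocessing module")
    parser.add_argument("-t", "--title", action="store_true",
                        help="include document titles in the output")
    return parser

# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["-i", "namuwiki_161031.json", "-o", "out.txt", "-t"])
print(args.input, args.output, args.multiprocess, args.title)
```

With this layout, the equivalent command line would be `python Run_extractor.py -i namuwiki_161031.json -o out.txt -t`.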

What the Namuwiki JSON looks like

[Image: example of the Namuwiki JSON dump]
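As a rough guide to reading such a dump, the sketch below assumes the file is a JSON array of document objects with `title` and `text` fields (plus `namespace` and `contributors`). That layout is an assumption, so check it against the dump you download. The sample documents are made up for illustration, and `iter_documents` is a hypothetical helper.

```python
import json

# Made-up sample in the assumed layout: a JSON array of document objects.
sample_docs = [
    {"namespace": "0", "title": "Example A",
     "text": "== Overview ==\nFirst '''document'''.", "contributors": ["user1"]},
    {"namespace": "0", "title": "Example B",
     "text": "See [[Example A]].", "contributors": ["user2"]},
]
sample_json = json.dumps(sample_docs, ensure_ascii=False)

def iter_documents(json_text):
    """Yield (title, raw namu mark text) pairs from a dump string."""
    for doc in json.loads(json_text):
        yield doc["title"], doc["text"]

titles = [title for title, _ in iter_documents(sample_json)]
print(titles)
```

For a real dump you would pass the contents of the downloaded file instead of `sample_json`; since dumps can be large, loading one whole with `json.loads` may need a lot of memory.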

Sample Output

[Image: sample extractor output]
