Commit

Add multiprocess wiki-extractor argument
howl-anderson committed Jul 10, 2018
1 parent 3ce35a5 commit 644523e
Showing 1 changed file with 3 additions and 1 deletion.
extract_wikipedia_json_corpus.bash (4 changes: 3 additions & 1 deletion)
@@ -1,3 +1,5 @@
 #!/bin/bash
 
-WikiExtractor.py --json raw_data/zhwiki-latest-pages-articles.xml.bz2 -o extracted_json_data
+cpu_count=`nproc --all`
+process_count=$(expr $cpu_count - 1)
+WikiExtractor.py --json raw_data/zhwiki-latest-pages-articles.xml.bz2 -o extracted_json_data --processes ${process_count}
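The committed script reserves one core by subtracting 1 from `nproc --all`. On a single-core machine that expression yields 0, which is probably not a useful worker count to hand to WikiExtractor. Below is a minimal sketch of a guard, assuming bash and GNU coreutils `nproc`. The helper name `compute_process_count` is hypothetical (it is not part of this commit or of WikiExtractor). It clamps the result to at least 1, uses bash arithmetic `$(( ))` instead of spawning `expr`, and calls plain `nproc`, which reports the processors available to the current process rather than every installed processor as `--all` does.

```bash
#!/bin/bash

# Hypothetical helper (not part of the commit): derive a worker count
# from a CPU count, keeping one core free but never going below 1.
compute_process_count() {
    local cpu_count=$1
    local count=$(( cpu_count - 1 ))
    # On a single-core machine cpu_count - 1 is 0; clamp to 1 worker.
    if (( count < 1 )); then
        count=1
    fi
    echo "$count"
}

# `nproc` (GNU coreutils) prints the processors available to this process;
# `nproc --all`, as used in the commit, prints all installed processors.
echo "Would run WikiExtractor.py with $(compute_process_count "$(nproc)") processes"
```

With the helper defined, the extraction line from the commit could pass `--processes "$(compute_process_count "$(nproc)")"` in place of `--processes ${process_count}`; the input dump and the `extracted_json_data` output directory stay the same.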
