
Scraper to download Japanese news, quizzes, and other resources for use offline. Data is used for personal study only, and NLP is applied to isolate Kanji for reading cards, for example.
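A first pass at isolating Kanji can be a filter on Unicode blocks. The sketch below is an illustration only, not the project's NLP step. It assumes that a character counts as Kanji if it falls in CJK Unified Ideographs (U+4E00-U+9FFF) or Extension A (U+3400-U+4DBF). The helper names `is_kanji` and `extract_kanji` are hypothetical.

```python
from __future__ import annotations

# Assumption: "Kanji" means any character in these two CJK blocks.
KANJI_RANGES = (
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
)


def is_kanji(ch: str) -> bool:
    """Return True if the single character ch lies in one of KANJI_RANGES."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in KANJI_RANGES)


def extract_kanji(text: str) -> list[str]:
    """Return the distinct Kanji in text, in order of first appearance."""
    seen: dict[str, None] = {}  # dicts keep insertion order (Python 3.7+)
    for ch in text:
        if is_kanji(ch):
            seen.setdefault(ch, None)
    return list(seen)


if __name__ == "__main__":
    print(extract_kanji("今日は東京で雨が降りました。"))
    # → ['今', '日', '東', '京', '雨', '降']
```

A fuller pipeline would probably tokenise the text with a morphological analyser first, so that each Kanji can be paired with its reading. The range filter only identifies which characters to study. The iteration mark 々 (U+3005) falls outside these ranges and is skipped.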

kinow/nihongo-scraper

Nihongo Scraper

  • nihongo-spider scrapes a known quiz site and saves the results as JSON or CSV
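When run with `-o questions.json`, Scrapy writes the scraped items as a single JSON array. If a CSV copy is needed later, for example to import questions into a flashcard tool, a small stdlib converter is enough. The helpers `items_to_csv` and `convert` are hypothetical and not part of the repository. The sketch assumes each item is a flat dict and does not depend on any particular field names.

```python
from __future__ import annotations

import csv
import io
import json


def items_to_csv(items: list[dict]) -> str:
    """Serialise a list of flat dicts to CSV text, one column per key."""
    # Collect every key across all items so no field is dropped,
    # keeping the order in which keys first appear.
    fieldnames: list[str] = []
    for item in items:
        for key in item:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(items)  # keys missing from an item become empty cells
    return buf.getvalue()


def convert(json_path: str, csv_path: str) -> None:
    """Read a JSON array of items from json_path and write it as CSV."""
    with open(json_path, encoding="utf-8") as f:
        items = json.load(f)
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        f.write(items_to_csv(items))
```

Scrapy's feed exports also infer the format from the output file extension, so `scrapy runspider nihongo-spider.py -o questions.csv` should write CSV directly. The converter is only useful when a JSON export already exists.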

The URLs used are kept out of the repository, to avoid flooding the sites with requests and to stop bots from following links found on GitHub.

Build

git clone https://github.com/kinow/nihongo-scraper.git
cd nihongo-scraper
pip install -r requirements

Execute nihongo-spider

cat > .env <<EOF
START_URL=http://<url>/context/path/
EOF
scrapy runspider nihongo-spider.py -o questions.json
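The spider presumably reads `START_URL` from the `.env` file created above, which keeps the real URL out of the repository. The repository does not show how the file is loaded, and the project may well use a library such as python-dotenv. As a sketch under that uncertainty, a stdlib reader for simple `KEY=VALUE` lines could look like the following. The names `load_env` and `start_url` are hypothetical.

```python
from __future__ import annotations

import os


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse KEY=VALUE lines from path into a dict (no quoting or escapes)."""
    values: dict[str, str] = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments and malformed lines
            # Split on the first "=" only, so query strings in URLs survive.
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def start_url(path: str = ".env") -> str:
    """Return START_URL, preferring the process environment over the file."""
    if "START_URL" in os.environ:
        return os.environ["START_URL"]
    url = load_env(path).get("START_URL")
    if not url:
        raise RuntimeError(f"START_URL is not set in the environment or {path}")
    return url
```

In this sketch an exported `START_URL` takes priority over the file, so a one-off run against a different URL does not require editing `.env`.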
