chanscraper

linguistic data scraper for 4chan, using basc-py4chan and nltk

in use by the iGen Project at Stanford University

usage

run chanscrape.py to collect board info; the boards to scrape are set in the list on line 10 of the script

chanscrape will output three types of files: .hist files to store board metadata, raw .txt dumps of an entire board's posts, and .xml files of individual posts (currently formatted for use in the iGen Project)
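as a rough illustration of the per-post .xml output, here is a minimal sketch using only the python standard library. the function name post_to_xml, the element name "text", and the attributes "board" and "id" are assumptions made up for this example; the actual layout chanscrape.py writes for the iGen Project is not documented here and may differ.

```python
import xml.etree.ElementTree as ET


def post_to_xml(board: str, post_id: int, text: str) -> str:
    """Serialize one post as a small XML string (hypothetical layout)."""
    # Element and attribute names are illustrative assumptions,
    # not the schema that chanscrape.py actually emits.
    root = ET.Element("post", attrib={"board": board, "id": str(post_id)})
    body = ET.SubElement(root, "text")
    # ElementTree escapes &, < and > in text when serializing.
    body.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Example values only; no real board data is fetched here.
    print(post_to_xml("g", 123, "a < b & c"))
```

building the document with ElementTree rather than string formatting means special characters in post text are escaped automatically, so the resulting files stay parseable.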

run chancheck.py to examine .hist metadata of boards

maxfarr/chanscraper

Folders and files

Latest commit

History

Repository files navigation

chanscraper

usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages