linguistic data scraper for 4chan, using basc-py4chan and nltk

maxfarr/chanscraper

chanscraper

in use by the iGen Project at Stanford University

usage

run chanscrape.py to collect posts and metadata from the boards named in the list on line 10 of the script
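
A rough sketch of what a per-board collection loop could look like is below. It is not the actual chanscrape.py. It assumes basc-py4chan's `Board.get_all_threads(expand=True)`, `Thread.all_posts` and `Post.text_comment`; the `BOARDS` tuple, the board names in it, and the helper names `collect_board_text` and `collect_boards` are hypothetical. The board factory is passed in so the loop can run without network access.

```python
"""Hypothetical sketch of a board-scraping loop; not the actual chanscrape.py."""
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in for the list of boards on line 10 of chanscrape.py.
BOARDS = ("lit", "sci")


def collect_board_text(board) -> List[str]:
    """Return the plain-text comment of every post on one board.

    `board` is assumed to behave like basc_py4chan.Board: get_all_threads()
    returns threads whose `all_posts` lists every post (OP included), and
    each post exposes its HTML-stripped text as `text_comment`.
    """
    comments = []
    # expand=True requests full threads rather than catalog previews.
    for thread in board.get_all_threads(expand=True):
        for post in thread.all_posts:
            comments.append(post.text_comment)
    return comments


def collect_boards(
    make_board: Callable[[str], object], boards: Sequence[str] = BOARDS
) -> Dict[str, List[str]]:
    """Build each board with `make_board` and collect its post text by name."""
    return {name: collect_board_text(make_board(name)) for name in boards}
```

With basc-py4chan installed and network access, a call would look like `collect_boards(basc_py4chan.Board)`. Passing the constructor in, rather than importing it inside the loop, keeps the scraping logic testable with stand-in objects.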

chanscrape outputs three types of files: .hist files that store board metadata, raw .txt dumps of all of a board's posts, and .xml files for individual posts (currently formatted for use in the iGen Project)
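
The sketch below shows one way to produce two of those outputs: a raw .txt dump per board and one .xml file per post, using only the standard library. The `<post>` element, its attributes, the file naming, and the `write_board_outputs` helper are all hypothetical and do not reproduce the iGen Project format. The .hist layout is not described here, so it is left out.

```python
"""Hypothetical sketch of chanscrape-style output; not the iGen XML schema."""
import os
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def write_board_outputs(board: str, posts: Iterable[Tuple[int, str]], out_dir: str) -> None:
    """Write one raw .txt dump for `board` and one .xml file per post.

    `posts` is a sequence of (post_id, text) pairs collected from the board.
    """
    os.makedirs(out_dir, exist_ok=True)
    posts = list(posts)

    # Raw dump: the text of every post on the board, separated by blank lines.
    with open(os.path.join(out_dir, f"{board}.txt"), "w", encoding="utf-8") as f:
        f.write("\n\n".join(text for _, text in posts))

    # Per-post XML: a minimal, hypothetical <post board=".." id=".."> element.
    for post_id, text in posts:
        root = ET.Element("post", board=board, id=str(post_id))
        ET.SubElement(root, "text").text = text
        ET.ElementTree(root).write(
            os.path.join(out_dir, f"{board}_{post_id}.xml"),
            encoding="utf-8",
            xml_declaration=True,
        )
```

ElementTree escapes characters such as `&` and `<` in post text, so arbitrary comments produce well-formed files.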

run chancheck.py to examine the .hist metadata of the scraped boards
