janhuenermann/stackexchange-data-utils
stackexchange-dump
==================

Scripts to deal with gathering, cleaning and loading of stackexchange data

Using the scripts
-----------------

    ./download.sh
    python ./ingest.py stackexchange/ db.sqlite --ignore-meta
    python ./tidy.py db.sqlite
    python ./export.py db.sqlite chunks/

Stack Exchange dataset
----------------------

- Overview of dataset: https://archive.org/details/stackexchange
- Download links: https://archive.org/download/stackexchange
- Documentation of dataset: https://meta.stackexchange.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-sede
- Size of question/answer/user database: 100G
- Size of question-answer pairs as plaintext: 88G (includes non-English communities)
- License of dataset: CC-BY-SA 4.0
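
For readers unfamiliar with the dump format, the following is a minimal sketch of what an ingest step and a question-answer pairing step could look like. It is not the code in `ingest.py` or `export.py`. The function names `ingest_posts` and `iter_qa_pairs`, the `posts` table and its column names are all hypothetical and chosen for illustration. The only things it takes from the public dump are the `<row>` elements in `Posts.xml` and their attributes (`Id`, `PostTypeId`, `ParentId`, `Score`, `Title`, `Body`), where `PostTypeId` 1 is a question and 2 is an answer.

```python
import sqlite3
import xml.etree.ElementTree as ET


def ingest_posts(xml_source, db_path):
    """Load <row> elements from a Posts.xml-style source into SQLite.

    Hypothetical helper for illustration; returns the number of rows stored.
    """
    conn = sqlite3.connect(db_path)
    # Illustrative schema, not the one used by the repository's scripts.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, post_type INTEGER, parent_id INTEGER, "
        "score INTEGER, title TEXT, body TEXT)"
    )
    count = 0
    # Handle rows one at a time as they are parsed, and clear each one
    # afterwards to limit memory use on large files.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        attrs = elem.attrib
        parent = int(attrs["ParentId"]) if "ParentId" in attrs else None
        conn.execute(
            "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?, ?)",
            (
                int(attrs["Id"]),
                int(attrs["PostTypeId"]),
                parent,
                int(attrs.get("Score", 0)),
                attrs.get("Title"),
                attrs.get("Body"),
            ),
        )
        count += 1
        elem.clear()
    conn.commit()
    conn.close()
    return count


def iter_qa_pairs(db_path):
    """Yield (title, question_body, answer_body) for every stored answer."""
    conn = sqlite3.connect(db_path)
    try:
        # Answers (post_type 2) point at their question through parent_id.
        rows = conn.execute(
            "SELECT q.title, q.body, a.body FROM posts a "
            "JOIN posts q ON a.parent_id = q.id "
            "WHERE a.post_type = 2 AND q.post_type = 1 "
            "ORDER BY q.id, a.score DESC"
        )
        yield from rows
    finally:
        conn.close()
```

Under these assumptions, `ingest_posts("Posts.xml", "db.sqlite")` would fill the table for one community, and `for title, question, answer in iter_qa_pairs("db.sqlite")` would walk the pairs with the highest-scored answer first for each question. The `Body` values in the dump are HTML, so a real export would still need a cleaning pass before writing plaintext.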
About
Gather and clean Stack Exchange dumps