This project supports studies based on the StackExchange data dump. Our main goal is to provide helpful tools/scripts for:
- extracting relevant data from the XML files into CSV (roughly a "map" step)
- processing this intermediate CSV data (roughly a "reduce" step)
- generating processed data ready for use in analysis tools (like R)
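
As a rough illustration of the first step in the list above, here is a minimal sketch in Python. It assumes the dump files store each record as a `<row ... />` element whose attributes hold the fields, as in `Posts.xml`. The function name `xml_rows_to_csv` and the selected fields are hypothetical and chosen only for this example; they are not part of the project's scripts.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_rows_to_csv(xml_source, csv_dest, fields):
    """Stream <row .../> elements and write the chosen attributes as CSV.

    xml_source: a file name or binary file object containing the dump XML.
    csv_dest:   a text file object to write the CSV into.
    fields:     attribute names to keep (hypothetical selection).
    Returns the number of rows written.
    """
    writer = csv.DictWriter(csv_dest, fieldnames=fields)
    writer.writeheader()
    count = 0
    # iterparse reads the file incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "row":
            continue
        # Missing attributes become empty CSV cells.
        writer.writerow({name: elem.attrib.get(name, "") for name in fields})
        count += 1
        elem.clear()  # drop the attributes we have already written out
    return count


# Small in-memory sample standing in for a real dump file.
sample = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" Score="5" Title="How do I parse XML?" />
  <row Id="2" PostTypeId="2" ParentId="1" />
</posts>"""

out = io.StringIO()
n = xml_rows_to_csv(io.BytesIO(sample), out, ["Id", "PostTypeId", "Score"])
print(n)
print(out.getvalue())
```

With a real dump, `xml_source` would be the path to a file such as `Posts.xml` and `csv_dest` a file opened with `newline=""`. Streaming the rows avoids holding the whole file in memory, which matters for the larger sites.
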
The idea behind this project is to simplify StackExchange data analysis. The main known approach today is to load the dump into a relational database, which is getting expensive as the sites keep growing in data volume.