SQL and Bash scripts to import the official Stack Overflow data dump and the SOTorrent dataset, to retrieve Stack Overflow references from the BigQuery GitHub dataset, and to retrieve data from the SOTorrent dataset for analysis.
SOTorrent pipeline running on Google Cloud.
Collection of utility classes and methods used across different projects related to SOTorrent.
Extracts the version history of text and code blocks from the official Stack Overflow data dump.
Repository for Maven deployment.
Extracts the view counts of threads from Stack Overflow data dumps.
Extracts tags from posts in Stack Overflow data dumps.
Tool to create manually validated Stack Overflow post histories.
R scripts used to retrieve samples of SO posts, to compare the results of the metrics evaluation, and to conduct analyses using the SOTorrent dataset.
Shows code clones on Stack Overflow.
Implementation of various string similarity metrics.
Comparison of different string similarity metrics for reconstructing the history of Stack Overflow posts.
Comparator app to validate the connections between the ground truth and the computed similarity.
Visualization of edit and comment events in Stack Overflow threads.