@sotorrent

SOTorrent

Reconstructing and analyzing the evolution of Stack Overflow posts.

Popular repositories

  1. SQL and Bash scripts to import the official Stack Overflow data dump and the SOTorrent data set, to retrieve Stack Overflow references from the BigQuery GitHub data set, and to retrieve data from th…

    Shell 13 7

  2. Implementation of various string similarity metrics.

    Java 2 3

  3. R scripts used to retrieve samples of SO posts, to compare the results of the metrics evaluation, and to conduct analyses using the SOTorrent dataset.

    R 2 2

  4. Extracts the version history of text and code blocks from the official Stack Overflow data dump.

    Java 1 2

  5. Comparison of different string similarity metrics for reconstructing the history of Stack Overflow posts.

    Java 1 2

  6. util Public

    Collection of utility classes and methods used across different projects related to SOTorrent.

    Java 1 3

Repositories

Showing 10 of 16 repositories
  • util Public

    Collection of utility classes and methods used across different projects related to SOTorrent.

    Java 1 Apache-2.0 3 0 0 Updated Jun 15, 2023
  • preprocessing-pipeline Public

    Preprocessing pipeline to extract and normalize text/code blocks from Stack Exchange forum posts and comments.

    Python 0 Apache-2.0 3 0 0 Updated Dec 20, 2022
  • posthistory-extractor Public

    Extracts the version history of text and code blocks from the official Stack Overflow data dump.

    Java 1 Apache-2.0 2 2 0 Updated Jun 27, 2022
  • so-edit-viz Public

    Visualization of edit and comment events in Stack Overflow threads.

    JavaScript 1 Apache-2.0 1 0 0 Updated Apr 21, 2022
  • so-clones Public

    Shows code clones on Stack Overflow.

    HTML 1 Apache-2.0 2 0 0 Updated Apr 21, 2022
  • db-scripts Public

    SQL and Bash scripts to import the official Stack Overflow data dump and the SOTorrent data set, to retrieve Stack Overflow references from the BigQuery GitHub data set, and to retrieve data from the SOTorrent dataset for analysis.

    Shell 13 Apache-2.0 7 0 0 Updated Apr 7, 2022
  • pipeline Public

    SOTorrent pipeline running on Google Cloud.

    Python 1 Apache-2.0 2 0 0 Updated Jun 30, 2021
  • so-internal-refs Public

    Scripts used to import and analyze internal web server logs provided by Stack Overflow under an NDA.

    R 0 Apache-2.0 0 0 0 Updated May 24, 2021
  • releases Public

    Repository for Maven deployment.

    0 Apache-2.0 1 0 0 Updated Nov 25, 2020
  • postview-extractor Public

    Extracts the view counts of threads from Stack Overflow data dumps.

    Java 0 Apache-2.0 0 0 0 Updated Nov 1, 2020
