User modeling for sarcasm detection on Reddit corpus from Khodak et al. Published in EMNLP 2018.
figures
logs
notebooks
sample_predictions
src
.gitignore
LICENSE
README.md
pull_all_data.sh
pull_small_data.sh
requirements.txt

reddit-sarc

Sarcasm detection on the Reddit corpus from Khodak et al. (http://nlp.cs.princeton.edu/SARC/2.0/)

pull_all_data.sh provides commands to download, extract, and move all data to the locations expected by the code.

pull_small_data.sh does the same, but only for a subset of the data.

logs/ contains the logs for all of the test runs reported in the paper, as well as the script logs/significance_tests.py, which was used to generate confidence intervals.
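As a rough illustration of how a confidence interval over test results can be computed, here is a minimal sketch of a percentile bootstrap over per-example correctness. This is an assumption for illustration only: the README does not say which procedure logs/significance_tests.py uses, and the function name `bootstrap_ci` and its parameters are hypothetical.

```python
import random


def bootstrap_ci(correct, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy.

    `correct` is a list of 0/1 flags, one per test example.
    Hypothetical helper; not taken from logs/significance_tests.py.
    """
    rng = random.Random(seed)
    n = len(correct)
    means = []
    for _ in range(n_resamples):
        # Resample the test set with replacement and record its accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    # Read the lower and upper percentiles off the sorted accuracies.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


if __name__ == "__main__":
    outcomes = [1] * 80 + [0] * 20  # made-up data: 80% accuracy on 100 examples
    print(bootstrap_ci(outcomes))
```

The fixed seed keeps the interval reproducible between runs; a larger `n_resamples` gives a smoother estimate at the cost of run time.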

src/baselines.py contains code to replicate the baselines from Khodak et al.'s dataset paper

src/main.py is configured for a run to test whichever "spreadsheet cell index" model/dataset combination is passed on the command line, e.g. python main.py B2. Commented-out sections in main.py provide examples for other ways to use the code.
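To make the "spreadsheet cell index" idea concrete, the sketch below splits an argument such as B2 into a column letter and a row number and looks each up in a table. Everything here is hypothetical: the names `parse_cell`, `resolve`, `MODELS`, and `DATASETS`, and the choice of columns for models and rows for datasets, are assumptions for illustration and are not taken from src/main.py.

```python
import re
import sys

# Hypothetical lookup tables; the real mapping is defined in the repository.
MODELS = {"A": "model_a", "B": "model_b"}
DATASETS = {1: "dataset_1", 2: "dataset_2"}


def parse_cell(cell):
    """Split a spreadsheet-style index such as 'B2' into ('B', 2)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", cell.strip())
    if match is None:
        raise ValueError(f"not a cell index: {cell!r}")
    return match.group(1).upper(), int(match.group(2))


def resolve(cell):
    """Map a cell index to an assumed (model, dataset) pair."""
    column, row = parse_cell(cell)
    return MODELS[column], DATASETS[row]


if __name__ == "__main__":
    # Usage: python sketch.py B2
    cell = sys.argv[1] if len(sys.argv) > 1 else "B2"
    print(resolve(cell))
```

Validating the argument with a regular expression before the lookup means a malformed index fails with a clear error instead of a `KeyError` deep in the run.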

src/rnn.py contains the RNN class itself, as well as the helper class used to train and evaluate the RNN, generate graphs, etc.

src/rnn_util.py contains helper functions for data transformation, generating user representations, etc.

src/run_askreddit_tests.sh and run_tests.sh contain examples of how to run the final tests

src/test_configs.py contains hyperparameter configurations for all results reported in the paper

src/util.py contains generic helper functions for data reading and processing

sample_predictions/ contains sample predictions from the 3 user models on holdout sets for one run each on the balanced r/politics and balanced full datasets