
picnicml/sarcasm-detection

 
 


Sarcasm Detection with TensorFlow and doddle-model

We embed 1 million Reddit comments with a Universal Sentence Encoder and then use the encoded data to train a logistic regression classifier that detects sarcasm.

Steps:

  • download the data from Kaggle and unpack it into the root of this repository (the directory name is sarcasm)
  • run pip install -r requirements.txt to install the Python dependencies (preferably in a virtual environment)
  • run python save_hub_module.py to download a pretrained Universal Sentence Encoder model from TensorFlow Hub
  • run sbt "runMain io.picnicml.doddlemodel.sarcasm.EmbedDataset.class sarcasm/train-balanced-sarcasm.csv sarcasm/train-balanced-sarcasm-embedded.csv" to embed the text data
  • run sbt "runMain io.picnicml.doddlemodel.sarcasm.TrainClassifier.class sarcasm/train-balanced-sarcasm-embedded.csv logreg.model" to train a classifier
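
To make the final training step more concrete, here is a minimal sketch of fitting a logistic regression classifier on fixed-length embedding vectors. It is not the doddle-model implementation and does not read the files produced by the steps above. The helper names (train_logreg, predict, make_point) and the tiny 4-dimensional synthetic "embeddings" are assumptions made purely for illustration; the real encoder produces much higher-dimensional vectors.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example: p(sarcastic | x) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    # Label 1 (sarcastic) when the predicted probability is at least 0.5.
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


def make_point(label):
    # Hypothetical stand-in for an embedded comment: a noisy 4-d vector
    # centred at +1 for sarcastic (1) and -1 for non-sarcastic (0) examples.
    center = 1.0 if label == 1 else -1.0
    return [center + random.gauss(0, 0.3) for _ in range(4)]


random.seed(0)
labels = [i % 2 for i in range(40)]
X = [make_point(label) for label in labels]

w, b = train_logreg(X, labels)
accuracy = sum(predict(w, b, x) == l for x, l in zip(X, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

The point of the sketch is the division of labour: the sentence encoder turns each comment into a dense vector, so the classifier itself can stay a simple linear model trained on those vectors.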
