
Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media

Please see the paper below for details on annotation and modeling:


  author    = {Cachola, Isabel  and  Holgate, Eric  and  Preo\c{t}iuc-Pietro, Daniel  and  Li, Junyi Jessy},
  title     = {Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media},
  booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
  year      = {2018},
  pages     = {2927--2938},
  url       = {},

Use of the data presented here must abide by the Twitter Terms of Service and Developer Policy.

A bi-LSTM that predicts sentiment values using vulgarity features.

The three possible vulgarity features are: (1) masking, (2) insertion, and (3) concatenation.
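The README does not spell out how each feature alters a tweet. The sketch below is one plausible reading of the three names, assuming a hypothetical set `lexicon` of lowercase vulgar words and a hypothetical placeholder token `<vulgar>`. It is illustrative only; consult the paper for the exact definitions the model uses.

```python
from typing import List, Set, Tuple

# Hypothetical placeholder token; the repository may use a different symbol.
VULGAR_TOKEN = "<vulgar>"


def mask_vulgar(tokens: List[str], lexicon: Set[str]) -> List[str]:
    """Masking (assumed): replace each vulgar word with one shared token."""
    return [VULGAR_TOKEN if t.lower() in lexicon else t for t in tokens]


def insert_marker(tokens: List[str], lexicon: Set[str]) -> List[str]:
    """Insertion (assumed): keep the vulgar word but add a marker token before it."""
    out: List[str] = []
    for t in tokens:
        if t.lower() in lexicon:
            out.append(VULGAR_TOKEN)
        out.append(t)
    return out


def concat_flags(tokens: List[str], lexicon: Set[str]) -> List[Tuple[str, int]]:
    """Concatenation (assumed): pair each word with a 0/1 vulgarity flag that a
    model would append to that word's embedding vector."""
    return [(t, int(t.lower() in lexicon)) for t in tokens]
```

For example, with `lexicon = {"damn"}`, masking turns `["this", "is", "damn", "good"]` into `["this", "is", "<vulgar>", "good"]`, while insertion keeps the original word and yields `["this", "is", "<vulgar>", "damn", "good"]`.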

First, run the data-cleaning script to prepare the data set for modeling. By default it reads the original data set from ./data/coling_twitter_data.tsv; if your file is elsewhere, pass its path with the --data_set flag. The cleaned data is saved to ./data/cleaned_data.tsv.

Example usage: python3 --data_set=./data/coling_twitter_data.tsv
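As a rough illustration of the default-path behavior described above, here is a minimal sketch using Python's `argparse`. The function names and the `clean_rows` step (trim fields, drop empty rows) are hypothetical; the repository's actual cleaning logic is not described in this README.

```python
import argparse
from typing import List, Optional

# Paths documented in this README.
DEFAULT_DATA_SET = "./data/coling_twitter_data.tsv"
CLEANED_PATH = "./data/cleaned_data.tsv"


def parse_cleaning_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse --data_set, falling back to the documented default path."""
    parser = argparse.ArgumentParser(description="Prepare the data set for modeling.")
    parser.add_argument("--data_set", default=DEFAULT_DATA_SET,
                        help="path to the original tab-separated data set")
    return parser.parse_args(argv)


def clean_rows(rows: List[List[str]]) -> List[List[str]]:
    """Hypothetical cleaning step: trim every field and drop all-empty rows."""
    cleaned = []
    for row in rows:
        fields = [field.strip() for field in row]
        if any(fields):
            cleaned.append(fields)
    return cleaned


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_cleaning_args(argv)
    # Split each line on tabs only (no CSV quoting rules, so quotes in tweets pass through).
    with open(args.data_set, encoding="utf-8") as src:
        rows = [line.rstrip("\n").split("\t") for line in src]
    with open(CLEANED_PATH, "w", encoding="utf-8") as dst:
        for row in clean_rows(rows):
            dst.write("\t".join(row) + "\n")
```

Calling `main(["--data_set=/path/to/file.tsv"])` would read that file instead of the default; calling `main([])` uses ./data/coling_twitter_data.tsv.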

After cleaning the data, run the modeling script with the parameters below.

Required parameters:

  • train=path to training data set
  • validation_data=path to validation data set
  • initial_embed_weights=path to initial embedding weights
  • prefix=prefix to save model
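The initial_embed_weights parameter points to pre-trained word vectors used to initialize the embedding layer. A minimal sketch of how such a file could be turned into an embedding matrix, assuming a word2vec-style text format (`word v1 ... v200` per line) and a hypothetical `vocab` dictionary whose word indices start at 1, so that row 0 stays an all-zero padding row. Words without a pre-trained vector get small random values here; the repository may handle them differently.

```python
import random
from typing import Dict, Iterable, List


def load_vectors(lines: Iterable[str], dim: int = 200) -> Dict[str, List[float]]:
    """Parse 'word v1 ... v_dim' lines into a dict; lines of any other width
    (such as a word2vec 'vocab_size dim' header) are skipped."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != dim + 1:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(vocab: Dict[str, int], vectors: Dict[str, List[float]],
                           dim: int = 200, seed: int = 0) -> List[List[float]]:
    """Return one row per word index; row 0 stays all zeros for padding."""
    rng = random.Random(seed)
    size = max(vocab.values(), default=0) + 1
    matrix = [[0.0] * dim for _ in range(size)]
    for word, idx in vocab.items():
        if word in vectors:
            matrix[idx] = list(vectors[word])  # copy the pre-trained vector
        else:
            # Word missing from the pre-trained set: small random initialization.
            matrix[idx] = [rng.uniform(-0.05, 0.05) for _ in range(dim)]
    return matrix
```

In practice `load_vectors` would be fed an open file handle, e.g. `with open(path, encoding="utf-8") as f: vectors = load_vectors(f)`.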

For the initial embedding weights, we use 200-dimensional CBOW embeddings pre-trained on 50M tweets (Astudillo et al., 2015).

Optional parameters:

  • rnndim=<rnn dimension, default=128>

  • dropout=<dropout rate, default=0.2>

  • maxsentlen=<maximum length of tweets by number of words, default=60>

  • num_cat=<number of categories, default=5>

  • lr=<learning rate, default=0.001>

  • only_testing=<boolean; set to True to load a saved model without training, default=False>

  • concat=<boolean; set to True to use the concatenation method, default=False>

  • insert=<boolean; set to True to use the insertion method, default=False>

  • mask=<boolean; set to True to use the masking method, default=False>
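The required and optional parameters above map naturally onto a command-line parser. This is a hypothetical `argparse` sketch using the documented names and defaults (plus the optional --test path that the example usage passes); the repository may use a different flag library. Because booleans are passed as `--concat=True`, the sketch converts strings explicitly: `argparse` with `type=bool` would turn the string `"False"` into `True`.

```python
import argparse
from typing import List, Optional


def str2bool(value: str) -> bool:
    """Interpret 'True'/'False' style strings explicitly."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_training_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    p = argparse.ArgumentParser(description="Train a bi-LSTM sentiment model.")
    # Required data paths and the prefix used when saving the model.
    p.add_argument("--train", required=True)
    p.add_argument("--validation_data", required=True)
    p.add_argument("--initial_embed_weights", required=True)
    p.add_argument("--prefix", required=True)
    # Optional test set; when given, predictions and error metrics are produced.
    p.add_argument("--test", default=None)
    # Optional hyperparameters with the documented defaults.
    p.add_argument("--rnndim", type=int, default=128)
    p.add_argument("--dropout", type=float, default=0.2)
    p.add_argument("--maxsentlen", type=int, default=60)
    p.add_argument("--num_cat", type=int, default=5)
    p.add_argument("--lr", type=float, default=0.001)
    # Boolean switches for testing mode and the three vulgarity features.
    for flag in ("only_testing", "concat", "insert", "mask"):
        p.add_argument(f"--{flag}", type=str2bool, default=False)
    return p.parse_args(argv)
```

For instance, `parse_training_args(["--train=a.tsv", "--validation_data=b.tsv", "--initial_embed_weights=e.txt", "--prefix=example", "--concat=True"])` yields `concat=True` with every other option at its default.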

Example usage: python3 --train=<path> --test=<path> --prefix=example --concat=True

The script:

  • Saves the model as h5 and JSON files in ./training
  • Prints a summary of the model

If a test set is provided, the script also:

  • Saves predictions for the test set to ./training/predictions
  • Prints the micro-averaged mean absolute error
  • Prints the macro-averaged mean absolute error
  • Prints the per-class mean absolute error
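On an ordinal sentiment scale, these three error figures differ only in how absolute errors are averaged. A minimal sketch under the usual definitions (micro: mean over all tweets; per-class: mean over tweets sharing a gold label; macro: unweighted mean of the per-class values). The function name `mae_report` is hypothetical, and the repository's exact computation may differ, for example in how classes absent from the gold labels are treated (this sketch averages only over classes that appear).

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def mae_report(gold: Sequence[int], pred: Sequence[float]) -> Tuple[float, float, Dict[int, float]]:
    """Return (micro MAE, macro MAE, per-class MAE keyed by gold label)."""
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    errors: Dict[int, List[float]] = defaultdict(list)
    for g, p in zip(gold, pred):
        errors[g].append(abs(g - p))
    # Per-class MAE: average error over the tweets whose gold label is that class.
    per_class = {c: sum(e) / len(e) for c, e in sorted(errors.items())}
    # Micro MAE: every tweet weighs the same, so frequent classes dominate.
    micro = sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)
    # Macro MAE: every class weighs the same, regardless of its frequency.
    macro = sum(per_class.values()) / len(per_class)
    return micro, macro, per_class
```

With `gold = [1, 1, 1, 5]` and `pred = [1, 1, 2, 3]`, micro MAE is 0.75 while macro MAE is about 1.17, because the single class-5 tweet (error 2) counts as much as the three class-1 tweets together.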


Corpora for vulgar and censored tweets annotated for sentiment


