Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media

See the paper cited below for details on annotation and modeling.

Citation:

@InProceedings{cachola2018vulgar,
  author    = {Cachola, Isabel  and  Holgate, Eric  and  Preo\c{t}iuc-Pietro, Daniel  and  Li, Junyi Jessy},
  title     = {Expressively vulgar: The socio-dynamics of vulgarity and its effects on sentiment analysis in social media},
  booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
  year      = {2018},
  pages     = {2927--2938},
  url       = {http://aclweb.org/anthology/C18-1248},
}

Use of the data presented here must abide by the Twitter Terms of Service and Developer Policy.

This repository provides a bi-LSTM that predicts sentiment values using vulgarity features.

The three available vulgarity features are: (1) masking, (2) insertion, and (3) concatenation.
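The README does not define these three features, so the following is only one plausible reading, not the repository's code. It assumes masking replaces each vulgar word with a special token, insertion adds that token after each vulgar word, and concatenation supplies a separate tweet-level value (here, a count of vulgar words) to be joined to the tweet representation. The lexicon `VULGAR_WORDS`, the token `VULGAR_TOKEN`, and all function names are hypothetical.

```python
# Hypothetical stand-in lexicon and special token; neither comes from the repository.
VULGAR_WORDS = {"darn", "heck"}
VULGAR_TOKEN = "<vulgar>"


def is_vulgar(token):
    """Case-insensitive lookup in the stand-in lexicon."""
    return token.lower() in VULGAR_WORDS


def mask_tokens(tokens):
    """Masking (assumed): replace every vulgar word with the special token."""
    return [VULGAR_TOKEN if is_vulgar(t) else t for t in tokens]


def insert_tokens(tokens):
    """Insertion (assumed): keep every word and add the special token after each vulgar one."""
    out = []
    for t in tokens:
        out.append(t)
        if is_vulgar(t):
            out.append(VULGAR_TOKEN)
    return out


def concat_feature(tokens):
    """Concatenation (assumed): a tweet-level count of vulgar words, to be joined
    to the tweet representation elsewhere in the model."""
    return sum(1 for t in tokens if is_vulgar(t))


tweet = ["what", "the", "heck"]
masked = mask_tokens(tweet)
inserted = insert_tokens(tweet)
count = concat_feature(tweet)
```

Under these assumptions, masking hides the identity of the vulgar word, insertion preserves it while flagging it, and concatenation leaves the token sequence untouched.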

First, run clean_data.py to prepare the data set for modeling. By default, clean_data.py reads the original data set from ./data/coling_twitter_data.tsv; if your file is located elsewhere, pass its path with the --data_set flag. The cleaned data is saved to ./data/cleaned_data.tsv.

Example Usage: python3 clean_data.py --data_set=./data/coling_twitter_data.tsv

After cleaning the data, run bilstm.py.

Required parameters:

  • train=path to training data set
  • validation_data=path to validation data set
  • initial_embed_weights=path to initial embedding weights
  • prefix=prefix to save model

For initial embedding weights, we use 200d CBOW embeddings pre-trained on 50M tweets (Astudillo et al., 2015).

Optional parameters:

  • rnndim=<rnn dimension, default=128>

  • dropout=<dropout rate, default=0.2>

  • maxsentlen=<maximum length of tweets by number of words, default=60>

  • num_cat=<number of categories, default=5>

  • lr=<learning rate, default=0.001>

  • only_testing=<boolean if you only want to load a saved model, default=False>

  • concat=<boolean if using concat method, default=False>

  • insert=<boolean if using insert method, default=False>

  • mask=<boolean if using mask method, default=False>
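The parameters above could be declared with Python's argparse roughly as follows. This is a minimal sketch, not the parser used in bilstm.py: the parameter names and defaults are taken from the lists above, while `str2bool` and `build_parser` are hypothetical helpers. The explicit boolean converter matters because a plain `type=bool` would treat the string "False" as true.

```python
import argparse


def str2bool(value):
    """Convert strings such as 'True' or 'False' into real booleans."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="bi-LSTM sentiment model (sketch)")
    # Required parameters, as listed in this README.
    for name in ("train", "validation_data", "initial_embed_weights", "prefix"):
        parser.add_argument(f"--{name}", required=True)
    # The test set appears only in the README's usage example; assumed optional here.
    parser.add_argument("--test", default=None)
    # Optional parameters with the documented defaults.
    parser.add_argument("--rnndim", type=int, default=128)
    parser.add_argument("--dropout", type=float, default=0.2)
    parser.add_argument("--maxsentlen", type=int, default=60)
    parser.add_argument("--num_cat", type=int, default=5)
    parser.add_argument("--lr", type=float, default=0.001)
    for name in ("only_testing", "concat", "insert", "mask"):
        parser.add_argument(f"--{name}", type=str2bool, default=False)
    return parser


# Placeholder paths for illustration only.
args = build_parser().parse_args([
    "--train=train.tsv",
    "--validation_data=dev.tsv",
    "--initial_embed_weights=embeddings.txt",
    "--prefix=example",
    "--concat=True",
])
```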

Example usage: python3 bilstm.py --train=<path> --validation_data=<path> --initial_embed_weights=<path> --test=<path> --prefix=example --concat=True

Returns:

  • Saves model as h5 and json files to ./training
  • Prints summary of model

If a test set is provided:

  • Saves predictions of test set to ./training/predictions
  • Prints micro mean absolute error
  • Prints macro mean absolute error
  • Prints per class mean absolute error
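The README does not define the three error figures, so the sketch below assumes the usual conventions: micro MAE averages the absolute error over all test examples, per-class MAE averages it within each gold sentiment class, and macro MAE is the unweighted mean of the per-class values. The function name `mae_report` is hypothetical and the labels are made-up integers for illustration.

```python
from collections import defaultdict


def mae_report(y_true, y_pred):
    """Return (micro MAE, macro MAE, per-class MAE dict) for integer labels."""
    errors_by_class = defaultdict(list)
    for gold, pred in zip(y_true, y_pred):
        errors_by_class[gold].append(abs(gold - pred))

    # Micro: every example counts equally, so frequent classes dominate.
    all_errors = [e for errs in errors_by_class.values() for e in errs]
    micro = sum(all_errors) / len(all_errors)

    # Per class: average error among examples sharing the same gold label.
    per_class = {c: sum(errs) / len(errs) for c, errs in sorted(errors_by_class.items())}

    # Macro: every class counts equally, regardless of its size.
    macro = sum(per_class.values()) / len(per_class)
    return micro, macro, per_class


# Illustrative labels on a 1-5 scale (not real data).
micro, macro, per_class = mae_report([1, 1, 1, 5], [1, 1, 2, 3])
```

With imbalanced classes the two averages can differ considerably, which is why reporting both is informative for a skewed sentiment scale.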

About

Corpora for vulgar and censored tweets annotated for sentiment
