fvancesco/emoji_modifiers

How Gender and Skin Tone Modifiers Affect Emoji Semantics in Twitter

Francesco Barbieri and Jose Camacho Collados

This repository includes the code and pre-trained embeddings from the paper "How Gender and Skin Tone Modifiers Affect Emoji Semantics in Twitter" (*SEM 2018).

Use our embeddings

We release two sets of 100-dimensional SW2V embeddings trained on Twitter (USA-based, English):

  1. Word, base emoji and modifier embeddings. The vocabulary includes words (e.g. house, car, ...), base emojis without gender or skin tone modifiers (e.g. 👍), and the modifiers themselves (e.g. male/female, or light/dark skin tone). Download embeddings here [~300 MB]

  2. Word and emoji (base and modified) embeddings. The vocabulary includes words (e.g. house, car, ...) and emojis, both base emojis without gender or skin tone modifiers (e.g. 👍) and emojis with modifiers (e.g. 👍🏻, 👍🏽, 👍🏿). Download embeddings here [~300 MB]
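The difference between the two vocabularies mirrors how Unicode encodes modified emoji: a skin tone is a separate Fitzpatrick modifier code point (U+1F3FB to U+1F3FF) placed right after the base emoji, and a gender is typically added as a zero-width joiner (U+200D) followed by ♀ (U+2640) or ♂ (U+2642), often with a variation selector (U+FE0F). The sketch below splits an emoji string into its base and modifier labels along those lines. It only illustrates the Unicode convention and is not the preprocessing used for the paper; `split_modifiers` and the label strings are hypothetical names, and it does not attempt to cover every ZWJ sequence.

```python
# Fitzpatrick skin tone modifiers, labelled after their CLDR names.
SKIN_TONES = {
    "\U0001F3FB": "light",
    "\U0001F3FC": "medium-light",
    "\U0001F3FD": "medium",
    "\U0001F3FE": "medium-dark",
    "\U0001F3FF": "dark",
}
GENDER_SIGNS = {"\u2640": "female", "\u2642": "male"}
ZWJ = "\u200d"   # zero-width joiner
VS16 = "\ufe0f"  # emoji presentation selector


def split_modifiers(emoji):
    """Return (base_emoji, [modifier labels]) for a possibly modified emoji."""
    base, modifiers = [], []
    i, n = 0, len(emoji)
    while i < n:
        ch = emoji[i]
        if ch in SKIN_TONES:
            modifiers.append(SKIN_TONES[ch])
        elif ch == ZWJ and i + 1 < n and emoji[i + 1] in GENDER_SIGNS:
            modifiers.append(GENDER_SIGNS[emoji[i + 1]])
            i += 2  # consume the joiner and the gender sign
            if i < n and emoji[i] == VS16:
                i += 1  # drop the selector attached to the gender sign
            continue
        else:
            base.append(ch)  # base code points, other joiners and selectors
        i += 1
    return "".join(base), modifiers


if __name__ == "__main__":
    print(split_modifiers("\U0001F44D\U0001F3FF"))  # thumbs up, dark skin tone
    print(split_modifiers("\U0001F46E\U0001F3FD\u200d\u2640\ufe0f"))  # police officer, medium, female
```

Multi-person sequences such as a family emoji keep their inner joiners in the base, since only a joiner followed by a gender sign is treated as a modifier.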

Notes:

  • All words are lowercased.
  • To recover the original emoji and modifier encodings from the embedding vocabulary, you can use the following mapping (tab-separated columns: frequency ranking, emoji, CLDR name, emoji code with modifiers, emoji code without modifiers).
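If the released .bin files follow the standard word2vec binary layout (an assumption: SW2V builds on word2vec and the training commands use -binary 1, but this has not been checked against the released files), they can be read with the standard library alone. The sketch below is a hypothetical helper, not part of this repository; `load_word2vec_binary`, `cosine` and `most_similar` are illustrative names, and pure-Python loading of a ~300 MB file will be slow. If gensim is installed, `KeyedVectors.load_word2vec_format(path, binary=True)` reads the word2vec binary format much faster.

```python
import struct
import sys
from math import sqrt


def load_word2vec_binary(path):
    """Read a word2vec-style binary file into {token: [float, ...]}.

    Assumed layout: an ASCII header "<vocab_size> <dim>\\n", then for each
    entry the UTF-8 token, a space, and <dim> little-endian float32 values,
    optionally followed by a newline.
    """
    vectors = {}
    with open(path, "rb") as f:
        vocab_size, dim = map(int, f.readline().split())
        for _ in range(vocab_size):
            token = bytearray()
            while True:
                ch = f.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise EOFError("unexpected end of file while reading a token")
                if ch != b"\n":  # skip the newline left after the previous vector
                    token.extend(ch)
            values = struct.unpack(f"<{dim}f", f.read(4 * dim))
            vectors[token.decode("utf-8", errors="replace")] = list(values)
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(vectors, query, topn=10):
    """Return the topn (token, score) pairs closest to query by cosine similarity."""
    q = vectors[query]
    scored = [(tok, cosine(q, vec)) for tok, vec in vectors.items() if tok != query]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:topn]


if __name__ == "__main__":
    # Usage (the path is a placeholder): python3 load_vectors.py embeddings.bin
    if len(sys.argv) > 1:
        vecs = load_word2vec_binary(sys.argv[1])
        for token, score in most_similar(vecs, "house", topn=5):
            print(f"{score:.3f}\t{token}")
```

Emoji and modifier entries may appear in the vocabulary in an encoded form rather than as raw emoji, which is what the mapping above is for; querying a plain word such as "house" avoids depending on that encoding.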

When you run example.py with Python 3, the output should be the following:

Train New Embeddings

We trained the embeddings with the original SW2V code (http://lcl.uniroma1.it/sw2v/), running it from the terminal as follows (these are the same parameters used in our experiments):

  1. Word, base emoji and modifier embeddings:
INPUT="tweets.txt"
OUTPUT="word_emoji_embedding_s0.bin"
sw2v -train $INPUT -output $OUTPUT -cbow 1 -size 100 -window 6 -negative 0 -hs 1 -threads 1 -binary 1 -iter 5 -update 0 -senses 0 -synsets_input 1 -synsets_target 1
  2. Word and emoji (base and modified) embeddings:
INPUT="tweets.txt"
OUTPUT="word_emoji_embedding_s1.bin"
sw2v -train $INPUT -output $OUTPUT -cbow 1 -size 100 -window 6 -negative 0 -hs 1 -threads 1 -binary 1 -iter 5 -update 0 -senses 1 -synsets_input 1 -synsets_target 1
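The two commands above share every flag except -senses (0 for the word, base emoji and modifier embeddings; 1 for the word and emoji embeddings). The sketch below reproduces both runs from Python so that this single difference is explicit. It assumes an `sw2v` executable on the PATH, as the shell commands do; `build_sw2v_command` and `RUNS` are hypothetical names, and the flags are copied verbatim from the commands above in the same order.

```python
import shlex
import shutil
import subprocess


def build_sw2v_command(input_path, output_path, senses):
    """Return the sw2v argument list for one of the two configurations above."""
    if senses not in (0, 1):
        raise ValueError("senses must be 0 or 1")
    return [
        "sw2v", "-train", input_path, "-output", output_path,
        "-cbow", "1", "-size", "100", "-window", "6", "-negative", "0",
        "-hs", "1", "-threads", "1", "-binary", "1", "-iter", "5",
        "-update", "0", "-senses", str(senses),
        "-synsets_input", "1", "-synsets_target", "1",
    ]


# (output file, -senses value) for the two released embedding sets.
RUNS = [("word_emoji_embedding_s0.bin", 0), ("word_emoji_embedding_s1.bin", 1)]

if __name__ == "__main__":
    for output, senses in RUNS:
        cmd = build_sw2v_command("tweets.txt", output, senses)
        print(shlex.join(cmd))  # the equivalent shell command
        if shutil.which("sw2v"):  # only train if the binary is installed
            subprocess.run(cmd, check=True)
```

Passing an argument list instead of a shell string avoids quoting problems if the corpus path contains spaces.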

The provided models are freely available under the Creative Commons CC BY 3.0 license; please use the reference below for attribution:

@InProceedings{barbieri:sem2018,
  author = 	"Barbieri, Francesco
		and Camacho-Collados, Jose",
  title = 	"How Gender and Skin Tone Modifiers Affect Emoji Semantics in Twitter",
  booktitle = 	"Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"101--106",
  location = 	"New Orleans, Louisiana",
  url = 	"http://aclweb.org/anthology/S18-2011"
}
