# Emoji Sentiment

Are popular emojis generally associated with positive or negative sentiments?

The file `"emoji-sentiment.csv"` provides data on the sentiment associated with various emojis.

Researchers examined 1.6 million tweets across 13 European languages. Each tweet was labeled by annotators as positive (+1), negative (-1), or neutral (0). About 4% of these tweets included emojis.

Columns include:
- `Occurrences [5...max]`: Number of times the emoji appears in the dataset.
- `Position [0...1]`: Average position of the emoji in tweets, from start (0) to end (1).
- `Neg [0...1]`: Proportion (0 to 1) of tweets containing the emoji that were labeled negative.
- `Neut [0...1]`: Proportion (0 to 1) of tweets containing the emoji that were labeled neutral.
- `Pos [0...1]`: Proportion (0 to 1) of tweets containing the emoji that were labeled positive.



In [53]:
# FOR GOOGLE COLAB ONLY.
# Uncomment and run the code below. A dialog will appear to upload files.
# Upload 'emoji-sentiment.csv'.

# from google.colab import files
# uploaded = files.upload()

In [2]:
import pandas as pd
df = pd.read_csv('emoji-sentiment.csv')
df.head(3)

| Unnamed: 0 | Char | Image [twemoji] | Unicode codepoint | Occurrences [5...max] | Position [0...1] | Neg [0...1] | Neut [0...1] | Pos [0...1] | Sentiment bar (c.i. 95%) | Unicode name | Unicode block |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 😂 | 😂 | 0x1f602 | 14622 | 0.805 | 0.247 | 0.285 | 0.468 | | FACE WITH TEARS OF JOY | Emoticons |
| 1 | ❤ | ❤ | 0x2764 | 8050 | 0.747 | 0.044 | 0.166 | 0.79 | | HEAVY BLACK HEART | Dingbats |
| 2 | ♥ | ♥ | 0x2665 | 7144 | 0.754 | 0.035 | 0.272 | 0.693 | | BLACK HEART SUIT | Miscellaneous Symbols |
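
As a quick sanity check on the column descriptions, the negative, neutral, and positive shares should add up to roughly 1 for each emoji. The sketch below assumes nothing beyond the three rows shown in the preview above; the variable names are illustrative, and the tolerance is an assumption to absorb rounding in the published values.

```python
import pandas as pd

# Three rows copied from the df.head(3) preview in this notebook.
sample = pd.DataFrame({
    "Char": ["😂", "❤", "♥"],
    "Neg [0...1]": [0.247, 0.044, 0.035],
    "Neut [0...1]": [0.285, 0.166, 0.272],
    "Pos [0...1]": [0.468, 0.790, 0.693],
})

share_cols = ["Neg [0...1]", "Neut [0...1]", "Pos [0...1]"]
row_totals = sample[share_cols].sum(axis=1)

# Assumed tolerance of 0.01, because the values are rounded to three decimals.
shares_sum_to_one = (row_totals - 1).abs() < 0.01
print(shares_sum_to_one.all())
```

The same check can be run on the full `df` by replacing `sample` with `df`.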


### Project Ideas:

Data Cleaning: 
- Remove unnecessary columns that are not useful for your analysis.

- Rename the remaining columns using `snake_case` (all lowercase letters with underscores between words).
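
One possible way to do the renaming without typing every new name by hand is a small helper that converts a header to `snake_case`. The function `to_snake_case` below is a hypothetical helper, not part of pandas; it assumes the bracketed range notes such as `[0...1]` can simply be dropped.

```python
import re

def to_snake_case(name: str) -> str:
    """Hypothetical helper: drop a trailing '[...]' note, then snake_case the rest."""
    name = re.sub(r"\s*\[.*?\]\s*$", "", name)  # 'Pos [0...1]' -> 'Pos'
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # spaces and punctuation -> '_'
    return name.strip("_").lower()

# Header names taken from the preview table in this notebook.
columns = ["Char", "Occurrences [5...max]", "Position [0...1]", "Unicode name"]
renamed = {c: to_snake_case(c) for c in columns}
print(renamed)
```

Since `DataFrame.rename` accepts a function for its `columns` argument, `df.rename(columns=to_snake_case)` would apply the helper to every header at once. Explicit names, such as a hand-written mapping, may still read better for the final analysis.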

New Variables:
- Add a new column called `sentiment`, where sentiment = (% positive tweets) - (% negative tweets).

- Add a `positive_flag` column that is `True` if `sentiment > 0` (or above a set threshold), otherwise `False`.
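
A minimal sketch of the two new variables, using the shares from the three preview rows. The short column names and the `THRESHOLD` value are assumptions chosen for illustration; setting the threshold to 0 gives the plain `sentiment > 0` rule.

```python
import pandas as pd

# Shares copied from the df.head(3) preview; column names are illustrative.
emojis = pd.DataFrame({
    "emoji": ["😂", "❤", "♥"],
    "negative": [0.247, 0.044, 0.035],
    "positive": [0.468, 0.790, 0.693],
})

THRESHOLD = 0.25  # hypothetical cutoff; use 0 for the plain "sentiment > 0" rule

# Sentiment score: positive share minus negative share, ranging from -1 to 1.
emojis["sentiment"] = emojis["positive"] - emojis["negative"]
emojis["positive_flag"] = emojis["sentiment"] > THRESHOLD
print(emojis)
```

With this cutoff, 😂 (score 0.221) is not flagged as positive even though its score is above zero, which shows how the choice of threshold changes the answers to the questions that follow.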

Types of questions you can now answer more easily:
- What percentage of emojis in the dataset have a positive sentiment?

- What percentage of the top 20 most popular emojis are positive?

- Which emoji (with more than 500 mentions) is the most positive?

- Which emoji (with more than 500 mentions) is the most negative?

- Where in the tweets are most emojis located (i.e. at the beginning or the end)?

- Is there a difference in the placement of positive versus negative emojis within a tweet?
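
The questions above map onto a few common pandas idioms. The sketch below uses only the three preview rows, so its numbers say nothing about the full dataset; the short column names and the `MIN_OCCURRENCES` cutoff are assumptions sized for this tiny sample.

```python
import pandas as pd

# Rows copied from the df.head(3) preview; the short column names are illustrative.
emojis = pd.DataFrame({
    "emoji": ["😂", "❤", "♥"],
    "occurrences": [14622, 8050, 7144],
    "avg_position": [0.805, 0.747, 0.754],
    "negative": [0.247, 0.044, 0.035],
    "positive": [0.468, 0.790, 0.693],
})
emojis["sentiment"] = emojis["positive"] - emojis["negative"]
emojis["positive_flag"] = emojis["sentiment"] > 0

# The mean of a boolean column is the share of True values.
pct_positive = emojis["positive_flag"].mean() * 100

# Most positive emoji among those above a minimum number of occurrences.
MIN_OCCURRENCES = 7500  # hypothetical cutoff for this three-row sample
frequent = emojis[emojis["occurrences"] > MIN_OCCURRENCES]
most_positive = frequent.loc[frequent["sentiment"].idxmax(), "emoji"]

# Average position per sentiment group (this sample contains only one group).
position_by_flag = emojis.groupby("positive_flag")["avg_position"].mean()

print(pct_positive, most_positive)
print(position_by_flag)
```

On the full dataset, `position_by_flag` would have one row for `True` and one for `False`, which allows a direct comparison of where positive and negative emojis tend to appear.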

In [58]:
# YOUR CODE HERE (add additional cells as needed)
# Columns needed for the analysis
analyze = ['Char', 'Occurrences [5...max]', 'Position [0...1]', 'Neg [0...1]', 'Neut [0...1]', 'Pos [0...1]']
# Select relevant columns; .copy() gives an independent DataFrame that is safe to add columns to
emoji_sentiment = df[analyze].copy()

# Rename columns
col_map = {
    'Char': 'emoji',
    'Occurrences [5...max]': 'number_of_occurrences',
    'Position [0...1]': 'avg_position',
    'Neg [0...1]': 'percentage_negative',
    'Neut [0...1]': 'percentage_neutral',
    'Pos [0...1]': 'percentage_positive',
}
emoji_sentiment = emoji_sentiment.rename(columns=col_map)

# Add a new column 'sentiment' with the difference between positive and negative percentages
emoji_sentiment['sentiment'] = emoji_sentiment['percentage_positive'] - emoji_sentiment['percentage_negative']

# Add a new column 'positive_flag' to indicate if the sentiment is positive
emoji_sentiment['positive_flag'] = emoji_sentiment['sentiment'] > 0

# Calculate the percentage of emojis with positive sentiment
positive_sentiment = round(emoji_sentiment[emoji_sentiment['positive_flag']].shape[0] / emoji_sentiment.shape[0] * 100, 2)

# Calculate the percentage of popular emojis (top 20 by occurrences) that have positive sentiment
popular_positive_sentiment = (
    emoji_sentiment
    .nlargest(20, 'number_of_occurrences')
    .query("positive_flag == True")
    .shape[0] / 20 * 100
)

# Keep only emojis with more than 500 occurrences
frequent = emoji_sentiment[emoji_sentiment['number_of_occurrences'] > 500]

# Most positive emoji: highest sentiment score (positive share minus negative share)
most_positive = frequent.nlargest(1, 'sentiment')

# Most negative emoji: lowest sentiment score
most_negative = frequent.nsmallest(1, 'sentiment')

# Determine if emojis are most often located at the beginning or end of texts
most_located = (
    'Beginning'
    if (emoji_sentiment['avg_position'] < 0.5).sum() > emoji_sentiment.shape[0] / 2
    else 'End'
)

positive_located = (
    'Beginning'
    if (emoji_sentiment.query("positive_flag == True")['avg_position'] < 0.5).sum()
       > emoji_sentiment.query("positive_flag == True").shape[0] / 2
    else 'End'
)

negative_located = (
    'Beginning'
    if (emoji_sentiment.query("positive_flag == False")['avg_position'] < 0.5).sum()
       > emoji_sentiment.query("positive_flag == False").shape[0] / 2
    else 'End'
)

# True when all emojis, positive emojis, and negative emojis share the same predominant location
same_location = most_located == positive_located == negative_located

# Print the results
print(f"Percentage of emojis with positive sentiment: {positive_sentiment}%")
print(f"Percentage of popular emojis with positive sentiment: {popular_positive_sentiment}%")
print("Most positive emoji with more than 500 occurrences:")
print(most_positive[['emoji', 'percentage_positive']])
print("Most negative emoji with more than 500 occurrences:")
print(most_negative[['emoji', 'percentage_negative']])
print(f"Emojis are most often located at the: {most_located}")
print(f"Positive emojis are most often located at the: {positive_located}")
print(f"Negative emojis are most often located at the: {negative_located}")

# Save the answers to a text file
with open('emoji_sentiment_analysis.txt', 'w') as f:
    f.write(f"Percentage of emojis with positive sentiment: {positive_sentiment}%\n")
    f.write(f"Percentage of popular emojis with positive sentiment: {popular_positive_sentiment}%\n")
    f.write("Most positive emoji with more than 500 occurrences:\n")
    f.write(most_positive[['emoji', 'percentage_positive']].to_string(index=False) + '\n')
    f.write("Most negative emoji with more than 500 occurrences:\n")
    f.write(most_negative[['emoji', 'percentage_negative']].to_string(index=False) + '\n')
    f.write(f"Emojis are most often located at the: {most_located}\n")
    f.write(f"Positive emojis are most often located at the: {positive_located}\n")
    f.write(f"Negative emojis are most often located at the: {negative_located}\n")
    f.write(f"Positive and negative emojis share the same predominant location: {'Yes' if same_location else 'No'}\n")

 


Percentage of emojis with positive sentiment: 82.42%
Percentage of popular emojis with positive sentiment: 90.0%
Most positive emoji with more than 500 occurrences:
  emoji  percentage_positive
1     ❤                 0.79
Most negative emoji with more than 500 occurrences:
   emoji  percentage_negative
23     😒                0.591
Emojis are most often located at the: End
Positive emojis are most often located at the: End
Negative emojis are most often located at the: End
