Description: Contributors assessed tweets about various brands and products and rated whether the sentiment conveyed was positive, negative, or neutral. When a tweet did convey sentiment, they also identified the specific brand or product that the emotion was directed at.
 	
Columns: tweet_text
         emotion_in_tweet_is_directed_at
         is_there_an_emotion_directed_at_a_brand_or_product


In [214]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re
import string
import nltk
import warnings
%matplotlib inline

warnings.filterwarnings('ignore')

In [215]:
#import data
df = pd.read_csv('ML Assignment Dataset - Train.csv')
df.head()

,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative emotion
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive emotion
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion


In [216]:
#data info
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8589 entries, 0 to 8588
Data columns (total 3 columns):
 #   Column                                              Non-Null Count  Dtype 
---  ------                                              --------------  ----- 
 0   tweet_text                                          8588 non-null   object
 1   emotion_in_tweet_is_directed_at                     3291 non-null   object
 2   is_there_an_emotion_directed_at_a_brand_or_product  8589 non-null   object
dtypes: object(3)
memory usage: 201.4+ KB


In [217]:
#removing usernames
def remove_usernames(tweet_text):
    # Check if the input is NaN, and return it unchanged
    if pd.isna(tweet_text):
        return tweet_text
    
    # Remove "@usernames" using regular expression
    cleaned_text = re.sub(r"@[\w]*", "", tweet_text)
    
    return cleaned_text
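A quick sanity check of `remove_usernames` on two tweets from the `df.head()` output above, plus a missing value, might look like the following sketch. The function body repeats the cell above so the snippet runs on its own; note that the spaces around each removed mention are left in place.

```python
import re

import pandas as pd


def remove_usernames(tweet_text):
    """Strip @mentions with the same pattern as the notebook; missing values pass through."""
    if pd.isna(tweet_text):
        return tweet_text
    return re.sub(r"@[\w]*", "", tweet_text)


# two raw tweets from the head() output, plus a missing entry
sample = pd.Series([".@wesley83 I have a 3G iPhone.", "@jessedee Know about @fludapp ?", None])
cleaned = sample.apply(remove_usernames)

print(repr(cleaned[0]))  # '. I have a 3G iPhone.'
print(repr(cleaned[1]))  # ' Know about  ?'
```

The leftover whitespace is harmless here, since the later `split()` and `CountVectorizer` steps ignore it.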

In [218]:
#creating a new column with no usernames
df['cleaned_tweet'] = df['tweet_text'].apply(remove_usernames)

In [219]:
#dropping the old column
df = df.drop('tweet_text', axis=1)

df.head()

,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product,cleaned_tweet
0,iPhone,Negative emotion,. I have a 3G iPhone. After 3 hrs tweeting at ...
1,iPad or iPhone App,Positive emotion,Know about ? Awesome iPad/iPhone app that yo...
2,iPad,Positive emotion,Can not wait for #iPad 2 also. They should sa...
3,iPad or iPhone App,Negative emotion,I hope this year's festival isn't as crashy a...
4,Google,Positive emotion,great stuff on Fri #SXSW: Marissa Mayer (Goog...


In [220]:
df = df.rename(columns={'emotion_in_tweet_is_directed_at': 'device'})
df = df.rename(columns={'is_there_an_emotion_directed_at_a_brand_or_product': 'label'})
df.head(9)

,device,label,cleaned_tweet
0,iPhone,Negative emotion,. I have a 3G iPhone. After 3 hrs tweeting at ...
1,iPad or iPhone App,Positive emotion,Know about ? Awesome iPad/iPhone app that yo...
2,iPad,Positive emotion,Can not wait for #iPad 2 also. They should sa...
3,iPad or iPhone App,Negative emotion,I hope this year's festival isn't as crashy a...
4,Google,Positive emotion,great stuff on Fri #SXSW: Marissa Mayer (Goog...
5,,No emotion toward brand or product,New iPad Apps For #SpeechTherapy And Communic...
6,,No emotion toward brand or product,
7,Android,Positive emotion,"#SXSW is just starting, #CTIA is around the co..."
8,iPad or iPhone App,Positive emotion,Beautifully smart and simple idea RT wrote a...


In [221]:
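#mapping the sentiment labels to binary targets (1 = positive, 0 = negative); any other label becomes NaN, which makes the column float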
df['label'] = df['label'].map({'Positive emotion': 1, 'Negative emotion': 0})
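The 0.0/1.0 values and the empty entries in the output below follow from how `Series.map` works with a dict: any label not among the keys (such as 'No emotion toward brand or product') becomes NaN, and the presence of NaN forces a float dtype. A small sketch on three hand-written labels:

```python
import pandas as pd

raw = pd.Series([
    "Positive emotion",
    "Negative emotion",
    "No emotion toward brand or product",
])
# values missing from the dict are mapped to NaN
mapped = raw.map({"Positive emotion": 1, "Negative emotion": 0})
print(mapped.tolist())  # [1.0, 0.0, nan]
print(mapped.dtype)     # float64

# once the unmapped rows are dropped, the labels can be cast back to integers
as_int = mapped.dropna().astype(int)
print(as_int.tolist())  # [1, 0]
```

scikit-learn accepts the float labels as they are, so the cast is optional.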
df.head(9)

,device,label,cleaned_tweet
0,iPhone,0.0,. I have a 3G iPhone. After 3 hrs tweeting at ...
1,iPad or iPhone App,1.0,Know about ? Awesome iPad/iPhone app that yo...
2,iPad,1.0,Can not wait for #iPad 2 also. They should sa...
3,iPad or iPhone App,0.0,I hope this year's festival isn't as crashy a...
4,Google,1.0,great stuff on Fri #SXSW: Marissa Mayer (Goog...
5,,,New iPad Apps For #SpeechTherapy And Communic...
6,,,
7,Android,1.0,"#SXSW is just starting, #CTIA is around the co..."
8,iPad or iPhone App,1.0,Beautifully smart and simple idea RT wrote a...


In [222]:
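#dropping every row with a missing value in any column (unmapped label, missing device, missing tweet text)
df = df.dropna()
print(df)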
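`df.dropna()` with no arguments removes a row when any column is missing, so a tweet with a positive or negative label but no `device` value would be dropped as well, even though only `label` and `cleaned_tweet` feed the classifier. If that is not intended, the `subset` parameter restricts the check. The toy frame below is made up purely to show the difference.

```python
import numpy as np
import pandas as pd

# made-up rows: labeled with a device, labeled without a device, unlabeled
toy = pd.DataFrame({
    "device": ["iPhone", np.nan, np.nan],
    "label": [0.0, 1.0, np.nan],
    "cleaned_tweet": ["battery dead", "love this app", "new iPad apps"],
})

all_cols = toy.dropna()                                   # any missing column drops the row
label_only = toy.dropna(subset=["label", "cleaned_tweet"])  # only the modeling columns matter

print(len(all_cols), len(label_only))  # 1 2
```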

                               device  label  \
0                              iPhone    0.0   
1                  iPad or iPhone App    1.0   
2                                iPad    1.0   
3                  iPad or iPhone App    0.0   
4                              Google    1.0   
...                               ...    ...   
8573                           iPhone    1.0   
8575                             iPad    1.0   
8576  Other Google product or service    0.0   
8581               iPad or iPhone App    1.0   
8584                             iPad    1.0   

                                          cleaned_tweet  
0     . I have a 3G iPhone. After 3 hrs tweeting at ...  
1      Know about  ? Awesome iPad/iPhone app that yo...  
2      Can not wait for #iPad 2 also. They should sa...  
3      I hope this year's festival isn't as crashy a...  
4      great stuff on Fri #SXSW: Marissa Mayer (Goog...  
...                                                 ...  
8573   your PR gu

In [223]:
df['device'].unique()

array(['iPhone', 'iPad or iPhone App', 'iPad', 'Google', 'Android',
       'Apple', 'Android App', 'Other Google product or service',
       'Other Apple product or service'], dtype=object)

In [224]:
# Collapse the product-level targets in 'device' into their parent brand (Apple or Google)
df['device'] = df['device'].replace({
    'iPhone': 'Apple',
    'iPad or iPhone App': 'Apple',
    'iPad': 'Apple',
    'Other Apple product or service': 'Apple',
    'Android': 'Google',
    'Android App': 'Google',
    'Other Google product or service': 'Google'
})
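`Series.replace` leaves any value that is not in the dict unchanged, so a device category missing from the mapping would slip through silently. A hedged alternative is a single `Series.map` over one dict covering the nine values printed by `unique()` above; unmapped values then surface as NaN. The dict name `BRAND_OF` is illustrative.

```python
import pandas as pd

# one entry per value listed by df['device'].unique() above
BRAND_OF = {
    "iPhone": "Apple",
    "iPad or iPhone App": "Apple",
    "iPad": "Apple",
    "Apple": "Apple",
    "Other Apple product or service": "Apple",
    "Google": "Google",
    "Android": "Google",
    "Android App": "Google",
    "Other Google product or service": "Google",
}

devices = pd.Series(["iPhone", "Android App", "Apple", "Other Google product or service"])
brands = devices.map(BRAND_OF)

# map() yields NaN for any value missing from the dict, so an unexpected category is caught here
assert brands.notna().all(), "found a device value with no brand mapping"
print(brands.tolist())  # ['Apple', 'Google', 'Apple', 'Google']
```

In the notebook this would read `df['device'] = df['device'].map(BRAND_OF)`.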


In [225]:
df.tail(9)

,device,label,cleaned_tweet
8560,Apple,1.0,you should see the line here at #SXSW in fron...
8566,Apple,1.0,You know you've made it to #sxsw when you see ...
8567,Apple,1.0,what are your essentials for #SxSW? Mine? poc...
8568,Apple,1.0,your iPhone 4 cases are Rad and Ready! Stop b...
8573,Apple,1.0,your PR guy just convinced me to switch back ...
8575,Apple,1.0,&quot;papyrus...sort of like the ipad&quot; - ...
8576,Google,0.0,Diller says Google TV &quot;might be run over ...
8581,Apple,1.0,I've always used Camera+ for my iPhone b/c it ...
8584,Apple,1.0,Ipad everywhere. #SXSW {link}


In [226]:
# remove short words (3 characters or fewer); this also drops negations such as 'not'
df['cleaned_tweet'] = df['cleaned_tweet'].apply(lambda x: " ".join([w for w in x.split() if len(w)>3]))
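The `len(w) > 3` filter also discards short negations: the opening 'Can not wait for #iPad 2 also.' of row 2 becomes 'wait #iPad also.', losing the 'not'. A minimal sketch of a filter with an exception list follows; the helper `drop_short_words` and its `keep` set are hypothetical choices, not part of the original notebook.

```python
def drop_short_words(text, min_len=4, keep=frozenset({"no", "not"})):
    """Keep words of at least min_len characters, plus any word in `keep` (case-insensitive)."""
    return " ".join(w for w in text.split() if len(w) >= min_len or w.lower() in keep)


tweet = "Can not wait for #iPad 2 also."

# the notebook's rule
print(" ".join(w for w in tweet.split() if len(w) > 3))  # wait #iPad also.
# the same rule with the hypothetical exception list
print(drop_short_words(tweet))                           # not wait #iPad also.
```

Keep in mind that `stop_words='english'` in the later `CountVectorizer` cell also removes 'no' and 'not', so preserving negations would require changing that step too.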
df.head()

,device,label,cleaned_tweet
0,Apple,0.0,"have iPhone. After tweeting #RISE_Austin, dead..."
1,Apple,1.0,Know about Awesome iPad/iPhone that you'll lik...
2,Apple,1.0,wait #iPad also. They should sale them down #S...
3,Apple,0.0,hope this year's festival isn't crashy this ye...
4,Google,1.0,"great stuff #SXSW: Marissa Mayer (Google), O'R..."


In [227]:
# remove special characters and punctuation (letters, digits and whitespace are kept)

df['cleaned_tweet'] = df['cleaned_tweet'].str.replace(r'[^a-zA-Z0-9\s]', '', regex=True)
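Applying the same pattern to two sample strings (adapted from rows shown above) shows what survives: letters, digits and whitespace stay, while '#', '_', ',', '.' and '!' are removed, so '#RISE_Austin' becomes 'RISEAustin'. Writing the pattern as a raw string keeps Python from treating `\s` as an (invalid) string escape.

```python
import pandas as pd

s = pd.Series([
    "have iPhone. After tweeting #RISE_Austin, dead",
    "your iPhone 4 cases are Rad and Ready!",
])

# raw string so that \s reaches the regex engine unchanged
out = s.str.replace(r"[^a-zA-Z0-9\s]", "", regex=True)
print(out.tolist())
# ['have iPhone After tweeting RISEAustin dead', 'your iPhone 4 cases are Rad and Ready']
```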
df.head()

,device,label,cleaned_tweet
0,Apple,0.0,have iPhone After tweeting RISEAustin dead nee...
1,Apple,1.0,Know about Awesome iPadiPhone that youll likel...
2,Apple,1.0,wait iPad also They should sale them down SXSW
3,Apple,0.0,hope this years festival isnt crashy this year...
4,Google,1.0,great stuff SXSW Marissa Mayer Google OReilly ...


In [228]:
# split each tweet on whitespace so that individual words become tokens
tokenized_tweet = df['cleaned_tweet'].apply(lambda x: x.split())
tokenized_tweet.head()

0    [have, iPhone, After, tweeting, RISEAustin, de...
1    [Know, about, Awesome, iPadiPhone, that, youll...
2    [wait, iPad, also, They, should, sale, them, d...
3    [hope, this, years, festival, isnt, crashy, th...
4    [great, stuff, SXSW, Marissa, Mayer, Google, O...
Name: cleaned_tweet, dtype: object

In [231]:
from sklearn.feature_extraction.text import CountVectorizer
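# bag-of-words: ignore terms in more than 90% of tweets or in fewer than 2, keep the 1,000 most frequent, drop English stop words
bow_vectorizer = CountVectorizer(max_df=0.90, min_df=2, max_features=1000, stop_words='english')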
bow = bow_vectorizer.fit_transform(df['cleaned_tweet'])

In [232]:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(bow, df['label'], random_state=42, test_size=0.25)
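The split above is purely random. If positive and negative tweets are unevenly represented, passing `stratify` keeps the class ratio the same in the training and test sets. A sketch on synthetic arrays with an 80/20 class ratio:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# synthetic data: 16 samples of class 1, 4 of class 0
X = np.arange(40).reshape(20, 2)
y = np.array([1] * 16 + [0] * 4)

x_tr, x_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# the 5-sample test set keeps the 80/20 ratio: 4 positives, 1 negative
print(len(y_te), int(y_te.sum()))  # 5 4
```

In the notebook this would amount to adding `stratify=df['label']` to the existing call.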

In [233]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, accuracy_score

In [234]:
model = LogisticRegression()
model.fit(x_train, y_train)

In [235]:
pred = model.predict(x_test)
# f1_score defaults to average='binary', i.e. the F1 of the positive class (label 1)
f1_score(y_test, pred)

0.9236209335219236

In [236]:
accuracy_score(y_test,pred)

0.8646616541353384
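Because `f1_score` scores only the positive class by default, it can look strong even when negative tweets are poorly recognized. A made-up example where every prediction is positive illustrates this; `average='macro'` averages the per-class F1 scores and exposes the problem, and `zero_division=0` makes any undefined ratio for the never-predicted class count as 0 without a warning.

```python
from sklearn.metrics import accuracy_score, f1_score

# made-up labels: 8 positives, 2 negatives; the "model" always predicts positive
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1] * 10

acc = accuracy_score(y_true, y_pred)
f1_pos = f1_score(y_true, y_pred)  # positive class only
f1_macro = f1_score(y_true, y_pred, average="macro", zero_division=0)

print(round(acc, 3), round(f1_pos, 3), round(f1_macro, 3))  # 0.8 0.889 0.444
```

`sklearn.metrics.classification_report(y_test, pred)` is another way to see precision, recall and F1 for both classes at once.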

In [240]:
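# class probabilities per test tweet; column 1 is P(label == 1), i.e. positive
pred_prob = model.predict_proba(x_test)
pred_prob[0][1] >= 0.3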

True
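The check above compares the positive-class probability of a single test tweet with 0.3. Applying a threshold to the whole probability column gives an alternative set of predictions: `predict()` corresponds to a 0.5 cut-off for a binary logistic regression, and lowering the cut-off can only add positive predictions, which typically raises recall on the positive class at some cost in precision. The tiny dataset here is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# synthetic 1-D data: small x -> class 0, large x -> class 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

# column 1 of predict_proba is P(class == 1) because clf.classes_ is [0, 1]
proba_pos = clf.predict_proba(X)[:, 1]

default_pred = clf.predict(X)                # equivalent to a 0.5 threshold
custom_pred = (proba_pos >= 0.3).astype(int)  # more permissive threshold

print(default_pred.tolist(), custom_pred.tolist())
```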

In [241]:
import joblib

# save the trained classifier; bow_vectorizer is not saved, so new text cannot be vectorized from this file alone
joblib.dump(model, 'Twittermodel.joblib')


['Twittermodel.joblib']
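`Twittermodel.joblib` holds only the classifier. Scoring a new tweet also needs the fitted `bow_vectorizer` (its vocabulary defines the columns the model was trained on), and the same cleaning steps applied before it. One option, swapped in here rather than taken from the notebook, is to wrap vectorizer and model in a scikit-learn `Pipeline` and save that single object. The training texts and the file name are made up.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# made-up, already-cleaned training texts and labels (1 = positive, 0 = negative)
texts = ["love my new ipad", "ipad battery dead", "great google party", "google maps crashed again"]
labels = [1, 0, 1, 0]

# vectorizer and classifier are fitted together and saved as one object
pipe = make_pipeline(CountVectorizer(), LogisticRegression())
pipe.fit(texts, labels)

path = os.path.join(tempfile.mkdtemp(), "twitter_pipeline.joblib")  # hypothetical file name
joblib.dump(pipe, path)

restored = joblib.load(path)
# raw text goes straight in; it should first get the same cleaning as the training data
print(restored.predict(["love the ipad"]))
```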