# Business Understanding

As the holiday season approaches, large retailers are beginning to decide which products to showcase in
their stores. Target wants to better understand the sentiment toward the products it sells in
its technology section. With that understanding, it can decide which products to stock in higher quantities.

# Data Understanding

## Import Packages

In [227]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import re

In [228]:
data = pd.read_csv('judge-1377884607_tweet_product_company.csv', encoding = 'unicode_escape')

In [229]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9093 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column                                              Non-Null Count  Dtype 
---  ------                                              --------------  ----- 
 0   tweet_text                                          9092 non-null   object
 1   emotion_in_tweet_is_directed_at                     3291 non-null   object
 2   is_there_an_emotion_directed_at_a_brand_or_product  9093 non-null   object
dtypes: object(3)
memory usage: 213.2+ KB


In [230]:
data.head()

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative emotion
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive emotion
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion


# Check Null Values

In [231]:
data.isna().sum()

tweet_text                                               1
emotion_in_tweet_is_directed_at                       5802
is_there_an_emotion_directed_at_a_brand_or_product       0
dtype: int64

In [232]:
data.emotion_in_tweet_is_directed_at = data.emotion_in_tweet_is_directed_at.fillna(value = "NA")

In [233]:
data.tweet_text = data.tweet_text.fillna(value = "NA")

In [234]:
data.isna().sum()

tweet_text                                            0
emotion_in_tweet_is_directed_at                       0
is_there_an_emotion_directed_at_a_brand_or_product    0
dtype: int64
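
Filling the one missing tweet with the literal string "NA" keeps a row that carries no usable text. As an alternative, the sketch below drops such rows with `dropna(subset=...)`. The `toy` frame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only).
toy = pd.DataFrame({
    "tweet_text": ["great ipad app", None, "battery died again"],
    "label": [
        "Positive emotion",
        "No emotion toward brand or product",
        "Negative emotion",
    ],
})

# Drop rows with no tweet text instead of filling them with a placeholder string.
cleaned = toy.dropna(subset=["tweet_text"]).reset_index(drop=True)

print(len(toy), len(cleaned))
```

Whether to drop or fill is a judgment call; with a single affected row out of 9093, either choice should have little effect on a model.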

In [235]:
data.drop(columns=['emotion_in_tweet_is_directed_at'], inplace = True)

# Clean Columns

## is_there_an_emotion_directed_at_a_brand_or_product

In [236]:
data['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()

No emotion toward brand or product    5389
Positive emotion                      2978
Negative emotion                       570
I can't tell                           156
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64

In [237]:
# Encode sentiment: 0 = negative, 1 = neutral or unclear, 2 = positive
data['is_there_an_emotion_directed_at_a_brand_or_product'] = data.is_there_an_emotion_directed_at_a_brand_or_product.map({
    "Negative emotion": 0,
    "I can't tell": 1,
    "No emotion toward brand or product": 1,
    "Positive emotion": 2,
})

In [238]:
data['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()

1    5545
2    2978
0     570
Name: is_there_an_emotion_directed_at_a_brand_or_product, dtype: int64
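
`Series.map` with a dict returns `NaN` for any value that is not a key, so a misspelled or unexpected label would silently become missing. The counts above add up to 9093, which means every label was mapped here. A minimal sketch of an explicit check follows; the `labels` series and the "Mixed emotion" value are hypothetical and used only to trigger the failure case.

```python
import pandas as pd

label_map = {
    "Negative emotion": 0,
    "I can't tell": 1,
    "No emotion toward brand or product": 1,
    "Positive emotion": 2,
}

# Toy labels; "Mixed emotion" is a made-up value that is not in the mapping.
labels = pd.Series(["Positive emotion", "I can't tell", "Mixed emotion"])

mapped = labels.map(label_map)

# Collect the original labels that the mapping turned into NaN.
unmapped = labels[mapped.isna()].unique().tolist()
print(unmapped)
```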

# Train Test Split

In [239]:
from sklearn.model_selection import train_test_split

X = data.drop("is_there_an_emotion_directed_at_a_brand_or_product", axis = 1)
y = data["is_there_an_emotion_directed_at_a_brand_or_product"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
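
The classes are imbalanced (570 negative tweets against 5545 neutral), so a purely random split can shift the class proportions between train and test. One option, shown here as a sketch on made-up data rather than as the split used in this notebook, is the `stratify` argument of `train_test_split`. The names `toy`, `X_toy` and `y_toy` are illustrative.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy imbalanced data with the three encoded classes (counts are illustrative).
toy = pd.DataFrame({
    "tweet_text": [f"tweet {i}" for i in range(40)],
    "label": [0] * 4 + [1] * 24 + [2] * 12,
})

X_toy = toy[["tweet_text"]]
y_toy = toy["label"]

# stratify=y_toy keeps each class's share the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, random_state=42, stratify=y_toy
)

print(y_tr.value_counts(normalize=True).sort_index())
print(y_te.value_counts(normalize=True).sort_index())
```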

# Standardizing

## Lowering Case 

In [240]:
def to_lower(word):
    """Return the input string converted to lowercase."""
    return word.lower()

In [242]:
X_train["tweet_text"] = X_train["tweet_text"].apply(to_lower)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_train["tweet_text"] = X_train["tweet_text"].apply(to_lower)
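
The message above is pandas' `SettingWithCopyWarning`. It is raised because `X_train` was derived from another DataFrame, so pandas cannot tell whether the assignment is meant to reach the original data. A common way to avoid it is to take an explicit `.copy()` before assigning. The sketch below uses a made-up `toy` frame and the built-in `.str.lower()` accessor in place of `to_lower`.

```python
import pandas as pd

toy = pd.DataFrame({
    "tweet_text": ["Hello @Mention", "WIN an iPad"],
    "label": [1, 2],
})

# Take an explicit copy so later assignments target an independent DataFrame.
X_part = toy[["tweet_text"]].iloc[:1].copy()

# Vectorized lowercasing; equivalent to applying str.lower to each value.
X_part["tweet_text"] = X_part["tweet_text"].str.lower()

print(X_part["tweet_text"].tolist())
print(toy["tweet_text"].tolist())  # the source frame is left unchanged
```

In this notebook the equivalent would be something like `X_train = X_train.copy()` right after the split, before any cleaning step.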


In [171]:
X_train.iloc[100]["tweet_text"]

"don't miss your chance to win rt @mention going to #sxsw? come by the #emc consulting booth for your chance to win an ipad 2! @mention"

## Remove Mentions

In [172]:
def remove_mentions(word):
    """Remove @mentions (an '@' followed by non-whitespace characters)."""
    return re.sub(r"@\S+", "", word)

In [173]:
X_train["tweet_text"] = X_train["tweet_text"].apply(remove_mentions)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  X_train["tweet_text"] = X_train["tweet_text"].apply(remove_mentions)
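
Removing a mention leaves the spaces around it behind, so cleaned tweets can contain doubled or trailing spaces. The sketch below folds lowercasing and mention removal into one helper and collapses the leftover whitespace. `clean_tweet` is a hypothetical name, not a function defined elsewhere in this notebook, and the sample string is illustrative.

```python
import re


def clean_tweet(text):
    """Lowercase a tweet, strip @mentions, and collapse leftover whitespace."""
    text = text.lower()
    text = re.sub(r"@\S+", "", text)
    # Replace runs of whitespace with a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


sample = "Don't miss your chance to win RT @mention going to #sxsw? Win an iPad 2! @mention"
print(clean_tweet(sample))
```

A single helper like this can be applied to `X_test` as well, so that both sets go through identical preprocessing.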


In [174]:
X_train.iloc[100]["tweet_text"]

"don't miss your chance to win rt  going to #sxsw? come by the #emc consulting booth for your chance to win an ipad 2! "