# Twitter Sentiment Analysis on Apple and Google Products

## 1. Business Understanding

### 1.1 Introduction

For technology giants such as Apple and Google, public sentiment plays a pivotal role in shaping strategy and brand perception. This analysis examines the sentiment expressed on Twitter about these two companies. Both have a significant global presence, and opinion about them shifts with events such as product launches and new developments. Last year, Apple's revenue exceeded 300 billion USD, while Google's parent company, Alphabet, reported revenues of over 181 billion USD. Monitoring public sentiment can therefore give both companies valuable input for marketing strategy and product development decisions.

### 1.2 Problem Statement

The challenge is to turn the large volume of sentiment data on Twitter into actionable insights for Apple and Google. This analysis aims to uncover patterns and trends in how sentiment toward these companies fluctuates. Detecting spikes in sentiment and identifying their underlying causes enables more informed decision-making, whether that means addressing product issues or capitalizing on positive public perception.

### 1.3 Objectives

#### 1.3.1 Main Objective

The primary goal of this project is to analyze Twitter sentiment data related to Apple and Google comprehensively. By doing so, we intend to provide stakeholders within these organizations with valuable insights into the ebb and flow of public sentiment. Understanding when and why sentiment shifts occur can guide product development strategies, marketing campaigns, and brand management.

#### 1.3.2 Specific Objectives

- To explore and preprocess the dataset, including handling missing values and transforming features.
- To perform exploratory data analysis to gain insights into the distribution of, and relationships between, the features and the target variable.
- To build binary and multiclass classification models and evaluate their performance using appropriate metrics.
- To interpret the results of the models.
- To provide recommendations to stakeholders based on the insights gained from the modelling process.

### 1.4 Experimental Design

- Data Collection

- Data Preprocessing

- Exploratory Data Analysis

- Modelling

- Model evaluation

- Conclusion

- Recommendations

### 1.5 Metric of success

The success of our project hinges on its ability to extract meaningful insights from Twitter sentiment data. Our evaluation criteria include how effectively the analysis identifies sentiment spikes, trends in sentiment over time, and the correlation between sentiment and significant events such as product launches or controversies. The project's impact on data-driven decision-making within Apple and Google will also be a key measure of success.

## 2. Data Understanding

The dataset originates from CrowdFlower and was accessed via data.world. It suits our project's objectives well: it contains over 9,000 tweets that have been manually rated for sentiment (positive, negative, or neither), which we use to train and test our sentiment analysis models. Since Twitter is a prominent platform for users to express opinions publicly, this data is a reasonable reflection of real-world sentiment.

That volume provides ample data for model training and validation. The features used in our analysis were selected for their relevance to the project's objectives. The tweet text (`tweet_text`) and the emotion expressed toward a brand or product (`is_there_an_emotion_directed_at_a_brand_or_product`) are integral to understanding sentiment and the factors influencing it.

While the dataset is valuable, it does have limitations that could impact our analysis. For instance, tweet sentiment is not always straightforward to discern, as it may depend on context, sarcasm, or language nuances. Additionally, the dataset may not be fully representative of all sentiments expressed on Twitter.

## 3. Data Collection

In [1]:
# importing libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd 
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import regexp_tokenize, word_tokenize, RegexpTokenizer
from nltk.stem import PorterStemmer
from nltk.probability import FreqDist
from nltk.stem import WordNetLemmatizer
import os
import re
import sys
import string
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, Flatten
import xgboost as xgb
from wordcloud import WordCloud
import warnings
warnings.filterwarnings("ignore")

In [2]:
# downloading the NLTK resources used for tokenization, stopword removal and lemmatization
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [3]:
# loading the dataset and displaying first five rows
data = pd.read_csv('judge-1377884607_tweet_product_company.csv', encoding='ISO-8859-1')
data.head()

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative emotion
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive emotion
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion


In [4]:
# checking the rows and columns
data.shape

(9093, 3)

In [6]:
# checking descriptive statistics
# describe is a method and must be called; without the parentheses the cell only echoes the bound method
data.describe()

In [7]:
# checking the data types
data.dtypes

tweet_text                                            object
emotion_in_tweet_is_directed_at                       object
is_there_an_emotion_directed_at_a_brand_or_product    object
dtype: object

In [9]:
# Calculating sentiment value counts
sentiment_counts = data['is_there_an_emotion_directed_at_a_brand_or_product'].value_counts()
print(sentiment_counts)

is_there_an_emotion_directed_at_a_brand_or_product
No emotion toward brand or product    5389
Positive emotion                      2978
Negative emotion                       570
I can't tell                           156
Name: count, dtype: int64
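
The counts above point to a strong class imbalance, which matters later when choosing evaluation metrics. As a minimal sketch, the shares can be worked out directly from the printed counts. The dictionary below is copied by hand from the output above, and `class_share` and `imbalance_ratio` are illustrative names that do not appear in the original notebook.

```python
# Counts copied by hand from the value_counts() output above
sentiment_counts = {
    "No emotion toward brand or product": 5389,
    "Positive emotion": 2978,
    "Negative emotion": 570,
    "I can't tell": 156,
}

# Total number of labelled tweets
total = sum(sentiment_counts.values())

# Share of each class as a percentage of all tweets
class_share = {
    label: round(100 * count / total, 2)
    for label, count in sentiment_counts.items()
}

for label, pct in class_share.items():
    print(f"{label:<36} {pct:5.2f}%")

# Ratio between the largest and the smallest class (illustrative summary figure)
imbalance_ratio = max(sentiment_counts.values()) / min(sentiment_counts.values())
print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
```

Roughly 59% of the tweets carry no emotion toward a brand or product, while negative tweets make up only about 6%, so accuracy alone may flatter a model that mostly predicts the majority class.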


## 4. Data Preprocessing

This process involves:

1. Dropping unwanted columns
2. Handling missing values.
3. Renaming columns and sentiment
4. Cleaning text data
5. Text Vectorization
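
Steps 2 to 4 can be sketched on a few toy rows using only the standard library. This is an illustration under stated assumptions, not the notebook's actual pipeline: the short label names in `LABEL_MAP`, the decision to drop "I can't tell" rows, the `clean_tweet` helper, and the toy rows (including their labels) are all hypothetical. Vectorization (step 5) is left out here.

```python
import re
import string

# Hypothetical short names for the original sentiment labels (assumption).
# "I can't tell" is deliberately absent, so those rows are dropped below.
LABEL_MAP = {
    "Positive emotion": "positive",
    "Negative emotion": "negative",
    "No emotion toward brand or product": "neutral",
}

def clean_tweet(text):
    """Lowercase a tweet and strip mentions, links and punctuation."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)                    # remove @mentions
    text = re.sub(r"https?://\S+|\{link\}", " ", text)   # remove URLs and the {link} placeholder
    text = text.replace("#", " ")                        # keep the hashtag word, drop the '#'
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()             # collapse repeated whitespace

# Toy rows; the labels are invented for illustration only
rows = [
    ("Ipad everywhere. #SXSW {link}", "Positive emotion"),
    ("@mention Loving the new #iPad 2! http://example.com", "Positive emotion"),
    (None, "No emotion toward brand or product"),         # missing tweet text
    ("Not sure what to think about this app", "I can't tell"),
]

cleaned = []
for text, label in rows:
    # Skip rows with missing text or a label we chose not to model
    if text is None or label not in LABEL_MAP:
        continue
    cleaned.append((clean_tweet(text), LABEL_MAP[label]))

for text, label in cleaned:
    print(label, "|", text)
```

Removing mentions and links before stripping punctuation matters here: once the `@` or `://` characters are gone, the regular expressions can no longer recognise those tokens.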

### 4.1 Dropping unwanted columns

In [10]:
# dropping the emotion_in_tweet_is_directed_at column since we won't be using it in modelling
# columns must be passed as a keyword argument to DataFrame.drop
data = data.drop(columns=['emotion_in_tweet_is_directed_at'])