# ***Sentiment Analysis of Tweets on Apple and Google Products***



## *Introduction*

This project aims to perform sentiment analysis on tweets related to Apple and Google products using a labeled dataset. By classifying tweets as positive, negative, or neutral, we seek to understand public perception of these tech giants and uncover trends in user opinion. The dataset offers a realistic glimpse into consumer sentiment, making it ideal for training and evaluating NLP classification models.

### *Problem Statement*

With millions of users sharing their opinions on social media, understanding customer sentiment toward major tech brands is vital. This project seeks to automatically classify tweets mentioning Apple and Google products into positive, negative, or neutral sentiments. By leveraging natural language processing, we aim to build a model that can extract meaningful insights from public opinion, helping stakeholders understand brand perception and inform strategic decisions.

### Dataset Overview: "tweet_product_company.csv"

This dataset captures real-world tweets mentioning Apple and Google products, offering a rich source of public sentiment expressed through social media. Each entry includes the tweet's text, the product or brand the emotion is directed at (for example iPhone, iPad, or Google), and a sentiment label, which this project treats as **positive**, **negative**, or **neutral**.

### Data Source

The dataset was curated from Twitter via public scraping or aggregated repositories, specifically targeting mentions of Apple and Google. While exact sourcing details may vary, the collection aligns with ethical standards for public tweet analysis and is commonly used in NLP and sentiment modeling projects. It reflects organic user opinions and consumer reactions across various product releases, updates, and experiences.

### Why This Dataset?

This data is well suited to sentiment analysis because:
- Tweets are short, noisy, and opinion-driven, which makes them a good test of robust NLP techniques.
- It supports both **binary** classification (positive vs. negative) and **multiclass** sentiment prediction.
- Comparing sentiment between Apple and Google enables business insights and brand analysis.
- It offers real-world variability (emoji usage, slang, abbreviations), which makes preprocessing more meaningful.
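
As a rough illustration of the binary and multiclass framings, the sketch below maps the raw label strings found in the dataset's sentiment column to short class names. Treating "No emotion toward brand or product" as neutral is an assumption of this sketch, as are the names `LABEL_MAP`, `demo`, and the placeholder row marked hypothetical; any label not in the map is simply left as `NaN`.

```python
import pandas as pd

LABEL_COL = "is_there_an_emotion_directed_at_a_brand_or_product"

# Assumed mapping from raw labels to short class names.
LABEL_MAP = {
    "Positive emotion": "positive",
    "Negative emotion": "negative",
    "No emotion toward brand or product": "neutral",
}

# Toy rows standing in for the real data; the last label is a hypothetical placeholder.
demo = pd.DataFrame({
    LABEL_COL: [
        "Positive emotion",
        "Negative emotion",
        "No emotion toward brand or product",
        "Positive emotion",
        "Other (hypothetical)",
    ]
})

# Labels missing from LABEL_MAP become NaN.
demo["sentiment"] = demo[LABEL_COL].map(LABEL_MAP)

# Multiclass framing: keep every row with a mapped label.
multiclass = demo.dropna(subset=["sentiment"])

# Binary framing: keep only positive and negative rows.
binary = multiclass[multiclass["sentiment"].isin(["positive", "negative"])]

print(len(multiclass), len(binary))
```

Keeping the mapping in one dictionary means the same cleaned column can feed either framing without relabeling the data twice.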

Through this dataset, we aim to:
- Understand how users perceive Apple vs. Google across time.
- Build a scalable model that can automatically detect sentiment in real time.
- Extract actionable insights that can guide marketing, product improvements, and customer engagement.


## ***1. DATA PREPARATION***

In this section we will:
- import all necessary libraries
- load the `tweet_product_company.csv` dataset
- perform an exploratory inspection

#### **a) Importing necessary libraries**

In [1]:
# Core Libraries
import pandas as pd
import numpy as np

# Text Preprocessing
import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Feature Extraction
from sklearn.feature_extraction.text import TfidfVectorizer

# Model Building
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

#### **b) Loading the dataset**

In [12]:
df = pd.read_csv('../data/tweet_product_company.csv', encoding='ISO-8859-1')
df.head()

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |
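
The `encoding='ISO-8859-1'` argument in the loading cell matters whenever a file cannot be read as UTF-8. The sketch below shows one possible fallback approach on in-memory bytes; `read_csv_with_fallback` and the demo bytes are hypothetical helpers for illustration, not part of the notebook.

```python
import io
import pandas as pd

def read_csv_with_fallback(raw_bytes, encodings=("utf-8", "ISO-8859-1")):
    """Decode CSV bytes with the first encoding that works; return (df, encoding)."""
    for enc in encodings:
        try:
            text = raw_bytes.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        return pd.read_csv(io.StringIO(text)), enc
    raise ValueError(f"none of {encodings} could decode the data")

# Demo bytes containing a character that is not valid UTF-8 when encoded as ISO-8859-1.
raw = "tweet_text,label\ncafé at #SXSW,Positive emotion\n".encode("ISO-8859-1")
demo_df, used_encoding = read_csv_with_fallback(raw)
print(used_encoding)
```

Because ISO-8859-1 assigns a character to every byte value, it never raises a decode error, so it belongs last in the list. A successful decode does not guarantee the text is correct, and garbled characters can still appear if the file was written in a different encoding.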


#### **c) Exploratory inspection of the dataset**

In [None]:
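# Note: without parentheses, `df.info` only echoes the bound method, which is why the
# output below is a preview of the DataFrame. Call `df.info()` for dtypes and non-null counts.
df.info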

<bound method DataFrame.info of                                              tweet_text  \
0     .@wesley83 I have a 3G iPhone. After 3 hrs twe...   
1     @jessedee Know about @fludapp ? Awesome iPad/i...   
2     @swonderlin Can not wait for #iPad 2 also. The...   
3     @sxsw I hope this year's festival isn't as cra...   
4     @sxtxstate great stuff on Fri #SXSW: Marissa M...   
...                                                 ...   
9088                      Ipad everywhere. #SXSW {link}   
9089  Wave, buzz... RT @mention We interrupt your re...   
9090  Google's Zeiger, a physician never reported po...   
9091  Some Verizon iPhone customers complained their...   
9092  Ï¡Ïàü_ÊÎÒ£Áââ_£â_ÛâRT @...   

     emotion_in_tweet_is_directed_at  \
0                             iPhone   
1                 iPad or iPhone App   
2                               iPad   
3                 iPad or iPhone App   
4                             Google   
...                

DataFrame Summary

This dataset contains **9,093 tweet entries** spread across **three columns**:

1. `tweet_text`: The full text of each tweet, which serves as our input for sentiment classification.
2. `emotion_in_tweet_is_directed_at`: Indicates which brand or product the emotion is directed at (e.g., iPhone, iPad, Google). Some entries may be missing (`NaN`), suggesting tweets without clear product targeting.
3. `is_there_an_emotion_directed_at_a_brand_or_product`: Contains sentiment labels like "Positive emotion," "Negative emotion," or "No emotion toward brand or product."
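
Since the dataset has no separate company column, a company can be derived from `emotion_in_tweet_is_directed_at` for Apple vs. Google comparisons. The sketch below is one possible keyword rule, not the notebook's method: `infer_company` is a hypothetical helper, and the keywords (in particular "apple" and "android") are assumptions, since only iPhone, iPad, "iPad or iPhone App", and Google appear in the preview above.

```python
import pandas as pd

TARGET_COL = "emotion_in_tweet_is_directed_at"

def infer_company(target):
    """Map a product/brand target to 'Apple', 'Google', or None (assumed keyword rule)."""
    if pd.isna(target):
        return None  # no product target recorded for this tweet
    text = str(target).lower()
    if any(key in text for key in ("iphone", "ipad", "apple")):
        return "Apple"
    if any(key in text for key in ("google", "android")):
        return "Google"
    return None

# Toy rows standing in for the real column.
demo = pd.DataFrame({TARGET_COL: ["iPhone", "iPad or iPhone App", "Google", None]})
demo["company"] = demo[TARGET_COL].apply(infer_company)
print(demo)
```

Rows with a missing target stay unassigned here; whether to infer a company from the tweet text in those cases is a separate modeling decision.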

In [11]:
df.describe(include='object')

| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| count | 9092 | 3291 | 9093 |
| unique | 9065 | 9 | 4 |
| top | RT @mention Marissa Mayer: Google Will Connect... | iPad | No emotion toward brand or product |
| freq | 5 | 946 | 5389 |


Data Summary

`df.describe(include='object')` gives a high-level overview of the dataset's non-numeric columns:

- **`tweet_text`**  
  - **Count**: 9,092 non-null entries (one of the 9,093 rows has no tweet text)  
  - **Unique**: 9,065 distinct tweets  
  - **Most Frequent Tweet**: `"RT @mention Marissa Mayer: Google Will Connect..."`  
  - **Frequency**: 5 occurrences  

- **`emotion_in_tweet_is_directed_at`**  
  - **Count**: 3,291 entries (many missing values)  
  - **Unique Targets**: 9 (e.g., iPad, iPhone, Google)  
  - **Most Common Target**: `"iPad"`  
  - **Frequency**: 946 times  

- **`is_there_an_emotion_directed_at_a_brand_or_product`**  
  - **Count**: 9,093 entries  
  - **Unique Sentiments**: 4  
  - **Most Common**: `"No emotion toward brand or product"`  
  - **Frequency**: 5,389 entries
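
The gaps between these counts can be measured directly. From the figures above, 9,093 − 3,291 = 5,802 rows have no product target, and 9,092 − 9,065 = 27 tweet rows repeat an earlier text. The sketch below shows how such numbers might be computed, using a toy DataFrame (`demo`, hypothetical) in place of the real data.

```python
import pandas as pd

# Toy rows standing in for the real data: one repeated tweet, one missing text.
demo = pd.DataFrame({
    "tweet_text": ["great iPad app", "great iPad app", None, "Google maps rocks"],
    "emotion_in_tweet_is_directed_at": ["iPad", "iPad", None, None],
})

# Number of missing values in each column.
missing_per_column = demo.isna().sum()

# Number of non-null tweets that repeat an earlier tweet's text.
duplicate_tweets = demo["tweet_text"].dropna().duplicated().sum()

print(missing_per_column)
print(duplicate_tweets)
```

On the real DataFrame the same two expressions can be applied to `df` to confirm the counts before deciding how to handle missing targets and repeated tweets.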



## ***2. DATA PREPROCESSING***