## 1. Business Understanding
### 1.1 Business Overview

In today’s highly competitive and fast-moving technology market, consumer perception plays a critical role in the success of products and brands. Companies such as Apple and Google continuously release new products and services, and public opinion about them spreads quickly on social media platforms like Twitter. Users share feedback, complaints, praise, and comparisons in real time, producing a large volume of unstructured text that reflects customer sentiment.

Manually reviewing thousands of Tweets to gauge public opinion is slow and does not scale. Organizations therefore increasingly rely on data-driven approaches to extract insight from social media data. Natural Language Processing (NLP) techniques make it possible to analyze and classify text automatically, helping businesses understand how consumers feel about their products and services.

The goal of this project is to build a sentiment analysis model that classifies Tweets about Apple and Google products as positive, negative, or neutral. Using over 9,000 human-labeled Tweets provided by CrowdFlower, the project serves as a proof of concept showing how NLP models can capture and quantify public sentiment expressed on social media.

The primary stakeholders for this model include product managers, marketing teams, brand analysts, and business strategists within technology companies. Such a model can help organizations monitor brand perception, evaluate customer reactions to product launches, identify common sources of dissatisfaction, and track sentiment trends over time. Additionally, researchers and data scientists can use this approach to explore consumer behavior and improve decision-making processes through real-time sentiment monitoring.

This project falls within the domains of business analytics, marketing intelligence, and data science, with a focus on applying NLP techniques to real-world social media data. The motivation behind this work is to demonstrate how sentiment analysis can support strategic business decisions by transforming unstructured text data into actionable insights.

### 1.2 Problem Statement

Technology companies and brand analysts often rely on manual review, surveys, and expert judgment to assess public opinion about their products and services. These approaches provide valuable insight, but they are time-consuming, resource-intensive, and difficult to scale when consumer feedback is generated continuously and at high volume, as it is on social media. As a result, important shifts in customer sentiment, whether positive or negative, may be overlooked or identified too late to inform business decisions.

Consumers frequently express their opinions about Apple and Google products on Twitter using informal language, abbreviations, emojis, and sarcasm, making it challenging to interpret sentiment accurately through traditional analytical methods. The unstructured nature of social media text further complicates consistent and objective sentiment assessment, especially when thousands of Tweets must be analyzed simultaneously.

This project proposes the development of an NLP-based sentiment classification model that analyzes Tweet content and categorizes sentiment as positive, negative, or neutral. The model is designed to complement, rather than replace, human analysis by providing an automated, scalable tool for understanding public perception. Business stakeholders such as marketing teams, product managers, and brand analysts can use this model to support decision-making, monitor brand reputation, and identify emerging issues or opportunities in near real time.

The model is intended as a decision-support tool and should be used alongside domain expertise and contextual knowledge. While it enables efficient large-scale sentiment analysis, human interpretation remains essential to account for nuances such as sarcasm and evolving language trends. By combining automated NLP techniques with expert oversight, this project aims to improve the efficiency, consistency, and accessibility of sentiment analysis in a business context.



## 2. Data Understanding

This project uses a Twitter sentiment dataset sourced from CrowdFlower and made available through data.world. The dataset contains over 9,000 Tweets related to Apple and Google products, with each Tweet manually labeled by human annotators as positive, negative, or neutral. These labeled Tweets form the basis for building and evaluating a sentiment classification model using Natural Language Processing (NLP) techniques.

## 3. Importing Required Libraries

In [7]:

# Data loading and manipulation
import pandas as pd
import numpy as np

# Text preprocessing and NLP
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

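# Download required NLTK resources
# ('punkt_tab' is the tokenizer data that word_tokenize looks for in recent NLTK releases)
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('wordnet', quiet=True)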

# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# Machine learning and preprocessing
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from imblearn.pipeline import Pipeline

# Classification algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Model evaluation metrics
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix
)

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Visualization settings
sns.set(style="whitegrid")
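
The imports above point toward a TF-IDF plus linear classifier workflow. As a rough illustration of how those pieces could fit together, here is a minimal sketch on invented toy data. It is an assumption about the eventual modeling step, not the project's actual model, and it uses scikit-learn's own `Pipeline` rather than the `imblearn.pipeline.Pipeline` imported above, since no resampling step is involved here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Invented toy tweets and labels, for illustration only (not from the dataset)
texts = [
    "love my new iphone",
    "this ipad app is awesome",
    "great maps update from google",
    "battery died again, terrible",
    "the app keeps crashing, awful",
    "worst update ever",
]
labels = ["positive", "positive", "positive", "negative", "negative", "negative"]

# Chain the vectorizer and classifier so both are fit on the same training text
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)

# Predict labels for two unseen, made-up tweets
predictions = model.predict(["awesome iphone", "terrible crashing app"])
print(predictions)
```

Wrapping both steps in one pipeline means the vocabulary is learned only from the training text passed to `fit`, which helps avoid leaking test data into the features once a real train/test split is used.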

In [19]:
import os
os.getcwd()


'c:\\Users\\ADMIN\\Documents\\data-science\\phase.4\\twitter_brand_sentiment_nlp\\Notebooks'

In [36]:
# Read the dataset
data = pd.read_csv(
    "../data/judge-1377884607_tweet_product_company.csv",
    encoding="latin1",
)

data.head()


| | tweet_text | emotion_in_tweet_is_directed_at | is_there_an_emotion_directed_at_a_brand_or_product |
|---|---|---|---|
| 0 | .@wesley83 I have a 3G iPhone. After 3 hrs twe... | iPhone | Negative emotion |
| 1 | @jessedee Know about @fludapp ? Awesome iPad/i... | iPad or iPhone App | Positive emotion |
| 2 | @swonderlin Can not wait for #iPad 2 also. The... | iPad | Positive emotion |
| 3 | @sxsw I hope this year's festival isn't as cra... | iPad or iPhone App | Negative emotion |
| 4 | @sxtxstate great stuff on Fri #SXSW: Marissa M... | Google | Positive emotion |
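
The sample rows show the kind of noise typical of Tweets: @mentions, #hashtags, and punctuation. A cleaning step along the following lines could normalize the text before vectorization. This is a minimal sketch using only the standard library; the helper name `clean_tweet` and the specific rules are illustrative assumptions, not the project's actual preprocessing.

```python
import re

# Hypothetical patterns: these rules are assumptions chosen for illustration
MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, and punctuation."""
    if not isinstance(text, str):
        return ""  # tolerate missing values such as NaN
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


# Made-up example tweet, not taken from the dataset
sample = "@someone Loving the new #iPad!! Details: http://example.com"
print(clean_tweet(sample))
```

Keeping the hashtag word while dropping the `#` symbol is a deliberate choice here, since tags such as product names often carry the topic of the Tweet.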


In [37]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9093 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column                                              Non-Null Count  Dtype 
---  ------                                              --------------  ----- 
 0   tweet_text                                          9092 non-null   object
 1   emotion_in_tweet_is_directed_at                     3291 non-null   object
 2   is_there_an_emotion_directed_at_a_brand_or_product  9093 non-null   object
dtypes: object(3)
memory usage: 213.2+ KB
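
The `info()` output shows that `tweet_text` has one missing value (9092 non-null out of 9093) and that `emotion_in_tweet_is_directed_at` is populated for only 3291 rows. A natural next step would be to quantify the missing values and check the label balance. The sketch below does this on a small stand-in frame with the same column names, so that it runs without the CSV; the rows are invented, and the variable names are assumptions.

```python
import pandas as pd

# Tiny stand-in frame with the dataset's column names; rows are invented
toy = pd.DataFrame({
    "tweet_text": ["Great phone", None, "Battery died again", "Nice keynote"],
    "emotion_in_tweet_is_directed_at": ["iPhone", None, "iPhone", None],
    "is_there_an_emotion_directed_at_a_brand_or_product": [
        "Positive emotion",
        "Negative emotion",
        "Negative emotion",
        "Positive emotion",
    ],
})

text_col = "tweet_text"
label_col = "is_there_an_emotion_directed_at_a_brand_or_product"

# Count missing values per column
missing = toy.isna().sum()
print(missing)

# A row without text cannot be classified, so drop it
cleaned = toy.dropna(subset=[text_col]).reset_index(drop=True)

# Share of each sentiment label among the remaining rows
label_share = cleaned[label_col].value_counts(normalize=True)
print(label_share)
```

On the real data, the same calls would be applied to `data` in place of `toy`.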
