### Problem Statement

**The goal of this project is to predict the sentiment expressed in McDonald's store reviews and to gain insight into customer perceptions. The dataset, obtained from Kaggle, contains 33,396 reviews across 40 store addresses. I first label each review with a sentiment category, then split the data into training and test sets so the model is evaluated on reviews it has not seen. The review text is converted into numerical features through vectorization, and a Support Vector Classifier (SVC) is trained on the labeled training data to predict sentiment from those features. For usability, I also provide a function that takes a user-supplied review and returns the trained model's sentiment prediction. Businesses such as McDonald's could use this kind of tool to track customer sentiment and make data-driven decisions about their offerings.**

### Sentiment Analysis:
**Based on sentiment scores, I classified the text into different sentiment categories. For example, if the sentiment score was positive, I flagged the text as positive sentiment. Similarly, for negative sentiment scores, I labeled the text as negative sentiment. This process allowed me to categorize the dataset based on sentiment and gain insights into the overall sentiment distribution.**

#### Short description of each column
* reviewer_id: Unique identifier for each reviewer (anonymized)
* store_name: Name of the McDonald's store
* category: Category or type of the store
* store_address: Address of the store
* latitude: Latitude coordinate of the store's location
* longitude: Longitude coordinate of the store's location
* rating_count: Number of ratings/reviews for the store
* review_time: Timestamp of the review
* review: Textual content of the review
* rating: Rating provided by the reviewer
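
In the data preview, the `rating` column is stored as text such as `1 star` or `4 stars` rather than as a number. As a minimal sketch, assuming every value starts with a digit, the numeric rating could be extracted like this. The toy frame and the `rating_num` column name are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy frame mimicking the `rating` strings shown in the dataset preview.
toy = pd.DataFrame({"rating": ["1 star", "4 stars", "5 stars"]})

# Pull out the digits and cast them to integers.
# `rating_num` is a hypothetical column name used only for this sketch.
toy["rating_num"] = toy["rating"].str.extract(r"(\d+)", expand=False).astype(int)

print(toy)
```

A numeric rating would make it easier to compare star ratings against the sentiment labels assigned later.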

In [2]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...


True

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
import warnings
warnings.filterwarnings("ignore")

In [4]:
data = pd.read_csv("MCD.csv", encoding="latin-1")
data.head(10)

Unnamed: 0,reviewer_id,store_name,category,store_address,latitude,longitude,rating_count,review_time,review,rating
0,1,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,Why does it look like someone spit on my food?...,1 star
1,2,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,5 days ago,It'd McDonalds. It is what it is as far as the...,4 stars
2,3,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,5 days ago,Made a mobile order got to the speaker and che...,1 star
3,4,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,a month ago,My mc. Crispy chicken sandwich was ï¿½ï¿½ï¿½ï¿...,5 stars
4,5,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,2 months ago,"I repeat my order 3 times in the drive thru, a...",1 star
5,6,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 weeks ago,I work for door dash and they locked us all ou...,1 star
6,7,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,If I could give this location a zero on custo...,1 star
7,8,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,a year ago,Came in and ordered a Large coffee w/no ice. T...,1 star
8,9,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,Went thru drive thru. Ordered. Getting home no...,1 star
9,10,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,"I'm not really a huge fan of fast food, but I ...",4 stars


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33396 entries, 0 to 33395
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   reviewer_id    33396 non-null  int64  
 1   store_name     33396 non-null  object 
 2   category       33396 non-null  object 
 3   store_address  33396 non-null  object 
 4   latitude       32736 non-null  float64
 5   longitude      32736 non-null  float64
 6   rating_count   33396 non-null  object 
 7   review_time    33396 non-null  object 
 8   review         33396 non-null  object 
 9   rating         33396 non-null  object 
dtypes: float64(2), int64(1), object(7)
memory usage: 2.5+ MB


In [6]:
data.shape

(33396, 10)

In [7]:
data.nunique()

reviewer_id      33396
store_name           2
category             1
store_address       40
latitude            39
longitude           39
rating_count        51
review_time         39
review           22285
rating               5
dtype: int64

In [8]:
for col in data.columns:
    print(col,data[col].unique())

reviewer_id [    1     2     3 ... 33394 33395 33396]
store_name ["McDonald's" "ýýýMcDonald's"]
category ['Fast food restaurant']
store_address ['13749 US-183 Hwy, Austin, TX 78750, United States'
 '1698 US-209, Brodheadsville, PA 18322, United States'
 '72-69 Kissena Blvd, Queens, NY 11367, United States'
 '429 7th Ave, New York, NY 10001, United States'
 '724 Broadway, New York, NY 10003, United States'
 '160 Broadway, New York, NY 10038, United States'
 '555 13th St NW, Washington, DC 20004, United States'
 '10451 Santa Monica Blvd, Los Angeles, CA 90025, United States'
 '114 Delancey St, New York, NY 10002, United States'
 '5920 Balboa Ave, San Diego, CA 92111, United States'
 '262 Canal St, New York, NY 10013, United States'
 '490 8th Ave, New York, NY 10001, United States'
 '550 Lawrence Expy, Sunnyvale, CA 94086, United States'
 '11382 US-441, Orlando, FL 32837, United States'
 '210 5th S, Salt Lake City, UT 84106, United States'
 '1916 M St NW, Washington, DC 20036, United Stat

In [9]:
#### Sentiment Score Calculation:
#### I used NLTK's SentimentIntensityAnalyzer (VADER), a popular rule-based sentiment analysis tool, to calculate sentiment scores for each review. For each text it returns negative, neutral, and positive scores plus a compound score between -1 and 1 that summarizes the overall sentiment.

In [10]:
sia = SentimentIntensityAnalyzer()
# Score every review; each result is a dict with 'neg', 'neu', 'pos', and 'compound' keys
sentiments = [sia.polarity_scores(review) for review in data['review']]

#### Sentiment Classification:
Based on the compound score, I classified the reviews into sentiment categories. Reviews with a compound score of 0.05 or higher were labeled positive, reviews with a score of -0.05 or lower were labeled negative, and reviews with scores between those thresholds were considered neutral.

In [11]:
sentiment_labels = []
for sentiment in sentiments:
    compound_score = sentiment['compound']
    if compound_score >= 0.05:
        sentiment_labels.append('Positive')
    elif compound_score <= -0.05:
        sentiment_labels.append('Negative')
    else:
        sentiment_labels.append('Neutral')
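
As a minimal sketch of the same rule, the thresholding can be wrapped in a small helper so the cutoff is easy to change. The function name `label_from_compound`, its `threshold` parameter, and the sample scores are illustrative assumptions and not part of NLTK or the original notebook.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Hypothetical compound scores, for illustration only.
scores = [0.62, -0.4, 0.0, 0.05, -0.049]
labels = [label_from_compound(s) for s in scores]
print(labels)
```

Scores exactly on a threshold (0.05 or -0.05) fall into the positive or negative class, matching the `>=` and `<=` comparisons used in the loop.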

In [14]:
# Add the sentiment labels to the DataFrame
data['sentiment'] = sentiment_labels

In [15]:
data[['review', 'sentiment']]

Unnamed: 0,review,sentiment
0,Why does it look like someone spit on my food?...,Positive
1,It'd McDonalds. It is what it is as far as the...,Positive
2,Made a mobile order got to the speaker and che...,Negative
3,My mc. Crispy chicken sandwich was ï¿½ï¿½ï¿½ï¿...,Neutral
4,"I repeat my order 3 times in the drive thru, a...",Negative
...,...,...
33391,They treated me very badly.,Negative
33392,The service is very good,Positive
33393,To remove hunger is enough,Negative
33394,"It's good, but lately it has become very expen...",Positive


In [16]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

#### Dataset Splitting:
I divided the dataset into training and test sets to evaluate the model on unseen data. 80% of the reviews were used to train the model, and the remaining 20% were held out as a benchmark for assessing its accuracy.

In [17]:
X = data['review']
y = data['sentiment']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
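
Because the sentiment classes are not equally frequent, one optional variation is a stratified split, which keeps the class proportions the same in both sets. The sketch below uses toy data and the `stratify` argument of `train_test_split`; it is an illustration, not what the notebook ran.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy, imbalanced labels standing in for data['sentiment'].
texts = [f"review {i}" for i in range(20)]
labels = ["Positive"] * 10 + ["Negative"] * 6 + ["Neutral"] * 4

# stratify=labels keeps each class's share equal in the train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels
)

print(Counter(y_tr))
print(Counter(y_te))
```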

#### Vectorization:
I converted the review text into numerical features with TF-IDF vectorization, which weights each word by how often it appears in a review relative to how common it is across all reviews. The vectorizer is fitted on the training reviews only and then applied to the test reviews, so the test set does not influence the learned vocabulary.

In [18]:
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

#### Model Training (Support Vector Classifier):
I used the Support Vector Classifier (SVC) from scikit-learn with its default settings to train the sentiment model. SVC is a widely used algorithm for classification tasks. Trained on the TF-IDF features of the training reviews and their sentiment labels from the previous step, the model learns to predict a review's sentiment category.

In [19]:
model = SVC()
model.fit(X_train_tfidf, y_train)
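
One optional way to organize the vectorizer and the classifier is to bundle them into a single scikit-learn pipeline, so raw text can be passed directly to `fit` and `predict`. This is a sketch on invented toy data, not the notebook's actual model. It uses a linear kernel only to keep the tiny example predictable, whereas the notebook uses the default `SVC()`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy training data, invented for illustration.
train_texts = ["great food", "excellent service", "rude staff", "terrible cold food"]
train_labels = ["Positive", "Positive", "Negative", "Negative"]

# The pipeline applies TF-IDF and then the classifier in one object.
clf = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
clf.fit(train_texts, train_labels)

predictions = clf.predict(["excellent food", "rude and terrible"])
print(predictions)
```

With a pipeline, the same fitted vocabulary is always used at prediction time, which removes the risk of forgetting the `transform` step.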

In [20]:
y_pred = model.predict(X_test_tfidf)

In [21]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))

Accuracy: 0.8868263473053892
Classification Report:
              precision    recall  f1-score   support

    Negative       0.81      0.88      0.84      1937
     Neutral       0.88      0.82      0.85      1244
    Positive       0.93      0.91      0.92      3499

    accuracy                           0.89      6680
   macro avg       0.88      0.87      0.87      6680
weighted avg       0.89      0.89      0.89      6680
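
To put the accuracy in context, it can be compared with a majority-class baseline, that is, the accuracy of always predicting the most frequent label. The sketch below computes that baseline from the support counts in the classification report above.

```python
# Support counts copied from the classification report.
support = {"Negative": 1937, "Neutral": 1244, "Positive": 3499}
total = sum(support.values())

# Always predicting the most frequent class gets exactly that class's share right.
majority_class = max(support, key=support.get)
baseline_accuracy = support[majority_class] / total

print(majority_class, round(baseline_accuracy, 3))
```

The baseline comes out at roughly 0.52, so the reported accuracy of about 0.89 is well above what always guessing "Positive" would achieve on this test set.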



#### Sentiment Prediction Function:
To enhance usability, I created a function that takes a review as input and predicts its sentiment. This function utilizes the trained SVC model to analyze the input review's features and classify it as positive, negative, or neutral. The function provides the sentiment prediction as the output.

In [22]:
def predict_sentiment(review):
    review_tfidf = vectorizer.transform([review])
    sentiment = model.predict(review_tfidf)
    return sentiment[0]

#### Sample Testing:
As a quick sanity check, I ran the prediction function on a few hand-written reviews.

In [23]:
new_review = "This restaurant has excellent service and delicious food."
predicted_sentiment = predict_sentiment(new_review)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Positive


In [24]:
new_review2 = "This restaurant staffs are rude."
predicted_sentiment = predict_sentiment(new_review2)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Negative


In [25]:
new_review3 = "Food are delicious"
predicted_sentiment = predict_sentiment(new_review3)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Positive


Thank you!!!