<a href="https://www.kaggle.com/code/nick08makwana/store-reviews-sentiment-analysis?scriptVersionId=140441041" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

![Sentiment Analysis](https://i.ytimg.com/vi/99ndyGYodSk/maxresdefault.jpg)

*Embark on your journey into the world of data analysis with this beginner-friendly Kaggle Notebook, where we delve into sentiment analysis of McDonald's store reviews. Designed for newcomers to the field, this notebook provides step-by-step guidance on how to process and analyze customer feedback and understand the sentiment behind each review.*

*Learn how to pre-process text data, tokenize words, and employ powerful natural language processing techniques to classify reviews as positive, negative, or neutral. Follow along as we guide you through essential Python libraries like NLTK and Scikit-learn, demonstrating how to build and evaluate sentiment analysis models, even if you're new to coding.*

*By the end of this notebook, you'll have a foundational understanding of sentiment analysis and practical skills to apply to a real-world dataset. Uncover the sentiments driving customer opinions about McD Store, and gain confidence in your ability to extract meaningful insights from text data. Start your data science journey today with this hands-on exploration of sentiment analysis!*

**Sentiment Analysis**

*In the sentiment analysis section, I started by importing the dataset, inspecting its columns and first rows, and drawing random samples to get a clearer picture of the data. I then used NLTK's SentimentIntensityAnalyzer, a widely used rule-based sentiment tool, to compute sentiment scores for each review.*

*Using these sentiment scores, I assigned each review to a sentiment category: reviews with a clearly positive compound score were labelled Positive, those with a clearly negative score Negative, and the rest Neutral.*

# **Importing Libraries**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings         #For Ignoring the Warnings
warnings.filterwarnings("ignore")



In [2]:
#Used for Sentiment Analysis (pandas is already imported above)
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

In [3]:
#Used for Model Building
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# **Importing the Dataset**

In [4]:
mcd = pd.read_csv("/kaggle/input/mcdonalds-store-reviews/McDonald_s_Reviews.csv", encoding="latin-1")

**Exploring the Dataset**

In [5]:
mcd.head(5)

Unnamed: 0,reviewer_id,store_name,category,store_address,latitude,longitude,rating_count,review_time,review,rating
0,1,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,Why does it look like someone spit on my food?...,1 star
1,2,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,5 days ago,It'd McDonalds. It is what it is as far as the...,4 stars
2,3,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,5 days ago,Made a mobile order got to the speaker and che...,1 star
3,4,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,a month ago,My mc. Crispy chicken sandwich was ï¿½ï¿½ï¿½ï¿...,5 stars
4,5,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,2 months ago,"I repeat my order 3 times in the drive thru, a...",1 star


In [6]:
mcd.columns

Index(['reviewer_id', 'store_name', 'category', 'store_address', 'latitude ',
       'longitude', 'rating_count', 'review_time', 'review', 'rating'],
      dtype='object')

In [7]:
mcd.sample(5)

Unnamed: 0,reviewer_id,store_name,category,store_address,latitude,longitude,rating_count,review_time,review,rating
20874,20875,McDonald's,Fast food restaurant,"2400 Alliance Gateway Fwy, Fort Worth, TX 7617...",32.958041,-97.307652,957,a year ago,Excellent,5 stars
4021,4022,McDonald's,Fast food restaurant,"429 7th Ave, New York, NY 10001, United States",40.750506,-73.990583,2052,3 years ago,Good,4 stars
20066,20067,McDonald's,Fast food restaurant,"621 Broadway, Newark, NJ 07104, United States",40.77191,-74.161475,1564,4 years ago,Terrible,1 star
26887,26888,McDonald's,Fast food restaurant,"10901 Riverside Dr, North Hollywood, CA 91602,...",34.152507,-118.367904,1794,2 years ago,We got mcdonald's then we drove to San Diego a...,5 stars
24063,24064,McDonald's,Fast food restaurant,"1415 E State Rd, Fern Park, FL 32730, United S...",28.65535,-81.342692,1617,2 years ago,Great customer service!!,4 stars


In [8]:
mcd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33396 entries, 0 to 33395
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   reviewer_id    33396 non-null  int64  
 1   store_name     33396 non-null  object 
 2   category       33396 non-null  object 
 3   store_address  33396 non-null  object 
 4   latitude       32736 non-null  float64
 5   longitude      32736 non-null  float64
 6   rating_count   33396 non-null  object 
 7   review_time    33396 non-null  object 
 8   review         33396 non-null  object 
 9   rating         33396 non-null  object 
dtypes: float64(2), int64(1), object(7)
memory usage: 2.5+ MB


In [9]:
import nltk     #Imported the Natural Language Toolkit (NLTK) library.
nltk.download('vader_lexicon')   #Downloaded the VADER lexicon for sentiment analysis.
from nltk.sentiment.vader import SentimentIntensityAnalyzer   #Imported the SentimentIntensityAnalyzer class from the NLTK library's VADER sentiment analysis module.

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /usr/share/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


**I used the widely used SentimentIntensityAnalyzer (VADER) to compute sentiment scores for every review. For each text it returns negative (`neg`), neutral (`neu`), and positive (`pos`) scores, plus a normalized `compound` score between -1 and 1 that summarizes the overall sentiment.**

In [10]:
sia = SentimentIntensityAnalyzer()

In [11]:
# Performing sentiment analysis on each review
sentiments = []
for review in mcd['review']:
    sentiment = sia.polarity_scores(review)
    sentiments.append(sentiment)

**Sentiment Classification**

*Using the compound score, I sorted reviews into sentiment groups: a score of 0.05 or higher is labelled Positive, a score of -0.05 or lower is labelled Negative, and anything in between is labelled Neutral.*

In [12]:
sentiment_labels = []

for sentiment in sentiments:
    compound_score = sentiment['compound']
    if compound_score >= 0.05:
        sentiment_labels.append('Positive')
    elif compound_score <= -0.05:
        sentiment_labels.append('Negative')
    else:
        sentiment_labels.append('Neutral')
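
The same thresholding rule can be wrapped in a small function, which makes the cut-off explicit and easy to change. This is a minimal sketch: the name `label_from_compound` and its `threshold` parameter are illustrative and not part of NLTK, and the default of 0.05 simply mirrors the values used in the cell above.

```python
def label_from_compound(compound_score, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound_score >= threshold:
        return "Positive"
    if compound_score <= -threshold:
        return "Negative"
    return "Neutral"


# Toy compound scores, chosen by hand for illustration only.
example_scores = [0.62, -0.48, 0.0, 0.05, -0.049]
example_labels = [label_from_compound(score) for score in example_scores]
print(example_labels)
```

With the `sentiments` list from the earlier cell, `[label_from_compound(s['compound']) for s in sentiments]` should give the same labels as the loop above.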

In [13]:
# Add the sentiment labels to the DataFrame
mcd['sentiment'] = sentiment_labels
mcd[['review', 'sentiment']]

Unnamed: 0,review,sentiment
0,Why does it look like someone spit on my food?...,Positive
1,It'd McDonalds. It is what it is as far as the...,Positive
2,Made a mobile order got to the speaker and che...,Negative
3,My mc. Crispy chicken sandwich was ï¿½ï¿½ï¿½ï¿...,Neutral
4,"I repeat my order 3 times in the drive thru, a...",Negative
...,...,...
33391,They treated me very badly.,Negative
33392,The service is very good,Positive
33393,To remove hunger is enough,Negative
33394,"It's good, but lately it has become very expen...",Positive
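
The first row above is labelled Positive even though the same review carries a 1-star rating in the earlier output, so it can be worth comparing the generated labels with the star ratings. The sketch below is one possible sanity check and not part of the original notebook: the helper `rating_to_label` and its cut-offs (1 to 2 stars as Negative, 3 as Neutral, 4 to 5 as Positive) are assumptions chosen for illustration.

```python
import re


def rating_to_label(rating_text):
    """Convert a rating string such as '4 stars' to a coarse sentiment label."""
    match = re.match(r"\s*(\d+)\s+stars?", rating_text)
    if match is None:
        return None  # unexpected format, leave unlabelled
    stars = int(match.group(1))
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"


# (rating, generated label) pairs copied from the first five rows of the
# outputs shown in this notebook.
rows = [
    ("1 star", "Positive"),
    ("4 stars", "Positive"),
    ("1 star", "Negative"),
    ("5 stars", "Neutral"),
    ("1 star", "Negative"),
]
agreement = [rating_to_label(rating) == label for rating, label in rows]
print(sum(agreement), "of", len(agreement), "labels agree with the star rating")
```

On the full DataFrame, `mcd['rating'].map(rating_to_label)` could be compared with `mcd['sentiment']` in the same way.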


# **Splitting the Dataset**

In [14]:
X = mcd['review']
y = mcd['sentiment']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
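
The classification report further down shows that the three classes are not equally frequent, so it may help to keep their proportions the same in both splits. The sketch below uses the `stratify` argument of `train_test_split` on a small invented label list; whether stratification changes the results on this dataset is not tested here.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Toy data standing in for mcd['review'] and mcd['sentiment'].
texts = [f"review {i}" for i in range(20)]
labels = ["Positive"] * 10 + ["Negative"] * 6 + ["Neutral"] * 4

# stratify=labels keeps the class proportions the same in both splits.
texts_train, texts_test, labels_train, labels_test = train_test_split(
    texts, labels, test_size=0.5, random_state=42, stratify=labels
)

print(Counter(labels_train))
print(Counter(labels_test))
```

In the notebook this would correspond to adding `stratify=y` to the existing call.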

**Vectorization**

In [15]:
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
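
To see what the vectorizer produces, here is a minimal sketch on a three-sentence toy corpus (the sentences are invented). The vocabulary and IDF weights are learned from the training texts only, so a word that first appears at prediction time is ignored.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = [
    "great service and great food",
    "terrible service",
    "food was cold",
]
new_texts = ["great burger"]  # "burger" never appears in train_texts

toy_vectorizer = TfidfVectorizer()
train_matrix = toy_vectorizer.fit_transform(train_texts)  # learn vocabulary and IDF
new_matrix = toy_vectorizer.transform(new_texts)          # reuse them, no refitting

print(sorted(toy_vectorizer.vocabulary_))      # learned vocabulary
print(train_matrix.shape, new_matrix.shape)    # (documents, vocabulary size)
print(new_matrix.nnz)                          # non-zero entries for the new text
```

This is why the notebook calls `fit_transform` on `X_train` but only `transform` on `X_test`.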

# **Training the Model Using Support Vector Classifier (SVC)**

*I used the Support Vector Classifier (SVC), a robust and widely used classification algorithm. Trained on the labelled reviews, the model learns to predict a review's sentiment from its TF-IDF features.*

In [16]:
model = SVC()
model.fit(X_train_tfidf, y_train)
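
One optional way to keep the vectorizer and the classifier together is scikit-learn's `Pipeline`, so the same preprocessing is always applied before prediction. The sketch below is an alternative arrangement and not the notebook's original code; the toy texts, labels, and the name `sentiment_pipeline` are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

toy_texts = [
    "excellent food and friendly staff",
    "great fries, excellent service",
    "rude staff and cold food",
    "terrible service, cold fries",
]
toy_labels = ["Positive", "Positive", "Negative", "Negative"]

# Each step is a (name, estimator) pair; fit() runs them in order.
sentiment_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svc", SVC()),
])
sentiment_pipeline.fit(toy_texts, toy_labels)

# Raw text goes in; vectorization happens inside the pipeline.
toy_prediction = sentiment_pipeline.predict(["excellent friendly service"])
print(toy_prediction[0])
```

With a pipeline, a single `fit(X_train, y_train)` and `predict(X_test)` would replace the separate vectorizer and model calls.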

In [17]:
y_pred = model.predict(X_test_tfidf)

In [18]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))

Accuracy: 0.88937125748503
Classification Report:
              precision    recall  f1-score   support

    Negative       0.82      0.87      0.85      1922
     Neutral       0.88      0.83      0.86      1245
    Positive       0.93      0.92      0.93      3513

    accuracy                           0.89      6680
   macro avg       0.88      0.87      0.88      6680
weighted avg       0.89      0.89      0.89      6680



# **Sentiment Prediction Function**

*For convenience, I wrote a function that takes a review as input, transforms it with the fitted TF-IDF vectorizer, and uses the trained SVC model to predict its sentiment. It returns one of the three labels: Positive, Negative, or Neutral.*

In [19]:
def predict_sentiment(review):
    review_tfidf = vectorizer.transform([review])
    sentiment = model.predict(review_tfidf)
    return sentiment[0]

# **Testing**

**Sample Testing 1**

In [20]:
new_review = "This restaurant has excellent service and delicious food."
predicted_sentiment = predict_sentiment(new_review)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Positive


**Sample Testing 2**

In [21]:
new_review2 = "This restaurant sucks."
predicted_sentiment = predict_sentiment(new_review2)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Negative


**Sample Testing 3**

In [22]:
new_review3 = "This is dull"
predicted_sentiment = predict_sentiment(new_review3)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Neutral


**Thank you for reading! Your votes and suggestions are greatly appreciated and help improve this notebook.**