# Sentiment Analysis for Foody&Foodie

## Business Understanding

### Overview
Foody&Foodie is a family-run restaurant looking to expand from Nairobi, Kenya, to the international market, specifically San Francisco, CA. As veterans of the food business, the owners understand that they will need to adapt their concept of authentic Kenyan food to the palate of the Californian city, giving them a chance to hit the ground running when they open shop next quarter.

### Problem Statement
Foody&Foodie needs reliable information to guide its decision-making as it adjusts the business to match the new market's needs. We have been tasked with identifying and filtering information from that market to produce a clear picture of what diners there expect from a restaurant.

### Challenges
Within the food industry, there are several measurable parameters that determine the success of a restaurant, and we must find a neutral dataset in which all of these aspects can be evaluated fairly to produce an accurate result.

### Conclusion
We will need to build a model that analyzes customer sentiment in reviews of restaurants within the target area. We will pull the relevant data from a single popular review site, Yelp, as it is the most comprehensive compilation of reviews in the target market.


### Objectives
- Analyze Yelp Review Data: Examine data from Yelp reviews to understand which aspects diners consider critical when evaluating restaurants.
- Identify Recurring Opinions and Sentiments: Isolate and identify common opinions and sentiments expressed in the reviews.
- Interpret Insights: Interpret the data to provide a clear understanding of both negative and positive feedback.
- Understand Target Market Preferences: Help the client understand the preferences and dislikes of diners in the target market.
- Tailor Services to Positive Feedback: Focus on aligning the restaurant’s services with the positive feedback received.
- Avoid Disliked Traits: Identify and avoid traits that are commonly criticized by diners in the target market.

## Data Understanding
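Our dataset is a compilation of 10,000 written reviews from the Yelp website. Each record includes the review text, star rating, review ID, review date, business ID, and user ID, along with the vote counts (cool, useful, funny) the review received.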


In [2]:
# importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [3]:
# Load the dataset
df = pd.read_csv("yelp.csv")

In [4]:
df.head()

Unnamed: 0,business_id,date,review_id,stars,text,type,user_id,cool,useful,funny
0,9yKzy9PApeiPPOUJEtnvkg,2011-01-26,fWKvX83p0-ka4JS3dc6E5A,5,My wife took me here on my birthday for breakf...,review,rLtl8ZkDX5vH5nAx9C3q5Q,2,5,0
1,ZRJwVLyzEJq1VAihDhYiow,2011-07-27,IjZ33sJrzXqU-0X6U8NwyA,5,I have no idea why some people give bad review...,review,0a2KyEL0d3Yb1V6aivbIuQ,0,0,0
2,6oRAC4uyJCsJl1X0WZpVSA,2012-06-14,IESLBzqUCLdSzSqm0eCSxQ,4,love the gyro plate. Rice is so good and I als...,review,0hT2KtfLiobPvh6cDC8JQg,0,1,0
3,_1QQZuf4zZOyFCvXc0o6Vg,2010-05-27,G-WvGaISbqqaMHlNnByodA,5,"Rosie, Dakota, and I LOVE Chaparral Dog Park!!...",review,uZetl9T0NcROGOyFfughhg,1,2,0
4,6ozycU1RpktNG2-1BroVtw,2012-01-05,1uJFq2r5QfJG_6ExMRCaGw,5,General Manager Scott Petello is a good egg!!!...,review,vYmM4KTsC8ZfQBg-j5MWkw,0,0,0


In [5]:
df.describe()

Unnamed: 0,stars,cool,useful,funny
count,10000.0,10000.0,10000.0,10000.0
mean,3.7775,0.8768,1.4093,0.7013
std,1.214636,2.067861,2.336647,1.907942
min,1.0,0.0,0.0,0.0
25%,3.0,0.0,0.0,0.0
50%,4.0,0.0,1.0,0.0
75%,5.0,1.0,2.0,1.0
max,5.0,77.0,76.0,57.0
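
The summary statistics show a mean rating of about 3.78 and a median of 4, which suggests the ratings lean positive. One way to check this directly is to look at the share of reviews at each star level. The sketch below uses a small hypothetical `stars` series as a stand-in for `df['stars']`; the values are illustrative and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical stand-in for df["stars"]; the real column holds 10,000 ratings.
stars = pd.Series([5, 5, 4, 5, 5, 3, 1, 4, 2, 5], name="stars")

# Share of reviews at each star level, ordered from 1 to 5 stars
distribution = stars.value_counts(normalize=True).sort_index()
print(distribution)
```

On the real data, replacing `stars` with `df['stars']` would show whether some ratings are rare enough to need special handling when modeling.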


### Splitting the Data

In [8]:
from sklearn.model_selection import train_test_split

# Define your features and target variable
X = df.drop('stars', axis=1)  # Features (all columns except 'stars')
y = df['stars']               # Target variable

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Check the shape of the resulting sets
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)

(8000, 9) (2000, 9)
(8000,) (2000,)
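
The split above keeps all nine non-target columns, including the ID columns. For a text-based sentiment model, one possible variation (an assumption, not the approach used in this notebook) is to split only the review text and to pass `stratify` so that both sets keep the same mix of star ratings. The `toy` DataFrame below is hypothetical and stands in for `df`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the review DataFrame
toy = pd.DataFrame({
    "text": [f"review {i}" for i in range(20)],
    "stars": [1, 2, 3, 4, 5] * 4,
})

X_text = toy["text"]    # feature: review text only
y_stars = toy["stars"]  # target: star rating

# stratify=y_stars keeps the rating proportions equal in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_text, y_stars, test_size=0.25, random_state=42, stratify=y_stars
)
print(X_tr.shape, X_te.shape)
print(sorted(y_te.tolist()))
```

Stratifying matters most when some ratings are much rarer than others, since a purely random split can leave a rare class under-represented in the test set.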


### Processing the Data

In [10]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the stopword list and the punkt tokenizer data if not already present
nltk.download('stopwords')
nltk.download('punkt')

# Load the stopwords
stop_words = set(stopwords.words('english'))

def remove_stopwords(text):
    # Tokenize the text
    words = word_tokenize(text)
    # Remove stopwords
    filtered_words = [word for word in words if word.lower() not in stop_words]
    # Join the words back into a single string
    return ' '.join(filtered_words)


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
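
The cell above defines `remove_stopwords` but does not yet apply it. A minimal sketch of how such a function could be applied to a text column is shown below. To keep the sketch runnable without NLTK downloads, it uses stand-ins that are not part of the notebook: a tiny hand-written stop word set (`demo_stop_words`), a whitespace split in place of `word_tokenize`, and a hypothetical two-row `reviews` DataFrame.

```python
import pandas as pd

# Stand-in stop word set (assumption: a tiny subset for illustration only;
# the notebook uses NLTK's full English list).
demo_stop_words = {"the", "is", "and", "i", "a", "so"}

def remove_stopwords_demo(text):
    # Whitespace split is a simplified stand-in for nltk's word_tokenize
    words = text.split()
    # Keep only the words that are not in the stop word set
    return " ".join(w for w in words if w.lower() not in demo_stop_words)

# Hypothetical reviews standing in for df["text"]
reviews = pd.DataFrame({"text": ["The rice is so good", "I love the gyro plate"]})

# Apply the cleaning function row by row and store the result in a new column
reviews["clean_text"] = reviews["text"].apply(remove_stopwords_demo)
print(reviews["clean_text"].tolist())
```

With the notebook's own function, the equivalent call would be `df['text'].apply(remove_stopwords)`. Writing the result to a new column keeps the original review text available for comparison.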
