# FOOD SERVICE REVIEW SENTIMENT ANALYSIS

### Hello! Thanks for viewing my project.

#### Today we'll be working on a food service review sentiment analysis model.

#### This model will help us:

- Classify reviews on an online food service app as good, bad, or neutral
- Identify the major strengths and weaknesses based on customer reviews

### Let's get started if you're ready!

##### For this project, I'll be using the TextBlob library for the classification model.

It seems like a good choice because this project is not very complex and doesn't require complex algorithms: TextBlob's NaiveBayesClassifier can be trained directly on a list of labeled sentences.

In [3]:
!pip install -U textblob
!python -m textblob.download_corpora

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
[nltk_data] Downloading package conll2000 to /root/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to /root/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.


##### Now, we'll import the required libraries:

> I imported the csv module because we'll be working with CSV files in this project and will need some of its functions as we go.

In [6]:
import csv
from textblob import TextBlob
from textblob.classifiers import NaiveBayesClassifier

### My data wasn't perfect, so I ran into some errors
#### The code below shows how I handled them, but first, allow me to talk about it a bit:

- Normally, the code below would be as simple as ABC, but the errors I encountered made it a bit of a hassle.

- The CSV file 'foodreviewtrain.csv', as the name suggests, contains food reviews and the tags that classify them as good, bad, or neutral.

- However, some of the review texts contained commas, and since CSV uses commas to separate fields, those reviews were split in the wrong places and I ran into errors.

- To read the file, I opened it, replaced the commas with semicolons, and then told csv.reader in the code that the delimiter is a semicolon.

- Next, I filled the train_data list with the rows of the CSV file using a for loop. Each row is handled inside a try-except block that catches IndexError, so blank rows (which csv.reader returns as empty lists) are reported instead of crashing the loop.
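Replacing commas by hand works, but the csv module can already handle commas inside a field, provided the field is wrapped in double quotes (standard CSV quoting, which csv.writer applies by default). The sketch below uses made-up rows and io.StringIO in place of a real file; it assumes the file was exported with quoting, which may not have been true of the original data.

```python
import csv
import io

# Made-up sample: the first review contains a comma, so it is quoted.
raw = io.StringIO(
    'review,tag\n'
    '"Great food, fast delivery",good\n'
    'Cold and late,bad\n'
)

reader = csv.reader(raw)  # default delimiter is a comma
next(reader)  # skip the header row
rows = [tuple(row) for row in reader]
print(rows)
# [('Great food, fast delivery', 'good'), ('Cold and late', 'bad')]

# Writing works the same way: csv.writer quotes any field that contains a comma.
out = io.StringIO()
csv.writer(out).writerow(["Tasty, but pricey", "neutral"])
print(out.getvalue())  # "Tasty, but pricey",neutral
```

With quoted fields, the default comma delimiter keeps each review in one piece, so no manual find-and-replace is needed.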

In [13]:
train_data = []
with open("foodreviewtrain.csv", "r", encoding="latin-1", newline="") as file:
    reader = csv.reader(file, delimiter=";")
    next(reader)  # this line skips the header row
    for row in reader:
        try:
            feedback = row[0]
            tag = row[1]
            train_data.append((feedback, tag))
        except IndexError:
            # Blank lines come back as [] and rows without a tag have one field
            print(f"Invalid row: {row}")

Invalid row: []
Invalid row: []
Invalid row: []
Invalid row: []


### Glee! We've been able to preprocess our data

Now, we train the NaiveBayesClassifier on the preprocessed data.

In [14]:
cl = NaiveBayesClassifier(train_data)
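The classifier never sees raw text directly. By default, TextBlob turns each review into features named `contains(word)`, one per word in the training vocabulary, which is also the naming that shows up later in `show_informative_features`. Below is a rough, standard-library sketch of that idea; the regex tokenizer, the lowercasing, and the helper names `tokenize` and `contains_features` are simplifying assumptions, not TextBlob's exact implementation.

```python
import re


def tokenize(text):
    # Simplified tokenizer (assumption): lowercase runs of letters, digits, apostrophes.
    # TextBlob uses NLTK-based tokenization, so its tokens may differ.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def contains_features(document, train_data):
    # The vocabulary is every word seen anywhere in the training texts.
    vocabulary = set()
    for text, _label in train_data:
        vocabulary |= tokenize(text)
    tokens = tokenize(document)
    # One boolean feature per vocabulary word: present (True) or absent (False).
    return {f"contains({word})": word in tokens for word in sorted(vocabulary)}


train_sample = [  # made-up examples
    ("The food arrived cold", "bad"),
    ("Absolutely delicious meal", "good"),
]
features = contains_features("The meal was delicious", train_sample)
print(features["contains(delicious)"])  # True
print(features["contains(cold)"])       # False
```

Words outside the training vocabulary ("was" here) produce no feature at all, which is one reason a small training set limits what the model can pick up.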

### Cool! We've trained our classifier.
#### Now, let's load our testing data almost exactly the way we loaded the training data (not entirely the same, but close enough to avoid errors and headaches).

In [15]:
test_data = []
with open('foodreviewtest.csv', 'r', encoding='latin-1', newline='') as file:
    read = csv.reader(file, delimiter=';')  # named read this time instead of reader
    next(read)  # just like before, this line skips the header row
    for row in read:
        try:
            sentence = row[0]
            tag = row[1]
            test_data.append((sentence, tag))
        except IndexError:
            print(f"Invalid row: {row}")
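Since the training and test cells repeat the same loop, one option is to wrap it in a small helper. This is a sketch only: the function name `load_labeled_rows` is hypothetical, and the length check stands in for the try-except (it has the same effect for blank or one-column rows).

```python
import csv
import os
import tempfile


def load_labeled_rows(path, delimiter=";", encoding="latin-1"):
    """Read (text, tag) pairs from a delimited file, skipping the header and short rows."""
    rows = []
    # newline="" is what the csv module docs recommend for files given to csv.reader
    with open(path, "r", encoding=encoding, newline="") as file:
        reader = csv.reader(file, delimiter=delimiter)
        next(reader, None)  # skip the header row (no error if the file is empty)
        for row in reader:
            if len(row) < 2:
                # Blank lines arrive as [] and rows without a tag have one field
                print(f"Skipping invalid row on line {reader.line_num}: {row}")
                continue
            rows.append((row[0], row[1]))
    return rows


# Quick demo on a throwaway file containing one blank line (made-up content).
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="latin-1"
) as tmp:
    tmp.write("review;tag\nGreat food;good\n\nLate again;bad\n")

demo_rows = load_labeled_rows(tmp.name)
os.remove(tmp.name)
print(demo_rows)  # [('Great food', 'good'), ('Late again', 'bad')]

# In this notebook it would be called as:
# train_data = load_labeled_rows("foodreviewtrain.csv")
# test_data = load_labeled_rows("foodreviewtest.csv")
```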

### Yay! We did it!
#### Now let's see if our model is a good one.

> We'll calculate the accuracy score as done below:

In [16]:
# Calculate accuracy
accuracy = cl.accuracy(test_data)
print(f"Accuracy: {accuracy}")

Accuracy: 1.0
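`cl.accuracy(test_data)` classifies every test sentence and reports the fraction whose predicted tag matches the tag from the file, so 1.0 means every test review was classified correctly. The arithmetic is simply correct / total; here it is as a small sketch with made-up labels (the function name `accuracy` is just for illustration):

```python
def accuracy(predictions, gold_labels):
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and labels must have the same length")
    if not gold_labels:
        raise ValueError("need at least one example")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


# Made-up predictions vs. reference tags for four test reviews.
print(accuracy(["good", "bad", "neutral", "good"],
               ["good", "bad", "bad", "good"]))  # 0.75
```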


### Our model seems like a good one! Now we're ready to experiment with real-world data
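A perfect score is encouraging, but on a small test set it can also mean the test reviews are very similar to, or copies of, training reviews. One quick sanity check, sketched with made-up data in the same (text, tag) shape as train_data and test_data; the helper name `overlap_report` is hypothetical:

```python
def overlap_report(train_data, test_data):
    """Count test sentences that also appear in the training set (ignoring case and spacing)."""
    def normalize(text):
        return " ".join(text.lower().split())

    train_texts = {normalize(text) for text, _tag in train_data}
    duplicates = [text for text, _tag in test_data if normalize(text) in train_texts]
    return len(duplicates), len(test_data)


# Made-up data: one of the two test reviews is copied from training.
train_sample = [("Great food!", "good"), ("Late delivery", "bad")]
test_sample = [("great  food!", "good"), ("Rude driver", "bad")]
print(overlap_report(train_sample, test_sample))  # (1, 2)

# In this notebook: overlap_report(train_data, test_data)
```

If a large share of the test set also appears in training, the 1.0 says more about memorization than about how the model handles new reviews.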

### We're going to test our model on reviews of a food service app named 'Eden' on the Google Play Store

Now, I could just copy and paste a bunch of reviews from the Play Store site, but I decided to get my hands on the Beautiful Soup and requests libraries and let them do the job for me.

> P.S. By the job, I mean web scraping!

In [17]:
import requests
from bs4 import BeautifulSoup

url = "https://play.google.com/store/apps/details?id=com.ouredenlife.app&hl=en&gl=US&pli=1"

# Send a GET request to the URL (the timeout keeps it from hanging forever)
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on an HTTP error such as 404 or 429

# Create a BeautifulSoup object to parse the HTML content
soup = BeautifulSoup(response.content, "html.parser")

# Before this, I had inspected the web page and found that the review
# texts were in div tags with the class 'h3YV2d'.
# They all share that class, so find_all returns every one of them.
review_divs = soup.find_all("div", class_="h3YV2d")
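The class name h3YV2d is whatever Google Play's generated markup used when the page was inspected; names like this are not a stable interface and may change. Also, only reviews present in the initial HTML can be found this way, which likely explains why just three came back. To show what `find_all` plus `get_text` is doing, here is the same class-based extraction sketched with Python's built-in `html.parser` (swapped in for Beautiful Soup so it runs offline) on a made-up snippet; the class `ReviewTextParser` is hypothetical.

```python
from html.parser import HTMLParser


class ReviewTextParser(HTMLParser):
    """Collect the text of every <div> whose class list includes target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching div (tracks nested divs)
        self.current = []
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a div nested inside a review div
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.reviews.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


# Made-up snippet shaped like the markup described above.
sample_html = """
<div class="h3YV2d">Clean app, but the menu is small.</div>
<div class="other">Not a review</div>
<div class="h3YV2d">Notifications <b>at night</b>.</div>
"""
parser = ReviewTextParser("h3YV2d")
parser.feed(sample_html)
print(parser.reviews)
# ['Clean app, but the menu is small.', 'Notifications at night.']
```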


#### Now, with a for loop, we'll extract the text of each div into an empty list called 'review_list':

In [18]:
review_list = []

for div in review_divs:
    review = div.get_text(strip=True)
    review_list.append(review)

review_list

['Clean app. I just found it. But you guys lack a lot of food menu, and your cleaning service is so expensive. Maybe in the nearest future when more things have been added I will come back. Keep me up.',
 "The app is clean and simple to use. However, I've been getting daily notifications for meals I did not order. Sometimes I get notifications in the middle of the night when I should be sleeping. Please sort out your notification system.",
 'The app is perpetually problematic. It requires updates, and despite this, it is slow, erratic, and unreliable. Every month, like clockwork, it fails to deduct the monthly charges. Despite regular complaints, nothing is done.']

### Perfectos!
#### Now, let's see how well our model classifies each of the reviews above as good, bad, or neutral

In [21]:
classified_reviews = []

# Classify each review in the list
for review in review_list:
    classification = cl.classify(review)
    classified_reviews.append((review, classification))

classified_reviews

[('Clean app. I just found it. But you guys lack a lot of food menu, and your cleaning service is so expensive. Maybe in the nearest future when more things have been added I will come back. Keep me up.',
  'good'),
 ("The app is clean and simple to use. However, I've been getting daily notifications for meals I did not order. Sometimes I get notifications in the middle of the night when I should be sleeping. Please sort out your notification system.",
  'bad'),
 ('The app is perpetually problematic. It requires updates, and despite this, it is slow, erratic, and unreliable. Every month, like clockwork, it fails to deduct the monthly charges. Despite regular complaints, nothing is done.',
  'good')]
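Reading these labels critically matters: the third review is clearly a complaint yet was tagged 'good', and the first is mixed, so any summary is only as trustworthy as the classifier. With that caveat, a natural step toward the second goal (identifying strengths and weaknesses) is to tally the labels. A minimal sketch with `collections.Counter`; the review texts are placeholders and the labels match the output above.

```python
from collections import Counter


def summarize(classified_reviews):
    """Count how many reviews received each label."""
    return Counter(label for _review, label in classified_reviews)


# Shaped like classified_reviews: (review_text, label) pairs.
sample = [
    ("review 1 text", "good"),
    ("review 2 text", "bad"),
    ("review 3 text", "good"),
]
counts = summarize(sample)
print(counts.most_common())  # [('good', 2), ('bad', 1)]
```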

In [23]:
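# Note: the labels in classified_reviews are the model's own predictions,
# so this score is 1.0 by construction; it does not measure real accuracy.
cl.accuracy(classified_reviews)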

1.0

### Major words that determine whether a review is classified as good, bad, or neutral
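Each line printed by `show_informative_features` reads as a ratio. For example, `contains(my) = True` with `bad : neutra = 50.4 : 1.0` means the model's estimated probability that a review contains "my" is about 50 times higher for bad reviews than for neutral ones ("neutra" is just "neutral" cut to six characters by the display). A simplified sketch of how such a ratio comes from counts; the add-one smoothing and the helper name `feature_ratio` are assumptions, and NLTK uses its own smoothed estimates, so real numbers will differ.

```python
def feature_ratio(docs, word, label_a, label_b):
    """Ratio P(contains(word)=True | label_a) / P(contains(word)=True | label_b).

    Uses add-one smoothing over the two outcomes (True/False), an assumption
    made for illustration; it keeps unseen words from giving zero probabilities.
    """
    def prob_true(label):
        texts = [set(text.lower().split()) for text, tag in docs if tag == label]
        hits = sum(word in tokens for tokens in texts)
        return (hits + 1) / (len(texts) + 2)

    return prob_true(label_a) / prob_true(label_b)


docs = [  # made-up training examples
    ("great food", "good"),
    ("great service", "good"),
    ("cold food", "bad"),
]
print(round(feature_ratio(docs, "great", "good", "bad"), 2))  # 2.25
```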

In [28]:
cl.show_informative_features(40)

Most Informative Features
            contains(my) = True              bad : neutra =     50.4 : 1.0
           contains(had) = True             good : neutra =     39.5 : 1.0
  contains(expectations) = True             good : neutra =     30.7 : 1.0
          contains(than) = True           neutra : good   =     29.9 : 1.0
      contains(expected) = True           neutra : good   =     28.0 : 1.0
       contains(arrived) = True              bad : neutra =     24.8 : 1.0
           contains(and) = False          neutra : bad    =     24.5 : 1.0
            contains(no) = True           neutra : bad    =     22.1 : 1.0
        contains(person) = True           neutra : good   =     22.0 : 1.0
          contains(both) = True           neutra : bad    =     21.4 : 1.0
    contains(absolutely) = True             good : bad    =     20.3 : 1.0
      contains(friendly) = True           neutra : good   =     19.4 : 1.0
   contains(exceptional) = True             good : neutra =     18.0 : 1.0