# Yelp Sentiment Analysis with Python

## Part 1. Collecting Reviews

In [1]:
# Import Requests
import requests

# Import Beautiful Soup
from bs4 import BeautifulSoup

In [2]:
# Execute request
# If you're using a different site, just replace the URL, e.g. r = requests.get('put your url in here')
r = requests.get('https://www.yelp.com/biz/tesla-san-francisco?osq=Tesla+Dealership')

In [3]:
# Check request status
print(r.status_code)

200


In [4]:
# Check result
r.text



In [5]:
# Make the soup
soup = BeautifulSoup(r.text, 'html.parser')

In [10]:
# First get all of the review-text spans (find_all is the current name for findAll)
results = soup.find_all(class_='lemon--span__373c0__3997G raw__373c0__3rKqk', attrs={'lang':'en'})

In [11]:
# Loop through the review spans and extract their text
reviews = [result.text for result in results]

In [12]:
reviews

["Today was delivery day and we were pretty excited to collect our car. When we arrived we proceeded upstairs where the magic happens. Jessica H. and Alex were both helping other customers and we were third in line. Even though we waited less than 15 minutes we were told they'd be with us promptly and they thanked us for our patience and for waiting. They made us feel valued and the time passed quickly. When it was our turn, Jessica explained the process, walked us through the documents she'd prepared in advance of our arrival (and even colour coded to make everything both easy and efficient), and explained to us what to expect. The place was spotless. We saw the staff wearing masks, cleaning surfaces and office supplies after every use and they even had separate cups for clean versus dirty pens. These folks have this down to a science. Our car was ready for us and Alex gave us a mini-tutorial, asked if we needed help and was prepared to answer questions even though we had none. He add

## Part 2. Analysing the Reviews

In [13]:
# Import pandas
import pandas as pd

# Import numpy
import numpy as np

In [14]:
# Create a pandas dataframe from array
df = pd.DataFrame(np.array(reviews), columns=['review'])

In [15]:
# Calculate word count
df['word_count'] = df['review'].apply(lambda x: len(str(x).split(" ")))
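Note that the word count above splits on a single space, while `avg_word` further down uses `split()` with no argument. The two can disagree whenever a review contains consecutive spaces. A minimal sketch with a made-up string (not taken from the scraped reviews):

```python
# Hypothetical sample text containing a double space.
sample = "Great  service and friendly staff"

# split(" ") breaks on every single space, so the double space
# produces an empty string that is counted as a "word".
by_space = sample.split(" ")

# split() breaks on any run of whitespace and never yields empty strings.
by_whitespace = sample.split()

print(len(by_space), by_space)
print(len(by_whitespace), by_whitespace)
```

If the counts need to line up with `avg_word`, using `split()` in both places is probably the simpler choice.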

In [16]:
# Calculate character count
df['char_count'] = df['review'].str.len()

In [18]:
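def avg_word(review):
    # Average word length: characters per whitespace-separated word
    words = review.split()
    # Guard against empty reviews to avoid dividing by zero
    return sum(len(word) for word in words) / len(words) if words else 0

# Calculate average word length
df['avg_word'] = df['review'].apply(avg_word)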
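Because this metric is computed on the raw review text, punctuation attached to a word counts towards its length. A small worked sketch of the arithmetic, using a hypothetical helper name (`mean_token_length`) and a short sentence rather than a full review:

```python
import string

def mean_token_length(text):
    """Mean characters per whitespace-separated token (hypothetical helper)."""
    tokens = text.split()
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0

raw = "I had a bad experience."
# Remove ASCII punctuation to compare against the raw text.
stripped = raw.translate(str.maketrans("", "", string.punctuation))

print(mean_token_length(raw))       # token lengths 1, 3, 1, 3, 11
print(mean_token_length(stripped))  # token lengths 1, 3, 1, 3, 10
```

The difference is small per sentence, but it is worth knowing which version of the text a metric was computed on.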

In [21]:
# Import stopwords
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/nicholasrenotte/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


In [22]:
# Count the stopwords in each review
stop_words = stopwords.words('english')
df['stopword_coun'] = df['review'].apply(lambda x: len([word for word in x.split() if word in stop_words]))
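One thing to keep in mind: the count above runs on the original review text, and the NLTK English stopword list is all lowercase, so capitalised words such as "The" are not matched. A rough sketch of the effect, using a tiny stand-in stop list and a made-up sentence (both are assumptions for illustration):

```python
# Tiny stand-in for the NLTK English list, whose entries are lowercase.
stop_words = {"the", "was", "and", "we", "to"}

# Made-up sentence, not from the scraped data.
review = "The place was spotless and We were happy"
tokens = review.split()

# Matching as written in the cell above: case-sensitive.
case_sensitive = [t for t in tokens if t in stop_words]

# Lower-casing each token first also catches "The" and "We".
case_insensitive = [t for t in tokens if t.lower() in stop_words]

print(len(case_sensitive), case_sensitive)
print(len(case_insensitive), case_insensitive)
```

Lower-casing before counting would give a fuller count, at the cost of changing the numbers shown in the table below.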

In [23]:
df.head()

| | review | word_count | char_count | avg_word | stopword_coun |
|---|---|---|---|---|---|
| 0 | Today was delivery day and we were pretty exci... | 306 | 1730 | 4.656863 | 133 |
| 1 | Don't take vehicle delivery at this SF service... | 201 | 1122 | 4.587065 | 83 |
| 2 | Unfortunately, as a recent new Tesla owner, I ... | 184 | 1079 | 4.86413 | 74 |
| 3 | I had a bad experience. Technician names Adam ... | 57 | 320 | 4.631579 | 26 |
| 4 | Adding to the bad reviews of this location...I... | 123 | 692 | 4.544715 | 46 |


## Part 3. Cleaning the Dataset

In [24]:
# Lower case all words
df['review_lower'] = df['review'].apply(lambda x: " ".join(x.lower() for x in x.split()))

In [25]:
# Remove Punctuation
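# regex=True is required in recent pandas versions, where str.replace defaults to literal matching
df['review_nopunc'] = df['review_lower'].str.replace(r'[^\w\s]', '', regex=True)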
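The pattern `[^\w\s]` deletes every character that is neither a word character nor whitespace. Since the match is replaced with an empty string, words joined only by punctuation get fused, which is where tokens like `locationim` in the later outputs come from. A minimal sketch with the standard `re` module on made-up strings; the space-replacement variant is an alternative for comparison, not what the notebook does:

```python
import re

# Same character class as the pandas call above.
PUNCTUATION = re.compile(r"[^\w\s]")

samples = ["don't take vehicle delivery", "this location...i'm considering"]

# Replace with nothing: neighbouring words can fuse together.
deleted = [PUNCTUATION.sub("", s) for s in samples]

# Replace with a space, then collapse repeated whitespace.
spaced = [" ".join(PUNCTUATION.sub(" ", s).split()) for s in samples]

print(deleted)
print(spaced)
```

Neither variant is strictly better: deleting keeps contractions as one token ("dont"), while spacing keeps separate words apart but leaves fragments such as "t" and "m".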

In [26]:
# Remove Stopwords
df['review_nopunc_nostop'] = df['review_nopunc'].apply(lambda x: " ".join(x for x in x.split() if x not in stop_words))

In [27]:
# Count the 30 most frequent words across all reviews
freq = pd.Series(" ".join(df['review_nopunc_nostop']).split()).value_counts()[:30]

In [28]:
freq.head()

car        52
service    29
tesla      22
get        17
one        14
dtype: int64
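The same join, split and count idea can be sketched without pandas using `collections.Counter`. The two short strings below are made up and only stand in for the `review_nopunc_nostop` column:

```python
from collections import Counter

# Made-up cleaned reviews standing in for df['review_nopunc_nostop'].
cleaned_reviews = [
    "car service great car",
    "service slow car",
]

# Join all reviews into one string, split into tokens, then count.
tokens = " ".join(cleaned_reviews).split()
top = Counter(tokens).most_common(2)

print(top)
```

A list like this is mainly useful for spotting generic high-frequency words that carry little sentiment and could be removed in the next step.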

In [29]:
# Extra high-frequency words to drop in addition to the NLTK stopwords
other_stopwords = ['get', 'us', 'see', 'use', 'said', 'asked', 'day', 'go',
  'even', 'ive', 'right', 'left', 'always', 'would', 'told',
  'one', 'also', 'ever', 'x', 'take', 'let']
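A pitfall worth keeping in mind when writing a list like this by hand: Python silently concatenates adjacent string literals that are not separated by a comma, so a missing comma at a line break merges two entries into one. A short sketch with illustrative values:

```python
# Missing comma after 'go': the two literals are joined into 'goeven'.
with_missing_comma = ['day', 'go'
                      'even', 'ive']

# With the comma in place, each word is its own entry.
with_comma = ['day', 'go', 'even', 'ive']

print(with_missing_comma)
print(len(with_missing_comma), len(with_comma))

# dict.fromkeys drops repeated entries while keeping the original order.
deduplicated = list(dict.fromkeys(['get', 'us', 'get', 'would', 'us']))
print(deduplicated)
```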

In [30]:
df['review_nopunc_nostop_nocommon'] = df['review_nopunc_nostop'].apply(lambda x: " ".join(word for word in x.split() if word not in other_stopwords))

In [31]:
df.head()

| | review | word_count | char_count | avg_word | stopword_coun | review_lower | review_nopunc | review_nopunc_nostop | review_nopunc_nostop_nocommon |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Today was delivery day and we were pretty exci... | 306 | 1730 | 4.656863 | 133 | today was delivery day and we were pretty exci... | today was delivery day and we were pretty exci... | today delivery day pretty excited collect car ... | today delivery pretty excited collect car arri... |
| 1 | Don't take vehicle delivery at this SF service... | 201 | 1122 | 4.587065 | 83 | don't take vehicle delivery at this sf service... | dont take vehicle delivery at this sf service ... | dont take vehicle delivery sf service center t... | dont vehicle delivery sf service center delive... |
| 2 | Unfortunately, as a recent new Tesla owner, I ... | 184 | 1079 | 4.86413 | 74 | unfortunately, as a recent new tesla owner, i ... | unfortunately as a recent new tesla owner i ha... | unfortunately recent new tesla owner agree maj... | unfortunately recent new tesla owner agree maj... |
| 3 | I had a bad experience. Technician names Adam ... | 57 | 320 | 4.631579 | 26 | i had a bad experience. technician names adam ... | i had a bad experience technician names adam w... | bad experience technician names adam rude arro... | bad experience technician names adam rude arro... |
| 4 | Adding to the bad reviews of this location...I... | 123 | 692 | 4.544715 | 46 | adding to the bad reviews of this location...i... | adding to the bad reviews of this locationim c... | adding bad reviews locationim considering buyi... | adding bad reviews locationim considering buyi... |


## Part 4. Lemmatize the Reviews

In [33]:
!pip install textblob

Collecting textblob
  Downloading https://files.pythonhosted.org/packages/60/f0/1d9bfcc8ee6b83472ec571406bd0dd51c0e6330ff1a51b2d29861d389e85/textblob-0.15.3-py2.py3-none-any.whl (636kB)
Installing collected packages: textblob
Successfully installed textblob-0.15.3


In [35]:
# Import textblob
from textblob import Word
nltk.download('wordnet')

# Lemmatize final review format
df['cleaned_review'] = df['review_nopunc_nostop_nocommon']\
.apply(lambda x: " ".join([Word(word).lemmatize() for word in x.split()]))

[nltk_data] Downloading package wordnet to
[nltk_data]     /Users/nicholasrenotte/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.


In [36]:
print('Base review\n', df['review'][0])
print('\n------------------------------------\n')
print('Cleaned and lemmatized review\n', df['cleaned_review'][0])

Base review
 Today was delivery day and we were pretty excited to collect our car. When we arrived we proceeded upstairs where the magic happens. Jessica H. and Alex were both helping other customers and we were third in line. Even though we waited less than 15 minutes we were told they'd be with us promptly and they thanked us for our patience and for waiting. They made us feel valued and the time passed quickly. When it was our turn, Jessica explained the process, walked us through the documents she'd prepared in advance of our arrival (and even colour coded to make everything both easy and efficient), and explained to us what to expect. The place was spotless. We saw the staff wearing masks, cleaning surfaces and office supplies after every use and they even had separate cups for clean versus dirty pens. These folks have this down to a science. Our car was ready for us and Alex gave us a mini-tutorial, asked if we needed help and was prepared to answer questions even though we had n

## Part 5. Sentiment Analysis

In [37]:
# Calculate polarity
from textblob import TextBlob
df['polarity'] = df['cleaned_review'].apply(lambda x: TextBlob(x).sentiment.polarity)

In [38]:
# Calculate subjectivity
df['subjectivity'] = df['cleaned_review'].apply(lambda x: TextBlob(x).sentiment.subjectivity)

In [41]:
df[['cleaned_review', 'polarity', 'subjectivity']].head()

| | cleaned_review | polarity | subjectivity |
|---|---|---|---|
| 0 | today delivery pretty excited collect car arri... | 0.291106 | 0.667274 |
| 1 | dont vehicle delivery sf service center delive... | 0.114646 | 0.429747 |
| 2 | unfortunately recent new tesla owner agree maj... | 0.050758 | 0.294886 |
| 3 | bad experience technician name adam rude arrog... | -0.28 | 0.386667 |
| 4 | adding bad review locationim considering buyin... | -0.110417 | 0.500694 |
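TextBlob polarity runs from -1 (most negative) to 1 (most positive), and subjectivity from 0 (objective) to 1 (subjective). One possible way to turn the scores into labels is a simple cut-off. The sketch below uses the five polarity values from the table above. The function name and the 0.1 threshold are assumptions chosen for illustration, not part of TextBlob or the original notebook:

```python
def label_polarity(score, threshold=0.1):
    """Map a polarity score in [-1, 1] to a coarse label.

    The threshold is an arbitrary, illustrative cut-off.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

# Polarity values copied from the first five rows of the table.
scores = [0.291106, 0.114646, 0.050758, -0.28, -0.110417]
labels = [label_polarity(s) for s in scores]

print(list(zip(scores, labels)))
```

With a DataFrame this could be applied as `df['polarity'].apply(label_polarity)`. The right threshold depends on the data, so it is worth checking a few reviews by hand before trusting the labels.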
