python code 5

# Lexicon

The code below downloads a lexicon and saves it in a Python dictionary called `lexicon`.

In [None]:
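import urllib.request
# Download the tab-separated lexicon file (one token and score per line)
with urllib.request.urlopen("https://storage.googleapis.com/wd13/lexicon.txt") as url:
  lexicon_file = url.read().decode()
# Build the token -> score dictionary, skipping blank or malformed lines
lexicon = {}
for line in lexicon_file.split('\n'):
  split_line = line.split('\t')
  if len(split_line) < 2:
    continue
  token = split_line[0]
  score = float(split_line[1])
  lexicon[token] = score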

The `lexicon` dictionary contains entries for approximately 7,500 tokens that carry either positive or negative sentiment. Each token is a key, and its value is the sentiment score. Positive scores indicate positive sentiment and negative scores indicate negative sentiment; the further a score is from zero, the stronger the sentiment.

"good" has a score of 1.9.

In [None]:
lexicon['good']

1.9

"great" has a score of 3.1.

In [None]:
lexicon['great']

3.1

"bad" has a score of -2.5.

In [None]:
lexicon['bad']

-2.5

# Question 1

Describe in a sentence or two how you could build your own lexicon using Naive Bayes.

Label a set of documents (for example, reviews) as positive or negative, then train a Naive Bayes classifier on them to estimate how likely each word is under each class. Comparing a word's estimated probability in the positive class with its probability in the negative class (for example, as a log ratio) gives a per-word sentiment score, and those scores form the lexicon.
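As a minimal sketch of this idea (not a definitive implementation), the cell below scores each word by the log ratio of its add-one-smoothed probabilities in the two classes. The training data `labelled_docs`, the function name `build_nb_lexicon`, and the smoothing parameter `alpha` are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy training set of (text, label) pairs; a real lexicon
# would need a much larger labelled corpus.
labelled_docs = [
    ("great app works great", "pos"),
    ("good and easy to use", "pos"),
    ("bad update keeps crashing", "neg"),
    ("bad login and bad support", "neg"),
]

def build_nb_lexicon(docs, alpha=1.0):
    """Score each word as log(P(word | pos) / P(word | neg)) with additive smoothing."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in docs:
        counts[label].update(text.lower().split())

    vocab = set(counts["pos"]) | set(counts["neg"])
    totals = {label: sum(c.values()) for label, c in counts.items()}

    scores = {}
    for word in vocab:
        # Smoothed class-conditional word probabilities, as in multinomial Naive Bayes
        p_pos = (counts["pos"][word] + alpha) / (totals["pos"] + alpha * len(vocab))
        p_neg = (counts["neg"][word] + alpha) / (totals["neg"] + alpha * len(vocab))
        scores[word] = math.log(p_pos / p_neg)
    return scores

nb_lexicon = build_nb_lexicon(labelled_docs)
for word in ("great", "bad", "and"):
    print(word, round(nb_lexicon[word], 2))
```

Words seen mostly in positive documents get scores above zero, words seen mostly in negative documents get scores below zero, and words that are equally common in both classes land near zero.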

# Question 2

Write a function that takes a string and returns a sentiment score based on the lexicon downloaded above.

In [None]:
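# Sentiment scoring function: sums the lexicon scores of the words in `text`
def score_function(text, lexicon):
  # Treat missing review text as neutral
  if text is None:
    return 0

  # Lowercase and split on whitespace; punctuation stays attached to words
  words = text.lower().split()
  score = 0

  for word in words:
    # Words that are not in the lexicon contribute nothing
    if word in lexicon:
      score += lexicon[word]
  return score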
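Because the function above splits only on whitespace, a word followed by punctuation (such as "bad.") will not match its lexicon entry. The cell below is a sketch of one possible variant that extracts tokens with a regular expression instead. The name `score_function_tokenized`, the regex, and the three-entry `toy_lexicon` (built from the scores quoted earlier) are assumptions for illustration, and this variant is not the function used to produce the results later in this notebook.

```python
import re

# Toy lexicon using the three scores quoted earlier in this notebook
toy_lexicon = {"good": 1.9, "great": 3.1, "bad": -2.5}

def score_function_tokenized(text, lexicon):
    """Sum lexicon scores after stripping punctuation from tokens."""
    # Treat missing or empty text as neutral
    if not text:
        return 0
    # Keep runs of letters and apostrophes; everything else separates tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

# "Great" and "bad" both match even though "support." carries a full stop
print(score_function_tokenized("Great app, but bad support.", toy_lexicon))
```

The trade-off is that any lexicon entries containing characters outside the regex (digits or symbols, if the lexicon has such entries) would no longer match, so the pattern would need adjusting to the actual lexicon contents.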

# Question 3

Install the google-play-scraper library.

In [None]:
!pip install google-play-scraper

Collecting google-play-scraper
  Downloading google_play_scraper-1.2.4-py3-none-any.whl (28 kB)
Installing collected packages: google-play-scraper
Successfully installed google-play-scraper-1.2.4


# Question 4

Import the google-play-scraper library.

In [None]:
import google_play_scraper

# Question 5

Find the app id for the RBC app on the Google Play Store and save it in the variable `appid`.

In [None]:
appid = 'com.rbc.mobile.android'

# Question 6

Download all available reviews and store them in the variable `rbc_reviews`.

In [None]:
rbc_reviews = google_play_scraper.reviews_all(
  appid,
  lang='en',
  country='ca')

# Question 7

Use the function from Question 2 to add a `sentiment_score` to each review.

In [None]:
for review in rbc_reviews:
  content = review['content']
  sentiment_score = score_function(content, lexicon)
  # Stored under the key 'Sentiment_Score'; later cells read this exact key
  review['Sentiment_Score'] = sentiment_score

In [None]:
rbc_reviews

[{'reviewId': '4314388d-f571-4ff8-b03a-1ca4b0e15995',
  'userName': 'Inna Titova',
  'userImage': 'https://play-lh.googleusercontent.com/a/ACg8ocKQYb1ui51HGEsaVT_CoMwY0u9MXIUErhsyUHDpGDfN=mo',
  'content': 'Very convenient',
  'score': 5,
  'thumbsUpCount': 0,
  'reviewCreatedVersion': '4.34',
  'at': datetime.datetime(2023, 10, 8, 22, 29, 43),
  'replyContent': None,
  'repliedAt': None,
  'appVersion': '4.34',
  'Sentiment_Score': 0},
 {'reviewId': 'ed6870f5-d0bb-44f8-9180-e4f07b908fc7',
  'userName': 'Ben Brash',
  'userImage': 'https://play-lh.googleusercontent.com/a/ACg8ocJ_CMptSeSyuMvCdDBz23oBhZiJbwDHhsIsNCGWJi83=mo',
  'content': "The app mostly works I guess. But it annoys me how this bank treats me like a jerk when I just want basic customer service. Been a customer since the 90's. This is absurd, I can't be this stupid forever.",
  'score': 2,
  'thumbsUpCount': 0,
  'reviewCreatedVersion': '4.34',
  'at': datetime.datetime(2023, 10, 7, 21, 0, 6),
  'replyContent': None,
  ...

# Question 8

Add a `sentiment_flag` variable to each review. It should be equal to 'pos' if the `sentiment_score` is greater than 0, 'neg' if the `sentiment_score` is less than 0, and 'neu' if the `sentiment_score` is equal to 0.

In [None]:
for review in rbc_reviews:
  sentiment_score = review['Sentiment_Score']
  if sentiment_score > 0:
    review['sentiment_flag'] = 'pos'
  elif sentiment_score < 0:
    review['sentiment_flag'] = 'neg'
  else:
    review['sentiment_flag'] = 'neu'
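The same rule can also be factored into a small helper so that the boundary at zero is easy to check in isolation. This is only a sketch: the helper name `sentiment_flag` and the `toy_reviews` list are hypothetical and stand in for the real `rbc_reviews` data.

```python
def sentiment_flag(score):
    """Map a numeric sentiment score to 'pos', 'neg', or 'neu'."""
    if score > 0:
        return 'pos'
    if score < 0:
        return 'neg'
    return 'neu'

# Hypothetical stand-in for rbc_reviews, using the same key as the cells above
toy_reviews = [
    {'Sentiment_Score': 1.9},
    {'Sentiment_Score': -2.5},
    {'Sentiment_Score': 0},
]

for review in toy_reviews:
    review['sentiment_flag'] = sentiment_flag(review['Sentiment_Score'])

print([review['sentiment_flag'] for review in toy_reviews])
```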


# Question 9

Add a `year` variable that indicates what year the review is from.

In [None]:
for review in rbc_reviews:
  review['year'] = review.get('at').year if review.get('at') is not None else None

# Question 10

Convert `rbc_reviews` into a Pandas dataframe.

In [None]:
import pandas as pd
df = pd.DataFrame(rbc_reviews)

In [None]:
df.head(10)

Unnamed: 0,reviewId,userName,userImage,content,score,thumbsUpCount,reviewCreatedVersion,at,replyContent,repliedAt,appVersion,Sentiment_Score,sentiment_flag,year
0,4314388d-f571-4ff8-b03a-1ca4b0e15995,Inna Titova,https://play-lh.googleusercontent.com/a/ACg8oc...,Very convenient,5,0,4.34,2023-10-08 22:29:43,,NaT,4.34,0.0,neu,2023
1,ed6870f5-d0bb-44f8-9180-e4f07b908fc7,Ben Brash,https://play-lh.googleusercontent.com/a/ACg8oc...,The app mostly works I guess. But it annoys me...,2,0,4.34,2023-10-07 21:00:06,,NaT,4.34,-3.8,neg,2023
2,1feb549b-9f9c-43d0-9ec8-70917e05c953,Lisa Carlini,https://play-lh.googleusercontent.com/a/ACg8oc...,The app is fine except for the fact that you c...,3,0,4.34,2023-10-07 14:50:09,,NaT,4.34,1.3,pos,2023
3,e7809eef-d692-4bb6-9d07-5dd5c67ce995,Al Fred,https://play-lh.googleusercontent.com/a/ACg8oc...,It works as expected. Only a couple of times c...,5,0,4.34,2023-10-07 04:38:48,,NaT,4.34,0.0,neu,2023
4,dd582b7e-525b-42a6-8473-f0b8401e9d75,Sophie Frenkel,https://play-lh.googleusercontent.com/a/ACg8oc...,Deleted this app from my S22 and now from my o...,1,0,,2023-10-06 21:45:06,,NaT,,0.7,pos,2023
5,349d11b2-47e6-403c-9c82-0ef33a2f3be7,Austin Short,https://play-lh.googleusercontent.com/a-/ALV-U...,budgets are useless and do not work. etransfer...,1,4,4.34,2023-10-06 15:56:30,"Hi Austin, we appreciate you taking the time t...",2023-10-06 21:05:50,4.34,-0.1,neg,2023
6,4f1928d4-8adf-4f83-84be-2ade15c6544a,A Google user,https://play-lh.googleusercontent.com/EGemoI2N...,Hello top looking for bottom,5,0,4.34,2023-10-06 10:46:38,,NaT,4.34,0.8,pos,2023
7,ef45e0e6-c0a9-4473-bdae-18e6e412d34b,Jason Rapsey,https://play-lh.googleusercontent.com/a-/ALV-U...,I have been an RBC customer for a whole 6 days...,1,0,4.34,2023-10-05 16:18:55,Thank you for your time and feedback. Please r...,2023-10-05 19:50:32,4.34,-1.1,neg,2023
8,2fb629f0-f378-44aa-b133-eef558d893f1,Srivick Donepudi,https://play-lh.googleusercontent.com/a/ACg8oc...,Really helpful,5,0,4.34,2023-10-05 14:53:40,Thank you for the awesome review! -- Ray,2023-10-05 19:45:38,4.34,1.8,pos,2023
9,60826c6d-9071-4011-9143-28c9a5491202,Lambert M.J. Andre,https://play-lh.googleusercontent.com/a-/ALV-U...,+++++ROYAL SAYS IT ALL+++++ *****,5,0,4.34,2023-10-05 04:58:30,Thank you for the awesome review!! -- Ray,2023-10-05 19:45:04,4.34,0.0,neu,2023


# Question 11

Calculate the percentage of reviews that are positive, negative, and neutral for each year: 2019, 2020, 2021, 2022, 2023.

In [None]:
years_to_filter = [2019, 2020, 2021, 2022, 2023]

# Keep only reviews from the years of interest
filtered_df = df[df['year'].isin(years_to_filter)]

# Total number of reviews per year
sentiment_total_review = filtered_df.groupby('year')['sentiment_flag'].count()

# Number of reviews per year and sentiment flag
sentiment_value_review = filtered_df.groupby('year')['sentiment_flag'].value_counts()

# Percentage of each flag within its year (pandas matches the two Series on the shared 'year' level)
sentiment_review_percentage = (sentiment_value_review / sentiment_total_review) * 100
print(sentiment_review_percentage)

year  sentiment_flag
2019  pos               59.197908
      neg               20.401046
      neu               20.401046
2020  pos               57.385399
      neu               21.392190
      neg               21.222411
2021  pos               52.237136
      neg               25.055928
      neu               22.706935
2022  pos               47.635934
      neg               28.605201
      neu               23.758865
2023  pos               48.878205
      neg               28.205128
      neu               22.916667
Name: sentiment_flag, dtype: float64
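An alternative sketch of the same calculation lets pandas compute the within-year shares directly with `value_counts(normalize=True)` and then pivots the flags into columns. The small `toy` DataFrame below is made up for illustration and only mimics the `year` and `sentiment_flag` columns of `df`; it is not the RBC data.

```python
import pandas as pd

# Made-up data with the same two columns used in the calculation above
toy = pd.DataFrame({
    "year": [2022, 2022, 2022, 2022, 2023, 2023],
    "sentiment_flag": ["pos", "pos", "neg", "neu", "pos", "neg"],
})

pct = (
    toy.groupby("year")["sentiment_flag"]
       .value_counts(normalize=True)  # share of each flag within its year
       .mul(100)                      # convert shares to percentages
       .unstack(fill_value=0)         # one row per year, one column per flag
)
print(pct.round(1))
```

With one row per year and one column per flag, year-over-year changes are easier to read, and each row sums to 100.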
