<a href="https://colab.research.google.com/github/saffarizadeh/INSY4054/blob/main/Automation_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="http://saffarizadeh.com/Logo.png" width="300px"/>

# *INSY 4054: Emerging Technologies*

# **Automation Project**

Instructor: Dr. Kambiz Saffarizadeh

---

## Please read carefully

In this project, we want to learn how to scrape and analyze user reviews from a specific webpage.

The target webpage is http://saffarizadeh.com/ET/reviews.html.

Please open and view the webpage.

In the next steps, after importing all needed libraries, we first download the webpage. Then, using BeautifulSoup, we extract titles, reviews, and ratings from the webpage and create a table to hold these data. Next, we pass the reviews to a sentiment analysis model and store the sentiments in a new column of the table. Finally, we create a few reports based on the sentiment analysis and store all tables in an Excel file.

## Insert all needed libraries here

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

## Crawl the website

Use the `get` method to download the following webpage:

`http://saffarizadeh.com/ET/reviews.html`

In [None]:
url = 'http://saffarizadeh.com/ET/reviews.html'
response = requests.get(url)
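
Note that `requests.get` returns a response object even when the server answers with an error status, so a failed download can go unnoticed until parsing. A minimal sketch of a more defensive download follows; the helper name `fetch_html` and the 10-second timeout are illustrative assumptions, not part of the assignment.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes, raising on HTTP errors."""
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.content
```

With this helper, `fetch_html(url)` could be passed to `BeautifulSoup` in place of `response.content`.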

## Make a soup

Use Beautiful Soup to create/make an HTML soup!

In [None]:
soup = BeautifulSoup(response.content, 'lxml')

## Using the soup, extract titles, reviews, and ratings

Note: do this in 3 separate steps. The steps are very similar, so after writing the first one, the next two should be easy.

Note: use `attrs` to find all relevant elements for each step.

Note: to figure out which attribute(s) and attribute values you need to use, open http://saffarizadeh.com/ET/reviews.html in Chrome or Firefox, right-click on the element you want to extract, and select `Inspect` or `Inspect Element`. This way you can see the HTML code for this specific element. Using slides #7 and #8 of Week 8-1, you should be able to identify the attribute name and attribute values needed.
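
To see how `attrs` selects elements, here is a minimal sketch that runs on a small made-up HTML snippet instead of the real page. The tag names and review texts are hypothetical; only the `class` values mirror the ones used in this notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; the real page may be structured differently.
sample_html = """
<div class="review">
  <h3 class="reviewTitle">Great mouse</h3>
  <p class="reviewBody">Comfortable and precise.</p>
</div>
<div class="review">
  <h3 class="reviewTitle">Not for me</h3>
  <p class="reviewBody">Too heavy for my hand.</p>
</div>
"""

# "html.parser" ships with Python, so this sketch needs no extra parser package.
sample_soup = BeautifulSoup(sample_html, "html.parser")

# attrs maps an attribute name to the value an element must have to match.
title_tags = sample_soup.find_all(attrs={"class": "reviewTitle"})
sample_titles = [tag.text for tag in title_tags]
print(sample_titles)  # ['Great mouse', 'Not for me']
```

Changing the value in `attrs` to `"reviewBody"` would select the two paragraphs instead.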

Titles: Store all review titles in a list named `list1`

In [None]:
review_titles = soup.find_all(attrs={"class": "reviewTitle"})
review_titles_text = []

for review_title in review_titles:
  review_titles_text.append(review_title.text)

review_titles_text

['Yeaaa USB C!!!!! But the dongle? Why? Why not a USB C dongle!?!?!? WHY Logitech! Why?',
 'Logitech - how many tries do you need to get it just right?',
 'Worthy Upgrade, Too Bad It’s Not In White',
 'Terrible scroll wheel issues',
 'Logitech made a great mouse even better',
 'Unconfortable downgrade from the Performance MX 1',
 'Not for gamers',
 'More compatible with Mac than I expected',
 'Improvement over the last gen. Worth the upgrade',
 'The best scroll wheel ever.']

Reviews: Store all review body texts in a list named `list2`

In [None]:
review_bodies = soup.find_all(attrs={"class": "reviewBody"})
review_bodies_text = []

for review_body in review_bodies:
  review_bodies_text.append(review_body.text)

review_bodies_text

['WHY Logitech! Why? Lets finally go FULL USB C not this half butt effort. I want to see a full butt USB C effort.',
 'The scroll wheel is awesome. The fit and finish is awesome though it should have a model slippery rubber coating when I can get around that problem. The buttons are decent enough. The scroll wheel is awesome. But......\n                Damn it damn it damn it. How many times is it going to take for you to get the damned forward and back buttons right!\n                Version one sucked. Version 2 sucked. Version 3 walad admittedly slightly improved, STILL STUCK!\n                They are too small , too stiff, and meld in to everything around them.\n                They need to be made much larger and they need to be place much lower.\n                Oh well. Maybe version 4 will get them right...\n                TWO stars off for little or no useability testing with REAL customers instead of devs and managers.\n                Damn it.',
 'Purchased this product af

Ratings: Store all review ratings in a list named `list3`

Note: the extracted ratings will be in `str` (text) format; convert them to `int` before storing them in the list

In [None]:
review_ratings = soup.find_all(attrs={"class": "rating"})
review_ratings_text = []

for review_rating in review_ratings:
  review_ratings_text.append(float(review_rating.get("rating")))

review_ratings_text

[3.0, 3.0, 4.0, 1.0, 5.0, 2.0, 3.0, 5.0, 5.0, 5.0]
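
The cell above stores the ratings as `float`, while the note asks for `int`. A minimal sketch of the integer version follows, run on made-up markup. It assumes, as the cell above does, that each rating sits in a `rating` attribute; the `span` tag and the values are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each rating is stored in a "rating" attribute.
sample_html = """
<span class="rating" rating="3"></span>
<span class="rating" rating="5"></span>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
rating_tags = sample_soup.find_all(attrs={"class": "rating"})

# tag.get("rating") returns the attribute value as str; int() converts it.
sample_ratings = [int(tag.get("rating")) for tag in rating_tags]
print(sample_ratings)  # [3, 5]
```

If the attribute holds text such as `"3.0"`, `int()` raises a `ValueError`; in that case `int(float(value))` would be needed.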

## Create a `pandas` data frame and store the three lists that you created for titles, body texts, and ratings

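Hint: you can first create a dictionary with `title`, `body`, and `rating` as keys and `list1`, `list2`, and `list3` as values (named `review_titles_text`, `review_bodies_text`, and `review_ratings_text` in the cells above). Then you can create a data frame from this dictionary.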

In [None]:
df = pd.DataFrame({"title": review_titles_text, "body": review_bodies_text, "rating": review_ratings_text})

Show the data frame:

In [None]:
df

|   | title | body | rating |
|---|---|---|---|
| 0 | Yeaaa USB C!!!!! But the dongle? Why? Why not ... | WHY Logitech! Why? Lets finally go FULL USB C ... | 3.0 |
| 1 | Logitech - how many tries do you need to get i... | The scroll wheel is awesome. The fit and finis... | 3.0 |
| 2 | Worthy Upgrade, Too Bad It’s Not In White | Purchased this product after accidentally purc... | 4.0 |
| 3 | Terrible scroll wheel issues | I have the former MX Master 2S and upgraded to... | 1.0 |
| 4 | Logitech made a great mouse even better | The MX Master 2s was a fantastic mouse, but I ... | 5.0 |
| 5 | Unconfortable downgrade from the Performance MX 1 | Cons in comparison to the original Performance... | 2.0 |
| 6 | Not for gamers | Many Youtubers recommend this mouse as their a... | 3.0 |
| 7 | More compatible with Mac than I expected | If you work with a Mac and are wondering if th... | 5.0 |
| 8 | Improvement over the last gen. Worth the upgrade | I have two of the previous generation and this... | 5.0 |
| 9 | The best scroll wheel ever. | The best Mx Master yet. And this time with rea... | 5.0 |


In [None]:
#@title Run this cell to train a sentiment analysis model. This model directly comes from Activity 6 in Week 5-1 slides. Running this cell takes 1-2 minutes.
%%capture
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

train_data, validation_data, test_data = tfds.load(name="imdb_reviews", split=('train[:60%]', 'train[60%:]', 'test'), as_supervised=True)
model = tf.keras.models.Sequential([
    hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim50/2", input_shape=[], dtype=tf.string, trainable=True),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
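# The final layer already applies a sigmoid, so the loss receives probabilities rather than logits.
model.compile(optimizer='adam', loss=tf.keras.losses.BinaryCrossentropy(from_logits=False), metrics=['accuracy'])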
model.fit(train_data.shuffle(1).batch(512), epochs=10, verbose=1)

After running the previous cell, pass the column containing the reviews to the model. To do so, run the following code after replacing `column_placeholder` with the actual column from the data frame:

`sentiment = model(column_placeholder).numpy()`

In [None]:
sentiment = model(df["body"]).numpy()

Store `sentiment` as a new column in the data frame.

In [None]:
df["sentiment"] = sentiment
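
A Keras model with a single output unit returns one row per input and one column, that is, an array of shape `(n, 1)`. A minimal sketch of flattening such an array to one dimension before storing it as a column follows; the numbers are made up and stand in for real model output.

```python
import numpy as np
import pandas as pd

# Stand-in for model(...).numpy(): three predictions, shape (3, 1).
fake_sentiment = np.array([[0.1], [0.7], [0.9]])

sample_df = pd.DataFrame({"body": ["bad", "fine", "great"]})

# ravel() turns the (3, 1) array into a flat array of length 3.
sample_df["sentiment"] = fake_sentiment.ravel()
print(sample_df["sentiment"].tolist())  # [0.1, 0.7, 0.9]
```

Flattening explicitly keeps the column one-dimensional regardless of how a given pandas version treats two-dimensional input.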

Show the data frame:

In [None]:
df

|   | title | body | rating | sentiment |
|---|---|---|---|---|
| 0 | Yeaaa USB C!!!!! But the dongle? Why? Why not ... | WHY Logitech! Why? Lets finally go FULL USB C ... | 3.0 | 0.046181 |
| 1 | Logitech - how many tries do you need to get i... | The scroll wheel is awesome. The fit and finis... | 3.0 | 0.367792 |
| 2 | Worthy Upgrade, Too Bad It’s Not In White | Purchased this product after accidentally purc... | 4.0 | 0.588653 |
| 3 | Terrible scroll wheel issues | I have the former MX Master 2S and upgraded to... | 1.0 | 0.269465 |
| 4 | Logitech made a great mouse even better | The MX Master 2s was a fantastic mouse, but I ... | 5.0 | 0.993866 |
| 5 | Unconfortable downgrade from the Performance MX 1 | Cons in comparison to the original Performance... | 2.0 | 0.707362 |
| 6 | Not for gamers | Many Youtubers recommend this mouse as their a... | 3.0 | 0.620746 |
| 7 | More compatible with Mac than I expected | If you work with a Mac and are wondering if th... | 5.0 | 0.997792 |
| 8 | Improvement over the last gen. Worth the upgrade | I have two of the previous generation and this... | 5.0 | 0.833942 |
| 9 | The best scroll wheel ever. | The best Mx Master yet. And this time with rea... | 5.0 | 0.995888 |


Select the rows with sentiment values above average. The cell below uses 0.5 as the cutoff.

In [None]:
df[df["sentiment"]>=0.5]

|   | title | body | rating | sentiment |
|---|---|---|---|---|
| 2 | Worthy Upgrade, Too Bad It’s Not In White | Purchased this product after accidentally purc... | 4.0 | 0.588653 |
| 4 | Logitech made a great mouse even better | The MX Master 2s was a fantastic mouse, but I ... | 5.0 | 0.993866 |
| 5 | Unconfortable downgrade from the Performance MX 1 | Cons in comparison to the original Performance... | 2.0 | 0.707362 |
| 6 | Not for gamers | Many Youtubers recommend this mouse as their a... | 3.0 | 0.620746 |
| 7 | More compatible with Mac than I expected | If you work with a Mac and are wondering if th... | 5.0 | 0.997792 |
| 8 | Improvement over the last gen. Worth the upgrade | I have two of the previous generation and this... | 5.0 | 0.833942 |
| 9 | The best scroll wheel ever. | The best Mx Master yet. And this time with rea... | 5.0 | 0.995888 |


Select the rows with sentiment values below average. The cell below uses 0.5 as the cutoff.

In [None]:
df[df["sentiment"]<0.5]

|   | title | body | rating | sentiment |
|---|---|---|---|---|
| 0 | Yeaaa USB C!!!!! But the dongle? Why? Why not ... | WHY Logitech! Why? Lets finally go FULL USB C ... | 3.0 | 0.046181 |
| 1 | Logitech - how many tries do you need to get i... | The scroll wheel is awesome. The fit and finis... | 3.0 | 0.367792 |
| 3 | Terrible scroll wheel issues | I have the former MX Master 2S and upgraded to... | 1.0 | 0.269465 |
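
If "average" is instead taken to mean the mean of the `sentiment` column, the split can be sketched as follows. The scores are copied from the data frame shown earlier; the variable names are illustrative.

```python
import pandas as pd

# Sentiment scores copied from the data frame shown earlier in this notebook.
scores = pd.DataFrame({
    "sentiment": [0.046181, 0.367792, 0.588653, 0.269465, 0.993866,
                  0.707362, 0.620746, 0.997792, 0.833942, 0.995888]
})

# Use the column mean, rather than a fixed 0.5, as the cutoff.
mean_sentiment = scores["sentiment"].mean()
above_mean = scores[scores["sentiment"] > mean_sentiment]
below_mean = scores[scores["sentiment"] < mean_sentiment]

print(round(mean_sentiment, 4))   # 0.6422
print(above_mean.index.tolist())  # [4, 5, 7, 8, 9]
print(below_mean.index.tolist())  # [0, 1, 2, 3, 6]
```

With the mean as the cutoff, reviews 2 and 6 move from the upper group to the lower one.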


Create an Excel file with three sheets showing: 1) the main data frame, 2) the rows with sentiment values above average, and 3) the rows with sentiment values below average.
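
A minimal sketch of the export step follows, assuming the 0.5 cutoff used in the cells above. The function names, the file name `reviews.xlsx`, and the sheet names are illustrative choices, and writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.

```python
import pandas as pd


def split_by_sentiment(frame, cutoff=0.5):
    """Return a dict mapping sheet names to the three tables to export."""
    return {
        "all_reviews": frame,
        "above_cutoff": frame[frame["sentiment"] >= cutoff],
        "below_cutoff": frame[frame["sentiment"] < cutoff],
    }


def write_report(frame, path="reviews.xlsx", cutoff=0.5):
    """Write one sheet per table into a single Excel workbook."""
    sheets = split_by_sentiment(frame, cutoff)
    # The context manager saves and closes the workbook on exit.
    with pd.ExcelWriter(path) as writer:
        for sheet_name, table in sheets.items():
            table.to_excel(writer, sheet_name=sheet_name, index=False)
```

Calling `write_report(df)` on the data frame built above would produce a workbook with the three sheets.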