# SCRAPING FROM AMAZON

#### **This code scrapes the product name, review title, star rating, and full review text from the review pages of a product on Amazon.com, then runs a simple sentiment analysis on the reviews. The example here uses reviews of the Logitech G502 Lightspeed wireless gaming mouse.**


#### Required libraries

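- requests
- BeautifulSoup from bs4 (with the lxml parser)
- pandas (plus openpyxl for writing and reading Excel files)
- google_trans_new
- textblob
- nltk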


##### After importing the libraries, we define an empty list (reviewlist) to collect the reviews. Once scraping is finished, the list is converted into a DataFrame.
##### We send browser-like headers, most importantly User-Agent, so that Amazon is less likely to treat the requests as coming from a bot and block them.
##### The get_soup function requests a page: it calls requests.get with the URL of the product's review page and the headers, and BeautifulSoup parses the response with the lxml parser (Python's built-in "html.parser" also works).
##### The get_reviews function then extracts the product name, review title, rating, and review body from the parsed page, based on Amazon's HTML structure (the data-hook attributes).
##### Finally, each review's fields are stored as a dictionary in the reviewlist defined at the beginning, and the collected reviews are saved as an Excel file. The range in the for loop sets the maximum number of review pages to fetch. The if check at the end of the loop stops it once the last page is reached (the "Next page" button is disabled), so it does not keep requesting pages that do not exist.
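The page loop builds each URL with an f-string that appends `pageNumber={x}`. As a minimal sketch, the same URLs can be built with the standard library's `urllib.parse`, which sets the `pageNumber` query parameter while keeping the others (such as `reviewerType`). The helper name `page_url` is hypothetical, and the sketch assumes Amazon keeps paginating reviews through a `pageNumber` query parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def page_url(base_url: str, page: int) -> str:
    """Return base_url with its pageNumber query parameter set to `page`.

    Hypothetical helper: assumes reviews are paginated via ?pageNumber=N.
    """
    parts = urlparse(base_url)
    # Keep every existing query parameter except pageNumber, then set it.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "pageNumber"]
    query.append(("pageNumber", str(page)))
    return urlunparse(parts._replace(query=urlencode(query)))


base = ("https://www.amazon.com/Logitech-Lightspeed-PowerPlay-Compatible-Lightsync/"
        "product-reviews/B07L4BM851/ref=cm_cr_dp_d_show_all_btm"
        "?ie=UTF8&reviewerType=all_reviews")
for n in range(1, 4):
    print(page_url(base, n))
```

Because an existing `pageNumber` is dropped before the new one is added, calling `page_url` on a URL that already has a page number simply overwrites it.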

In [20]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from google_trans_new import google_translator
from textblob import TextBlob
import warnings
warnings.filterwarnings("ignore")

In [8]:
reviewlist = []

In [9]:
header = {
    'Host': 'www.amazon.com',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'TE': 'Trailers'
}

In [10]:
def get_soup(url):
    # Fetch the review page at `url` with browser-like headers and parse it with lxml
    req = requests.get(url, headers=header)
    soup = BeautifulSoup(req.content, "lxml")
    return soup

In [11]:
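def get_reviews(soup):
    # The product name comes from the page title: "Amazon.com: Customer reviews: <product>"
    product = soup.title.text.replace('Amazon.com: Customer reviews:', '').strip() if soup.title else ''
    reviews = soup.find_all('div', {'data-hook': 'review'})
    for item in reviews:
        # Skip a single review whose markup differs instead of abandoning the rest of the page
        try:
            review = {
                'product': product,
                'title': item.find('a', {'data-hook': 'review-title'}).text.strip(),
                'rating': float(item.find('i', {'data-hook': 'review-star-rating'}).text.replace('out of 5 stars', '').strip()),
                'body': item.find('span', {'data-hook': 'review-body'}).text.strip(),
            }
        except (AttributeError, ValueError):
            continue
        reviewlist.append(review)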
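get_reviews turns the star text (for example "4.0 out of 5 stars") into a number by deleting the suffix and calling float. A slightly more tolerant sketch, assuming the rating is always the first number in that text, extracts it with a regular expression and returns None instead of raising when nothing matches. `parse_rating` is a hypothetical helper, not part of the original code.

```python
import re
from typing import Optional

# First number in the text, e.g. "4.0" in "4.0 out of 5 stars".
_RATING_RE = re.compile(r"\d+(?:[.,]\d+)?")


def parse_rating(text: str) -> Optional[float]:
    """Extract the star rating from rating text, or return None if absent."""
    match = _RATING_RE.search(text)
    if match is None:
        return None
    # Accept a comma as the decimal separator too (e.g. "4,0").
    return float(match.group().replace(",", "."))


print(parse_rating("4.0 out of 5 stars"))
print(parse_rating("no rating here"))
```

Inside get_reviews, a None result could be used to skip or flag a review rather than relying on a ValueError.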

In [12]:
for x in range(1,20):
    soup = get_soup(f'https://www.amazon.com/Logitech-Lightspeed-PowerPlay-Compatible-Lightsync/product-reviews/B07L4BM851/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews&pageNumber={x}')
    print(f'Getting page: {x}')
    get_reviews(soup)
    print(len(reviewlist))
    # Stop when the "Next page" button is disabled, i.e. this is the last page
    if soup.find('li', {'class': 'a-disabled a-last'}):
        break

df = pd.DataFrame(reviewlist)
df.to_excel('reviews.xlsx', index=False)
print('May the force be with you!')

Getting page: 1
10
Getting page: 2
20
Getting page: 3
30
Getting page: 4
40
Getting page: 5
50
Getting page: 6
60
Getting page: 7
70
Getting page: 8
80
Getting page: 9
90
Getting page: 10
100
Getting page: 11
110
Getting page: 12
120
Getting page: 13
130
Getting page: 14
140
Getting page: 15
150
Getting page: 16
160
Getting page: 17
170
Getting page: 18
180
Getting page: 19
190
May the force be with you!
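The page loop above requests pages back to back and stops once the "Next page" button is disabled. A minimal sketch of the same control flow, with a pause between requests and an extra stop when a page returns no reviews, is below. `fetch_page` is a hypothetical stand-in for get_soup plus get_reviews that returns a page's reviews and whether it is the last page; the one-second default delay is an arbitrary assumption, not an Amazon requirement.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical page fetcher: returns (reviews on the page, is_last_page).
FetchPage = Callable[[int], Tuple[List[dict], bool]]


def scrape_pages(fetch_page: FetchPage, max_pages: int, delay: float = 1.0) -> List[dict]:
    """Collect reviews from pages 1..max_pages, stopping early at the last page."""
    collected: List[dict] = []
    for page in range(1, max_pages + 1):
        print(f"Getting page: {page}")
        reviews, is_last = fetch_page(page)
        collected.extend(reviews)
        print(len(collected))
        # Stop on the last page, or if a page came back empty (e.g. blocked).
        if is_last or not reviews:
            break
        time.sleep(delay)  # pause between requests
    return collected


# Usage with a fake fetcher that simulates three pages of two reviews each.
def fake_fetch(page: int) -> Tuple[List[dict], bool]:
    return [{"page": page, "n": i} for i in range(2)], page == 3


all_reviews = scrape_pages(fake_fetch, max_pages=19, delay=0)
```

Passing the fetcher in as a function keeps the loop testable without network access.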


In [23]:
translator = google_translator()

In [13]:
# Read the saved reviews back from the Excel file
rev = pd.read_excel("reviews.xlsx")
rev

| Unnamed: 0 | product | title | rating | body |
|---|---|---|---|---|
| 0 | Logitech G502 Lightspeed Wireless Gaming Mouse... | bad | 5 | poop |
| 1 | Logitech G502 Lightspeed Wireless Gaming Mouse... | As good as the original G502, but Logitech nee... | 3 | The mouse itself is incredible. Absolutely zer... |
| 2 | Logitech G502 Lightspeed Wireless Gaming Mouse... | Excited to swap out my G900 for this as a gene... | 2 | This Review was done after hours of testing, I... |
| 3 | Logitech G502 Lightspeed Wireless Gaming Mouse... | Stopped working less than two days after recei... | 1 | If this mouse works, it's pretty fantastic. I ... |
| 4 | Logitech G502 Lightspeed Wireless Gaming Mouse... | Mouse is great. Software is absolute trash. | 1 | I don't really have a single complaint about t... |
| ... | ... | ... | ... | ... |
| 185 | Logitech G502 Lightspeed Wireless Gaming Mouse... | Favorite mouse ever but wireless | 5 | I always said only way I’m getting a new mouse... |
| 186 | Logitech G502 Lightspeed Wireless Gaming Mouse... | Save yourself and do not buy this mouse for now. | 1 | First of all, I will say this was a nice addit... |
| 187 | Logitech G502 Lightspeed Wireless Gaming Mouse... | Solid gaming mouse and so far no double clicki... | 5 | I was worried after experiencing the G Pro Wir... |
| 188 | Logitech G502 Lightspeed Wireless Gaming Mouse... | \*UPDATED\* Don't buy it! | 1 | As fast or faster than the wired version. I us... |


In [15]:
#for displaying whole comments
with pd.option_context("display.max_colwidth", None):
    display(rev)

Unnamed: 0,product,title,rating,body
0,"Logitech G502 Lightspeed Wireless Gaming Mouse with Hero 25K Sensor, PowerPlay Compatible, Tunable Weights and Lightsync RGB - Black",bad,5,poop
1,"Logitech G502 Lightspeed Wireless Gaming Mouse with Hero 25K Sensor, PowerPlay Compatible, Tunable Weights and Lightsync RGB - Black","As good as the original G502, but Logitech needs to learn a few things",3,"The mouse itself is incredible. Absolutely zero lag or delay, top tier optical sensor, extremely good build quality and shape. However, charging 150 for this thing feels like Logitech is only pricing it that way because they know people will pay for such a legendary mouse. However, the g502 Hero can usually be found for around 60 dollars. How does Logitech get away with charging an extra 100 dollars to remove a wire? It’s absurd. Even the g602, which is an older wireless mouse by them, can be had for 40 dollars. The mouse is overpriced, it should be retailing for 110 at the most, not 150. Other options such as the dark core is usually around 70 dollars, the rival 650 can be had for around 100 dollars and even razer, who is known for overpricing things, is selling their mamba wireless for 99. That mouse goes on sale for around 80. Logitech needs to realize people bought their stuff because it was priced fairly and performed high end. Charging more than everyone else, is a great way to start looking like the bad guy, especially when your mice still have double clicking issues. Great mouse, bad price."
2,"Logitech G502 Lightspeed Wireless Gaming Mouse with Hero 25K Sensor, PowerPlay Compatible, Tunable Weights and Lightsync RGB - Black",Excited to swap out my G900 for this as a general use mouse... and then not. So close,2,"This Review was done after hours of testing, I really wanted to like this mouse. I feel like I need to make this review with some VERY important points for future buyers to take into account until they are remedied by Logitech:The R&D that went into this is impressive, to completely redesign a loved mouse, make it wireless with an amazing battery and the new hero sensors and keep the exact same shape? Awesome job.That's where it ends for me. I have around 10 high end mice, and have slowly been letting them go to minimize clutter. The G Pro Wireless is my new forever-until-a-better mouse for games.When I'm not gaming, I stay with my classic G900 for it's free-scroll wheel (and a left-right mouse wheel click that G-Pro lacks). But it's battery life is no where enough with the options Logitech currently offers. So I saw this G502 as an amazing box checker, as well as more buttons.Where to start. The most important thing for a mouse is 'feel'. I grabbed it and loved the shape, I began to move it around--... what's that noise? I moved it more. Something... feels off. I moved it around, and realized.. The mouse is scratching on my mousepad. I checked the feet/stickers... nope. they are level. I checked for any imperfections in the plastic from manufacturing.. still no. I tilted the mouse slightly askew, to see which edge was catching. I was absolutely pissed off, that stupid triangle-esque texture on the pinky side? It continues Allll along to the underbelly of the mouse, and it's prominent enough to scrape the mousepad below it.I tried 5 other types of mousepad, and everything other than my metal mousepad scratches and rubs the pads.Reader, Move any mouse you own, especially a 'gaming' mouse, you know you want it to be smooth as butter. 
Nope, the texture printed/molded onto the mouse itself is too intense. like, lift the mouse with your thumb a milimeter and move the mouse, it will leave a gigantic scrape/rub mark.So disappointing. I am keeping the mouse, I have an order of various sandpaper grits to remove that horrible design oversight.Next... Logitech, look, your software ""G-Hub"" you are forcing me to now use. It is so so bad, and so overdersigned. It makes me hate it even more you forced me to download in order to even use the mouse. Nice. I know there isn't just an incompatibility issue of the hardware on the classic Logitech mouse software, because even your more recent Hero sensor mice work on it. This is just a clear attempt at outmoding your older software support, and you forced $150 mouse buyers into a beta.The software feels like you hired Twice as many graphic designers than you did programmers. You know what people are ok with in software? Simplicity, not a rubix cube draped in fine linen.What's G-hub like?Want to change DPI? Of course just, input your dpi- just kidding, you have to take each individual dot on a slider, (it will change you mouse to that when you click-hold it have fun on the higher dpi options) next, try to figure out which way to drag the dot aaaaallll the way off the slider, and you have to FLICK the dot UP off the dpi line. Yes, they made dpi options in a C-tier mobile game.Second, the different mouse profiles? Good luck, apparently if you set it to ""on board memory"" the mouse sticks with just 1 profile. so you can't take multiples profiles as far as I've found.And third reason G-Hub is incredible. It crashes games. Yeah, I've had 2 games with no recent updates this week, which have never crashed since I've owned them until today (weird, the day I installed G-Hub and nothing else) Once I killed G-hub in Task manager, I played the same games for a few hours, no more crashes. Nice.So leave G-hub off... OH RIGHT! Your mouse forgets everything without it! 
So after gaming, turned on my G502, it turns into a default RGB nightmare. You have to restart G-Hub in order to get your mouse back on track.I'm going to hold onto this mouse, I'll make it work, but the fact I'm considering using sandpaper on a cosmetic non-defect, and hating every moment on your forced G-hub utilization is such a dissapointment I have never felt towards Logitech.I expected for the price to be treated like a premium user, instead I feel treated like a beta tester who had to pay to use it.UPDATE EDIT 2 weeks later: To fix the horrible sratching/rubbing, its a combo of the texture (pinky-side) and the mouse feet are FAR to thin, and the engraved grooves for the mouse feet rake against your mouse pad /surface. I purchased third party mouse feet, and sanded the texturing off on the bottom. No more rubbing. THIS is the mouse I paid for, its smooth as butter now, and doesn't scratch at my desk.Logitech, Come on, put some better mouse skates/feet on and TEST your product on multiple surfaces so I don't have to do that. It's not even a faulty product I got, it was 6 different points of contact I had to take into my hands. It was just poor shell moulding and mouse feet that were FAR to thin."
3,"Logitech G502 Lightspeed Wireless Gaming Mouse with Hero 25K Sensor, PowerPlay Compatible, Tunable Weights and Lightsync RGB - Black",Stopped working less than two days after receiving it,1,"If this mouse works, it's pretty fantastic. I like the layout, it's quick as heck, fairly lightweight for a wireless mouse, and the software, though not what I prefer, works fine for me.What killed the mouse for me was the dreaded Logitech double-click issue. Less than two days after the item arrived every click was detected as a double-click. It'd click once on the downstroke, and click again on the release. It wasn't dropped, and is in a clean environment. I thought it might be software related, so I removed all Logitech software, uninstalled/cleaned the drivers, and even tried in Linux and in Safe Mode, the problem was still there.So, it's getting returned, and for $150 I'm not going to gamble on another one.UpdateOk, so I didn't take my own advice, and I gambled on a second one. I loved this mouse while it was working, it did everything I wanted extremely well, and I figured maybe I just got a lemon. The second one arrived last week, and stopped working last night. Same issues with the left click always detecting as a double-click. You can fix it with new switches, but that involves soldering, and ordering more parts for a $150 mouse. Strongly recommend just staying away entirely, don't gamble like I did."
4,"Logitech G502 Lightspeed Wireless Gaming Mouse with Hero 25K Sensor, PowerPlay Compatible, Tunable Weights and Lightsync RGB - Black",Mouse is great. Software is absolute trash.,1,"I don't really have a single complaint about the mouse itself, ***but*** you need software with it to disable the stupid battery-draining LEDs, as well as the profile switching (disabling it), adjusting DPI, etc. Without GHub, you cannot adjust the DPI.With that said, Logitech Gaming Software has been excellent for many, many years. It has a great tuning feature. It was clean. It did exactly what I wanted. It worked with the G502 (and every other Logitech I had). Unfortunately, the idiots at Logitech decided to not support this mouse with it. After an hour or two of frustration, I found out that they want you to use a new piece of garbage program called ""GHub"".So far I hate GHub. It frequently locks up and closing it won't fix the problem. All of a sudden I'll notice my DPI is way too slow and have to restart GHub, but the problem is, closing it from the Task Tray doesn't close it fully when it's locked up. I have to go to the Task Manager and find the couple GHub programs and crash them all. Then it will restart fine and my settings will be saved.Also, either the program or the mouse goes into a sleep mode state quite often when not in use, so this means every time you grab your mouse to use it, it will be really slow until the GHub (or mouse) kicks back into gear and applies your settings. I'm not quite sure why my DPI is tied to GHub running. LGS wasn't like this.Good job, Logitech."
...,...,...,...,...
185,"Logitech G502 Lightspeed Wireless Gaming Mouse with Hero 25K Sensor, PowerPlay Compatible, Tunable Weights and Lightsync RGB - Black",Favorite mouse ever but wireless,5,I always said only way I’m getting a new mouse is if they make a G502 wireless and here it is. I love it
186,"Logitech G502 Lightspeed Wireless Gaming Mouse with Hero 25K Sensor, PowerPlay Compatible, Tunable Weights and Lightsync RGB - Black",Save yourself and do not buy this mouse for now.,1,"First of all, I will say this was a nice addition to the Logitech G Products. I currently own the G502, G502 Lightspeed, G910 keyboard, G933 headset, and Powerplay Mousepad. The only reason I am giving this a one star is because of the software needed to download to program the mouse. I have a pretty dang good computer but the second I open the software my CPU maxes out to 99% usage, everything in the program is non-responsive and it will just infuriate you. If G-hub sticks aroud I will never by a Logitech accessory again. If they make G-hub not pull your hair out bad I would switch this to 5 stars."
187,"Logitech G502 Lightspeed Wireless Gaming Mouse with Hero 25K Sensor, PowerPlay Compatible, Tunable Weights and Lightsync RGB - Black",Solid gaming mouse and so far no double clicking issues,5,"I was worried after experiencing the G Pro Wireless' double clicking, but so far after using the G502 for a few weeks I've not experienced the double clicking at all. I have run into another issue where the G502 misreads the battery and always displayed ""low battery."" This happened the entire first week so I let the battery fully die (so the battery's internal clock would ""reset"") and haven't had that issue since.Manufacturer's packing is great. It comes with a premium braided USB cable, wireless receiver, and an hard shell case that holds the weights. I love the design of this mouse over the G Pro Wireless especially since I have big hands as well as use ""claw"" style so having a slightly bigger mouse that my hands can wrap around feels better. Since it is heavier it does move noticeably slower across surfaces than lighter mouses.The 16K DPI Hero sensor is just superb and the extra battery life is tremendous. It can even be used on glass surfaces with no tracking issues. It even has a button for a scroll wheel mode (continuous scroll or standard click step up.) There is also textured grips on the side. My only gripes are that for the price it could use a bit more RGB and the 2 middle buttons are flimsy. The small G logos and battery indicator only lighting up is kind of disappointing. The G Hub software allows you to map any command to any button. There's a database of hundreds of commands to choose from including basic windows commands like copy/paste.Overall, I would definitely recommend this mouse. Performance and ergonomic wise it doesn't get much better than this. The 5 minute charge giving it 2.5 hours is not only amazing, but a lifesaver for those that frequently forget to charge their mouse! 
If you're thinking about cutting the cord and getting rid of your wired mouse this would be the perfect replacement. Hopefully a revised G903 is coming down the pipeline as well."
188,"Logitech G502 Lightspeed Wireless Gaming Mouse with Hero 25K Sensor, PowerPlay Compatible, Tunable Weights and Lightsync RGB - Black",*UPDATED* Don't buy it!,1,"As fast or faster than the wired version. I use the Logitech charging mouse pad. The only issue is I have to turn it off and on someones to fix the LEDs from not matching or loading the profile normally fixes it.When it comes to the Logitech Program... its annoying AF.Update:12/5/2019The double-clicking has gotten really noticeable. Not bad enough it affects me in FPS games cause I am just shootings. I do notice it when using google and I double click things. Happens 1/20 or sometimes more or less. Its sad the mouse is not as good as the wired version. I have 5 of the wired versions for my desktop, laptop, traveling, work, and at my other home. The mouse otherwise is great other than sometimes it losing the profile I am on or not being detected in the GHUB which is a GHUB issue and always has been unless you use wired. I take that back cause it has even lost my speakers and my keyboard that is wired. The GHUB program is a terrible man. I wouldn't use it but for the G502 Light Speed you cant use the old software that is simple and has no issues. I have thought about switching back to the old wired G502 and gaming back to the old program, but I will wait until I get the point where I can't stand the double-clicking. Most other games from FPS I normally use a controller for RPG, sports, or racing games. GHUB is terrible. Don't even get me started about that stupid program or just google all the issues people have had with it. I once had to reformat my computer to get it uninstalled so I could reinstall it to get it to update because the update froze as it does for a lot of people and the ways to fix it were not working like normal. I would not use this mouse. The price is high but so are all really nice things and the double-clicking and software which I think is the worse part I would not buy right now."


In [24]:
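# The translator accepts fewer than 5000 characters per text, so each review is cut to 4999.
# Missing bodies become empty strings so the later text steps do not fail on NaN.
rev["body"] = rev["body"].fillna("").astype(str).str[:4999]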
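Cutting each review at 4999 characters throws away the end of the longest reviews. If the full text is needed, one alternative, sketched here under the assumption that the translator's limit applies per request, is to split a long review on whitespace into pieces below the limit, translate each piece, and join the results. `chunk_text` is a hypothetical helper.

```python
from typing import List


def chunk_text(text: str, limit: int = 4999) -> List[str]:
    """Split text on whitespace into chunks of at most `limit` characters.

    A single word longer than `limit` is cut into limit-sized pieces.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # Cut over-long words so no chunk can exceed the limit.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= limit:
            current += " " + word
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to `translator.translate` separately and the translations joined with spaces.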

In [25]:
# Remove punctuation (regex=True is required in pandas >= 2.0, where the default is a literal match)
rev["body"] = rev["body"].str.replace(r'[^\w\s]', '', regex=True)

# Remove digits
rev["body"] = rev["body"].str.replace(r'\d', '', regex=True)

# Lowercase before removing stopwords, because NLTK's stopword list is lowercase
rev["body"] = rev["body"].str.lower()

# Remove stopwords, but keep negations, which change the sentiment of a sentence
import nltk
#nltk.download('stopwords')  # run once
from nltk.corpus import stopwords
sw = set(stopwords.words('english')) - {"not", "no", "nor"}
rev["body"] = rev["body"].apply(lambda text: " ".join(w for w in text.split() if w not in sw))

# Lemmatization
from textblob import Word
#nltk.download('wordnet')  # run once
rev["body"] = rev["body"].apply(lambda text: " ".join(Word(w).lemmatize() for w in text.split()))
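Two details of this cleaning step matter for the sentiment analysis later on: NLTK's stopword list is lowercase, so text has to be lowercased before filtering, and the list contains negations such as "not", whose removal can flip a review's polarity. The sketch below shows both effects using only the standard library; the tiny hand-written `STOPWORDS` set is an assumption standing in for `stopwords.words('english')`.

```python
import re

# Tiny illustrative stopword set; the notebook uses NLTK's English list instead.
STOPWORDS = {"the", "i", "is", "it", "not", "this"}
NEGATIONS = {"not", "no", "nor"}


def clean(text: str, keep_negations: bool = True) -> str:
    """Strip punctuation and digits, lowercase, then drop stopwords."""
    text = re.sub(r"[^\w\s]", "", text)
    text = re.sub(r"\d", "", text)
    text = text.lower()  # lowercase first: the stopword list is lowercase
    stop = STOPWORDS - NEGATIONS if keep_negations else STOPWORDS
    return " ".join(w for w in text.split() if w not in stop)


print(clean("The mouse is NOT good!"))                        # mouse not good
print(clean("The mouse is NOT good!", keep_negations=False))  # mouse good
```

With the negation dropped, "mouse good" reads as praise even though the review said the opposite.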

In [26]:
#We translate reviews from any language to English
rev["body_en"] = rev["body"].map(lambda x:translator.translate(x,lang_src="auto", lang_tgt="en"))
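The `map` call sends one translation request per review, so identical review texts, if any, are translated again each time. A minimal sketch, assuming the translator is passed in as any function from text to text, memoizes the results so each distinct text is translated only once. `make_cached_translator` is a hypothetical helper, and the commented usage lines assume the `translator` object created earlier.

```python
from typing import Callable, Dict


def make_cached_translator(translate: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap `translate` so each distinct input string is translated only once."""
    cache: Dict[str, str] = {}

    def cached(text: str) -> str:
        if text not in cache:
            cache[text] = translate(text)
        return cache[text]

    return cached


# Possible notebook usage (assumes the google_translator instance from above):
# to_en = make_cached_translator(lambda t: translator.translate(t, lang_src="auto", lang_tgt="en"))
# rev["body_en"] = rev["body"].map(to_en)
```

`functools.lru_cache` would work as well; the explicit dictionary just makes the caching visible.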

In [None]:
rev

In [None]:
# Add a sentiment column with each review's TextBlob polarity score, from -1.0 (most negative) to 1.0 (most positive)
rev["sentiment"] = rev["body_en"].map(lambda x: TextBlob(x).sentiment.polarity)
rev

In [None]:
# Label each polarity score as Positive, Negative or Neutral
def label_polarity(score):
    if score < 0:
        return "Negative"
    elif score > 0:
        return "Positive"
    return "Neutral"

rev["sentiment"] = rev["sentiment"].apply(label_polarity)
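The labelling step treats only an exact 0.0 polarity as neutral, so a review scoring 0.01 counts as positive. A common variation, shown here as an assumption rather than the notebook's rule, adds a small neutral band around zero; the 0.05 threshold used below is arbitrary. `Counter` then gives the same counts as `value_counts()` using only the standard library. `label_with_band` and `count_labels` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable


def label_with_band(score: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1, 1] to Negative / Neutral / Positive.

    Scores with |score| <= neutral_band are Neutral; the default of 0.0
    matches the notebook's rule (only exactly 0 is Neutral).
    """
    if score > neutral_band:
        return "Positive"
    if score < -neutral_band:
        return "Negative"
    return "Neutral"


def count_labels(scores: Iterable[float], neutral_band: float = 0.0) -> Counter:
    """Count labels, similar to rev["sentiment"].value_counts()."""
    return Counter(label_with_band(s, neutral_band) for s in scores)


scores = [0.5, -0.2, 0.0, 0.01, -0.03]
print(count_labels(scores))                     # exact-zero rule
print(count_labels(scores, neutral_band=0.05))  # with a neutral band
```

With the band, the two weakly scored reviews (0.01 and -0.03) move into Neutral.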

In [None]:
rev

In [None]:
# Count how many reviews fall into each sentiment class; the next cell plots these counts as a bar chart
rev["sentiment"].value_counts()

In [None]:
rev["sentiment"].value_counts().plot(kind="bar")