# Web Scraping with [Python](https://www.python.org/) using [`BeautifulSoup`](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) and [`requests`](https://2.python-requests.org/en/master/)

The task is to scrape user reviews of a Beiersdorf product and analyse their content by generating a word cloud.

__Set up__

In [None]:
%load_ext autoreload
%autoreload 2

__Import of Python modules__

In [None]:
import sys
import time
import random
from bs4 import BeautifulSoup
import requests

__Add path to look for modules__

In [None]:
import sys
sys.path.append("../src")

In [None]:
import helper_functions as hf

## Task 1: Set the URL to get scraped
To reuse our helper functions, the page you choose has to satisfy two assumptions:
1. The page shows a product.
2. The product has at least one review.

This is a good starting point to look for a product: https://www.makeupalley.com/search?q=nivea


In [None]:
# example: https://www.makeupalley.com/product/showreview.asp/ItemId=148506/Visage-Q10-Plus-Anti-Wrinkle-Night-Cream/NIVEA/Moisturizers
url = None # pick your URL
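
A quick sanity check can catch a forgotten or malformed URL before any request is sent. The sketch below is only an illustration: `looks_like_product_url` is a hypothetical helper (not part of `helper_functions`), and the `ItemId=` check is inferred solely from the example URL above, so it may not hold for every product page.

```python
from urllib.parse import urlparse


def looks_like_product_url(candidate):
    """Return True if `candidate` resembles the example product URL.

    Hypothetical helper: the `ItemId=` pattern is an assumption taken from
    the example URL, not a documented rule of the site.
    """
    if not candidate:  # covers None and the empty string
        return False
    parts = urlparse(candidate)
    has_web_scheme = parts.scheme in ("http", "https")
    has_host = bool(parts.netloc)
    has_item_id = "ItemId=" in parts.path
    return has_web_scheme and has_host and has_item_id


example = (
    "https://www.makeupalley.com/product/showreview.asp/"
    "ItemId=148506/Visage-Q10-Plus-Anti-Wrinkle-Night-Cream/NIVEA/Moisturizers"
)
print(looks_like_product_url(example))
print(looks_like_product_url(None))
```

A search results page such as the starting point linked above would fail this check, because its path contains no `ItemId=` segment.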

## Task 2: Generate the URLs you want to scrape
In our current approach, the number of review pages (the highest page number in the pagination) has to be retrieved manually. Open your URL of choice in a browser and look up that number.

In [None]:
max_pagination = 1 # set that value as needed
urls = hf.generate_urls(url=url, pages=(1,max_pagination))
urls
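
The implementation of `hf.generate_urls` is not shown in this notebook. As a rough illustration of what such a helper might do, the sketch below builds one URL per page by adding a query parameter. Everything about it is an assumption: the name `generate_urls_sketch`, the parameter name `page`, and the idea that pagination is driven by a query string at all. The real helper and the real site may work differently.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def generate_urls_sketch(url, pages, param="page"):
    """Build one URL per page number in the inclusive range `pages`.

    Hypothetical stand-in for `hf.generate_urls`; the query parameter
    name `page` is assumed, not taken from the site.
    """
    first, last = pages
    parts = urlparse(url)
    # Keep existing query parameters, but drop any previous page parameter.
    base_query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    result = []
    for number in range(first, last + 1):
        query = urlencode(base_query + [(param, str(number))])
        result.append(urlunparse(parts._replace(query=query)))
    return result


sketch_urls = generate_urls_sketch("https://example.com/product/42", pages=(1, 3))
for u in sketch_urls:
    print(u)
```

Treating `pages` as an inclusive `(first, last)` tuple mirrors the call `pages=(1,max_pagination)` used above, so `max_pagination = 1` yields exactly one URL.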

## Task 3: Retrieve the reviews

In [None]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept-Encoding": "gzip, deflate",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "DNT": "1",
    "Connection": "close",
    "Upgrade-Insecure-Requests": "1",
}
reviews = ''
for page_url in urls:  # separate name, so the product `url` from Task 1 is not overwritten
    print(f'Processing url {page_url}')
    page = requests.get(page_url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    text = hf.extract_text_from_soup(soup)
    reviews = reviews + " " + text
    print(f'There are {len(text.split())} words extracted.\n')
    # Pause for a random 0 to 2 seconds between requests
    time.sleep(random.randint(0,1) + random.random())

## Task 4: Generate a word cloud

Feel free to experiment with arguments of the `WordCloud` function. Look up section 3 for inspiration. 

In [None]:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# Create and generate a word cloud image:
wordcloud = WordCloud().generate(reviews)

# Display the generated image:
fig, ax = plt.subplots(figsize=(12,6))
ax.imshow(wordcloud, interpolation='bilinear')
ax.axis("off");
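
One way to experiment is to control which words reach the cloud. The sketch below counts word frequencies with the standard library and, in a second step, passes them to `WordCloud.generate_from_frequencies` together with a few constructor arguments (`width`, `height`, `background_color`, `max_words`). The helper names, the tiny stop-word set, and the sample text are illustrative assumptions; in the notebook you would pass `reviews` instead of `sample`.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); extend it for real review text.
EXTRA_STOPWORDS = {"the", "and", "this", "it", "is", "a", "i", "my"}


def word_frequencies(text, stopwords=EXTRA_STOPWORDS, min_length=2):
    """Count lower-cased words, skipping stop words and very short tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if len(t) >= min_length and t not in stopwords]
    return Counter(kept)


def build_cloud(frequencies):
    """Create a word cloud from a word -> count mapping."""
    # Imported here so the counting helper also works without `wordcloud`.
    from wordcloud import WordCloud

    cloud = WordCloud(width=1200, height=600, background_color="white", max_words=100)
    return cloud.generate_from_frequencies(frequencies)


sample = "This cream is rich. The cream absorbs fast and my skin feels soft."
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

`build_cloud(word_frequencies(reviews))` returns a `WordCloud` object that can be displayed with `ax.imshow(...)` in the same way as the `wordcloud` variable in the cell above.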

## _One potential solution_

In [None]:
# %load ../src/_solutions/free_try.py

***