## In this tutorial we're going to scrape quotes from [Quotes to Scrape](http://quotes.toscrape.com/) 

This will be done with the [requests](https://3.python-requests.org/) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) libraries.

Take a quick look at their Quickstart sections to get a feel for their purpose.

Write your code in the cells under 'Your code'. The solution is shown under each question.

## Scraping always starts with examining the website

Go to http://quotes.toscrape.com/tag/humor/ and open the developer tools (F12).

Right-click an item on the page and choose Inspect to get a feeling for the structure of the HTML.


## Assign the URL of the site we want to scrape to a variable

```
url = "http://quotes.toscrape.com/tag/humor/"
```

## Your code

## Retrieve the page with the requests library, assign the result to a variable

```
import requests  
response = requests.get(url)
```
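
A request can fail or come back with an error page, so it can help to check the result before parsing it. The following is a minimal sketch, not part of the original solution: `fetch_html` is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the HTML of `url` as text, or raise if the request fails."""
    # timeout=10 is an assumed value; without a timeout the call can wait indefinitely.
    response = requests.get(url, timeout=timeout)
    # Raises requests.exceptions.HTTPError for 4xx and 5xx status codes.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html(url)` returns the same string that `response.text` holds, but stops with an exception instead of silently returning an error page.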

## Your code

## Examine the result, extract the useful part

The text of the response is the same HTML we see with 'View page source' or 'Inspect'.
```
text = response.text
```

## Your code

## Hint : the header is not needed

Only print the `<body>` of the HTML
```
print(text[text.index("<body>"):])
```
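
Searching for the exact string `"<body>"` raises a `ValueError` when the tag carries attributes, for example `<body class="x">`. The sketch below is one possible, more tolerant variant; `body_of` and the sample string are made up for illustration.

```python
def body_of(html):
    """Return the HTML from the opening body tag onward, or all of it if no body tag is found."""
    start = html.find("<body")  # also matches <body class="...">
    return html[start:] if start != -1 else html


# Hand-written sample, not taken from the site.
sample = '<html><head><title>t</title></head><body class="x"><p>hi</p></body></html>'
print(body_of(sample))
```

With the page from this tutorial, `print(body_of(text))` would be the equivalent call.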

## Your code

## Find the structure of a single quote

```
<span class="text" itemprop="text">
    “A day without sunshine is like, you know, night.”
</span>
<span>
    by
    <small class="author" itemprop="author">
        Steve Martin
    </small>
</span>
```



If we look closely, we can see that each quote is contained in a **span** with the attribute **itemprop** set to **text**.  
The author is in a **small** with **itemprop** set to **author**.

## Import BeautifulSoup from the bs4 package and parse the text of the document

```
from bs4 import BeautifulSoup
soup = BeautifulSoup(text, 'html.parser')
```

## Your code

## Loop through all the quotes and print the author and the quote



```
for quote in soup.find_all('span', {'itemprop': 'text'}):
    # The author is the first matching <small> that follows the quote in the document
    author = quote.find_next('small', {'itemprop': 'author'}).text

    print()
    print(author, "-", quote.text)
```
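
Printing is fine for a first look, but collecting the results makes them easier to reuse. The sketch below runs the same loop on a small hand-written snippet that mimics the quote structure shown earlier, so it works without a network connection. The names `sample_html`, `sample_soup` and `quotes` are assumptions for this example only.

```python
from bs4 import BeautifulSoup

# Hand-written sample that mimics the structure of one quote (not fetched from the site).
sample_html = """
<div class="quote">
  <span class="text" itemprop="text">“A day without sunshine is like, you know, night.”</span>
  <span>by <small class="author" itemprop="author">Steve Martin</small></span>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

quotes = []
for quote in sample_soup.find_all("span", {"itemprop": "text"}):
    author = quote.find_next("small", {"itemprop": "author"})
    # get_text(strip=True) removes the surrounding whitespace and newlines.
    quotes.append({
        "author": author.get_text(strip=True),
        "text": quote.get_text(strip=True),
    })

print(quotes)
```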

## Hint : look at the BeautifulSoup documentation

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#calling-a-tag-is-like-calling-find-all



## Your code

## Advanced : get the URL of the next page

As you can see, there are 10 quotes per page, but there are 12 quotes available.  
Luckily, the page has a *next* button.

## Find the URL for the next page by inspecting the HTML

- Find the element that contains the link to the next page
- Extract the path from that element (hint : see documentation)
- Create a URL from the path (hint : a URL consists of a domain and a path)



Find the link to the next page
```
li = soup.find('li', {"class": "next"})
```

Get the URL for the next page

```
path = li.find('a')['href']  # the <a> inside the <li class="next">
print(path)
url = "http://quotes.toscrape.com%s" % path
print(url)
```
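
String formatting works here because the path starts with a slash. As a hedged alternative, `urljoin` from the standard library combines a base URL with either an absolute or a relative path. The value of `path` below is an assumed example of what the `href` might contain.

```python
from urllib.parse import urljoin

base_url = "http://quotes.toscrape.com/tag/humor/"
path = "/tag/humor/page/2/"  # assumed example value of the href

next_url = urljoin(base_url, path)
print(next_url)
```

A relative path such as `"page/2/"` would resolve to the same URL, because `base_url` ends with a slash.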

## Your code

## Retrieve the remaining quotes



This is similar to the code above
```
response = requests.get(url)
text = response.text
soup = BeautifulSoup(text, 'html.parser')
for quote in soup.find_all('span', {'itemprop': 'text'}):
    author = quote.find_next('small', {'itemprop': 'author'}).text

    print()
    print(author, "-", quote.text)
```
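
The steps above can be combined into a loop that keeps following the *next* link until there is none. This is a sketch under assumptions, not the tutorial's solution: `scrape_all` is a hypothetical name, and the page is fetched through a `fetch` function passed in as an argument, so the loop can be tried on saved HTML as well as on the live site.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def scrape_all(start_url, fetch):
    """Collect (author, quote) pairs from start_url and every following page.

    `fetch` is any function that takes a URL and returns the page HTML as text.
    """
    results = []
    url = start_url
    while url:
        soup = BeautifulSoup(fetch(url), "html.parser")
        for quote in soup.find_all("span", {"itemprop": "text"}):
            author = quote.find_next("small", {"itemprop": "author"})
            results.append((author.get_text(strip=True), quote.get_text(strip=True)))
        # The next button is an <li class="next"> wrapping a link.
        # When it is missing we are on the last page and the loop stops.
        li = soup.find("li", {"class": "next"})
        url = urljoin(url, li.find("a")["href"]) if li else None
    return results
```

With the requests library this could be called as `scrape_all(url, lambda u: requests.get(u).text)`. The loop assumes the *next* links never point back to an earlier page.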

## Your code