### Find all the quotes on a page with pagination

<a href="http://quotes.toscrape.com">http://quotes.toscrape.com</a>

If you scroll down, you'll notice a 'Next' button.  
This takes us to http://quotes.toscrape.com/page/2/  
On the second page there's also a 'Previous' button.  
That takes us to http://quotes.toscrape.com/page/1/

So each page is identified by the path (**page/X/**) that follows the hostname (**quotes.toscrape.com**).
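As a small illustration of that URL scheme, the sketch below builds the address of a numbered page from the hostname and the path. The helper name `page_url` is hypothetical and not part of the notebook; it only assumes the **page/X/** pattern observed above.

```python
from urllib.parse import urljoin

BASE_URL = "http://quotes.toscrape.com"


def page_url(page_number):
    """Return the absolute URL of a numbered page (hypothetical helper)."""
    # The path follows the pattern seen on the site: /page/X/
    return urljoin(BASE_URL, "/page/%d/" % page_number)


for n in (1, 2):
    print(page_url(n))
```

This prints the two addresses seen when clicking 'Next' and 'Previous'.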

---

### Define a function that prints the quotes from a soup object

This is explained in [01_Scraping](01_Scraping.ipynb)

In [1]:
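from bs4 import BeautifulSoup

def print_quotes(soup):
    # Each quote's text sits in a <span itemprop="text">
    for quote in soup.find_all('span', {'itemprop': 'text'}):
        # The author is the first <small itemprop="author"> after that span
        author = quote.find_next('small', {'itemprop': 'author'}).text
        # Left-align the author in 20 columns, show the first 50 characters of the quote
        print("%-20s" % author, quote.text[:50])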
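If the quotes are needed for more than printing, the same lookup can return data instead. The sketch below is one possible variant, not the notebook's own function: `extract_quotes` is a hypothetical name, and `SAMPLE_HTML` is a hand-written fragment that only imitates the two attributes the lookup relies on, so it runs without network access.

```python
from bs4 import BeautifulSoup

# Hand-written sample imitating the markup the lookup relies on;
# the real page contains more tags and attributes.
SAMPLE_HTML = """
<div>
  <span itemprop="text">“A short quote.”</span>
  <span>by <small itemprop="author">Some Author</small></span>
</div>
<div>
  <span itemprop="text">“Another quote.”</span>
  <span>by <small itemprop="author">Other Author</small></span>
</div>
"""


def extract_quotes(soup):
    """Return a list of (author, text) tuples instead of printing them."""
    pairs = []
    for quote in soup.find_all("span", {"itemprop": "text"}):
        # The author element is the first matching <small> after the quote
        author = quote.find_next("small", {"itemprop": "author"})
        pairs.append((author.text, quote.text))
    return pairs


sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
for author, text in extract_quotes(sample_soup):
    print("%-20s" % author, text[:50])
```

Returning tuples keeps the parsing separate from the formatting, so the same result could be printed, counted, or written to a file.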

Check if the function works.

In [2]:
import requests
url = "http://quotes.toscrape.com"
response = requests.get(url)
text = response.text
soup = BeautifulSoup(text,'html.parser')
print_quotes(soup)

Albert Einstein      “The world as we have created it is a process of o
J.K. Rowling         “It is our choices, Harry, that show what we truly
Albert Einstein      “There are only two ways to live your life. One is
Jane Austen          “The person, be it gentleman or lady, who has not 
Marilyn Monroe       “Imperfection is beauty, madness is genius and it'
Albert Einstein      “Try not to become a man of success. Rather become
André Gide           “It is better to be hated for what you are than to
Thomas A. Edison     “I have not failed. I've just found 10,000 ways th
Eleanor Roosevelt    “A woman is like a tea bag; you never know how str
Steve Martin         “A day without sunshine is like, you know, night.”


---

### Define a function that retrieves and parses a page

It takes a path as its parameter and returns a 'soup'.

In [3]:
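def get_page(path):
    # Build the full URL from the site root and the path, e.g. "/page/2/"
    url = "http://quotes.toscrape.com%s" % path
    response = requests.get(url)
    text = response.text
    # Parse the HTML with Python's built-in parser
    return BeautifulSoup(text, 'html.parser')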
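A more defensive variant might add a timeout and fail loudly on HTTP errors. The sketch below is an assumption about what such a variant could look like, not part of the notebook: the name `get_page_checked`, the `session` parameter, and the 10-second default are all hypothetical.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "http://quotes.toscrape.com"  # same host as in the notebook


def get_page_checked(path, session=None, timeout=10):
    """Fetch BASE_URL + path and return a soup, raising on HTTP errors.

    `session` is a hypothetical hook: any object with a requests-style
    .get(url, timeout=...) method, for example a requests.Session.
    """
    http = session if session is not None else requests
    response = http.get(urljoin(BASE_URL, path), timeout=timeout)
    # Raises requests.HTTPError for 4xx and 5xx status codes
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")
```

Passing a `requests.Session` would also reuse the connection across pages, which can help when many pages are fetched in a row.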

Check if the function works.

In [4]:
soup = get_page("/page/1/")
print_quotes(soup)

Albert Einstein      “The world as we have created it is a process of o
J.K. Rowling         “It is our choices, Harry, that show what we truly
Albert Einstein      “There are only two ways to live your life. One is
Jane Austen          “The person, be it gentleman or lady, who has not 
Marilyn Monroe       “Imperfection is beauty, madness is genius and it'
Albert Einstein      “Try not to become a man of success. Rather become
André Gide           “It is better to be hated for what you are than to
Thomas A. Edison     “I have not failed. I've just found 10,000 ways th
Eleanor Roosevelt    “A woman is like a tea bag; you never know how str
Steve Martin         “A day without sunshine is like, you know, night.”


---

### Define a function that finds the next page

It returns None if there is no next page.

If we inspect the 'Next' button, we'll see that the path of the next page is in the **href** attribute of the a-tag.  
That a-tag is contained in an li-tag with the CSS class **next**.  

```
<li class="next">
    <a href="/page/2/">
        Next
        <span aria-hidden="true">
            →
        </span>
    </a>
</li>
```
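To try the lookup on this markup without fetching anything, the fragment can be parsed directly. The sketch below shows two ways that should give the same result: `find()` with an attribute filter, and a CSS selector via `select_one()`. The variable name `PAGER_HTML` is hypothetical, and the fragment is a condensed copy of the snippet above.

```python
from bs4 import BeautifulSoup

# Condensed copy of the 'Next' button markup
PAGER_HTML = """
<li class="next">
    <a href="/page/2/">Next <span aria-hidden="true">→</span></a>
</li>
"""

pager_soup = BeautifulSoup(PAGER_HTML, "html.parser")

# Option 1: find the li-tag by class, then the a-tag inside it
li = pager_soup.find("li", {"class": "next"})
print(li.find("a")["href"])

# Option 2: a CSS selector for "a-tag directly inside li.next"
link = pager_soup.select_one("li.next > a")
print(link["href"])
```

Both lookups return `None` when the element is absent, which is what makes the "no next page" check possible.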

In [6]:
def get_next_page(soup):
    # Find link to the next page
    li = soup.find('li', {"class": "next"})
    
    if li is None:
        # No next page
        return None
    else:
        # Get the path for the next page
        return li.find_next('a')['href']

Again, check if this works.

In [9]:
soup = get_page("/page/1/")
print(get_next_page(soup))

# We happen to know there are only 10 pages.
soup = get_page("/page/10/")
print(get_next_page(soup))

/page/2/
None


---

### Putting it all together

Get a page and print all its quotes.  
Then get the next page, and stop when there is no next page.

<pre>
page = "/page/1/"

while True:
    # Get the 'soup'
    soup = get_page(page)
    
    # Print the quotes
    print_quotes(soup)
    
    # Get the next page
    page = get_next_page(soup)
    
    # No new page, we're done
    if page is None:
        break   
        
</pre>
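The loop above can also be wrapped in a generator, which separates "walking the pages" from "doing something with each page". This is only a sketch under assumptions: `iter_pages` and `max_pages` are hypothetical names, and the fetch and next-page functions are passed in as arguments so the example can run offline with stand-ins.

```python
def iter_pages(first_path, get_page, get_next_page, max_pages=100):
    """Yield the parsed page for first_path and every page linked after it.

    get_page and get_next_page are passed in so the generator can be tried
    without network access; max_pages is a hypothetical safety limit.
    """
    path = first_path
    visited = set()
    # Stop when there is no next page, a page repeats, or the limit is hit
    while path is not None and path not in visited and len(visited) < max_pages:
        visited.add(path)
        soup = get_page(path)
        yield soup
        path = get_next_page(soup)


# Offline stand-ins: each "page" is just its path, and a dict maps it to the next one
FAKE_LINKS = {"/page/1/": "/page/2/", "/page/2/": "/page/3/", "/page/3/": None}

for page in iter_pages("/page/1/", lambda path: path, FAKE_LINKS.get):
    print(page)
```

With the notebook's own functions, the call would presumably look like `for soup in iter_pages("/page/1/", get_page, get_next_page): print_quotes(soup)`. The `visited` set guards against a page that links back to an earlier one, a case the plain `while True` loop would repeat forever.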

In [10]:
page = "/page/1/"
while True:
    soup = get_page(page)
    print_quotes(soup)
    page = get_next_page(soup)
    if page is None:
        break   

Albert Einstein      “The world as we have created it is a process of o
J.K. Rowling         “It is our choices, Harry, that show what we truly
Albert Einstein      “There are only two ways to live your life. One is
Jane Austen          “The person, be it gentleman or lady, who has not 
Marilyn Monroe       “Imperfection is beauty, madness is genius and it'
Albert Einstein      “Try not to become a man of success. Rather become
André Gide           “It is better to be hated for what you are than to
Thomas A. Edison     “I have not failed. I've just found 10,000 ways th
Eleanor Roosevelt    “A woman is like a tea bag; you never know how str
Steve Martin         “A day without sunshine is like, you know, night.”
Marilyn Monroe       “This life is what you make it. No matter what, yo
J.K. Rowling         “It takes a great deal of bravery to stand up to o
Albert Einstein      “If you can't explain it to a six year old, you do
Bob Marley           “You may not be her first, her last, or her