# Handling Pagination and Navigation

<div style="text-align:center">
    <img src="https://www.seoptimer.com/storage/images/2019/08/image3-3.png" width=600>
</div>

Scraping data from a website often means navigating through a series of pages. Perhaps you want to scrape the results of a search, or the items in a category. Most websites split such listings across several pages and expose links to the next, previous, first, and last page.

This notebook will help you understand how to handle pagination and navigation using Beautiful Soup and Python.

## Why is there a need for pagination?

Pagination is used to break up a large set of data into smaller, more manageable chunks. This is done to improve the user experience by making it easier to read and navigate through the data. It also helps to reduce the load on the server by only loading a small amount of data at a time.

## How is pagination implemented?

There are several ways to implement pagination. The most common methods are:

1. **Next and Previous Buttons**: This is the most common method of pagination. It is used to navigate through a series of pages by clicking on the next or previous button.

2. **Infinite Scroll**: This method is used to load more data as the user scrolls down the page.

3. **Load More Button**: This method is used to load more data by clicking on a load more button.

In this notebook, we will focus on the first method. The last two methods usually rely on JavaScript running in the browser, which Beautiful Soup cannot execute. We will explore scraping dynamic websites in other notebooks.


## Next and Previous Buttons

This case is straightforward. Let's begin by fetching and parsing the HTML content of the first page of this [website](https://quotes.toscrape.com/).

In [18]:
import requests
from bs4 import BeautifulSoup

base_url = "https://quotes.toscrape.com/"
response = requests.get(base_url)
soup = BeautifulSoup(response.text, "html.parser")

quotes = soup.find_all("div", class_="quote")
quotes[0]

<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>
<span>by <small class="author" itemprop="author">Albert Einstein</small>
<a href="/author/Albert-Einstein">(about)</a>
</span>
<div class="tags">
            Tags:
            <meta class="keywords" content="change,deep-thoughts,thinking,world" itemprop="keywords"/>
<a class="tag" href="/tag/change/page/1/">change</a>
<a class="tag" href="/tag/deep-thoughts/page/1/">deep-thoughts</a>
<a class="tag" href="/tag/thinking/page/1/">thinking</a>
<a class="tag" href="/tag/world/page/1/">world</a>
</div>
</div>

Now, let's check whether there is indeed a link to the next page.

In [12]:
next_tag = soup.find(class_="next")
next_tag

<li class="next">
<a href="/page/2/">Next <span aria-hidden="true">→</span></a>
</li>

In [17]:
next_link = next_tag.find("a")["href"]
next_link

'/page/2/'

There is! I identified the correct tag class by inspecting the HTML content of the page. On some websites, you might need to use the `find_all` method to collect all the links and then filter for the one you need.
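As a minimal sketch of that filtering approach, the snippet below runs on a small, made-up pager whose links have no helpful class names. The HTML string and the rule "link text starts with Next" are assumptions for illustration; a real site may need a different test.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a pager whose links carry no helpful class names.
html = """
<nav>
  <a href="/page/1/">Previous</a>
  <a href="/page/2/">2</a>
  <a href="/page/3/">Next</a>
</nav>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect every link that has an href, then keep those whose text starts with "Next".
links = soup.find_all("a", href=True)
next_links = [
    a["href"] for a in links if a.get_text(strip=True).lower().startswith("next")
]

# Fall back to None when no matching link exists (for example, on the last page).
next_link = next_links[0] if next_links else None
print(next_link)  # /page/3/
```

Filtering on the link text is fragile if the site is translated or uses icons only, so matching on an attribute such as `rel="next"` may be a better choice where it is available.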

Let's fetch the content of the next page and check if it worked. Since the `href` attribute of the link is a relative path, we need to combine it with the base URL of the website to form a full URL.
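To make the difference between ways of combining the two parts concrete, here is a small sketch that compares plain string concatenation with `urljoin` from the standard library. The values mirror the ones scraped above. Whether a server tolerates a doubled slash depends on the site, so treat the concatenated form as something to check rather than rely on.

```python
from urllib.parse import urljoin

base_url = "https://quotes.toscrape.com/"
next_link = "/page/2/"

# Plain concatenation keeps both slashes when the base URL ends with one.
concatenated = base_url + next_link

# urljoin resolves the href against the base URL the way a browser would.
joined = urljoin(base_url, next_link)

print(concatenated)  # https://quotes.toscrape.com//page/2/
print(joined)        # https://quotes.toscrape.com/page/2/

# A root-relative href replaces the whole path of the page it was found on.
from_page_two = urljoin("https://quotes.toscrape.com/page/2/", "/page/3/")
print(from_page_two)  # https://quotes.toscrape.com/page/3/
```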

In [19]:
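from urllib.parse import urljoin

# Resolve the relative href against base_url (avoids a doubled slash).
response = requests.get(urljoin(base_url, next_link))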
soup = BeautifulSoup(response.text, "html.parser")

quotes = soup.find_all("div", class_="quote")
quotes[0]

<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">“This life is what you make it. No matter what, you're going to mess up sometimes, it's a universal truth. But the good part is you get to decide how you're going to mess it up. Girls will be your friends - they'll act like it anyway. But just remember, some come, some go. The ones that stay with you through everything - they're your true best friends. Don't let go of them. Also remember, sisters make the best friends in the world. As for lovers, well, they'll come and go too. And baby, I hate to say it, most of them - actually pretty much all of them are going to break your heart, but you can't give up because if you give up, you'll never find your soulmate. You'll never find that half who makes you whole and that goes for everything. Just because you fail once, doesn't mean you're gonna fail at everything. Keep trying, hold on, and always, always, always believe in yourself, 

Notice that the first quote on the second page is different from the first quote on the first page. This means we successfully navigated to the next page. Putting it all together:

In [24]:
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://quotes.toscrape.com"


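def get_soup(url):
    """Fetch a page and return its parsed HTML."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    return BeautifulSoup(response.text, "html.parser")


def get_quote_info(quote):
    """Extract the text, author, and author page URL from one quote block."""
    text = quote.find("span", class_="text").get_text()
    author = quote.find("small", class_="author").get_text()
    # The first <a> inside a quote is the "(about)" link to the author page.
    author_link = BASE_URL + quote.find("a")["href"]
    return {
        "text": text,
        "author": author,
        "author_link": author_link,
    }


def scrape_quotes(soup):
    """Return the info dict for every quote on the page."""
    quotes = soup.find_all("div", class_="quote")
    return [get_quote_info(quote) for quote in quotes]


def get_next_link(soup):
    """Return the relative URL of the next page, or None on the last page."""
    next_tag = soup.find(class_="next")
    return next_tag.find("a")["href"] if next_tag else None


# Scrape the first page, then keep following "Next" until there is none.
soup = get_soup(BASE_URL)
quote_info = scrape_quotes(soup)

next_link = get_next_link(soup)
while next_link:
    soup = get_soup(BASE_URL + next_link)
    quote_info += scrape_quotes(soup)
    next_link = get_next_link(soup)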

len(quote_info), quote_info[:3]

(100,
 [{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”',
   'author': 'Albert Einstein',
   'author_link': 'https://quotes.toscrape.com/author/Albert-Einstein'},
  {'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”',
   'author': 'J.K. Rowling',
   'author_link': 'https://quotes.toscrape.com/author/J-K-Rowling'},
  {'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”',
   'author': 'Albert Einstein',
   'author_link': 'https://quotes.toscrape.com/author/Albert-Einstein'}])
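The loop above trusts the site to eventually stop offering a "Next" link. As a hedged sketch of a more defensive variant, the generator below adds a page cap and a set of visited URLs so that a pager which links back to an earlier page cannot trap the scraper. The names `iter_pages`, `fetch`, `max_pages`, and `FAKE_SITE` are hypothetical, and the in-memory pages exist only so the example runs without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iter_pages(start_url, fetch, max_pages=50):
    """Yield (url, soup) for each page reached by following "next" links.

    `fetch` is any callable that takes a URL and returns HTML text.
    """
    url = start_url
    seen = set()
    # Stop on a missing link, a repeated URL, or when the page cap is reached.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        soup = BeautifulSoup(fetch(url), "html.parser")
        yield url, soup

        next_tag = soup.find(class_="next")
        link = next_tag.find("a") if next_tag else None
        # Resolve the href against the page it was found on.
        url = urljoin(url, link["href"]) if link else None


# Hypothetical in-memory "site"; the last page links back to the first
# to show that the visited-set guard ends the loop.
FAKE_SITE = {
    "https://example.com/": '<li class="next"><a href="/page/2/">Next</a></li>',
    "https://example.com/page/2/": '<li class="next"><a href="/page/3/">Next</a></li>',
    "https://example.com/page/3/": '<li class="next"><a href="/">Next</a></li>',
}

visited = [url for url, _ in iter_pages("https://example.com/", FAKE_SITE.__getitem__)]
print(visited)
```

Against a live site, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`, and each yielded `soup` could be passed to `scrape_quotes`.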