## Problem 1
Using the code from the Basic Python Roundup lecture notebook, create three functions:

1. `scrape_book_results_page(page_num, headers)`: This function takes a page number and a headers dictionary as arguments and returns a dictionary with the following keys:
    - `page_url`: The URL of the books results page
    - `response`: The Response object of that page
    - `soup`: The BeautifulSoup object created from the source code
    - `book_urls`: A list of the URLs for each book on this page


2. `scrape_book_product_page(book_product_url, headers)`: This function takes a book product URL (the URL for the book product page) and a headers dictionary as arguments and returns a dictionary with the following keys:
    - `book_url`: The URL of the book product page 
    - `response`: The Response object of that page
    - `soup`: The BeautifulSoup object created from the source code


3. `scrape_book_range(page_range, filename, headers)`: This function takes a page range (`range` object), a filename for a CSV file, and a headers dictionary as arguments and will use the other two functions to scrape the book information for every book found in the specified page range. This book information should be saved as separate rows in a CSV file (see if you can include the CSV file writing code in this function).
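The `book_urls` key should hold full URLs, but the links on a results page are relative paths. A minimal sketch of resolving them with the standard library's `urllib.parse.urljoin`; the helper name `absolute_book_urls` is hypothetical, and the `href` values are placeholders rather than scraped data:

```python
from urllib.parse import urljoin


def absolute_book_urls(page_url, hrefs):
    """Resolve relative book links against the results page they came from."""
    return [urljoin(page_url, href) for href in hrefs]


# Placeholder relative links of the kind found in <div class="image_container"><a href="...">
page_url = 'https://books.toscrape.com/catalogue/page-2.html'
hrefs = ['example-book_1/index.html', 'another-book_2/index.html']

book_urls = absolute_book_urls(page_url, hrefs)
print(book_urls[0])  # https://books.toscrape.com/catalogue/example-book_1/index.html
```

Because `urljoin` uses the page's own URL as the base, the same call also works for links that are relative to a different directory (for example `catalogue/...` links on the site's home page), which is safer than hard-coding a prefix.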

Make sure to include proper documentation (docstring) for your code.

**Before writing to CSV**, make the following changes to the book data:

1. Convert `price_in_pounds` value to `float` type.
2. Convert `avg_rating` to `int` type.
3. Extract the number of available books from the `num_books_available` string and convert to `int` type.
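The three conversions can be written as small, separately testable helpers. This is a sketch assuming the scraped strings look like `'£51.77'`, a rating class word such as `'Three'` (from `class="star-rating Three"`), and `'In stock (22 available)'`; the helper names are hypothetical:

```python
import re

# Star ratings are stored as a word in the tag's class list, e.g. ['star-rating', 'Three']
RATING_WORDS = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}


def parse_price(price_text):
    """'£51.77' -> 51.77. The regex skips the currency symbol, including a
    mis-decoded 'Â£' that appears when UTF-8 text is read as Latin-1."""
    match = re.search(r'\d+(?:\.\d+)?', price_text)
    if match is None:
        raise ValueError(f'No price found in {price_text!r}')
    return float(match.group())


def parse_rating(rating_word):
    """'Three' -> 3"""
    return RATING_WORDS[rating_word]


def parse_availability(availability_text):
    """'In stock (22 available)' -> 22"""
    match = re.search(r'\d+', availability_text)
    if match is None:
        raise ValueError(f'No count found in {availability_text!r}')
    return int(match.group())


print(parse_price('£51.77'), parse_rating('Three'), parse_availability('In stock (22 available)'))
```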

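```python
import csv
import re
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = 'https://books.toscrape.com/catalogue/'

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}

# Star ratings are encoded as a word in the CSS class, e.g. "star-rating Three"
RATING_WORDS = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}


def scrape_book_results_page(page_num, headers=headers):
    """Scrape one catalogue results page of books.toscrape.com.

    Args:
        page_num (int): Number of the results page to request (starts at 1).
        headers (dict): HTTP headers to send with the request.

    Returns:
        dict: Keys 'page_url', 'response', 'soup' and 'book_urls'
        (absolute URLs of every book listed on the page).

    Raises:
        Exception: If the server does not answer with status code 200.
    """
    page_url = f'{BASE_URL}page-{page_num}.html'
    response = requests.get(page_url, headers=headers)

    # Check the status before parsing so an error page is never parsed
    if response.status_code != 200:
        raise Exception(f'The status code is not 200! It is {response.status_code}.')

    page_soup = BeautifulSoup(response.text, 'html.parser')

    # Each book thumbnail sits in <div class="image_container"><a href="...">
    book_divs = page_soup.find_all('div', attrs={'class': 'image_container'})
    # The hrefs are relative, so resolve them against the page URL
    book_urls = [urljoin(page_url, div.find('a').get('href')) for div in book_divs]

    return {'page_url': page_url, 'response': response,
            'soup': page_soup, 'book_urls': book_urls}


def scrape_book_product_page(book_product_url, headers=headers):
    """Scrape a single book product page and clean its fields.

    Args:
        book_product_url (str): Absolute URL of the book product page.
        headers (dict): HTTP headers to send with the request.

    Returns:
        dict: Keys 'book_url', 'response' and 'soup', plus 'book_info',
        a dict of the cleaned fields that are written to the CSV file.

    Raises:
        Exception: If the server does not answer with status code 200.
    """
    response = requests.get(book_product_url, headers=headers)
    if response.status_code != 200:
        raise Exception(f'The status code is not 200! It is {response.status_code}.')

    page_soup = BeautifulSoup(response.text, 'html.parser')

    # The first match of each selector belongs to the main product, not the related books
    title = page_soup.find('div', attrs={'class': 'product_main'}).find('h1').get_text()
    price_text = page_soup.find('p', attrs={'class': 'price_color'}).get_text()
    rating_tag = page_soup.find('p', attrs={'class': 'star-rating'})
    # Breadcrumb items: Home > Books > <genre> > <title>
    genre = page_soup.find('ul', attrs={'class': 'breadcrumb'}).find_all('li')[2].find('a').get_text()
    # The first <td> of the product information table holds the UPC
    upc = page_soup.find('table', attrs={'class': 'table-striped'}).find('td').get_text()
    availability = page_soup.find('p', attrs={'class': 'availability'}).get_text()

    book_info = {
        'title': title,
        # '£51.77' -> 51.77 (the regex ignores the currency symbol)
        'price_in_pounds': float(re.search(r'\d+(?:\.\d+)?', price_text).group()),
        # class list ['star-rating', 'Three'] -> 3
        'avg_rating': RATING_WORDS[rating_tag.get('class')[1]],
        'genre': genre,
        'upc': upc,
        # 'In stock (22 available)' -> 22
        'num_books_available': int(re.search(r'\d+', availability).group()),
    }

    return {'book_url': book_product_url, 'response': response,
            'soup': page_soup, 'book_info': book_info}


def scrape_book_range(page_range, filename, headers=headers):
    """Scrape every book on a range of results pages and save them to a CSV file.

    Args:
        page_range (range): Results page numbers to scrape, e.g. range(1, 3).
        filename (str): Path of the CSV file to create (overwritten if it exists).
        headers (dict): HTTP headers passed on to every request.

    Returns:
        list[dict]: The cleaned book records, one per CSV row.
    """
    # Named fieldnames so it does not shadow the `headers` argument
    fieldnames = ['title', 'price_in_pounds', 'avg_rating', 'genre', 'upc', 'num_books_available']
    books = []

    for page_num in page_range:
        results_page = scrape_book_results_page(page_num, headers)
        for book_url in results_page['book_urls']:
            product_page = scrape_book_product_page(book_url, headers)
            books.append(product_page['book_info'])

    with open(filename, 'w', encoding='utf-8', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(books)

    return books
```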
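A CSV file stores every cell as text, so the conversions above matter for what ends up in each cell (`51.77` rather than `£51.77`) rather than for the file's types. A minimal sketch of the writing step against an in-memory buffer, so it can be checked without requesting any page; `books_to_csv_text` is a hypothetical helper and the record is a made-up placeholder, not scraped data:

```python
import csv
import io

FIELDNAMES = ['title', 'price_in_pounds', 'avg_rating', 'genre', 'upc', 'num_books_available']


def books_to_csv_text(books):
    """Write book dicts as CSV rows (header first) and return the text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(books)
    return buffer.getvalue()


# Placeholder record; the title deliberately contains a comma
sample = [{'title': 'Example Book, Vol. 1', 'price_in_pounds': 51.77, 'avg_rating': 3,
           'genre': 'Poetry', 'upc': 'abc123', 'num_books_available': 22}]

text = books_to_csv_text(sample)
# Reading the text back yields strings, whatever types were written
rows = list(csv.DictReader(io.StringIO(text)))
print(text)
```

The `csv` module quotes the comma-containing title automatically, which a hand-built `','.join(...)` would not, and the numeric cells come back as plain strings such as `'51.77'` that `float()` can parse directly.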

## Problem 2
Take the code you've written above and create a module called `books2scrape`. This is just a file called `books2scrape` with a `py` extension. Make sure the module is located in your homework notebook working directory. Once you've created this module, import it and try to run the final function again.
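One practical catch when editing a module from a notebook: a second `import` returns the already-loaded copy, so changes to the file are not picked up until the module is reloaded. A minimal sketch with the standard library's `importlib.reload`, using a throwaway module `demo_scraper` (hypothetical, a stand-in for `books2scrape.py`) written to a temporary directory:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway stand-in for books2scrape.py, written to a temporary directory
module_dir = Path(tempfile.mkdtemp())
module_path = module_dir / 'demo_scraper.py'
module_path.write_text("VERSION = 'first draft'\n")

# The notebook's working directory is already on sys.path; here we add the temp dir
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
demo_scraper = importlib.import_module('demo_scraper')
before = demo_scraper.VERSION

# Edit the file, as you would after fixing a bug in the module
module_path.write_text("VERSION = 'fixed'\n")
# A repeated import would return the cached module; reload re-executes the file
demo_scraper = importlib.reload(demo_scraper)
after = demo_scraper.VERSION
print(before, '->', after)
```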

What is the benefit of moving this code to a module? How does the functionality of a module compare to a class?

```python
import books2scrape

# Scrape results pages 1 and 2 and save every book on them to books.csv
books2scrape.scrape_book_range(range(1, 3), 'books.csv')
```

### Written Answer

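Moving the code into a module keeps the notebook uncluttered: the function definitions live in `books2scrape.py`, and the notebook only needs `import books2scrape` and the function calls. The same functions can then be reused from any other notebook or script, and a bug only has to be fixed in one place.

A module and a class both group related code under one name, but they work differently. A module is a file that acts as a namespace for functions, variables, and classes; it is executed once on first import and then cached, so there is only one copy of its state (here, the single `headers` dictionary). A class is a blueprint for objects: each instance carries its own data (for example, its own headers) together with the methods that use it, and classes support inheritance. Because these functions share almost no state, a module is sufficient here. If the code grew to include more helper functions called by the main function (`scrape_book_range`) that depend on shared settings, wrapping it in a class would become worthwhile.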
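To make the module-versus-class contrast concrete: the module has one module-level `headers`, whereas a class lets several independently configured scrapers coexist. A minimal sketch of a hypothetical `BookScraper` class; the network calls are deliberately omitted (a fuller version would add methods calling `requests.get(url, headers=self.headers)`):

```python
from urllib.parse import urljoin


class BookScraper:
    """Hypothetical class-based alternative: each instance keeps its own settings."""

    def __init__(self, headers, base_url='https://books.toscrape.com/catalogue/'):
        self.headers = dict(headers)  # copy, so instances never share one dict
        self.base_url = base_url

    def results_page_url(self, page_num):
        """Build the URL of a catalogue results page."""
        return urljoin(self.base_url, f'page-{page_num}.html')


# Two scrapers with different settings, which a single module-level `headers` cannot express
desktop = BookScraper({'User-Agent': 'desktop-agent'})
mobile = BookScraper({'User-Agent': 'mobile-agent'})
print(desktop.results_page_url(2), mobile.headers)
```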