## Problem 1
Using the code from the Basic Python Roundup lecture notebook, create three functions:

1. `scrape_book_results_page(page_num, headers)`: This function takes a page number and a headers dictionary as arguments and returns a dictionary with the following keys:
    - `page_url`: The URL of the books results page
    - `response`: The Response object of that page
    - `soup`: The BeautifulSoup object created from the source code
    - `book_urls`: A list of the URLs for each book on this page


2. `scrape_book_product_page(book_product_url, headers)`: This function takes a book product URL (the URL for the book product page) and a headers dictionary as arguments and returns a dictionary with the following keys:
    - `book_url`: The URL of the book product page 
    - `response`: The Response object of that page
    - `soup`: The BeautifulSoup object created from the source code


3. `scrape_book_range(page_range, filename, headers)`: This function takes a page range (a `range` object), a filename for the CSV file, and a headers dictionary as arguments, and uses the other two functions to scrape the book information for every book found in the specified page range. The book information should be saved as separate rows in a CSV file (see if you can include the CSV file writing code in this function).
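
The CSV-writing step in the third function can be sketched on its own, without any scraping. This is a minimal illustration that assumes each book is stored as a dictionary; the sample records and the helper name `write_books` are hypothetical, and it writes to an in-memory buffer so it runs without touching the disk.

```python
import csv
import io

# Hypothetical book records; a real run would build these from the scraped pages
books = [
    {"title": "Book A", "price_in_pounds": 51.77, "avg_rating": 3},
    {"title": "Book B", "price_in_pounds": 53.74, "avg_rating": 1},
]

def write_books(books, fileobj):
    """Write a header row plus one row per book dictionary to fileobj."""
    fieldnames = ["title", "price_in_pounds", "avg_rating"]
    # DictWriter matches each value to its column by key
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(books)

buffer = io.StringIO()
write_books(books, buffer)
print(buffer.getvalue())
```

To write a real file, pass the result of `open(filename, 'w', newline='', encoding='utf-8')` in place of the buffer.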

Make sure to include proper documentation (docstring) for your code.

**Before writing to CSV**, make the following changes to the book data:

1. Convert `price_in_pounds` value to `float` type.
2. Convert `avg_rating` to `int` type.
3. Extract the number of available books from the `num_books_available` string and convert to `int` type.
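
The three conversions can be tried out on a single record before wiring them into the scraper. This sketch assumes the raw values arrive as strings shaped like the ones below (a price without the currency symbol, a rating spelled as a word, and an availability sentence); the sample record, `RATING_WORDS`, and `clean_book` are hypothetical names used only for illustration.

```python
import re

# Hypothetical raw values, shaped like the strings scraped from a product page
raw_book = {
    "price_in_pounds": "51.77",
    "avg_rating": "Three",
    "num_books_available": "In stock (22 available)",
}

RATING_WORDS = {"Zero": 0, "One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

def clean_book(raw):
    """Return a copy of raw with the three fields converted to numeric types."""
    cleaned = dict(raw)
    cleaned["price_in_pounds"] = float(raw["price_in_pounds"])
    cleaned["avg_rating"] = RATING_WORDS[raw["avg_rating"]]
    # Take the first run of digits in the availability text
    digits = re.search(r"\d+", raw["num_books_available"]).group()
    cleaned["num_books_available"] = int(digits)
    return cleaned

cleaned_book = clean_book(raw_book)
print(cleaned_book)
```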

In [63]:
import requests
from bs4 import BeautifulSoup as soup
import csv
import re

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}

def scrape_book_results_page(page_num, headers=headers):
    """Scrape one results page and collect the URL of every book listed on it.

    Returns a dict with keys page_url, response, soup, and book_urls.
    """
    results_page = f'https://books.toscrape.com/catalogue/page-{page_num}.html'
    # headers must be passed by keyword; the second positional argument of requests.get is params
    response = requests.get(results_page, headers=headers)
    page_soup = soup(response.text, 'html.parser')
    book_divs = page_soup.find_all('div', attrs={'class': 'image_container'})
    book_urls = [tag.find('a').get('href') for tag in book_divs]
    complete_urls = [f'https://books.toscrape.com/catalogue/{end_url}' for end_url in book_urls]

    # return the parsed page, not the BeautifulSoup class imported as `soup`
    return {"page_url": results_page, "response": response, "soup": page_soup, "book_urls": complete_urls}

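def scrape_book_product_page(book_url, headers=headers):
    """Scrape one book product page.

    Returns a dict with keys book_url, response, and soup.
    """
    response = requests.get(book_url, headers=headers)
    return {
        "book_url": book_url,
        "response": response,
        # parse the page here; `soup` alone is the BeautifulSoup class, not a parsed page
        "soup": soup(response.text, 'html.parser')
    }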

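def remaining_availability_converter(avail_str):
    """Extract the number of available books from an availability string and return it as an int."""
    # '[0-9]+' matches a run of digits; the original '[0-9+]' also matched literal '+' characters
    match = re.search('[0-9]+', avail_str)
    return int(match.group())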

def rating_helper(rating_str):
    """Convert a rating word ("Zero" through "Five") to an int; returns None for any other string."""
    ratings = {"Zero": 0, "One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}
    return ratings.get(rating_str)

def scrape_book_range(page_range, filename, headers=headers):
    """Scrape every book on the results pages in page_range and write one CSV row per book to filename.

    Uses scrape_book_results_page and scrape_book_product_page. Page numbers on the site start at 1.
    """
    book_page_urls = []
    book_list = []
    # pass headers through so a caller-supplied dictionary is actually used
    for i in page_range:
        book_page_urls.extend(scrape_book_results_page(i, headers)["book_urls"])
    for url in book_page_urls:
        book_dict = {}
        book = soup(scrape_book_product_page(url, headers)["response"].text, 'html.parser')
        # find properties and store
        title = book.find('div', attrs = {'class': 'col-sm-6 product_main'}).find('h1').string
        price_in_pounds = book.find('p', attrs = {'class':'price_color'}).string[2:]
        avg_rating_tag = book.find(lambda tag: 'star-rating' in tag.get('class') if tag.get('class') else False)
        avg_rating = avg_rating_tag.get('class')[1]
        li_tag = book.find('ul', attrs={'class':'breadcrumb'}).find_all('li')[2]
        genre = li_tag.find('a').string
        tr_tag = book.find('table', attrs = {'class':'table table-striped'}).find_all('tr')[0]
        upc = tr_tag.find('td').string
        num_books_avail = book.find('p', attrs = {'class':'instock availability'}).get_text()
        book_dict['title'] = title
        book_dict['price_in_pounds'] = float(price_in_pounds)
        book_dict['avg_rating'] = rating_helper(avg_rating)
        book_dict['genre'] = genre
        book_dict['upc'] = upc
        book_dict['num_books_available'] = remaining_availability_converter(num_books_avail)
        book_list.append(book_dict)
        
    with open(filename, 'w', encoding='utf-8', newline='') as csvfile:
        table_headers = ['title', 'price_in_pounds', 'avg_rating', 'genre', 'upc', 'num_books_available']
        # DictWriter matches values to columns by key rather than relying on dict insertion order
        book_writer = csv.DictWriter(csvfile, fieldnames=table_headers)
        book_writer.writeheader()
        book_writer.writerows(book_list)
        
        
scrape_book_range(range(1, 3), 'results.csv')


## Problem 2
Take the code you've written above and create a module called `books2scrape`. A module is just a file with a `.py` extension, so the file should be named `books2scrape.py`. Make sure the module is located in the same working directory as your homework notebook. Once you've created the module, import it and try to run the final function again.
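
One thing to watch when moving notebook code into a module: any call left at the top level of the file runs every time the module is imported. A common way to avoid that is a `__name__` guard. The sketch below shows what the bottom of `books2scrape.py` might look like; the function body here is a placeholder, not the real scraper.

```python
# Sketch of the bottom of books2scrape.py; the function body is a placeholder
def scrape_book_range(page_range, filename):
    """Placeholder standing in for the real scraping function."""
    return f"would scrape pages {list(page_range)} into {filename}"

if __name__ == "__main__":
    # Runs with `python books2scrape.py`, but not on `import books2scrape`
    print(scrape_book_range(range(1, 3), "results.csv"))
```

With the guard in place, importing the module only defines the functions, and the notebook decides when to call them.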

What is the benefit of moving this code to a module? How does the functionality of a module compare to a class?

In [67]:
# Your code here
import books2scrape

books2scrape.scrape_book_range(range(1, 5), 'results.csv')

# Written Answer

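Moving the code into a module hides the messy implementation details: the notebook just imports `books2scrape` and calls the functions it needs, and the same code can be reused from other notebooks or scripts. Like a class, a module groups related functions and data under one name. Unlike a class, a module cannot be instantiated: it is loaded once, so there is a single shared copy of its state rather than one per object.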
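
The module-versus-class comparison can be checked with a small experiment. This is only an illustration: `Scraper` is a hypothetical class, and the standard-library `math` module stands in for `books2scrape`.

```python
import importlib
import math

class Scraper:
    """Hypothetical class: each instance carries its own state."""
    def __init__(self, base_url):
        self.base_url = base_url

# Two instances of the same class hold independent data
first = Scraper("https://books.toscrape.com/catalogue/")
second = Scraper("https://example.com/")

# A module is loaded once; importing it again returns the same object
math_again = importlib.import_module("math")

print(first.base_url != second.base_url)
print(math_again is math)
```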