# Web Scraping lab

In this lab you will scrape this [website](https://books.toscrape.com/) of books.

You have to create a Pandas DataFrame with all the books listed on the site. Each row of the DataFrame should contain the information for one book. In particular, the DataFrame must contain:

* category
* title
* price
* stock availability
* star rating (number of stars)
* description
* UPC

Happy scraping!



# Server verification

Load the needed libraries, and make sure that you can obtain the correct status code.

In [346]:
import requests
from bs4 import BeautifulSoup

url = "https://books.toscrape.com/"
response = requests.get(url)

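if response.status_code == 200:
    print("All good!")
else:
    # Report the failure instead of staying silent
    print(f"Request failed with status code {response.status_code}")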

All good!
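
A status check can also say what went wrong when the code is not 200. The following is a minimal sketch, assuming a hypothetical helper `status_message` (not part of the lab) that relies only on the standard library's `http.HTTPStatus`:

```python
from http import HTTPStatus


def status_message(status_code):
    """Return a short, human-readable message for an HTTP status code."""
    if status_code == HTTPStatus.OK:
        return "All good!"
    try:
        # Look up the standard reason phrase, e.g. "Not Found" for 404
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # The code is not one of the standard HTTP status codes
        phrase = "Unknown status"
    return f"Request failed: {status_code} {phrase}"


print(status_message(200))
print(status_message(404))
```

In the notebook this could be called as `print(status_message(response.status_code))`.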


# Book categories

Create the code to collect the **relative urls** from the left panel to obtain a list with all the book categories.

In [347]:
soup = BeautifulSoup(response.content, "html.parser")

categories = []

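# The first <li> of the side panel wraps the "Books" link and every category link
for link in soup.find(class_='nav nav-list').find('li').find_all('a'):
    categories.append(link.get('href'))
# Drop the top-level "Books" entry, which is not a category
categories.remove('catalogue/category/books_1/index.html')
categories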

['catalogue/category/books/travel_2/index.html',
 'catalogue/category/books/mystery_3/index.html',
 'catalogue/category/books/historical-fiction_4/index.html',
 'catalogue/category/books/sequential-art_5/index.html',
 'catalogue/category/books/classics_6/index.html',
 'catalogue/category/books/philosophy_7/index.html',
 'catalogue/category/books/romance_8/index.html',
 'catalogue/category/books/womens-fiction_9/index.html',
 'catalogue/category/books/fiction_10/index.html',
 'catalogue/category/books/childrens_11/index.html',
 'catalogue/category/books/religion_12/index.html',
 'catalogue/category/books/nonfiction_13/index.html',
 'catalogue/category/books/music_14/index.html',
 'catalogue/category/books/default_15/index.html',
 'catalogue/category/books/science-fiction_16/index.html',
 'catalogue/category/books/sports-and-games_17/index.html',
 'catalogue/category/books/add-a-comment_18/index.html',
 'catalogue/category/books/fantasy_19/index.html',
 'catalogue/category/books/new-adul

# Books in a given category

Use web scraping and a list comprehension to obtain the **absolute** URL of each book to be scraped.

In [348]:
absolute_url = []

for url in categories:
    absolute = "https://books.toscrape.com/" + url
    absolute_url.append(absolute)
absolute_url

['https://books.toscrape.com/catalogue/category/books/travel_2/index.html',
 'https://books.toscrape.com/catalogue/category/books/mystery_3/index.html',
 'https://books.toscrape.com/catalogue/category/books/historical-fiction_4/index.html',
 'https://books.toscrape.com/catalogue/category/books/sequential-art_5/index.html',
 'https://books.toscrape.com/catalogue/category/books/classics_6/index.html',
 'https://books.toscrape.com/catalogue/category/books/philosophy_7/index.html',
 'https://books.toscrape.com/catalogue/category/books/romance_8/index.html',
 'https://books.toscrape.com/catalogue/category/books/womens-fiction_9/index.html',
 'https://books.toscrape.com/catalogue/category/books/fiction_10/index.html',
 'https://books.toscrape.com/catalogue/category/books/childrens_11/index.html',
 'https://books.toscrape.com/catalogue/category/books/religion_12/index.html',
 'https://books.toscrape.com/catalogue/category/books/nonfiction_13/index.html',
 'https://books.toscrape.com/catalogue
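
The same list can be built with a list comprehension, as the exercise suggests. This is only a sketch: `BASE_URL`, `sample_categories`, and `category_urls` are illustrative names, and `urljoin` from the standard library is swapped in for plain string concatenation.

```python
from urllib.parse import urljoin

BASE_URL = "https://books.toscrape.com/"

# Two of the relative category URLs collected earlier, used here as sample input
sample_categories = [
    "catalogue/category/books/travel_2/index.html",
    "catalogue/category/books/mystery_3/index.html",
]

# Resolve every relative URL against the site root in a single expression
category_urls = [urljoin(BASE_URL, relative) for relative in sample_categories]
print(category_urls)
```

`urljoin` copes with a missing or duplicated slash between the two parts, which string concatenation does not.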

# Book details

Create a Python function that, given a `book_url` as input, returns a dictionary with the following structure:

```Python
{"Title": title, "Price": price, "Availability": availability, "Rating": rating, "Description": description, "UPC": upc}
```

where `description` should contain the book's summary given in the Product description, and the values are the book's associated information.

In [349]:
import pandas as pd

def book_dictionary(url):
    """Scrape one book page and return its details as one-element lists."""
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    # Extract the title
    title = soup.find('h1').text

    # Extract the price
    price = soup.find(class_="price_color").text.strip()

    # Extract the availability
    availability = soup.find(class_="instock availability").text.strip()

    # Extract the rating: the second CSS class is the star count as a word, e.g. "One"
    rating_tag = soup.find('p', class_='star-rating')
    rating = rating_tag['class'][1] if rating_tag else None

    # Extract the description (kept as the <p> Tag; use .text for a plain string)
    product = soup.find(class_="product_page")
    description = product.select('p')[3]

    # Extract the UPC (first cell of the product information table)
    upc = soup.find(class_='table table-striped').find('td').text

    # Each value is wrapped in a list so it can be concatenated onto books_dict later
    books = {"Title": [title], "Price": [price], "Availability": [availability],
             "Rating": [rating], "Description": [description], "UPC": [upc]}

    return books
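
Picking the description by its position among the `<p>` tags depends on the page layout. A possible alternative is sketched below. It assumes the summary is the `<p>` that directly follows the element with id `product_description`. `SAMPLE_HTML` is a hypothetical, heavily trimmed stand-in for a product page, and `extract_description` is an illustrative name.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed stand-in for a product page
SAMPLE_HTML = """
<article class="product_page">
  <p class="price_color">£53.74</p>
  <div id="product_description"><h2>Product Description</h2></div>
  <p>A short summary of the book.</p>
</article>
"""


def extract_description(soup):
    """Return the description text, or None if the page has no description."""
    # "+" is the CSS adjacent-sibling combinator: the <p> right after the header div
    tag = soup.select_one("#product_description + p")
    return tag.get_text(strip=True) if tag else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(extract_description(soup))
```

Returning plain text rather than the `Tag` object also keeps the final DataFrame free of HTML markup.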

In [298]:
# Check the function

book_url = "https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html"
book_dictionary(book_url)

{'Title': ['Tipping the Velvet'],
 'Price': ['£53.74'],
 'Availability': ['In stock (20 available)'],
 'Rating': ['One'],
 'Description': [<p>"Erotic and absorbing...Written with starling power."--"The New York Times Book Review " Nan King, an oyster girl, is captivated by the music hall phenomenon Kitty Butler, a male impersonator extraordinaire treading the boards in Canterbury. Through a friend at the box office, Nan manages to visit all her shows and finally meet her heroine. Soon after, she becomes Kitty's "Erotic and absorbing...Written with starling power."--"The New York Times Book Review " Nan King, an oyster girl, is captivated by the music hall phenomenon Kitty Butler, a male impersonator extraordinaire treading the boards in Canterbury. Through a friend at the box office, Nan manages to visit all her shows and finally meet her heroine. Soon after, she becomes Kitty's dresser and the two head for the bright lights of Leicester Square where they begin a glittering career as m
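
The lab asks for the star rating as a number, while the function returns it as a word such as `'One'`. The helpers below sketch one way to convert the scraped strings. Their names (`rating_to_int`, `parse_price`, `parse_stock`) are illustrative and not part of the lab.

```python
import re

# Map the rating word found in the CSS class to a number of stars
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def rating_to_int(word):
    """Return the number of stars for a rating word, or None if unknown."""
    return RATING_WORDS.get(word)


def parse_price(text):
    """Return the first number in a price string as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def parse_stock(text):
    """Return the count from a string like 'In stock (20 available)', else 0."""
    match = re.search(r"\((\d+) available\)", text)
    return int(match.group(1)) if match else 0


print(rating_to_int("One"), parse_price("£53.74"), parse_stock("In stock (20 available)"))
```

These conversions could be applied either inside `book_dictionary` or afterwards on the DataFrame columns.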

# Collect and store all the information from the books in a Pandas DataFrame

Start with the following dictionary:

```python
books_dict = {"Title": [], "Price": [], "Availability": [], "Rating": [], "Description": [], "UPC": [], "Category": [] }
```

Then, iterate over all the categories and all the books in a given category to collect any book information using the previous function. Fill the previous dictionary with the information about each book.

Show the first five rows of the final Pandas DataFrame.

Tip: You can wrap the iterable of a for loop in the function `tqdm` from the library `tqdm` to show a progress bar, as shown below :wink: :

```python
from tqdm import tqdm

for elem in tqdm(iterable):
    # some code
```
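
The accumulation step described above can be sketched without any network access. The example assumes each book arrives as a dictionary of single values rather than one-element lists. `add_book` is an illustrative helper and `sample_book` holds hypothetical placeholder values.

```python
def add_book(books_dict, book, category):
    """Append one book's fields, plus its category, to the dictionary of lists."""
    row = {**book, "Category": category}
    for key in books_dict:
        books_dict[key].append(row[key])


books_dict = {"Title": [], "Price": [], "Availability": [], "Rating": [],
              "Description": [], "UPC": [], "Category": []}

# Hypothetical placeholder values standing in for one scraped book
sample_book = {"Title": "Example Book", "Price": "£10.00",
               "Availability": "In stock (1 available)", "Rating": "Three",
               "Description": "Placeholder summary.", "UPC": "0000000000000000"}

add_book(books_dict, sample_book, "Travel")
print(books_dict["Title"], books_dict["Category"])
```

Because every list grows by exactly one entry per book, the columns stay the same length, and `pd.DataFrame(books_dict).head()` would then show the first five rows.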





In [289]:
url = "https://books.toscrape.com/catalogue/category/books/index.html"
    
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

In [351]:
books_dict = {"Title": [], "Price": [], "Availability": [], "Rating": [], "Description": [], "UPC": [], "Category": [] }

# Collect the book URLs from each category page in absolute_url
for category_url in absolute_url:
    response = requests.get(category_url)
    soup = BeautifulSoup(response.content, "html.parser")
    products = soup.find_all(class_="product_pod")
    category = soup.find('h1').text
    print(category)
    for product in products:
        # Drop the leading '../../../' (9 characters) from the relative href
        book = product.find('a')['href'][9:]
        book_url = "https://books.toscrape.com/catalogue/" + book
        book_dict = book_dictionary(book_url)
        # Wrap the category in a list so that += adds one entry, not one per character
        book_dict['Category'] = [category]
        for key in books_dict:
            books_dict[key] += book_dict[key]

Travel
Mystery
Historical Fiction
Sequential Art
Classics
Philosophy
Romance
Womens Fiction
Fiction
Childrens
Religion
Nonfiction
Music
Default
Science Fiction
Sports and Games
Add a comment
Fantasy
New Adult
Young Adult
Science
Poetry
Paranormal
Art
Psychology
Autobiography
Parenting
Adult Fiction
Humor
Horror
History
Food and Drink
Christian Fiction
Business
Biography
Thriller
Contemporary
Spirituality
Academic
Self Help
Historical
Christian
Suspense
Short Stories
Novels
Health
Politics
Cultural
Erotica
Crime
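
The loop above builds each book URL by slicing nine characters off the relative link. A less brittle option is to resolve the link against the category page it came from. The sketch below assumes the links on a category page look like `../../../<book>/index.html`, which is what the slice implies. `relative_href` is a sample value in that assumed shape.

```python
from urllib.parse import urljoin

category_url = "https://books.toscrape.com/catalogue/category/books/historical-fiction_4/index.html"

# Assumed shape of a book link as it appears on a category page
relative_href = "../../../tipping-the-velvet_999/index.html"

# urljoin walks up one directory for each "../" before appending the rest
book_url = urljoin(category_url, relative_href)
print(book_url)
# → https://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html
```

Note that the loop only requests the `index.html` page of each category. If a category spans several pages, the later pages would have to be followed as well.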
