## Module 3 Project: Web Scraping Bob's Bookstore
By: Ken Wilson

Bob's Bookstore is a fictional business that sells books to customers through an online platform. The store mostly carries books with animal themes, and all of the information about its books can be found on the store's website at [https://btech-data-analytics.github.io/bridgerland-technical-college/home.html](https://btech-data-analytics.github.io/bridgerland-technical-college/home.html).

Bob has hired you to analyze the books that he sells in his store. However, Bob doesn't keep any CSV or Excel files on his computer; he simply adds and removes books from his webpage as needed.

Create a web scraper to gather the data from the bookstore section of Bob's Bookstore into a pandas dataframe. Then, use the dataframe to answer the four questions below:

* Which author has the most books listed at Bob's Bookstore?
* Which is the most popular topic among books at Bob's Bookstore?
* Which topic of books is the most expensive, on average?
* Which topic of books has the most pages, on average?

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
response = requests.get('https://btech-data-analytics.github.io/bridgerland-technical-college/bookstore.html').text
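The request above assumes the page always loads. A minimal sketch of a more defensive fetch follows, assuming a hypothetical helper named `fetch_html`; the `get` parameter is also an illustrative addition so the helper can be exercised without a network connection.

```python
import requests

BOOKSTORE_URL = (
    "https://btech-data-analytics.github.io/"
    "bridgerland-technical-college/bookstore.html"
)

def fetch_html(url, timeout=10, get=requests.get):
    """Return the page's HTML as text (hypothetical helper, not part of the project).

    `get` defaults to requests.get; it is a parameter only so a stand-in
    can be passed when testing offline.
    """
    resp = get(url, timeout=timeout)   # give up instead of hanging forever
    resp.raise_for_status()            # raise on 4xx/5xx instead of parsing an error page
    return resp.text
```

Calling `fetch_html(BOOKSTORE_URL)` would return the same text that the cell above stores in `response`, but a missing page or a stalled connection would raise an exception rather than silently yielding an empty table later.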

In [3]:
soup = BeautifulSoup(response, 'html.parser')

In [4]:
ISBN = []
Title = []
Author = []
Language = []
Pages = []
Topic = []
Price = []

for book in soup.find_all('tr', class_="book"):
    cells = book.find_all('td')   # look up the row's cells once instead of once per column
    ISBN.append(cells[0].text)
    Title.append(cells[1].text)
    Author.append(cells[2].text)
    Language.append(cells[3].text)
    Pages.append(int(cells[4].text))   # convert to int before appending
    Topic.append(cells[5].text)
    Price.append(float(cells[6].text.replace('$', '')))   # strip '$' and convert to float before appending
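The loop above keeps seven parallel lists in step. One alternative, sketched here, builds one dictionary per table row. The HTML snippet is made up for illustration and only mimics the structure the loop assumes (`tr` elements with class `book`, seven `td` cells each); `parse_books` and `COLUMNS` are hypothetical names.

```python
from bs4 import BeautifulSoup

# Made-up sample that mimics the assumed row structure; not the real page.
SAMPLE_HTML = """
<table>
  <tr class="book">
    <td>000-0000000001</td><td>Sample Cat Book</td><td>Author A</td>
    <td>English</td><td>256</td><td>Cats</td><td>$19.99</td>
  </tr>
  <tr class="book">
    <td>000-0000000002</td><td>Sample Dog Book</td><td>Author B</td>
    <td>English</td><td>300</td><td>Dogs</td><td>$25.50</td>
  </tr>
</table>
"""

COLUMNS = ["ISBN", "Title", "Author", "Language", "Pages", "Topic", "Price"]

def parse_books(html):
    """Turn each <tr class="book"> row into a dict keyed by column name."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr", class_="book"):
        values = [td.get_text(strip=True) for td in row.find_all("td")]
        record = dict(zip(COLUMNS, values))
        record["Pages"] = int(record["Pages"])                      # page count as int
        record["Price"] = float(record["Price"].replace("$", ""))   # drop '$', store as float
        records.append(record)
    return records

books = parse_books(SAMPLE_HTML)
```

Passing the resulting list to `pd.DataFrame(books)` gives a frame with the same seven columns, and a row with a missing cell cannot shift values into the wrong list.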


In [5]:
df = pd.DataFrame({
    'ISBN' : ISBN,
    'Title' : Title,
    'Author' : Author,
    'Language' : Language,
    'Pages' : Pages,
    'Topic' : Topic,
    'Price' : Price
})

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   ISBN      15 non-null     object 
 1   Title     15 non-null     object 
 2   Author    15 non-null     object 
 3   Language  15 non-null     object 
 4   Pages     15 non-null     int64  
 5   Topic     15 non-null     object 
 6   Price     15 non-null     float64
dtypes: float64(1), int64(1), object(5)
memory usage: 972.0+ bytes
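Beyond reading `df.info()` by eye, a few automated checks can catch scraping problems early. The sketch below assumes a hypothetical helper named `check_books`; the two sample rows are copied from the first two books in the scraped table.

```python
import pandas as pd

def check_books(df):
    """Return a list of problems found in a scraped book table (empty if none)."""
    problems = []
    if df.isna().any().any():
        problems.append("missing values")
    if df["ISBN"].duplicated().any():
        problems.append("duplicate ISBNs")
    if (df["Pages"] <= 0).any() or (df["Price"] <= 0).any():
        problems.append("non-positive pages or price")
    return problems

# Two rows taken from the scraped data, used here only to exercise the checks.
sample = pd.DataFrame({
    "ISBN": ["978-1234567890", "978-2345678901"],
    "Pages": [256, 192],
    "Price": [19.99, 15.99],
})

issues = check_books(sample)
```

An empty list means the checks passed; running `check_books(df)` on the full frame would apply the same checks to all 15 rows.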


In [7]:
df

| | ISBN | Title | Author | Language | Pages | Topic | Price |
|---|---|---|---|---|---|---|---|
| 0 | 978-1234567890 | Whiskers of Wisdom: Tales from Feline Philosop... | Penelope Wainwright | English | 256 | Cats | 19.99 |
| 1 | 978-2345678901 | Purrfectly Pawesome: A Cat's Life | Jasper Sterling | English | 192 | Cats | 15.99 |
| 2 | 978-3456789012 | Cat Tales: Adventures in Whiskerland | Penelope Wainwright | English | 320 | Cats | 21.99 |
| 3 | 978-4567890123 | The Enigmatic Paws: Mysteries of Meowville | Maximilian Thorne | English | 288 | Cats | 17.99 |
| 4 | 978-5678901234 | Cats in Wonderland | Isadora Harrington | English | 224 | Cats | 16.99 |
| 5 | 978-6789012345 | Whisker Wisdom: Life Lessons from Feline Sages | Penelope Wainwright | English | 288 | Cats | 20.99 |
| 6 | 978-7890123456 | Catnip Chronicles: A Purrfect Journey | Jasper Sterling | English | 192 | Cats | 14.99 |
| 7 | 978-8901234567 | Cat-astrophe: Tales of Misadventures | Celeste Nightshade | English | 240 | Cats | 18.99 |
| 8 | 978-9012345678 | The Cat's Whisker: A Feline Fantasy | Penelope Wainwright | English | 208 | Cats | 16.99 |
| 9 | 978-0123456789 | Fur and Friendship: Stories of Feline Companions | Seraphina Montague | English | 176 | Cats | 12.99 |


In [8]:
df.describe()

| | Pages | Price |
|---|---|---|
| count | 15.0 | 15.0 |
| mean | 244.266667 | 20.723333 |
| std | 48.17745 | 5.091543 |
| min | 176.0 | 12.99 |
| 25% | 200.0 | 16.99 |
| 50% | 240.0 | 19.99 |
| 75% | 288.0 | 23.99 |
| max | 320.0 | 29.99 |


***

 
### **Question 1**
Which author has the most books listed at Bob's Bookstore?

Answer choices:
* Penelope Wainwright
* Seraphina Montague
* Benjamin Barkley
* Sophie Shepherd

In [9]:
df['Author'].value_counts()

Author
Penelope Wainwright    4
Jasper Sterling        2
Benjamin Barkley       2
Maximilian Thorne      1
Celeste Nightshade     1
Isadora Harrington     1
Seraphina Montague     1
Sophie Shepherd        1
Oliver Obedience       1
Ruby Ruffington        1
Name: count, dtype: int64
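`value_counts()` sorts from most to least frequent, so the first row is the answer. To pull the answer out programmatically, and to notice ties, something like the following sketch could be used. The author list is copied from the ten rows displayed in the table earlier; `top_authors` and `top_count` are illustrative names.

```python
import pandas as pd

# Authors of the ten books displayed in the table above.
authors = pd.Series([
    "Penelope Wainwright", "Jasper Sterling", "Penelope Wainwright",
    "Maximilian Thorne", "Isadora Harrington", "Penelope Wainwright",
    "Jasper Sterling", "Celeste Nightshade", "Penelope Wainwright",
    "Seraphina Montague",
], name="Author")

counts = authors.value_counts()
top_count = counts.max()
# Keep every author who reaches the maximum, so a tie is not hidden.
top_authors = counts[counts == top_count].index.tolist()
```

On the full frame, the same three lines starting from `df['Author'].value_counts()` would apply.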

### Answer: **Penelope Wainwright**
***

### **Question 2**
Which is the most popular topic among books at Bob's Bookstore (which topic occurs most frequently)?

Answer choices:
* Horses
* Cats
* Dogs
* Rabbits

In [10]:
df['Topic'].value_counts()

Topic
Cats    10
Dogs     5
Name: count, dtype: int64

### Answer: **Cats**
***

### **Question 3**
Which topic of books is the most expensive, on average?

Answer choices:
* Cows
* Dogs
* Cats
* Horses

In [11]:
df[['Topic', 'Price']].groupby('Topic').agg(avgPrice = ('Price', 'mean')).sort_values(by = 'avgPrice', ascending = False).reset_index()

| | Topic | avgPrice |
|---|---|---|
| 0 | Dogs | 26.59 |
| 1 | Cats | 17.79 |


### Answer: **Dogs**
***

### **Question 4**
Which topic of books has the most pages, on average?

Answer choices:
* Dogs
* Rabbits
* Cats
* Pigs

In [12]:
df[['Topic', 'Pages']].groupby('Topic').agg(avgPages = ('Pages', 'mean')).sort_values(by = 'avgPages', ascending = False).reset_index()

| | Topic | avgPages |
|---|---|---|
| 0 | Dogs | 256.0 |
| 1 | Cats | 238.4 |
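The `groupby` calls in Questions 3 and 4 differ only in the column being averaged, so both averages could be computed in one pass with named aggregation. The sketch below uses a small made-up frame (`toy`), not the scraped data, purely to show the pattern.

```python
import pandas as pd

# Made-up illustrative rows; these are not books from the store.
toy = pd.DataFrame({
    "Topic": ["Cats", "Cats", "Dogs", "Dogs"],
    "Pages": [200, 300, 260, 280],
    "Price": [10.0, 20.0, 25.0, 35.0],
})

# One groupby, two named aggregations: mean price and mean page count per topic.
summary = toy.groupby("Topic").agg(
    avgPrice=("Price", "mean"),
    avgPages=("Pages", "mean"),
)

priciest_topic = summary["avgPrice"].idxmax()   # topic with the highest mean price
longest_topic = summary["avgPages"].idxmax()    # topic with the highest mean page count
```

Replacing `toy` with `df` would produce both per-topic averages from the scraped data in a single table.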


### Answer: **Dogs**
***