# EDA on Bob's Bookstore

Jessica Felts

### Overview & About the Data
Bob's Bookstore sells mostly animal-themed books. Bob, the owner, has hired an analyst to analyze the books he sells in his store. Bob doesn't keep any CSV or Excel files on his computer; he simply adds and removes books from his webpage as needed. The data therefore has to be scraped from the webpage before it can be analyzed.

### Analysis
The following questions will be answered through this EDA:
1. Which author has the most books listed at Bob's Bookstore?
2. Which is the most popular topic among books at Bob's Bookstore?
3. Which topic of books is the most expensive, on average?
4. Which topic of books has the most pages, on average?

The code cells below work through these questions in order, one step at a time. A final analysis at the end summarizes the findings.

In [1]:
# Here I am importing the libraries that will be necessary to scrape the website for data, as well as to analyze the data.

import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
response = requests.get('https://btech-data-analytics.github.io/bridgerland-technical-college/bookstore.html').text

In [3]:
soup = BeautifulSoup(response, 'html.parser')
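
The request above assumes the page always loads. As a minimal sketch (the helper name `fetch_html` and the 10-second timeout are my own assumptions, not part of the original notebook), the download could be wrapped so that a failed request raises an error instead of silently handing an error page to BeautifulSoup:

```python
import requests

BOOKSTORE_URL = 'https://btech-data-analytics.github.io/bridgerland-technical-college/bookstore.html'


def fetch_html(url, timeout=10):
    """Hypothetical helper: return the page body, raising if the request fails."""
    resp = requests.get(url, timeout=timeout)  # give up instead of hanging forever
    resp.raise_for_status()                    # raises requests.HTTPError on 4xx/5xx
    return resp.text


# Usage (performs a live request, so it is left commented out here):
# soup = BeautifulSoup(fetch_html(BOOKSTORE_URL), 'html.parser')
```

Because Bob edits the page by hand, failing loudly here seems safer than analyzing whatever HTML happens to come back.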

In [9]:
# In this code, I am looking at the first row of the table to see which fields it contains
    # and to decide how to store them in a new Pandas dataframe.

soup.find('table').find_all('tr', class_='book')[0]

<tr class="book">
<td>978-1234567890</td>
<td>Whiskers of Wisdom: Tales from Feline Philosophers</td>
<td>Penelope Wainwright</td>
<td>English</td>
<td>256</td>
<td>Cats</td>
<td>$19.99</td>
<td><button>Buy now</button></td>
</tr>

In [10]:
# In this code, I am using a for-loop to start the process of creating a Pandas dataframe.

ISBN = []
Title = []
Author = []
Language = []
Pages = []
Topic = []
Price = []

for book in soup.find('table').find_all('tr', class_='book'):
    cells = book.find_all('td')  # look up the row's cells once instead of once per column
    ISBN.append(cells[0].text)
    Title.append(cells[1].text)
    Author.append(cells[2].text)
    Language.append(cells[3].text)
    Pages.append(cells[4].text)
    Topic.append(cells[5].text)
    Price.append(cells[6].text)
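
The loop above relies on every row having at least seven cells in a fixed order. A hedged alternative, sketched here against an inline copy of the first row shown earlier plus one deliberately incomplete row that I made up for illustration, collects each row into a dictionary and skips rows that are too short. The names `COLUMNS` and `parse_books` are hypothetical.

```python
from bs4 import BeautifulSoup

COLUMNS = ['ISBN', 'Title', 'Author', 'Language', 'Pages', 'Topic', 'Price']

# First row copied from the page output above; the second row is a made-up
# malformed row used only to show the length check.
sample_html = """
<table>
<tr class="book">
<td>978-1234567890</td>
<td>Whiskers of Wisdom: Tales from Feline Philosophers</td>
<td>Penelope Wainwright</td>
<td>English</td>
<td>256</td>
<td>Cats</td>
<td>$19.99</td>
<td><button>Buy now</button></td>
</tr>
<tr class="book"><td>incomplete row</td></tr>
</table>
"""


def parse_books(html):
    """Return one dict per well-formed book row in the first table."""
    table = BeautifulSoup(html, 'html.parser').find('table')
    if table is None:
        return []
    records = []
    for row in table.find_all('tr', class_='book'):
        cells = [td.get_text(strip=True) for td in row.find_all('td')]
        if len(cells) < len(COLUMNS):
            continue  # skip rows that are missing fields
        # zip stops after the seventh cell, so the "Buy now" button is ignored
        records.append(dict(zip(COLUMNS, cells)))
    return records


records = parse_books(sample_html)
print(records[0]['Title'], records[0]['Price'])
```

A list of dictionaries like `records` can be passed straight to `pd.DataFrame(records)`, which would replace the seven separate lists.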

In [12]:
# The code below will take the data I collected from the code above and turn it into a Pandas dataframe.

df = pd.DataFrame({
    'ISBN': ISBN,
    'Title': Title,
    'Author': Author,
    'Language': Language,
    'Pages': Pages,
    'Topic': Topic,
    'Price': Price
})

In [13]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   ISBN      15 non-null     object
 1   Title     15 non-null     object
 2   Author    15 non-null     object
 3   Language  15 non-null     object
 4   Pages     15 non-null     object
 5   Topic     15 non-null     object
 6   Price     15 non-null     object
dtypes: object(7)
memory usage: 972.0+ bytes


In [21]:
# The code below will turn the appropriate columns from "object" types into the appropriate "integer" or "float"
    # data types. Additionally, I removed the dollar sign from the Price column in order to turn the data in that
    # column into a "float" data type.

df['Pages'] = df['Pages'].astype('int64')
df['Price'] = df['Price'].replace(r'[\$,]', '', regex=True).astype(float)
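
`astype` stops with an error at the first value it cannot convert. If the page ever contained a malformed entry, one possible way to find it first is `pd.to_numeric` with `errors='coerce'`, which turns unparseable values into `NaN`. This is only a sketch on a three-row sample: the numbers match the first three books shown in this notebook, except for the `'n/a'` page count, which is a hypothetical bad value I added.

```python
import pandas as pd

raw = pd.DataFrame({
    'Pages': ['256', '192', 'n/a'],           # 'n/a' is a hypothetical bad value
    'Price': ['$19.99', '$15.99', '$21.99'],
})

# Unparseable strings become NaN instead of raising an exception.
pages = pd.to_numeric(raw['Pages'], errors='coerce')
price = pd.to_numeric(
    raw['Price'].str.replace(r'[$,]', '', regex=True),  # drop "$" and thousands commas
    errors='coerce',
)

# Rows where either conversion failed, shown with their original text.
bad_rows = raw[pages.isna() | price.isna()]
print(bad_rows)
```

Once `bad_rows` comes back empty, the stricter `astype` conversion used above should be safe to run.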

In [20]:
df.head()

| | ISBN | Title | Author | Language | Pages | Topic | Price |
|---|---|---|---|---|---|---|---|
| 0 | 978-1234567890 | Whiskers of Wisdom: Tales from Feline Philosop... | Penelope Wainwright | English | 256 | Cats | 19.99 |
| 1 | 978-2345678901 | Purrfectly Pawesome: A Cat's Life | Jasper Sterling | English | 192 | Cats | 15.99 |
| 2 | 978-3456789012 | Cat Tales: Adventures in Whiskerland | Penelope Wainwright | English | 320 | Cats | 21.99 |
| 3 | 978-4567890123 | The Enigmatic Paws: Mysteries of Meowville | Maximilian Thorne | English | 288 | Cats | 17.99 |
| 4 | 978-5678901234 | Cats in Wonderland | Isadora Harrington | English | 224 | Cats | 16.99 |


In [29]:
# Question 1: Which author has the most books listed at Bob's Bookstore?
    # Penelope Wainwright has the most books listed at 4 books.

df['Author'].value_counts()

Author
Penelope Wainwright    4
Jasper Sterling        2
Benjamin Barkley       2
Maximilian Thorne      1
Celeste Nightshade     1
Isadora Harrington     1
Seraphina Montague     1
Sophie Shepherd        1
Oliver Obedience       1
Ruby Ruffington        1
Name: count, dtype: int64

In [30]:
df[df['Author'] == 'Penelope Wainwright']

| | ISBN | Title | Author | Language | Pages | Topic | Price |
|---|---|---|---|---|---|---|---|
| 0 | 978-1234567890 | Whiskers of Wisdom: Tales from Feline Philosop... | Penelope Wainwright | English | 256 | Cats | 19.99 |
| 2 | 978-3456789012 | Cat Tales: Adventures in Whiskerland | Penelope Wainwright | English | 320 | Cats | 21.99 |
| 5 | 978-6789012345 | Whisker Wisdom: Life Lessons from Feline Sages | Penelope Wainwright | English | 288 | Cats | 20.99 |
| 8 | 978-9012345678 | The Cat's Whisker: A Feline Fantasy | Penelope Wainwright | English | 208 | Cats | 16.99 |
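
Reading the top author off the `value_counts()` output works for 15 books, but it would miss a tie for first place. A small sketch of picking the leader (or leaders) programmatically follows; the counts are typed in from the output above so the example runs on its own, and on the real data `counts` would simply be `df['Author'].value_counts()`.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    'Penelope Wainwright': 4,
    'Jasper Sterling': 2,
    'Benjamin Barkley': 2,
    'Maximilian Thorne': 1,
    'Celeste Nightshade': 1,
    'Isadora Harrington': 1,
    'Seraphina Montague': 1,
    'Sophie Shepherd': 1,
    'Oliver Obedience': 1,
    'Ruby Ruffington': 1,
}, name='count')

top_count = counts.max()
# Keep every author who reaches the maximum, so ties are not hidden.
top_authors = counts[counts == top_count].index.tolist()

print(top_authors, top_count)
```

Here the list has a single entry, which confirms that Penelope Wainwright leads outright with 4 books.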


In [25]:
# Question 2: Which is the most popular topic among books at Bob's Bookstore?
    # Assuming the question is asking which topic is the most common, the answer is "Cats" at 10 listings.
    # Conversely, the topic "Dogs" has only 5 listings.

df['Topic'].value_counts()

Topic
Cats    10
Dogs     5
Name: count, dtype: int64

In [26]:
# Question 3: Which topic of books is the most expensive, on average?
    # The most expensive topic on average is "Dogs" at $26.59. 
    # Conversely, the least expensive topic is "Cats" at $17.79. 

df.groupby('Topic')['Price'].mean()

Topic
Cats    17.79
Dogs    26.59
Name: Price, dtype: float64

In [27]:
# Question 4: Which topic of books has the most pages, on average?
    # The topic with the most pages on average is "Dogs" at 256 pages. 
    # Conversely, the topic with the fewest pages on average is "Cats" at 238.4 pages.

df.groupby('Topic')['Pages'].mean()

Topic
Cats    238.4
Dogs    256.0
Name: Pages, dtype: float64
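
The answers to Questions 2 through 4 can also be read side by side. As a rough sketch, the table below is built by hand from the figures reported in the outputs above (it is not re-computed from the scraped rows), and the store-wide averages are derived by weighting each topic's mean by its number of books. The column names `books`, `mean_price`, `mean_pages`, and `share` are my own labels.

```python
import pandas as pd

# Figures copied from the three outputs above.
summary = pd.DataFrame(
    {
        'books': [10, 5],
        'mean_price': [17.79, 26.59],
        'mean_pages': [238.4, 256.0],
    },
    index=pd.Index(['Cats', 'Dogs'], name='Topic'),
)

total_books = summary['books'].sum()
summary['share'] = summary['books'] / total_books  # fraction of all listings

# Weight each topic's mean by its book count to get the store-wide mean.
overall_price = float((summary['mean_price'] * summary['books']).sum() / total_books)
overall_pages = float((summary['mean_pages'] * summary['books']).sum() / total_books)

print(summary)
print(round(overall_price, 2), round(overall_pages, 2))  # 20.72 244.27
```

A plain average of the two topic means would overstate the store-wide price, because "Dogs" makes up only a third of the listings.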

### Final Analysis

By looking over all the data above, and referring to the questions in the intro, a number of trends can be observed: 
1. The author with the most books listed--a total of 4 books--is Penelope Wainwright. These titles are "Whiskers of Wisdom: Tales from Feline Philosophers", "Cat Tales: Adventures in Whiskerland", "Whisker Wisdom: Life Lessons from Feline Sages", and "The Cat's Whisker: A Feline Fantasy".
2. The most common book topic at the bookstore is "Cats" at 10 listings. The only other topic, "Dogs", has 5 listings.
3. The most expensive topic on average is "Dogs" at $26.59. Conversely, the least expensive topic is "Cats" at $17.79.
4. The topic with the most pages on average is "Dogs" at 256 pages. Conversely, the topic with the fewest pages on average is "Cats" at 238.4 pages.

Based on the data above, "Cats" is both the most common and the more affordable topic at Bob's Bookstore, and Penelope Wainwright is the most frequently listed author. Without sales information from Bob's Bookstore, however, it is hard to make management decisions about which types of books should be stocked. As a simple record of which books are in stock, along with more specific data about each book, this dataset will suffice.