**Data Analysis**

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
from jupyter_dash import JupyterDash
from dash import dcc, html, dash_table
from dash.dependencies import Input, Output

In [2]:
df=pd.read_csv('books_scraped.csv')
df.drop(columns=['Unnamed: 0'], inplace=True)

In [3]:
df.head(5)

| | title | availability | price | url | ups | category |
|---|---|---|---|---|---|---|
| 0 | A Light in the Attic | Instock | £51.77 | https://books.toscrape.com/catalogue/a-light-i... | a897fe39b1053632 | Poetry |
| 1 | Tipping the Velvet | Instock | £53.74 | https://books.toscrape.com/catalogue/tipping-t... | 90fa61229261140a | Historical Fiction |
| 2 | Soumission | Instock | £50.10 | https://books.toscrape.com/catalogue/soumissio... | 6957f44c3847a760 | Fiction |
| 3 | Sharp Objects | Instock | £47.82 | https://books.toscrape.com/catalogue/sharp-obj... | e00eb4fd7b871a48 | Mystery |
| 4 | Sapiens: A Brief History of Humankind | Instock | £54.23 | https://books.toscrape.com/catalogue/sapiens-a... | 4165285e1663650f | History |


In [4]:
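# Count books per category, keep the 15 largest, and sort ascending so the largest bar is drawn at the top.
# Naming the columns explicitly keeps the result the same across pandas versions.
top_categories = (df['category'].value_counts().head(15).sort_values(ascending=True)
                  .rename_axis('category').reset_index(name='books'))
fig1 = px.bar(top_categories, x='books', y='category', orientation='h', title='Top 15 categories by total number of books')
fig1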
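
The commentary in the Dash application reads this ranking while ignoring the 'Default' category, and the scraped data also contains an 'Add a comment' label. A minimal sketch of dropping such labels before counting is shown here. It assumes both labels are scraping placeholders rather than real categories, and the `sample` frame and `placeholders` list are hypothetical stand-ins for the real data.

```python
import pandas as pd

# Hypothetical sample rows; the real categories come from books_scraped.csv.
sample = pd.DataFrame({
    "category": ["Default", "Nonfiction", "Nonfiction", "Add a comment",
                 "Poetry", "Nonfiction", "Poetry"],
})

# Assumption: these labels are placeholders left by the scraper, not real categories.
placeholders = ["Default", "Add a comment"]

# Keep only rows whose category is not a placeholder, then count books per category.
real_categories = sample[~sample["category"].isin(placeholders)]
counts = real_categories["category"].value_counts()
print(counts)
```

Filtering before `value_counts()` means the top-15 cut is made on real categories only, so a placeholder does not take up one of the 15 bars.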

In [5]:
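# Prices are stored as strings such as '£51.77'; convert them to floats so the box plot gets a numeric y-axis.
prices = df.assign(price_gbp=df['price'].str.replace('£', '', regex=False).astype(float))
fig2 = px.box(prices, x='category', y='price_gbp', orientation='v', title='Price distribution by category', labels={'price_gbp': 'price (£)'})
fig2.show()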
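
The box plot compares the categories visually; a numeric summary per category can make the same comparison easier to check. The following is a rough sketch, assuming prices are strings with a leading '£' as in the table above. The `sample` frame and the `price_gbp` column name are hypothetical and the values are illustrative only.

```python
import pandas as pd

# Hypothetical sample in the same shape as the scraped data (illustrative values).
sample = pd.DataFrame({
    "category": ["Poetry", "Poetry", "Fiction", "Fiction", "Fiction"],
    "price": ["£51.77", "£20.00", "£50.10", "£10.00", "£30.00"],
})

# Strip the currency symbol and convert to float so prices can be aggregated.
sample["price_gbp"] = sample["price"].str.replace("£", "", regex=False).astype(float)

# Summarize each category and order the result from cheapest to most expensive median.
summary = (
    sample.groupby("category")["price_gbp"]
    .agg(["count", "median", "min", "max"])
    .sort_values("median")
)
print(summary)
```

Including `count` next to the median helps to spot categories with only a handful of books, where the box plot can look extreme for no real reason.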

In [6]:
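# Sort on the numeric price; plain string comparison only works while every price has the same number of digits.
most_expensive=df.sort_values(by='price', ascending=False, key=lambda s: s.str.replace('£', '', regex=False).astype(float)).head(5)
most_expensive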

| | title | availability | price | url | ups | category |
|---|---|---|---|---|---|---|
| 648 | The Perfect Play (Play by Play #1) | Instock | £59.99 | https://books.toscrape.com/catalogue/the-perfe... | 9cc207168a03470d | Romance |
| 617 | Last One Home (New Beginnings #1) | Instock | £59.98 | https://books.toscrape.com/catalogue/last-one-... | 07e6810fd3236bda | Fiction |
| 860 | Civilization and Its Discontents | Instock | £59.95 | https://books.toscrape.com/catalogue/civilizat... | 396385e3de5d18c3 | Psychology |
| 560 | The Barefoot Contessa Cookbook | Instock | £59.92 | https://books.toscrape.com/catalogue/the-baref... | 6478ccb4416e6a5d | Food and Drink |
| 366 | The Diary of a Young Girl | Instock | £59.90 | https://books.toscrape.com/catalogue/the-diary... | 54fc03f1e1d355db | Nonfiction |


In [7]:
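# Sort on the numeric price; plain string comparison only works while every price has the same number of digits.
least_expensive=df.sort_values(by='price', ascending=True, key=lambda s: s.str.replace('£', '', regex=False).astype(float)).head(5)
least_expensive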

| | title | availability | price | url | ups | category |
|---|---|---|---|---|---|---|
| 638 | An Abundance of Katherines | Instock | £10.00 | https://books.toscrape.com/catalogue/an-abunda... | f36d24c309e87e5b | Young Adult |
| 501 | The Origin of Species | Instock | £10.01 | https://books.toscrape.com/catalogue/the-origi... | 0345872b14f9e774 | Science |
| 716 | The Tipping Point: How Little Things Can Make ... | Instock | £10.02 | https://books.toscrape.com/catalogue/the-tippi... | 224fa77d4b248046 | Add a comment |
| 84 | Patience | Instock | £10.16 | https://books.toscrape.com/catalogue/patience_... | 9429b4d59c537af5 | Sequential Art |
| 302 | Greek Mythic History | Instock | £10.23 | https://books.toscrape.com/catalogue/greek-myt... | f201f263d8c23f97 | Default |


**Dash Application**

In [8]:
app = JupyterDash(__name__)

app.layout = html.Div([
    html.H1("Analysis of Webscraped Books"),
    html.H4("Our team built a book scraper. We used the Python library BeautifulSoup to collect the data, starting by inspecting the pages of the provided book library. From the 'div' tags we extracted each book's Name, Availability, Price, URL, UPS, and Category, then wrote a Python script that extracts the data and stores the results in a CSV file. To present the work in a convenient way for users, we also analyzed the data and created several graphs."),
    html.H4("The first figure shows the top 15 book categories. Ignoring the 'Default' category, the highest number of books is in the Nonfiction category."),
    dcc.Graph(figure=fig1),
    html.H4("But why is the supply of non-fiction books the largest? Might it be the price? To address this question, let's look at the price distribution for the different book categories. We can clearly see that non-fiction books are not the cheapest ones. Spirituality books seem to have the lowest price distribution, whereas Sports and Games books seem to be the most expensive."),
    dcc.Graph(figure=fig2),
    html.H4("To go deeper than categories, we can also look at the titles of the most and least expensive books. The most expensive books are:"),
    # Each component needs a unique id; Dash rejects a layout that reuses one.
    dash_table.DataTable(most_expensive.to_dict('records'),[{"name": i, "id": i} for i in df.columns], id='tbl-most-expensive'),
    html.H4("The five cheapest books are:"),
    dash_table.DataTable(least_expensive.to_dict('records'),[{"name": i, "id": i} for i in df.columns], id='tbl-least-expensive'),


])

if __name__ == '__main__':
    app.run_server(debug=True)

Dash app running on:
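
The app's introductory text describes the scraping step: collect each book's fields with BeautifulSoup, then store them in a CSV file. The original scraping script is not part of this notebook, so the following is only a rough sketch of that idea. The markup in `SAMPLE_HTML`, its class names, the relative URLs, and the helper names `parse_books` and `to_csv` are all assumptions made for illustration; the real page structure and script may differ.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical listing markup (assumed structure, not copied from the real site).
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="catalogue/example-book-1/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
  <p class="price_color">£51.77</p>
  <p class="availability">In stock</p>
</article>
<article class="product_pod">
  <h3><a href="catalogue/example-book-2/index.html" title="Tipping the Velvet">Tipping the Velvet</a></h3>
  <p class="price_color">£53.74</p>
  <p class="availability">In stock</p>
</article>
"""


def parse_books(html):
    """Return one dict per book card found in the listing markup."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for card in soup.find_all("article", class_="product_pod"):
        link = card.find("h3").find("a")
        books.append({
            "title": link["title"],  # the full title lives in the title attribute
            "availability": card.find("p", class_="availability").get_text(strip=True),
            "price": card.find("p", class_="price_color").get_text(strip=True),
            "url": link["href"],
        })
    return books


def to_csv(rows):
    """Serialize parsed rows to CSV text; a real run would write this to a file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "availability", "price", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


books = parse_books(SAMPLE_HTML)
csv_text = to_csv(books)
print(csv_text)
```

Parsing an inline string keeps the sketch runnable without network access; in a real scraper the HTML would come from an HTTP response instead.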

