# Data Analysis with Python
- Basic data analysis using the Python programming language.

## Data Types
### Qualitative Data
- Represents characteristics or categories rather than measured amounts.
- Has no arithmetic meaning: the values are labels, so operations such as addition are not meaningful.
#### Types of Qualitative Data
- Nominal: categories with no quantitative value and no inherent order, so reordering them does not change their meaning.
- Ordinal: categories with a meaningful order, usually expressed as non-numeric labels.
### Quantitative Data
- Numerical data.
- Each value is obtained by counting or measuring.
#### Types of Quantitative Data
- Discrete: values are distinct and separate, and are obtained by counting.
- Continuous: values can fall anywhere within a range and are obtained by measuring.
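
As a minimal sketch of how these four kinds of data might be represented in pandas, the toy frame below uses made-up column names and values (they are assumptions for illustration, not part of the bestsellers dataset). Nominal data maps to an unordered categorical, ordinal data to an ordered one, and the two quantitative kinds to integer and float columns.

```python
import pandas as pd

# Hypothetical toy data, not taken from bestsellers.csv
df = pd.DataFrame({
    "genre": ["Fiction", "Non Fiction", "Fiction"],  # nominal: labels with no order
    "size": ["small", "large", "medium"],            # ordinal: labels with an order
    "reviews": [120, 45, 300],                       # discrete: counted
    "price": [12.99, 8.50, 20.00],                   # continuous: measured
})

# Nominal data: an unordered categorical
df["genre"] = pd.Categorical(df["genre"])

# Ordinal data: an ordered categorical, so sorting and comparisons
# follow the stated order instead of alphabetical order
df["size"] = pd.Categorical(
    df["size"], categories=["small", "medium", "large"], ordered=True
)

print(df.dtypes)
print(df.sort_values("size")["size"].tolist())
print(df["size"].max())
```

Declaring the order explicitly is what lets `sort_values` and `max` treat "medium" as lying between "small" and "large".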

In [2]:
import pandas as pd

# load the CSV file into a DataFrame
mv_data = pd.read_csv('bestsellers.csv')

# show the first 10 rows
# mv_data.head(10)

# show the last 10 rows
# mv_data.tail(10)

# display the shape as (number of rows, number of columns)
mv_data.shape

(375, 8)

In [2]:
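# inspect the rows with missing values
# mv_data[mv_data['Type'].isna()]

# drop every row that contains at least one missing value
mv_data = mv_data.dropna()

# count the remaining missing values per column
mv_data.isnull().sum()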

Name           0
Author         0
User Rating    0
Reviews        0
Price          0
Year           0
Genre          0
Type           0
dtype: int64
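
The cell above calls `dropna()` with its defaults, which removes a row as soon as any single value is missing. The following sketch, on a hypothetical toy frame (the values are invented for illustration), contrasts that default with `how="all"`, which only removes rows that are empty in every column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row 1 is fully empty, row 2 is only partly empty
toy = pd.DataFrame({
    "Name": ["Book A", np.nan, "Book C"],
    "Price": [10.0, np.nan, np.nan],
})

# Missing values per column
print(toy.isna().sum())

dropped_any = toy.dropna()           # default how="any": drop rows with at least one NaN
dropped_all = toy.dropna(how="all")  # drop only rows where every value is NaN

print(len(dropped_any))
print(len(dropped_all))
```

If only fully empty rows should be discarded, `how="all"` is the more conservative choice, since it keeps rows that still carry partial information.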

In [3]:
# summary statistics for all numeric columns
mv_data.describe()

| | User Rating | Reviews | Price | Year |
|---|---|---|---|---|
| count | 347.0 | 347.0 | 347.0 | 347.0 |
| mean | 4.607493 | 9709.674352 | 13.095101 | 2013.786744 |
| std | 0.227582 | 10831.206514 | 10.093187 | 3.360868 |
| min | 3.3 | 37.0 | 0.0 | 2009.0 |
| 25% | 4.5 | 3384.5 | 8.0 | 2011.0 |
| 50% | 4.6 | 6310.0 | 12.0 | 2014.0 |
| 75% | 4.8 | 11259.5 | 16.0 | 2017.0 |
| max | 4.9 | 87841.0 | 105.0 | 2019.0 |
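
To make the rows of the `describe()` output easier to read, here is a small sketch that computes the same statistics one at a time. The series of prices is hypothetical and chosen only so the quartiles are easy to check by hand.

```python
import pandas as pd

# Hypothetical prices, not taken from bestsellers.csv
prices = pd.Series([0.0, 8.0, 12.0, 16.0, 105.0], name="Price")

summary = prices.describe()
print(summary)

# Each row of describe() matches a dedicated method
print(prices.count())         # number of non-null values
print(prices.mean())          # arithmetic mean
print(prices.std())           # sample standard deviation (ddof=1)
print(prices.quantile(0.25))  # first quartile, the "25%" row
print(prices.median())        # the "50%" row
```

Comparing `mean` with the `50%` row is a quick check for skew: in the summary above, the mean number of reviews (9709.67) is well above the median (6310.0), which suggests that a few titles with very many reviews pull the mean upward.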


In [4]:
# sort the DataFrame by user rating, highest first
mv_data = mv_data.sort_values(by='User Rating', ascending=False)
mv_data.head(10)

| | Name | Author | User Rating | Reviews | Price | Year | Genre | Type |
|---|---|---|---|---|---|---|---|---|
| 128 | Hamilton: The Revolution | Lin-Manuel Miranda | 4.9 | 5867.0 | 54.0 | 2016.0 | Non Fiction | Non Fiction |
| 216 | Rush Revere and the Brave Pilgrims: Time-Trave... | Rush Limbaugh | 4.9 | 7150.0 | 12.0 | 2013.0 | Fiction | Fiction |
| 135 | Harry Potter and the Sorcerer's Stone: The Ill... | J.K. Rowling | 4.9 | 10052.0 | 22.0 | 2016.0 | Fiction | Fiction |
| 134 | Harry Potter and the Prisoner of Azkaban: The ... | J.K. Rowling | 4.9 | 3146.0 | 30.0 | 2017.0 | Fiction | Fiction |
| 133 | Harry Potter and the Goblet of Fire: The Illus... | J. K. Rowling | 4.9 | 7758.0 | 18.0 | 2019.0 | Fiction | Fiction |
| 130 | Harry Potter and the Chamber of Secrets: The I... | J.K. Rowling | 4.9 | 19622.0 | 30.0 | 2016.0 | Fiction | Fiction |
| 80 | Dog Man and Cat Kid: From the Creator of Capta... | Dav Pilkey | 4.9 | 5062.0 | 6.0 | 2018.0 | Fiction | Fiction |
| 43 | Brown Bear, Brown Bear, What Do You See? | Bill Martin Jr. | 4.9 | 14344.0 | 5.0 | 2017.0 | Fiction | Fiction |
| 124 | Goodnight, Goodnight Construction Site (Hardco... | Sherri Duskey Rinker | 4.9 | 7038.0 | 7.0 | 2012.0 | Fiction | Fiction |
| 217 | Rush Revere and the First Patriots: Time-Trave... | Rush Limbaugh | 4.9 | 3836.0 | 12.0 | 2014.0 | Fiction | Fiction |
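
All ten rows shown above share the same rating of 4.9, so a single sort key does not say which of them should come first. One possible refinement, sketched here on hypothetical toy data, is to pass a second column to `sort_values` as a tie-breaker.

```python
import pandas as pd

# Hypothetical toy data with tied ratings
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "User Rating": [4.9, 4.7, 4.9, 4.9],
    "Reviews": [5867, 9000, 19622, 7150],
})

# Sort by rating first, then by review count within equal ratings
ranked = toy.sort_values(
    by=["User Rating", "Reviews"], ascending=[False, False]
)
print(ranked["Name"].tolist())
```

Using the review count as the second key is only one choice; any column that expresses the intended ranking would work the same way.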


In [5]:
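# aggregating the DataFrame: selecting a single column returns a Series
# note: 'sum' on a text column concatenates the strings within each group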

mv_data.groupby('Name')['Genre'].sum()

Name
10-Day Green Smoothie Cleanse                                                             Non Fiction
11/22/63: A Novel                                                                             Fiction
12 Rules for Life: An Antidote to Chaos                                                   Non Fiction
1984 (Signet Classics)                                                                        Fiction
5,000 Awesome Facts (About Everything!) (National Geographic Kids)                        Non Fiction
                                                                                             ...     
Winter of the World: Book Two of the Century Trilogy                                          Fiction
Women Food and God: An Unexpected Path to Almost Everything                               Non Fiction
Wonder                                                                                        Fiction
Wrecking Ball (Diary of a Wimpy Kid Book 14)                                                  Fiction

In [6]:
# aggregating the DataFrame: as_index=False with agg() returns a DataFrame

mv_data.groupby('Name', as_index=False).agg({'Genre':'sum'})

| | Name | Genre |
|---|---|---|
| 0 | 10-Day Green Smoothie Cleanse | Non Fiction |
| 1 | 11/22/63: A Novel | Fiction |
| 2 | 12 Rules for Life: An Antidote to Chaos | Non Fiction |
| 3 | 1984 (Signet Classics) | Fiction |
| 4 | 5,000 Awesome Facts (About Everything!) (Natio... | Non Fiction |
| ... | ... | ... |
| 342 | Winter of the World: Book Two of the Century T... | Fiction |
| 343 | Women Food and God: An Unexpected Path to Almo... | Non Fiction |
| 344 | Wonder | Fiction |
| 345 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Fiction |
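
The two cells above aggregate a text column with `sum`. As an alternative sketch (not the approach used in this notebook), named aggregation lets each column get a function that suits its type: `first` to keep one label per title, and `sum` or `count` for numeric columns. The toy data and the output column names `Total_Reviews` and `Rows` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data with a repeated title
toy = pd.DataFrame({
    "Name": ["Book A", "Book A", "Book B"],
    "Genre": ["Fiction", "Fiction", "Non Fiction"],
    "Reviews": [100, 250, 40],
})

summary = toy.groupby("Name", as_index=False).agg(
    Genre=("Genre", "first"),          # keep one genre label per title
    Total_Reviews=("Reviews", "sum"),  # add up a numeric column
    Rows=("Reviews", "count"),         # how many rows each title had
)
print(summary)
```

With `as_index=False` the grouping key stays an ordinary column, so the result has the same flat shape as the DataFrame shown above.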


In [7]:
# create a new column: user rating as a percentage of the maximum rating (5)

mv_data['User Rating %'] = mv_data['User Rating'] / 5 * 100
mv_data.head()

| | Name | Author | User Rating | Reviews | Price | Year | Genre | Type | User Rating % |
|---|---|---|---|---|---|---|---|---|---|
| 128 | Hamilton: The Revolution | Lin-Manuel Miranda | 4.9 | 5867.0 | 54.0 | 2016.0 | Non Fiction | Non Fiction | 98.0 |
| 216 | Rush Revere and the Brave Pilgrims: Time-Trave... | Rush Limbaugh | 4.9 | 7150.0 | 12.0 | 2013.0 | Fiction | Fiction | 98.0 |
| 135 | Harry Potter and the Sorcerer's Stone: The Ill... | J.K. Rowling | 4.9 | 10052.0 | 22.0 | 2016.0 | Fiction | Fiction | 98.0 |
| 134 | Harry Potter and the Prisoner of Azkaban: The ... | J.K. Rowling | 4.9 | 3146.0 | 30.0 | 2017.0 | Fiction | Fiction | 98.0 |
| 133 | Harry Potter and the Goblet of Fire: The Illus... | J. K. Rowling | 4.9 | 7758.0 | 18.0 | 2019.0 | Fiction | Fiction | 98.0 |
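
The derived column above divides by a hard-coded 5. A slightly more explicit sketch, assuming ratings are on a 0 to 5 scale, names that maximum as a constant and rounds the result so floating-point noise does not show up in the output. The toy ratings and the name `MAX_RATING` are hypothetical.

```python
import pandas as pd

# Hypothetical ratings, not taken from bestsellers.csv
toy = pd.DataFrame({"User Rating": [4.9, 4.5, 3.3]})

MAX_RATING = 5  # assumption: ratings are on a 0-5 scale

# assign() returns a new DataFrame instead of modifying toy in place
with_pct = toy.assign(
    **{"User Rating %": (toy["User Rating"] / MAX_RATING * 100).round(1)}
)
print(with_pct)
```

The `**{...}` form is needed here only because the new column name contains a space and a percent sign, which cannot be written as a keyword argument.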


In [4]:
mv_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 375 entries, 0 to 374
Data columns (total 8 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         347 non-null    object 
 1   Author       347 non-null    object 
 2   User Rating  347 non-null    float64
 3   Reviews      347 non-null    float64
 4   Price        347 non-null    float64
 5   Year         347 non-null    float64
 6   Genre        347 non-null    object 
 7   Type         347 non-null    object 
dtypes: float64(4), object(4)
memory usage: 23.6+ KB


In [5]:
mv_data.isnull().sum()

Name           28
Author         28
User Rating    28
Reviews        28
Price          28
Year           28
Genre          28
Type           28
dtype: int64

In [9]:
mv_data[mv_data['Type'].isna()]

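| | Name | Author | User Rating | Reviews | Price | Year | Genre | Type |
|---|---|---|---|---|---|---|---|---|
| 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 11 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 15 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 22 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 26 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 33 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 40 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 46 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 50 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 53 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |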
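
The rows listed above are empty in every column, which matches the identical count of 28 missing values per column. A short sketch of how one might confirm that missing values always come as whole empty rows, using a hypothetical toy frame, is to compare a per-column mask with an all-columns mask.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: rows 1 and 3 are fully empty, row 2 lacks only Type
toy = pd.DataFrame({
    "Name": ["Book A", np.nan, "Book C", np.nan],
    "Type": ["Fiction", np.nan, np.nan, np.nan],
})

fully_empty = toy.isna().all(axis=1)  # True where every column is NaN
missing_type = toy["Type"].isna()     # True where Type alone is NaN

# Index labels of fully empty rows
print(toy.index[fully_empty].tolist())

# Rows that lack Type but still hold other data
print(toy.index[missing_type & ~fully_empty].tolist())
```

If the second list is empty for a real dataset, every missing `Type` belongs to a blank row, and dropping those rows loses no information.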
