# DataFrame Basics Exercise

## Part 1
* Use pandas to read the `bestsellers` dataset into a DataFrame
* Once you've done that, use pandas to figure out how many rows and columns the DataFrame has
* Inspect the first 5 rows
* Inspect the first 19 rows
* Inspect the last 5 rows
* Inspect the last 2 rows 
* Which columns (if any) are missing values?
* What datatype did pandas assign to "User Rating"?
* How many integer columns are in the DataFrame?

In [3]:
import pandas as pd

bestsellers = pd.read_csv("data/bestsellers.csv")

In [6]:
bestsellers.shape

(550, 7)

In [8]:
bestsellers.head()

,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction


In [10]:
bestsellers.tail()

,Name,Author,User Rating,Reviews,Price,Year,Genre
545,Wrecking Ball (Diary of a Wimpy Kid Book 14),Jeff Kinney,4.9,9413,8,2019,Fiction
546,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2016,Non Fiction
547,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2017,Non Fiction
548,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2018,Non Fiction
549,You Are a Badass: How to Stop Doubting Your Gr...,Jen Sincero,4.7,14331,8,2019,Non Fiction
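
The cells above call `head()` and `tail()` with their default of 5 rows. Both methods accept a row count, which is one way to handle the "first 19 rows" and "last 2 rows" tasks. The sketch below uses a hypothetical 30-row stand-in DataFrame (not the real dataset) so that it runs without the CSV file:

```python
import pandas as pd

# Hypothetical stand-in for the bestsellers DataFrame: 30 numbered rows.
df = pd.DataFrame({"row": range(30)})

first_19 = df.head(19)  # first 19 rows
last_2 = df.tail(2)     # last 2 rows

print(len(first_19))
print(last_2)
```

With the real data, the equivalent calls would be `bestsellers.head(19)` and `bestsellers.tail(2)`.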


In [11]:
bestsellers.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 550 entries, 0 to 549
Data columns (total 7 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         550 non-null    object 
 1   Author       550 non-null    object 
 2   User Rating  550 non-null    float64
 3   Reviews      550 non-null    int64  
 4   Price        550 non-null    int64  
 5   Year         550 non-null    int64  
 6   Genre        550 non-null    object 
dtypes: float64(1), int64(3), object(3)
memory usage: 30.2+ KB
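
The `info()` output above already answers the last three questions: every column shows 550 non-null values, "User Rating" is `float64`, and three columns are `int64`. As a rough sketch, the same answers can also be computed directly. The example below builds a small sample from three of the rows shown earlier (an assumption made so the code runs without the CSV file); the variable names are illustrative only.

```python
import pandas as pd

# Small sample copied from rows displayed above; the real DataFrame has 550 rows.
sample = pd.DataFrame({
    "Name": ["10-Day Green Smoothie Cleanse", "11/22/63: A Novel", "1984 (Signet Classics)"],
    "Author": ["JJ Smith", "Stephen King", "George Orwell"],
    "User Rating": [4.7, 4.6, 4.7],
    "Reviews": [17350, 2052, 21424],
    "Price": [8, 22, 6],
    "Year": [2016, 2011, 2017],
    "Genre": ["Non Fiction", "Fiction", "Fiction"],
})

# Count missing values per column, then keep only columns that have any.
missing_per_column = sample.isna().sum()
columns_with_missing = missing_per_column[missing_per_column > 0].index.tolist()

# Dtype pandas assigned to a single column.
rating_dtype = sample["User Rating"].dtype

# Columns whose dtype is an integer type.
integer_columns = [
    col for col in sample.columns
    if pd.api.types.is_integer_dtype(sample[col])
]

print(columns_with_missing)
print(rating_dtype)
print(integer_columns)
```

Running the same three steps on `bestsellers` should agree with the `info()` summary.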


## Part 2

* The `mount_everest_deaths` dataset includes its own index column. When importing it, use that existing column as the index.
* Which columns have zero null values?
* Which column has the most null values?


In [34]:
mount_everest_deaths = pd.read_csv("data/mount_everest_deaths.csv", index_col=0)

In [36]:
mount_everest_deaths.shape

(310, 7)

In [37]:
mount_everest_deaths.info()

<class 'pandas.core.frame.DataFrame'>
Index: 310 entries, 1 to 310
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Name            310 non-null    object 
 1   Date            310 non-null    object 
 2   Age             160 non-null    float64
 3   Expedition      271 non-null    object 
 4   Nationality     309 non-null    object 
 5   Cause of death  296 non-null    object 
 6   Location        291 non-null    object 
dtypes: float64(1), object(6)
memory usage: 19.4+ KB
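
Reading the `info()` output above: `Name` and `Date` have 310 non-null values out of 310 entries, so they have zero nulls, and `Age` has the most (310 - 160 = 150). The sketch below shows one way to compute this instead of reading it off by eye. It uses a hypothetical three-row CSV held in a string (placeholder names and values, not real records) so that it runs on its own.

```python
import io
import pandas as pd

# Hypothetical miniature CSV; the first (unnamed) column is the index.
csv_text = """,Name,Age,Expedition
1,Climber A,34,Team X
2,Climber B,,Team Y
3,Climber C,,
"""

# index_col=0 tells read_csv to use the first column as the index.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

null_counts = df.isna().sum()

# Columns with no missing values at all.
complete_columns = null_counts[null_counts == 0].index.tolist()

# Column with the largest number of missing values.
most_nulls_column = null_counts.idxmax()

print(complete_columns)
print(most_nulls_column, null_counts.max())
```

The same `isna().sum()` approach applies unchanged to `mount_everest_deaths`.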


## Part 3
* Import the `movie_titles.tsv` dataset
* You'll notice that it is not comma-separated! You'll need to tell `read_csv` what the separator actually is.
* The dataset does not come with its own column headings, so you'll need to provide those as well.  The columns are, in order, `id`, `title`, `year`, `imdb_rating`, `imdb_id`, and `genres`
* Once you have successfully read the dataset into a DataFrame, inspect the last 7 rows!

In [47]:
names = ["id", "title", "year", "imdb_rating", "imdb_id", "genres"]
movie_titles = pd.read_csv("data/movie_titles.tsv", sep="\t", names=names, index_col=0)

In [48]:
movie_titles.head()

id,title,year,imdb_rating,imdb_id,genres
m0,10 things i hate about you,1999,6.9,62847.0,['comedy' 'romance']
m1,1492: conquest of paradise,1992,6.2,10421.0,['adventure' 'biography' 'drama' 'history']
m2,15 minutes,2001,6.1,25854.0,['action' 'crime' 'drama' 'thriller']
m3,2001: a space odyssey,1968,8.4,163227.0,['adventure' 'mystery' 'sci-fi']
m4,48 hrs.,1982,6.9,22289.0,['action' 'comedy' 'crime' 'drama' 'thriller']


In [49]:
movie_titles.tail(7)

id,title,year,imdb_rating,imdb_id,genres
m610,the wizard of oz,1939,8.3,104873.0,['adventure' 'family' 'fantasy' 'musical']
m611,the world is not enough,1999,6.3,60047.0,['action' 'adventure' 'thriller']
m612,watchmen,2009,7.8,135229.0,['action' 'crime' 'fantasy' 'mystery' 'sci-fi'...
m613,xxx,2002,5.6,53505.0,['action' 'adventure' 'crime']
m614,x-men,2000,7.4,122149.0,['action' 'sci-fi']
m615,young frankenstein,1974,8.0,57618.0,['comedy' 'sci-fi']
m616,zulu dawn,1979,6.4,1911.0,['action' 'adventure' 'drama' 'history' 'war']
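
To see what `sep` and `names` do without needing the file, the following sketch reads a tab-separated string instead. The two rows are taken from the output above but trimmed to four columns for brevity, which is an assumption of this example and not the layout of the real file.

```python
import io
import pandas as pd

# Two header-less, tab-separated rows (trimmed to four fields for illustration).
tsv_text = (
    "m0\t10 things i hate about you\t1999\t6.9\n"
    "m1\t15 minutes\t2001\t6.1\n"
)

names = ["id", "title", "year", "imdb_rating"]

# sep="\t" sets the delimiter; names supplies the missing column headings;
# index_col="id" uses the named column as the index.
movies = pd.read_csv(io.StringIO(tsv_text), sep="\t", names=names, index_col="id")

print(movies)
```

Passing `names` also stops `read_csv` from treating the first data row as a header, which is what would happen by default with a header-less file.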
