# DataFrame Basics Exercise

## Part 1
* Use pandas to read the `bestsellers` dataset into a DataFrame
* Once you've done that, use pandas to find out how many rows and columns the DataFrame has
* Inspect the first 5 rows
* Inspect the first 19 rows
* Inspect the last 5 rows
* Inspect the last 2 rows
* Which columns (if any) have missing values?
* What datatype did pandas assign to "User Rating"?
* How many integer columns are in the DataFrame?
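
The steps above can be sketched roughly as follows. This is a minimal illustration, not the exercise solution: the in-memory CSV stands in for the `bestsellers` file, and every column name and value in it is invented except `User Rating`, which comes from the exercise text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the bestsellers file. The rows and most column
# names are made up for illustration and are not the real dataset.
sample_csv = io.StringIO(
    "Name,Author,User Rating,Reviews,Price,Year,Genre\n"
    "Book A,Author One,4.7,17350,8,2016,Non Fiction\n"
    "Book B,Author Two,4.6,2052,22,2011,Fiction\n"
    "Book C,Author Three,4.8,18979,15,2018,Fiction\n"
)
books = pd.read_csv(sample_csv)

# shape is a (rows, columns) tuple
n_rows, n_cols = books.shape

# head() and tail() default to 5 rows; pass a number for any other count
first_rows = books.head(2)
last_rows = books.tail(2)

# Columns that contain at least one missing value
missing = books.columns[books.isna().any()].tolist()

# dtype pandas inferred for a single column
rating_dtype = books["User Rating"].dtype

# Number of columns whose dtype is an integer type
n_int_cols = sum(pd.api.types.is_integer_dtype(t) for t in books.dtypes)
```

With the real file, the same calls apply after replacing `sample_csv` with the path to the dataset; `books.info()` also reports the dtypes and non-null counts in one summary.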

In [30]:
import pandas as pd

In [37]:
df = pd.read_csv("mount_everest_deaths.csv")

In [38]:
df

,No.,Name,Date,Age,Expedition,Nationality,Cause of death,Location
0,1,Dorje,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
1,2,Lhakpa,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
2,3,Norbu,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
3,4,Pasang,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
4,5,Pema,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
...,...,...,...,...,...,...,...,...
305,306,Christopher Jon Kulish,"May 27, 2019",62.0,Climbing the Seven Summits,United States,Cardiac event during descent,South Col
306,307,Puwei Liu,"May 12, 2021",55.0,Seven Summit Treks,United States,Exhaustion,Near South Summit
307,308,Abdul Waraich,"May 12, 2021",41.0,Seven Summit Treks,Switzerland,Exhaustion,Near South Summit
308,309,Pemba Tashi Sherpa,"May 18, 2021",28.0,Climbing the Seven Summits,Nepal,Fall into a crevasse,Between Camp I & Camp II


## Part 2

* The `mount_everest_deaths` dataset provides its own index column. When importing it, use that existing column as the index.
* Which columns have zero null values?
* Which column has the most null values?


In [41]:
deaths = pd.read_csv('mount_everest_deaths.csv', index_col=0)

In [42]:
deaths

No.,Name,Date,Age,Expedition,Nationality,Cause of death,Location
1,Dorje,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
2,Lhakpa,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
3,Norbu,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
4,Pasang,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
5,Pema,"June 7, 1922",,1922 British Mount Everest Expedition,Nepal,Avalanche,Below North Col
...,...,...,...,...,...,...,...
306,Christopher Jon Kulish,"May 27, 2019",62.0,Climbing the Seven Summits,United States,Cardiac event during descent,South Col
307,Puwei Liu,"May 12, 2021",55.0,Seven Summit Treks,United States,Exhaustion,Near South Summit
308,Abdul Waraich,"May 12, 2021",41.0,Seven Summit Treks,Switzerland,Exhaustion,Near South Summit
309,Pemba Tashi Sherpa,"May 18, 2021",28.0,Climbing the Seven Summits,Nepal,Fall into a crevasse,Between Camp I & Camp II


In [44]:
deaths.info()

<class 'pandas.core.frame.DataFrame'>
Index: 310 entries, 1 to 310
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Name            310 non-null    object 
 1   Date            310 non-null    object 
 2   Age             160 non-null    float64
 3   Expedition      271 non-null    object 
 4   Nationality     309 non-null    object 
 5   Cause of death  296 non-null    object 
 6   Location        291 non-null    object 
dtypes: float64(1), object(6)
memory usage: 19.4+ KB
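
In the `info()` output above, `Name` and `Date` show 310 non-null values out of 310 entries, so they have zero nulls, while `Age` has the fewest non-null values (160) and therefore the most nulls. The same answers can be computed directly rather than read off the summary. The sketch below uses a small invented frame (`toy`, with made-up values) so that it runs on its own; the same three lines should work on `deaths`.

```python
import pandas as pd

# Hypothetical miniature frame; the values are invented for illustration.
toy = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Age": [None, 41.0, None, None],
        "Location": ["X", None, "Y", "Z"],
    }
)

# Number of null values in each column
null_counts = toy.isna().sum()

# Columns with zero null values
complete_cols = null_counts[null_counts == 0].index.tolist()

# Label of the column with the most null values
most_nulls_col = null_counts.idxmax()
```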


## Part 3
* Import the `movie_titles.tsv` dataset
* You'll notice that it is not comma-separated! You'll need to tell `read_csv` what the separator actually is.
* The dataset does not come with its own column headings, so you'll need to provide those as well. The columns are, in order, `id`, `title`, `year`, `imdb_rating`, `imdb_id`, and `genres`.
* Once you have successfully read the dataset into a DataFrame, inspect the last 7 rows!
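
To see why both the separator and the column names matter, here is a minimal sketch on an invented two-row, tab-separated sample (the rows and the shortened column list are hypothetical, not taken from `movie_titles.tsv`). Without `names`, `read_csv` treats the first line as the header, so one row of data is lost.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample with no header row.
raw = "m0\tfirst film\t1999\t6.9\nm1\tsecond film\t1992\t6.2\n"
cols = ["id", "title", "year", "imdb_rating"]

# Without names=, the first data row is consumed as the column headings.
wrong = pd.read_csv(io.StringIO(raw), sep="\t")

# With names=, every line is read as data and the given labels are used.
right = pd.read_csv(io.StringIO(raw), sep="\t", names=cols)

print(len(wrong), len(right))
print(right.tail(1))
```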

In [53]:
columns = ['id', 'title', 'year', 'imdb_rating', 'imdb_id', 'genres']
movies = pd.read_csv('movie_titles.tsv', sep='\t', names=columns)

In [54]:
movies

,id,title,year,imdb_rating,imdb_id,genres
0,m0,10 things i hate about you,1999,6.9,62847.0,['comedy' 'romance']
1,m1,1492: conquest of paradise,1992,6.2,10421.0,['adventure' 'biography' 'drama' 'history']
2,m2,15 minutes,2001,6.1,25854.0,['action' 'crime' 'drama' 'thriller']
3,m3,2001: a space odyssey,1968,8.4,163227.0,['adventure' 'mystery' 'sci-fi']
4,m4,48 hrs.,1982,6.9,22289.0,['action' 'comedy' 'crime' 'drama' 'thriller']
...,...,...,...,...,...,...
612,m612,watchmen,2009,7.8,135229.0,['action' 'crime' 'fantasy' 'mystery' 'sci-fi'...
613,m613,xxx,2002,5.6,53505.0,['action' 'adventure' 'crime']
614,m614,x-men,2000,7.4,122149.0,['action' 'sci-fi']
615,m615,young frankenstein,1974,8.0,57618.0,['comedy' 'sci-fi']


In [55]:
movies.tail(7)

,id,title,year,imdb_rating,imdb_id,genres
610,m610,the wizard of oz,1939,8.3,104873.0,['adventure' 'family' 'fantasy' 'musical']
611,m611,the world is not enough,1999,6.3,60047.0,['action' 'adventure' 'thriller']
612,m612,watchmen,2009,7.8,135229.0,['action' 'crime' 'fantasy' 'mystery' 'sci-fi'...
613,m613,xxx,2002,5.6,53505.0,['action' 'adventure' 'crime']
614,m614,x-men,2000,7.4,122149.0,['action' 'sci-fi']
615,m615,young frankenstein,1974,8.0,57618.0,['comedy' 'sci-fi']
616,m616,zulu dawn,1979,6.4,1911.0,['action' 'adventure' 'drama' 'history' 'war']
