## TSV with The Simpsons episodes

Read the data in the `simpsons-episodes.tsv` TSV (Tab-Separated Values) file and store it in a `simpsons` DataFrame. This file contains information about all The Simpsons episodes.

Take a look at the file before you read it into a DataFrame and see what will be necessary to parse it correctly.
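One way to inspect the file is to print its first few raw lines with `repr()`, which makes tab characters (`\t`), empty leading fields and placeholder values visible. This is only a sketch: the helper name `peek` is made up for illustration, and the path is assumed to be the one used later in this exercise.

```python
from itertools import islice
from pathlib import Path


def peek(path, n=5, encoding="utf-8"):
    """Return the first n raw lines of a text file as repr() strings."""
    with open(path, encoding=encoding) as fh:
        return [repr(line) for line in islice(fh, n)]


# Assumed location of the exercise file; skipped if it is not there.
path = Path("files/simpsons-episodes.tsv")
if path.exists():
    for line in peek(path):
        print(line)
```

Things worth looking for in the output: whether the first line is a header or data, whether each line starts with an empty field, and how missing values are written.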

#### **Instructions**

* Use the correct separator, since the data is tab-separated.
* Use the following `col_names` list as column names.
* Load just `Title`, `Air date`, `Production code` and `IMDB rating`.
* Don't load the first empty columns.
* Set `Production code` as the index.
* Null values are encoded as `no_val` values, be careful with that when loading the data.
* Parse the `Air date` column as a date.

In [40]:
import pandas as pd

In this example we use the `sep` parameter to specify the separator, which is a tab character (`\t`); `names` to supply the column names; `usecols` to load only the expected columns; and `header=None`, since the file has no header row. We also set `Production code` as the index with the `index_col` parameter. Because null values are encoded as `no_val`, we pass `na_values` so that `no_val` is treated as a null value, and we parse the `Air date` column as a date with the `parse_dates` parameter. Finally, we chain the `dropna` method with `how` set to `all`. With its default `axis`, this drops rows that are completely empty, not columns; the unwanted columns are left out by `usecols`.
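As a side note on `dropna`, the sketch below uses a small invented DataFrame (not data from the exercise file) to show what `how='all'` removes with the default `axis` and with `axis='columns'`.

```python
import numpy as np
import pandas as pd

# Invented data: one all-NaN row and one all-NaN column.
df = pd.DataFrame({
    "Title": ["Old Money", np.nan, "Blood Feud"],
    "IMDB rating": [7.6, np.nan, np.nan],
    "Empty": [np.nan, np.nan, np.nan],
})

# Default axis=0: drops rows in which every value is missing.
rows_dropped = df.dropna(how="all")

# axis="columns": drops columns in which every value is missing.
cols_dropped = df.dropna(axis="columns", how="all")

print(rows_dropped.shape)
print(cols_dropped.shape)
```

The first call removes only the middle row and keeps the `Empty` column; the second keeps all rows and removes the `Empty` column.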

In [75]:
col_names = ['Title', 'Air date', 'Production code', 'Season', 'Number in season',
             'Number in series', 'US viewers (million)', 'Views', 'IMDB rating']

simpsons_df = pd.read_csv(
    'files/simpsons-episodes.tsv', sep='\t', names=col_names, header=None,
    usecols=['Title', 'Air date', 'Production code', 'IMDB rating'],
    index_col='Production code', na_values='no_val',
    parse_dates=['Air date'], encoding='utf-8',
).dropna(how='all')  # drop rows in which every loaded value is missing




In [76]:
simpsons_df.head(10)

Production code,Title,Air date,IMDB rating
7F01,Two Cars in Every Garage and Three Eyes on Eve...,1990-01-11,8.1
7F08,,1990-11-15,8.0
7F06,Bart the Daredevil,1990-06-12,
,Bart Gets Hit by a Car,1991-10-01,7.8
7F13,Homer vs. Lisa and the 8th Commandment,1991-07-02,8.0
7F16,"Oh Brother, Where Art Thou?",1991-02-21,8.2
7F17,Old Money,1991-03-28,7.6
7F19,Lisa's Substitute,1991-04-25,8.5
7F22,Blood Feud,1991-11-07,8.0
8F01,Mr. Lisa Goes to Washington,1991-09-26,7.7


In [69]:
simpsons_df.dtypes

Title                  object
Air date       datetime64[ns]
IMDB rating           float64
dtype: object

We can check that, after parsing, the `Air date` column has the right data type: `datetime64[ns]`.
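The same checks can also be done programmatically. The following is a minimal sketch on an in-memory sample; the sample rows are loosely modelled on the output above, and the placement of `no_val` in them is an assumption, not a copy of the real file.

```python
import io

import pandas as pd

# Assumed sample: tab-separated, no header, `no_val` marking missing values.
sample = (
    "Old Money\t1991-03-28\t7F17\t7.6\n"
    "no_val\t1990-11-15\t7F08\t8.0\n"
    "Bart the Daredevil\t1990-06-12\t7F06\tno_val\n"
)

df = pd.read_csv(
    io.StringIO(sample), sep="\t", header=None,
    names=["Title", "Air date", "Production code", "IMDB rating"],
    index_col="Production code", na_values="no_val",
    parse_dates=["Air date"],
)

# True if `Air date` was parsed into a datetime dtype.
is_date = pd.api.types.is_datetime64_any_dtype(df["Air date"])
print(is_date)

# Number of missing values per column; `no_val` entries count as missing.
missing = df.isna().sum()
print(missing)
```

If `na_values` were omitted, `IMDB rating` would be read as text because of the `no_val` string, so checking `dtypes` together with `isna().sum()` is a quick way to confirm that the placeholders were recognised.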