## Load movies dataset

Read the data in the `movies.csv` file and store it in a `DataFrame` named `movies`.

Take a look at the file before you read it into a `DataFrame` and see what will be necessary to parse it correctly.
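One way to inspect the file before parsing is to print its first few raw lines and look for the separator, any preamble rows, and missing-value markers. The snippet below is a minimal sketch: the `peek` helper and the sample file contents are hypothetical stand-ins, not the real `files/movies.csv`.

```python
from itertools import islice
from pathlib import Path
import tempfile


def peek(path, n=5):
    """Return the first n raw lines of a text file, without parsing them."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


# Hypothetical stand-in for files/movies.csv: a few preamble rows,
# a '|' separator, and '?' marking missing values.
sample = (
    "Movies dataset\n"
    "Source: unknown\n"
    "\n"
    "Color|Christopher Nolan|813|164\n"
    "?|Doug Walker|?|?\n"
)
sample_path = Path(tempfile.mkdtemp()) / "sample_movies.csv"
sample_path.write_text(sample, encoding="utf-8")

preview = peek(sample_path, n=4)
for line in preview:
    print(repr(line))  # repr makes blank lines and stray whitespace visible
```

With the real file, calling `peek('files/movies.csv')` in the same way should reveal which `sep`, `skiprows`, and `na_values` settings are needed.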


#### Instructions

* Use the appropriate separator.
* The data has no header row. Use the column names given in the `column_names` variable.
* Skip the first 3 rows.
* Handle missing values by replacing them with `NaN`.

In [255]:
import pandas as pd

In [256]:
column_names = ['color', 'director_name', 'num_critic_for_reviews', 'duration',
                'gross', 'movie_title', 'num_user_for_reviews', 'country',
                'cotent_rating', 'budget', 'title_year', 'imdb_score', 'genre']

movies = pd.read_csv('files/movies.csv',
                     sep='|',
                     header=None,
                     names=column_names,
                     skiprows=3)

In [257]:
movies.head() # first 5 rows

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,gross,movie_title,num_user_for_reviews,country,cotent_rating,budget,title_year,imdb_score,genre
0,Color,Christopher Nolan,813.0,164.0,448130642.0,The Dark Knight Rises,2701.0,USA,PG-13,250000000.0,2012.0,8.5,Action
1,?,Doug Walker,?,?,?,Star Wars: Episode VII - The Force Awakens ...,?,?,?,?,?,7.1,Documentary
2,Color,Andrew Stanton,462.0,132.0,73058679.0,John Carter,738.0,USA,PG-13,263700000.0,2012.0,6.6,Action
3,Color,Sam Raimi,392.0,156.0,336530303.0,Spider-Man 3,1902.0,USA,PG-13,258000000.0,2007.0,6.2,Action
4,Color,Nathan Greno,324.0,100.0,200807262.0,Tangled,387.0,USA,PG,260000000.0,2010.0,7.8,Adventure


In [258]:
movies.tail() # last 5 rows

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,gross,movie_title,num_user_for_reviews,country,cotent_rating,budget,title_year,imdb_score,genre
92,Color,James Gunn,653.0,121.0,333130696.0,Guardians of the Galaxy,1097.0,USA,PG-13,170000000.0,2014.0,8.1,Action
93,Color,Christopher Nolan,712.0,169.0,187991439.0,Interstellar,2725.0,USA,PG-13,165000000.0,2014.0,8.6,Adventure
94,Color,Christopher Nolan,642.0,148.0,292568851.0,Inception,2803.0,USA,PG-13,160000000.0,2010.0,8.8,Action
95,Color,Hideaki Anno,1.0,120.0,?,Godzilla Resurgence,13.0,Japan,?,?,2016.0,8.2,Action
96,Color,Peter Jackson,645.0,182.0,303001229.0,The Hobbit: An Unexpected Journey,1367.0,USA,PG-13,180000000.0,2012.0,7.9,Adventure


Notice that some columns mark missing values with `?`; we can replace these with `NaN` using the `na_values` parameter. The `budget` column also contains commas (`,`) as thousands separators, which the `thousands` parameter removes.

In [259]:
new_movies_df = pd.read_csv('files/movies.csv',
                            sep='|',
                            header=None,
                            names=column_names,
                            skiprows=3,
                            na_values='?',
                            thousands=',',
                            # skip_blank_lines=True,
                            )

In [261]:
new_movies_df.head() # first 5 rows

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,gross,movie_title,num_user_for_reviews,country,cotent_rating,budget,title_year,imdb_score,genre
0,Color,Christopher Nolan,813.0,164.0,448130642.0,The Dark Knight Rises,2701.0,USA,PG-13,250000000.0,2012.0,8.5,Action
1,,Doug Walker,,,,Star Wars: Episode VII - The Force Awakens ...,,,,,,7.1,Documentary
2,Color,Andrew Stanton,462.0,132.0,73058679.0,John Carter,738.0,USA,PG-13,263700000.0,2012.0,6.6,Action
3,Color,Sam Raimi,392.0,156.0,336530303.0,Spider-Man 3,1902.0,USA,PG-13,258000000.0,2007.0,6.2,Action
4,Color,Nathan Greno,324.0,100.0,200807262.0,Tangled,387.0,USA,PG,260000000.0,2010.0,7.8,Adventure
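To confirm that `na_values` and `thousands` had the intended effect, it can help to compare column types and missing-value counts with and without them. The following is a minimal sketch on a small in-memory sample; the rows and the reduced `cols` list are made up for illustration and are not taken from `movies.csv`.

```python
import io

import pandas as pd

# Hypothetical sample rows: '?' marks missing values and the budget
# uses ',' as a thousands separator.
raw = (
    "Color|Christopher Nolan|250,000,000|8.5\n"
    "?|Doug Walker|?|7.1\n"
    "Color|Sam Raimi|258,000,000|6.2\n"
)
cols = ["color", "director_name", "budget", "imdb_score"]

# Without the extra parameters, budget is read as text.
plain = pd.read_csv(io.StringIO(raw), sep="|", header=None, names=cols)

# With them, '?' becomes NaN and the commas are stripped before conversion.
cleaned = pd.read_csv(io.StringIO(raw), sep="|", header=None, names=cols,
                      na_values="?", thousands=",")

print(pd.api.types.is_numeric_dtype(plain["budget"]))
print(pd.api.types.is_numeric_dtype(cleaned["budget"]))
print(cleaned.isna().sum())  # missing values per column
```

The same checks (`new_movies_df.dtypes` and `new_movies_df.isna().sum()`) can be run on the full `DataFrame` to see which columns still need attention.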
