# Project - Process IMDB Data

The file `movie_metadata.csv` contains over 5000 rows of data from IMDB.

The file contains more metadata (data about data) than needed. The job is to process it and create a new CSV file that keeps less metadata and includes only movies with an `imdb_score` above 7.

### Step 1: Read the data

- Use `csv.DictReader` to read the content.
- Convert the result into a list.

In [1]:
import csv

In [2]:
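# newline='' lets the csv module handle line endings itself, as the csv docs recommend
with open('movie_metadata.csv', newline='') as f:
    csv_reader = csv.DictReader(f)  # each row becomes a dict keyed by the header
    records = list(csv_reader)      # read all rows while the file is still open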

In [3]:
records[0]

{'color': 'Color',
 'director_name': 'James Cameron',
 'num_critic_for_reviews': '723',
 'duration': '178',
 'director_facebook_likes': '0',
 'actor_3_facebook_likes': '855',
 'actor_2_name': 'Joel David Moore',
 'actor_1_facebook_likes': '1000',
 'gross': '760505847',
 'genres': 'Action|Adventure|Fantasy|Sci-Fi',
 'actor_1_name': 'CCH Pounder',
 'movie_title': 'Avatar',
 'num_voted_users': '886204',
 'cast_total_facebook_likes': '4834',
 'actor_3_name': 'Wes Studi',
 'facenumber_in_poster': '0',
 'plot_keywords': 'avatar|future|marine|native|paraplegic',
 'movie_imdb_link': 'http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1',
 'num_user_for_reviews': '3054',
 'language': 'English',
 'country': 'USA',
 'content_rating': 'PG-13',
 'budget': '237000000',
 'title_year': '2009',
 'actor_2_facebook_likes': '936',
 'imdb_score': '7.9',
 'aspect_ratio': '1.78',
 'movie_facebook_likes': '33000'}
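
Note that every value in the record above is a string, including numeric fields such as `imdb_score`. A minimal sketch of the same reading pattern on a small in-memory sample shows this; the two sample rows and the names `sample`, `sample_reader`, and `sample_records` are made up for illustration and are not taken from `movie_metadata.csv`:

```python
import csv
import io

# Hypothetical two-row sample standing in for movie_metadata.csv.
sample = io.StringIO(
    "movie_title,imdb_score\n"
    "Example Movie A,7.9\n"
    "Example Movie B,6.1\n"
)

sample_reader = csv.DictReader(sample)
sample_records = list(sample_reader)

# The header row becomes the dict keys; it is not returned as a record.
print(sample_reader.fieldnames)
print(len(sample_records))

# Values are always read as str, so numeric fields need explicit conversion.
print(type(sample_records[0]['imdb_score']))
```

This is why the score has to be converted before it can be compared with a number.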

### Step 2: Convert IMDB score to float

- Process all records and convert the `imdb_score` to `float`.

In [4]:
for record in records:
    record['imdb_score'] = float(record['imdb_score'])

In [5]:
records[0]

{'color': 'Color',
 'director_name': 'James Cameron',
 'num_critic_for_reviews': '723',
 'duration': '178',
 'director_facebook_likes': '0',
 'actor_3_facebook_likes': '855',
 'actor_2_name': 'Joel David Moore',
 'actor_1_facebook_likes': '1000',
 'gross': '760505847',
 'genres': 'Action|Adventure|Fantasy|Sci-Fi',
 'actor_1_name': 'CCH Pounder',
 'movie_title': 'Avatar',
 'num_voted_users': '886204',
 'cast_total_facebook_likes': '4834',
 'actor_3_name': 'Wes Studi',
 'facenumber_in_poster': '0',
 'plot_keywords': 'avatar|future|marine|native|paraplegic',
 'movie_imdb_link': 'http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1',
 'num_user_for_reviews': '3054',
 'language': 'English',
 'country': 'USA',
 'content_rating': 'PG-13',
 'budget': '237000000',
 'title_year': '2009',
 'actor_2_facebook_likes': '936',
 'imdb_score': 7.9,
 'aspect_ratio': '1.78',
 'movie_facebook_likes': '33000'}
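
The conversion above assumes every row has a numeric `imdb_score`. If a file might contain empty or malformed values, `float()` raises a `ValueError`. A possible defensive variant, sketched with a hypothetical helper `to_float` and made-up sample records (neither is part of the original notebook):

```python
def to_float(value, default=None):
    """Return value as a float, or default if it is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


# Hypothetical records; the second one has an empty score.
sample_records = [
    {'movie_title': 'Example Movie A', 'imdb_score': '7.9'},
    {'movie_title': 'Example Movie B', 'imdb_score': ''},
]

for record in sample_records:
    record['imdb_score'] = to_float(record['imdb_score'])

print(sample_records)
```

With this approach, records whose score is `None` would need to be skipped before any numeric comparison, since comparing `None` with a number raises a `TypeError`.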

### Step 3: Process data

- Keep only the `movie_title` and the `imdb_score`, and only for records with a score above 7.
- One way to do this is to create a new list and append a new dict for each record with a score above 7.

In [6]:
records_processed = []

for record in records:
    if record['imdb_score'] > 7:
        new_record = {
            'movie_title': record['movie_title'],
            'imdb_score': record['imdb_score']
        }
        records_processed.append(new_record)
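
The same filter-and-project step can also be written as a list comprehension. This is an alternative sketch, not a replacement for the loop above; the sample records and the names `keep` and `sample_processed` are assumptions made for illustration:

```python
# Hypothetical records with the score already converted to float.
sample_records = [
    {'movie_title': 'Example Movie A', 'imdb_score': 7.9, 'country': 'USA'},
    {'movie_title': 'Example Movie B', 'imdb_score': 7.0, 'country': 'USA'},
    {'movie_title': 'Example Movie C', 'imdb_score': 6.1, 'country': 'UK'},
]

# Columns to carry over into the new records.
keep = ['movie_title', 'imdb_score']

sample_processed = [
    {key: record[key] for key in keep}  # copy only the wanted columns
    for record in sample_records
    if record['imdb_score'] > 7         # strictly above 7, so 7.0 is excluded
]

print(sample_processed)
```

Keeping the column names in one list makes it easy to reuse them later as the `fieldnames` of the writer.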

### Step 4: Write Data

- Write the new data to a CSV file named `best_movies.csv`.
- Remember to write the header.

In [7]:
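# newline='' prevents extra blank lines between rows on Windows
with open('best_movies.csv', 'w', newline='') as f:
    csv_writer = csv.DictWriter(f, fieldnames=['movie_title', 'imdb_score'])
    csv_writer.writeheader()                 # write the column names first
    csv_writer.writerows(records_processed)  # then one row per dict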

In [8]:
for rec in records_processed[:5]:
    print(rec)

{'movie_title': 'Avatar', 'imdb_score': 7.9}
{'movie_title': "Pirates of the Caribbean: At World's End", 'imdb_score': 7.1}
{'movie_title': 'The Dark Knight Rises', 'imdb_score': 8.5}
{'movie_title': 'Star Wars: Episode VII - The Force Awakens            ', 'imdb_score': 7.1}
{'movie_title': 'Tangled', 'imdb_score': 7.8}
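
One way to check the written file is to read it back with `csv.DictReader`. The sketch below does a round trip in a temporary directory so it does not touch `best_movies.csv`; the sample records, the file name `sample_best_movies.csv`, and the variable names are assumptions made for illustration:

```python
import csv
import os
import tempfile

# Hypothetical processed records.
sample_processed = [
    {'movie_title': 'Example Movie A', 'imdb_score': 7.9},
    {'movie_title': 'Example Movie B', 'imdb_score': 8.5},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample_best_movies.csv')

    # Write the header and the rows.
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=['movie_title', 'imdb_score'])
        writer.writeheader()
        writer.writerows(sample_processed)

    # Read the file back to confirm its content.
    with open(path, newline='') as f:
        read_back = list(csv.DictReader(f))

print(read_back)
```

The scores come back as strings, because CSV stores text only. Any later processing of the new file has to convert them to `float` again.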
