# IMDB rating classifier - Data Exploration

## Overview

This notebook explores the data behind the IMDB rating classifier application, which is sourced from the IMDB charts API.

Both datasets (raw and refined) are written to the `data` directory. We read them from there,
then preprocess and feature-engineer the data as needed.


The schema for the IMDB movie chart data is as follows:

```python
schema = {
    "rank": {
        "type": "int",
        "min": 1,
        "max": 250,
        "required": True,
    },
    "title": {
        "type": "str",
        "required": True,
    },
    "year": {
        "type": "int",
        "min": 1900,
        "max": 2023,
        "required": True,
    },
    "rating": {
        "type": "float",
        "min": 0.0,
        "max": 10.0,
        "required": True,
    },
    "votes": {
        "type": "int",
        "min": 0,
        "required": True,
    },
    "url": {
        "type": "str",
        "required": True,
    },
    "poster_url": {
        "type": "str",
        "required": True,
    },
    "penalized": {
        "type": "bool",
        "required": True,
    },
}
```
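
As a rough illustration of how a schema like this could be applied, the sketch below checks a single record against the type, range, and required constraints. The `validate_record` helper, the `TYPES` table, and the two-field `schema_subset` are hypothetical and are not part of the application; they only assume the schema keys shown above.

```python
# Hypothetical helper: not part of the application, shown only to illustrate the schema.
TYPES = {"int": int, "float": float, "str": str, "bool": bool}

# A two-field subset of the schema above, repeated so the example runs on its own.
schema_subset = {
    "rank": {"type": "int", "min": 1, "max": 250, "required": True},
    "rating": {"type": "float", "min": 0.0, "max": 10.0, "required": True},
}


def validate_record(record, schema):
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = []
    for field, rules in schema.items():
        if field not in record:
            if rules.get("required", False):
                errors.append(f"{field}: missing required field")
            continue
        value = record[field]
        expected = TYPES[rules["type"]]
        # bool is a subclass of int in Python, so reject it explicitly for non-bool fields
        if not isinstance(value, expected) or (expected is not bool and isinstance(value, bool)):
            errors.append(f"{field}: expected {rules['type']}, got {type(value).__name__}")
            continue
        if "min" in rules and value < rules["min"]:
            errors.append(f"{field}: {value} is below the minimum {rules['min']}")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{field}: {value} is above the maximum {rules['max']}")
    return errors


print(validate_record({"rank": 2, "rating": 9.2}, schema_subset))
print(validate_record({"rank": 300}, schema_subset))
```

Returning a list of messages rather than raising on the first failure makes it easier to report every problem in a row at once, which tends to be more useful during exploration.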

In [121]:
import pandas as pd

In [122]:
# read the csv file for the penalized movies and their adjusted ratings
penalized_df = pd.read_csv('data/movies_penalized.csv')

In [123]:
# show the penalized dataframe - set some styling
penalized_df.style.set_properties(
    **{
        'text-align': 'left',
        'font-size': '10pt',
        'font-family': 'monospace',
    },
)

Unnamed: 0,rank,title,year,rating,votes,url,poster_url,penalized
0,2,A keresztapa,1972,8.37628,1863994,/title/tt0068646/,"https://m.media-amazon.com/images/M/MV5BM2MyNjYxNmUtYTAwNi00MTYxLWJmNWYtYzZlODY3ZTk3OTFlXkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UY67_CR1,0,45,67_AL_.jpg",True
1,3,A sötét lovag,2008,8.973369,2661083,/title/tt0468569/,"https://m.media-amazon.com/images/M/MV5BMTMxNTMwODM0NF5BMl5BanBnXkFtZTcwODAyMTk2Mw@@._V1_UY67_CR0,0,45,67_AL_.jpg",True
2,4,A keresztapa II,1974,7.587516,1275230,/title/tt0071562/,"https://m.media-amazon.com/images/M/MV5BMWMwMGQzZTItY2JlNC00OWZiLWIyMDctNDk2ZDQ2YjRjMWQ0XkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UY67_CR1,0,45,67_AL_.jpg",True
3,5,Tizenkét dühös ember,1957,7.106212,793926,/title/tt0050083/,"https://m.media-amazon.com/images/M/MV5BMWU4N2FjNzYtNTVkNC00NzQ0LTg0MjAtYTJlMjFhNGUxZDFmXkEyXkFqcGdeQXVyNjc1NTYyMjg@._V1_UX45_CR0,0,45,67_AL_.jpg",True
4,6,Schindler listája,1993,7.571722,1359436,/title/tt0108052/,"https://m.media-amazon.com/images/M/MV5BNDE4OTMxMTctNmRhYy00NWE2LTg3YzItYTk3M2UwOTU5Njg4XkEyXkFqcGdeQXVyNjU0OTQ0OTY@._V1_UX45_CR0,0,45,67_AL_.jpg",True
5,7,A Gyűrűk Ura: A király visszatér,2003,8.064031,1851745,/title/tt0167260/,"https://m.media-amazon.com/images/M/MV5BNzA5ZDNlZWMtM2NhNS00NDJjLTk4NDItYTRmY2EwMWZlMTY3XkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UY67_CR0,0,45,67_AL_.jpg",True
6,8,Ponyvaregény,1994,8.174505,2062219,/title/tt0110912/,"https://m.media-amazon.com/images/M/MV5BNGNhMDIzZTUtNTBlZi00MTRlLWFjM2ItYzViMjE3YzI5MjljXkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UY67_CR0,0,45,67_AL_.jpg",True
7,9,A Gyűrűk Ura: A gyűrű szövetsége,2001,7.993635,1881349,/title/tt0120737/,"https://m.media-amazon.com/images/M/MV5BN2EyZjM3NzUtNWUzMi00MTgxLWI0NTctMzY4M2VlOTdjZWRiXkEyXkFqcGdeQXVyNDUzOTQ5MjY@._V1_UY67_CR0,0,45,67_AL_.jpg",True
8,10,"A Jó, a Rossz és a Csúf",1966,6.876936,764650,/title/tt0060196/,"https://m.media-amazon.com/images/M/MV5BNjJlYmNkZGItM2NhYy00MjlmLTk5NmQtNjg1NmM2ODU4OTMwXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_UX45_CR0,0,45,67_AL_.jpg",True
9,11,Forrest Gump,1994,8.198654,2086368,/title/tt0109830/,"https://m.media-amazon.com/images/M/MV5BNWIwODRlZTUtY2U3ZS00Yzg1LWJhNzYtMmZiYmEyNmU1NjMzXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_UY67_CR0,0,45,67_AL_.jpg",True


In [124]:
# double check the data types
penalized_df.dtypes

rank            int64
title          object
year            int64
rating        float64
votes           int64
url            object
poster_url     object
penalized        bool
dtype: object
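
The dtype check above could also be automated. The following is a minimal sketch, assuming the schema type names map to pandas dtype families as in the hypothetical `DTYPE_CHECKS` table; the small `sample_df` stands in for `penalized_df` so the example runs without the CSV file.

```python
import pandas as pd
from pandas.api import types as pdt

# Assumed mapping from schema type names to pandas dtype predicates (illustrative only).
DTYPE_CHECKS = {
    "int": pdt.is_integer_dtype,
    "float": pdt.is_float_dtype,
    "bool": pdt.is_bool_dtype,
    "str": pdt.is_string_dtype,
}

# Expected schema type per column (a subset of the schema, for brevity).
expected_types = {"rank": "int", "title": "str", "rating": "float", "penalized": "bool"}

# Stand-in for penalized_df, using two rows from the output above.
sample_df = pd.DataFrame(
    {
        "rank": [2, 3],
        "title": ["A keresztapa", "A sötét lovag"],
        "rating": [8.37628, 8.973369],
        "penalized": [True, True],
    }
)

# Collect the columns whose dtype does not belong to the expected family.
mismatches = [
    column
    for column, type_name in expected_types.items()
    if not DTYPE_CHECKS[type_name](sample_df[column])
]
print(mismatches)
```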

In [125]:
# read the csv file for the original movies and their ratings
orginal_df = pd.read_csv('data/movies.csv')

In [126]:
# show the original dataframe - set some styling
orginal_df.style.set_properties(
    **{
        'text-align': 'left',
        'font-size': '10pt',
        'font-family': 'monospace',
    },
)

Unnamed: 0,rank,title,year,rating,votes,url,poster_url,penalized
0,1,A remény rabjai,1994,9.2,2687714,/title/tt0111161/,"https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmNhMC00ZDIzLWFmNTEtODM1ZmRlYWMwMWFmXkEyXkFqcGdeQXVyMTMxODk2OTU@._V1_UY67_CR0,0,45,67_AL_.jpg",False
1,2,A keresztapa,1972,9.2,1863994,/title/tt0068646/,"https://m.media-amazon.com/images/M/MV5BM2MyNjYxNmUtYTAwNi00MTYxLWJmNWYtYzZlODY3ZTk3OTFlXkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UY67_CR1,0,45,67_AL_.jpg",False
2,3,A sötét lovag,2008,9.0,2661083,/title/tt0468569/,"https://m.media-amazon.com/images/M/MV5BMTMxNTMwODM0NF5BMl5BanBnXkFtZTcwODAyMTk2Mw@@._V1_UY67_CR0,0,45,67_AL_.jpg",False
3,4,A keresztapa II,1974,9.0,1275230,/title/tt0071562/,"https://m.media-amazon.com/images/M/MV5BMWMwMGQzZTItY2JlNC00OWZiLWIyMDctNDk2ZDQ2YjRjMWQ0XkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UY67_CR1,0,45,67_AL_.jpg",False
4,5,Tizenkét dühös ember,1957,9.0,793926,/title/tt0050083/,"https://m.media-amazon.com/images/M/MV5BMWU4N2FjNzYtNTVkNC00NzQ0LTg0MjAtYTJlMjFhNGUxZDFmXkEyXkFqcGdeQXVyNjc1NTYyMjg@._V1_UX45_CR0,0,45,67_AL_.jpg",False
5,6,Schindler listája,1993,8.9,1359436,/title/tt0108052/,"https://m.media-amazon.com/images/M/MV5BNDE4OTMxMTctNmRhYy00NWE2LTg3YzItYTk3M2UwOTU5Njg4XkEyXkFqcGdeQXVyNjU0OTQ0OTY@._V1_UX45_CR0,0,45,67_AL_.jpg",False
6,7,A Gyűrűk Ura: A király visszatér,2003,8.9,1851745,/title/tt0167260/,"https://m.media-amazon.com/images/M/MV5BNzA5ZDNlZWMtM2NhNS00NDJjLTk4NDItYTRmY2EwMWZlMTY3XkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UY67_CR0,0,45,67_AL_.jpg",False
7,8,Ponyvaregény,1994,8.8,2062219,/title/tt0110912/,"https://m.media-amazon.com/images/M/MV5BNGNhMDIzZTUtNTBlZi00MTRlLWFjM2ItYzViMjE3YzI5MjljXkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UY67_CR0,0,45,67_AL_.jpg",False
8,9,A Gyűrűk Ura: A gyűrű szövetsége,2001,8.8,1881349,/title/tt0120737/,"https://m.media-amazon.com/images/M/MV5BN2EyZjM3NzUtNWUzMi00MTgxLWI0NTctMzY4M2VlOTdjZWRiXkEyXkFqcGdeQXVyNDUzOTQ5MjY@._V1_UY67_CR0,0,45,67_AL_.jpg",False
9,10,"A Jó, a Rossz és a Csúf",1966,8.8,764650,/title/tt0060196/,"https://m.media-amazon.com/images/M/MV5BNjJlYmNkZGItM2NhYy00MjlmLTk5NmQtNjg1NmM2ODU4OTMwXkEyXkFqcGdeQXVyMjUzOTY1NTc@._V1_UX45_CR0,0,45,67_AL_.jpg",False


In [127]:
# double check the data types
orginal_df.dtypes

rank            int64
title          object
year            int64
rating        float64
votes           int64
url            object
poster_url     object
penalized        bool
dtype: object

In [128]:
# drop the penalized column from both dataframes
orginal_df.drop('penalized', axis=1, inplace=True)
penalized_df.drop('penalized', axis=1, inplace=True)

In [129]:
# round the ratings to 2 decimal places
orginal_df['rating'] = orginal_df['rating'].round(2)
penalized_df['rating'] = penalized_df['rating'].round(2)

In [130]:
# rename the rating column in the penalized (refined) dataframe to penalized_rating
penalized_df.rename(columns={'rating': 'penalized_rating'}, inplace=True)
penalized_df.head()

Unnamed: 0,rank,title,year,penalized_rating,votes,url,poster_url
0,2,A keresztapa,1972,8.38,1863994,/title/tt0068646/,https://m.media-amazon.com/images/M/MV5BM2MyNj...
1,3,A sötét lovag,2008,8.97,2661083,/title/tt0468569/,https://m.media-amazon.com/images/M/MV5BMTMxNT...
2,4,A keresztapa II,1974,7.59,1275230,/title/tt0071562/,https://m.media-amazon.com/images/M/MV5BMWMwMG...
3,5,Tizenkét dühös ember,1957,7.11,793926,/title/tt0050083/,https://m.media-amazon.com/images/M/MV5BMWU4N2...
4,6,Schindler listája,1993,7.57,1359436,/title/tt0108052/,https://m.media-amazon.com/images/M/MV5BNDE4OT...


In [131]:
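# collect the column names of the penalized dataframe; after the rename above they
# already include 'penalized_rating' (list.append returns None, so it must not be chained)
cols_to_merge = list(penalized_df)
print(cols_to_merge)

['rank', 'title', 'year', 'penalized_rating', 'votes', 'url', 'poster_url']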


In [134]:
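# left-join the two dataframes; with on=None pandas joins on every column the frames
# share (rank, title, year, votes, url, poster_url), not on the title column alone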
merged_df = orginal_df.merge(penalized_df, on=None, how='left')
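
Because `on=None` is passed, pandas joins on every column name the two frames have in common rather than on a single column. The sketch below uses two tiny stand-in frames (`left` and `right` are illustrative names, with values taken from the first rows of the data) to show how to inspect those shared columns and how the optional `indicator=True` argument flags rows that found no match:

```python
import pandas as pd

# Stand-ins for the original and penalized frames.
left = pd.DataFrame(
    {"rank": [1, 2], "title": ["A remény rabjai", "A keresztapa"], "rating": [9.2, 9.2]}
)
right = pd.DataFrame({"rank": [2], "title": ["A keresztapa"], "penalized_rating": [8.38]})

# With on=None, merge uses the intersection of the column names as join keys.
shared_columns = left.columns.intersection(right.columns).tolist()
print(shared_columns)

# indicator=True adds a `_merge` column: 'both' for matched rows, 'left_only' otherwise.
merged = left.merge(right, how="left", indicator=True)
print(merged[["title", "penalized_rating", "_merge"]])
```

Rows marked `left_only` are the ones that end up with a missing `penalized_rating` after the left join.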

In [133]:
display(merged_df)

Unnamed: 0,rank,title,year,rating,votes,url,poster_url,penalized_rating
0,1,A remény rabjai,1994,9.2,2687714,/title/tt0111161/,https://m.media-amazon.com/images/M/MV5BMDFkYT...,
1,2,A keresztapa,1972,9.2,1863994,/title/tt0068646/,https://m.media-amazon.com/images/M/MV5BM2MyNj...,8.38
2,3,A sötét lovag,2008,9.0,2661083,/title/tt0468569/,https://m.media-amazon.com/images/M/MV5BMTMxNT...,8.97
3,4,A keresztapa II,1974,9.0,1275230,/title/tt0071562/,https://m.media-amazon.com/images/M/MV5BMWMwMG...,7.59
4,5,Tizenkét dühös ember,1957,9.0,793926,/title/tt0050083/,https://m.media-amazon.com/images/M/MV5BMWU4N2...,7.11
5,6,Schindler listája,1993,8.9,1359436,/title/tt0108052/,https://m.media-amazon.com/images/M/MV5BNDE4OT...,7.57
6,7,A Gyűrűk Ura: A király visszatér,2003,8.9,1851745,/title/tt0167260/,https://m.media-amazon.com/images/M/MV5BNzA5ZD...,8.06
7,8,Ponyvaregény,1994,8.8,2062219,/title/tt0110912/,https://m.media-amazon.com/images/M/MV5BNGNhMD...,8.17
8,9,A Gyűrűk Ura: A gyűrű szövetsége,2001,8.8,1881349,/title/tt0120737/,https://m.media-amazon.com/images/M/MV5BN2EyZj...,7.99
9,10,"A Jó, a Rossz és a Csúf",1966,8.8,764650,/title/tt0060196/,https://m.media-amazon.com/images/M/MV5BNjJlYm...,6.88
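
One feature that could be derived from the merged frame is the size of the penalty applied to each movie. The sketch below is an assumption about a possible next step, not something the notebook itself does: `penalty` is a hypothetical column name, treating a missing `penalized_rating` as "no penalty" is also an assumption, and the three rows are copied from the output above so the example runs without the CSV files.

```python
import pandas as pd

# Three rows copied from the merged output above.
merged_sample = pd.DataFrame(
    {
        "title": ["A remény rabjai", "A keresztapa", "A sötét lovag"],
        "rating": [9.2, 9.2, 9.0],
        "penalized_rating": [float("nan"), 8.38, 8.97],
    }
)

# Assumption: a movie without a penalized rating was not penalized, so it keeps its rating.
merged_sample["penalized_rating"] = merged_sample["penalized_rating"].fillna(
    merged_sample["rating"]
)

# Hypothetical feature: how many rating points the penalty removed.
merged_sample["penalty"] = (
    merged_sample["rating"] - merged_sample["penalized_rating"]
).round(2)
print(merged_sample)
```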
