# Exploratory Analysis - TMDb Movie Ratings

This project analyzes the IMDb and TMDb movie rating datasets using the Python programming language and its main data analysis libraries.
All the data used in the project can be found [here](https://github.com/rafaelladuarte/film_rating_exploratory_analysis/tree/main/Dados).

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
import json

Now we will clean and analyze the movie ratings from TMDb, also known as The Movie Database. It is a free and open-source database of movies and TV series, created by Travis Bell in 2008 and constantly updated with the support of its community. Initially it covered only movies; the TV series section was added in 2013.

## TMDb

### Data Cleaning

In [3]:
path_tmbd = "https://raw.githubusercontent.com/rafaelladuarte/film_rating_exploratory_analysis/main/Dados/tmdb_5000_movies.csv"

In [4]:
df_tmbd = pd.read_csv(path_tmbd)
df_tmbd.sample(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
126,170000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://marvel.com/thor,76338,"[{""id"": 8828, ""name"": ""marvel comic""}, {""id"": ...",en,Thor: The Dark World,Thor fights to restore order across the cosmos...,99.499595,"[{""name"": ""Marvel Studios"", ""id"": 420}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2013-10-29,644571402,112.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Delve into the darkness,Thor: The Dark World,6.8,4755
355,90000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 53, ""nam...",,1572,"[{""id"": 258, ""name"": ""bomb""}, {""id"": 444, ""nam...",en,Die Hard: With a Vengeance,New York detective John McClane is back and ki...,51.881077,"[{""name"": ""Twentieth Century Fox Film Corporat...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",1995-05-19,366101666,128.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Think fast. Look alive. Die hard.,Die Hard: With a Vengeance,6.9,2066
3096,0,"[{""id"": 18, ""name"": ""Drama""}]",,91076,"[{""id"": 10965, ""name"": ""playwright""}, {""id"": 1...",en,Illuminata,"It's the start of the 20th century, and Tuccio...",0.111313,"[{""name"": ""Overseas FilmGroup"", ""id"": 888}, {""...","[{""iso_3166_1"": ""ES"", ""name"": ""Spain""}, {""iso_...",1998-05-21,0,119.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,,Illuminata,6.3,7


In [5]:
df_tmbd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

In [7]:
df_tmbd = df_tmbd.drop(
    columns=[
        "homepage",
        "id",
        "keywords",
        "original_title",
        "overview",
        "spoken_languages",
        "tagline",
    ]
)
df_tmbd

Unnamed: 0,budget,genres,original_language,popularity,production_companies,production_countries,release_date,revenue,runtime,status,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,Released,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",en,139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,Released,Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,Released,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",en,112.312950,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,Released,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,Released,John Carter,6.1,2124
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4798,220000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",es,14.269792,"[{""name"": ""Columbia Pictures"", ""id"": 5}]","[{""iso_3166_1"": ""MX"", ""name"": ""Mexico""}, {""iso...",1992-09-04,2040920,81.0,Released,El Mariachi,6.6,238
4799,9000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",en,0.642552,[],[],2011-12-26,0,85.0,Released,Newlyweds,5.9,5
4800,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",en,1.444476,"[{""name"": ""Front Street Pictures"", ""id"": 3958}...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2013-10-13,0,120.0,Released,"Signed, Sealed, Delivered",7.0,6
4801,0,[],en,0.857008,[],"[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-05-03,0,98.0,Released,Shanghai Calling,5.7,7


In [8]:
df_tmbd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   original_language     4803 non-null   object 
 3   popularity            4803 non-null   float64
 4   production_companies  4803 non-null   object 
 5   production_countries  4803 non-null   object 
 6   release_date          4802 non-null   object 
 7   revenue               4803 non-null   int64  
 8   runtime               4801 non-null   float64
 9   status                4803 non-null   object 
 10  title                 4803 non-null   object 
 11  vote_average          4803 non-null   float64
 12  vote_count            4803 non-null   int64  
dtypes: float64(3), int64(3), object(7)
memory usage: 487.9+ KB
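
The `info()` output above shows that `release_date` and `runtime` have fewer non-null entries than the other columns. As a minimal sketch, the missing values can also be counted directly per column. The `toy` DataFrame below is hypothetical and only stands in for a few columns of the TMDb table; it is not the project's data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a few columns of the TMDb table.
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "release_date": ["2009-12-10", None, "2015-10-26"],
        "runtime": [162.0, np.nan, np.nan],
    }
)

# isna() marks missing cells; sum() counts them column by column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```

Applied to `df_tmbd`, the same call would be expected to report the gaps that `info()` implies, assuming the columns load as shown above.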


In [9]:
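# Preview the rows without missing values; the result is not assigned, so df_tmbd itself is unchanged.
df_tmbd.dropna()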

Unnamed: 0,budget,genres,original_language,popularity,production_companies,production_countries,release_date,revenue,runtime,status,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,Released,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",en,139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,Released,Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,Released,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",en,112.312950,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,Released,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,Released,John Carter,6.1,2124
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4798,220000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",es,14.269792,"[{""name"": ""Columbia Pictures"", ""id"": 5}]","[{""iso_3166_1"": ""MX"", ""name"": ""Mexico""}, {""iso...",1992-09-04,2040920,81.0,Released,El Mariachi,6.6,238
4799,9000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",en,0.642552,[],[],2011-12-26,0,85.0,Released,Newlyweds,5.9,5
4800,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",en,1.444476,"[{""name"": ""Front Street Pictures"", ""id"": 3958}...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2013-10-13,0,120.0,Released,"Signed, Sealed, Delivered",7.0,6
4801,0,[],en,0.857008,[],"[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-05-03,0,98.0,Released,Shanghai Calling,5.7,7
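
To actually discard incomplete rows, the result of `dropna()` has to be kept. The sketch below illustrates this on a hypothetical `toy` DataFrame (not the project's data); the names `cleaned` and `cleaned_runtime` are illustrative. It also shows the `subset` parameter, which restricts the check to chosen columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing runtime, one missing release date.
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "release_date": ["2009-12-10", "2007-05-19", None],
        "runtime": [162.0, np.nan, 148.0],
    }
)

# Keep the result: drop every row that has at least one missing value.
cleaned = toy.dropna()

# Only drop rows whose "runtime" is missing; other gaps are tolerated.
cleaned_runtime = toy.dropna(subset=["runtime"])

print(len(toy), len(cleaned), len(cleaned_runtime))
```

Whether to drop on all columns or only on a subset depends on which fields the later analysis relies on.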


In [10]:
df_tmbd["genres"][0]

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

In [11]:
json.loads(df_tmbd["genres"][0])[0]["name"]

'Action'
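
The two cells above show that each `genres` value is a JSON string holding a list of dictionaries. As a small sketch, every genre name (not only the first) can be collected with a list comprehension. The string is copied from the output of `df_tmbd["genres"][0]`; the variable names `raw_genres` and `genre_names` are illustrative.

```python
import json

# Genre string copied from the output of df_tmbd["genres"][0].
raw_genres = (
    '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, '
    '{"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
)

# json.loads turns the string into a list of dicts; collect every "name".
genre_names = [entry["name"] for entry in json.loads(raw_genres)]
print(genre_names)  # ['Action', 'Adventure', 'Fantasy', 'Science Fiction']
```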

In [12]:
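def get_main_genre(genres):
    # Return the name of the first genre in the JSON string,
    # or a placeholder when the list is missing or empty.
    if genres is not None and genres != '[]':
        genres_json = json.loads(genres)
        # Check the parsed list, not the length of the raw string.
        if len(genres_json) > 0:
            return genres_json[0]["name"]
    return 'No genre list'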

In [13]:
get_main_genre(df_tmbd["genres"][0])

'Action'

In [14]:
get_main_genre('[]')

'No genre list'

In [15]:
df_tmbd["main_genre"] = df_tmbd["genres"].apply(get_main_genre)
df_tmbd["main_genre"]

0              Action
1           Adventure
2              Action
3              Action
4              Action
            ...      
4798           Action
4799           Comedy
4800           Comedy
4801    No genre list
4802      Documentary
Name: main_genre, Length: 4803, dtype: object

In [17]:
df_tmbd.sample(3)

Unnamed: 0,budget,genres,original_language,popularity,production_companies,production_countries,release_date,revenue,runtime,status,title,vote_average,vote_count,main_genre
1578,37000000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 14, ""nam...",en,41.569541,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2004-04-13,96455697,98.0,Released,13 Going on 30,6.3,1204,Comedy
3357,0,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 27, ""name...",en,16.754452,"[{""name"": ""Lions Gate Films"", ""id"": 35}, {""nam...","[{""iso_3166_1"": ""DE"", ""name"": ""Germany""}, {""is...",2005-07-22,0,107.0,Released,The Devil's Rejects,6.6,322,Drama
2908,0,"[{""id"": 18, ""name"": ""Drama""}]",ja,2.930853,"[{""name"": ""Tokuma Shoten"", ""id"": 1779}, {""name...","[{""iso_3166_1"": ""JP"", ""name"": ""Japan""}]",1993-04-17,0,134.0,Released,Madadayo,7.5,20,Drama


In [18]:
df_tmbd["production_companies"][0]

'[{"name": "Ingenious Film Partners", "id": 289}, {"name": "Twentieth Century Fox Film Corporation", "id": 306}, {"name": "Dune Entertainment", "id": 444}, {"name": "Lightstorm Entertainment", "id": 574}]'
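
The `production_companies` value has the same shape as `genres`: a JSON string containing a list of dictionaries with a `"name"` key. One possible approach, sketched here under that assumption, is a single helper that works for either column. The function `get_first_name` and its `default` parameter are hypothetical and are not part of the notebook; the sample string is a shortened copy of the output above.

```python
import json


def get_first_name(raw, default="No list"):
    """Hypothetical helper: first "name" in a JSON-encoded list of dicts."""
    # Non-string input (None, NaN) cannot be parsed, so fall back to the default.
    if not isinstance(raw, str):
        return default
    entries = json.loads(raw)
    # An empty list has no first entry.
    if not entries:
        return default
    return entries[0]["name"]


# Shortened copy of df_tmbd["production_companies"][0].
companies = (
    '[{"name": "Ingenious Film Partners", "id": 289}, '
    '{"name": "Twentieth Century Fox Film Corporation", "id": 306}]'
)

print(get_first_name(companies))  # Ingenious Film Partners
print(get_first_name("[]"))       # No list
```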