# Exploratory Analysis - TMDb Movie Ratings

This project analyzes the IMDb and TMDb movie rating datasets using Python and its main data analysis libraries.
All data used in the project can be found [here](https://github.com/rafaelladuarte/film_rating_exploratory_analysis/tree/main/Dados).

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
import json

Now we will clean and analyze the movie ratings from TMDb, also known as The Movie Database: a free, open-source database of movies and TV series created by Travis Bell in 2008 and continuously updated with community support. It started as a movie-only database; the TV series section was added in 2013.

## TMDb

### Data Cleaning

In [3]:
path_tmbd = "https://raw.githubusercontent.com/rafaelladuarte/film_rating_exploratory_analysis/main/Dados/tmdb_5000_movies.csv"

In [4]:
df_tmbd = pd.read_csv(path_tmbd)
df_tmbd.sample(3)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
2688,14000000,"[{""id"": 53, ""name"": ""Thriller""}, {""id"": 9648, ...",,12403,"[{""id"": 1668, ""name"": ""hawaii""}, {""id"": 2676, ...",en,A Perfect Getaway,"For their honeymoon, newlyweds Cliff and Cydne...",20.792603,"[{""name"": ""Davis Entertainment"", ""id"": 1302}, ...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-06-08,22852638,98.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,Welcome to paradise. Enter at your own risk.,A Perfect Getaway,6.2,351
3817,967686,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 80, ""nam...",http://www.four-lions.co.uk/,37495,"[{""id"": 13015, ""name"": ""terrorism""}, {""id"": 19...",en,Four Lions,Four Lions tells the story of a group of Briti...,20.544999,"[{""name"": ""Film4"", ""id"": 9349}, {""name"": ""Draf...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""}]",2010-05-07,4270000,101.0,"[{""iso_639_1"": ""ar"", ""name"": ""\u0627\u0644\u06...",Released,We are 4 Lions.,Four Lions,7.0,423
2150,20000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 53, ""name...",,17182,"[{""id"": 570, ""name"": ""rape""}, {""id"": 1419, ""na...",en,Eye for an Eye,It's fire and brimstone time as grieving mothe...,3.996392,"[{""name"": ""Paramount Pictures"", ""id"": 4}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",1996-01-12,0,101.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,What do you do when justice fails?,Eye for an Eye,5.8,63


In [5]:
df_tmbd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

In [6]:
df_tmbd = df_tmbd.drop(
    columns=[
        "homepage",
        "id",
        "keywords",
        "original_title",
        "overview",
        "spoken_languages",
        "tagline",
    ]
)
df_tmbd

Unnamed: 0,budget,genres,original_language,popularity,production_companies,production_countries,release_date,revenue,runtime,status,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,Released,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",en,139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,Released,Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,Released,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",en,112.312950,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,Released,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,Released,John Carter,6.1,2124
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4798,220000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",es,14.269792,"[{""name"": ""Columbia Pictures"", ""id"": 5}]","[{""iso_3166_1"": ""MX"", ""name"": ""Mexico""}, {""iso...",1992-09-04,2040920,81.0,Released,El Mariachi,6.6,238
4799,9000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",en,0.642552,[],[],2011-12-26,0,85.0,Released,Newlyweds,5.9,5
4800,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",en,1.444476,"[{""name"": ""Front Street Pictures"", ""id"": 3958}...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2013-10-13,0,120.0,Released,"Signed, Sealed, Delivered",7.0,6
4801,0,[],en,0.857008,[],"[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-05-03,0,98.0,Released,Shanghai Calling,5.7,7


In [7]:
df_tmbd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 13 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   original_language     4803 non-null   object 
 3   popularity            4803 non-null   float64
 4   production_companies  4803 non-null   object 
 5   production_countries  4803 non-null   object 
 6   release_date          4802 non-null   object 
 7   revenue               4803 non-null   int64  
 8   runtime               4801 non-null   float64
 9   status                4803 non-null   object 
 10  title                 4803 non-null   object 
 11  vote_average          4803 non-null   float64
 12  vote_count            4803 non-null   int64  
dtypes: float64(3), int64(3), object(7)
memory usage: 487.9+ KB


In [8]:
# Preview only: the result is not assigned, so df_tmbd keeps its rows with missing values.
df_tmbd.dropna()

Unnamed: 0,budget,genres,original_language,popularity,production_companies,production_countries,release_date,revenue,runtime,status,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,Released,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",en,139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,Released,Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,Released,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",en,112.312950,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,Released,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,Released,John Carter,6.1,2124
...,...,...,...,...,...,...,...,...,...,...,...,...,...
4798,220000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",es,14.269792,"[{""name"": ""Columbia Pictures"", ""id"": 5}]","[{""iso_3166_1"": ""MX"", ""name"": ""Mexico""}, {""iso...",1992-09-04,2040920,81.0,Released,El Mariachi,6.6,238
4799,9000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",en,0.642552,[],[],2011-12-26,0,85.0,Released,Newlyweds,5.9,5
4800,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",en,1.444476,"[{""name"": ""Front Street Pictures"", ""id"": 3958}...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2013-10-13,0,120.0,Released,"Signed, Sealed, Delivered",7.0,6
4801,0,[],en,0.857008,[],"[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-05-03,0,98.0,Released,Shanghai Calling,5.7,7
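
`dropna()` returns a new DataFrame instead of modifying the original, which is consistent with the later `info()` output still listing 4803 entries. The sketch below illustrates the difference on a toy frame; `toy`, `toy_clean` and the values in them are hypothetical and are not taken from the TMDb file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df_tmbd (hypothetical values).
toy = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "runtime": [98.0, np.nan, 120.0],
        "release_date": ["2009-06-08", "2010-05-07", None],
    }
)

# Without assignment, the original frame is unchanged.
toy.dropna()
print(len(toy))

# Assigning the result keeps only the rows that are complete in these columns.
toy_clean = toy.dropna(subset=["runtime", "release_date"])
print(len(toy_clean))
```

Passing `subset` limits the check to the columns that matter for the analysis, so rows are not discarded because of gaps in unrelated columns.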


In [9]:
df_tmbd["genres"][0]

'[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'

In [10]:
json.loads(df_tmbd["genres"][0])[0]["name"]

'Action'

In [11]:
df_tmbd["production_companies"][0]

'[{"name": "Ingenious Film Partners", "id": 289}, {"name": "Twentieth Century Fox Film Corporation", "id": 306}, {"name": "Dune Entertainment", "id": 444}, {"name": "Lightstorm Entertainment", "id": 574}]'

In [12]:
json.loads(df_tmbd["production_companies"][0])[0]["name"]

'Ingenious Film Partners'

In [13]:
df_tmbd["production_countries"][0]

'[{"iso_3166_1": "US", "name": "United States of America"}, {"iso_3166_1": "GB", "name": "United Kingdom"}]'

In [14]:
json.loads(df_tmbd["production_countries"][0])[0]["name"]

'United States of America'

In [15]:
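def get_main_column(string):
    """Return the "name" of the first entry in a JSON-encoded list, or None."""
    # Missing values and empty lists have no main entry.
    if string is None or string == "[]":
        return None
    items = json.loads(string)
    # Guard against strings that parse to an empty list.
    if len(items) > 0:
        return items[0]["name"]
    return None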

In [16]:
get_main_column(df_tmbd["genres"][0])

'Action'

In [17]:
get_main_column(df_tmbd["production_companies"][0])

'Ingenious Film Partners'

In [18]:
get_main_column(df_tmbd["production_countries"][0])

'United States of America'
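
`get_main_column` keeps only the first entry of each list, so the remaining genres, companies and countries are discarded. If an analysis needs every entry, one possible variant is sketched below. `get_all_names` is a hypothetical helper that is not part of the original notebook; the sample string is the `genres` value printed earlier.

```python
import json


def get_all_names(string):
    """Return every "name" in a JSON-encoded list of dicts (hypothetical helper)."""
    # Missing values and empty lists yield an empty result.
    if string is None or string == "[]":
        return []
    return [item["name"] for item in json.loads(string)]


# Value copied from the genres output shown earlier in the notebook.
genres = (
    '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, '
    '{"id": 14, "name": "Fantasy"}, {"id": 878, "name": "Science Fiction"}]'
)
print(get_all_names(genres))
```

Returning a list for every row, including empty ones, keeps the column type uniform if the helper is later used with `apply`.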

In [20]:
df_tmbd["main_genre"] = df_tmbd["genres"].apply(get_main_column)
df_tmbd["main_genre"]

0            Action
1         Adventure
2            Action
3            Action
4            Action
           ...     
4798         Action
4799         Comedy
4800         Comedy
4801           None
4802    Documentary
Name: main_genre, Length: 4803, dtype: object

In [21]:
df_tmbd["main_company"] = df_tmbd["production_companies"].apply(get_main_column)
df_tmbd["main_company"]

0        Ingenious Film Partners
1           Walt Disney Pictures
2              Columbia Pictures
3             Legendary Pictures
4           Walt Disney Pictures
                  ...           
4798           Columbia Pictures
4799                        None
4800       Front Street Pictures
4801                        None
4802    rusty bear entertainment
Name: main_company, Length: 4803, dtype: object

In [23]:
df_tmbd["main_country"] = df_tmbd["production_countries"].apply(get_main_column)
df_tmbd["main_country"]

0       United States of America
1       United States of America
2                 United Kingdom
3       United States of America
4       United States of America
                  ...           
4798                      Mexico
4799                        None
4800    United States of America
4801    United States of America
4802    United States of America
Name: main_country, Length: 4803, dtype: object

In [24]:
df_tmbd.sample(3)

Unnamed: 0,budget,genres,original_language,popularity,production_companies,production_countries,release_date,revenue,runtime,status,title,vote_average,vote_count,main_genre,main_company,main_country
4230,1500000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",en,5.817519,"[{""name"": ""Live Entertainment"", ""id"": 285}]","[{""iso_3166_1"": ""FR"", ""name"": ""France""}, {""iso...",1993-10-01,418961,96.0,Released,Killing Zoe,6.1,111,Action,Live Entertainment,France
1136,31000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 12, ""name...",zh,23.607392,"[{""name"": ""Beijing New Picture Film Co. Ltd."",...","[{""iso_3166_1"": ""CN"", ""name"": ""China""}]",2002-12-19,177394432,99.0,Released,Hero,7.2,635,Drama,Beijing New Picture Film Co. Ltd.,China
2954,11000000,"[{""id"": 18, ""name"": ""Drama""}, {""id"": 36, ""name...",en,1.396985,"[{""name"": ""Paramount Pictures"", ""id"": 4}, {""na...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",1970-01-27,2200000,124.0,Released,The Molly Maguires,5.9,16,Drama,Paramount Pictures,United States of America


In [29]:
df_tmbd["year"] = df_tmbd["release_date"].str.split(pat="-", expand=True)[0]
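
Splitting on `-` stores the year as a string. As an alternative sketch, the dates can be parsed with `pd.to_datetime`, which gives a numeric year and makes a decade column straightforward. The toy frame and the `decade` column are assumptions for illustration; the two dates are taken from rows shown in the notebook.

```python
import pandas as pd

# Toy stand-in for the release_date column, including one missing value.
toy = pd.DataFrame({"release_date": ["2009-12-10", "1992-09-04", None]})

# Parse the strings as dates; missing values become NaT.
dates = pd.to_datetime(toy["release_date"], format="%Y-%m-%d")

# The nullable integer dtype keeps the missing year as <NA> rather than a float NaN.
toy["year"] = dates.dt.year.astype("Int64")

# Integer division rounds each year down to the start of its decade.
toy["decade"] = toy["year"] // 10 * 10
print(toy)
```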

In [30]:
df_tmbd

Unnamed: 0,budget,genres,original_language,popularity,production_companies,production_countries,release_date,revenue,runtime,status,title,vote_average,vote_count,main_genre,main_company,main_country,year
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,Released,Avatar,7.2,11800,Action,Ingenious Film Partners,United States of America,2009
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",en,139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,Released,Pirates of the Caribbean: At World's End,6.9,4500,Adventure,Walt Disney Pictures,United States of America,2007
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,Released,Spectre,6.3,4466,Action,Columbia Pictures,United Kingdom,2015
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",en,112.312950,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,Released,The Dark Knight Rises,7.6,9106,Action,Legendary Pictures,United States of America,2012
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",en,43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,Released,John Carter,6.1,2124,Action,Walt Disney Pictures,United States of America,2012
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4798,220000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",es,14.269792,"[{""name"": ""Columbia Pictures"", ""id"": 5}]","[{""iso_3166_1"": ""MX"", ""name"": ""Mexico""}, {""iso...",1992-09-04,2040920,81.0,Released,El Mariachi,6.6,238,Action,Columbia Pictures,Mexico,1992
4799,9000,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...",en,0.642552,[],[],2011-12-26,0,85.0,Released,Newlyweds,5.9,5,Comedy,,,2011
4800,0,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...",en,1.444476,"[{""name"": ""Front Street Pictures"", ""id"": 3958}...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2013-10-13,0,120.0,Released,"Signed, Sealed, Delivered",7.0,6,Comedy,Front Street Pictures,United States of America,2013
4801,0,[],en,0.857008,[],"[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-05-03,0,98.0,Released,Shanghai Calling,5.7,7,,,United States of America,2012


### Analysis

Some questions to consider:

* Movies by votes
* Movies by genre
* Movies by original language
* Movies by film studio
* Movies by country
* Movies by release year/decade
* Movies by status
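
Most of these questions reduce to counting or aggregating movies per category. The sketch below shows one way to do that with `value_counts` and `groupby`. It uses a small hypothetical frame rather than `df_tmbd`, and reuses only the column names `main_genre` and `vote_average` from the notebook.

```python
import pandas as pd

# Toy stand-in for df_tmbd (hypothetical rows).
toy = pd.DataFrame(
    {
        "main_genre": ["Action", "Adventure", "Action", "Comedy", None],
        "vote_average": [7.2, 6.9, 6.3, 5.9, 5.7],
    }
)

# Number of movies per genre; missing genres are excluded by default.
genre_counts = toy["main_genre"].value_counts()
print(genre_counts)

# Mean rating per genre, as one way to relate votes to a category.
mean_votes = toy.groupby("main_genre")["vote_average"].mean()
print(mean_votes)
```

The same pattern should carry over to `original_language`, `main_company`, `main_country`, `year` and `status` by swapping the column name.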

In [31]:
df_tmbd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 17 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   original_language     4803 non-null   object 
 3   popularity            4803 non-null   float64
 4   production_companies  4803 non-null   object 
 5   production_countries  4803 non-null   object 
 6   release_date          4802 non-null   object 
 7   revenue               4803 non-null   int64  
 8   runtime               4801 non-null   float64
 9   status                4803 non-null   object 
 10  title                 4803 non-null   object 
 11  vote_average          4803 non-null   float64
 12  vote_count            4803 non-null   int64  
 13  main_genre            4775 non-null   object 
 14  main_company          4452 non-null   object 
 15  main_country         

In [32]:
df_tmbd.describe()

Unnamed: 0,budget,popularity,revenue,runtime,vote_average,vote_count
count,4803.0,4803.0,4803.0,4801.0,4803.0,4803.0
mean,29045040.0,21.492301,82260640.0,106.875859,6.092172,690.217989
std,40722390.0,31.81665,162857100.0,22.611935,1.194612,1234.585891
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,790000.0,4.66807,0.0,94.0,5.6,54.0
50%,15000000.0,12.921594,19170000.0,103.0,6.2,235.0
75%,40000000.0,28.313505,92917190.0,118.0,6.8,737.0
max,380000000.0,875.581305,2787965000.0,338.0,10.0,13752.0
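
The summary shows a minimum of 0 for `budget`, `revenue` and `runtime`, and a 25th percentile of 0 for `revenue`. Assuming these zeros stand for unknown values rather than real measurements (an assumption; the notebook does not confirm it), a sketch of counting them and masking them before summarizing might look like the following. The three rows reuse budget and revenue values displayed earlier.

```python
import numpy as np
import pandas as pd

# Values copied from rows 0, 4799 and 4800 of the displayed table.
toy = pd.DataFrame(
    {
        "budget": [237000000, 9000, 0],
        "revenue": [2787965087, 0, 0],
    }
)

cols = ["budget", "revenue"]

# Count how many zeros each column holds.
zero_counts = (toy[cols] == 0).sum()
print(zero_counts)

# Assumption: a zero means "unknown", so replace it with NaN before summarizing.
masked = toy[cols].replace(0, np.nan)
print(masked.describe())
```

With the zeros masked, `describe()` computes the mean and quartiles only over the rows that report a value.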
