About Dataset
This dataset contains information about the top-rated 9,000 movies from The Movie Database (TMDB). It includes key details like movie titles, release dates, popularity, user ratings, and genre classifications. This dataset can be used to analyze trends in movie ratings, popularity, and genre distribution over time.

Column Descriptions:

id:
A unique identifier for each movie on TMDB.

title:
The title of the movie.

original_language:
The language in which the movie was originally made. Common values include "en" for English, "fr" for French, etc.

release_date:
The date when the movie was released, in the format YYYY-MM-DD.

vote_average:
The average rating of the movie, based on user ratings on TMDB.

vote_count:
The total number of user votes or ratings that the movie has received.

popularity:
A measure of the movie's popularity on TMDB, based on factors like views, ratings, and social media mentions.

overview:
A brief summary or synopsis of the movie’s plot.

genre_ids:
A list of numeric IDs representing the genres of the movie.

Genres:
A list of the genres associated with the movie, expressed in human-readable format (e.g., Drama, Crime, History).
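The descriptions above call `genre_ids` and `Genres` lists. `pd.read_csv` does not rebuild Python lists, so both columns arrive as strings such as `"[18, 80]"`. That is why `df.info()` reports them as `object`. The sketch below converts them back into real lists with the standard library's `ast.literal_eval`. The `sample` frame is illustrative only; its rows are copied from the notebook output.

```python
import ast

import pandas as pd

# A few rows copied from the notebook output; in the CSV these columns are
# plain strings that merely look like Python lists.
sample = pd.DataFrame({
    "title": ["The Shawshank Redemption", "Schindler's List", "12 Angry Men"],
    "genre_ids": ["[18, 80]", "[18, 36, 10752]", "[18]"],
    "Genres": ["['Drama', 'Crime']", "['Drama', 'History', 'War']", "['Drama']"],
})

# ast.literal_eval only evaluates Python literals, so it cannot run arbitrary code
for col in ["genre_ids", "Genres"]:
    sample[col] = sample[col].apply(ast.literal_eval)

print(sample.loc[1, "Genres"])     # ['Drama', 'History', 'War']
print(sample.loc[0, "genre_ids"])  # [18, 80]
```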

In [1]:
import pandas as pd

In [2]:
# Load the dataset

df = pd.read_csv('https://raw.githubusercontent.com/henriqueoffice/Top-rated-9000-movies-TMDB/refs/heads/main/top_rated_9000_movies_on_TMDB.csv')
df

,id,title,original_language,release_date,vote_average,vote_count,popularity,overview,genre_ids,Genres
0,278,The Shawshank Redemption,en,1994-09-23,8.706,26840,150.307,Imprisoned in the 1940s for the double murder ...,"[18, 80]","['Drama', 'Crime']"
1,238,The Godfather,en,1972-03-14,8.690,20373,122.973,"Spanning the years 1945 to 1955, a chronicle o...","[18, 80]","['Drama', 'Crime']"
2,240,The Godfather Part II,en,1974-12-20,8.575,12291,94.204,In the continuing saga of the Corleone crime f...,"[18, 80]","['Drama', 'Crime']"
3,424,Schindler's List,en,1993-12-15,8.565,15695,74.615,The true story of how businessman Oskar Schind...,"[18, 36, 10752]","['Drama', 'History', 'War']"
4,389,12 Angry Men,en,1957-04-10,8.546,8522,54.678,The defense and the prosecution have rested an...,[18],['Drama']
...,...,...,...,...,...,...,...,...,...,...
9625,12142,Alone in the Dark,en,2005-01-28,3.251,599,11.337,Edward Carnby is a private investigator specia...,"[28, 14, 27]","['Action', 'Fantasy', 'Horror']"
9626,13805,Disaster Movie,en,2008-08-29,3.179,1015,19.006,"Over the course of one evening, an unsuspectin...",[35],['Comedy']
9627,11059,House of the Dead,en,2003-04-11,3.129,379,12.745,"Set on an island off the coast, a techno rave ...","[27, 28, 53]","['Horror', 'Action', 'Thriller']"
9628,14164,Dragonball Evolution,en,2009-03-12,2.896,2016,20.904,"On his 18th birthday, Goku receives a mystical...","[28, 12, 14, 878, 53]","['Action', 'Adventure', 'Fantasy', 'Science Fi..."


In [3]:
# Data cleaning

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9630 entries, 0 to 9629
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 9630 non-null   int64  
 1   title              9630 non-null   object 
 2   original_language  9630 non-null   object 
 3   release_date       9630 non-null   object 
 4   vote_average       9630 non-null   float64
 5   vote_count         9630 non-null   int64  
 6   popularity         9630 non-null   float64
 7   overview           9630 non-null   object 
 8   genre_ids          9630 non-null   object 
 9   Genres             9630 non-null   object 
dtypes: float64(2), int64(2), object(6)
memory usage: 752.5+ KB


In [4]:
df.isnull().sum()

id,0
title,0
original_language,0
release_date,0
vote_average,0
vote_count,0
popularity,0
overview,0
genre_ids,0
Genres,0


In [5]:
df.duplicated().sum()

117

In [6]:
filtro_duplicadas = df.duplicated()
df[filtro_duplicadas]

,id,title,original_language,release_date,vote_average,vote_count,popularity,overview,genre_ids,Genres
500,632322,All My Life,en,2020-10-23,7.847,474,17.077,It was a chance meeting started by one of Sol’...,"[10749, 18]","['Romance', 'Drama']"
540,16859,Kiki's Delivery Service,ja,1989-07-29,7.826,3983,57.343,"A young witch, on her mandatory year of indepe...","[16, 10751, 14]","['Animation', 'Family', 'Fantasy']"
544,323272,War Room,en,2015-08-28,7.822,506,34.532,The family-friendly movie explores the transfo...,[18],['Drama']
560,25606,White Collar Blues,it,1975-03-27,7.817,817,6.950,A good-natured but unlucky Italian is constant...,[35],['Comedy']
800,803700,The Eight Mountains,it,2022-12-21,7.687,675,15.020,An epic journey of friendship and self-discove...,[18],['Drama']
...,...,...,...,...,...,...,...,...,...,...
8210,44982,13,en,2010-03-12,5.836,795,11.430,A naive young man assumes a dead man's identit...,"[18, 53]","['Drama', 'Thriller']"
8297,305932,Expelled,en,2014-12-12,5.809,447,6.774,Felix is a legendary prankster who gets expell...,[35],['Comedy']
8440,357096,I Spit on Your Grave III: Vengeance Is Mine,en,2015-10-01,5.748,925,32.152,Jennifer Hills is still tormented by the bruta...,"[27, 53]","['Horror', 'Thriller']"
8885,61012,Silent Hill: Revelation 3D,en,2012-10-10,5.523,1603,27.798,Heather Mason and her father have been on the ...,"[53, 27, 9648]","['Thriller', 'Horror', 'Mystery']"


In [7]:
df.groupby('title').get_group('War Room')

,id,title,original_language,release_date,vote_average,vote_count,popularity,overview,genre_ids,Genres
538,323272,War Room,en,2015-08-28,7.822,506,34.532,The family-friendly movie explores the transfo...,[18],['Drama']
544,323272,War Room,en,2015-08-28,7.822,506,34.532,The family-friendly movie explores the transfo...,[18],['Drama']


In [8]:
df.drop_duplicates(inplace=True)
df.duplicated().sum()

0
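`df.duplicated()` only flags rows that are identical in every column. `id` is described as a unique identifier, so checking it directly is a stricter test. For example, a movie repeated with a slightly different `popularity` value would survive `drop_duplicates()` but fail an `id`-based check. The sketch below assumes `id` is the intended key. Its tiny `sample` frame reuses the duplicated *War Room* rows from the output above.

```python
import pandas as pd

# Two identical "War Room" rows (as in the output above) plus one unique movie
sample = pd.DataFrame({
    "id": [323272, 323272, 278],
    "title": ["War Room", "War Room", "The Shawshank Redemption"],
    "vote_average": [7.822, 7.822, 8.706],
})

# keep=False flags every occurrence of a repeated id, not only the later copies
repeated_ids = sample[sample.duplicated(subset="id", keep=False)]
print(repeated_ids)

# Deduplicate on the TMDB id alone, keeping the first occurrence
deduped = sample.drop_duplicates(subset="id", keep="first").reset_index(drop=True)
print(len(deduped))  # 2
```

On the full data, the corresponding call would be `df.drop_duplicates(subset='id')`. Whether keeping the first occurrence is right depends on why the rows were repeated.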

In [9]:
df.isna().sum()

id,0
title,0
original_language,0
release_date,0
vote_average,0
vote_count,0
popularity,0
overview,0
genre_ids,0
Genres,0


In [10]:
df['release_date'] = pd.to_datetime(df['release_date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9513 entries, 0 to 9629
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   id                 9513 non-null   int64         
 1   title              9513 non-null   object        
 2   original_language  9513 non-null   object        
 3   release_date       9513 non-null   datetime64[ns]
 4   vote_average       9513 non-null   float64       
 5   vote_count         9513 non-null   int64         
 6   popularity         9513 non-null   float64       
 7   overview           9513 non-null   object        
 8   genre_ids          9513 non-null   object        
 9   Genres             9513 non-null   object        
dtypes: datetime64[ns](1), float64(2), int64(2), object(5)
memory usage: 817.5+ KB
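When no format is given, `pd.to_datetime` infers one. `release_date` is documented as `YYYY-MM-DD`, so passing that format explicitly is stricter. Adding `errors='coerce'` turns any malformed value into `NaT` instead of raising. The sketch below runs on a small Series; the `"not a date"` entry is hypothetical and only there to show the coercion.

```python
import pandas as pd

# Two real release dates from the output plus one invented malformed value
dates = pd.Series(["1994-09-23", "1972-03-14", "not a date"])

# Explicit format: no inference; errors="coerce": bad values become NaT
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")

print(parsed.isna().sum())  # 1
print(parsed.iloc[0].year)  # 1994
```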


In [11]:
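# Sort by average rating, highest first, and rebuild a 0..n-1 index
# (sort_values returns a new frame, so the result must be assigned back)
df = df.sort_values(by='vote_average', ascending=False)
df.reset_index(inplace=True, drop=True)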
df

,id,title,original_language,release_date,vote_average,vote_count,popularity,overview,genre_ids,Genres
0,278,The Shawshank Redemption,en,1994-09-23,8.706,26840,150.307,Imprisoned in the 1940s for the double murder ...,"[18, 80]","['Drama', 'Crime']"
1,238,The Godfather,en,1972-03-14,8.690,20373,122.973,"Spanning the years 1945 to 1955, a chronicle o...","[18, 80]","['Drama', 'Crime']"
2,240,The Godfather Part II,en,1974-12-20,8.575,12291,94.204,In the continuing saga of the Corleone crime f...,"[18, 80]","['Drama', 'Crime']"
3,424,Schindler's List,en,1993-12-15,8.565,15695,74.615,The true story of how businessman Oskar Schind...,"[18, 36, 10752]","['Drama', 'History', 'War']"
4,389,12 Angry Men,en,1957-04-10,8.546,8522,54.678,The defense and the prosecution have rested an...,[18],['Drama']
...,...,...,...,...,...,...,...,...,...,...
9508,12142,Alone in the Dark,en,2005-01-28,3.251,599,11.337,Edward Carnby is a private investigator specia...,"[28, 14, 27]","['Action', 'Fantasy', 'Horror']"
9509,13805,Disaster Movie,en,2008-08-29,3.179,1015,19.006,"Over the course of one evening, an unsuspectin...",[35],['Comedy']
9510,11059,House of the Dead,en,2003-04-11,3.129,379,12.745,"Set on an island off the coast, a techno rave ...","[27, 28, 53]","['Horror', 'Action', 'Thriller']"
9511,14164,Dragonball Evolution,en,2009-03-12,2.896,2016,20.904,"On his 18th birthday, Goku receives a mystical...","[28, 12, 14, 878, 53]","['Action', 'Adventure', 'Fantasy', 'Science Fi..."




---

Distribution of movies by genre

---



In [12]:
# Genres holds each movie's whole genre list as one string, so this counts genre combinations
distribuicao_generos = df['Genres'].value_counts().nlargest(15)
distribuicao_generos

Genres,count
['Comedy'],565
['Drama'],545
"['Drama', 'Romance']",262
"['Comedy', 'Romance']",243
"['Comedy', 'Drama']",218
"['Horror', 'Thriller']",190
['Horror'],149
"['Comedy', 'Drama', 'Romance']",139
"['Drama', 'Thriller']",127
"['Drama', 'History']",121

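`value_counts()` on `Genres` counts each distinct genre *combination* string. As a result, `['Drama']` and `['Drama', 'Romance']` are separate bars, and a drama that is also a romance is never counted under `['Drama']`. To count individual genres, the strings can be parsed and exploded so that each genre gets its own row. The sketch below uses genre strings copied from the output rows.

```python
import ast

import pandas as pd

# Genre strings copied from rows shown in the notebook output
genres = pd.Series([
    "['Drama', 'Crime']",
    "['Drama', 'Crime']",
    "['Drama', 'History', 'War']",
    "['Drama']",
    "['Comedy']",
])

# Parse each string into a real list, then give every genre its own row
genre_counts = genres.apply(ast.literal_eval).explode().value_counts()
print(genre_counts)  # Drama 4, Crime 2, and 1 each for History, War, Comedy
```

On the full frame, the equivalent would be `df['Genres'].apply(ast.literal_eval).explode().value_counts().nlargest(15)`. That result could be fed to the same bar chart.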

In [13]:
import plotly.graph_objects as go

# Create a bar chart with Plotly Graph Objects
fig = go.Figure()

fig.add_trace(go.Bar(x=distribuicao_generos.index, y=distribuicao_generos.values, text=distribuicao_generos.values, textposition='auto'))

fig.update_layout(title='Top 15 Genre Combinations',
                  xaxis_title='Genres', yaxis_title='Number of Movies')

# Display the chart
fig.show()



---

Distribution of movies by language

---



In [14]:
distribuicao_idiomas = df['original_language'].value_counts().nlargest(10)

# Create a bar chart with Plotly Graph Objects
fig = go.Figure()

fig.add_trace(go.Bar(x=distribuicao_idiomas.index, y=distribuicao_idiomas.values, text=distribuicao_idiomas.values, textposition='auto'))

fig.update_layout(title='Top 10 Languages',
                  xaxis_title='Language', yaxis_title='Number of Movies')

# Display the chart
fig.show()
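English dominates this distribution, so raw counts can hide how small the other languages' shares are. `value_counts(normalize=True)` returns proportions instead of counts. The sketch below applies it to `original_language` values taken from rows shown earlier in the output.

```python
import pandas as pd

# original_language values of eight movies shown in the output above
langs = pd.Series(["en", "en", "en", "en", "ja", "en", "it", "it"])

# Proportions instead of counts, expressed as percentages with one decimal
shares = langs.value_counts(normalize=True).mul(100).round(1)
print(shares)  # en 62.5, it 25.0, ja 12.5
```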

In [15]:
df['year'] = df['release_date'].dt.year
df['month'] = df['release_date'].dt.month
df

,id,title,original_language,release_date,vote_average,vote_count,popularity,overview,genre_ids,Genres,year,month
0,278,The Shawshank Redemption,en,1994-09-23,8.706,26840,150.307,Imprisoned in the 1940s for the double murder ...,"[18, 80]","['Drama', 'Crime']",1994,9
1,238,The Godfather,en,1972-03-14,8.690,20373,122.973,"Spanning the years 1945 to 1955, a chronicle o...","[18, 80]","['Drama', 'Crime']",1972,3
2,240,The Godfather Part II,en,1974-12-20,8.575,12291,94.204,In the continuing saga of the Corleone crime f...,"[18, 80]","['Drama', 'Crime']",1974,12
3,424,Schindler's List,en,1993-12-15,8.565,15695,74.615,The true story of how businessman Oskar Schind...,"[18, 36, 10752]","['Drama', 'History', 'War']",1993,12
4,389,12 Angry Men,en,1957-04-10,8.546,8522,54.678,The defense and the prosecution have rested an...,[18],['Drama'],1957,4
...,...,...,...,...,...,...,...,...,...,...,...,...
9508,12142,Alone in the Dark,en,2005-01-28,3.251,599,11.337,Edward Carnby is a private investigator specia...,"[28, 14, 27]","['Action', 'Fantasy', 'Horror']",2005,1
9509,13805,Disaster Movie,en,2008-08-29,3.179,1015,19.006,"Over the course of one evening, an unsuspectin...",[35],['Comedy'],2008,8
9510,11059,House of the Dead,en,2003-04-11,3.129,379,12.745,"Set on an island off the coast, a techno rave ...","[27, 28, 53]","['Horror', 'Action', 'Thriller']",2003,4
9511,14164,Dragonball Evolution,en,2009-03-12,2.896,2016,20.904,"On his 18th birthday, Goku receives a mystical...","[28, 12, 14, 878, 53]","['Action', 'Adventure', 'Fantasy', 'Science Fi...",2009,3


In [17]:
top_year = df['year'].value_counts().nlargest(20)

# Create a bar chart with Plotly Graph Objects
fig = go.Figure()

fig.add_trace(go.Bar(x=top_year.index, y=top_year.values, text=top_year.values, textposition='auto'))

fig.update_layout(title='Top 20 Years by Number of Movies',
                  xaxis_title='Year', yaxis_title='Number of Movies')

# Display the chart
fig.show()
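The dataset description suggests looking at rating trends over time. One way is to bucket `year` into decades and aggregate `vote_average` per bucket with named aggregation. This also reports how many movies back each mean. The sketch below uses (year, rating) pairs from the first and last rows of the output. Those rows are the very top and bottom of the ranking, so the numbers only demonstrate the mechanics, not a real trend.

```python
import pandas as pd

# (year, vote_average) pairs copied from the first and last rows of the output
sample = pd.DataFrame({
    "year": [1994, 1972, 1974, 1993, 1957, 2005, 2008, 2003, 2009],
    "vote_average": [8.706, 8.690, 8.575, 8.565, 8.546, 3.251, 3.179, 3.129, 2.896],
})

# Floor each year to its decade, e.g. 1994 -> 1990
sample["decade"] = (sample["year"] // 10) * 10

# Named aggregation: number of movies and mean rating per decade
trend = sample.groupby("decade").agg(
    n_movies=("vote_average", "size"),
    mean_rating=("vote_average", "mean"),
)
print(trend)
```

On the full data, the same aggregation spec would apply after `df.assign(decade=(df['year'] // 10) * 10).groupby('decade')`.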

In [18]:
# Only 12 months exist, so count all of them in calendar order
top_month = df['month'].value_counts().sort_index()

# Create a bar chart with Plotly Graph Objects
fig = go.Figure()

fig.add_trace(go.Bar(x=top_month.index, y=top_month.values, text=top_month.values, textposition='auto'))

fig.update_layout(title='Movies by Release Month',
                  xaxis_title='Month', yaxis_title='Number of Movies')

# Display the chart
fig.show()
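Month numbers on the x-axis are less readable than month names. Also, a month with no releases would simply be missing from `value_counts()`. Reindexing over 1..12 and mapping numbers to abbreviations with the standard library's `calendar.month_abbr` fixes both. Note that `calendar` names follow the current locale, which gives English names in the default C locale. The sketch below uses month values from the output rows.

```python
import calendar

import pandas as pd

# month values of the nine movies shown in the output above
months = pd.Series([9, 3, 12, 12, 4, 1, 8, 4, 3])

# Count per month, then include all 12 months (absent ones get 0)
per_month = months.value_counts().reindex(range(1, 13), fill_value=0)

# Replace 1..12 with locale-dependent abbreviations ("Jan", "Feb", ...)
per_month.index = [calendar.month_abbr[m] for m in per_month.index]
print(per_month)
```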

In [20]:
# Select the top 25 movies by average rating (vote_average)
top_25_movies = df.nlargest(25, 'vote_average')

# Create a bar chart with Plotly Graph Objects
fig = go.Figure()

fig.add_trace(go.Bar(x=top_25_movies['title'], y=top_25_movies['vote_average'], text=top_25_movies['vote_average'], textposition='auto'))

fig.update_layout(title='Top 25 Movies by Average Rating',
                  xaxis_title='Movie', yaxis_title='Average Rating')

# Display the chart
fig.show()
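`nlargest(25, 'vote_average')` ranks purely by mean score, so a film with a few hundred votes competes on equal terms with one that has tens of thousands. A common alternative is the Bayesian weighted rating popularized by IMDb's Top 250: WR = v/(v+m)·R + m/(v+m)·C. Here R is the movie's mean rating, v its vote count, C the mean rating across all movies, and m a vote threshold. In the sketch below, the hypothetical `weighted_rating` helper and the choice of the 90th percentile of `vote_count` for m are assumptions. The four rows are copied from the output.

```python
import pandas as pd

# (title, vote_average, vote_count) rows copied from the notebook output
sample = pd.DataFrame({
    "title": ["The Shawshank Redemption", "12 Angry Men", "War Room", "Disaster Movie"],
    "vote_average": [8.706, 8.546, 7.822, 3.179],
    "vote_count": [26840, 8522, 506, 1015],
})


def weighted_rating(frame: pd.DataFrame, quantile: float = 0.9) -> pd.Series:
    """Hypothetical helper: shrink each vote_average toward the global mean."""
    C = frame["vote_average"].mean()            # mean rating across all movies
    m = frame["vote_count"].quantile(quantile)  # vote threshold (an assumption)
    v = frame["vote_count"]
    R = frame["vote_average"]
    # Few votes -> result close to C; many votes -> result close to R
    return v / (v + m) * R + m / (v + m) * C


sample["weighted_rating"] = weighted_rating(sample)
ranked = sample.sort_values("weighted_rating", ascending=False)
print(ranked[["title", "vote_average", "vote_count", "weighted_rating"]])
```

On the full frame, `df['weighted_rating'] = weighted_rating(df)` followed by `df.nlargest(25, 'weighted_rating')` would give an alternative top 25 to chart.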