# 🎬 TMDB Movie Data Analysis

**Author:** Asres Yelia  
**Project Type:** Exploratory Data Analysis Project  
**Data Source:** The Movie Database (TMDB) API  
**Dataset:** 1,500 movies (multiple languages)



## 📌 Project Description
This project uses Python to collect movie data from The Movie Database (TMDB) API and then analyze it.
The dataset covers movies in **multiple original languages**, not only English.
The goal is to practice working with an API, cleaning data, and performing basic exploratory analysis.
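The collection step itself is not shown in this notebook. The following is a minimal sketch of how it could be done with only the Python standard library. It assumes a TMDB v3 API key and two v3 endpoints: `/discover/movie`, a paged listing that returns 20 movies per page, and `/movie/{id}`, which returns full details including budget, revenue, runtime, and genres. The helper names `get_json`, `to_row`, and `collect` and the `API_KEY` placeholder are hypothetical, and this is not necessarily how the author built the CSV.

```python
import csv
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: replace with your own TMDB v3 API key.
API_KEY = "YOUR_TMDB_API_KEY"
BASE_URL = "https://api.themoviedb.org/3"

# Same columns as the CSV analyzed in this notebook.
COLUMNS = ["id", "title", "release_date", "status", "original_language",
           "vote_average", "vote_count", "popularity", "budget",
           "revenue", "runtime", "genres"]


def get_json(path, **params):
    """GET a TMDB v3 endpoint and decode its JSON body."""
    query = urllib.parse.urlencode({"api_key": API_KEY, **params})
    with urllib.request.urlopen(f"{BASE_URL}{path}?{query}") as resp:
        return json.load(resp)


def to_row(details):
    """Flatten one /movie/{id} response into a CSV row dict."""
    row = {col: details.get(col) for col in COLUMNS if col != "genres"}
    # TMDB returns genres as a list of {"id": ..., "name": ...} objects;
    # join the names into one comma-separated string.
    row["genres"] = ", ".join(g["name"] for g in details.get("genres") or [])
    return row


def collect(pages, out_path):
    """Fetch `pages` listing pages of popular movies and write them to CSV."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for page in range(1, pages + 1):
            listing = get_json("/discover/movie", page=page,
                               sort_by="popularity.desc")
            for movie in listing["results"]:
                # One extra request per movie for budget, revenue, etc.
                writer.writerow(to_row(get_json(f"/movie/{movie['id']}")))
```

For example, `collect(pages=75, out_path="tmdb_1500_movies_all_languages.csv")` would request 75 listing pages (75 × 20 = 1,500 movies) plus one details request per movie. A real run would probably also need error handling and short pauses between requests.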

## 🎯 Project Objectives
- Collect movie data using the TMDB API
- Store the data in CSV format
- Analyze movie popularity and ratings
- Explore movie languages
- Visualize insights using charts
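The rating and language objectives can be combined into a single summary table. The sketch below assumes the column names used in this dataset and a hypothetical `MIN_VOTES` threshold of 50. Averages based on very few votes are unreliable: for example, *Bāhubali: The Epic* has a 7.6 average from only 10 votes. For that reason, ratings below the threshold are left out of the mean, but those movies still count toward the per-language total. `language_summary` is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical threshold: ratings backed by fewer votes are ignored.
MIN_VOTES = 50


def language_summary(df, min_votes=MIN_VOTES):
    """Per original_language: number of movies, and mean vote_average
    over the movies that have at least `min_votes` votes."""
    counts = df["original_language"].value_counts().rename("movies")
    rated = df[df["vote_count"] >= min_votes]
    mean_rating = (
        rated.groupby("original_language")["vote_average"]
        .mean()
        .rename("mean_rating")
    )
    # Outer-join on language; languages with no well-voted movie get NaN.
    summary = pd.concat([counts, mean_rating], axis=1)
    return summary.sort_values("movies", ascending=False)
```

For example, `language_summary(data).head(10)` would list the ten most common original languages, and `language_summary(data)["movies"].head(10).plot(kind="bar")` would chart them with matplotlib.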


```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
```


## Load the dataset

```python
# Load the dataset (the path is relative to the current working directory)
data = pd.read_csv("../data/tmdb_1500_movies_all_languages.csv")
```


## Display the first few rows of the dataset

```python
# Display the first five rows of the dataset
print(data.head())
```

```text
        id                      title release_date    status  \
0    83533       Avatar: Fire and Ash   2025-12-17  Released   
1  1511417         Bāhubali: The Epic   2025-10-29  Released   
2  1084242                 Zootopia 2   2025-11-26  Released   
3  1228246  Five Nights at Freddy's 2   2025-12-03  Released   
4   982843            The Great Flood   2025-09-18  Released   

  original_language  vote_average  vote_count  popularity     budget  \
0                en         7.399         882    543.4410  350000000   
1                te         7.600          10    437.2846   65000000   
2                en         7.623         837    391.9073  150000000   
3                en         6.881         434    356.4375   36000000   
4                ko         6.100         341    312.7574          0   

      revenue  runtime                                         genres  
0   761622924      198            Science Fiction, Adventure, Fantasy  
1     3500000      224               
```
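The `genres` column packs several genres into one comma-separated string, for example `Science Fiction, Adventure, Fantasy`. To count individual genres, the string has to be split first. Below is a minimal pandas sketch. `genre_counts` is a hypothetical helper name, and it assumes the separator is always a comma.

```python
import pandas as pd


def genre_counts(genres):
    """Count each genre in a Series of comma-separated genre strings."""
    return (
        genres.dropna()          # skip movies with no genre information
        .str.split(",")          # "A, B" -> ["A", " B"]
        .explode()               # one row per (movie, genre) pair
        .str.strip()             # remove the space after each comma
        .value_counts()
    )
```

For example, `genre_counts(data["genres"]).head(10)` would list the ten most frequent genres. Rows with a missing `genres` value are dropped before splitting.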

## Get information about the dataset

```python
# Summarize column names, non-null counts, and dtypes.
# DataFrame.info() prints its report directly and returns None,
# so it is called on its own rather than inside print().
print("Information about tmdb_1500_movies_all_languages")
data.info()
```

```text
Information about tmdb_1500_movies_all_languages
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 12 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   id                 1500 non-null   int64  
 1   title              1500 non-null   object 
 2   release_date       1496 non-null   object 
 3   status             1500 non-null   object 
 4   original_language  1500 non-null   object 
 5   vote_average       1500 non-null   float64
 6   vote_count         1500 non-null   int64  
 7   popularity         1500 non-null   float64
 8   budget             1500 non-null   int64  
 9   revenue            1500 non-null   int64  
 10  runtime            1500 non-null   int64  
 11  genres             1497 non-null   object 
dtypes: float64(2), int64(5), object(5)
memory usage: 140.8+ KB
```
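`info()` reports `release_date` as `object`, meaning the dates are stored as plain strings, so any date-based analysis needs an explicit conversion. The sketch below assumes the `YYYY-MM-DD` format shown in the preview. `add_release_year` and the `release_year` column are hypothetical names. Passing `errors="coerce"` makes the 4 missing dates, and any malformed ones, become `NaT` instead of raising an error.

```python
import pandas as pd


def add_release_year(df):
    """Return a copy of df with release_date parsed as datetime and a
    nullable integer release_year column (missing dates -> NaT / <NA>)."""
    out = df.copy()
    out["release_date"] = pd.to_datetime(out["release_date"], errors="coerce")
    # .dt.year yields floats when NaT is present; Int64 keeps whole years
    # while still allowing missing values.
    out["release_year"] = out["release_date"].dt.year.astype("Int64")
    return out
```

Another option is to parse the column when loading the file, with `pd.read_csv(..., parse_dates=["release_date"])`.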


## Check for Missing Values

```python
# Check for missing (null) values in all columns
missing_values = data.isnull().sum()
print("Missing Values in Each Column\n")
for col, val in missing_values.items():
    print(f"{col}: {val} missing")

print()  # blank line between the two reports

# Check for zero values in numeric columns: budget, revenue, runtime
zero_values = (data[["budget", "revenue", "runtime"]] == 0).sum()
print("Zero Values in Numeric Columns")
for col, val in zero_values.items():
    print(f"{col}: {val} zeros")
```

```text
Missing Values in Each Column

id: 0 missing
title: 0 missing
release_date: 4 missing
status: 0 missing
original_language: 0 missing
vote_average: 0 missing
vote_count: 0 missing
popularity: 0 missing
budget: 0 missing
revenue: 0 missing
runtime: 0 missing
genres: 3 missing

Zero Values in Numeric Columns
budget: 464 zeros
revenue: 439 zeros
runtime: 31 zeros
```
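No movie actually has a zero budget or a runtime of 0 minutes. The 464 zero budgets, 439 zero revenues, and 31 zero runtimes most likely mean the value was never reported to TMDB. Left as they are, these zeros would pull down means and skew any budget-versus-revenue comparison. The minimal sketch below treats them as missing. `zeros_to_nan` is a hypothetical helper name, and treating every 0 as "unknown" is an assumption.

```python
import numpy as np
import pandas as pd


def zeros_to_nan(df, columns=("budget", "revenue", "runtime")):
    """Return a copy of df in which 0 in the given columns is replaced
    with NaN, i.e. treated as 'not reported'."""
    out = df.copy()
    cols = list(columns)
    # Integer columns become float64 so that they can hold NaN.
    out[cols] = out[cols].replace(0, np.nan)
    return out
```

After `data_clean = zeros_to_nan(data)`, `data_clean.isnull().sum()` would count these values as missing, and statistics such as `mean()` or `median()` skip them by default.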
