# 🎬 IMDB Top 250 Movies Analysis

In this project, I will analyze the IMDB Top 250 Movies dataset.  
The main goals are to practice **Pandas** and **Matplotlib** by exploring:  
- The top-rated movies  
- Distribution of movies by decade  
- Average runtime of movies  
- Movie rating categories (PG, R, etc.)

## Step 0: Setup

First, we will import the required Python libraries and load the dataset.
We will also take an initial look at the data to understand its structure.


In [8]:
import pandas as pd
import matplotlib.pyplot as plt


# Load the dataset
imdb = pd.read_csv('IMDB_processed_data.csv')

# Preview the first five rows
imdb.head()



| Unnamed: 0 | Rank | Title | Release | Runtime | Rated | Ratings |
|---|---|---|---|---|---|---|
| 0 | 1 | The Shawshank Redemption | 1994 | 2h 22m | R | 9.3 |
| 1 | 2 | The Godfather | 1972 | 2h 55m | R | 9.2 |
| 2 | 3 | The Dark Knight | 2008 | 2h 32m | PG-13 | 9.0 |
| 3 | 4 | The Godfather Part II | 1974 | 3h 22m | R | 9.0 |
| 4 | 5 | 12 Angry Men | 1957 | 1h 36m | Approved | 9.0 |
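
Beyond `head()`, a few other calls can help with the initial look at the data's structure. The sketch below is a minimal example, assuming a small stand-in DataFrame (`sample`, a hypothetical name) built from the five rows shown above rather than the full CSV file; the same calls should work on `imdb`.

```python
import pandas as pd

# Stand-in for the five rows shown above (assumption: not the full dataset)
sample = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Title": [
        "The Shawshank Redemption",
        "The Godfather",
        "The Dark Knight",
        "The Godfather Part II",
        "12 Angry Men",
    ],
    "Release": [1994, 1972, 2008, 1974, 1957],
    "Runtime": ["2h 22m", "2h 55m", "2h 32m", "3h 22m", "1h 36m"],
    "Rated": ["R", "R", "PG-13", "R", "Approved"],
    "Ratings": [9.3, 9.2, 9.0, 9.0, 9.0],
})

# Number of rows and columns
n_rows, n_cols = sample.shape

# Count of missing values in each column
missing_per_column = sample.isna().sum()

print(n_rows, n_cols)
print(sample.dtypes)        # data type of each column
print(missing_per_column)
```

In this sample, `Release` and `Ratings` are numeric, while `Runtime` holds text values, so it cannot be averaged directly.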


## Step 1: Data Preparation

The "Runtime" column is stored as text (for example, "2h 22m").  
We will convert it to a total number of minutes for easier analysis.  
We will also create a new "Decade" column by rounding each release year down to the start of its decade, so that movies can be grouped by release decade.
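
The decade grouping comes down to integer arithmetic on the release year. Here is a minimal sketch on single values, where `decade_of` is a hypothetical helper name used only for illustration:

```python
# Hypothetical helper, for illustration only
def decade_of(year):
    # Floor division by 10 drops the last digit; multiplying by 10 restores the scale
    return (year // 10) * 10


print(decade_of(1994))  # 1990
print(decade_of(2008))  # 2000
print(decade_of(1957))  # 1950
```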

In [10]:
# Convert a runtime string such as "2h 22m" into total minutes
def run_time(s):
    s = str(s).strip()
    hours = 0
    minutes = 0
    # Each whitespace-separated part is either hours ("2h") or minutes ("22m")
    for part in s.split():
        if part.endswith('h'):
            hours = int(part[:-1])
        elif part.endswith('m'):
            minutes = int(part[:-1])

    return hours * 60 + minutes


imdb['RuntimeMinutes'] = imdb['Runtime'].apply(run_time)

# Create the Decade column (e.g. 1994 -> 1990)
imdb['Decade'] = (imdb['Release'] // 10) * 10

imdb.head()

| Unnamed: 0 | Rank | Title | Release | Runtime | Rated | Ratings | RuntimeMinutes | Decade |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | The Shawshank Redemption | 1994 | 2h 22m | R | 9.3 | 142 | 1990 |
| 1 | 2 | The Godfather | 1972 | 2h 55m | R | 9.2 | 175 | 1970 |
| 2 | 3 | The Dark Knight | 2008 | 2h 32m | PG-13 | 9.0 | 152 | 2000 |
| 3 | 4 | The Godfather Part II | 1974 | 3h 22m | R | 9.0 | 202 | 1970 |
| 4 | 5 | 12 Angry Men | 1957 | 1h 36m | Approved | 9.0 | 96 | 1950 |
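
As an alternative to applying a Python function row by row, the same conversion could be done with pandas string methods. This is only a sketch, not the approach used in this notebook: it assumes every runtime follows the "Xh Ym" pattern, and the last two inputs ("45m" and "2h") are made-up edge cases rather than values taken from the dataset.

```python
import pandas as pd

# First five values come from the table above; "45m" and "2h" are hypothetical edge cases
runtime = pd.Series(["2h 22m", "2h 55m", "2h 32m", "3h 22m", "1h 36m", "45m", "2h"])

# Capture the optional hour and minute parts; a part that is absent becomes NaN
parts = runtime.str.extract(r"(?:(?P<hours>\d+)h)?\s*(?:(?P<minutes>\d+)m)?")

# Treat a missing part as zero, then convert the digit strings to integers
parts = parts.fillna("0").astype(int)

# Combine hours and minutes into total minutes
runtime_minutes = parts["hours"] * 60 + parts["minutes"]

print(runtime_minutes.tolist())
```

The first five results should match the `RuntimeMinutes` values in the table above, which is a quick way to cross-check the two approaches.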
