# Exploratory Data Analysis of IMDb's Top Global Movies (1950-2020)

This notebook contains an exploratory data analysis (EDA) of IMDb's top global movies released between 1950 and 2020. It loads the already-processed dataset, summarizes its structure and basic statistics, and visualizes the distribution of ratings.

```python
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set visualization style (set_theme is the current name for sns.set)
sns.set_theme(style='whitegrid')
```

## Load Data

In this section, we will load the processed data from the `data/processed` directory.

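```python
# Load the processed data (path is relative to the working directory,
# which in Jupyter is normally the notebook's own folder)
data_path = '../data/processed/movies_data.csv'
movies_df = pd.read_csv(data_path)

# Display the first few rows of the dataset
movies_df.head()
```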
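Before exploring, it can help to fail fast if the processed file is missing a column that later cells rely on. The sketch below assumes a hypothetical helper `load_movies`; its default required column is `rating`, the only column this notebook actually uses, and the sample rows are made up purely for illustration.

```python
import io

import pandas as pd


def load_movies(source, required_columns=("rating",)):
    """Load a movies CSV and check that the required columns are present.

    `load_movies` is a hypothetical helper, not part of the project.
    `source` may be a file path or any file-like object pd.read_csv accepts.
    """
    df = pd.read_csv(source)
    # Collect every expected column that is absent so the error lists them all
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return df


# Usage with a small in-memory CSV (made-up rows, not real IMDb data)
sample_csv = io.StringIO("title,year,rating\nMovie A,1994,8.5\nMovie B,2008,7.9\n")
sample_df = load_movies(sample_csv)
print(sample_df.shape)  # (2, 3)
```

In the notebook itself, `movies_df = load_movies(data_path)` would replace the plain `pd.read_csv` call.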

## Data Overview

Let's take a look at the basic statistics and structure of the dataset.

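```python
from IPython.display import display

# Summary statistics for all columns, numeric and non-numeric.
# display() is needed because Jupyter only renders a cell's last expression.
display(movies_df.describe(include='all'))

# Column data types and non-null counts (printed directly by info())
movies_df.info()
```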
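`info()` reports non-null counts, so missing values have to be worked out by subtraction. A small helper can tabulate them directly. This is a sketch: `missing_value_summary` is a hypothetical name, and the toy frame with made-up values stands in for `movies_df`.

```python
import numpy as np
import pandas as pd


def missing_value_summary(df):
    """Return per-column missing counts and percentages, most-missing first."""
    counts = df.isna().sum()
    percent = (counts / len(df) * 100).round(1)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    return summary.sort_values("missing", ascending=False)


# Toy frame with made-up values; in the notebook, pass movies_df instead
toy = pd.DataFrame({
    "rating": [8.1, np.nan, 7.4, 6.9],
    "year": [1995, 2001, np.nan, np.nan],
})
print(missing_value_summary(toy))
```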

## Data Visualization

In this section, we will create visualizations to uncover trends and patterns in the dataset.

```python
# Example visualization: distribution of movie ratings
plt.figure(figsize=(10, 6))
sns.histplot(movies_df['rating'], bins=20, kde=True)
plt.title('Distribution of Movie Ratings')
plt.xlabel('Rating')
plt.ylabel('Frequency')
plt.show()
```
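The histogram shows how ratings are distributed overall but not how they move over time. One way to look for a trend across 1950-2020 is to aggregate ratings by decade. This sketch assumes the processed data has a numeric release-year column; the name `year` is a guess and may differ in `movies_data.csv`. The helper name `mean_rating_by_decade` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd


def mean_rating_by_decade(df, year_col="year", rating_col="rating"):
    """Average rating and movie count per decade.

    `year_col` is an assumed column name; adjust it to the real dataset.
    Rows missing either the year or the rating are dropped first.
    """
    valid = df.dropna(subset=[year_col, rating_col])
    # Floor each year to its decade, e.g. 1954 -> 1950
    decades = (valid[year_col] // 10 * 10).astype(int).rename("decade")
    return valid.groupby(decades)[rating_col].agg(["mean", "count"]).sort_index()


# Toy data (made-up values), including one row with a missing year
toy = pd.DataFrame({
    "year": [1954, 1958, 1963, 2011, np.nan],
    "rating": [8.0, 7.0, 9.0, 6.0, 5.0],
})
print(mean_rating_by_decade(toy))
```

The `count` column shows how many movies each average rests on, which matters when some decades are thinly represented. In the notebook, `mean_rating_by_decade(movies_df)['mean'].plot(kind='bar')` followed by `plt.show()` would chart the result.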

## Conclusion

This notebook provides a starting point for exploratory data analysis of IMDb's top global movies. Further analysis can investigate specific trends across the 1950-2020 period and correlations between the dataset's numeric variables.
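As a first step toward the correlation analysis mentioned above, pandas can compute pairwise correlations between numeric columns while skipping text columns such as titles. This is a sketch: `numeric_correlations` is a hypothetical helper, and the column names and values in the toy frame are made up.

```python
import pandas as pd


def numeric_correlations(df, method="pearson"):
    """Pairwise correlations between the numeric columns of df.

    Non-numeric columns (e.g. titles) are excluded before computing.
    `method` may be "pearson", "spearman", or "kendall".
    """
    return df.select_dtypes(include="number").corr(method=method)


# Toy frame: rating falls linearly with year, so Pearson r is -1
toy = pd.DataFrame({
    "year": [1960, 1980, 2000, 2020],
    "rating": [8.0, 7.5, 7.0, 6.5],
    "title": ["A", "B", "C", "D"],
})
corr = numeric_correlations(toy)
print(corr.round(2))
```

For the real data, `sns.heatmap(numeric_correlations(movies_df), annot=True, cmap='coolwarm')` would give a quick visual overview; Spearman correlation may be preferable if the relationships are monotonic but not linear.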