
# Movie Data Analysis Notebook

This notebook analyzes two datasets, one describing movies and one describing awards. The analyses cover trends over time, correlations, and visualizations.

### Datasets:
1. **Movies Dataset**: Contains information about movies, such as title, director, genres, IMDb scores, and user reviews.
2. **Awards Dataset**: Contains information about awards, including categories, winners, and films.

Each analysis is implemented in its own section for clarity.



## 1. Load and Explore the Datasets

We start by loading the datasets and exploring their structure to understand the available information.

```python
import pandas as pd

# Paths to the datasets (replace with actual paths)
movies_path = 'movies.csv'  # Example movies dataset path
awards_path = 'awards.csv'  # Example awards dataset path

# Load datasets
movies_df = pd.read_csv(movies_path)
awards_df = pd.read_csv(awards_path)

# Display the first few rows of each dataset
print("Movies Dataset:")
print(movies_df.head())

print("\nAwards Dataset:")
print(awards_df.head())
```
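Beyond `head()`, a per-column summary of data types and missing values helps show which columns need cleaning before the analyses. The following is a minimal sketch under stated assumptions: the helper `summarize_columns` and the tiny `sample_df` are hypothetical stand-ins so the block runs on its own, and the column name `title` is assumed rather than taken from the actual file.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, and distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),  # distinct non-missing values
    })


# Made-up rows standing in for movies_df (hypothetical values)
sample_df = pd.DataFrame({
    "title": ["A", "B", "C"],
    "title_year": [2001.0, 2002.0, None],
    "duration": [90.0, None, 120.0],
})

summary = summarize_columns(sample_df)
print(summary)
```

Once the real data is loaded, the same helper could be called as `summarize_columns(movies_df)` and `summarize_columns(awards_df)`.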


## 2. Genre Trends Over Time

We count how many movies fall into each genre per release year and plot those counts over time.

```python
import matplotlib.pyplot as plt

# Split the pipe-delimited genres into lists on a copy, so that re-running
# this cell does not try to split an already-split column
movies_expanded = movies_df.assign(
    genres=movies_df['genres'].str.split('|')
).explode('genres')

# Count movies per (year, genre): rows are years, columns are genres
genres_over_time = movies_expanded.groupby(['title_year', 'genres']).size().unstack(fill_value=0)

# Plot one line per genre

plt.figure(figsize=(12, 8))
for genre in genres_over_time.columns:
    plt.plot(genres_over_time.index, genres_over_time[genre], label=genre)

plt.title('Genre Trends Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Movies')
plt.legend(title='Genres', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid()
plt.show()
```
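With many genres, one line per genre can make the chart hard to read. One option is to keep only the most frequent genres before plotting. This is a sketch only: the helper `top_genre_counts`, its parameter `n`, and the rows in `sample_df` are hypothetical, and it assumes `genres` is a pipe-delimited string as in the cell above.

```python
import pandas as pd


def top_genre_counts(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Count movies per (year, genre) and keep the n most frequent genres."""
    expanded = df.assign(genres=df["genres"].str.split("|")).explode("genres")
    counts = (
        expanded.groupby(["title_year", "genres"]).size().unstack(fill_value=0)
    )
    # Rank genres by their total count across all years
    top = counts.sum().nlargest(n).index
    return counts[top]


# Made-up rows standing in for movies_df (hypothetical values)
sample_df = pd.DataFrame({
    "title_year": [2000, 2000, 2001, 2001],
    "genres": ["Action|Drama", "Drama", "Comedy|Drama", "Action"],
})

top2 = top_genre_counts(sample_df, n=2)
print(top2)
```

The resulting frame has the same layout as `genres_over_time` (years as rows, genres as columns), so it can be passed to the same plotting loop.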


## 3. Movie Duration Trends

We analyze how the average duration of movies has changed over time.

```python
# Group by year and calculate average duration
duration_trends = movies_df.groupby('title_year')['duration'].mean()

# Plot the trend
plt.figure(figsize=(12, 8))
plt.plot(duration_trends.index, duration_trends.values, marker='o')
plt.title('Average Movie Duration Over Time')
plt.xlabel('Year')
plt.ylabel('Duration (minutes)')
plt.grid()
plt.show()
```
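Yearly averages can be noisy when a year contains only a few movies, so a rolling mean may make the long-run trend easier to see. The sketch below is one possible approach, not part of the original analysis: the helper `smoothed_duration`, the `window` parameter, and the sample rows are hypothetical.

```python
import pandas as pd


def smoothed_duration(df: pd.DataFrame, window: int = 5) -> pd.Series:
    """Average duration per year, smoothed with a centered rolling mean."""
    yearly = (
        df.dropna(subset=["title_year", "duration"])
        .groupby("title_year")["duration"]
        .mean()
        .sort_index()
    )
    # min_periods=1 keeps the first and last years instead of producing NaN
    return yearly.rolling(window=window, min_periods=1, center=True).mean()


# Made-up rows standing in for movies_df (hypothetical values)
sample_df = pd.DataFrame({
    "title_year": [2000, 2000, 2001, 2002, None],
    "duration": [80.0, 100.0, 100.0, 110.0, 95.0],
})

smoothed = smoothed_duration(sample_df, window=3)
print(smoothed)
```

The window here counts rows (years present in the data), not calendar years, so gaps in `title_year` widen the effective time span of each average.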
