# 🎬 IMDB Films Data Analysis Exercise

## Introduction

Welcome to this hands-on data analysis exercise! You'll be working with real IMDB movie data to answer various questions about film history, actors, and movie production trends.

**What you'll learn:**
- Loading and exploring datasets with Pandas
- Filtering and sorting data
- Grouping and aggregating information
- Merging multiple datasets
- Extracting insights from real-world data

**Datasets:**
- `titles.csv`: Contains information about movies (title, year, etc.)
- `cast.csv`: Contains information about movie cast members (actor, character, role order)

**Skills you'll practice:**
- DataFrame operations
- Boolean indexing
- Sorting and filtering
- Merging DataFrames
- Aggregation and counting

---

## Setup

First, let's import the necessary libraries:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Display settings for better output
pd.set_option('display.max_rows', 100)
pd.set_option('display.max_columns', None)

print("✓ Libraries loaded successfully")

---

## Question 1: Load the IMDB Titles Dataset

**Task:** Load the `titles.csv` file into a DataFrame called `titles`.

**Expected columns:**
- `title`: Movie title
- `year`: Release year
- Other relevant information

**Instructions:**
1. Use `pd.read_csv()` to load the file
2. Display the first few rows
3. Check the shape of the DataFrame
4. Display column names and data types
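
If the loading steps are new to you, here is a minimal sketch of the same pattern on made-up data. It reads CSV text from memory rather than from `titles.csv`, and the name `demo` and the toy film titles are assumptions used only for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk; the exercise itself reads 'titles.csv'.
csv_text = "title,year\nToy Film A,1999\nToy Film B,2005\nToy Film C,1987\n"

# read_csv accepts a file path or any file-like object.
demo = pd.read_csv(io.StringIO(csv_text))

print(demo.head())   # first rows (up to 5 by default)
print(demo.shape)    # (number of rows, number of columns)
print(demo.dtypes)   # column names with their data types
```

With a real file, you would pass the file path as the first argument instead of the `StringIO` object.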

In [None]:
# YOUR CODE HERE
# Load titles.csv into a DataFrame called 'titles'


---

## Question 2: Load the IMDB Cast Dataset

**Task:** Load the `cast.csv` file into a DataFrame called `cast`.

**Expected columns:**
- `title`: Movie title
- `year`: Release year
- `name`: Actor name
- `type`: Type of role (actor/actress)
- `character`: Character name
- `n`: Role order/importance (lower numbers = more important roles)

**Instructions:**
1. Load the file using `pd.read_csv()`
2. Display the first few rows
3. Check the shape and info

In [None]:
# YOUR CODE HERE
# Load cast.csv into a DataFrame called 'cast'


---

## Question 3: How Many Movies Are Listed?

**Task:** Count the total number of movies in the `titles` DataFrame.

**Hint:** Use `.shape[0]` or `len()` to count rows.

In [None]:
# YOUR CODE HERE
# Count total number of movies


---

## Question 4: What Are the Oldest Movies Listed?

**Task:** Find the oldest movies in the dataset.

**Instructions:**
1. Sort the DataFrame by year (ascending)
2. Display the first 10 movies
3. What is the earliest year in the dataset?

**Hint:** Use `.sort_values()` and `.head()`
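
As a rough illustration of the hint, the sketch below sorts a made-up DataFrame (the name `demo` and its rows are invented for this example) and takes the first rows.

```python
import pandas as pd

# Toy data, not the real titles dataset.
demo = pd.DataFrame({
    "title": ["Toy Film A", "Toy Film B", "Toy Film C", "Toy Film D"],
    "year": [1999, 2005, 1987, 1993],
})

# sort_values sorts in ascending order by default and returns a new DataFrame.
oldest_first = demo.sort_values("year")

print(oldest_first.head(2))   # the two oldest toy films
print(demo["year"].min())     # earliest year, without sorting
```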

In [None]:
# YOUR CODE HERE
# Find and display the oldest movies


---

## Question 5: How Many Movies Are Named "Dracula"?

**Task:** Count how many movies have the exact title "Dracula".

**Hint:** Use boolean indexing with `titles['title'] == 'Dracula'`
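
Here is a small sketch of counting with a boolean mask, using invented titles. Note that `==` matches the whole string exactly and is case-sensitive, so "Echoes" does not count as "Echo".

```python
import pandas as pd

# Toy data, not the real titles dataset.
demo = pd.DataFrame({
    "title": ["Echo", "Echo", "Echoes", "Harbor"],
    "year": [1950, 1972, 1980, 1950],
})

# The comparison produces a Series of True/False values, one per row.
mask = demo["title"] == "Echo"

# True counts as 1, so summing the mask counts the matching rows.
count = mask.sum()
print(count)

# Equivalent: filter the rows first, then count them.
print(len(demo[mask]))
```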

In [None]:
# YOUR CODE HERE
# Count movies titled "Dracula"


---

## Question 6: Most Common Titles in Film History

**Task:** Find the most frequently used movie titles.

**Instructions:**
1. Count how many times each title appears
2. Sort by count in descending order
3. Display the top 20 most common titles

**Hint:** Use `.value_counts()`
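
A minimal sketch of `.value_counts()` on invented titles follows. The result is a Series indexed by the distinct values and already sorted from most to least frequent, so `.head()` gives the top entries directly.

```python
import pandas as pd

# Toy data, not the real titles dataset.
demo = pd.DataFrame({
    "title": ["Echo", "Harbor", "Echo", "Echo", "Harbor", "Lantern"],
})

# Counts occurrences of each distinct title, most frequent first.
title_counts = demo["title"].value_counts()

print(title_counts.head(2))
```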

In [None]:
# YOUR CODE HERE
# Find the most common movie titles


---

## Question 7: First "Romeo and Juliet" Movie

**Task:** Find the year of the first movie titled "Romeo and Juliet".

**Instructions:**
1. Filter for movies with title "Romeo and Juliet"
2. Sort by year
3. Display the first one

**Expected output:** The earliest "Romeo and Juliet" film

In [None]:
# YOUR CODE HERE
# Find the first "Romeo and Juliet" movie


---

## Question 8: List All "Exorcist" Movies

**Task:** List all movies that contain the word "Exorcist" in their title, ordered from oldest to newest.

**Instructions:**
1. Use `.str.contains()` to find titles with "Exorcist"
2. Sort by year (ascending)
3. Display all matching movies

**Hint:** `titles[titles['title'].str.contains('Exorcist')]`
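
The sketch below shows substring matching on invented titles. Two optional arguments are worth knowing: `na=False` treats missing titles as non-matches (otherwise a missing value can break the boolean filter), and `regex=False` treats the search text literally. Whether the real dataset contains missing titles is not stated here, so treat `na=False` as a precaution.

```python
import pandas as pd

# Toy data with one missing title and one lowercase variant.
demo = pd.DataFrame({
    "title": ["Echo", "The Echo Returns", "Harbor", None, "echo chamber"],
    "year": [1990, 1972, 1950, 1960, 2001],
})

# Case-sensitive substring match; missing titles count as "no match".
mask = demo["title"].str.contains("Echo", na=False, regex=False)

matches = demo[mask].sort_values("year")
print(matches)
```

The lowercase "echo chamber" is not matched because the search is case-sensitive by default; pass `case=False` if you want to ignore case.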

In [None]:
# YOUR CODE HERE
# Find all "Exorcist" movies, sorted by year


---

## Question 9: Movies Made in 1950

**Task:** Count how many movies were made in 1950.

**Hint:** Filter by `year == 1950` and count

In [None]:
# YOUR CODE HERE
# Count movies from 1950


---

## Question 10: Movies Made in 1970

**Task:** Count how many movies were made in 1970.

In [None]:
# YOUR CODE HERE
# Count movies from 1970


---

## Question 11: Movies from the 1950s Decade

**Task:** Count how many movies were made between 1950 and 1959 (inclusive).

**Hint:** Use `(titles['year'] >= 1950) & (titles['year'] <= 1959)`
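
As a sketch on made-up years, the example below combines two conditions with `&`. Each condition needs its own parentheses because `&` binds more tightly than `>=` and `<=`. It also shows `.between()`, which includes both endpoints by default and gives the same result.

```python
import pandas as pd

# Toy data, not the real titles dataset.
demo = pd.DataFrame({"year": [1949, 1950, 1955, 1959, 1960]})

# Both endpoints are included by the >= and <= comparisons.
in_range = (demo["year"] >= 1950) & (demo["year"] <= 1959)
count_mask = in_range.sum()

# Same result with Series.between, which is inclusive on both ends by default.
count_between = demo["year"].between(1950, 1959).sum()

print(count_mask, count_between)
```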

In [None]:
# YOUR CODE HERE
# Count movies from 1950-1959


---

## Question 12: Years of "Batman" Movies

**Task:** List all the years when a movie titled "Batman" was released.

**Instructions:**
1. Filter for title "Batman"
2. Get unique years
3. Sort them

**Expected output:** A list/series of years
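
One possible way to chain these three steps, shown on invented data, is to select the `year` column for the matching rows, drop repeated years, and sort what remains.

```python
import pandas as pd

# Toy data: "Echo" appears three times, twice in the same year.
demo = pd.DataFrame({
    "title": ["Echo", "Harbor", "Echo", "Echo"],
    "year": [1990, 1950, 1972, 1990],
})

# .loc[row_mask, column] filters rows and selects one column in a single step.
years = (
    demo.loc[demo["title"] == "Echo", "year"]
    .drop_duplicates()
    .sort_values()
    .tolist()
)

print(years)
```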

In [None]:
# YOUR CODE HERE
# Find all years when "Batman" was released


---

## Question 13: Cast Size of "The Godfather"

**Task:** Count how many roles are listed in the `cast` DataFrame for "The Godfather".

**Instructions:**
1. Filter the `cast` DataFrame for "The Godfather"
2. Count the number of rows

**Note:** If there are multiple "The Godfather" movies, you might need to also filter by year.

In [None]:
# YOUR CODE HERE
# Count roles in "The Godfather"


---

## Question 14: Uncredited Roles in "The Godfather"

**Task:** Count how many roles in "The Godfather" do NOT have an 'n' value (uncredited/unranked roles).

**Hint:** Check for null values in the 'n' column using `.isnull()` or `.isna()`
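
The following sketch counts missing and present `n` values in an invented cast table. `.isna()` and `.notna()` are exact complements, so the two counts always add up to the number of rows.

```python
import numpy as np
import pandas as pd

# Toy cast data; NaN marks a role with no 'n' value.
demo_cast = pd.DataFrame({
    "title": ["Echo", "Echo", "Echo", "Echo"],
    "name": ["Actor One", "Actor Two", "Actor Three", "Actor Four"],
    "n": [1.0, 2.0, np.nan, np.nan],
})

missing = demo_cast["n"].isna().sum()    # rows where n is null
present = demo_cast["n"].notna().sum()   # rows where n has a value

print(missing, present)
```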

In [None]:
# YOUR CODE HERE
# Count roles without 'n' classification


---

## Question 15: Credited Roles in "The Godfather"

**Task:** Count how many roles in "The Godfather" DO have an 'n' value (credited/ranked roles).

**Hint:** Use `.notna()` or `.notnull()`

In [None]:
# YOUR CODE HERE
# Count roles with 'n' classification


---

## Question 16: Complete Cast of "2001: A Space Odyssey"

**Task:** Display the complete cast of "2001: A Space Odyssey" ordered by their 'n' classification, ignoring roles without an 'n' value.

**Instructions:**
1. Filter cast for this movie
2. Remove rows where 'n' is null
3. Sort by 'n' (ascending - lower numbers = more important)
4. Display relevant columns: name, character, n

**Expected output:** Ordered list of actors and their characters
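
The four steps above can be chained into one expression. This is only a sketch on an invented cast table, and the film and actor names are placeholders.

```python
import numpy as np
import pandas as pd

# Toy cast data, not the real cast dataset.
demo_cast = pd.DataFrame({
    "title": ["Echo", "Echo", "Echo", "Harbor"],
    "year": [1972, 1972, 1972, 1950],
    "name": ["Actor One", "Actor Two", "Actor Three", "Actor Four"],
    "character": ["Pilot", "Captain", "Passerby", "Sailor"],
    "n": [2.0, 1.0, np.nan, 1.0],
})

ranked = (
    demo_cast[demo_cast["title"] == "Echo"]    # 1. keep one film
    .dropna(subset=["n"])                      # 2. drop rows where n is null
    .sort_values("n")                          # 3. lowest n first
    .loc[:, ["name", "character", "n"]]        # 4. keep the relevant columns
)

print(ranked)
```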

In [None]:
# YOUR CODE HERE
# Display cast of "2001: A Space Odyssey" ordered by 'n'


---

## Question 17: Cast of "Dracula" (1958)

**Task:** Display the complete cast of the 1958 "Dracula" movie, ordered by 'n' classification.

**Instructions:**
1. Filter for title "Dracula" AND year 1958
2. Remove rows where 'n' is null
3. Sort by 'n'
4. Display name, character, and n

**Note:** You need to filter by both title AND year
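
To see why the year matters, here is a sketch with two invented films that share a title. Filtering on the title alone mixes their casts together; adding the year condition isolates one film.

```python
import pandas as pd

# Toy cast data: two different films are both called "Echo".
demo_cast = pd.DataFrame({
    "title": ["Echo", "Echo", "Echo"],
    "year": [1972, 1990, 1990],
    "name": ["Actor One", "Actor Two", "Actor Three"],
})

by_title = demo_cast[demo_cast["title"] == "Echo"]

# Wrap each condition in parentheses before combining with &.
one_film = demo_cast[(demo_cast["title"] == "Echo") & (demo_cast["year"] == 1990)]

print(len(by_title), len(one_film))
```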

In [None]:
# YOUR CODE HERE
# Display cast of "Dracula" (1958) ordered by 'n'


---

## Question 18: Cast Size of "The Wizard of Oz" (1939)

**Task:** Count how many roles were listed for "The Wizard of Oz" released in 1939.

**Hint:** Filter by both title and year, then count

In [None]:
# YOUR CODE HERE
# Count roles in "The Wizard of Oz" (1939)


---

## Question 19: How Many Actors Played "Romeo"?

**Task:** Count how many different people have played the character "Romeo" throughout film history.

**Instructions:**
1. Filter cast for character name "Romeo"
2. Count unique actors (use `.nunique()`)

**Expected output:** Number of unique actors
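
The difference between counting rows and counting unique actors is easy to miss, so here is a sketch on invented data where one actor plays the same character twice.

```python
import pandas as pd

# Toy cast data: "Actor One" plays "Pilot" in two different films.
demo_cast = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film C"],
    "name": ["Actor One", "Actor Two", "Actor One", "Actor Three"],
    "character": ["Pilot", "Pilot", "Pilot", "Captain"],
})

pilots = demo_cast[demo_cast["character"] == "Pilot"]

role_count = len(pilots)                 # counts every matching row
actor_count = pilots["name"].nunique()   # counts distinct names only

print(role_count, actor_count)
```

This treats two people with identical names as one actor, which is a limitation of matching on the `name` column alone.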

In [None]:
# YOUR CODE HERE
# Count unique actors who played "Romeo"


---

## Question 20: Robert De Niro's Career Roles

**Task:** Count how many roles Robert De Niro has had in his career.

**Hint:** Filter by actor name and count rows

In [None]:
# YOUR CODE HERE
# Count Robert De Niro's total roles


---

## Question 21: Charlton Heston's Supporting Roles (1950s)

**Task:** List supporting roles played by Charlton Heston in the 1950s, ordered by year.

**Instructions:**
1. Filter for actor "Charlton Heston"
2. Filter for years 1950-1959
3. Filter for supporting roles (`n` > 1)
4. Sort by year (ascending)
5. Display: title, year, character, n

**Note:** Supporting roles typically have n > 1 (leading roles usually have n = 1)
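
A sketch of combining the three filters on invented data follows. One detail worth noticing: a comparison such as `n > 1` evaluates to `False` when `n` is missing, so roles without an `n` value drop out automatically.

```python
import numpy as np
import pandas as pd

# Toy cast data; actor and film names are placeholders.
demo_cast = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D", "Film E"],
    "year": [1952, 1955, 1958, 1961, 1957],
    "name": ["Actor One", "Actor One", "Actor One", "Actor One", "Actor Two"],
    "character": ["Hero", "Friend", "Extra", "Rival", "Guard"],
    "n": [1.0, 3.0, np.nan, 2.0, 2.0],
})

mask = (
    (demo_cast["name"] == "Actor One")
    & demo_cast["year"].between(1950, 1959)   # inclusive on both ends
    & (demo_cast["n"] > 1)                    # False for NaN, so unranked roles are excluded
)

supporting = demo_cast[mask].sort_values("year")[["title", "year", "character", "n"]]
print(supporting)
```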

In [None]:
# YOUR CODE HERE
# Find Charlton Heston's supporting roles in the 1950s


---

## Question 22: Charlton Heston's Leading Roles (1960s)

**Task:** List leading roles played by Charlton Heston in the 1960s, ordered by year (descending).

**Instructions:**
1. Filter for actor "Charlton Heston"
2. Filter for years 1960-1969
3. Filter for leading roles (n == 1)
4. Sort by year (descending)
5. Display: title, year, character, n

**Expected output:** His starring roles from newest to oldest in that decade
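
For the descending sort, `sort_values` takes `ascending=False`. The sketch below applies it to invented data after keeping only rows with `n == 1`.

```python
import pandas as pd

# Toy data; film names are placeholders.
demo_cast = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C", "Film D"],
    "year": [1962, 1968, 1965, 1965],
    "n": [1.0, 1.0, 2.0, 1.0],
})

# Keep leading roles, then sort from newest to oldest.
leads = demo_cast[demo_cast["n"] == 1].sort_values("year", ascending=False)

print(leads)
```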

In [None]:
# YOUR CODE HERE
# Find Charlton Heston's leading roles in the 1960s (descending order)


---

## Bonus Questions (Optional Challenges)

If you finish early, try these additional challenges:

### Bonus 1: Most Prolific Actors
Find the 10 actors with the most roles in film history.

### Bonus 2: Decade Analysis
Create a visualization showing the number of movies produced per decade.
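
As a starting point, one common way to derive a decade from a year is integer division by 10 followed by multiplication by 10. The sketch below does this on invented years and counts rows per decade with `groupby`; the plotting step is left to you.

```python
import pandas as pd

# Toy data, not the real titles dataset.
demo = pd.DataFrame({"year": [1949, 1950, 1955, 1959, 1960, 1961]})

# 1955 // 10 * 10 -> 1950, so every year maps to the start of its decade.
decade = (demo["year"] // 10 * 10).rename("decade")

# Group rows by decade and count them; groupby sorts the keys by default.
per_decade = demo.groupby(decade).size()

print(per_decade)
```

Calling `per_decade.plot(kind="bar")` on the result should draw a bar chart, assuming matplotlib is installed as in the Setup cell.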

### Bonus 3: Common Character Names
Find the most common character names across all movies.

### Bonus 4: Actor Collaborations
Find which two actors have appeared together in the most movies.
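
One possible approach, sketched here on invented data, is to merge the cast table with itself on film so that every row pairs two actors from the same film. This is an illustration of the merging technique rather than the only solution, and on the full dataset a self-merge can become very large, so you may want to test it on a subset first.

```python
import pandas as pd

# Toy cast data; actor and film names are placeholders.
demo_cast = pd.DataFrame({
    "title": ["Film A", "Film A", "Film B", "Film B", "Film C", "Film C"],
    "year": [1990, 1990, 1995, 1995, 2000, 2000],
    "name": ["Actor One", "Actor Two", "Actor One", "Actor Two", "Actor One", "Actor Three"],
})

# Count each actor once per film, even if they played several roles in it.
appearances = demo_cast.drop_duplicates(subset=["title", "year", "name"])

# Self-merge: every combination of two cast rows from the same film.
pairs = appearances.merge(appearances, on=["title", "year"], suffixes=("_a", "_b"))

# Keep each unordered pair once and drop actors paired with themselves.
pairs = pairs[pairs["name_a"] < pairs["name_b"]]

pair_counts = pairs.groupby(["name_a", "name_b"]).size().sort_values(ascending=False)
print(pair_counts.head())
```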

### Bonus 5: Year with Most Productions
Which year had the highest number of movie productions?

In [None]:
# BONUS QUESTIONS - YOUR CODE HERE


---

## Summary

Congratulations! 🎉

You've completed a comprehensive data analysis of IMDB movie data.

**Skills you practiced:**
- ✓ Loading and exploring datasets
- ✓ Filtering data with boolean indexing
- ✓ Sorting and ordering data
- ✓ Counting and aggregating
- ✓ Handling missing values
- ✓ Working with multiple DataFrames
- ✓ String operations with `.str` methods

**Next steps:**
- Practice with other datasets on Kaggle
- Practice data visualization by plotting the results you found here
- Explore more complex queries and aggregations
- Try the bonus questions for extra challenge!

**Resources:**
- [Pandas Documentation](https://pandas.pydata.org/docs/)
- [IMDB Datasets](https://www.imdb.com/interfaces/)
- [Kaggle Datasets](https://www.kaggle.com/datasets)

Happy analyzing! 📊🎬