# Jeopardy Data Analysis Project 🎯

## Introduction  
This project explores Jeopardy questions to identify patterns, trends, and relationships between different aspects of the game.

### 🎯 Project Goals:
✔ Find questions containing specific words  
✔ Convert monetary values into numerical data  
✔ Analyse how question topics have changed over time  
✔ Identify the most common categories in each round 

## 1️⃣ Importing Libraries & Loading the Data

In [8]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

# Loading the data and investigating it
jeopardy = pd.read_csv('jeopardy.csv')
print(jeopardy.columns)

# Renaming columns to remove leading spaces
jeopardy = jeopardy.rename(columns = {" Air Date": "Air Date", " Round" : "Round", " Category": "Category", " Value": "Value", " Question":"Question", " Answer": "Answer"})

# Confirming changes
print(jeopardy.columns)
print(jeopardy["Question"])

Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')
Index(['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question',
       'Answer'],
      dtype='object')
0                               For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory
1                    No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves
2                                       The city of Yuma in this state has a record average of 4,055 hours of sunshine each year
3                                           In 1963, live on "The Art Linkletter Show", this company served its billionth burger
4                       Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States
                                                                   ...                                               
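
The renaming step above lists every column by hand. As a minimal sketch of an alternative, the whitespace can be stripped from all column labels in one step. The `sample` frame below is a made-up stand-in for `jeopardy.csv`, used only so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical miniature frame with the same leading-space column names as the CSV
sample = pd.DataFrame(
    {"Show Number": [1], " Air Date": ["2004-12-31"], " Question": ["..."]}
)

# Strip leading/trailing whitespace from every column label at once
sample.columns = sample.columns.str.strip()

print(sample.columns.tolist())
```

This avoids having to update the mapping if more columns with stray spaces are added later.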

## 2️⃣ Data Filtering: Finding Questions with Specific Words

In [11]:
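# Function to filter questions that contain all of the given words (case-sensitive substring match)
def filter_data(data, words):
    contains_all = lambda x: all(word in x for word in words)  # named to avoid shadowing the built-in filter()
    return data.loc[data['Question'].apply(contains_all)]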

# Testing
filtered = filter_data(jeopardy, ['King', 'England'])
print(filtered['Question'])

4953                                                                                                                                                                                                                                                                      Both England's King George V & FDR put their stamp of approval on this "King of Hobbies"
14912                                                                                                                                                                                                                                                            This country's King Louis IV was nicknamed "Louis From Overseas" because he was raised in England
21511                                                                                                                                                                                                                                                                                 this man and

In [13]:
# Improving the function
# version 1 - lowercases both the search words and the questions, BUT still matches substrings (e.g. "king" inside "Viking")
def filter_data(data, words):
    contains_all = lambda x: all(word.lower() in x.lower() for word in words)
    return data.loc[data['Question'].apply(contains_all)]

# Testing
filtered = filter_data(jeopardy, ['King', 'England'])
print(filtered['Question'])


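# version 2 - lowercases the search words and the questions, and matches only space-delimited whole words,
# so "king" no longer matches "Viking" BUT forms like "England's" or a word followed by punctuation are missed too.
# str(x) ensures NaN values don't break the function.
def filter_data_2(data, words):
    contains_all_2 = lambda x: all(f" {word.lower()} " in f" {str(x).lower()} " for word in words)
    return data.loc[data["Question"].apply(contains_all_2)]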

# Testing 
filtered_2 = filter_data_2(jeopardy, ["King", "England"])
print(filtered_2["Question"])

4953                    Both England's King George V & FDR put their stamp of approval on this "King of Hobbies"
6337      In retaliation for Viking raids, this "Unready" king of England attacks Norse areas of the Isle of Man
9191                    This king of England beat the odds to trounce the French in the 1415 Battle of Agincourt
11710               This Scotsman, the first Stuart king of England, was called "The Wisest Fool in Christendom"
13454                                       It's the number that followed the last king of England named William
                                                           ...                                                  
208295        In 1066 this great-great grandson of Rollo made what some call the last Viking invasion of England
208742                      Dutch-born king who ruled England jointly with Mary II & is a tasty New Zealand fish
213870                In 1781 William Herschel discovered Uranus & initially named it after this
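
Both versions above trade one problem for another: plain substring matching catches "Viking", while padding with spaces misses "England's". One possible middle ground, sketched here under assumptions, is a regular expression with `\b` word boundaries. The helper name `filter_whole_words` and the `sample` questions are hypothetical and not part of the original notebook.

```python
import re
import pandas as pd

def filter_whole_words(data, words, column="Question"):
    """Keep rows whose text contains every word as a whole word, ignoring case."""
    mask = pd.Series(True, index=data.index)
    for word in words:
        # \b marks a word boundary; re.escape guards against regex metacharacters
        pattern = rf"\b{re.escape(word)}\b"
        # na=False treats missing questions as non-matches
        mask &= data[column].str.contains(pattern, case=False, regex=True, na=False)
    return data[mask]

# Hypothetical sample questions, loosely modelled on the output above
sample = pd.DataFrame({"Question": [
    "Both England's King George V & FDR approved of this hobby",
    "In retaliation for Viking raids, this ruler of England attacked",
    "This king of England won at Agincourt",
    None,
]})

result = filter_whole_words(sample, ["King", "England"])
print(result["Question"].tolist())
```

In this sketch the first and third rows are kept: "England's" matches because the apostrophe counts as a boundary, while "Viking" does not match "king".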

## 3️⃣ Data Cleaning: Converting Monetary Values to Floats

In [16]:
# Function to clean and convert "Value" column to float
def convert_float(data):
    converter = lambda x: str(x).replace(",", "").replace("no value", "0.0").replace("$", "") # Cleaning values for conversion
    data["Value_converted"] = data["Value"].apply(converter).astype(float) # Converting to float
    
    return data

df_converted = convert_float(jeopardy)
print(df_converted[["Value", "Value_converted"]].head(10))  # Check before & after

  Value  Value_converted
0  $200            200.0
1  $200            200.0
2  $200            200.0
3  $200            200.0
4  $200            200.0
5  $200            200.0
6  $400            400.0
7  $400            400.0
8  $400            400.0
9  $400            400.0
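
Mapping "no value" to `0.0` means those rows pull down any average computed later. A hedged alternative, shown on a small made-up `sample` frame, is to let `pd.to_numeric` turn unparseable entries into `NaN`, which `mean()` skips by default. The column name `Value_numeric` is an assumption chosen for this illustration.

```python
import pandas as pd

# Hypothetical values in the same format as the "Value" column
sample = pd.DataFrame({"Value": ["$200", "$1,000", "no value"]})

# Remove "$" and "," with a single regex character class
cleaned = sample["Value"].str.replace(r"[$,]", "", regex=True)

# errors="coerce" converts anything non-numeric (here "no value") to NaN
sample["Value_numeric"] = pd.to_numeric(cleaned, errors="coerce")

print(sample)
print(sample["Value_numeric"].mean())  # NaN rows are ignored by mean()
```

Whether missing values should count as zero or be excluded depends on the question being asked, so this is a design choice rather than a correction.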


## 4️⃣ Finding the Average Value of Questions Containing Specific Words

In [19]:
# Function to calculate the average value of questions containing specific words
def find_avg_value_words(data, words):
    filtered_data = filter_data_2(data, words) # Filtering the dataset
    return filtered_data["Value_converted"].mean() # Calculating the average

avg_value = find_avg_value_words(jeopardy, ["King"])
print(avg_value)

805.4698795180723


## 5️⃣ Counting Unique Answers to Filtered Questions

In [22]:
# Function to count unique answers for filtered questions
def count_unique_answers(data, words):
    filtered_data = filter_data_2(data, words)
    return filtered_data['Answer'].value_counts()

unique_answers = count_unique_answers(jeopardy, ['king'])
print(unique_answers)

# Finding the most common answer:
most_common_answer = unique_answers.idxmax()
print(f"The most common answer is: {most_common_answer}.")

Answer
Henry VIII                           41
Sweden                               24
Solomon                              23
Norway                               22
Richard III                          21
                                     ..
Tory                                  1
Naomi Watts Riots                     1
Bad, Bad Leroy Brown                  1
Elephants                             1
a pyramid (the pyramids accepted)     1
Name: count, Length: 1165, dtype: int64
The most common answer is: Henry VIII.


## 6️⃣ Analysing How Questions Change Over Time
**NOTE**: There are different ways of doing this. One would be to check whether the `Air Date` string (the column is stored as text) contains specific digits and filter the dates that way. The approach below converts the column to datetime instead.

In [25]:
print(jeopardy.dtypes) # checking data types

# Filtering questions that contain 'Computer'
filtered_data = filter_data_2(jeopardy, ["Computer"]).copy() # copy to avoid modifying the original

# Converting 'Air Date' values to datetime
filtered_data["Air Date"] = pd.to_datetime(filtered_data["Air Date"])

# Extracting the year as an integer from the date
filtered_data["Year"] = filtered_data["Air Date"].dt.year  # Creates a new column 'Year'

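# Filtering by period (90s and 2000s)
# NOTE: the lower bound of 1900 means data_90s also includes any episodes aired before 1990;
# change it to 1990 to count the 1990s only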
data_90s = filtered_data[(filtered_data["Year"] >= 1900) & (filtered_data["Year"] <= 1999)]
data_2000s = filtered_data[(filtered_data["Year"] >= 2000) & (filtered_data["Year"] <= 2009)]

# Counting the number of questions in each period (.shape[0] returns the number of rows)
count_90s = data_90s.shape[0]
count_2000s = data_2000s.shape[0]

# Printing the results
print(f"Questions in the 90s that contain 'Computer': {count_90s}")
print(f"Questions in the 2000s that contain 'Computer': {count_2000s}")

Show Number          int64
Air Date            object
Round               object
Category            object
Value               object
Question            object
Answer              object
Value_converted    float64
dtype: object
Questions in the 90s that contain 'Computer': 71
Questions in the 2000s that contain 'Computer': 197
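
The two hand-written year ranges above can be generalised. As a minimal sketch, assuming a datetime `Air Date` column, integer division buckets every year into its decade so that all decades are counted in one pass. The `sample` dates and the `Decade` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical air dates standing in for the filtered "Computer" questions
sample = pd.DataFrame({
    "Air Date": ["1987-03-02", "1994-05-11", "1999-12-31", "2003-07-04", "2008-01-15"],
    "Question": ["..."] * 5,
})
sample["Air Date"] = pd.to_datetime(sample["Air Date"])

# 1994 // 10 * 10 == 1990, so every year maps to the first year of its decade
sample["Decade"] = (sample["Air Date"].dt.year // 10) * 10

# Number of questions per decade
decade_counts = sample.groupby("Decade").size()
print(decade_counts)
```

Because each decade gets its own bucket, questions from the 1980s are not folded into the 1990s count.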


## 7️⃣ Investigating Category Trends by Round

In [28]:
# Function to count how many times a category appears in each round
def count_category_in_rounds(data, category_name):
    #option one: (.groupby() and .size())
    #grouped_data = data.groupby(['Round', 'Category']).size().reset_index(name="Count") 

    #option two: (.value_counts())
    category_counts = data[['Round', 'Category']].value_counts().reset_index(name="Count") 

    # Lowercasing category names and stripping surrounding whitespace to ensure a match
    category_counts["Category"] = category_counts['Category'].str.strip().str.lower()
    category_name = category_name.strip().lower()

    # Filtering for specific category
    filtered_counts = category_counts[category_counts["Category"] == category_name]

    return filtered_counts
    
# Testing 
literature_counts = count_category_in_rounds(jeopardy, "Literature")
print(literature_counts)

science_counts = count_category_in_rounds(jeopardy, "Science")
print(science_counts)

# Finding the most common categories per round:
most_common_per_round = jeopardy[['Round', 'Category']].value_counts().reset_index(name="Count")
print(most_common_per_round.head(10))

                 Round    Category  Count
1     Double Jeopardy!  literature    381
95           Jeopardy!  literature    105
2724   Final Jeopardy!  literature     10
                 Round Category  Count
2     Double Jeopardy!  science    296
13           Jeopardy!  science    217
4499   Final Jeopardy!  science      6
              Round         Category  Count
0  Double Jeopardy!   BEFORE & AFTER    450
1  Double Jeopardy!       LITERATURE    381
2  Double Jeopardy!          SCIENCE    296
3         Jeopardy!   STUPID ANSWERS    255
4         Jeopardy!        POTPOURRI    255
5  Double Jeopardy!  WORLD GEOGRAPHY    254
6         Jeopardy!           SPORTS    253
7  Double Jeopardy!            OPERA    250
8  Double Jeopardy!    WORLD HISTORY    237
9         Jeopardy!          ANIMALS    233
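
The top-10 table above ranks round/category pairs across all rounds together. To get the leading categories within each round, one option is to take the first rows of each round from the already sorted counts. This is a sketch on a made-up `sample` frame, not the notebook's own method.

```python
import pandas as pd

# Hypothetical rows with the same "Round" and "Category" columns
sample = pd.DataFrame({
    "Round": ["Jeopardy!", "Jeopardy!", "Jeopardy!",
              "Double Jeopardy!", "Double Jeopardy!", "Double Jeopardy!"],
    "Category": ["SPORTS", "SPORTS", "ANIMALS",
                 "OPERA", "LITERATURE", "LITERATURE"],
})

counts = sample[["Round", "Category"]].value_counts().reset_index(name="Count")

# value_counts() sorts by count in descending order, so head(n) within each
# group keeps that round's n most frequent categories (n=1 here)
top_per_round = counts.groupby("Round").head(1)
print(top_per_round)
```

Raising the argument of `head()` returns more categories per round.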


## 🎯 Summary of Findings
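- The most common round/category pairing is **"BEFORE & AFTER"** in Double Jeopardy!, with 450 questions.
- **Literature** (381 vs 105) and **Science** (296 vs 217) both appear more often in **Double Jeopardy!** than in **Jeopardy!**, with a much larger gap for Literature.
- The word **"Computer"** appeared in 197 questions aired in **the 2000s**, compared with 71 questions aired **up to 1999**.
- **"STUPID ANSWERS"** is tied with **"POTPOURRI"** as **the most frequent** category in the Jeopardy! round (255 questions each).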