# This is Jeopardy!

#### Overview

This project is slightly different from others you have encountered thus far. Instead of a step-by-step tutorial, this project contains a series of open-ended requirements which describe the project you'll be building. There are many possible ways to correctly fulfill all of these requirements, and you should expect to use the internet, Codecademy, and/or other resources when you encounter a problem that you cannot easily solve.

#### Project Goals

You will work to write several functions that investigate a dataset of _Jeopardy!_ questions and answers. Filter the dataset for topics that you're interested in, compute the average difficulty of those questions, and train to become the next Jeopardy champion!

## Prerequisites

In order to complete this project, you should have completed the Pandas lessons in the <a href="https://www.codecademy.com/learn/paths/analyze-data-with-python">Analyze Data with Python Skill Path</a>. You can also find those lessons in the <a href="https://www.codecademy.com/learn/data-processing-pandas">Data Analysis with Pandas course</a> or the <a href="https://www.codecademy.com/learn/paths/data-science/">Data Scientist Career Path</a>.

Finally, the <a href="https://www.codecademy.com/learn/practical-data-cleaning">Practical Data Cleaning</a> course may also be helpful.

## Project Requirements

1. We've provided a CSV file containing data about the game show _Jeopardy!_ in a file named `jeopardy.csv`. Load the data into a DataFrame and investigate its contents. Try to print out specific columns.

   Note that in order to make this project as "real-world" as possible, we haven't modified the data at all - we're giving it to you exactly how we found it. As a result, this data isn't as "clean" as the datasets you normally find on Codecademy. More specifically, there's something odd about the column names. After you figure out the problem with the column names, you may want to rename them to make your life easier for the rest of the project.
   
   In order to display the full contents of a column, we've added this line of code for you:
   
   ```py
   pd.set_option('display.max_colwidth', None)
   ```

In [2]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

# Loading jeopardy data into pandas DataFrame
jeopardy_data = pd.read_csv('jeopardy.csv')
print(jeopardy_data.columns)

# Renaming Columns 
jeopardy_data = jeopardy_data.rename(columns = {" Air Date": "Air Date", " Round": "Round", " Category": "Category", 
" Value": "Value", " Question": "Question", " Answer": "Answer"})
print(jeopardy_data.columns)


Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')
Index(['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question',
       'Answer'],
      dtype='object')
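
The rename above spells out each column by hand. As a hedged alternative, the sketch below strips whitespace from every column name in one step. It uses a tiny in-memory CSV (`sample_csv`, a made-up stand-in for `jeopardy.csv`) so that it runs on its own; the row it contains is illustrative, not real data.

```python
import io
import pandas as pd

# Hypothetical one-row stand-in for jeopardy.csv, with the same leading-space headers
sample_csv = io.StringIO(
    "Show Number, Air Date, Round, Category, Value, Question, Answer\n"
    "4680,2004-12-31,Jeopardy!,HISTORY,$200,A sample question,A sample answer\n"
)
sample_df = pd.read_csv(sample_csv)

# Strip leading/trailing whitespace from every column name at once
sample_df.columns = sample_df.columns.str.strip()
print(list(sample_df.columns))
```

This avoids listing each column, which may help if the file ever gains or loses a column.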


2. Write a function that filters the dataset for questions that contain all of the words in a list of words. For example, when the list `["King", "England"]` was passed to our function, the function returned a DataFrame of 49 rows. Every row had the strings `"King"` and `"England"` somewhere in its `" Question"`.

   Test your function by printing out the column containing the question of each row of the dataset.

In [3]:
# Filtering a dataset by a list of words
def filter_data(data, words):
  filter = lambda x: all(word in x for word in words)
  return data.loc[data["Question"].apply(filter)]

# Testing the filter function
filtered = filter_data(jeopardy_data, ["King", "England"])
print(filtered["Question"])

4953                                                                                                                                                                                                                                                                      Both England's King George V & FDR put their stamp of approval on this "King of Hobbies"
14912                                                                                                                                                                                                                                                            This country's King Louis IV was nicknamed "Louis From Overseas" because he was raised in England
21511                                                                                                                                                                                                                                                                                 this man and

3. Test your original function with a few different sets of words to try to find some ways your function breaks. Edit your function so it is more robust.

   For example, think about capitalization. We probably want to find questions that contain the word `"King"` or `"king"`.
   
   You may also want to check to make sure you don't find rows that contain substrings of your given words. For example, our function found a question that didn't contain the word `"king"`, however it did contain the word `"viking"` &mdash; it found the `"king"` inside `"viking"`. Note that this also comes with some drawbacks &mdash; you would no longer find questions that contained words like `"England's"`.

In [4]:
# Filtering a dataset by a list of words
def filter_data2(data, words):
  # Lowercases all words in the list of words as well as the questions. Returns true if all of the words in the list appear in the question.
  filter = lambda x: all(word.lower() in x.lower() for word in words)
  # Applies the lambda function to the Question column and returns the rows where the function returned True
  return data.loc[data["Question"].apply(filter)]

# Testing the filter function
filtered = filter_data2(jeopardy_data, ["King", "England"])
print(filtered["Question"])

4953                    Both England's King George V & FDR put their stamp of approval on this "King of Hobbies"
6337      In retaliation for Viking raids, this "Unready" king of England attacks Norse areas of the Isle of Man
9191                    This king of England beat the odds to trounce the French in the 1415 Battle of Agincourt
11710               This Scotsman, the first Stuart king of England, was called "The Wisest Fool in Christendom"
13454                                       It's the number that followed the last king of England named William
                                                           ...                                                  
208295        In 1066 this great-great grandson of Rollo made what some call the last Viking invasion of England
208742                      Dutch-born king who ruled England jointly with Mary II & is a tasty New Zealand fish
213870                In 1781 William Herschel discovered Uranus & initially named it after this
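
The output above still includes rows matched through `"Viking"`, because `filter_data2` checks for substrings. One possible way to match whole words only is a regular expression with `\b` word boundaries. The helper name `contains_all_words` is an assumption made for this sketch, not part of the project.

```python
import re

def contains_all_words(text, words):
    """Return True if every word appears in text as a whole word, ignoring case."""
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", text, flags=re.IGNORECASE)
        for word in words
    )

# "king" inside "Viking" is not counted as a match
print(contains_all_words("In retaliation for Viking raids", ["king"]))
# An apostrophe counts as a word boundary, so "England's" still matches "England"
print(contains_all_words("Both England's King George V & FDR", ["King", "England"]))
```

Because `\b` treats the apostrophe as a boundary, this approach keeps questions containing `"England's"`, unlike a version that splits each question on spaces. It could be applied with `data["Question"].apply(lambda q: contains_all_words(q, words))`.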

4. We may want to eventually compute aggregate statistics, like `.mean()` on the `" Value"` column. But right now, the values in that column are strings. Convert the `" Value"` column to floats. If you'd like to, you can create a new column with float values.

   While most of the values in the `" Value"` column represent a dollar amount as a string, note that some do not &mdash; these values will need to be handled differently!

   Now that you can filter the dataset of questions, use your new column that contains the float values of each question to find the "difficulty" of certain topics. For example, what is the average value of questions that contain the word `"King"`?
   
   Make sure to use the dataset that contains the float values as the dataset you use in your filtering function.

In [5]:
# Adding a new column. If the entry in the "Value" column is not "no value", we cut off the first character (the dollar sign), remove all commas, and cast the result to a float.
# If the entry is "no value", we enter 0 instead.
jeopardy_data["Float Value"] = jeopardy_data["Value"].apply(lambda x: float(x[1:].replace(',','')) if x != "no value" else 0)

# Filtering the dataset and finding the average value of those questions
filtered = filter_data2(jeopardy_data, ["King"])
print(filtered["Float Value"].mean())

771.8833850722094
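
The cell above records questions without a dollar amount as `0`, which pulls the average down. A hedged alternative is to map them to `NaN`, which pandas skips by default in `.mean()`. The helper name `parse_dollar_value` is an assumption for this sketch.

```python
import math

def parse_dollar_value(raw):
    """Convert a string like "$2,000" to 2000.0; return NaN if it is not a number."""
    cleaned = str(raw).strip().replace("$", "").replace(",", "")
    try:
        return float(cleaned)
    except ValueError:
        # Covers entries such as "no value" that carry no dollar amount
        return math.nan

print(parse_dollar_value("$2,000"))
print(parse_dollar_value("no value"))
```

It could be used as `jeopardy_data["Value"].apply(parse_dollar_value)`. Whether `0` or `NaN` is the better choice depends on whether those questions should count toward the average.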


5. Write a function that returns the count of unique answers to all of the questions in a dataset. For example, after filtering the entire dataset to only questions containing the word `"King"`, we could then find all of the unique answers to those questions. The answer "Henry VIII" appeared 55 times and was the most common answer.

In [6]:
# A function to find the unique answers of a set of data
def get_answer_counts(data):
    return data["Answer"].value_counts()

# Testing the answer count function
print(get_answer_counts(filtered))

Answer
Henry VIII                   55
Solomon                      35
Richard III                  33
Louis XIV                    31
David                        30
                             ..
cardiac (in card I acted)     1
Henderson                     1
Computer                      1
Indians                       1
work                          1
Name: count, Length: 5268, dtype: int64
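
`value_counts()` treats answers that differ only in capitalization or surrounding spaces as distinct. Whether that happens in this dataset has not been checked here. If it does, a sketch like the following could merge them. The function name and the toy data are hypothetical.

```python
import pandas as pd

def get_normalized_answer_counts(data):
    """Count answers after trimming whitespace and lowercasing."""
    normalized = data["Answer"].astype(str).str.strip().str.lower()
    return normalized.value_counts()

# Hypothetical toy data, not taken from jeopardy.csv
toy = pd.DataFrame({"Answer": ["Henry VIII", "henry viii", " Solomon", "Solomon"]})
counts = get_normalized_answer_counts(toy)
print(counts)
```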


6. Explore from here! This is an incredibly rich dataset, and there are so many interesting things to discover. There are a few columns that we haven't even started looking at yet. Here are some ideas on ways to continue working with this data:

 * Investigate the ways in which questions change over time by filtering by the date. How many questions from the 90s use the word `"Computer"` compared to questions from the 2000s?
 * Is there a connection between the round and the category? Are you more likely to find certain categories, like `"Literature"` in Single Jeopardy or Double Jeopardy?
 * Build a system to quiz yourself. Grab random questions, and use the <a href="https://docs.python.org/3/library/functions.html#input">input</a> function to get a response from the user. Check to see if that response was right or wrong.

7. Investigate the ways in which questions change over time by filtering by the date. How many questions from the 90s use the word `"Computer"` compared to questions from the 2000s?

In [None]:
# Converting Air Date into datetime data type
jeopardy_data['Air Date'] = pd.to_datetime(jeopardy_data['Air Date'])

# Filtering jeopardy_data for questions containing "Computer"
computer_questions = filter_data2(jeopardy_data, ['Computer'])

# Filter for 1990s 
computer_90s = computer_questions[(computer_questions['Air Date'] >= '1990-01-01') & (computer_questions['Air Date'] < '2000-01-01')]
# Filter for 2000s 
computer_2000s = computer_questions[(computer_questions['Air Date'] >= '2000-01-01') & (computer_questions['Air Date'] < '2010-01-01')]

print(f'Computer questions in 1990s: {len(computer_90s)}')
print(f'Computer questions in 2000s: {len(computer_2000s)}')

# Percentage Change 
if len(computer_90s) > 0:
    change = (len(computer_2000s) - len(computer_90s)) / len(computer_90s) * 100
    print(f'Percentage Change from 90s to 2000s: {change: .1f}%')

Computer questions in 1990s: 98
Computer questions in 2000s: 268
Percentage Change from 90s to 2000s:  173.5%
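
The two date-range filters above could also be generalized to every decade at once. The sketch below buckets rows by decade using integer division on the year. The toy DataFrame is hypothetical and only stands in for the filtered `"Computer"` questions.

```python
import pandas as pd

# Hypothetical toy rows standing in for the filtered "Computer" questions
toy = pd.DataFrame({
    "Air Date": pd.to_datetime(
        ["1994-05-02", "1999-11-30", "2003-01-15", "2008-07-04", "2011-03-09"]
    ),
    "Question": ["q1", "q2", "q3", "q4", "q5"],
})

# Integer-divide the year by 10 to bucket each row into its decade (1994 -> 1990)
decade = toy["Air Date"].dt.year // 10 * 10
per_decade = toy.groupby(decade).size()
print(per_decade)
```

Note that raw counts per decade also depend on how many questions from each decade are in the dataset, so a share of all questions per decade may be a fairer comparison.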


8. Is there a connection between the round and the category? Are you more likely to find certain categories, like `"Literature"` in Single Jeopardy or Double Jeopardy?

In [18]:
literature_questions = jeopardy_data[jeopardy_data['Category'].str.contains("LITERATURE", case=False, na=False)]
literature_by_round = literature_questions['Round'].value_counts()

# Calculate percentages
literature_percentages = literature_questions['Round'].value_counts(normalize=True) * 100
print(f"\nLiterature questions by round (percentages):")
print(literature_percentages)

# Let's do a more comprehensive analysis for all categories
print(f"\nOverall distribution of rounds:")
overall_rounds = jeopardy_data['Round'].value_counts()
print(overall_rounds)

# Create a crosstab to see the relationship
category_round_crosstab = pd.crosstab(jeopardy_data['Category'], jeopardy_data['Round'])
print(f"\nTop 10 categories by total questions:")
top_categories = jeopardy_data['Category'].value_counts().head(10)
print(top_categories)



Literature questions by round (percentages):
Round
Double Jeopardy!    67.607441
Jeopardy!           27.132777
Final Jeopardy!      5.259782
Name: proportion, dtype: float64

Overall distribution of rounds:
Round
Jeopardy!           107384
Double Jeopardy!    105912
Final Jeopardy!       3631
Tiebreaker               3
Name: count, dtype: int64

Top 10 categories by total questions:
Category
BEFORE & AFTER             547
SCIENCE                    519
LITERATURE                 496
AMERICAN HISTORY           418
POTPOURRI                  401
WORLD HISTORY              377
WORD ORIGINS               371
COLLEGES & UNIVERSITIES    351
HISTORY                    349
SPORTS                     342
Name: count, dtype: int64
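
The cell above builds `category_round_crosstab` but does not print it. One way to read such a table is to convert each category's counts into shares per round with `normalize="index"`. The toy data below is hypothetical and only illustrates the call.

```python
import pandas as pd

# Hypothetical toy data, not taken from jeopardy.csv
toy = pd.DataFrame({
    "Category": ["LITERATURE", "LITERATURE", "LITERATURE", "SPORTS", "SPORTS"],
    "Round": ["Double Jeopardy!", "Double Jeopardy!", "Jeopardy!", "Jeopardy!", "Jeopardy!"],
})

# normalize="index" turns each row of counts into proportions that sum to 1
round_share = pd.crosstab(toy["Category"], toy["Round"], normalize="index")
print(round_share)
```

On the real data, the same call on `jeopardy_data['Category']` and `jeopardy_data['Round']` would give one row of proportions per category, which can then be compared against the overall distribution of rounds.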


9. Build a system to quiz yourself. Grab random questions, and use the <a href="https://docs.python.org/3/library/functions.html#input">input</a> function to get a response from the user. Check to see if that response was right or wrong.

In [19]:
# Building a System to quiz myself 
    # I need to write a function that is able to pick a random question from the jeopardy_data and output that
    # Ask the user for an input based on the question asked 
    # Check if the user's input is right or wrong by checking against the Question's answer 
    # Output whether the user got it right or wrong; if they did print out 'That is correct. You Won {"Float Value"}!'

import random 
def jeopardy_quiz():
    """
    A simple Jeopardy quiz system that asks random questions
    """
    print("Welcome to the Jeopardy Quiz!")
    print("Type 'quit' to exit at any time.\n")
    
    score = 0
    total_questions = 0
    
    while True:
        # Get a random question
        random_row = jeopardy_data.sample(1).iloc[0]
        question = random_row['Question']
        correct_answer = random_row['Answer']
        category = random_row['Category']
        value = random_row['Float Value']
        
        print(f"Category: {category}")
        print(f"Value: ${value:.0f}")
        print(f"Question: {question}")
        
        # Get user input
        user_answer = input("\nYour answer: ").strip()
        
        if user_answer.lower() == 'quit':
            break
            
        # Check if answer is correct (case-insensitive)
        if user_answer.lower() == correct_answer.lower():
            print(f"Correct! You earned ${value:.0f}")
            score += value
        else:
            print(f"Sorry, the correct answer was: {correct_answer}")
        
        total_questions += 1
        print(f"Current score: ${score:.0f}")
        print("-" * 50)
        
        # Ask if they want to continue
        continue_game = input("Continue? (y/n): ").strip().lower()
        if continue_game != 'y':
            break
    
    print(f"\nGame Over!")
    print(f"Final Score: ${score:.0f}")
    print(f"Questions answered: {total_questions}")
    if total_questions > 0:
        print(f"Average per question: ${score/total_questions:.0f}")

jeopardy_quiz()




Welcome to the Jeopardy Quiz!
Type 'quit' to exit at any time.

Category: PARADISE LOST
Value: $800
Question: The poem describes this Biblical beast as "hugest of living creatures, on the deep stretcht like a promontorie"


Sorry, the correct answer was: Leviathan
Current score: $0
--------------------------------------------------

Game Over!
Final Score: $0
Questions answered: 1
Average per question: $0


#### More Sophisticated Quiz

In [22]:
# Alternative: More sophisticated answer checking
def check_answer_similarity(user_answer, correct_answer):
    """
    Check if answers are similar (handles common variations)
    """
    import re
    
    # Clean both answers
    def clean_answer(answer):
        # Remove common leading phrases and articles (whole words only, so "apple" is not cut to "pple")
        answer = re.sub(r'^(what is|who is|what are|who are)\b\s*', '', answer.lower())
        answer = re.sub(r'^(a|an|the)\s+', '', answer)
        # Remove punctuation and extra spaces
        answer = re.sub(r'[^\w\s]', '', answer)
        return answer.strip()
    
    user_clean = clean_answer(user_answer)
    correct_clean = clean_answer(correct_answer)
    
    # Check exact match
    if user_clean == correct_clean:
        return True
    
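    # Check if one is contained in the other (an empty string is contained in every string, so require both to be non-empty)
    if user_clean and correct_clean and (user_clean in correct_clean or correct_clean in user_clean):
        return True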
    
    return False

def enhanced_jeopardy_quiz():
    """
    Enhanced quiz with better answer matching
    """
    print("Welcome to the Enhanced Jeopardy Quiz!")
    print("Type 'quit' to exit at any time.")
    print("Tip: You don't need to include 'What is' or 'Who is'\n")
    
    score = 0
    total_questions = 0
    
    while True:
        # Get a random question with a reasonable value (not 0)
        valid_questions = jeopardy_data[jeopardy_data['Float Value'] > 0]
        random_row = valid_questions.sample(1).iloc[0]
        
        question = random_row['Question']
        correct_answer = random_row['Answer']
        category = random_row['Category']
        value = random_row['Float Value']
        
        print(f"Category: {category}")
        print(f"Value: ${value:.0f}")
        print(f"Question: {question}")
        
        user_answer = input("\nYour answer: ").strip()
        
        if user_answer.lower() == 'quit':
            break
            
        # Check answer with similarity function
        if check_answer_similarity(user_answer, correct_answer):
            print(f"Correct! You earned ${value:.0f}")
            score += value
        else:
            print(f"Sorry, the correct answer was: {correct_answer}")
        
        total_questions += 1
        print(f"Current score: ${score:.0f}")
        print("-" * 50)
        
        continue_game = input("Continue? (y/n): ").strip().lower()
        if continue_game != 'y':
            break
    
    print(f"\nGame Over!")
    print(f"Final Score: ${score:.0f}")
    print(f"Questions answered: {total_questions}")
    if total_questions > 0:
        print(f"Average per question: ${score/total_questions:.0f}")

enhanced_jeopardy_quiz()


Welcome to the Enhanced Jeopardy Quiz!
Type 'quit' to exit at any time.
Tip: You don't need to include 'What is' or 'Who is'

Category: MAKES SCENTS TO ME!
Value: $400
Question: Stacked Style makes 2 fragrances that rhyme: Razzle & this one
Correct! You earned $400
Current score: $400
--------------------------------------------------
Category: FILL IN THE HISTORY _____
Value: $1000
Question: Deadly period of the French Revolution: The ____ of Terror
Correct! You earned $1000
Current score: $1400
--------------------------------------------------
Category: THE OLYMPICS
Value: $2000
Question: 1 of 3 new medal sports added to the 1992 Summer Olympics
Sorry, the correct answer was: Badminton, baseball or women's judo
Current score: $1400
--------------------------------------------------
Category: NEXT STOP, VENUS
Value: $400
Question: In 1610 this Italian discovered that Venus has phases like the moon
Correct! You earned $400
Current score: $1800
-----------------------------------------
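
Substring matching can be lenient: any short response that happens to appear inside the answer counts as correct. As a hedged alternative, the sketch below scores similarity with the standard library's `difflib.SequenceMatcher`, which also tolerates small typos. The function name `is_close_match` and the `0.8` threshold are assumptions chosen for illustration, not tuned values.

```python
from difflib import SequenceMatcher

def is_close_match(user_answer, correct_answer, threshold=0.8):
    """Return True when the two answers are similar enough, ignoring case."""
    user = user_answer.strip().lower()
    correct = correct_answer.strip().lower()
    if not user or not correct:
        return False  # an empty response never counts as correct
    # ratio() is a similarity score between 0.0 and 1.0
    return SequenceMatcher(None, user, correct).ratio() >= threshold

print(is_close_match("Leviathon", "Leviathan"))
print(is_close_match("", "Leviathan"))
```

A lower threshold accepts more typos but also more wrong answers, so the value is worth adjusting after trying it on a few real questions.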