# This is Jeopardy!

## Introduction

### Project Goals

This project writes several functions that investigate a dataset of _Jeopardy!_ questions and answers. The goals are to filter the dataset for topics by keyword, compute the average difficulty of those questions, and train to become the next _Jeopardy!_ champion!

## Data

### Import Python Modules
First, import the primary module that will be used in this project.

In [1]:
import pandas as pd

### Load the data

The column names have an inconsistent format. Rename the columns for easier use throughout the analysis.

In [2]:
# Display full contents of a column 
pd.set_option('display.max_colwidth', None)

jeopardy = pd.read_csv('jeopardy.csv')
jeopardy.rename(columns = {
    jeopardy.columns[0]: 'show_number',
    jeopardy.columns[1]: 'date', 
    jeopardy.columns[2]: 'round',
    jeopardy.columns[3]: 'category',
    jeopardy.columns[4]: 'value',
    jeopardy.columns[5]: 'question',
    jeopardy.columns[6]: 'answer'}, inplace = True)
jeopardy.reset_index(inplace = True, drop = True)
print(jeopardy.head())

   show_number        date      round                         category value  \
0         4680  2004-12-31  Jeopardy!                          HISTORY  $200   
1         4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES  $200   
2         4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...  $200   
3         4680  2004-12-31  Jeopardy!                 THE COMPANY LINE  $200   
4         4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES  $200   

                                                                                                      question  \
0             For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory   
1  No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves   
2                     The city of Yuma in this state has a record average of 4,055 hours of sunshine each year   
3                         In 1963, live on "The Art Linkletter 
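As an alternative to renaming each column by position, the headers can be normalized programmatically. The following is a minimal sketch, assuming the raw headers differ only in capitalization and stray spaces; the sample headers are hypothetical stand-ins and are not read from `jeopardy.csv`.

```python
import pandas as pd

# Hypothetical raw headers with inconsistent spacing and capitalization.
raw = pd.DataFrame(columns=['Show Number', ' Air Date', ' Round', ' Category'])

# Strip surrounding whitespace, lowercase, and replace inner spaces with underscores.
raw.columns = raw.columns.str.strip().str.lower().str.replace(' ', '_')

print(list(raw.columns))
```

This approach derives the new names from the existing headers, so a column would only be called `date` (as in the cell above) if it were renamed explicitly.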

Write a function that filters the dataset for questions that contain all of the words in a list.

This function takes a list of keywords and returns the rows whose `"question"` contains every one of them. For example, when the list `["King", "England"]` is passed to the function, it returns a DataFrame in which every row has the strings `"King"` and `"England"` somewhere in its `"question"`. The match is case-insensitive.

In [3]:
def filter_data(data, words):
    # Case-insensitive match: lowercase both the keywords and the question text.
    # all() takes an iterable and returns True only if every element is true,
    # so the lambda returns True when every word is a substring of the question x.
    contains_all = lambda x: all(word.lower() in x.lower() for word in words)

    # Apply the lambda to the question column
    # and return the rows where it evaluates to True.
    return data.loc[data.question.apply(contains_all)]

filter_data(jeopardy, ['King', 'England'])

| | show_number | date | round | category | value | question | answer |
|---|---|---|---|---|---|---|---|
| 4953 | 3003 | 1997-09-24 | Double Jeopardy! | "PH"UN WORDS | $200 | Both England's King George V & FDR put their stamp of approval on this "King of Hobbies" | Philately (stamp collecting) |
| 6337 | 3517 | 1999-12-14 | Double Jeopardy! | Y1K | $800 | In retaliation for Viking raids, this "Unready" king of England attacks Norse areas of the Isle of Man | Ethelred |
| 9191 | 3907 | 2001-09-04 | Double Jeopardy! | WON THE BATTLE | $800 | This king of England beat the odds to trounce the French in the 1415 Battle of Agincourt | Henry V |
| 11710 | 2903 | 1997-03-26 | Double Jeopardy! | BRITISH MONARCHS | $600 | This Scotsman, the first Stuart king of England, was called "The Wisest Fool in Christendom" | James I |
| 13454 | 4726 | 2005-03-07 | Jeopardy! | A NUMBER FROM 1 TO 10 | $1000 | It's the number that followed the last king of England named William | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 208295 | 4621 | 2004-10-11 | Jeopardy! | THE VIKINGS | $600 | In 1066 this great-great grandson of Rollo made what some call the last Viking invasion of England | William the Conqueror |
| 208742 | 4863 | 2005-11-02 | Double Jeopardy! | BEFORE & AFTER | $3,000 | Dutch-born king who ruled England jointly with Mary II & is a tasty New Zealand fish | William of Orange roughy |
| 213870 | 5856 | 2010-02-15 | Double Jeopardy! | URANUS | $1600 | In 1781 William Herschel discovered Uranus & initially named it after this king of England | George III |
| 216021 | 1881 | 1992-11-09 | Double Jeopardy! | HISTORIC NAMES | $1000 | His nickname was "Bertie", but he used this name & number when he became king of England in 1901 | Edward VII |


Test the function with a few different sets of words to try to find some ways that the function breaks. Then, edit the function so it is more robust.

In [4]:
print(filter_data(jeopardy, ['King']))
print(filter_data(jeopardy, ["England's"]))
print(filter_data(jeopardy, ['The', 'King']))
print(filter_data(jeopardy, ['word', 'King']))
# TODO: match whole words only. Substring matching lets 'King'
# match inside longer words such as 'Viking'.

        show_number        date             round                    category  \
34             4680  2004-12-31  Double Jeopardy!                 "X"s & "O"s   
40             4680  2004-12-31  Double Jeopardy!  DR. SEUSS AT THE MULTIPLEX   
50             4680  2004-12-31  Double Jeopardy!  DR. SEUSS AT THE MULTIPLEX   
56             5957  2010-07-06         Jeopardy!               GEOGRAPHY "E"   
72             5957  2010-07-06         Jeopardy!                LET'S BOUNCE   
...             ...         ...               ...                         ...   
216777         5070  2006-09-29  Double Jeopardy!             ANCIENT HISTORY   
216787         5070  2006-09-29  Double Jeopardy!    TALES OF E.T.A. HOFFMANN   
216789         5070  2006-09-29  Double Jeopardy!             ANCIENT HISTORY   
216856         5195  2007-03-23  Double Jeopardy!            HAIL TO THE CHEF   
216916         4999  2006-05-11  Double Jeopardy!                  QUOTATIONS   

         value  \
34       

        show_number        date             round                  category  \
935            3834  2001-04-12  Double Jeopardy!          SCIENCE & NATURE   
3264           5084  2006-10-19  Double Jeopardy!           FROM THE FRENCH   
5993           5935  2010-06-04         Jeopardy!          & CROWN THY GOOD   
7385           3255  1998-10-30  Double Jeopardy!          CREATION STORIES   
7793           4146  2002-09-16         Jeopardy!             KIDS IN BOOKS   
...             ...         ...               ...                       ...   
212026          315  1985-11-22  Double Jeopardy!                      LOVE   
212227         3610  2000-04-21         Jeopardy!             THE U.S. MINT   
214693         5619  2009-01-29         Jeopardy!                    EXODUS   
216787         5070  2006-09-29  Double Jeopardy!  TALES OF E.T.A. HOFFMANN   
216916         4999  2006-05-11  Double Jeopardy!                QUOTATIONS   

         value  \
935       $800   
3264      $800 
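One way the function could be made more robust is to require whole-word matches, since the substring test also accepts rows such as the "last Viking invasion of England" question shown earlier. The following is a minimal sketch using regular-expression word boundaries; `filter_data_whole_words` is a hypothetical name and the sample rows are toy data, not the notebook's actual revision.

```python
import re
import pandas as pd

def filter_data_whole_words(data, words):
    # Hypothetical variant of filter_data: each keyword must appear as a
    # whole word (case-insensitive), not merely as a substring.
    patterns = [re.compile(r'\b' + re.escape(word) + r'\b', re.IGNORECASE)
                for word in words]
    matches_all = lambda text: all(p.search(text) for p in patterns)
    return data.loc[data['question'].apply(matches_all)]

# Toy data for illustration only.
sample = pd.DataFrame({'question': [
    'This king of England beat the odds',
    'The last Viking invasion of England',
    'Talking about the weather',
]})

result = filter_data_whole_words(sample, ['King', 'England'])
print(result)
```

Only the first toy row is kept, because "Viking" and "Talking" contain "king" without a word boundary before it. Note that `\b` assumes each keyword starts and ends with a letter, digit, or underscore.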

Before computing aggregate statistics on the `"value"` column, it is necessary to convert its entries to floats.

In [5]:
# Create a new column: cut off the first character of each value (the dollar sign),
# remove any ',' separators, and convert to float. Entries equal to 'no value' become 0.

jeopardy['float_value'] = jeopardy.value.apply(lambda x: float(x[1:].replace(',', '')) if x != 'no value' else 0)

# Filter the dataset and find the average value of those questions.
# Caution: filter_data iterates over `words`, so the bare string 'King' is read as
# the letters 'K', 'i', 'n', 'g'. Pass ['King'] to match the word itself.
filtered = filter_data(jeopardy, 'King')
print(round(filtered['float_value'].mean(), 2))

777.49
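The conversion can be checked in isolation on a few representative strings. This is a small sketch on toy values; `to_float_value` is a hypothetical helper that mirrors the lambda above rather than replacing it.

```python
import pandas as pd

def to_float_value(value):
    # 'no value' is the placeholder handled in the cell above; map it to 0.
    if value == 'no value':
        return 0.0
    # Drop the leading '$' and any thousands separators before converting.
    return float(value.lstrip('$').replace(',', ''))

# Toy values mirroring the formats seen in the "value" column.
values = pd.Series(['$200', '$3,000', 'no value'])
converted = values.apply(to_float_value)
print(converted.tolist())  # [200.0, 3000.0, 0.0]
```

Mapping missing values to 0 lowers the mean, so whether that choice is appropriate depends on the question being asked.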


Now, write a function that returns how many times each unique answer appears among the questions in a dataset.

Use `"King"` to test the function: filter the dataset, then count the occurrences of each unique answer among the filtered questions. In the output below, the answer "Japan" appears 68 times while "Sweden" occurs 62 times.

In [6]:
def unique_answer(data, words):
    filtered = filter_data(data, words)
    return filtered['answer'].value_counts()

unique_answer(jeopardy, 'King')

Japan                    68
Sweden                   62
Henry VIII               61
Australia                55
India                    55
                         ..
Albrecht Durer            1
The Departed              1
bow (or sway or rock)     1
a fulcrum                 1
Stomp                     1
Name: answer, Length: 37687, dtype: int64
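Because `filter_data` iterates over `words`, a bare string such as `'King'` is treated as a sequence of characters rather than as one keyword. The following is a minimal sketch of a guard against this; `as_word_list` is a hypothetical helper that is not part of the notebook.

```python
def as_word_list(words):
    # Wrap a single string so that iteration yields the whole word,
    # not its individual characters.
    if isinstance(words, str):
        return [words]
    return list(words)

question = 'Ignore the knight'

# Iterating over the bare string checks for the letters k, i, n, g.
by_letters = all(ch.lower() in question.lower() for ch in 'King')

# Iterating over the wrapped list checks for the word itself.
by_word = all(w.lower() in question.lower() for w in as_word_list('King'))

print(by_letters, by_word)  # True False
```

Calling `as_word_list(words)` at the top of a filtering function would let it accept either a single keyword or a list of keywords.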

In [7]:
def get_unique_answers(data):
    return data['answer'].value_counts()

print(get_unique_answers(filter_data(jeopardy, 'King')))

Japan                    68
Sweden                   62
Henry VIII               61
Australia                55
India                    55
                         ..
Albrecht Durer            1
The Departed              1
bow (or sway or rock)     1
a fulcrum                 1
Stomp                     1
Name: answer, Length: 37687, dtype: int64
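For reference, `value_counts()` returns the frequency of each distinct answer, sorted from most to least common, while `nunique()` returns only the number of distinct answers. This is a small sketch on toy data, not on the Jeopardy dataset.

```python
import pandas as pd

# Toy answers for illustration only.
answers = pd.Series(['Henry VIII', 'Japan', 'Henry VIII', 'Sweden', 'Henry VIII'],
                    name='answer')

counts = answers.value_counts()          # occurrences of each distinct answer
print(counts.index[0], counts.iloc[0])   # most frequent answer and its count
print(answers.nunique())                 # number of distinct answers
```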


## Conclusion

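This project performed some data manipulation, such as cleaning the dataset by renaming the columns and removing particular symbols (the dollar sign and the thousands separator) from the `"value"` column. It also produced a few functions to filter questions by specific words and to count the unique answers among them.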
