## Winning Jeopardy

Jeopardy is a popular TV show in the US where participants answer questions to win money. It's been running for many years, and is a major force in popular culture. Imagine that you want to compete on Jeopardy, and you're looking for any way to win. In this project, you'll work with a dataset of Jeopardy questions to figure out some patterns in the questions that could help you win.

The dataset is available [on Reddit](https://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file).

In [41]:
import pandas as pd
import numpy as np
import sys
import string

In [2]:
#read in csv file with data and print first rows
jeopardy = pd.read_csv('jeopardy.csv')
jeopardy.head()

|  | Show Number | Air Date | Round | Category | Value | Question | Answer |
|---|---|---|---|---|---|---|---|
| 0 | 4680 | 2004-12-31 | Jeopardy! | HISTORY | $200 | For the last 8 years of his life, Galileo was ... | Copernicus |
| 1 | 4680 | 2004-12-31 | Jeopardy! | ESPN's TOP 10 ALL-TIME ATHLETES | $200 | No. 2: 1912 Olympian; football star at Carlisl... | Jim Thorpe |
| 2 | 4680 | 2004-12-31 | Jeopardy! | EVERYBODY TALKS ABOUT IT... | $200 | The city of Yuma in this state has a record av... | Arizona |
| 3 | 4680 | 2004-12-31 | Jeopardy! | THE COMPANY LINE | $200 | In 1963, live on "The Art Linkletter Show", th... | McDonald's |
| 4 | 4680 | 2004-12-31 | Jeopardy! | EPITAPHS & TRIBUTES | $200 | Signer of the Dec. of Indep., framer of the Co... | John Adams |


In [38]:
#remove whitespace from column names
jeopardy.columns = [col.replace(' ', '') for col in jeopardy.columns]
jeopardy.columns.values

array(['ShowNumber', 'AirDate', 'Round', 'Category', 'Value', 'Question',
       'Answer'], dtype=object)

In [55]:
#function to remove punctuation and convert string to lowercase
#iterating over a string yields single characters, so every punctuation character is dropped
def normalize(s):
    return ''.join(char for char in s if char not in string.punctuation).lower()
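
Because iterating over a string yields single characters, `normalize` deletes every punctuation character, including apostrophes inside words (so "McDonald's" becomes "mcdonalds"). The sketch below shows one equivalent way to write that with `str.translate`, next to a word-level variant that only strips punctuation from the ends of each word. The names `normalize_translate` and `strip_word_edges` are hypothetical and not part of the notebook.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def normalize_translate(s):
    """Delete all punctuation characters, then lowercase (hypothetical helper)."""
    return s.translate(_PUNCT_TABLE).lower()

def strip_word_edges(s):
    """Strip punctuation only from the ends of each whitespace-separated word."""
    return ' '.join(word.strip(string.punctuation) for word in s.split()).lower()

sample = "Hello, world. I'm a boy, you're a girl."
print(normalize_translate(sample))  # hello world im a boy youre a girl
print(strip_word_edges(sample))     # hello world i'm a boy you're a girl
```

Which variant is preferable depends on whether contractions and possessives should keep their apostrophes when questions and answers are compared word by word.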

In [57]:
#new columns using function defined above
jeopardy['clean_question'] = jeopardy.Question.apply(normalize)
jeopardy['clean_answer'] = jeopardy.Answer.apply(normalize)
jeopardy.head()


|  | ShowNumber | AirDate | Round | Category | Value | Question | Answer | clean_question | clean_answer |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 4680 | 2004-12-31 | Jeopardy! | HISTORY | $200 | For the last 8 years of his life, Galileo was ... | Copernicus | for the last 8 years of his life galileo was u... | copernicus |
| 1 | 4680 | 2004-12-31 | Jeopardy! | ESPN's TOP 10 ALL-TIME ATHLETES | $200 | No. 2: 1912 Olympian; football star at Carlisl... | Jim Thorpe | no 2 1912 olympian football star at carlisle i... | jim thorpe |
| 2 | 4680 | 2004-12-31 | Jeopardy! | EVERYBODY TALKS ABOUT IT... | $200 | The city of Yuma in this state has a record av... | Arizona | the city of yuma in this state has a record av... | arizona |
| 3 | 4680 | 2004-12-31 | Jeopardy! | THE COMPANY LINE | $200 | In 1963, live on "The Art Linkletter Show", th... | McDonald's | in 1963 live on the art linkletter show this c... | mcdonalds |
| 4 | 4680 | 2004-12-31 | Jeopardy! | EPITAPHS & TRIBUTES | $200 | Signer of the Dec. of Indep., framer of the Co... | John Adams | signer of the dec of indep framer of the const... | john adams |


The `normalize` function works as expected. Next, we normalize the `Value` column, which stores dollar amounts as strings such as `$200`.

In [66]:
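#function to remove punctuation and convert string to integer
def normalize_dollars(dollars):
    try:
        #drop characters such as '$' and ',' before converting
        digits = ''.join(char for char in dollars if char not in string.punctuation)
        return int(digits)
    except (TypeError, ValueError):
        #missing or non-numeric values count as 0
        return 0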

In [70]:
#new column for normalized value column using normalize_dollars()
jeopardy['clean_value'] = jeopardy.Value.apply(normalize_dollars)
jeopardy['clean_value'].head()

0    200
1    200
2    200
3    200
4    200
Name: clean_value, dtype: int64
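
A few inputs make the conversion concrete. The sketch below restates the idea under a hypothetical name, `to_int_dollars`, and assumes that non-string values (for example a missing value read in as `NaN`) should also map to 0. The sample values are illustrative, not rows taken from the dataset.

```python
import string

def to_int_dollars(value):
    """Convert a string like '$2,000' to an int; fall back to 0 (hypothetical helper)."""
    try:
        # Keep only non-punctuation characters, e.g. '$2,000' -> '2000'.
        digits = ''.join(ch for ch in value if ch not in string.punctuation)
        return int(digits)
    except (TypeError, ValueError):
        # TypeError: value is not a string; ValueError: no valid integer remains.
        return 0

examples = ['$200', '$2,000', 'None', float('nan')]
print([to_int_dollars(v) for v in examples])  # [200, 2000, 0, 0]
```

Catching only `TypeError` and `ValueError` keeps unrelated errors visible, which a bare `except` would hide.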

In [87]:
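#fraction of the words in an answer that also appear in its question
def count_sth(row):
    match_count = 0
    split_answer = row.clean_answer.split()
    split_question = row.clean_question.split()
    #'the' is too common to be a meaningful match; remove() drops its first occurrence only
    if 'the' in split_answer:
        split_answer.remove('the')
    #avoid dividing by zero when nothing is left of the answer
    if len(split_answer) == 0:
        return 0
    for item in split_answer:
        if item in split_question:
            match_count += 1
    return match_count / len(split_answer)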

In [88]:
jeopardy['answer_in_question'] = jeopardy.apply(count_sth, axis=1)
jeopardy['answer_in_question'].mean()

0.058861482035140716
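
A mean of about 0.059 says that, on average, roughly 6% of the words in an answer also appear in its question. To make the per-row value concrete, the sketch below applies the same steps to two made-up rows. The name `answer_overlap` and the sample rows are hypothetical, and `SimpleNamespace` stands in for a DataFrame row so the example runs without pandas.

```python
from types import SimpleNamespace

def answer_overlap(row):
    """Fraction of answer words (minus one 'the') that also occur in the question."""
    answer_words = row.clean_answer.split()
    question_words = row.clean_question.split()
    if 'the' in answer_words:
        answer_words.remove('the')  # removes the first occurrence only
    if not answer_words:
        return 0
    matches = sum(1 for word in answer_words if word in question_words)
    return matches / len(answer_words)

# Made-up rows for illustration; not taken from the dataset.
partial = SimpleNamespace(
    clean_question='this adams was the second us president',
    clean_answer='john adams',
)
only_the = SimpleNamespace(clean_question='a common english article', clean_answer='the')

print(answer_overlap(partial))   # 0.5
print(answer_overlap(only_the))  # 0
```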

In [73]:
sen = "Hello, world. I'm a boy, you're a girl."
split = sen.split()
split

['Hello,', 'world.', "I'm", 'a', 'boy,', "you're", 'a', 'girl.']

In [77]:
'boy,' in split

True

In [59]:
dollar1 = '200$'
dollar2 = '0$'

In [64]:
dollar1 = ''.join(word.strip(string.punctuation) for word in dollar1)
dollar2 = ''.join(word.strip(string.punctuation) for word in dollar2)

In [65]:
int(dollar1)

200

In [47]:
string.punctuation

'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

In [None]:
' '.join(word.strip(string.punctuation) for word in "Hello, world. I'm a boy, you're a girl.".split())