# Jeopardy Analysis

Date: April 29, 2018

Jeopardy is a popular TV show in the US where participants answer questions to win money. It has been running for a few decades, is a major force in U.S. pop culture, and is also watched in other parts of the world.

This notebook is an exercise in finding an edge for winning the game. Specifically, it looks at how often a question tips off its own answer, and at how the frequency of certain words in past questions can guide a contestant's preparation.

The dataset is named jeopardy.csv, and contains 20,000 rows from the beginning of a full dataset of Jeopardy questions, which can be downloaded from https://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/

In [2]:
import pandas as pd

In [3]:
jeopardy = pd.read_csv('jeopardy.csv')

In [4]:
jeopardy.head()

Unnamed: 0,Show Number,Air Date,Round,Category,Value,Question,Answer
0,4680,2004-12-31,Jeopardy!,HISTORY,$200,"For the last 8 years of his life, Galileo was ...",Copernicus
1,4680,2004-12-31,Jeopardy!,ESPN's TOP 10 ALL-TIME ATHLETES,$200,No. 2: 1912 Olympian; football star at Carlisl...,Jim Thorpe
2,4680,2004-12-31,Jeopardy!,EVERYBODY TALKS ABOUT IT...,$200,The city of Yuma in this state has a record av...,Arizona
3,4680,2004-12-31,Jeopardy!,THE COMPANY LINE,$200,"In 1963, live on ""The Art Linkletter Show"", th...",McDonald's
4,4680,2004-12-31,Jeopardy!,EPITAPHS & TRIBUTES,$200,"Signer of the Dec. of Indep., framer of the Co...",John Adams


In [5]:
jeopardy.columns

Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')

In [6]:
jeopardy.rename(columns={'Show Number':'ShowNumber',
                  ' Air Date':'AirDate',
                  ' Round': 'Round',
                  ' Category': 'Category',
                  ' Value': 'Value',
                  ' Question': 'Question',
                  ' Answer': 'Answer'},inplace=True)

In [7]:
jeopardy.columns

Index(['ShowNumber', 'AirDate', 'Round', 'Category', 'Value', 'Question',
       'Answer'],
      dtype='object')

In [8]:
import string
def normtext (text):
    # strip every punctuation character, then lower-case the result once
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    text = text.lower()
    return text
jeopardy['clean_question']=jeopardy['Question'].apply(normtext)

In [9]:
jeopardy['clean_answer']=jeopardy['Answer'].apply(normtext)

# Housekeeping

This section does some data housekeeping: it removes punctuation from the Question and Answer columns and converts them to lower case, cleans the Value column and converts it to a number, and converts the Air Date column to a datetime.
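The three cleaning steps described above can also be sketched with the standard library alone. This is a minimal illustration, not the notebook's own implementation: the names `normalize_text`, `normalize_value` and `normalize_date` are hypothetical, and `str.translate` is swapped in for the character-by-character `replace` loop.

```python
import string
from datetime import datetime

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def normalize_text(text):
    """Remove punctuation and lower-case the text (hypothetical helper)."""
    return text.translate(_PUNCT_TABLE).lower()

def normalize_value(text):
    """Turn a string such as '$2,000' into an int; non-numeric input becomes 0."""
    cleaned = text.translate(_PUNCT_TABLE)
    try:
        return int(cleaned)
    except ValueError:
        return 0

def normalize_date(text):
    """Parse a 'YYYY-MM-DD' string into a datetime."""
    return datetime.strptime(text, '%Y-%m-%d')

print(normalize_text("McDonald's"))
print(normalize_value('$2,000'))
print(normalize_date('2004-12-31'))
```

Each helper takes a single string, so it could be passed to a column's `.apply` in the same way as the functions in the cells of this section.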


In [10]:
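def normdollar (text):
    # strip '$', ',' and any other punctuation before converting
    for punctuation in string.punctuation:
        text = text.replace(punctuation, '')
    try: 
        nowint = int(float(text))
    except (ValueError, OverflowError):
        # values that are not numeric are treated as 0
        nowint = 0
    return nowint
jeopardy['clean_value'] = jeopardy['Value'].apply(normdollar)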

In [11]:
from datetime import datetime

In [12]:
datetime.strptime('2004-12-31','%Y-%m-%d')

datetime.datetime(2004, 12, 31, 0, 0)

In [13]:
from datetime import datetime
def normdate (text):
    try: 
        nowdt = datetime.strptime(text, '%Y-%m-%d')
    except (TypeError, ValueError):
        # unparseable date: report it and fall back to None
        print("error")
        nowdt = None
    return nowdt
jeopardy['clean_airdate'] = jeopardy['AirDate'].apply(normdate)

In [14]:
jeopardy['clean_airdate']

0       2004-12-31
1       2004-12-31
2       2004-12-31
3       2004-12-31
4       2004-12-31
5       2004-12-31
6       2004-12-31
7       2004-12-31
8       2004-12-31
9       2004-12-31
10      2004-12-31
11      2004-12-31
12      2004-12-31
13      2004-12-31
14      2004-12-31
15      2004-12-31
16      2004-12-31
17      2004-12-31
18      2004-12-31
19      2004-12-31
20      2004-12-31
21      2004-12-31
22      2004-12-31
23      2004-12-31
24      2004-12-31
25      2004-12-31
26      2004-12-31
27      2004-12-31
28      2004-12-31
29      2004-12-31
           ...    
19969   2009-05-14
19970   2009-05-14
19971   2009-05-14
19972   2009-05-14
19973   2009-05-14
19974   2009-05-14
19975   2009-05-14
19976   2009-05-14
19977   2009-05-14
19978   2009-05-14
19979   2009-05-14
19980   2009-05-14
19981   2009-05-14
19982   2009-05-14
19983   2009-05-14
19984   2009-05-14
19985   2009-05-14
19986   2009-05-14
19987   2009-05-14
19988   2000-03-14
19989   2000-03-14
19990   2000

# Answers in Questions

This section analyzes whether a question can "tip" its answer, that is, whether words from the answer also occur in the question.

One example: the word "index" occurs in both the answer and the question.

Answer

['uv', 'index']

Question

['on', 'june', '28', '1994', 'the', 'natl', 'weather', 'service', 'began', 'issuing', 'this', 'index', 'that', 'rates', 'the', 'intensity', 'of', 'the', 'suns', 'radiation']
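As a rough sketch of how the "uv index" example turns into a score, the snippet below computes the fraction of answer words that also appear in the question. The function name `answer_in_question_ratio` is hypothetical, and unlike a single `list.remove('the')` call it drops every occurrence of "the" from the answer.

```python
def answer_in_question_ratio(answer_words, question_words):
    """Fraction of answer words (ignoring 'the') that occur in the question."""
    answer_words = [w for w in answer_words if w != 'the']
    if not answer_words:
        # Nothing left to match, so report no overlap.
        return 0.0
    question_set = set(question_words)
    matches = sum(1 for w in answer_words if w in question_set)
    return matches / len(answer_words)

answer = ['uv', 'index']
question = ['on', 'june', '28', '1994', 'the', 'natl', 'weather', 'service',
            'began', 'issuing', 'this', 'index', 'that', 'rates', 'the',
            'intensity', 'of', 'the', 'suns', 'radiation']

ratio = answer_in_question_ratio(answer, question)
print(ratio)  # one of the two answer words ('index') is in the question: 0.5
```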


In [15]:
jeopardy.loc[0]['clean_answer']

'copernicus'

In [16]:
#split() yields a list of words
def count_a_in_q(df):
    split_answer = df['clean_answer'].split(' ')
    split_question = df['clean_question'].split(' ')
    #print(split_answer)
    match_count = 0
    try:
        split_answer.remove('the') 
    except ValueError:
        pass
    for a in split_answer:
        #print('*')
        if a in split_question:
            # debug output: show each answer/question pair that matches
            print (split_answer)
            print (split_question)
            match_count +=1
    try:
        return match_count/len(split_answer)
    except ZeroDivisionError:
        # the answer list is empty after removing 'the'
        print("error")
        return 0
        
        

In [17]:
jeopardy['answer_in_question'] = jeopardy.apply(count_a_in_q,axis=1)

['crate', '', 'barrel']
['this', 'housewares', 'store', 'was', 'named', 'for', 'the', 'packaging', 'its', 'merchandise', 'came', 'in', '', 'was', 'first', 'displayed', 'on']
['uv', 'index']
['on', 'june', '28', '1994', 'the', 'natl', 'weather', 'service', 'began', 'issuing', 'this', 'index', 'that', 'rates', 'the', 'intensity', 'of', 'the', 'suns', 'radiation']
['congress', 'party']
['this', 'asian', 'political', 'party', 'was', 'founded', 'in', '1885', 'with', 'indian', 'national', 'as', 'part', 'of', 'its', 'name']
['a', 'kennel']
['it', 'can', 'be', 'a', 'place', 'to', 'leave', 'your', 'puppy', 'when', 'you', 'take', 'a', 'trip', 'or', 'a', 'carrier', 'for', 'him', 'that', 'fits', 'under', 'an', 'airplane', 'seat']
['mystery', 'train']
['during', 'the', '19541955', 'sun', 'sessions', 'elvis', 'climbed', 'aboard', 'this', 'train', 'sixteen', 'coaches', 'long']
['night', 'train']
['in', '1961', 'james', 'brown', 'announced', 'all', 'aboard', 'for', 'this', 'train']
['a', 'dart']
['sma

['the', 'word', 'panic', 'comes', 'from', 'the', 'name', 'of', 'a', 'greek', 'god', 'who', 'was', 'this', 'type', 'of', 'creature']
['a', 'cote']
['its', 'a', 'coop', 'for', 'sheep', 'or', 'pigeons']
['bleak', 'house']
['the', 'prospect', 'of', 'an', 'endless', 'lawsuit', 'winding', 'through', 'generations', 'leaves', 'a', 'bleak', 'vision']
['brigadier', 'general']
['its', 'the', 'rank', 'just', 'below', 'the', 'very', 'model', 'of', 'a', 'modern', 'major', 'general']
['mystery', 'of', 'edwin', 'drood']
['its', 'no', 'mystery', 'why', 'john', 'jasper', 'haunts', 'mehis', 'fingers', 'have', 'knives', 'on', 'them']
['an', 'egg']
['choline', 'may', 'help', 'adult', 'brains', 'grow', 'one', 'of', 'these', 'breakfast', 'items', 'has', 'about', 'a', 'third', 'of', 'your', 'rdawant', 'an', 'omelette']
['galveston', 'bay']
['the', 'houston', 'ship', 'channel', 'flows', 'into', 'this', 'bay', 'that', 'shares', 'its', 'name', 'with', 'a', 'city']
error
['a', 'circuit']
['a', 'hrefhttpwwwjarchiv

['rhode', 'island']
['block', 'island', 'sound', 'separates', 'block', 'island', 'from', 'this', 'tiny', 'states', 'mainland']
['it', 'broke', 'in', 'half']
['this', 'happened', 'between', 'the', 'third', '', 'fourth', 'funnel', 'a', 'fact', 'no', 'one', 'knew', 'until', 'the', 'titanic', 'was', 'found', 'in', '1985']
['butch', 'cassidy', 'and', 'sundance', 'kid']
['1969', 'film', 'in', 'which', 'paul', 'newman', 'tells', 'robert', 'redford', 'boy', 'i', 'got', 'vision', 'and', 'the', 'rest', 'of', 'the', 'world', 'wears', 'bifocals']
['a', 'kilt']
['traditional', 'highland', 'dress', 'includes', 'a', 'wide', 'belt', 'presumably', 'holding', 'up', 'this']
['cat', 'on', 'a', 'hot', 'tin', 'roof']
['newman', 'played', 'brick', 'opposite', 'liz', 'taylors', 'maggie', 'in', 'this', 'film', 'adaptation', 'of', 'a', 'play']
['mutiny', 'on', 'bounty']
['movies', 'shot', 'on', 'catalina', 'include', 'a', 'hrefhttpwwwjarchivecommedia20061012dj27jpg', 'targetblankthisa', '1935', 'oscar', 'winner

['its', 'not', 'unusual']
['its', 'not', 'odd', 'that', 'this', '1965', 'song', 'is', 'heard', 'in', 'the', '1998', 'film', 'little', 'voice']
['its', 'not', 'unusual']
['its', 'not', 'odd', 'that', 'this', '1965', 'song', 'is', 'heard', 'in', 'the', '1998', 'film', 'little', 'voice']
['left', 'bank']
['when', 'lenin', 'moved', 'to', 'paris', 'in', '1908', 'he', 'naturally', 'settled', 'on', 'this', 'bank', 'of', 'the', 'seine']
['son', 'of', 'paleface']
['roy', 'rogers', 'sang', 'buttons', 'and', 'bows', 'with', 'bob', 'hope', '', 'jane', 'russell', 'in', 'this', 'sequel', 'to', 'the', 'paleface']
['cant', 'smile', 'without', 'you']
['having', 'this', 'title', 'problem', 'barry', 'manilow', 'sings', 'i', 'cant', 'laugh', 'and', 'i', 'cant', 'sing', 'im', 'finding', 'it', 'hard', 'to', 'do', 'anything']
['ten', 'plagues', 'of', 'egypt']
['the', 'sixth', 'of', 'these', 'was', 'an', 'outbreak', 'of', 'boils', '', 'sores']
['john', 'milton', 'from', 'paradise', 'lost']
['his', 'pride', 'h

['a', 'bluebird']
['missouri', '', 'not', 'a', 'redbird', 'but', 'this', 'colorful', 'creature']
['a', 'mammoth']
['2syllable', 'name', 'for', 'the', 'longago', 'elephant', 'relative', 'with', '13foot', 'tusks', 'that', 'has', 'become', 'a', 'synonym', 'for', 'huge']
['a', 'bar', 'magnet']
['a', 'hrefhttpwwwjarchivecommedia20070221dj08jpg', 'targetblankjon', 'shows', 'images', 'of', 'early', 'devices', 'on', 'the', 'monitora', 'early', 'electromagnets', 'like', 'joseph', 'henrys', 'in', '1831', 'were', 'horseshoeshaped', 'to', 'double', 'the', 'pulling', 'power', 'of', 'a', 'hrefhttpwwwjarchivecommedia20070221dj08ajpg', 'targetblankthisa', 'type', 'of', 'magnet']
['a', 'bar', 'magnet']
['a', 'hrefhttpwwwjarchivecommedia20070221dj08jpg', 'targetblankjon', 'shows', 'images', 'of', 'early', 'devices', 'on', 'the', 'monitora', 'early', 'electromagnets', 'like', 'joseph', 'henrys', 'in', '1831', 'were', 'horseshoeshaped', 'to', 'double', 'the', 'pulling', 'power', 'of', 'a', 'hrefhttpwwwjar

['besides', 'xy', '', 'z', '2', 'of', 'the', '3', 'consonants', 'that', 'dont', 'begin', 'a', 'states', 'name']
['a', 'turntable']
['a', 'fault', 'in', 'the', 'rotation', 'speed', 'of', 'this', 'device', 'produces', 'a', 'sound', 'called', 'a', 'wow']
['a', 'diode']
['name', 'given', 'to', 'the', 'simplest', 'electron', 'tubes', 'as', 'they', 'have', 'just', '2', 'main', 'parts', 'a', 'plate', '', 'an', 'emitter']
['attar', 'of', 'roses']
['bulgaria', 'is', 'the', 'chief', 'producer', 'of', 'this', 'perfume', 'oil', 'obtained', 'by', 'passing', 'steam', 'thru', 'rose', 'petals']
['a', 'redneck']
['this', 'term', 'for', 'a', 'rural', 'white', 'southerner', 'was', 'originally', 'applied', 'to', 'sunburned', 'agricultural', 'workers']
['a', 'bird']
['the', 'cahow', 'or', 'bermuda', 'petrel', 'a', 'type', 'of', 'this', 'breeds', 'only', 'in', 'bermuda']
['200', 'million']
['of', '2', 'million', '20', 'million', 'or', '200', 'million', 'the', 'length', 'in', 'years', 'of', 'one', 'trip', 'a

['bonnie', 'prince', 'charlie']
['this', 'bonnie', 'prince', 'had', 'a', 'daughter', 'by', 'his', 'mistress', 'clementina', 'walkinshaw']
['bonnie', 'prince', 'charlie']
['this', 'bonnie', 'prince', 'had', 'a', 'daughter', 'by', 'his', 'mistress', 'clementina', 'walkinshaw']
['prince', 'rainier']
['in', '1949', 'he', 'succeeded', 'his', 'grandfather', '', 'prince', 'louis', 'ii', 'as', 'ruler', 'of', 'monaco']
['truant', 'officer']
['one', 'who', 'plays', 'hooky', 'from', 'school', 'might', 'find', 'himself', 'pursued', 'by', 'this', 'type', 'of', 'officer']
['texas', 'rangers']
['since', '1935', 'this', 'agency', 'that', 'originated', 'in', 'the', '1820s', 'has', 'operated', 'as', 'a', 'branch', 'of', 'the', 'texas', 'dept', 'of', 'public', 'safety']
['76', 'cents']
['of', '76', '86', 'or', '96', 'cents', 'what', 'us', 'women', 'working', 'fulltime', 'earn', 'for', 'every', 'dollar', 'their', 'male', 'counterparts', 'make']
['76', 'cents']
['of', '76', '86', 'or', '96', 'cents', 'what

['a', 'mule']
['a', 'hrefhttpwwwjarchivecommedia20050323j02jpg', 'targetblanksarah', 'of', 'the', 'clue', 'crew', 'reports', 'from', 'jfk', 'airport', 'in', 'new', 'yorka', '', 'as', 'in', 'maria', 'full', 'of', 'grace', 'the', 'drug', 'smugglers', 'known', 'by', 'this', 'animal', 'term', 'often', 'run', 'into', 'trouble', 'at', 'us', 'airports']
['brussels']
['rome', 'brussels', 'lisbon']
['cairo']
['khartoum', 'cairo', 'kinshasa']
['manila']
['manila', 'jakarta', 'canberra']
['belmopan']
['brasilia', 'buenos', 'aires', 'bogota', 'belmopan']
['rat', '', 'pig']
['the', 'chinese', 'zodiacs', '12year', 'cycle', 'begins', '', 'ends', 'with', 'these', '2', '3letter', 'animals']
['a', 'magnetic', 'field']
['a', 'hrefhttpwwwjarchivecommedia20050323j17jpg', 'targetblanksarah', 'of', 'the', 'clue', 'crew', 'reports', 'from', 'a', 'metal', 'detector', 'at', 'jfk', 'airport', 'in', 'new', 'yorka', '', 'a', 'metal', 'detector', 'creates', 'this', 'type', 'of', 'fieldif', 'a', 'sort', 'of', 'echo'

['as', 'well', 'as', 'an', 'establishment', 'that', 'sells', 'art', 'it', 'can', 'be', 'one', 'room', 'or', 'area', 'in', 'a', 'museum']
['a', 'satchel']
['its', 'a', 'small', 'bag']
['edmond', 'dantes', 'count', 'of', 'monte', 'cristo']
['he', 'spent', '14', 'years', 'in', 'the', 'gigantic', 'structure', 'of', 'the', 'chateau', 'difa', 'fortress', 'off', 'marseilles']
['sea', 'of', 'galilee']
['israeli', 'playwright', 'nathan', 'alterman', 'called', 'his', 'first', 'play', 'kineret', 'kineret', 'kineret', 'being', 'hebrew', 'for', 'the', 'sea', 'of', 'this']
['sea', 'of', 'galilee']
['israeli', 'playwright', 'nathan', 'alterman', 'called', 'his', 'first', 'play', 'kineret', 'kineret', 'kineret', 'being', 'hebrew', 'for', 'the', 'sea', 'of', 'this']
['nato', 'north', 'atlantic', 'treaty', 'organization']
['the', 'warsaw', 'pact', 'was', 'created', 'in', '1955', 'to', 'oppose', 'this', 'defense', 'organization', 'formed', 'by', 'the', 'us', '', 'european', 'allies']
['7th', 'level']
['i

['this', 'sea', 'off', 'the', 'coast', 'of', 'north', 'america', 'is', 'named', 'for', 'rear', 'admiral', 'francis']
['a', 'homecoming', 'queen']
['cheer', 'up', 'sleepy', 'jean', 'oh', 'what', 'can', 'it', 'mean', 'to', 'a', 'daydream', 'believer', 'and', 'this']
['an', 'anecdote']
['its', 'an', 'amusing', 'yarn', 'or', 'reminiscence', 'in', 'fact', 'did', 'i', 'ever', 'tell', 'you', 'about', 'the', 'time']
['a', 'frog']
['some', 'find', 'this', 'nut', 'on', 'a', 'violin', 'bowribbeting']
['duke', '', 'the', 'king']
['these', '2', 'royal', 'characters', 'from', 'huck', 'finn', 'end', 'up', 'tarred', '', 'feathered', 'after', 'their', 'cons', 'go', 'awry']
['a', 'slug']
['a', 'hit', 'with', 'the', 'fist', 'or', 'one', 'swallow', 'of', 'liquor']
['a', 'housewarming']
['a', 'party', 'to', 'celebrate', 'a', 'persons', 'or', 'familys', 'move', 'to', 'a', 'new', 'residence']
['french', 'guiana']
['of', 'dutch', 'guiana', 'french', 'guiana', 'or', 'british', 'guiana', 'the', 'one', 'that', '

['a', 'large', 'x', 'separating', '2', 'rs', 'in', 'a', 'circular', 'road', 'sign', 'indicates', 'this']
['ditto', 'marks']
['from', 'the', 'latin', 'dictus', 'meaning', 'said', 'its', 'a', 'pair', 'of', 'marks', 'placed', 'under', 'words', 'indicating', 'repetition']
['a', 'tuffet']
['its', 'a', 'hassock', 'or', 'stool', 'as', 'miss', 'muffet', 'could', 'tell', 'you']
['his', 'duty']
['nelson', 'said', 'england', 'expects', 'that', 'every', 'man', 'will', 'do', 'this', '', 'died', 'thanking', 'god', 'he', 'did', 'his']
['empire', 'state', 'building']
['president', 'hoover', 'was', 'among', 'those', 'dedicating', 'this', 'nyc', 'building', 'at', '350', 'fifth', 'ave', 'on', 'may', '1', '1931']
['2', 'of', 'aries', 'taurus', '', 'capricorn']
['2', 'of', '3', 'signs', 'of', 'the', 'zodiac', 'whose', 'symbols', 'have', 'horns']
['2', 'of', 'aries', 'taurus', '', 'capricorn']
['2', 'of', '3', 'signs', 'of', 'the', 'zodiac', 'whose', 'symbols', 'have', 'horns']
['queen', 'anne']
['the', 'st

['sea', 'of', 'japan']
['the', 'korean', 'peninsula', 'borders', 'the', 'yellow', 'sea', 'to', 'the', 'west', '', 'this', 'sea', 'to', 'the', 'east']
['atlantic', 'ocean']
['some', 'say', 'the', 'weddell', 'sea', 'is', 'an', 'arm', 'of', 'the', 'antarctic', 'ocean', 'others', 'say', 'its', 'part', 'of', 'this', 'larger', 'ocean']
['close', 'to', 'you']
['title', 'that', 'completes', 'that', 'is', 'why', 'all', 'the', 'girls', 'in', 'town', 'follow', 'you', 'all', 'around', 'just', 'like', 'me', 'they', 'long', 'to', 'be']
['close', 'to', 'you']
['title', 'that', 'completes', 'that', 'is', 'why', 'all', 'the', 'girls', 'in', 'town', 'follow', 'you', 'all', 'around', 'just', 'like', 'me', 'they', 'long', 'to', 'be']
['galapagos', 'islands']
['these', 'islands', 'made', 'a', 'national', 'park', 'by', 'ecuador', 'in', '1959', 'were', 'made', 'a', 'world', 'heritage', 'site', 'in', '1978']
['harry', 'winston']
['born', 'in', '1896', 'he', 'followed', 'his', 'father', 'jacob', 'winston', 'in

['seen', 'a', 'hrefhttpwwwjarchivecommedia20090211j08jpg', 'targetblankherea', 'this', 'pair', 'is', 'just', 'the', 'thing', 'for', 'pounding', 'or', 'grinding']
['a', 'bow']
['a', 'violinist', 'not', 'an', 'archer', 'giuseppe', 'tartini', 'helped', 'establish', 'the', 'modern', 'style', 'of', 'using', 'this']
['last', 'of', 'mohicans']
['uncas', 'a', 'young', 'indian', 'chief', 'is', 'a', 'hero', 'of', 'this', 'novel']
['thirty', 'years', 'war']
['protestant', '', 'catholic', 'disagreement', 'about', 'the', '1555', 'peace', 'of', 'augsburg', 'was', 'a', 'cause', 'of', 'this', 'numeric', 'war']
['war', 'of', '1812']
['after', 'shutting', 'off', 'trade', 'with', 'the', 'uk', 'president', 'madison', 'advised', 'congress', 'to', 'get', 'ready', 'for', 'what', 'would', 'be', 'known', 'as', 'this', 'war']
['last', 'tycoon']
['it', 'really', 'was', 'f', 'scott', 'fitzgeralds', 'last', 'novel', 'in', 'fact', 'he', 'never', 'finished', 'it']
['wars', 'of', 'the', 'roses']
['a', 'split', 'in', 

['essential', 'oil']
['thyme', 'contains', 'about', '1', 'this', 'type', 'of', 'oil', 'used', 'in', 'fragrances', '', 'pharmaceuticals']
['lawrence', 'ferlinghetti', 'owner', 'of', 'city', 'lights', 'bookstore', 'in', 'san', 'francisco']
['a', 'san', 'francisco', 'resident', 'since', 'the', '1950s', 'in', '1998', 'he', 'became', 'the', 'citys', 'first', 'poet', 'laureate']
['lawrence', 'ferlinghetti', 'owner', 'of', 'city', 'lights', 'bookstore', 'in', 'san', 'francisco']
['a', 'san', 'francisco', 'resident', 'since', 'the', '1950s', 'in', '1998', 'he', 'became', 'the', 'citys', 'first', 'poet', 'laureate']
['lawrence', 'ferlinghetti', 'owner', 'of', 'city', 'lights', 'bookstore', 'in', 'san', 'francisco']
['a', 'san', 'francisco', 'resident', 'since', 'the', '1950s', 'in', '1998', 'he', 'became', 'the', 'citys', 'first', 'poet', 'laureate']
['justice', 'league', 'of', 'america']
['original', 'members', 'of', 'this', '4word', 'group', 'included', 'the', 'flash', 'green', 'lantern', '',

In [18]:
jeopardy['answer_in_question']

0        0.000000
1        0.000000
2        0.000000
3        0.000000
4        0.000000
5        0.000000
6        0.000000
7        0.000000
8        0.000000
9        0.333333
10       0.000000
11       0.000000
12       0.000000
13       0.000000
14       0.500000
15       0.000000
16       0.000000
17       0.000000
18       0.000000
19       0.000000
20       0.000000
21       0.000000
22       0.000000
23       0.000000
24       0.500000
25       0.000000
26       0.000000
27       0.000000
28       0.000000
29       0.000000
           ...   
19969    0.000000
19970    0.000000
19971    0.000000
19972    0.000000
19973    0.000000
19974    0.333333
19975    0.000000
19976    0.000000
19977    0.000000
19978    0.000000
19979    0.000000
19980    0.500000
19981    0.500000
19982    0.000000
19983    0.000000
19984    0.000000
19985    0.000000
19986    0.000000
19987    0.000000
19988    0.000000
19989    0.000000
19990    0.000000
19991    0.000000
19992    0.000000
19993    0

In [19]:
jeopardy[jeopardy['answer_in_question']>0].count()

ShowNumber            2619
AirDate               2619
Round                 2619
Category              2619
Value                 2619
Question              2619
Answer                2619
clean_question        2619
clean_answer          2619
clean_value           2619
clean_airdate         2619
answer_in_question    2619
dtype: int64

In [20]:
# answers in questions in over 2,600 cases!

# Overlaps

This section analyzes question overlap. On average, about 70% of the single-word terms (words of six or more characters) in a question have already appeared in earlier questions. Are questions being recycled?
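To make the overlap measure concrete, here is a small sketch on three made-up questions. The function `running_overlap` and the toy questions are illustrative assumptions, not part of the dataset: for each question, it reports the share of long words already seen in earlier questions.

```python
def running_overlap(questions, min_len=6):
    """For each question, the fraction of its long words seen in earlier text."""
    seen = set()
    overlaps = []
    for question in questions:
        # Keep only words with at least min_len characters.
        words = [w for w in question.split(' ') if len(w) >= min_len]
        matches = 0
        for w in words:
            if w in seen:
                matches += 1
            seen.add(w)
        overlaps.append(matches / len(words) if words else 0)
    return overlaps

# Hypothetical, already-normalized questions.
toy_questions = [
    'the capital of france',
    'france borders germany',
    'germany capital berlin',
]

overlaps = running_overlap(toy_questions)
print(overlaps)  # 0 for the first question, then 1/3, then 2/3
```

Because the vocabulary only grows, later questions tend to score higher than earlier ones, so a high average does not by itself show that whole questions are being repeated.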


In [21]:
question_overlap=[]


In [22]:
terms_used={}

In [23]:
#start

In [24]:
question_overlap=[]
terms_used=set()
    
for index, row in jeopardy.iterrows():
    split_question = row['clean_question'].split(' ')
    # keep only words of six or more characters to skip short filler words
    split_question=[item for item in split_question if len(item)>=6]
    match_count = 0
    for each in split_question:
        if each in terms_used:
            match_count +=1
        terms_used.add(each)
    if len(split_question) > 0:
        match_count /= len(split_question)
    question_overlap.append(match_count)
jeopardy['question_overlap'] = question_overlap
jeopardy['question_overlap'].mean()    
    

0.6919577992203644

In [25]:
jeopardy['question_overlap']

0        0.000000
1        0.000000
2        0.000000
3        0.000000
4        0.000000
5        0.000000
6        0.000000
7        0.250000
8        0.125000
9        0.000000
10       0.000000
11       0.000000
12       0.200000
13       0.142857
14       0.000000
15       0.000000
16       0.000000
17       0.000000
18       0.250000
19       0.000000
20       0.000000
21       0.000000
22       0.000000
23       0.000000
24       0.250000
25       0.000000
26       0.166667
27       0.000000
28       0.000000
29       0.000000
           ...   
19969    0.500000
19970    0.666667
19971    1.000000
19972    0.400000
19973    0.000000
19974    0.750000
19975    1.000000
19976    0.000000
19977    1.000000
19978    0.000000
19979    0.750000
19980    0.833333
19981    1.000000
19982    0.750000
19983    0.714286
19984    1.000000
19985    0.750000
19986    0.750000
19987    1.000000
19988    1.000000
19989    0.000000
19990    1.000000
19991    1.000000
19992    0.666667
19993    1

In [26]:
# there is a 70% overlap, are questions being recycled?

# Low or High Value Questions

Analysis of high-dollar versus low-dollar questions, to gain insight into how to earn more money when you're on Jeopardy.

Questions are split into two categories, which are compared using the chi-squared test.

Low value -- Any row where Value is 800 or less.
High value -- Any row where Value is greater than 800.
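The chi-squared comparison can be sketched by hand for a single word. The helper names below are hypothetical; the totals (19,999 rows, 5,734 of them high value) are the counts reported in this notebook's outputs, and the observed pair (0 high, 1 low) is the count shown for one of the sampled terms.

```python
def expected_counts(high_obs, low_obs, n_high, n_low):
    """Expected high/low counts if the word followed the overall split."""
    total = high_obs + low_obs
    share = total / (n_high + n_low)
    return share * n_high, share * n_low

def chi_squared_stat(observed, expected):
    """Pearson chi-squared statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

n_rows = 19999           # rows in the dataset
n_high = 5734            # rows with Value greater than 800
n_low = n_rows - n_high  # rows with Value of 800 or less

observed = (0, 1)        # word seen 0 times in high-value rows, once in low-value rows
expected = expected_counts(*observed, n_high, n_low)
stat = chi_squared_stat(observed, expected)

print(expected)
print(stat)
```

Note that the observed counts go in the numerator and the expected counts in the denominator. With counts this small, the chi-squared approximation is unreliable, so the statistic is only indicative.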



In [27]:
jeopardy['clean_value']

0         200
1         200
2         200
3         200
4         200
5         200
6         400
7         400
8         400
9         400
10        400
11        400
12        600
13        600
14        600
15        600
16        600
17        600
18        800
19        800
20        800
21        800
22       2000
23        800
24       1000
25       1000
26       1000
27       1000
28       1000
29        400
         ... 
19969    1200
19970    1200
19971    1500
19972    1200
19973    1200
19974    1200
19975    1600
19976    1600
19977    1600
19978    1600
19979    1600
19980    1600
19981    1200
19982    2000
19983    2000
19984    2000
19985    2000
19986    2000
19987       0
19988     100
19989     100
19990     100
19991     100
19992     100
19993     100
19994     200
19995     200
19996     200
19997     200
19998     200
Name: clean_value, Length: 19999, dtype: int64

In [28]:
def high_or_low_value(df):
    if df['clean_value']>800:
        value = 1
    else:
        value = 0
    return value



In [29]:
jeopardy['high_value'] = jeopardy.apply(high_or_low_value,axis=1)

In [30]:
jeopardy['high_value']

0        0
1        0
2        0
3        0
4        0
5        0
6        0
7        0
8        0
9        0
10       0
11       0
12       0
13       0
14       0
15       0
16       0
17       0
18       0
19       0
20       0
21       0
22       1
23       0
24       1
25       1
26       1
27       1
28       1
29       0
        ..
19969    1
19970    1
19971    1
19972    1
19973    1
19974    1
19975    1
19976    1
19977    1
19978    1
19979    1
19980    1
19981    1
19982    1
19983    1
19984    1
19985    1
19986    1
19987    0
19988    0
19989    0
19990    0
19991    0
19992    0
19993    0
19994    0
19995    0
19996    0
19997    0
19998    0
Name: high_value, Length: 19999, dtype: int64

In [None]:
def high_or_low_count(theword):
    low_count=0
    high_count=0
    for each, row in jeopardy.iterrows():
        if theword in row['clean_question'].split(' '):
            if row['high_value'] == 1:
                high_count += 1
            else:
                low_count += 1
    return high_count, low_count

observed_expected = []
comparison_terms = list(terms_used)[:100]
for each in comparison_terms:
    observed_expected.append(high_or_low_count(each))
observed_expected    

In [None]:
cterms=list(terms_used)[:100]

In [61]:
cterms

['spades', 'northwestern', 'thayer', 'vincit', 'freely']

In [62]:
high_or_low_count(cterms[0])
    

(0, 1)

In [99]:
from scipy.stats import chisquare
import numpy as np

high_value_count = len(jeopardy[jeopardy['high_value']==1])
low_value_count = len(jeopardy[jeopardy['high_value']==0])
chi_squared = []

for each in observed_expected:
    total = each[0]+each[1]
    total_prop = total/len(jeopardy)
    expectedhigh = total_prop * high_value_count
    expectedlow = total_prop * low_value_count
    # observed counts come from the data; expected counts from the overall high/low split
    observed = np.array([each[0],each[1]])
    expected = np.array([expectedhigh,expectedlow])
    print(observed)
    print(expected)
    chi_squared.append(chisquare(observed,expected))
chi_squared

[0 1]
[ 0.28671434  0.71328566]
[0 5]
[ 1.43357168  3.56642832]
[0 1]
[ 0.28671434  0.71328566]
[1 0]
[ 0.28671434  0.71328566]
[1 1]
[ 0.57342867  1.42657133]

[Power_divergenceResult(statistic=0.40196284612688399, pvalue=0.52607729857054686),
 Power_divergenceResult(statistic=2.00981423063442, pvalue=0.1562844540498966),
 Power_divergenceResult(statistic=0.40196284612688399, pvalue=0.52607729857054686),
 Power_divergenceResult(statistic=2.4877921171956752, pvalue=0.11473257634454047),
 Power_divergenceResult(statistic=0.44487748166127949, pvalue=0.50477764875459963)]

In [98]:
chi_squared

[Power_divergenceResult(statistic=0.40196284612688399, pvalue=0.52607729857054686),
 Power_divergenceResult(statistic=2.00981423063442, pvalue=0.1562844540498966),
 Power_divergenceResult(statistic=0.40196284612688399, pvalue=0.52607729857054686),
 Power_divergenceResult(statistic=2.4877921171956752, pvalue=0.11473257634454047),
 Power_divergenceResult(statistic=0.44487748166127949, pvalue=0.50477764875459963)]

In [63]:
# worksheet
words = ['a', 'bbffff', 'cccxxx', 'dd']
words = [item for item in words if len(item)>=6]

In [87]:
len(jeopardy[jeopardy['high_value']==1])

5734

In [88]:
jeopardy[jeopardy["high_value"] == 1].shape[0]

5734