# Jeopardy! 
I'll use Tableau to visualize just over 20,000 Jeopardy! questions. First, I need to clean the data and assign each question a coordinate in the final shape.

In [1]:
import pandas as pd
import numpy as np

This dataset is from Kaggle's [350,000 Jeopardy Questions set](https://www.kaggle.com/prondeau/350000-jeopardy-questions?select=master_season1-35.tsv).

In [2]:
jeo = pd.read_csv('./data/master_season1-35.tsv', sep='\t')

In [3]:
jeo.head()

Unnamed: 0,round,value,daily_double,category,comments,answer,question,air_date,notes
0,1,100,no,LAKES & RIVERS,-,River mentioned most often in the Bible,the Jordan,1984-09-10,-
1,1,200,no,LAKES & RIVERS,-,Scottish word for lake,loch,1984-09-10,-
2,1,400,no,LAKES & RIVERS,-,American river only 33 miles shorter than the ...,the Missouri,1984-09-10,-
3,1,500,no,LAKES & RIVERS,-,"World's largest lake, nearly 5 times as big as...",the Caspian Sea,1984-09-10,-
4,1,100,no,INVENTIONS,-,Marconi's wonderful wireless,the radio,1984-09-10,-


In [4]:
jeo.isna().sum()

round           0
value           0
daily_double    0
category        0
comments        0
answer          0
question        0
air_date        0
notes           0
dtype: int64

Encoding the "Daily Double" column as 0/1, since it will determine the color later on.

In [5]:
jeo['daily_double'] = [1 if x != 'no' else 0 for x in jeo['daily_double']]

In [6]:
jeo['daily_double'].value_counts()

0    332620
1     17021
Name: daily_double, dtype: int64
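The same encoding can also be done without a Python-level loop. A minimal sketch on a toy Series; `s` and its values are made up for illustration and stand in for `jeo['daily_double']`:

```python
import pandas as pd

# Toy stand-in for jeo['daily_double']; these values are made up
s = pd.Series(['no', 'yes', 'no', 'no'])

# Compare against 'no' to get booleans, then cast them to 0/1
encoded = (s != 'no').astype(int)

print(encoded.tolist())
```

As in the list comprehension, anything other than the string `'no'` is treated as a Daily Double.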

This set contains close to 350,000 questions but I'll only need about 20,000 for the final product.

In [7]:
len(jeo)

349641

Some light cleaning of the text to remove the backslash character (`\`).

In [8]:
def clean(text):
    return text.replace('\\', '')

In [9]:
jeo['answer'] = jeo['answer'].apply(clean)
jeo['category'] = jeo['category'].apply(clean)
jeo['question'] = jeo['question'].apply(clean)
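A possible alternative is pandas' own string accessor, looping over the three columns. This is a sketch on a toy frame, not the notebook's data; `df` and its rows are made up for illustration:

```python
import pandas as pd

# Toy frame with the same three text columns; the rows are made up
df = pd.DataFrame({
    'category': ['A \\"B\\" C'],
    'answer': ['it\\\'s here'],
    'question': ['plain'],
})

for col in ['answer', 'category', 'question']:
    # regex=False treats the backslash as a literal character, not a pattern
    df[col] = df[col].str.replace('\\', '', regex=False)

print(df.loc[0, 'category'], '|', df.loc[0, 'answer'])
```

One difference worth knowing: `.str.replace` passes missing values through unchanged, while `.apply(clean)` would raise an error on a non-string value. The `isna()` check above shows no missing values, so both approaches should behave the same here.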


### Assigning Coordinates

The x, y coordinates below cover every point on the 190 x 107 grid.

In [10]:
x = list(range(1, 191)) * 107

In [11]:
y = list(range(1, 108)) * 190

Pairing them up.

In [12]:
coord = list(zip(x, y))

In [13]:
coord[0]

(1, 1)
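Repeating the two ranges and zipping them covers every cell exactly once only because 190 and 107 share no common factor: the pairs walk the grid diagonally and do not repeat until all 190 x 107 = 20,330 cells have been visited. A minimal sketch that checks this and shows `itertools.product` as an alternative that does not depend on that property; the names `WIDTH`, `HEIGHT`, `zipped`, and `grid` are illustrative and not part of the notebook:

```python
from itertools import product
from math import gcd

WIDTH, HEIGHT = 190, 107  # grid size used in this notebook

# The notebook's approach: repeat each range and zip the results
zipped = list(zip(list(range(1, WIDTH + 1)) * HEIGHT,
                  list(range(1, HEIGHT + 1)) * WIDTH))

# Alternative: build the Cartesian product directly
grid = list(product(range(1, WIDTH + 1), range(1, HEIGHT + 1)))

# zip covers the full grid only when the two dimensions are coprime
assert gcd(WIDTH, HEIGHT) == 1
assert len(set(zipped)) == WIDTH * HEIGHT
assert set(zipped) == set(grid)
```

The two lists hold the same points in a different order: `zip` walks diagonally, while `product` goes column by column, so swapping one for the other would change which question lands on which cell.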

Next, I'm importing another CSV I created that holds the coordinates that spell out "JEOPARDY!". These will be assigned to the Daily Double questions.

In [14]:
dd = pd.read_csv('./data/all_coord.csv')

In [15]:
dd = dd[['X', 'Y']]

In [16]:
dd.tail()

Unnamed: 0,X,Y
2380,152.0,49.0
2381,152.0,48.0
2382,152.0,44.0
2383,152.0,43.0
2384,152.0,42.0


In [17]:
dd.isna().sum()

X    1
Y    1
dtype: int64

In [18]:
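# Drop the empty row and any repeated points, then renumber the rows from 0
# so they line up with the Daily Double rows when concatenated later
dd.dropna(inplace=True)
dd.drop_duplicates(inplace=True)
dd.reset_index(drop=True, inplace=True)
dd['X'] = dd['X'].astype(int)
dd['Y'] = dd['Y'].astype(int)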

Pairing these up as well.

In [19]:
dd_coor = list(zip(dd['X'], dd['Y']))

I'm going to remove these DD coordinates from the original list. Converting them to a set makes the membership checks faster.

In [20]:
to_check = set(dd_coor)

In [21]:
len(to_check)

2373

In [22]:
len(coord)

20330

This function keeps only the coordinates that do not appear in the DD set.

In [23]:
def strip_coord(some_list):
    cleaned = []
    for x in some_list:
        if x not in to_check:
            cleaned.append(x)
    return cleaned

In [24]:
coord = strip_coord(coord)
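The same filter can be written as a list comprehension. A minimal sketch on a toy 3 x 3 grid; `points`, `reserved`, and `remaining` are made-up names, with `reserved` playing the role of `to_check`:

```python
# Toy 3 x 3 grid; the notebook itself uses 190 x 107
points = [(x, y) for x in range(1, 4) for y in range(1, 4)]

# Hypothetical reserved cells, playing the role of to_check
reserved = {(1, 1), (2, 2), (3, 3)}

# Set membership is a hash lookup, so each check takes roughly constant time
remaining = [p for p in points if p not in reserved]

print(len(points), len(reserved), len(remaining))
```

The order of the surviving points is preserved, which matters here because the position in the list decides which question gets which cell.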

Unpairing the normal coordinates.

In [25]:
norm_x = [x for x, y in coord]
norm_y = [y for x, y in coord]

This DataFrame holds only the last 2,373 Daily Doubles, one for each DD coordinate.

In [26]:
daily = jeo[jeo['daily_double'] == 1].tail(2373).reset_index()

These are the last 17,957 normal questions, one for each remaining coordinate.

In [27]:
normal = jeo[jeo['daily_double'] == 0].tail(17957).reset_index()

In [28]:
daily = pd.concat([daily, dd[['X']], dd[['Y']]], axis = 1)

In [29]:
normal['X'] = norm_x

In [30]:
normal['Y'] = norm_y
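The two slice sizes used above follow from the grid: 2,373 cells are reserved for the lettering, and whatever is left goes to normal questions. A quick check of that arithmetic; the constant names are illustrative:

```python
GRID_CELLS = 190 * 107  # total cells in the grid
DD_CELLS = 2373         # len(to_check) reported earlier

# Cells left for normal questions once the lettering is removed
normal_cells = GRID_CELLS - DD_CELLS

print(GRID_CELLS, normal_cells)  # 20330 17957
```

In the notebook itself, `len(to_check)` and `len(coord)` (after stripping) should give the same two numbers, which would avoid hard-coding them if the lettering file ever changes.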

Adding everything into one DataFrame.

In [31]:
final = pd.concat([daily, normal], axis = 0)

In [32]:
final

Unnamed: 0,index,round,value,daily_double,category,comments,answer,question,air_date,notes,X,Y
0,304468,2,3800,1,WE'RE ON THE ROAD TO,-,...see the sights in this capital; we've left ...,Riyadh (Saudi Arabia),2015-12-07,-,35,51
1,304490,2,3000,1,PULLING RANK,-,This prince & would-be king ignored advisors &...,Bonnie Prince Charlie,2015-12-07,-,35,50
2,304505,1,500,1,COUNTRIES' NATIONAL ANTHEMS,-,"""Himno Istmeño"", or ""Isthmus Hymn""",Panama,2015-12-08,-,35,49
3,304525,2,3000,1,COLONIAL NEW ENGLAND,-,First formed to drive New York settlers out of...,the Green Mountain Boys,2015-12-08,-,35,48
4,304536,2,2000,1,PARLIAMENT VS. CONGRESS,-,It's the area from which you can watch politic...,the gallery,2015-12-08,-,35,47
...,...,...,...,...,...,...,...,...,...,...,...,...
17952,349636,2,400,0,MAKE IT SNAPPY,-,"As well as photosharing on this app, you can w...",Snapchat,2019-07-26,-,186,103
17953,349637,2,800,0,MAKE IT SNAPPY,-,"Genus Antirrhinum, these flowers snap closed a...",snapdragons,2019-07-26,-,187,104
17954,349638,2,1600,0,MAKE IT SNAPPY,-,This hyphenated tool company owns brands like ...,Snap-On,2019-07-26,-,188,105
17955,349639,2,2000,0,MAKE IT SNAPPY,-,"In 2019 meteorologist Daryl Ritchison at NDSU,...",North Dakota State University,2019-07-26,-,189,106


In [33]:
final.isna().sum()

index           0
round           0
value           0
daily_double    0
category        0
comments        0
answer          0
question        0
air_date        0
notes           0
X               0
Y               0
dtype: int64

In [34]:
final.dropna(inplace = True)

In [35]:
dd_dict = {1 : 'DAILY DOUBLE !!!', 0:''}

Changing the Daily Double flag to the text I want displayed, then exporting.

In [36]:
final['daily_double'] = final['daily_double'].map(dd_dict)

In [37]:
final['daily_double'].value_counts()

                    17957
DAILY DOUBLE !!!     2373
Name: daily_double, dtype: int64

In [38]:
final.to_csv('./data/final_coord.csv', index=False)