# This is Jeopardy!
    Creating Visualizations with Jeopardy Data


### If you've never seen the show before...

- Three players per game
- Players choose and answer questions from a 5x6 (30-question) board
- Each question is worth some amount of money. A correct answer adds that amount to the player's total; an incorrect answer subtracts it
- The game is divided into three components or "rounds"
    - Round 1: Questions on the board worth \$200-\$1000
    - Round 2: Questions on the board worth \$400-\$2000
    - Final Jeopardy: ("round 3" in the data) All players with a positive score make a wager on a single question.
- Occasionally a chosen question is revealed to be a "Daily Double", at which point the player who chose it makes a wager and only that player can answer the question
- The winner goes on to play the next game against two new players


### About the Data
This data is obtained from J! Archive (j-archive.com) and covers a <a href="https://en.wikipedia.org/wiki/Ken_Jennings#Streak_on_Jeopardy!">very particular set of 75 games</a> between June 2, 2004 and Nov 30, 2004.

Four CSV files:
 - <b>games.csv</b>
     - gameId: Unique ID for that game used by j-archive. You can view full game data at http://www.j-archive.com/showgame.php?game_id=<gameId\>
     - date: Date the game was broadcast
     
 - <b>scores.csv</b>
     - gameId: (see above)
     - playerId: Unique ID for that player used by j-archive. You can view full player data at http://www.j-archive.com/showplayer.php?player_id=<playerId\>
     - breakScore: Score at the first commercial break
     - round1: Score after round 1
     - round2: Score after round 2
     - final: Score after Final Jeopardy question (round 3)
     - coryat: An adjusted score that disregards the effect of wagering (https://j-archive.com/help.php#coryatscore). It is used only for analysis; it is not an official game score and plays no part in the game
     
- <b>questions.csv</b>
    - gameId: (see above)
    - round: Round number. The two regular rounds are '1' and '2'; the Final Jeopardy question is denoted as round '3'
    - pickorder: The order in which the questions were chosen by players during that game. The Final Jeopardy question has value '0'
    - amount: Amount the question was worth. For Daily Double questions (where a single player wagers an amount of their choosing) this is the amount wagered. For Final Jeopardy the amount is always 0 because each player makes a different wager; if you need the wagers, compare each player's round2 and final scores in scores.csv.

- <b>answers.csv</b>
    - questionId: Unique ID from the questions file
    - playerId: (see above)
    - correct: Boolean indicating whether the player answered correctly
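
The note on `amount` says Final Jeopardy wagers have to be worked out from the scores. The following is a minimal sketch of that arithmetic, assuming the `round2` and `final` columns of scores.csv as described above. The sample rows and the `fj_wager` / `fj_correct` column names are made up for illustration.

```python
import pandas as pd

# Made-up rows with the scores.csv columns described above (not real game data).
sample_scores = pd.DataFrame({
    "gameId":   [1, 1, 1],
    "playerId": [10, 11, 12],
    "round2":   [20000, 8000, -400],
    "final":    [25000, 3000, -400],
})

def final_jeopardy_wagers(scores):
    """Return a copy of scores with the implied Final Jeopardy wager and outcome."""
    out = scores.copy()
    change = out["final"] - out["round2"]
    # The score moves by exactly the wager: up if correct, down if incorrect.
    out["fj_wager"] = change.abs()
    # No change is ambiguous (a $0 wager, or a player who did not take part),
    # so the outcome is left missing in that case.
    out["fj_correct"] = change.gt(0).where(change.ne(0))
    return out

wagers = final_jeopardy_wagers(sample_scores)
```

A player who finished round 2 without a positive score does not play Final Jeopardy, so their `round2` and `final` values should match and the implied wager is 0.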
    
    
### When working through these:

- Use either Matplotlib or Seaborn. Personally, I prefer Seaborn
- If time allows, try to make it pretty
    - Label the axes
    - Add a title
    - Add a legend
    - Make sure the values on the axes are properly formatted
- If you can think of a more interesting visualization than the one I suggested, go for it!
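
As a rough illustration of the "make it pretty" checklist, here is a small Matplotlib sketch that labels the axes, adds a title and legend, and formats the y-axis ticks as dollar amounts. The `dollars` helper and the plotted numbers are made up for this example; they are not part of the data set.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def dollars(value, _pos=None):
    """Format a tick value as dollars, e.g. -1200 -> '-$1,200'."""
    sign = "-" if value < 0 else ""
    return f"{sign}${abs(value):,.0f}"

fig, ax_fmt = plt.subplots()
# Made-up scores, only to have something to draw.
ax_fmt.plot([1, 2, 3], [-400, 8000, 20000], label="Example player")
ax_fmt.set_xlabel("Game")
ax_fmt.set_ylabel("Score")
ax_fmt.set_title("Example title")
# FuncFormatter calls dollars(value, position) for every tick on the axis.
ax_fmt.yaxis.set_major_formatter(FuncFormatter(dollars))
ax_fmt.legend()
```

The same formatter can be attached to any of the plots in the exercises, since scores can be negative and run into the tens of thousands.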


In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## Read and process the data

In [None]:
games = pd.read_csv('games.csv')
games.set_index('id', inplace=True)
games.date = pd.to_datetime(games.date)
games.sort_values(by='date', inplace=True)
games.head(2)

In [None]:
scores = pd.read_csv('scores.csv')
scores.head(2)

In [None]:
questions = pd.read_csv('questions.csv')
questions.set_index('id', inplace=True)
questions.head(2)

In [None]:
answers = pd.read_csv('answers.csv')
answers.head(2)

### Make Ranked Data

In [None]:
def first(aList):
    return aList.iloc[0]

def second(aList):
    if len(aList) > 1:
        return aList.iloc[1]

def third(aList):
    if len(aList) > 2:
        return aList.iloc[2]
    
# Sort descending so that, within each game, rows run from highest to lowest final score
scores.sort_values(by=['gameId', 'final'], ascending=False, inplace=True)
ranked_scores = scores.groupby('gameId').agg({
    'breakScore': [first, second, third],
    'round1': [first, second, third],
    'round2': [first, second, third],
    'final': [first, second, third],
    'coryat': [first, second, third]
})
ranked_scores.head(2)

### Add cumulative values

In [None]:
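# Attach each answer's question details (gameId, round, pickorder, amount)
answers_with_scores = answers.merge(questions, how='left', left_on='questionId', right_index=True)
# A correct answer adds the amount to the player's score; an incorrect one subtracts it
answers_with_scores['score_impact'] = answers_with_scores.apply(lambda r: r['amount'] if r['correct'] else -r['amount'], axis=1)

# Running total of each player's score within a game, in the order the questions were played
answers_with_scores['score_cumulative'] = answers_with_scores[
    ['gameId', 'playerId', 'pickorder','round', 'score_impact']].sort_values(by=['round','pickorder']).groupby(
    ['gameId', 'playerId']).cumsum()['score_impact']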
answers_with_scores.head()


## 1. Line plot

Create a line plot showing the 1st, 2nd, and 3rd place scores for each game over time (days on the x-axis). Use any score value column you want.
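
One possible starting point, not the only way to do it: the sketch below assumes a frame shaped like `ranked_scores` (two-level columns such as `('final', 'first')`) and a `games` frame indexed by game ID with a `date` column. The sample frames and the `plot_places_over_time` helper are made up so the example runs on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-ins for `games` and `ranked_scores` (three games, not real data).
sample_games = pd.DataFrame(
    {"date": pd.to_datetime(["2004-06-02", "2004-06-03", "2004-06-04"])},
    index=pd.Index([101, 102, 103], name="id"),
)
sample_ranked = pd.DataFrame(
    {
        ("final", "first"):  [30000, 22000, 35000],
        ("final", "second"): [20000, 11000, 8000],
        ("final", "third"):  [9000, 4000, 2000],
    },
    index=pd.Index([101, 102, 103], name="gameId"),
)

def plot_places_over_time(ranked, games, column="final"):
    """Draw one line per finishing place for the chosen score column."""
    # ranked[column] selects the first/second/third sub-columns;
    # the join on the game ID index adds each game's broadcast date.
    by_date = ranked[column].join(games["date"]).sort_values("date")
    fig, ax = plt.subplots(figsize=(10, 4))
    for place in ["first", "second", "third"]:
        ax.plot(by_date["date"], by_date[place], label=f"{place} place")
    ax.set_xlabel("Broadcast date")
    ax.set_ylabel(f"{column} score")
    ax.set_title("Scores by finishing place")
    ax.legend()
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return ax

ax_line = plot_places_over_time(sample_ranked, sample_games)
```

With the real data, the call would be `plot_places_over_time(ranked_scores, games)`, and `column` can be any of the score columns.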


## 2. Scatterplot

Create a scatterplot with a dot for each answer outside Final Jeopardy (round 3), across all games and players.

<b>x-axis:</b> Question order in game (round, pickorder)<br>
<b>y-axis:</b> Score (after question was answered) of the player answering the question<br>
<b>color:</b> Whether the question was right or wrong.
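
A sketch of one way to build this, assuming a frame with the columns that `answers_with_scores` has after the processing cells. The sample rows, the `question_order` column, and both helper functions are made up for illustration. It uses plain Matplotlib; Seaborn's `scatterplot` with a `hue` argument would be a shorter route to the same picture.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows with the columns of `answers_with_scores` (not real data).
sample_answers = pd.DataFrame({
    "gameId":           [1, 1, 1, 1, 1],
    "playerId":         [10, 11, 10, 11, 10],
    "round":            [1, 1, 2, 2, 3],
    "pickorder":        [1, 2, 1, 2, 0],
    "correct":          [True, False, True, True, True],
    "score_cumulative": [200, -400, 600, 400, 600],
})

def add_question_order(df):
    """Number each game's questions 1, 2, 3, ... by (round, pickorder)."""
    out = df[df["round"] < 3].copy()  # drop Final Jeopardy (round 3)
    # Combine round and pickorder into one sortable key. 1000 is larger than
    # any pickorder, since a game has at most 60 board questions.
    key = out["round"] * 1000 + out["pickorder"]
    # A dense rank gives every answer to the same question the same position.
    out["question_order"] = key.groupby(out["gameId"]).rank(method="dense").astype(int)
    return out

def plot_answer_scatter(df):
    """Scatter score against question order, colored by right or wrong."""
    fig, ax = plt.subplots(figsize=(10, 5))
    for is_correct, color in [(True, "tab:green"), (False, "tab:red")]:
        subset = df[df["correct"] == is_correct]
        ax.scatter(subset["question_order"], subset["score_cumulative"],
                   s=12, alpha=0.4, color=color,
                   label="Correct" if is_correct else "Incorrect")
    ax.set_xlabel("Question order in game")
    ax.set_ylabel("Score after answering")
    ax.set_title("Player scores by question")
    ax.legend()
    return ax

ordered = add_question_order(sample_answers)
ax_scatter = plot_answer_scatter(ordered)
```

Ranking within each game sidesteps the question of whether `pickorder` restarts in round 2, because the position only depends on the sort order of (round, pickorder).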

### 2.2 Splitting up the scatterplot

Design a visualization that differentiates Ken's answers from the other players' in some way. This could be a different shape, different color, two plots, etc.

### 2.3 Add a line

Add a line plot or line plots showing the average values of your scatterplot across the x-axis. You may want to show lines for only Ken's scores, only third/second place scores, only correct or incorrect answers, etc.
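
As a hedged sketch of the averaging step: group the scatter data by x position, take the mean score, and draw it on the same axes. The `question_order` column, the `add_mean_line` helper, and the sample points are assumptions made for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up points: two players' scores after each of three questions (not real data).
sample_points = pd.DataFrame({
    "question_order":   [1, 1, 2, 2, 3, 3],
    "playerId":         [10, 11, 10, 11, 10, 11],
    "score_cumulative": [200, -200, 600, -200, 600, 200],
})

def add_mean_line(ax, df, label, **line_kwargs):
    """Overlay the mean score at each question position on an existing axes."""
    means = df.groupby("question_order")["score_cumulative"].mean()
    ax.plot(means.index, means.values, label=label, **line_kwargs)
    return means

fig, ax_mean = plt.subplots()
ax_mean.scatter(sample_points["question_order"],
                sample_points["score_cumulative"], alpha=0.4)
all_means = add_mean_line(ax_mean, sample_points, "All players", color="black")
# Pass a filtered frame to draw a line for one subset, here hypothetical player 10.
p10_means = add_mean_line(ax_mean, sample_points[sample_points["playerId"] == 10],
                          "Player 10", color="tab:blue")
ax_mean.legend()
```

Keep in mind that the real data only has a row for players who answered a given question, so each mean is taken over the players who answered at that position, not over all three contestants.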


## 3. Bar plot

Create a stacked bar plot showing mean scores of first, second, and third place contestants at various points in the game. The goal is to communicate how champions (and the less champion-ish) tend to build up their score over the course of play. Strong start? Major gains in the second round?

You will find that third place tends to lose at Final Jeopardy more often than they win, leaving you with a negative value for the difference between their round2 and final score. Choose how best to represent this.
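
One way to approach this, offered as a sketch: convert the mean score at each checkpoint into the amount gained during each stage, then stack those gains. The sample means and the `stage_gains` helper below are made up. With the real data, a table of this shape could probably be built with something like `ranked_scores.mean().unstack(0)` followed by reordering the columns.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up mean scores by finishing place at each checkpoint (not real data).
sample_means = pd.DataFrame(
    {
        "breakScore": [3000, 1500, 1000],
        "round1":     [7000, 3500, 2500],
        "round2":     [22000, 9000, 5000],
        "final":      [30000, 10000, 3000],
    },
    index=["first", "second", "third"],
)

def stage_gains(means):
    """Convert cumulative checkpoint scores into the amount gained in each stage."""
    # diff(axis=1) subtracts the previous checkpoint from each column.
    gains = means.diff(axis=1)
    # The first stage has no previous checkpoint, so it keeps its own value.
    gains.iloc[:, 0] = means.iloc[:, 0]
    return gains

gains = stage_gains(sample_means)
ax_bar = gains.plot(kind="bar", stacked=True, figsize=(8, 5))
ax_bar.axhline(0, color="black", linewidth=0.8)  # make the zero line visible
ax_bar.set_xlabel("Finishing place")
ax_bar.set_ylabel("Mean score")
ax_bar.set_title("How scores build up by stage")
ax_bar.legend(title="Stage")
```

In this sketch a negative Final Jeopardy gain is drawn below the zero line instead of being subtracted from the top of the stack. That is only one of the possible ways to represent the loss.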


## 4. Histogram
Create a plot with four subplots. In each subplot, draw two overlapping histograms: one showing the distribution of Ken Jennings's scores and one showing the distribution of all other players' scores, at the following points in the game:

- End of Round 1
- End of Round 2
- Final score
- Coryat-adjusted score

Note: The "Ken" histograms should contain half as many data points as the "non-Ken" histograms, since each game has Ken and two other players.

Also, do not use the "first, second, third" place scores for this -- Ken is not guaranteed to be in first place
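
A possible layout for the four panels, sketched under assumptions: `KEN_ID` is a placeholder, not Ken's real `playerId` (look that up in scores.csv), and the sample rows and the `plot_ken_histograms` helper are made up so the example runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

KEN_ID = 1  # hypothetical placeholder; look up Ken's actual playerId in scores.csv

# Made-up rows in the shape of scores.csv: two games, Ken plus two opponents each.
sample_hist_scores = pd.DataFrame({
    "gameId":   [1, 1, 1, 2, 2, 2],
    "playerId": [KEN_ID, 20, 21, KEN_ID, 22, 23],
    "round1":   [8000, 2000, 3000, 9000, 1000, 4000],
    "round2":   [25000, 6000, 9000, 30000, 4000, 12000],
    "final":    [35000, 2000, 12000, 40000, 0, 20000],
    "coryat":   [22000, 6000, 8000, 26000, 4000, 11000],
})

PANELS = {
    "round1": "End of Round 1",
    "round2": "End of Round 2",
    "final":  "Final score",
    "coryat": "Coryat-adjusted score",
}

def plot_ken_histograms(scores, ken_id, bins=15):
    """Draw a 2x2 grid of overlapping Ken vs. other-player histograms."""
    is_ken = scores["playerId"] == ken_id
    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    for ax, (column, title) in zip(axes.flat, PANELS.items()):
        # Shared bin edges keep the two histograms directly comparable.
        edges = np.histogram_bin_edges(scores[column], bins=bins)
        ax.hist(scores.loc[~is_ken, column], bins=edges, alpha=0.5, label="Other players")
        ax.hist(scores.loc[is_ken, column], bins=edges, alpha=0.5, label="Ken")
        ax.set_title(title)
        ax.set_xlabel("Score")
        ax.set_ylabel("Number of player-games")
        ax.legend()
    fig.tight_layout()
    return axes

axes_hist = plot_ken_histograms(sample_hist_scores, KEN_ID)
```

Because Ken contributes half as many rows as the other players, passing `density=True` to both `hist` calls is one option if you want to compare the shapes of the distributions instead of the raw counts.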