# Jeopardy! Project

The objective of this project was to explore a rich dataset from the TV quiz show *Jeopardy!* and to demonstrate the use of pandas in Python.

In [30]:
import pandas as pd
pd.set_option('display.max_colwidth', None)

jeopardy_data = pd.read_csv("jeopardy.csv")
print(jeopardy_data.columns)

Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',
       ' Question', ' Answer'],
      dtype='object')


It appears that most of the column names in the data file start with an extra space (' '). Let's remove it and rename the columns in snake_case so they are easier to work with later.

In [31]:
jeopardy_data = jeopardy_data.rename(columns = {"Show Number": "show_number", " Air Date": "air_date", " Round" : "round", " Category": "category", " Value": "value", " Question": "question", " Answer": "answer"})
print(jeopardy_data.columns)

Index(['show_number', 'air_date', 'round', 'category', 'value', 'question',
       'answer'],
      dtype='object')
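If a file has many columns, typing out every old and new name gets tedious. Below is a minimal sketch of a more general cleanup. It assumes the only problems are stray surrounding whitespace, capital letters, and spaces inside the names, and it uses pandas' vectorized string methods on the column Index. The tiny DataFrame is made up for illustration:

```python
import pandas as pd

# Made-up empty frame with the same kind of messy headers as jeopardy.csv
df = pd.DataFrame(columns=["Show Number", " Air Date", " Round", " Value"])

# Strip surrounding whitespace, lowercase, then replace inner spaces with underscores
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)

print(list(df.columns))
# ['show_number', 'air_date', 'round', 'value']
```

Unlike an explicit rename mapping, this also catches headers you did not notice were messy. The trade-off is that you lose control over the exact target names.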


To get a better idea of what each column contains, let's display the first six rows of the dataset:

In [32]:
jeopardy_data.head(6)

| | show_number | air_date | round | category | value | question | answer |
|---|---|---|---|---|---|---|---|
| 0 | 4680 | 2004-12-31 | Jeopardy! | HISTORY | $200 | For the last 8 years of his life, Galileo was under house arrest for espousing this man's theory | Copernicus |
| 1 | 4680 | 2004-12-31 | Jeopardy! | ESPN's TOP 10 ALL-TIME ATHLETES | $200 | No. 2: 1912 Olympian; football star at Carlisle Indian School; 6 MLB seasons with the Reds, Giants & Braves | Jim Thorpe |
| 2 | 4680 | 2004-12-31 | Jeopardy! | EVERYBODY TALKS ABOUT IT... | $200 | The city of Yuma in this state has a record average of 4,055 hours of sunshine each year | Arizona |
| 3 | 4680 | 2004-12-31 | Jeopardy! | THE COMPANY LINE | $200 | In 1963, live on "The Art Linkletter Show", this company served its billionth burger | McDonald's |
| 4 | 4680 | 2004-12-31 | Jeopardy! | EPITAPHS & TRIBUTES | $200 | Signer of the Dec. of Indep., framer of the Constitution of Mass., second President of the United States | John Adams |
| 5 | 4680 | 2004-12-31 | Jeopardy! | 3-LETTER WORDS | $200 | In the title of an Aesop fable, this insect shared billing with a grasshopper | the ant |


Before searching the data, it helps to know how each column is stored. The .dtypes attribute (it is an attribute, not a method, so there are no parentheses) shows the data type of every column:

In [33]:
jeopardy_data.dtypes

show_number     int64
air_date       object
round          object
category       object
value          object
question       object
answer         object
dtype: object

Not every column is stored in its ideal type. For example, air_date and value are generic objects (strings) rather than dates and numbers. Let's convert the date first and see how it looks:

In [42]:
jeopardy_data["air_date"] = pd.to_datetime(jeopardy_data["air_date"])
print(jeopardy_data["air_date"].head())

0   2004-12-31
1   2004-12-31
2   2004-12-31
3   2004-12-31
4   2004-12-31
Name: air_date, dtype: datetime64[ns]


The values look the same, but the column now has the datetime64[ns] dtype. This means we can use date operations on it, such as extracting the year with .dt.year.
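The dtypes output also shows that value is stored as object, because it holds strings such as "$200" rather than numbers. Below is a minimal sketch of one way to convert a column like that. It assumes every usable entry is a dollar amount (possibly with thousands separators) and that anything else should become NaN. The sample values are made up and are not taken from jeopardy.csv:

```python
import pandas as pd

# Made-up sample shaped like the value column
values = pd.Series(["$200", "$1,000", "$2,000", "None"])

# Remove "$" and "," then parse; anything that still isn't a number becomes NaN
numeric = pd.to_numeric(values.str.replace(r"[$,]", "", regex=True), errors="coerce")

print(numeric.tolist())
# [200.0, 1000.0, 2000.0, nan]
```

The errors="coerce" option keeps the conversion from failing on a single odd entry. However, it also hides such entries silently, so it is worth checking numeric.isna().sum() afterwards.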

Let's count how many questions the dataset contains for each year:

In [44]:
jeopardy_data.groupby(jeopardy_data["air_date"].dt.year).answer.count()

air_date
1984     1179
1985      888
1986     1409
1987     1275
1988     1290
1989     2067
1990     4337
1991     1444
1992     1885
1993     2132
1994     1136
1995     1138
1996     4891
1997    13099
1998    13143
1999    13540
2000    13439
2001    12097
2002     6859
2003     9425
2004    13190
2005    13560
2006    13726
2007    13940
2008    14036
2009    13579
2010    13756
2011    13375
2012     1093
Name: answer, dtype: int64

In 2012 the count suddenly drops to 1,093, roughly 1/12 of the yearly totals just before it. Perhaps the data only runs through January 2012. We can confirm that by checking the latest air date:

In [47]:
jeopardy_data.air_date.max()

Timestamp('2012-01-27 00:00:00')
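Row counts alone don't reveal whether a short year has fewer episodes or just fewer questions per episode. One way to check is to count the distinct show_number values per year, alongside the first and last air date. The sketch below runs on a small made-up frame rather than the real data:

```python
import pandas as pd

# Made-up rows: two shows in 2011 and one show in January 2012
df = pd.DataFrame({
    "show_number": [6000, 6000, 6001, 6100, 6100],
    "air_date": pd.to_datetime(
        ["2011-05-02", "2011-05-02", "2011-05-03", "2012-01-27", "2012-01-27"]
    ),
})

# Per calendar year: number of distinct episodes and the range of air dates
summary = df.groupby(df["air_date"].dt.year).agg(
    episodes=("show_number", "nunique"),
    first_air=("air_date", "min"),
    last_air=("air_date", "max"),
)
print(summary)
```

On the full dataset, a last_air in late January for 2012 would point to a partial year rather than a change in the show's format.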

We were right: the last episode in the dataset aired on January 27, 2012. Since this project is about pandas, let's look for questions that mention pandas (the animal). First we need to make sure the question column holds strings, and then we can search it.
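pandas can also run this kind of text search without a custom function, using the vectorized Series.str.contains method. Below is a minimal sketch on made-up questions. The regular expression \bpandas?\b is an assumption about what should count as a match: it finds "panda" or "pandas" only as whole words, so a word like "expandable" (which contains "panda") is not picked up. The na=False option treats missing questions as non-matches:

```python
import pandas as pd

# Made-up questions for illustration
questions = pd.Series([
    "Giant pandas eat this woody grass",
    "This expandable folder holds documents",
    "A red panda is not closely related to the giant panda",
    None,
])

# Case-insensitive whole-word match for "panda" or "pandas"; missing values count as False
mask = questions.str.contains(r"\bpandas?\b", case=False, regex=True, na=False)

print(questions[mask].tolist())
```

A plain substring test such as "panda" in text would also match the "expandable" question, which is why word boundaries are used here.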

In [50]:
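# astype returns a new Series, so assign it back to keep the conversion
jeopardy_data["question"] = jeopardy_data["question"].astype(str)

def filter_data(data, words):
    # Keep rows whose question contains every word in `words`, ignoring case
    contains_all = lambda text: all(word.lower() in text.lower() for word in words)
    return data.loc[data["question"].apply(contains_all)]

# all() requires every word; since "pandas" contains "panda", this
# effectively selects questions that contain "pandas"
filtered = filter_data(jeopardy_data, ["Panda", "Pandas"])
print(filtered["question"])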

49769     The World Wildlife Fund is working with the Chinese on a detailed survey of pandas & this, their only food
49806                    Tiffany sells a collection of jewelry made to look like this woody grass consumed by pandas
80415                                            This treelike grass with a woody stem is the favored food of pandas
92855      "Revenge of the Space Pandas" is a play for children by this playwright better known for profane dialogue
100925               Because they can't digest cellulose, pandas may eat 90 pounds a day of the shoots of this plant
106940                                                   Giant pandas;  the movie "Citizen Kane";  a clear-cut issue
125394                                    Pandas have an enlarged wristbone that functions like this digit in humans
147179                                                        This scientific field that might study pandas or pumas
153609                                                          