# Welcome to ScavHunt Lists Data Analysis

This is intended to be a turn-key environment for doing data analysis with the collected historical lists and transcripts of the UChicago ScavHunt. 

If you aren't familiar with the Jupyter environment, this document freely combines Python code and text. When you first open this document, none of the code has been run; the results you see are the outputs stored the last time this notebook was saved. You can run a cell yourself by clicking on it and pressing the Run button above (see the Help menu for keyboard shortcuts).

TODO: add links to tutorial/reference materials.

## Fetching Data

First things first, let's retrieve some data. The following command will download the collection of historical PDFs of lists, supplementals, fake lists, etc.

In [1]:
!git clone https://github.com/mbmilligan/uchicago-scavhunt-lists.git

Cloning into 'uchicago-scavhunt-lists'...
remote: Enumerating objects: 56, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (56/56), done.
remote: Total 56 (delta 3), reused 49 (delta 0), pack-reused 0
Unpacking objects: 100% (56/56), done.


In [2]:
ls uchicago-scavhunt-lists/

1987.pdf  1998.pdf            2007.pdf           2016.pdf
1988.pdf  1999.pdf            2008.pdf           2017.pdf
1989.pdf  2000.pdf            2008-ScavAir.pdf   2018.pdf
1990.pdf  2001.pdf            2009.pdf           2019.pdf
1991.pdf  2002-Abductees.pdf  2010.pdf           Fake Lists/
1992.pdf  2002.pdf            2011-Guinness.pdf  lists.pdf
1993.pdf  2003.pdf            2011.pdf           Previous Lists_ tex files.zip
1994.pdf  2004-AllStars.pdf   2012.pdf           README.md
1995.pdf  2004.pdf            2014a.pdf
1996.pdf  2005.pdf            2014b.pdf
1997.pdf  2006.pdf            2015.pdf
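
Several years have more than one PDF (for example `2008.pdf` and `2008-ScavAir.pdf`), so it can help to group the files by year before working with them. The following is a minimal sketch, not part of the original notebook; the helper name `pdfs_by_year` is hypothetical, and it assumes that every list PDF starts with a four-digit year, as the listing above suggests.

```python
import re
from collections import defaultdict
from pathlib import Path


def pdfs_by_year(repo_dir):
    """Map each four-digit year prefix to the PDF names that start with it.

    Hypothetical helper: assumes list PDFs are named like 2008.pdf or
    2008-ScavAir.pdf, as in the directory listing above.
    """
    groups = defaultdict(list)
    for path in sorted(Path(repo_dir).glob("*.pdf")):
        match = re.match(r"\d{4}", path.name)
        if match:  # files without a year prefix (e.g. lists.pdf) are skipped
            groups[match.group()].append(path.name)
    return dict(groups)
```

Calling `pdfs_by_year("uchicago-scavhunt-lists")` after the clone should return a dictionary keyed by year strings such as `"2008"`.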


Next, let's retrieve the spreadsheet where the transcribed item texts have been compiled. The command reads the download URL from a local file named `biglist.url`.

In [4]:
!curl $(cat biglist.url) -o biglist.csv

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3611k    0 3611k    0     0  2533k      0 --:--:--  0:00:01 --:--:-- 2534k


In [5]:
ls biglist.csv

biglist.csv
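
If `curl` is not available, the same download can be done from Python using only the standard library. This is a rough sketch under the assumption that `biglist.url` contains a single URL on one line; the function name `fetch_from_url_file` is made up for illustration.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def fetch_from_url_file(url_file, dest):
    """Download the URL stored in url_file to dest and return the byte count.

    Assumes url_file holds exactly one URL (surrounding whitespace is ignored).
    """
    url = Path(url_file).read_text(encoding="utf-8").strip()
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # stream to disk without loading it all
    return Path(dest).stat().st_size
```

For this notebook the call would be `fetch_from_url_file("biglist.url", "biglist.csv")`.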


## Reading the big list

Here is how we read the list into memory and manipulate it. We skip the first row of the file with `skiprows=1` and supply our own column names with `names`.

In [7]:
import pandas as pd

In [14]:
items = pd.read_csv('biglist.csv', skiprows=1, encoding='utf8',
                    names=['year','special','item','text','scoring','points','notes','formatted'])
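
To see what `skiprows=1` together with `names` does, here is a small sketch on made-up data held in memory. The CSV text and the `demo` frame are illustrative only and are not taken from the real spreadsheet.

```python
import io

import pandas as pd

# Made-up stand-in for the spreadsheet: one header row, then two data rows.
sample_csv = io.StringIO(
    "Year,Special,Item,Text\n"
    "2012,,189,Bumper stickers\n"
    "2018,,260,An out-of-place stone\n"
)

# skiprows=1 drops the header row; names supplies our own column labels.
demo = pd.read_csv(sample_csv, skiprows=1,
                   names=["year", "special", "item", "text"])
print(demo.columns.tolist())
print(len(demo))
```

The empty `special` fields are read as missing values (`NaN`), which is worth keeping in mind when filtering on that column.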

In [15]:
items.sample(5)

| | year | special | item | text | scoring | points | notes | formatted |
|---|---|---|---|---|---|---|---|---|
| 8403 | 2014(1) | | 193 | A hornets’ nest. Minimum length 1.5 feet. No h... | 15 points | 15 | | 2014(1).193. A hornets’ nest. Minimum length 1... |
| 10035 | 2018 | | 260 | If we see an out-of-place stone on campus, we ... | 7 points | 7 | | 2018.260. If we see an out-of-place stone on c... |
| 3509 | 1998 | | 239 | Harry Caray Beanie Baby. [25 points. 5 bonus p... | 25 points. 5 bonus points if it can take a sho... | 25. 5 bonus if it can take a shot. 1 bonus if ... | | 1998.239. Harry Caray Beanie Baby. [25 points.... |
| 9967 | 2018 | | 192 | What all started with some Driscoll's® strawbe... | 6 points | 6 | | 2018.192. What all started with some Driscoll'... |
| 8040 | 2012 | | 189 | Bumper stickers from 2012 Democratic president... | 2 points per sticker, max three stickers | 2 per sticker, max three stickers | | 2012.189. Bumper stickers from 2012 Democratic... |


The above table is randomly sampled from the full list. If you re-run the above code cell, you will get different items back.
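
If you want the same rows every time (for example, to discuss a particular sample with someone else), `sample` accepts a `random_state` seed. A minimal sketch on a stand-in frame; `demo` and the seed value are arbitrary choices for illustration.

```python
import pandas as pd

# Stand-in frame; with the real data you would call items.sample(...) instead.
demo = pd.DataFrame({"item": range(1, 11)})

# The same seed yields the same rows on every run.
first = demo.sample(3, random_state=0)
second = demo.sample(3, random_state=0)
print(first.equals(second))
```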

In [19]:
print(items.iloc[8040].formatted)

2012.189. Bumper stickers from 2012 Democratic presidential candidates other than Barack Obama. [2 points per sticker, max three stickers]


Here is how you would search for a keyword. The match is case-sensitive and runs against the `text` column.

In [24]:
for line in items[items.text.str.contains("Shaq", na=False)].formatted:  # na=False: rows with no text never match
    print(line)

2004.242. What's the Big Aristotle's take on fowl trouble? Write an editorial for Shaq's Chicken Herald. [6 pizzles]
2011.216. By 11 a.m. on Sunday, have your team's website be the nth Google search result for the phrase "Mama Shaq, Mama Shaq, Shaq's your mom, that's a fact". [ $\frac{20}{n}$ points ]
2014(2).224. A shaq-in-the-box, shaq'o'lantern, and battleShaqxe. [3 points each, and that's a fact]
2014(2).240. At the Bata Museum, what size is Shaquile O'Neal's shoe? What did the Polish refer to break shoes as? What were flat-soled cogs used for? How did smuggler's shoes work? [6 points]
2017.Road Trip.27. Linda's friends cheer her up with donuts at the Krispy Kreme owned by her all-time favorite NBA player, Shaq. She realizes that there might be some reason to stay on Earth a bit longer. [3.2 Magicly Hot points] [3.2 Magicly Hot points]
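
The search above can be made more forgiving with a few options to `str.contains`. The sketch below uses a small made-up frame rather than the real list: `case=False` ignores capitalization, `na=False` treats missing text as a non-match, and `regex=False` makes the keyword a literal string rather than a regular expression.

```python
import pandas as pd

# Made-up rows standing in for the items table, including one missing text.
demo = pd.DataFrame({"text": ["Shaq's Chicken Herald",
                              "A shaq-in-the-box",
                              None,
                              "A hornets' nest"]})

mask = demo["text"].str.contains("shaq", case=False, na=False, regex=False)
matches = demo[mask]
print(matches.index.tolist())
```

With the real data, the equivalent filter would be `items[items.text.str.contains("shaq", case=False, na=False, regex=False)]`.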
