Extracts clues from the J! Archive website. Added options for using multithreading and gevent.

n33t1/jeopardy-crawler

 
 

jeopardy-parser

Python crawler for jeopardy games on J! Archive.

Setup

git clone https://github.com/n33t1/jeopardy-parser.git
cd jeopardy-parser
pip install -r requirements.txt

The crawler supports two output formats: json and html. Select the one you want with -o json or -o html.

To download every season available to date, run python download_multiprocessing.py or python download_threading_gevent.py. download_threading_gevent.py uses multithreading and gevent, while download_multiprocessing.py uses multiprocessing; generally speaking, download_threading_gevent.py is the faster of the two. To download a single season, pass -s with the season number. For example, python download_threading_gevent.py -s 34 -o html downloads season 34 as html files.
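To fetch a handful of seasons in one go, the command line can be assembled from Python. This is only a sketch: build_command is a hypothetical helper, not part of this repository, and it assumes the -s and -o flags behave as described above for download_threading_gevent.py.

```python
import sys

def build_command(season=None, fmt="json", script="download_threading_gevent.py"):
    """Build the argument list for one crawler run (hypothetical helper)."""
    # Use the current interpreter so the run sees the installed requirements.
    cmd = [sys.executable, script]
    if season is not None:
        # -s restricts the download to one season; without it all seasons are fetched.
        cmd += ["-s", str(season)]
    # -o selects the output format: json or html.
    cmd += ["-o", fmt]
    return cmd

# Print the commands for seasons 33 and 34 as html output.
for season in (33, 34):
    print(" ".join(build_command(season, "html")))
```

Each list can be passed to subprocess.run(cmd, check=True) from inside the cloned repository to actually start the download.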

A sample json output file is included here. Each clue has the following attributes:

  • Jtype:
    • "single": single jeopardy. The price of the corresponding clue should be 200, 400, 600, 800 or 1000.
    • "double": daily doubles. Prices vary.
    • "placeholder": the clue was missing from the J! Archive website. All other fields are null.
  • price
  • prompt
  • solution
  • parsed_solution
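The attribute rules above can be expressed as a small consistency check. This is a minimal sketch, not code from the repository: check_clue is a hypothetical helper, and it assumes each clue is loaded as a dict whose price is stored as an integer.

```python
# Prices allowed for "single" clues, as listed above.
SINGLE_PRICES = {200, 400, 600, 800, 1000}
# Every clue attribute other than Jtype.
CLUE_FIELDS = ("price", "prompt", "solution", "parsed_solution")

def check_clue(clue):
    """Return a list of problems found in one clue dict (empty if none)."""
    problems = []
    jtype = clue.get("Jtype")
    if jtype == "placeholder":
        # A placeholder marks a missing clue: all other fields should be null (None).
        problems += [
            f"{name} should be null"
            for name in CLUE_FIELDS
            if clue.get(name) is not None
        ]
    elif jtype == "single":
        # Assumption: price is an integer such as 400, not a string such as "$400".
        if clue.get("price") not in SINGLE_PRICES:
            problems.append(f"unexpected price {clue.get('price')!r}")
    elif jtype != "double":
        # Daily doubles can carry any price, so only unknown Jtype values are flagged.
        problems.append(f"unknown Jtype {jtype!r}")
    return problems
```

Running check_clue over every clue in a downloaded file is a quick way to spot records that do not match the format described here.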

Each game contains the following fields:

  • keys: the rounds in this game. If a game has keys equal to [1, 2], it only has a Jeopardy! Round and a Double Jeopardy! Round.
  • 1: stands for Jeopardy! Round
  • 2: stands for Double Jeopardy! Round. Might be missing for some games.
  • 3: stands for Final Jeopardy! Round. Might be missing for some games.
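As an illustration of the game fields, the sketch below maps a game's keys to round names. It rests on assumptions: rounds_present and ROUND_NAMES are hypothetical names, the sample game is made up, and the contents of each round are left empty because their layout is not described here.

```python
import json

# Round numbers and their meaning, as listed above.
ROUND_NAMES = {
    "1": "Jeopardy! Round",
    "2": "Double Jeopardy! Round",
    "3": "Final Jeopardy! Round",
}

def rounds_present(game):
    """Return the names of the rounds listed under the game's "keys" field."""
    # JSON object keys are always strings, so normalise entries such as 1 to "1".
    return [ROUND_NAMES[str(key)] for key in game.get("keys", [])]

# Made-up game without a Final Jeopardy! Round; round contents are elided.
game = json.loads('{"keys": [1, 2], "1": [], "2": []}')
print(rounds_present(game))
```

Reading the round list from keys, rather than assuming all three rounds exist, avoids a KeyError on games where round 2 or 3 is missing.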
