Scraper now runs for 7 hours or more #249

Closed
nwinklareth opened this Issue Jan 18, 2014 · 1 comment


nwinklareth commented Jan 18, 2014

Since the name search functionality on the Cook County Sheriff's website can no longer be used, the scraper takes too long to run. We need a faster scraper to find new inmates, update existing inmates, and discharge inmates.

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Jan 19, 2014

@nwinklareth nwinklareth Issue #249 - started on new scraper.
Version of Monitor and SearchCommands class
5eb6909

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Jan 19, 2014

@nwinklareth nwinklareth Issue #249 - added gevent version of http get that does multiple attempts
cf4a0d2
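
For readers unfamiliar with the pattern, an HTTP GET that makes multiple attempts can be sketched as below. This is a minimal illustration, not the code from the commit: `get_with_retries`, its parameters, and the injected `fetch` callable are all hypothetical names. Under gevent one would typically pass a cooperative sleep such as `gevent.sleep`, so that a waiting request does not block other greenlets.

```python
import time


def get_with_retries(fetch, url, attempts=3, delay=0.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times and return the first success.

    `fetch` is any callable that returns the page body or raises on failure
    (hypothetical; the real scraper's fetch function is not shown in this issue).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as error:  # a real scraper would catch narrower errors
            last_error = error
            if attempt < attempts:
                sleep(delay)  # pause before the next attempt
    # Every attempt failed: re-raise the last error seen.
    raise last_error
```

Passing `fetch` and `sleep` in as arguments keeps the retry logic testable without touching the network.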

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Jan 19, 2014

@nwinklareth nwinklareth Issue #249 - extracted http class out of testing code f238194

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Jan 20, 2014

@nwinklareth nwinklareth Issue #249 - added parsing of CCJ inmates details page. Basically copied existing InmateDetails class, removed fetching logic from it and wrote a test for it.
d73a7d8

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Jan 25, 2014

@nwinklareth nwinklareth Issue #249 - fixed some typos. c05da3b

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Jan 25, 2014

@nwinklareth nwinklareth Issue #249 - extracted class InmateDetails from test case a395c75

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Jan 25, 2014

@nwinklareth nwinklareth Issue #249 - fixed missing import statement that resulted from forgetting to run test after extracting InmateDetails class.
3b24fa9

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 1, 2014

@nwinklareth nwinklareth Issue #249 - added basic search for inmate mechanism b4e7b9f

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 2, 2014

@nwinklareth nwinklareth Issue #249 - extracted InmatesScraper out into own class file b2caaf1

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 3, 2014

@nwinklareth nwinklareth Issue #249 - added in storing of inmate details into inmate model 43ebb63

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 8, 2014

@nwinklareth nwinklareth Issue #249 - added notification mechanism to Monitor, this will be used to track state transitions in the system and in particular to help determine if shutdown has occurred.
13cacd5

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 9, 2014

@nwinklareth nwinklareth Issue #249 - added notification mechanism to Monitor
Added Heartbeat mechanism
9d5d031

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 9, 2014

@nwinklareth nwinklareth Issue #249 - added initial version of Controller, just starts up Monitor and Heartbeat
e3ac8de

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 22, 2014

@nwinklareth nwinklareth Issue #249 - modified controller to respond to halt notification 9155db6

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 22, 2014

@nwinklareth nwinklareth Issue #249 - controller now sends start command to search_commands and responds to a finish command from it and sends finish command to inmate_scraper
156ebb0

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 23, 2014

@nwinklareth nwinklareth Issue #249 completed controller so that it co-ordinates the other actors in the system.
cadbbf6

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 23, 2014

@nwinklareth nwinklareth Issue #249 - added timestamps to debug messages and removed them from notification messages
fda7450

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 23, 2014

@nwinklareth nwinklareth Issue #249 - switched log to use Monitor 631af9c

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 23, 2014

@nwinklareth nwinklareth Issue #249 - added ng_scraper.py command shell which calls scrape. Fixed calling issues and successfully ran scraper, finding 174 new inmates in 3 minutes.
471dd21

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 23, 2014

@nwinklareth nwinklareth Issue #249 - added generate update inmate status commands b2d3a2c

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 23, 2014

@nwinklareth nwinklareth Issue #249 - added update inmate status to inmate scraper d1fce71

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 23, 2014

@nwinklareth nwinklareth Issue #249 - added discharging of inmates d63c505

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 24, 2014

@nwinklareth nwinklareth Issue #249 - finished controller changes to handle updating inmate status
e08f448

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 24, 2014

@nwinklareth nwinklareth Issue #249 - fixed error with discharge method, removed development logging messages
e747b5c

@nwinklareth nwinklareth added a commit to nwinklareth/cookcountyjail that referenced this issue Feb 24, 2014

@nwinklareth nwinklareth Issue #249 - updated scraper.sh shell script to use new scraper f251ba3

nwinklareth commented Feb 24, 2014

Fixed this by implementing a concurrent scraper based on the gevent library.
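
As a rough illustration of why concurrency helps here, the sketch below fetches many pages at once with a bounded pool of workers instead of one at a time. It is not the project's scraper: `scrape_all`, `fetch_page`, and `page_ids` are hypothetical names, and the standard library's `ThreadPoolExecutor` stands in for a gevent greenlet pool so the example runs without third-party packages. With gevent, `gevent.pool.Pool(size).map(...)` plays a similar role.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(fetch_page, page_ids, workers=10):
    """Fetch every id concurrently and return {page_id: result} for hits.

    `fetch_page` is a hypothetical callable that returns parsed details for
    an id, or None when nothing is found for that id.
    """
    page_ids = list(page_ids)  # materialize so ids can be paired with results
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with page_ids.
        results = pool.map(fetch_page, page_ids)
        return {pid: res for pid, res in zip(page_ids, results) if res is not None}
```

The `workers` limit caps how many requests are in flight at once, which keeps the load on the remote site bounded.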
