This repository has been archived by the owner on Feb 2, 2022. It is now read-only.

Improve parsing failure logging #167

Merged Jul 4, 2019 (1 commit)

Conversation

rgreinho
Member

@rgreinho rgreinho commented Jul 4, 2019

Types of changes

  • New feature (non-breaking change which adds functionality)
  • Code cleanup / Refactoring

Description

Logs more details about the fields that could not be parsed correctly.

The logs now show useful detail for each report that could not be parsed:

± scrapd -vv --pages 3 --format count
2019-07-04T12:30:44-0500 scrapd.core.apd:862  Retrieving fatalities from 0001-01-01 to 9999-12-31.
2019-07-04T12:30:44-0500 scrapd.core.apd:867  Fetching page 1...
2019-07-04T12:30:44-0500 scrapd.core.apd:878  4 fatality page(s) to process.
2019-07-04T12:30:46-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-36-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:46-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-37-4 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-35-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:911  4 fatality page(s) is/are within the specified time range.
2019-07-04T12:30:47-0500 scrapd.core.apd:867  Fetching page 2...
2019-07-04T12:30:47-0500 scrapd.core.apd:878  9 fatality page(s) to process.
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-34-4 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-31-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-30-1 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-25-update was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-28-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-33-5 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-32-5 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-27-4 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-29-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:911  9 fatality page(s) is/are within the specified time range.
2019-07-04T12:30:47-0500 scrapd.core.apd:867  Fetching page 3...
2019-07-04T12:30:48-0500 scrapd.core.apd:878  9 fatality page(s) to process.
2019-07-04T12:30:53-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-26-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-22-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-21-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-25-4 was not parsed correctly:
	 * could not retrieve the deceased information
	 * could not retrieve the location
	 * no deceased information found in fatality page
	 * age is invalid: None
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-18-4 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-23-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-24-5 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-19-5 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-20-4 was not parsed correctly:
	 * could not retrieve the location
	 * age is invalid: -1
2019-07-04T12:30:54-0500 scrapd.core.apd:911  9 fatality page(s) is/are within the specified time range.
2019-07-04T12:30:54-0500 scrapd.cli.cli:87   Total: 21
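
For readers who want to reproduce this message shape, here is a minimal sketch of how per-field parsing problems could be collected and emitted as one multi-line log entry. It is not the code from this pull request: the function names (`validate_fields`, `format_parsing_errors`, `log_parsing_errors`), the logger name, and the validation rules are all hypothetical and only mirror the messages visible in the output above.

```python
import logging

# Hypothetical logger name; the real project logs from scrapd.core.apd.
logger = logging.getLogger("parsing_sketch")


def validate_fields(fields):
    """Return a list of human-readable problems found in a parsed report.

    The rules here are assumptions chosen to match the log messages above.
    """
    errors = []
    if not fields.get("location"):
        errors.append("could not retrieve the location")
    age = fields.get("age")
    if age is None or age < 0:
        errors.append(f"age is invalid: {age}")
    return errors


def format_parsing_errors(url, errors):
    """Build one multi-line message listing every field that failed to parse."""
    lines = [f"Fatality report {url} was not parsed correctly:"]
    lines.extend(f"\t * {error}" for error in errors)
    return "\n".join(lines)


def log_parsing_errors(url, fields):
    """Log all problems for one report in a single entry and return them."""
    errors = validate_fields(fields)
    if errors:
        logger.debug(format_parsing_errors(url, errors))
    return errors
```

Grouping every problem for a report into a single log call keeps the URL and its failures together, which is easier to scan than one entry per failed field.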

Checklist:

  • [] I have updated the documentation accordingly
  • [] I have written unit tests

Fixes #152

@rgreinho rgreinho self-assigned this Jul 4, 2019
@rgreinho rgreinho merged commit 84ef3c3 into scrapd:master Jul 4, 2019
Development

Successfully merging this pull request may close these issues.

Better error handling and logging strategy for parsing failures