
Improve parsing failure logging #167

Merged
rgreinho merged 1 commit into scrapd:master on Jul 4, 2019

@rgreinho (Member) commented Jul 4, 2019

Types of changes

  • New feature (non-breaking change which adds functionality)
  • Code cleanup / Refactoring

Description

Log which fields could not be parsed, and why, whenever a fatality report is not parsed correctly.
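A minimal sketch of the pattern, assuming each field parser raises `ValueError` with a readable reason: run every parser, collect all failure messages instead of stopping at the first one, then emit a single log entry per report using the same message format as the sample output. The parser functions, their regexes, the `PARSERS` mapping, and the debug log level are hypothetical stand-ins, not scrapd's actual code.

```python
import logging
import re

# Hypothetical logger; scrapd's real logging setup may differ.
logger = logging.getLogger("scrapd.core.apd")


def parse_location(page):
    """Hypothetical field parser: raise ValueError with a readable reason on failure."""
    match = re.search(r"Location:\s*(.+)", page)
    if not match:
        raise ValueError("could not retrieve the location")
    return match.group(1).strip()


def parse_age(page):
    """Hypothetical field parser: a missing or negative age is reported as invalid."""
    match = re.search(r"Age:\s*(-?\d+)", page)
    age = int(match.group(1)) if match else None
    if age is None or age < 0:
        raise ValueError(f"age is invalid: {age}")
    return age


# Hypothetical mapping of field names to their parsers.
PARSERS = {"location": parse_location, "age": parse_age}


def format_parse_failure(url, errors):
    """Build one multi-line message listing every reason a report failed."""
    details = "\n".join(f"\t * {err}" for err in errors)
    return f"Fatality report {url} was not parsed correctly:\n{details}"


def parse_fatality_page(page, url, parsers=PARSERS):
    """Run all parsers, keeping successful fields and collecting every error."""
    record, errors = {}, []
    for field, parser in parsers.items():
        try:
            record[field] = parser(page)
        except ValueError as exc:
            errors.append(str(exc))
    if errors:
        # Debug level is an assumption; the sample output was produced with -vv.
        logger.debug("%s", format_parse_failure(url, errors))
    return record, errors
```

Because parsing continues past the first failure, a page missing both the location and the age produces both reasons in one entry, which is what makes multi-reason entries possible in the log.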

With this change, the logs already show which fields failed for each report. For example, when fetching the first three pages:

± scrapd -vv --pages 3 --format count
2019-07-04T12:30:44-0500 scrapd.core.apd:862  Retrieving fatalities from 0001-01-01 to 9999-12-31.
2019-07-04T12:30:44-0500 scrapd.core.apd:867  Fetching page 1...
2019-07-04T12:30:44-0500 scrapd.core.apd:878  4 fatality page(s) to process.
2019-07-04T12:30:46-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-36-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:46-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-37-4 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-35-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:911  4 fatality page(s) is/are within the specified time range.
2019-07-04T12:30:47-0500 scrapd.core.apd:867  Fetching page 2...
2019-07-04T12:30:47-0500 scrapd.core.apd:878  9 fatality page(s) to process.
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-34-4 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-31-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-30-1 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-25-update was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-28-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-33-5 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-32-5 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-27-4 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-29-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:47-0500 scrapd.core.apd:911  9 fatality page(s) is/are within the specified time range.
2019-07-04T12:30:47-0500 scrapd.core.apd:867  Fetching page 3...
2019-07-04T12:30:48-0500 scrapd.core.apd:878  9 fatality page(s) to process.
2019-07-04T12:30:53-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-26-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-22-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-21-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-25-4 was not parsed correctly:
	 * could not retrieve the deceased information
	 * could not retrieve the location
	 * no deceased information found in fatality page
	 * age is invalid: None
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-18-4 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-23-3 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-24-5 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-19-5 was not parsed correctly:
	 * could not retrieve the location
2019-07-04T12:30:54-0500 scrapd.core.apd:809  Fatality report http://austintexas.gov/news/traffic-fatality-20-4 was not parsed correctly:
	 * could not retrieve the location
	 * age is invalid: -1
2019-07-04T12:30:54-0500 scrapd.core.apd:911  9 fatality page(s) is/are within the specified time range.
2019-07-04T12:30:54-0500 scrapd.cli.cli:87   Total: 21
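When a run produces many failures, tallying the reasons can show which parser needs attention first. The sketch below is not part of scrapd; it is a hypothetical helper that reads log text laid out exactly as above (a `Fatality report <url> was not parsed correctly:` line followed by tab-indented `* reason` lines) and groups the reasons per report URL.

```python
import re
from collections import Counter

# Assumes the exact log layout shown above.
REPORT_RE = re.compile(r"Fatality report (\S+) was not parsed correctly:")
REASON_RE = re.compile(r"^\s*\* (.+)$")


def summarize_failures(log_text):
    """Map each failing report URL to its reasons and count each reason overall."""
    failures = {}
    current = None
    for line in log_text.splitlines():
        report = REPORT_RE.search(line)
        if report:
            # A new failing report starts; following "* reason" lines belong to it.
            current = report.group(1)
            failures[current] = []
            continue
        reason = REASON_RE.match(line)
        if reason and current is not None:
            failures[current].append(reason.group(1))
        else:
            # Any other log line ends the current report's reason list.
            current = None
    counts = Counter(r for reasons in failures.values() for r in reasons)
    return failures, counts
```

Calling `counts.most_common()` on the result ranks the failure reasons, so a dominant problem such as the missing location stands out immediately.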

Checklist:

  • [ ] I have updated the documentation accordingly
  • [ ] I have written unit tests

Fixes #152

rgreinho self-assigned this on Jul 4, 2019

rgreinho merged commit 84ef3c3 into scrapd:master on Jul 4, 2019

9 checks passed

Summary 1 potential rule
ci/circleci: docs Your tests passed on CircleCI!
ci/circleci: format Your tests passed on CircleCI!
ci/circleci: lint Your tests passed on CircleCI!
ci/circleci: prepare Your tests passed on CircleCI!
ci/circleci: test-integrations Your tests passed on CircleCI!
ci/circleci: test-units Your tests passed on CircleCI!
coverage/coveralls Coverage remained the same at 100.0%
security/snyk - requirements.txt (rgreinho) No manifest changes detected