This repository has been archived by the owner on Feb 2, 2022. It is now read-only.

Improve location regex (#169)
Ensures the location regex matches more cases.

Fixes #168
rgreinho committed Jul 4, 2019
1 parent 84ef3c3 commit e2d0442
Showing 2 changed files with 9 additions and 4 deletions.
9 changes: 5 additions & 4 deletions scrapd/core/apd.py
@@ -704,11 +704,12 @@ def parse_location_field(page):
location_pattern = re.compile(
r'''
>Location: # The name of the desired field.
.* # Any character
> # The '>' character
\s* # Any whitespace (zero or more)
(?:</span>)? # Non capture closing span tag
(?:</strong>)? # Non capture closing strong tag
\s{2,} # Any whitespace (at least 2)
(?:</strong>) # Non capture closing strong tag
?([^<]+) # Capture any character except '<'.
(?:</strong>)? # Non capture closing strong tag
([^<]+) # Capture any character except '<'.
''',
re.VERBOSE,
)
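To make the effect of the change easier to follow, here is a minimal, self-contained sketch of how the relaxed pattern might behave on the two inputs used in the tests. The flattened diff does not show which lines were added or removed, so the pattern below is an assumed reconstruction, and `LOCATION_PATTERN` and `parse_location` are hypothetical names rather than the project's actual code.

```python
import re

# Assumed reconstruction of the post-commit pattern. The flattened diff does
# not mark added/removed lines, so treat this as illustrative only.
LOCATION_PATTERN = re.compile(
    r'''
    >Location:       # The name of the desired field.
    .*               # Any character
    >                # The '>' character
    \s*              # Any whitespace (zero or more)
    (?:</span>)?     # Optional closing span tag (non-capturing)
    (?:</strong>)?   # Optional closing strong tag (non-capturing)
    ([^<]+)          # Capture everything up to the next '<'.
    ''',
    re.VERBOSE,
)


def parse_location(page):
    """Return the captured location text, or an empty string if nothing matches.

    Hypothetical helper; the real parse_location_field may behave differently.
    """
    match = LOCATION_PATTERN.search(page)
    return match.group(1) if match else ''


# Whitespace after the closing tag is skipped before the capture starts.
print(repr(parse_location(
    '>Location:</strong>     183 service road westbound and Payton Gin Rd.</p>')))

# No whitespace after the closing tag: '\s{2,}' would fail here, '\s*' does not.
print(repr(parse_location(
    '<p> <strong>Location: </strong>8900 block of N Capital of Texas Highway </p>')))
```

Because the whitespace requirement is now optional, the capture group can start immediately after a closing tag; trailing whitespace before the next `<` is kept in the captured text, which is why the second expected value in the tests ends with a space.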
4 changes: 4 additions & 0 deletions tests/core/test_apd.py
@@ -759,6 +759,10 @@ def test_sanitize_fatality_entity(input_, expected):
'>Location:</strong>     183 service road westbound and Payton Gin Rd.</p>',
'183 service road westbound and Payton Gin Rd.',
),
(
'<p> <strong>Location: </strong>8900 block of N Capital of Texas Highway </p>',
'8900 block of N Capital of Texas Highway ',
),
))
def test_parse_location_field_00(input_, expected):
"""Ensure."""
