This repository has been archived by the owner on May 3, 2022. It is now read-only.

Title changes sometimes are not correctly unattributed #61

Open
drinks opened this issue Mar 6, 2013 · 2 comments

drinks commented Mar 6, 2013

Reference (search "By Mr. HARKIN (for himself,"):

http://www.gpo.gov/fdsys/pkg/CREC-2013-03-05/html/CREC-2013-03-05-pt1-PgS1129.htm
vs.
http://capitolwords.org/date/2013/03/05/S1129_statements-on-introduced-bills-and-joint-resolutio/

Suspect a whitespace issue with the parser.

drinks commented Mar 6, 2013

Relevant line introduced in b091e8a: https://github.com/sunlightlabs/Capitol-Words/blame/master/parser/parser.py#L163

That line expects each title to be a single line.

drinks commented Mar 6, 2013

The encapsulating logic seems to be:

  • A line indented 6 spaces that starts with 'By ' + re_newspeaker
  • Optional subsequent lines indented 8 spaces
  • A new paragraph that does not match re_newspeaker

A block like this should be attributed as 'recorder'.
The preceding line, 8 centered underscores, is truncated to empty by clean_line().
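
A rough sketch of how that multi-line detection could look, for discussion only. It is not the code in parser.py. `RE_NEWSPEAKER` is a simplified stand-in for the parser's real `re_newspeaker`, `find_title_blocks` is a hypothetical helper, and the continuation and bill lines in the sample are invented. The sketch ends a block at the first line that is not indented exactly 8 spaces and does not check the following paragraph against `re_newspeaker`.

```python
import re

# Assumption: simplified stand-in for the parser's re_newspeaker pattern.
RE_NEWSPEAKER = re.compile(r"(Mr|Mrs|Ms)\. [A-Z]+")

# A title block opens with exactly 6 spaces of indent followed by 'By '.
RE_TITLE_START = re.compile(r"^ {6}By (.*)")

# Continuation lines are indented exactly 8 spaces.
RE_TITLE_CONT = re.compile(r"^ {8}\S")


def find_title_blocks(lines):
    """Return (start, end) index pairs, end exclusive, for 'By <speaker>' blocks."""
    blocks = []
    i = 0
    while i < len(lines):
        m = RE_TITLE_START.match(lines[i])
        if m and RE_NEWSPEAKER.match(m.group(1)):
            j = i + 1
            # Absorb the optional continuation lines into the same block.
            while j < len(lines) and RE_TITLE_CONT.match(lines[j]):
                j += 1
            blocks.append((i, j))
            i = j
        else:
            i += 1
    return blocks


# The first line comes from the referenced page; the rest is made up.
sample = [
    "      By Mr. HARKIN (for himself,",
    "        Mr. EXAMPLE):",
    "  S. 0000. A bill to do something.",
]
blocks = find_title_blocks(sample)
```

Every line inside a returned range would then be attributed to 'recorder' instead of being treated as a new speaker.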
