Revise bill relations filter #112

Merged
merged 2 commits into from
Jan 12, 2021

Conversation

hancush
Collaborator

@hancush hancush commented Jan 12, 2021

Description

In #47, we assumed that current relations would share the highest flag value. Per Metro-Records/la-metro-councilmatic#669 (comment), this is not the case. This PR updates the relations method to return a deduplicated list of relations using the most recent version of each relation, rather than a deduplicated list of relations sharing the max value of the relation flag across the entire set. It also exposes a method that can be overridden in downstream scraper instances to customize how, if at all, relations are filtered during a scrape.
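
A rough sketch of the behavior described above, for illustration only. This is not the code merged in this PR: the class and method names (`RelationScraper`, `filter_relations`, `relations`) are hypothetical, the third sample relation is made up, and the sketch assumes that among relations sharing a `MatterRelationMatterId`, a higher `MatterRelationFlag` marks a more recent version.

```python
class RelationScraper:
    """Hypothetical stand-in for a scraper class; names are illustrative."""

    def filter_relations(self, relations):
        # Assumption: among relations pointing at the same related bill
        # (same MatterRelationMatterId), the highest MatterRelationFlag
        # is the most recent version.
        latest = {}
        for relation in relations:
            matter_id = relation["MatterRelationMatterId"]
            current = latest.get(matter_id)
            if (current is None
                    or relation["MatterRelationFlag"] > current["MatterRelationFlag"]):
                latest[matter_id] = relation
        return list(latest.values())

    def relations(self, raw_relations):
        # Every relation passes through the overridable filter hook.
        return self.filter_relations(raw_relations)


class UnfilteredScraper(RelationScraper):
    """A downstream instance could override the hook, e.g. to skip filtering."""

    def filter_relations(self, relations):
        return list(relations)


# Sample input; the relation to matter 5000 is invented for the example.
raw = [
    {"MatterRelationMatterId": 4333, "MatterRelationFlag": 1, "MatterRelationId": 1195},
    {"MatterRelationMatterId": 4333, "MatterRelationFlag": 2, "MatterRelationId": 1206},
    {"MatterRelationMatterId": 5000, "MatterRelationFlag": 1, "MatterRelationId": 1300},
]

filtered = RelationScraper().relations(raw)
```

With this input, the per-bill filter keeps one relation for matter 4333 and one for matter 5000. A filter that kept only relations sharing the maximum flag across the whole set would drop the relation to matter 5000, because its flag is lower than the maximum.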

Connects Metro-Records/la-metro-councilmatic#669.

Notes

We aren't 100% sure how the relation flag value is set (Metro is looking into it), but we do know that it isn't necessarily meaningful across all relations, only within versions of the same related bill.

Testing instructions

  • Navigate into your local scraper instance and install an editable version of this branch into your virtual environment: pip install -e /path/to/python-legistar-scraper
  • Scrape two classes of Metro matter relations: pupa update lametro --scrape bills matter_ids=4455,6008
  • View the scraped data in _data/lametro/bill* and confirm that the related_bills array in both files does not contain duplicates but does contain a relation object for each distinct bill in the API call.
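
The last check above could also be scripted. A minimal sketch, assuming the scraped files are JSON and that each entry in `related_bills` carries an `identifier` key; both are assumptions about the scraper output, so adjust `key` to whatever field distinguishes related bills in your data. The helper name `duplicate_relations` is hypothetical.

```python
import glob
import json
from collections import Counter


def duplicate_relations(bill, key="identifier"):
    """Return the `key` values that occur more than once in related_bills."""
    counts = Counter(rel[key] for rel in bill.get("related_bills", []))
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Path pattern taken from the testing instructions above.
    for path in glob.glob("_data/lametro/bill*"):
        with open(path) as f:
            bill = json.load(f)
        print(path, duplicate_relations(bill) or "no duplicates")
```

This only flags duplicates. Confirming that every distinct bill from the API call is present still requires comparing against the API response.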

@hancush hancush requested a review from fgregg January 12, 2021 15:24
)

# Assumes that there will not be more than 10 versions of a relation.
seen_relations = deque([], maxlen=10)
Contributor

i think a set object would be safer and clearer here. is there a reason to strongly prefer a deque?

Collaborator Author

relations is a list of dictionaries. Dictionaries are not hashable, so relations can't be cast to a set. Also, relation objects referring to the same related bill are not identical duplicates:

<GranicusMatterRelation>
<MatterRelationFlag>1</MatterRelationFlag>
<MatterRelationGuid>1B541337-1A65-4D93-A6B3-BCFE4F5DA75C</MatterRelationGuid>
<MatterRelationId>1195</MatterRelationId>
<MatterRelationLastModifiedUtc>2017-09-20T22:38:45.3533333</MatterRelationLastModifiedUtc>
<MatterRelationMatterId>4333</MatterRelationMatterId>
<MatterRelationRowVersion>AAAAAACJaxw=</MatterRelationRowVersion>
</GranicusMatterRelation>

<GranicusMatterRelation>
<MatterRelationFlag>2</MatterRelationFlag>
<MatterRelationGuid>7863769D-6A6B-46AB-8F86-47CD6D0E7C33</MatterRelationGuid>
<MatterRelationId>1206</MatterRelationId>
<MatterRelationLastModifiedUtc>2017-10-09T22:27:14.51</MatterRelationLastModifiedUtc>
<MatterRelationMatterId>4333</MatterRelationMatterId>
<MatterRelationRowVersion>AAAAAACOQLs=</MatterRelationRowVersion>
</GranicusMatterRelation>
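
Both points can be checked in a few lines. The dictionaries below are trimmed versions of the two relations above, keeping only three fields for brevity; the variable names are illustrative.

```python
older = {"MatterRelationId": 1195, "MatterRelationFlag": 1, "MatterRelationMatterId": 4333}
newer = {"MatterRelationId": 1206, "MatterRelationFlag": 2, "MatterRelationMatterId": 4333}

try:
    set([older, newer])
    hashable = True
except TypeError:
    # dict instances are unhashable, so they cannot be set members.
    hashable = False

# The two relations point at the same related bill but are not equal,
# so an equality-based dedup would keep both of them.
same_bill = older["MatterRelationMatterId"] == newer["MatterRelationMatterId"]
identical = older == newer
```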

Contributor

that's true, but you are just checking relation_id which is hashable?

Collaborator Author

Oh, I see. I misunderstood your suggestion. Done.
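
For context, a small sketch of why a set of seen ids is the safer choice. The loop shape is an assumption, since only one line of the original diff is visible here, and the function names and input are hypothetical. A bounded deque silently evicts its oldest entry once full, so an id can be "forgotten" and let through a second time if the input is not grouped by id.

```python
from collections import deque


def dedupe_with_deque(ids, maxlen=10):
    seen = deque([], maxlen=maxlen)
    kept = []
    for relation_id in ids:
        if relation_id not in seen:
            kept.append(relation_id)
            # Appending to a full deque drops the oldest entry.
            seen.append(relation_id)
    return kept


def dedupe_with_set(ids):
    seen = set()
    kept = []
    for relation_id in ids:
        if relation_id not in seen:
            kept.append(relation_id)
            seen.add(relation_id)
    return kept


# Hypothetical input: 12 distinct ids, then the first id again.
ids = list(range(12)) + [0]
deque_result = dedupe_with_deque(ids)
set_result = dedupe_with_set(ids)
```

Here the deque version keeps id 0 twice, while the set version keeps each id once regardless of how many ids there are or how they are ordered.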

@hancush hancush requested a review from fgregg January 12, 2021 16:22