
Consider alternative methods of determining whether an entity has been updated #107

Closed
hancush opened this issue Mar 31, 2020 · 5 comments · Fixed by #109

Comments

@hancush
Collaborator

hancush commented Mar 31, 2020

There are several known actions that change a bill or event, but do not update the last modified timestamp. These include making an agenda or media link public. This means that those entities will not be captured in so-called "windowed" scrapes (that is, scrapes that only check for entities updated since a certain timestamp). One option for capturing those changes is to run full scrapes more frequently, but that can be a time and resource intensive step. Are there other ways to determine when an entity has been updated?

@fgregg
Contributor

fgregg commented Apr 15, 2020

I don't think there are, but I had a potentially useful idea for a windowed scrape (at least for events).

Right now, the windowed scrapes attempt to find all events that have been updated after the beginning of the window.

It would probably be helpful if we also scraped events that are scheduled to occur after the beginning of the window.

These recent and upcoming events are the ones most likely to change.

An increased false positive rate seems worth it if this helps us substantially reduce the false negatives.
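A minimal sketch of the proposed event filter, assuming each scraped event is a dict with hypothetical keys `last_modified` and `start_date` holding `datetime` values (in Legistar these would correspond to the event's last-modified timestamp and scheduled date). An event is kept if it was modified inside the window *or* is scheduled to occur on or after the start of the window:

```python
from datetime import datetime


def in_window(event, window_start):
    """Return True if a windowed scrape should (re)process this event.

    Assumes hypothetical keys 'last_modified' and 'start_date' (datetimes).
    """
    # Current behavior: only events whose timestamp was bumped.
    updated_recently = event["last_modified"] >= window_start
    # Proposed addition: recent and upcoming events, which are the most
    # likely to change without their last-modified timestamp being updated.
    scheduled_in_window = event["start_date"] >= window_start
    return updated_recently or scheduled_in_window


def windowed_events(events, window_start):
    """Filter a list of events down to those a windowed scrape should visit."""
    return [event for event in events if in_window(event, window_start)]
```

The `or` is where the trade-off lives: every upcoming event is re-scraped even if nothing changed (more false positives), in exchange for catching changes such as newly public agendas that never touch the timestamp (fewer false negatives).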

@hancush
Collaborator Author

hancush commented Apr 15, 2020

@fgregg That's really smart!

@fgregg
Contributor

fgregg commented Apr 15, 2020

We could do something similar with bills by using

- MatterAgendaDate
- MatterIntroDate

and the other date fields on matters.
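The same idea for bills, sketched under the assumption that each matter is a dict with a hypothetical `last_modified` key plus hypothetical date keys standing in for Legistar fields such as MatterAgendaDate and the introduction date. Date fields that are missing or `None` (common for matters that have not reached that stage) are simply skipped:

```python
from datetime import datetime

# Hypothetical keys standing in for Legistar matter date fields;
# extend this tuple with "the other date fields on matters" as needed.
DATE_FIELDS = ("agenda_date", "intro_date")


def matter_in_window(matter, window_start):
    """Return True if a windowed scrape should (re)process this matter."""
    # Current behavior: the matter's last-modified timestamp is in the window.
    if matter["last_modified"] >= window_start:
        return True
    # Proposed addition: any of the matter's own dates falls in the window.
    return any(
        matter.get(field) is not None and matter[field] >= window_start
        for field in DATE_FIELDS
    )
```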

@fgregg
Contributor

fgregg commented Apr 20, 2020

Let me know if you'd like to see a PR.

@hancush
Collaborator Author

hancush commented Apr 21, 2020

@fgregg We have the go-ahead from Metro to prioritize this for May. I'd love to review a PR at your convenience.
