Determine key features in diffs that could be used for filtration #51
Comments
@janakrajchadha When you decide you've looked at enough diffs to get an initial sense of their variety, can you put some remarks here?
@danielballan Well, I can never be too sure if I've looked at enough diffs. However, I'll add interesting observations here.
For now, I've only used versions from the Internet Archive. Other than dates, I have not noticed any irrelevant changes that occur frequently.
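To make the date observation concrete, here is a minimal sketch of how a date-only change might be detected. It is an illustration rather than anything used in this project: the function name `is_date_only_change` and the three date patterns are assumptions, and real pages will contain date formats this does not cover.

```python
import re

# Assumed date formats for illustration only: ISO (2017-06-01),
# slash-separated (6/1/2017), and month-name (March 3, 2017).
DATE_PATTERN = re.compile(
    r"\b\d{4}-\d{2}-\d{2}\b"
    r"|\b\d{1,2}/\d{1,2}/\d{2,4}\b"
    r"|\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
    r"\s+\d{1,2},?\s+\d{4}\b"
)


def mask_dates(text: str) -> str:
    """Replace every recognized date with a fixed placeholder."""
    return DATE_PATTERN.sub("<DATE>", text)


def is_date_only_change(old_text: str, new_text: str) -> bool:
    """Return True when the two versions differ, but only in their dates."""
    if old_text == new_text:
        return False  # nothing changed at all
    return mask_dates(old_text) == mask_dates(new_text)
```

For example, `is_date_only_change("Last updated: March 3, 2017.", "Last updated: March 10, 2017.")` is true, while any change outside the date itself makes it false.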
Thanks for the comments above @janakrajchadha. Re.:
Have you not seen any examples of entire page sections that might be both updated frequently and irrelevant, such as a scrolling banner ad?
I tried looking for patterns in scrolling banners and similar sections, but since any information or article can appear in a scrolling banner, it is hard to create a simple filter for them. I haven't seen any ads, as we're mostly dealing with government agency websites, so scrolling banner ads are out of the picture.
Just noting that after a convo in Slack, we are leaning toward id'ing scrolling news feeds & banners as far as possible, though probably tagging those changes rather than filtering them automatically, per the concern above.
FWIW: in the new DB, we have annotations (all the current ways we classify in spreadsheets), priority, and significance. It might make sense to auto-assign scrolling news feeds the normal annotation (repeated changes 12) and maybe a low priority (if they are the only change on the page), like 0.1 (priority is from 0-1).
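A rough sketch of that tagging rule, to show how it differs from filtering. Everything here is hypothetical: the `Change` and `PageTags` classes, the `in_news_feed` flag, and the `"repeated_change"` label are stand-ins and do not reflect the actual DB schema. It also assumes some earlier step has already decided which changes fall inside a scrolling news feed, which is the hard part discussed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

REPEATED_CHANGE = "repeated_change"  # hypothetical label for the annotation
LOW_PRIORITY = 0.1                   # priority runs from 0 to 1


@dataclass
class Change:
    text: str
    in_news_feed: bool  # assumed to be set by an earlier detection step


@dataclass
class PageTags:
    annotations: List[str] = field(default_factory=list)
    priority: Optional[float] = None  # None means "not auto-assigned"


def auto_tag(changes: List[Change]) -> PageTags:
    """Tag news-feed changes instead of dropping them from the diff."""
    tags = PageTags()
    feed_changes = [c for c in changes if c.in_news_feed]
    if feed_changes:
        tags.annotations.append(REPEATED_CHANGE)
        # Lower the priority only when the feed is the sole thing that changed.
        if len(feed_changes) == len(changes):
            tags.priority = LOW_PRIORITY
    return tags
```

A page whose only change is in the feed gets the annotation and priority 0.1; a page with a feed change plus another change keeps the annotation but gets no automatic priority, so an analyst still sees it.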