
[FR]: few more simple standard options for not remarking articles UnRead #1279

Closed
Owyn opened this issue Jan 22, 2024 · 13 comments
Labels
Component-DB Status-Fixed Ticket is resolved. Type-Enhancement This is request for brand new feature.
Comments

@Owyn
Contributor

Owyn commented Jan 22, 2024

Brief description of the feature request

[screenshot]
I think a few more options here, in addition to the outlined one, would be appropriate and very handy for users:

We have "Ignore changes in article body", so why not have:

1) Ignore changes in article title
2) Ignore changes in article date

?

Right now I have to apply a "No duplicates by URL" filter (msg.isAlreadyInDatabase(MessageObject.SameUrl)) to basically every feed I have. Whenever I add a new feed, I have the chore of going to the filter list and adding the feed to that filter. Twitter feeds tend to change titles a lot. Either the authors edit them, or Twitter itself changes them for some technical reason (I think), using special characters like #.

As for the date: scrapers generate article dates from relative dates (e.g. "yesterday" or "two hours ago"). Article dates in feeds therefore fluctuate, and the articles get re-marked as "unread" each and every time. I see no reason to re-mark something as unread just because its date changed. Related: Owyn/CSS2RSS#4

There shouldn't be a need to use filters for such standard things (and using filters might scare an average user).
There also shouldn't be a need to use a filter for literally every feed, because re-applying it to every newly added feed is bothersome.
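The filter workaround described above makes a simple yes/no decision. Here is a minimal Python model of that decision. It is only a sketch. RSS Guard's real filter is a script that calls msg.isAlreadyInDatabase(MessageObject.SameUrl). The Article class, the accept_if_new_url name and the in-memory set of stored URLs are hypothetical stand-ins for the database.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Article:
    url: str
    title: str


def accept_if_new_url(article: Article, stored_urls: Set[str]) -> bool:
    """Hypothetical model of a "No duplicates by URL" filter.

    An incoming article is accepted only if no stored article has the
    same URL, so an edited title on an already-stored post is ignored.
    """
    return article.url not in stored_urls


stored_urls = {"https://example.com/status/1"}
edited = Article(url="https://example.com/status/1", title="Edited title #tag")
fresh = Article(url="https://example.com/status/2", title="Brand-new post")

print(accept_if_new_url(edited, stored_urls))  # False
print(accept_if_new_url(fresh, stored_urls))   # True
```

The decision depends only on the URL, so title edits never produce a second copy. The downside, as noted above, is that the filter has to be assigned to each feed by hand.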

@Owyn Owyn added the Type-Enhancement This is request for brand new feature. label Jan 22, 2024
@martinrotter
Owner

An automatic feature was added. It considers the "time" unchanged if it differs from the stored time by at most 120 seconds. If you believe this threshold should be bigger, let me know.
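The tolerance rule can be sketched as a simple comparison. This is an illustrative Python version, not RSS Guard's actual code. The function name dates_considered_same is hypothetical; only the 120-second value comes from this thread.

```python
from datetime import datetime, timedelta, timezone

# Only the 120-second figure comes from the thread; the rest is illustrative.
DATE_TOLERANCE = timedelta(seconds=120)


def dates_considered_same(stored: datetime, fetched: datetime,
                          tolerance: timedelta = DATE_TOLERANCE) -> bool:
    """Treat an article's date as unchanged if it moved by at most
    `tolerance` in either direction."""
    return abs(fetched - stored) <= tolerance


stored = datetime(2024, 1, 22, 12, 0, 0, tzinfo=timezone.utc)
print(dates_considered_same(stored, stored + timedelta(seconds=90)))  # True
print(dates_considered_same(stored, stored + timedelta(minutes=15)))  # False
```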

As for your suggestions: yes, these would make some sense to implement. That said, my TODO list is HUGE, sadly.

@martinrotter
Owner

> As for the date: scrapers generate article dates from relative dates (e.g. "yesterday" or "two hours ago"). Article dates in feeds therefore fluctuate, and the articles get re-marked as "unread" each and every time. I see no reason to re-mark something as unread just because its date changed. Related: Owyn/CSS2RSS#4

As for scrapers: some of the websites I scrape post an absolute date/time in some element, and I much prefer parsing those. In that case, everything works great. I even have custom scripts that convert relative timestamps to absolute ones.

And, like I said, this is partly mitigated by the upcoming feature that treats two dates differing by up to 120 seconds as the "same".

Moreover, do you feel that when an article is "updated", RSS Guard should NEVER re-mark it as unread and should always keep its original state? If I make such a change, will you be able to test it for me promptly?

@Owyn
Contributor Author

Owyn commented Jan 22, 2024

> Moreover, do you feel that when an article is "updated", RSS Guard should NEVER re-mark it as unread

For my use cases, yes.
But perhaps people would still want to be notified when something changes. That makes sense when the body changes, but there's already an option for that.

And I can do a quick test using a portable version, no problem.

> max 120 seconds

That would not make a difference. Look at the following relative dates:

  • two hours ago
  • yesterday
  • two weeks ago
  • last year

E.g. "two weeks ago" would change into "three weeks ago" only after a... week, not after 120 seconds. So on every fetch (every 15 minutes), the date will shift by 15 minutes.
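The arithmetic behind this objection can be checked directly. This is a minimal sketch. It assumes a scraper that resolves "two weeks ago" against the time of each fetch with no rounding, and a 15-minute polling interval. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

FETCH_INTERVAL = timedelta(minutes=15)  # assumed polling interval


def resolve_two_weeks_ago(fetched_at: datetime) -> datetime:
    """Resolve "two weeks ago" relative to the moment the page was fetched."""
    return fetched_at - timedelta(weeks=2)


first_fetch = datetime(2024, 1, 22, 12, 0, tzinfo=timezone.utc)
second_fetch = first_fetch + FETCH_INTERVAL

# The page text is identical on both fetches...
shift = resolve_two_weeks_ago(second_fetch) - resolve_two_weeks_ago(first_fetch)

# ...but the resolved date moves by the whole fetch interval.
print(shift.total_seconds())           # 900.0
print(shift > timedelta(seconds=120))  # True
```

Every fetch shifts the date by the full 15 minutes, so a 120-second tolerance never absorbs it.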

@martinrotter
Owner

>   • two hours ago
>   • yesterday
>   • two weeks ago
>   • last year

Like I said, I even think the CSS2RSS script should/could automagically resolve these dates. There is an excellent Python library for this called

https://dateparser.readthedocs.io/en/latest/

It can parse relative dates too, with one line of Python. Look it up. It is genius. If that were integrated into CSS2RSS, it would be a killer feature. The user simply picks, via an argument, an element (whatever) containing the absolute/relative date/time string, and boom, parsed. I already use it:

https://github.com/martinrotter/rssguard/blob/master/resources/scripts/scrapers/hudebnibazar.py#L53
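For readers who cannot pull in dateparser, the idea can be sketched with the standard library alone. This toy resolver is NOT dateparser, which handles far more phrasings and languages. It only understands "yesterday" and "N minute/hour/day/week(s) ago", and the function names are hypothetical. The second helper assumes the scraper emits RSS 2.0, whose <pubDate> uses RFC 822-style dates.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Toy vocabulary; dateparser covers vastly more than this.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}
_WORDS = {"a": 1, "an": 1, "one": 1, "two": 2, "three": 3}
_PATTERN = re.compile(r"(\d+|a|an|one|two|three)\s+(minute|hour|day|week)s?\s+ago")


def resolve_relative(text: str, now: datetime) -> datetime:
    """Turn a relative date string into an absolute datetime, measured from `now`."""
    text = text.strip().lower()
    if text == "yesterday":
        return now - timedelta(days=1)
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported relative date: {text!r}")
    count_token, unit = match.groups()
    count = int(count_token) if count_token.isdigit() else _WORDS[count_token]
    return now - timedelta(**{_UNITS[unit]: count})


def to_rss_pubdate(moment: datetime) -> str:
    """Format an aware datetime as an RFC 822-style date in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


now = datetime(2024, 1, 22, 16, 2, tzinfo=timezone.utc)
print(to_rss_pubdate(resolve_relative("2 hours ago", now)))
# Mon, 22 Jan 2024 14:02:00 GMT
```

Emitting an absolute date means the feed reader stores a fixed timestamp instead of re-interpreting "2 hours ago" on every fetch. The resolved value still depends on when the scraper runs, though.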

@martinrotter
Owner

ADDENDUM: Yes, for VERY coarse relative times like "a week ago", it really is a problem. Luckily, my use case does not involve that much "relativity". It is always relative with per-hour precision, like "1 hour ago, 2 hours ago", so I always round date/times to whole hours.

@Owyn
Contributor Author

Owyn commented Jan 22, 2024

> Like I said, I even think the CSS2RSS script should/could automagically resolve these dates. There is an excellent Python library for this called

It does. It uses the maya library, which relies on the dateparser library you mentioned, to do it in one line / one function call.

> Luckily, my use case does not involve that much "relativity". It is always relative with per-hour precision, like "1 hour ago, 2 hours ago", so I always round date/times to whole hours.

Test scenario:
you fetch an article dated "1 hour ago",
15 minutes pass,
you fetch it again and it still says "1 hour ago". The resolved date of the article is now 15 minutes later than on the previous fetch, because the page still says "1 hour ago" and will keep saying so for the next 45 minutes.

Result:
the article will be re-marked as unread, because 15 minutes is more than 120 seconds.

@martinrotter
Owner

> Like I said, I even think the CSS2RSS script should/could automagically resolve these dates. There is an excellent Python library for this called

> It does. It uses the maya library, which relies on the dateparser library you mentioned, to do it in one line / one function call.

> it is always relative with per-hour precision, like "1 hour ago, 2 hours ago", so I always round date/times to whole hours.

> Test scenario: you fetch an article dated "1 hour ago", 15 minutes pass, you fetch it again and it still says "1 hour ago". The resolved date of the article is now 15 minutes later than on the previous fetch, because the page still says "1 hour ago" and will keep saying so for the next 45 minutes.

That is why I talked about precision. For some use cases, like one of mine, I did the following. Let's say it is 15:18 and the article says "1 hour ago".

What I do: call dateparser("1 hour ago") -> 14:18 -> trim all minutes -> 14:00.

Let's say 15 minutes pass. My use case is "nice". While the page says "1 hour ago", the site most likely still knows the exact time internally. So even at 15:59, it still says "1 hour ago", and then:

call dateparser("1 hour ago") -> 14:59 -> trim all minutes -> 14:00.

Then it is 16:02 and it says "2 hours ago":

call dateparser("2 hours ago") -> 14:02 -> trim all minutes -> 14:00.

So when the source uses relative times with good enough "precision", some techniques can be used to mitigate the problem.
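The "trim all minutes" step in the walkthrough can be reproduced with plain datetime arithmetic. This is a minimal sketch with a few assumptions. The date 2024-01-22 is made up. The relative strings are represented directly as timedeltas, standing in for what dateparser would return. floor_to_hour is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def floor_to_hour(moment: datetime) -> datetime:
    """Trim minutes, seconds and microseconds, keeping only the whole hour."""
    return moment.replace(minute=0, second=0, microsecond=0)


# (time of the fetch, age shown on the page) for the three steps above
observations = [
    (datetime(2024, 1, 22, 15, 18), timedelta(hours=1)),  # "1 hour ago"
    (datetime(2024, 1, 22, 15, 59), timedelta(hours=1)),  # "1 hour ago"
    (datetime(2024, 1, 22, 16, 2), timedelta(hours=2)),   # "2 hours ago"
]

stable_dates = []
for fetched_at, age in observations:
    resolved = fetched_at - age
    stable_dates.append(floor_to_hour(resolved))
    print(resolved.strftime("%H:%M"), "->", floor_to_hour(resolved).strftime("%H:%M"))
# 14:18 -> 14:00
# 14:59 -> 14:00
# 14:02 -> 14:00
```

All three fetches yield the same stored date, so the article never looks "updated". This only works because the source switches from "1 hour ago" to "2 hours ago" at the right moment.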

But I agree this is only a specific use case.

I pushed cda0f79. Can you wait for the build (or build it yourself) and test? Articles should now NOT be re-marked as unread when they are updated. Their existing read state is honored instead.
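The behavior change in cda0f79 can be pictured as a merge rule. This is a hypothetical Python sketch of the idea, not the code in that commit. The StoredArticle type and the apply_update helper are invented for illustration. The only point the sketch encodes is that an update keeps the existing read flag instead of resetting it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StoredArticle:
    title: str
    contents: str
    is_read: bool


def apply_update(stored: StoredArticle, incoming: StoredArticle) -> StoredArticle:
    """Take the new title/contents from the feed, but honor the read state
    the user already has instead of re-marking the article as unread."""
    return replace(incoming, is_read=stored.is_read)


stored = StoredArticle(title="Old title", contents="Body", is_read=True)
incoming = StoredArticle(title="Edited title", contents="Body", is_read=False)

merged = apply_update(stored, incoming)
print(merged.title, merged.is_read)  # Edited title True
```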

@Owyn
Contributor Author

Owyn commented Jan 22, 2024

I have tested it. Changing the date now doesn't seem to re-mark items as unread, or to change their date at all.

But changing titles is not recognized as a change at all. Instead, articles with changed titles are treated as brand-new articles.

[screenshot]

@martinrotter
Owner

Does that particular article have its own "ID" provided by the feed or script?

@Owyn
Contributor Author

Owyn commented Jan 22, 2024

> Does that particular article have its own "ID" provided by the feed or script?

It's a feed generated by my CSS2RSS scraper, so I guess there's no ID.

@martinrotter
Owner

martinrotter commented Jan 23, 2024

RSS Guard has very robust logic for determining the "uniqueness" of an article, i.e. finding out whether some version of the article is already stored in the DB.

Most feeds provide a "guid" for each of their entries. If the feed provides such an ID, it is used as the only key for identifying articles. In scripts, it is good practice to use the article URL as the "guid". If there is no GUID, the author/title/URL triplet is used to identify messages. All three fields must match for a message to be considered "the one".

Check lines 1574-1600 of this file.
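The identification rule can be sketched as a key function. This is a simplified Python model, not RSS Guard's actual database code. The FeedItem type and the function names are hypothetical, and treating an empty guid as missing is an assumption. In the example, the guid-bearing items use the URL as their guid, following the practice recommended above for scripts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeedItem:
    guid: Optional[str]
    author: str
    title: str
    url: str


def identity_key(item: FeedItem) -> tuple:
    """Key used to find an already-stored version of an article.

    With a guid, the guid alone identifies the article; without one,
    author, title and URL must all match.
    """
    if item.guid:  # assumption: an empty string counts as "no guid"
        return ("guid", item.guid)
    return ("triplet", item.author, item.title, item.url)


def is_same_article(a: FeedItem, b: FeedItem) -> bool:
    return identity_key(a) == identity_key(b)


url = "https://example.com/post/1"
no_guid_old = FeedItem(None, "Owyn", "Old title", url)
no_guid_new = FeedItem(None, "Owyn", "New title", url)
with_guid_old = FeedItem(url, "Owyn", "Old title", url)
with_guid_new = FeedItem(url, "Owyn", "New title", url)

print(is_same_article(no_guid_old, no_guid_new))      # False -> stored as a new article
print(is_same_article(with_guid_old, with_guid_new))  # True  -> existing article updated
```

This mirrors the behavior reported in the thread. Without a guid, a title change produces a new article. With a guid, the same article is updated.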

@martinrotter
Owner

Addendum: I just tested article "updating" with a local file feed, and everything works as expected. When only the date/time is updated, the change is visible in RSS Guard (regardless of whether the article has a guid) and no duplicate article is created.

If I change the title of an article that has no guid, then yes, a new article is created in RSS Guard. This is expected behavior at this point.

If the article has a guid, I tested changing the date/time, title, and author. All changes are visible in RSS Guard, and, correctly, no duplicate article is created.

@martinrotter
Owner

Closing this for now, as the "re-marking as unread" problem itself is now solved. For other issues, create separate tickets. :)

@martinrotter martinrotter added this to the 4.6.4 milestone Jan 23, 2024
2 participants