[FR]: SameId for article filters #509

Closed
sakkamade opened this issue Sep 28, 2021 · 23 comments
Labels
Component-Message-Filters Status-Fixed Ticket is resolved. Type-Enhancement This is request for brand new feature.

@sakkamade
Contributor

Brief description of the feature request

Having at last laid my hands on the scraping and filtering features (a truly amazing thing, btw), I found that an element crucial to me is likely missing.

It is also not documented here, so I suspect that it does not exist yet.

@sakkamade sakkamade added the Type-Enhancement This is request for brand new feature. label Sep 28, 2021
@sakkamade
Contributor Author

sakkamade commented Sep 28, 2021

Well, I guess in my case I can adapt it to the SameTitle as well.
But please do tell me if it's okay.

@martinrotter
Owner

It does not exist yet, but it can be safely added.

Well, I guess in my case I can adapt it to the SameTitle as well.
But please do tell me if it's okay.

Sure, you can do that; I will add SameId too. But you probably do not want SameId, you want SameCustomId.

id is RSS Guard's internal ID of the message, while customid is the ID of the message as provided by the feed/service.

https://github.com/martinrotter/rssguard/blob/master/resources/docs/Documentation.md#messageobject-class

?

@sakkamade
Contributor Author

while customid is the ID of the message as provided by the feed/service.

Yes, this one.

@martinrotter
Owner

b35c977

@martinrotter martinrotter added the Status-Fixed Ticket is resolved. label Sep 30, 2021
@sakkamade
Contributor Author

Thank you very much!

@sakkamade
Contributor Author

@martinrotter Does not seem to work?

@martinrotter
Owner

Will double check.

@martinrotter
Owner

I checked and made some changes, and it should work now. Sample log from the app:

time="    10.167" type="debug" -> Message custom ID: https://www.cnn.com/2021/07/11/us/eliot-middleton-mechanic-fix-donates-cars-bbq-trnd/index.html
time="    10.168" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="    10.168" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'https://www.cnn.com/2021/07/11/us/eliot-middleton-mechanic-fix-donates-cars-bbq-trnd/index.html' AND account_id = 1 AND feed = '2';'.

Sample filter:

function filterMessage() {
  console.log("Message custom ID: " + msg.customId);
  var isDupl = msg.isDuplicate(MessageObject.SameCustomId);
  return isDupl ? MessageObject.Ignore : MessageObject.Accept;
}
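
To try this decision logic outside RSS Guard, the script API can be stubbed. In the sketch below, the `MessageObject` constant values, the `makeMsg` helper and the `seen` set are hypothetical stand-ins, not RSS Guard's real implementation; only the names `customId`, `isDuplicate`, `SameCustomId`, `Accept` and `Ignore` come from the sample filter above. The message is passed in as a parameter here, whereas RSS Guard provides `msg` as a global.

```javascript
// Hypothetical stand-ins for the objects RSS Guard injects into a filter script.
const MessageObject = { Accept: "accept", Ignore: "ignore", SameCustomId: "customId" };

// Custom IDs that we pretend are already stored in the database.
const seen = new Set(["id-1"]);

// Builds a fake message whose isDuplicate() looks the custom ID up in `seen`.
function makeMsg(customId) {
  return {
    customId,
    isDuplicate(attribute) {
      return attribute === MessageObject.SameCustomId && seen.has(this.customId);
    },
  };
}

// Same decision as the sample filter, with the message passed in explicitly.
function filterMessage(msg) {
  return msg.isDuplicate(MessageObject.SameCustomId)
    ? MessageObject.Ignore
    : MessageObject.Accept;
}

console.log(filterMessage(makeMsg("id-1"))); // ignore
console.log(filterMessage(makeMsg("id-2"))); // accept
```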

@sakkamade
Contributor Author

Although my script does not look as neat, it does work with SameTitle, but not with the SameCustomId:

const conditions = [
  'title1',
  'title2'
];
function filterMessage() {
  if (msg.isDuplicateWithAttribute(MessageObject.SameCustomId)) {
    return MessageObject.Ignore;
  } 
  else if (conditions.some(el => msg.title.includes(el)) >= 1) {
    return MessageObject.Accept;
  } 
  else {
    return MessageObject.Ignore;
  }
}

Thoughts?

@martinrotter
Owner

martinrotter commented Oct 8, 2021

Let me test your script. Do you have any specific feed I can test with? (Not all feeds provide an "id" for their articles.)

@martinrotter
Owner

Btw, your usage of some() is not entirely correct, as some() returns a Boolean, not a number:

else if (conditions.some(el => msg.title.includes(el))) {
    return MessageObject.Accept;
  } 
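
A small illustration of the point above: `Array.prototype.some` returns a Boolean, so comparing its result with `>= 1` only works through type coercion (`true >= 1` is `true`, `false >= 1` is `false`). The keyword values and the `matchesAny` helper are made up for this sketch.

```javascript
// Hypothetical keyword list; replace with the titles you want to keep.
const conditions = ["title1", "title2"];

// Returns true when the title contains at least one of the keywords.
function matchesAny(title, keywords) {
  return keywords.some(el => title.includes(el));
}

console.log(typeof matchesAny("about title1", conditions)); // boolean
console.log(matchesAny("about title1", conditions));        // true
console.log(matchesAny("unrelated", conditions));           // false

// The original ">= 1" comparison gives the same answers only via coercion.
console.log(true >= 1, false >= 1); // true false
```

Dropping the comparison keeps the behaviour identical and makes the intent clearer.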

@sakkamade
Contributor Author

sakkamade commented Oct 8, 2021

Do you have any specific feed I can test with?

Well, I use it with a custom feed (and I know for certain that the ids remain constant), but it would take me some hours to rewrite it here now.
Only tomorrow, if you really need it.

Btw, your usage of some() is not entirely correct, as some() returns a Boolean,

Don't worry, my JS knowledge is awfully basic; that part was something I adopted from Stack Overflow (I mean I will forget it tomorrow, or after an hour 😄), but thank you.

@martinrotter
Owner

@sakkamade Just tested your script (second sync run, first sync downloaded articles, then I enabled the filter):

time="  1557.062" type="debug" -> core: Downloading URL 'https://github.com/martinrotter/rssguard/commits/master.atom' to obtain feed data.
time="  1557.063" type="debug" -> network: Settings of BaseNetworkAccessManager loaded.
time="  1557.146" type="debug" -> network: Destroying Downloader instance.
time="  1557.146" type="debug" -> network: Destroying SilentNetworkAccessManager instance.
time="  1557.168" type="debug" -> feed-downloader: Downloaded 20 messages for feed ID '6' URL: 'https://github.com/martinrotter/rssguard/commits/master.atom' title: 'Recent Commits to rssguard:master' in thread: '0x2054'. Operation took 105954 microseconds.
time="  1557.175" type="debug" -> feed-downloader: Setting up JS evaluation took 5029 microseconds.
time="  1557.175" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.175" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.176" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/a6ea4b08a958d8b0a90030273765d33f0efbcf51' AND account_id = 1 AND feed = '6';'.
time="  1557.176" type="debug" -> core: Message 'Update Documentation (#516) ' was identified as duplicate by filter script.
time="  1557.176" type="debug" -> feed-downloader: Running filter script, it took 1089 microseconds.
time="  1557.176" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.176" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.177" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/2f39114e8abd5b3f1aaee196019a74598360e21b' AND account_id = 1 AND feed = '6';'.
time="  1557.177" type="debug" -> core: Message 'maybe some fixups for filtering with custom id + separate widths stor… ' was identified as duplicate by filter script.
time="  1557.177" type="debug" -> feed-downloader: Running filter script, it took 854 microseconds.
time="  1557.177" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.177" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.178" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/ff228d79920f6362fce550463ef6b4f8a86dc110' AND account_id = 1 AND feed = '6';'.
time="  1557.178" type="debug" -> core: Message 'shorter method name ' was identified as duplicate by filter script.
time="  1557.178" type="debug" -> feed-downloader: Running filter script, it took 841 microseconds.
time="  1557.178" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.178" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.178" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/47c68677dd93f7097f25225482b1ff7931f26d72' AND account_id = 1 AND feed = '6';'.
time="  1557.178" type="debug" -> core: Message 'use same forward/back icons ' was identified as duplicate by filter script.
time="  1557.179" type="debug" -> feed-downloader: Running filter script, it took 839 microseconds.
time="  1557.179" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.179" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.179" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/31a959efe8fc4db87a7d132691e21b6e890cedc6' AND account_id = 1 AND feed = '6';'.
time="  1557.179" type="debug" -> core: Message 'use same forward/back icons ' was identified as duplicate by filter script.
time="  1557.179" type="debug" -> feed-downloader: Running filter script, it took 835 microseconds.
time="  1557.179" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.180" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.180" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/3da397a215939a37d8bc0f089a216c99ca36b894' AND account_id = 1 AND feed = '6';'.
time="  1557.180" type="debug" -> core: Message 'Add new skin (by sakkamade #512) ' was identified as duplicate by filter script.
time="  1557.180" type="debug" -> feed-downloader: Running filter script, it took 823 microseconds.
time="  1557.180" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.181" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.181" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/ee2e77bb667c3780cbbdefcc8fe17177d4dca823' AND account_id = 1 AND feed = '6';'.
time="  1557.181" type="debug" -> core: Message 'Update README ' was identified as duplicate by filter script.
time="  1557.181" type="debug" -> feed-downloader: Running filter script, it took 823 microseconds.
time="  1557.181" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.182" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.182" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/cfd7e3c9cdbc6a989e413f8fade09c8960cdb31c' AND account_id = 1 AND feed = '6';'.
time="  1557.182" type="debug" -> core: Message 'Merge branch 'master' of github.com:martinrotter/rssguard ' was identified as duplicate by filter script.
time="  1557.182" type="debug" -> feed-downloader: Running filter script, it took 826 microseconds.
time="  1557.182" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.183" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.183" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/2bae9108837b2e2a8a0a9afa34b90b8e3aba5b54' AND account_id = 1 AND feed = '6';'.
time="  1557.183" type="debug" -> core: Message 'allow optional style file for skins ' was identified as duplicate by filter script.
time="  1557.183" type="debug" -> feed-downloader: Running filter script, it took 925 microseconds.
time="  1557.183" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.184" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.184" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/3bfee2c8c81cc61df7273f12205515af749ebaaf' AND account_id = 1 AND feed = '6';'.
time="  1557.184" type="debug" -> core: Message 'allow optional style file for skins ' was identified as duplicate by filter script.
time="  1557.184" type="debug" -> feed-downloader: Running filter script, it took 903 microseconds.
time="  1557.184" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.185" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.185" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/276b731e7085f585da65e40b2087a8ef7742c602' AND account_id = 1 AND feed = '6';'.
time="  1557.185" type="debug" -> core: Message 'Update README.md ' was identified as duplicate by filter script.
time="  1557.185" type="debug" -> feed-downloader: Running filter script, it took 877 microseconds.
time="  1557.185" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.186" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.186" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/6e02f75b483219e4538e236f8c8ea1ced19e822e' AND account_id = 1 AND feed = '6';'.
time="  1557.186" type="debug" -> core: Message 'Update Documentation.md ' was identified as duplicate by filter script.
time="  1557.186" type="debug" -> feed-downloader: Running filter script, it took 864 microseconds.
time="  1557.186" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.186" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.187" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/d8248ec51b3905192173e8741098d9bd48306c9d' AND account_id = 1 AND feed = '6';'.
time="  1557.187" type="debug" -> core: Message 'Update README.md ' was identified as duplicate by filter script.
time="  1557.187" type="debug" -> feed-downloader: Running filter script, it took 828 microseconds.
time="  1557.187" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.187" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.188" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/c19469a9b0060fce3da1e919a42b4db587377d80' AND account_id = 1 AND feed = '6';'.
time="  1557.188" type="debug" -> core: Message 'ex bit ' was identified as duplicate by filter script.
time="  1557.188" type="debug" -> feed-downloader: Running filter script, it took 826 microseconds.
time="  1557.188" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.188" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.188" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/9619990e50d0166acac73dde9fc80447eb48c004' AND account_id = 1 AND feed = '6';'.
time="  1557.188" type="debug" -> core: Message 'ex bit ' was identified as duplicate by filter script.
time="  1557.189" type="debug" -> feed-downloader: Running filter script, it took 830 microseconds.
time="  1557.189" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.189" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.189" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/56e617dfa1fe97ddcc762063221d184bf035cba6' AND account_id = 1 AND feed = '6';'.
time="  1557.189" type="debug" -> core: Message 'fix plain skin ' was identified as duplicate by filter script.
time="  1557.190" type="debug" -> feed-downloader: Running filter script, it took 890 microseconds.
time="  1557.190" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.190" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.190" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/2a172bf3ef563a408696d1ac18bd6826588930c7' AND account_id = 1 AND feed = '6';'.
time="  1557.190" type="debug" -> core: Message 'better formatting of tooltip valies in article list ' was identified as duplicate by filter script.
time="  1557.190" type="debug" -> feed-downloader: Running filter script, it took 902 microseconds.
time="  1557.191" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.191" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.191" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/b35c9775459478397827c6d499c8b7979e7ad91c' AND account_id = 1 AND feed = '6';'.
time="  1557.191" type="debug" -> core: Message 'samecustomid ' was identified as duplicate by filter script.
time="  1557.191" type="debug" -> feed-downloader: Running filter script, it took 938 microseconds.
time="  1557.192" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.192" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.192" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/c44d46e15dfc3af21d64c6b54466760e98b17bf3' AND account_id = 1 AND feed = '6';'.
time="  1557.192" type="debug" -> core: Message 'tooltips for all cells in msg list ' was identified as duplicate by filter script.
time="  1557.192" type="debug" -> feed-downloader: Running filter script, it took 898 microseconds.
time="  1557.192" type="debug" -> feed-downloader: Hooking message took 5 microseconds.
time="  1557.193" type="debug" -> message-model: Prepared query for MSG duplicate identification is: 'SELECT COUNT(*) FROM Messages WHERE custom_id = :custom_id AND account_id = :account_id AND feed = :feed;'.
time="  1557.193" type="debug" -> database: Executed SQL for message duplicates check: 'SELECT COUNT(*) FROM Messages WHERE custom_id = 'tag:github.com,2008:Grit::Commit/93ea31107fada1f8c49d558775a8313859faffa5' AND account_id = 1 AND feed = '6';'.
time="  1557.193" type="debug" -> core: Message 'resizable all columns ' was identified as duplicate by filter script.
time="  1557.193" type="debug" -> feed-downloader: Running filter script, it took 876 microseconds.
time="  1557.194" type="debug" -> Destroying FilterUtils instance.
time="  1557.194" type="debug" -> feed-downloader: Saving messages of feed ID '6' URL: 'https://github.com/martinrotter/rssguard/commits/master.atom' title: 'Recent Commits to rssguard:master' in thread: '0x2054'.
time="  1557.194" type="debug" -> No messages to be updated/added in DB for feed '6'.

@sakkamade
Contributor Author

time=" 1557.194" type="debug" -> No messages to be updated/added in DB for feed '6'.

Oh, right. I forgot that it is a very specific filter for a very specific feed... Later then.

@martinrotter
Owner

No messages to be updated/added in DB for feed '6'.

This means that the filter rejected all messages, because the duplicate check by custom ID revealed that they are already in the DB.
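
The check described above can be sketched in plain JavaScript. This is only an in-memory imitation of the `SELECT COUNT(*) FROM Messages WHERE custom_id = ... AND account_id = ... AND feed = ...` query from the log; the `rows` array, the sample values and the `countDuplicates` helper are assumptions for illustration, and only the column names are taken from the logged SQL.

```javascript
// Pretend database content after the first sync (values are made up).
const rows = [
  { custom_id: "a", account_id: 1, feed: "6" },
  { custom_id: "b", account_id: 1, feed: "6" },
];

// Mirrors the WHERE clause from the log: all three columns must match.
function countDuplicates(db, msg) {
  return db.filter(r =>
    r.custom_id === msg.custom_id &&
    r.account_id === msg.account_id &&
    r.feed === msg.feed
  ).length;
}

// A second sync delivers one already stored article plus one new article.
const incoming = [
  { custom_id: "a", account_id: 1, feed: "6" },
  { custom_id: "c", account_id: 1, feed: "6" },
];

// A message is kept only when no stored row matches it.
const accepted = incoming.filter(m => countDuplicates(rows, m) === 0);
console.log(accepted.map(m => m.custom_id)); // [ 'c' ]
```

If every incoming message already has a matching row, `accepted` is empty, which corresponds to the "No messages to be updated/added" line in the log.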

@sakkamade
Contributor Author

I am almost certain, however, that the fault is not there, since with SameTitle it works flawlessly, and the id and title of the articles in that feed are identical.

@sakkamade
Contributor Author

sakkamade commented Oct 8, 2021

This means that the filter rejected all messages, because the duplicate check by custom ID revealed that they are already in the DB.

Yes and no. The date (content, author, etc.) has not changed either, so the articles might simply not be updated.

But in that feed the date of every article changes upon every fetch.

@martinrotter
Owner

You can see the actual SQL commands executed to determine the "duplicity" result; to me, the commands seem to be correct...

@sakkamade
Contributor Author

When I say it does not work for me, I mean it does not work for me, please believe me.
You have seen the script and said that there is no big issue with it, and I am quite confident about my feed. The rest goes tomorrow.

By the way, the last time I tested it was with this commit 2f39114.

@sakkamade
Contributor Author

sakkamade commented Oct 8, 2021

You know, it just occurred to me that this feed is RSS 2.0. I have not really dealt with that format, so I am not sure, but according to the first link I found, https://validator.w3.org/feed/docs/rss2.html#hrelementsOfLtitemgt, RSS 2.0 has a guid element instead of id... Is that correct?

EDIT: The second link says the same: https://www.w3schools.com/XML/xml_rss.asp

...

@sakkamade
Contributor Author

I apologise for taking your time.

@martinrotter
Owner

martinrotter commented Oct 8, 2021

@sakkamade
In the case of RSS 2.0, RSS Guard extracts "customID" from "guid", so yes, you are correct.

https://github.com/martinrotter/rssguard/blob/master/src/librssguard/services/standard/rssparser.cpp#L37
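
As a rough illustration of that mapping (not the actual C++ parser code linked above): RSS 2.0 items carry their identifier in `guid`, while Atom entries use `id`. The `customIdOf` helper and the sample entries are hypothetical, and returning an empty string when the element is missing is a choice made for this sketch, not necessarily what RSS Guard does.

```javascript
// Picks the feed-provided identifier depending on the feed format.
// "rss2" items use <guid>, "atom" entries use <id>.
function customIdOf(format, entry) {
  if (format === "rss2") return entry.guid ?? "";
  if (format === "atom") return entry.id ?? "";
  throw new Error("Unknown feed format: " + format);
}

// Made-up sample entries, already parsed into plain objects.
const rssItem = { title: "Example item", guid: "https://example.com/posts/1" };
const atomEntry = { title: "Example entry", id: "tag:example.com,2021:entry/1" };

console.log(customIdOf("rss2", rssItem));
console.log(customIdOf("atom", atomEntry));

// An RSS item that only has an "id" field gets no custom ID in this sketch,
// so a SameCustomId check would have nothing stable to compare.
console.log(customIdOf("rss2", { title: "No guid", id: "x" }) === "");
```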

@sakkamade
Contributor Author

Yes, of course that was "id"...
And yes, it works great now. Thank you!

so yes, you are correct.

Better late than never, I guess.
