RSS Feed Trigger node duplicates events if last entry date is not maximal #8856

Open
meownoid opened this issue Mar 10, 2024 · 13 comments

Comments

@meownoid
Contributor

meownoid commented Mar 10, 2024

Bug Description

The RSS Feed Trigger node stores the date of the last entry in the feed. For some feeds this date is not the latest date in the feed, which causes the same event to be re-triggered over and over.

For example, take the release feed of this repository: https://github.com/n8n-io/n8n/releases.atom. The date of the last entry (the first item listed in the feed) is 2024-03-07T09:50:47Z, while the date of the second entry is 2024-03-07T13:25:46Z. The date of the last entry is stored, and because the date of the second entry is later than that stored date, the second entry generates an event on every execution.
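The loop below is a minimal simulation of the behaviour described above, not n8n's actual node code. The `FeedItem` type, the `pollStoringFirstItemDate` function and the starting state are hypothetical; only the two timestamps come from the report. Each poll emits every item newer than the stored date and then stores the date of the first-listed item, assuming it is the newest.

```typescript
// Hypothetical item shape: only the fields this sketch needs.
interface FeedItem {
  title: string;
  isoDate: string;
}

// Feed order as described in the report: the first-listed entry is NOT the newest.
const feed: FeedItem[] = [
  { title: "first entry", isoDate: "2024-03-07T09:50:47Z" },
  { title: "second entry", isoDate: "2024-03-07T13:25:46Z" },
];

// Naive poll: emit items newer than the stored date, then store the date of
// the first-listed item on the assumption that it is the newest one.
function pollStoringFirstItemDate(
  items: FeedItem[],
  stored: string,
): { emitted: string[]; stored: string } {
  const storedMs = Date.parse(stored);
  const emitted = items
    .filter((item) => Date.parse(item.isoDate) > storedMs)
    .map((item) => item.title);
  return { emitted, stored: items.length > 0 ? items[0].isoDate : stored };
}

let state = "2024-03-07T00:00:00Z"; // hypothetical date left by an earlier poll
const runs: string[][] = [];
for (let run = 0; run < 3; run++) {
  const result = pollStoringFirstItemDate(feed, state);
  runs.push(result.emitted);
  state = result.stored;
}
console.log(runs);
```

With this feed order, the first run emits both entries and every later run emits the second entry again, because its date is always later than the stored one.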

To Reproduce

  1. Create an RSS Feed Trigger using any RSS feed where the date of the last entry is not the latest date in the feed (for example https://github.com/n8n-io/n8n/releases.atom at the time of writing)
  2. Start execution
  3. The same events are triggered over and over

Expected behavior

The maximum date in the feed is stored and the same events are not triggered again.

Operating System

Linux raspberry 6.1.0-rpi7-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.63-1+rpt1 (2023-11-24) aarch64 GNU/Linux

n8n Version

1.31.2

Node.js Version

21.7.0

Database

SQLite (default)

Execution mode

main (default)

@meownoid meownoid changed the title RSS Trigger Node stores date of the last entry instead of maximum date RSS Feed Trigger node duplicates events if last entry date is not maximal Mar 10, 2024
@Joffcom
Member

Joffcom commented Mar 10, 2024

Hey @meownoid,

I will take a look at this. It was changed intentionally, as the previous version caused a lot of events to be missed, so this was the best option.

We previously stored the current time and compared against that, but some feeds publish hourly.

@meownoid
Contributor Author

@Joffcom hey! Thanks for looking into this! If I'm not mistaken, storing the maximum date from the feed (as opposed to the date of the last entry or the date of execution) should solve the problem, but I might be missing some details here.

@Joffcom
Copy link
Member

Joffcom commented Mar 11, 2024

Hey @meownoid,

Normally the first entry in a feed should be the newest entry, so I am going to dig into why the GitHub feed is out of order and see if there is any preference in the RSS spec.
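One way to check whether a feed really lists its newest entry first is to compare each item's date with the one before it. This is a minimal sketch; `FeedItem` and `isNewestFirst` are hypothetical names, and ignoring items without a parseable date is an assumption of the sketch, not something the RSS spec prescribes.

```typescript
// Hypothetical item shape: the date may be missing.
interface FeedItem {
  isoDate?: string;
}

// True when no item is dated later than the item listed before it.
// Items without a parseable date are skipped.
function isNewestFirst(items: FeedItem[]): boolean {
  const times = items
    .map((item) => (item.isoDate !== undefined ? Date.parse(item.isoDate) : NaN))
    .filter((t) => !Number.isNaN(t));
  for (let k = 1; k < times.length; k++) {
    if (times[k] > times[k - 1]) return false;
  }
  return true;
}

// The two GitHub entries from the original report, in feed order.
const githubOrder: FeedItem[] = [
  { isoDate: "2024-03-07T09:50:47Z" },
  { isoDate: "2024-03-07T13:25:46Z" },
];
console.log(isNewestFirst(githubOrder)); // false
```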

@yrmuq

yrmuq commented Mar 13, 2024

Same problem, a lot of duplicate events. I have temporarily disabled the problematic feeds.

@Joffcom
Member

Joffcom commented Mar 14, 2024

@yrmuq which feed are you using? Typically I would expect the first item in the feed to be the newest which is the case for most feeds but there appears to be a couple that don't want to operate that way so I will need to put in a fix for that.

@yrmuq
Copy link

yrmuq commented Mar 14, 2024

@Joffcom most of the duplicates come from this feed. Every x minutes I get the same old duplicates again:
https://github.com/arichornlover/YouTubeRebornPlus/releases.atom

@Joffcom
Member

Joffcom commented Mar 15, 2024

@yrmuq interesting, it is another GitHub atom feed 🤔

@yrmuq
Copy link

yrmuq commented Mar 15, 2024

@Joffcom even more interesting, it sends duplicates after it's been deactivated 😃

@Joffcom
Copy link
Member

Joffcom commented Mar 15, 2024

@yrmuq that doesn't sound right. Are you running in queue mode? It is a polling-based workflow, so if it is still running after being disabled, there is either a core bug or a configuration issue, such as having two main instances running.

@yrmuq

yrmuq commented Mar 17, 2024

@Joffcom don't worry, it was just a bug: after deleting and re-adding the RSS node and restarting, it now deactivates normally :)
But what can we do with GitHub atom feed?

@Joffcom
Copy link
Member

Joffcom commented Mar 21, 2024

Hey @yrmuq,

I think the best option would be to update the node to loop over all of the items to work out what the newest is rather than trusting that the order will be correct.
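The approach suggested above, scanning every item for the newest date instead of trusting the first one, can be sketched as follows. This is a hypothetical illustration, not n8n's actual implementation; `FeedItem`, `pollStoringMaxDate` and the starting state are made up for the example, and the feed reuses the two timestamps from the original report.

```typescript
// Hypothetical item shape: only the fields this sketch needs.
interface FeedItem {
  title: string;
  isoDate: string;
}

interface PollResult {
  emitted: string[]; // titles of items newer than the stored date
  lastItemDate: string; // maximum date seen, stored for the next poll
}

// Scan every item instead of trusting items[0] to be the newest.
function pollStoringMaxDate(items: FeedItem[], lastItemDate: string): PollResult {
  const storedMs = Date.parse(lastItemDate);
  let maxMs = storedMs;
  let maxIso = lastItemDate;
  const emitted: string[] = [];
  for (const item of items) {
    const t = Date.parse(item.isoDate);
    if (t > storedMs) emitted.push(item.title);
    if (t > maxMs) {
      // Track the newest date wherever it appears in the feed.
      maxMs = t;
      maxIso = item.isoDate;
    }
  }
  return { emitted, lastItemDate: maxIso };
}

// Same out-of-order feed as in the original report.
const feed: FeedItem[] = [
  { title: "first entry", isoDate: "2024-03-07T09:50:47Z" },
  { title: "second entry", isoDate: "2024-03-07T13:25:46Z" },
];

let state = "2024-03-07T00:00:00Z"; // hypothetical date left by an earlier poll
const runs: string[][] = [];
for (let run = 0; run < 3; run++) {
  const result = pollStoringMaxDate(feed, state);
  runs.push(result.emitted);
  state = result.lastItemDate;
}
```

Here only the first run emits anything; after it, the stored date is 2024-03-07T13:25:46Z and neither entry is newer. Because the stored date only ever moves forward, like any purely date-based check this still skips an item that appears in the feed later while carrying a date older than the stored maximum.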

@meownoid
Copy link
Contributor Author

meownoid commented Mar 21, 2024

Now another RSS feed gives me duplicate events, this time not GitHub-related: https://www.ableton.com/en/blog/feeds/latest/.

Is it in the RSS specification that events must be ordered? If yes, then I'm surprised tools that generate those feeds do not adhere to the standard. If not, then I don't understand why n8n makes such an assumption.

@Joffcom
Copy link
Member

Joffcom commented Mar 25, 2024

Hey @meownoid,

The RSS spec itself doesn't require items to be in order, and to make things more interesting, pubDate is also optional, which means a site could in theory publish an RSS feed with no order and no dates, which would cause even more problems.

We assume that items will be in the correct order, as that is what most implementations do, but as we can see here that isn't always the case, so we will need to change how we handle this.
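Because `pubDate` is optional, working out the newest date has to tolerate items with no date or an unparseable one. This is a minimal sketch, assuming items expose the raw `pubDate` string; `FeedItem` and `latestDate` are hypothetical names. It returns `undefined` when no item has a usable date, which is exactly the case where a date-based check has nothing to compare against. Note that `Date.parse` is only guaranteed to handle ISO 8601 and its own `toString`/`toUTCString` output; other RFC 822 variants are implementation-dependent.

```typescript
// Hypothetical item shape: in RSS 2.0 every item element is optional.
interface FeedItem {
  title?: string;
  pubDate?: string;
}

// Latest parseable date in the feed, or undefined if no item has one.
function latestDate(items: FeedItem[]): Date | undefined {
  let best: number | undefined;
  for (const item of items) {
    if (item.pubDate === undefined) continue; // no date at all
    const t = Date.parse(item.pubDate);
    if (Number.isNaN(t)) continue; // date present but unparseable
    if (best === undefined || t > best) best = t;
  }
  return best === undefined ? undefined : new Date(best);
}
```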

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants