DataOutputAgent limits events after ordering #1444
If a new Agent is linked to this DataOutputAgent, its older events won't show up because of |
What if we decoupled the data output agent from the prior agent's events? On Sat, Apr 23, 2016, 1:45 PM Andrew Cantino notifications@github.com
@cantino That's correct. You could change `events_order` or `events_to_show` to reset `last_event_id`, though.
@brianpetro Could you elaborate? Is it about making DataOutputAgent create events?
@knu creating events or even using memory. The primary reason I say this is because I occasionally have to delete data that is showing up in the data output agent. It probably wouldn't be clear to a newbie that to do this you must dig into the events of the prior agent. Something like this might enable more sorting strategies too, but I haven't thought that through.
I'm not sure if it's related to the original problem. Could be another issue?
@knu probably. I'm still getting the hang of where to add my $0.02.
This should fix #1044.
In the first run, it now checks `2 * events_to_show` events for each source. It also fixes the problem where older events were selected when `events_order` is not specified: in that case events are now sorted by `id`.
`events_order` determines how to select events for outputting, whereas `events_list_order` determines how to list selected events in the output.
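The difference between selecting and listing can be sketched in plain Ruby. This is only an illustration, not the agent's code: `Event`, `select_and_list`, `BY_SCORE`, and `BY_ID` are hypothetical names, and the listing step assumes the "latest first" default mentioned later in this thread.

```ruby
# Hypothetical stand-in for Huginn events; only an id and a payload score are modeled.
Event = Struct.new(:id, :score)

# Keep the `events_to_show` events that rank last under the selection key
# (the role `events_order` plays), then arrange only those survivors for
# output, highest first (the role `events_list_order` plays).
def select_and_list(events, events_to_show:, select_key:, list_key:)
  selected = events.sort_by(&select_key).last(events_to_show)
  selected.sort_by(&list_key).reverse
end

EVENTS = [
  Event.new(1, 50), Event.new(2, 10), Event.new(3, 40), Event.new(4, 20)
].freeze

BY_SCORE = ->(event) { event.score }
BY_ID    = ->(event) { event.id }

# Select the two highest-scoring events, then list them by id, newest first.
LISTED_IDS = select_and_list(
  EVENTS, events_to_show: 2, select_key: BY_SCORE, list_key: BY_ID
).map(&:id)
```

Changing `select_key` changes which events survive, whereas changing `list_key` only rearranges the survivors.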
b298612 to 978ec06
I renamed |
To explain the implementation, DataOutputAgent keeps a pool of events with the highest scores defined by |
unless new_events.empty?
  memory[:last_event_id] = new_events.last.id
  events.concat(new_events)
Is it possible for there to be duplicates in here from `new_events + events`, or is `new_events` guaranteed not to contain any of `events`?
Yes, the latter, exactly. This is the only place `memory[:last_event_id]` is set to a non-nil value, and it will be greater than or equal to any value in `memory[:event_ids]`. `new_events` are selected by `id > memory[:last_event_id]` (see above) if `memory[:last_event_id]` is non-nil, so there will be no duplicates between `events` and `new_events`.
If `memory[:last_event_id]` is nil, `memory[:event_ids]` is assured to be nil or empty and therefore `events` is empty, so there will be no duplicates.
I meant it is guaranteed there'd be no duplicates between `new_events` and `events`.
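The no-duplicates argument in this thread can be checked with a small stand-alone sketch. It is not the agent's implementation: `poll_new_ids` is a hypothetical helper, events are reduced to bare integer ids, and pruning the pool down to `events_to_show` entries is left out.

```ruby
# Hypothetical helper. `received` stands in for every event id the agent has
# received so far; `memory` mimics the agent's memory hash.
def poll_new_ids(memory, received)
  last_id = memory[:last_event_id]
  # Only ids strictly greater than the stored high-water mark count as new.
  candidates = last_id ? received.select { |id| id > last_id } : received
  new_ids = candidates.sort
  unless new_ids.empty?
    # The only place the high-water mark advances, as in the reviewed diff.
    memory[:last_event_id] = new_ids.last
    memory[:event_ids] = (memory[:event_ids] || []) + new_ids
  end
  new_ids
end

MEMORY = {}
FIRST_RUN  = poll_new_ids(MEMORY, [3, 1, 2])       # nothing stored yet
SECOND_RUN = poll_new_ids(MEMORY, [1, 2, 3, 5, 4]) # ids 1..3 were already seen
```

Because `memory[:last_event_id]` is never smaller than any stored id, the `id > last_id` filter cannot return an id that is already in `memory[:event_ids]`.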
DataOutputAgent will select the last `events_to_show` entries of its received events sorted in the order specified by `events_order`, which is defaulted to the event creation time.
So, if you have multiple source agents that may create many events in a run, you may want to either increase `events_to_show` to have a larger "window", or specify the `events_order` option to an appropriate value (like `date_published`) so events from various sources are properly mixed in the resulted feed.

There is also an option `events_list_order` to control the order of events listed in the output, with the same format as `events_order`. It is defaulted to `#{Utils.jsonify(DEFAULT_EVENTS_ORDER['events_list_order'])}` so the latest entry is listed first.
Maybe "a legacy option `events_list_order`", or do you think people will keep using it?
Or "There is also an option `events_list_order` that only controls the order of events listed in the output, without attempting to maintain a total order of received events. It has the same format as `events_order`."
I'm having trouble verbalizing the difference between these two subtly-different behaviors.
Great! I've updated the paragraph.
When I do an |
Yes, that's how it works. An |
Ah, that makes sense. I agree that highest ID or highest date should be first in a feed. Do you have any concerns about merging this now, such as other testing or feedback we should get?
@cantino At least simple use cases are covered by the specs and the current implementation is kind of broken for the reporters anyway, so I'm going to merge this now and wait for their feedback.
👏
I found a bug in this implementation where I assumed |
Should be fixed in 12cecb8.
This addresses #1044.