Incomplete implementation on streams/fixed-time-window #259

Closed
robkuz opened this Issue Sep 5, 2013 · 4 comments

Comments

Projects
None yet
2 participants
@robkuz

robkuz commented Sep 5, 2013

Hi,

I think there is a serious bug on the implementation of fixed-time window.
As far as I understand the implementation relies on the delivery of events in order to reset the time window to an empty state. (starting at line 332
true

    true
    (let [delta (- (:time event) @start-time)
        dstart (- delta (mod delta n))
        empties (dec (/ dstart n))
        windows (conj (repeat empties []) @buffer)]
        (alter start-time + dstart)
        (ref-set buffer [event])
        windows)))]

Now imagine the following application of the function

(fixed-time-window 30
     (smap 
          (fn [events] (check-status-if-critical-send-alert events)))

Now lets assume we have a service that is sending events that are critical with the start (just by chance with the reset) of the window. Now that service is sending these critical events for 29 seconds and then the service crashes (and obviously doesnt send any events anymore).
Our function will never be called and can't notify us of the fact that there has been (and still is) a critical condition.
So if we have no notification setup on expired streams we will never know about this critical condition.

I hope my description is understandable.

ciao robertj

ps: Is there any particular reason why this hasnt been implemented with timers?

@aphyr

This comment has been minimized.

Show comment
Hide comment
@aphyr

aphyr Sep 5, 2013

Collaborator

Complexity, performance, and correctness. All four windowing streams are pure functions in this way. If you'd like to take a stab at introducing expiry-safe timeouts into the mix, be my guest.

--Kyle

robertj notifications@github.com wrote:

Hi,

I think there is a serious bug on the implementation of fixed-time window.
As far as I understand the implementation relies on the delivery of events in order to reset the time window to an empty state. (starting at line 332
true

true (let [delta (- (:time event) @start-time) dstart (- delta (mod delta n)) empties (dec (/ dstart n)) windows (conj (repeat empties []) @buffer)](alter start-time + dstart) (ref-set buffer [event]) windows)))]

Now imagine the following application of the function

(fixed-time-window 30 (smap (fn [events](check-status-if-critical-send-alert events)))

Now lets assume we have a service that is sending events that are critical with the start (just by chance with the reset) of the window. Now that service is sending these critical events for 29 seconds and then the service crashes (and obviously doesnt send any events anymore).
Our function will never be called and can't notify us of the fact that there has been (and still is) a critical condition.
So if we have no notification setup on expired streams we will never know about this critical condition.

I hope my description is understandable.

ciao robertj

ps: Is there any particular reason why this hasnt been implemented with timers?


Reply to this email directly or view it on GitHub.

Collaborator

aphyr commented Sep 5, 2013

Complexity, performance, and correctness. All four windowing streams are pure functions in this way. If you'd like to take a stab at introducing expiry-safe timeouts into the mix, be my guest.

--Kyle

robertj notifications@github.com wrote:

Hi,

I think there is a serious bug on the implementation of fixed-time window.
As far as I understand the implementation relies on the delivery of events in order to reset the time window to an empty state. (starting at line 332
true

true (let [delta (- (:time event) @start-time) dstart (- delta (mod delta n)) empties (dec (/ dstart n)) windows (conj (repeat empties []) @buffer)](alter start-time + dstart) (ref-set buffer [event]) windows)))]

Now imagine the following application of the function

(fixed-time-window 30 (smap (fn [events](check-status-if-critical-send-alert events)))

Now lets assume we have a service that is sending events that are critical with the start (just by chance with the reset) of the window. Now that service is sending these critical events for 29 seconds and then the service crashes (and obviously doesnt send any events anymore).
Our function will never be called and can't notify us of the fact that there has been (and still is) a critical condition.
So if we have no notification setup on expired streams we will never know about this critical condition.

I hope my description is understandable.

ciao robertj

ps: Is there any particular reason why this hasnt been implemented with timers?


Reply to this email directly or view it on GitHub.

@robkuz

This comment has been minimized.

Show comment
Hide comment
@robkuz

robkuz Sep 5, 2013

;-)
Is the order of complexity, performance, correctness the development philosophy you apply when developing in Riemann?

If so then of course my idea will make the system more complex and its certainly not pure in regards to FP.
If your arguments are more geared towards the aspect of "performance" would you mind to explain where you see problems? This will be certainly helpful when I try to develop a more safer method.

As for the aspect of "correctness" I would at least suggest that this very important aspect of the implementation is documented - I was just bitten by it ;-)

robkuz commented Sep 5, 2013

;-)
Is the order of complexity, performance, correctness the development philosophy you apply when developing in Riemann?

If so then of course my idea will make the system more complex and its certainly not pure in regards to FP.
If your arguments are more geared towards the aspect of "performance" would you mind to explain where you see problems? This will be certainly helpful when I try to develop a more safer method.

As for the aspect of "correctness" I would at least suggest that this very important aspect of the implementation is documented - I was just bitten by it ;-)

@aphyr

This comment has been minimized.

Show comment
Hide comment
@aphyr

aphyr Sep 5, 2013

Collaborator

Different parts of Riemann treat time differently. It turns out that writing concurrent data structures for real-time analytics over possibly unordered streams of events is a nontrivial problem. I'm still not sure what correctness is for these types of streams, nor am I confident in my ability to write their implementations correctly. That, and frankly, there are only so many hours in my life that I can pour into the software. Stuff's not perfect yet. Sorry.

robertj notifications@github.com wrote:

;-)
Is the order of complexity, performance, correctness the development philosophy you apply when developing in Riemann?

If so then of course my idea will make the system more complex and its certainly not pure in regards to FP.
If your arguments are more geared towards the aspect of "performance" would you mind to explain where you see problems? This will be certainly helpful when I try to develop a more safer method.

As for the aspect of "correctness" I would at least suggest that this very important aspect of the implementation is documented - I was just bitten by it ;-)


Reply to this email directly or view it on GitHub.

Collaborator

aphyr commented Sep 5, 2013

Different parts of Riemann treat time differently. It turns out that writing concurrent data structures for real-time analytics over possibly unordered streams of events is a nontrivial problem. I'm still not sure what correctness is for these types of streams, nor am I confident in my ability to write their implementations correctly. That, and frankly, there are only so many hours in my life that I can pour into the software. Stuff's not perfect yet. Sorry.

robertj notifications@github.com wrote:

;-)
Is the order of complexity, performance, correctness the development philosophy you apply when developing in Riemann?

If so then of course my idea will make the system more complex and its certainly not pure in regards to FP.
If your arguments are more geared towards the aspect of "performance" would you mind to explain where you see problems? This will be certainly helpful when I try to develop a more safer method.

As for the aspect of "correctness" I would at least suggest that this very important aspect of the implementation is documented - I was just bitten by it ;-)


Reply to this email directly or view it on GitHub.

@aphyr

This comment has been minimized.

Show comment
Hide comment
@aphyr

aphyr Jan 14, 2014

Collaborator

Closing this since this behavior is a part of the windowing stream spec, but open to writing alternate implementations as well.

Collaborator

aphyr commented Jan 14, 2014

Closing this since this behavior is a part of the windowing stream spec, but open to writing alternate implementations as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment