Series and Frames for real-time streaming data #51
What would be the right way to use Series in a real-time environment where new data arrive asynchronously?
I have found a question (and probably a part of an answer) that describes exactly the idea. http://stackoverflow.com/questions/17941932/f-immutable-data-structures-for-high-frequency-real-time-streaming-data
The answers on SO suggest using
For streaming data we need to append new value(s) to an existing series and then use the resulting series. With the current array-based implementation, that requires copying the whole old array into a new, resized array. In the first question the author mentions 5 million data points per instrument per day (let's assume an 8-byte double plus 8 bytes for the DateTime), or around 80 MB per instrument. With, e.g., 100 instruments, copying all arrays many times per second is probably not the best option.
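To make the numbers above concrete, here is a rough back-of-the-envelope sketch in Python (not Deedle code). It only restates the assumptions from this post: 16 bytes per point, 5 million points per instrument per day, 100 instruments. The helper `bytes_copied_by_naive_appends` is a hypothetical name. It models the worst case, where every single append reallocates and copies the whole array.

```python
# Assumptions taken from the post: 8-byte double value + 8-byte DateTime key.
BYTES_PER_POINT = 8 + 8
POINTS_PER_DAY = 5_000_000   # per instrument, per the linked question
INSTRUMENTS = 100

per_instrument_bytes = BYTES_PER_POINT * POINTS_PER_DAY
total_bytes = per_instrument_bytes * INSTRUMENTS


def bytes_copied_by_naive_appends(n_points, bytes_per_point=BYTES_PER_POINT):
    """Total bytes copied if each append copies every existing point.

    The k-th append copies the k-1 points already stored, so the total
    is n * (n - 1) / 2 points, i.e. quadratic in the series length.
    """
    return bytes_per_point * n_points * (n_points - 1) // 2


print(f"per instrument: {per_instrument_bytes / 1e6:.0f} MB")
print(f"all instruments: {total_bytes / 1e9:.0f} GB")
print(f"copied after 1,000 naive appends: {bytes_copied_by_naive_appends(1_000)} bytes")
```

The point is only that the total copying cost grows quadratically with the number of appends, which is why a full copy per update looks unattractive at this scale.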
Simplest use case
Will the current implementation be suitable for such a workflow, with hundreds of instruments, multiple calculated values for each one, and sub-second update frequency?
Will an implementation of Deedle's
I would love to have Deedle's abstraction and API for such a use case!
P.S. An abstraction of the workflow: if
My guess is that the current implementation would not be suitable - but please feel free to try it!
As you say, copying everything doesn't make sense for this application. Ideally, you would want changes to propagate down the chain of operations and do calculations incrementally, which would require different abstractions.
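Purely as an illustration of what "propagating changes down the chain" could look like, here is a minimal Python sketch. It is not Deedle's API and not a description of any existing implementation. `StreamingSeries` and `IncrementalMean` are hypothetical names invented for this example. The idea is that an append pushes only the new point to downstream calculations, and each calculation updates its state in constant time instead of recomputing over the whole series.

```python
class IncrementalMean:
    """Running mean that is updated in O(1) per new observation."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def on_next(self, key, value):
        # Only the new value is touched; no pass over history is needed.
        self.count += 1
        self.total += value
        return self.total / self.count


class StreamingSeries:
    """Append-only series that notifies subscribers of each new point."""

    def __init__(self):
        self.keys = []
        self.values = []
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def append(self, key, value):
        # Python lists grow in amortized O(1), so nothing is copied per append.
        self.keys.append(key)
        self.values.append(value)
        for callback in self._subscribers:
            callback(key, value)


# Usage: a derived "running mean" series kept up to date incrementally.
prices = StreamingSeries()
mean = IncrementalMean()
running_mean = []
prices.subscribe(lambda k, v: running_mean.append((k, mean.on_next(k, v))))

for key, value in [(1, 10.0), (2, 20.0), (3, 30.0)]:
    prices.append(key, value)

print(running_mean)
```

A real design would also need to handle out-of-order keys, windowed operations, and joins between series, which is where different abstractions would come in.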
We're working on some real-time stuff at BlueMountain and going about it quite a different way.