Usage discussion: VersionStore vs TickStore, allowed options for VersionStore.write.. #473
Comments
I'll be working on our long-open issue #3 this weekend and over the holiday break, so look out for some upcoming docs. I'm going to mark this as a dupe of #3 and close it. |
Thanks for the quick response! Glad to hear there are more docs coming, looking forward to that! One more question... What would you recommend for storing streaming event data, where you may have more than one row for each timestamp (e.g. order data)? Is this use case supported by Arctic? |
What's the granularity of your timestamps? We use this library for tabular data here, so we can end up with relatively wide rows. You can append multiple times with the same timestamp if you like. We use VersionStore for minute-bar and lower-frequency data, appending rows across thousands of instruments every minute. I wouldn't worry too much about incrementing the version count; the library should look after pruning old versions for you. We use TickStore for streaming tick data (where we do our writes using a Java version of this API). |
Not exactly sure what you mean by granularity, but with timestamps at one-second resolution there could potentially be thousands of rows sharing a timestamp. Microsecond resolution should be more reasonable, but there is no guarantee of uniqueness.
I reached that conclusion on the basis of a really quick test in which I got an error trying to insert rows into a TickStore with default settings. I'd better take a closer look. |
He means how frequent the ticks are. |
This is for a stream of event data - there is no fixed frequency. The events have 'real' valued timestamps, although I'm afraid I couldn't say what the precision is. |
I think TickStore is a fine choice then, but it's better to batch the writes up using something like Kinesis/Kafka. I think @jamesblackburn has some slides, or a link to slides, where he describes how this is done internally. |
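The batching suggested above can be sketched without committing to Kafka or Kinesis: buffer incoming ticks and hand them to a sink once a row-count or age threshold is reached. This is a minimal sketch, not Arctic's own mechanism. `BatchingWriter`, `sink`, `max_rows` and `max_age_s` are hypothetical names; with Arctic the sink might wrap a call such as `library.write(symbol, batch)` on a TickStore library, which is an assumption to check against the library's docs.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Tick = Dict[str, Any]


class BatchingWriter:
    """Buffer ticks in memory and pass them to `sink` in batches.

    `sink` is a hypothetical callable that performs the actual write.
    """

    def __init__(self, sink: Callable[[List[Tick]], None],
                 max_rows: int = 1000, max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sink = sink
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._buffer: List[Tick] = []
        self._first_added: Optional[float] = None

    def add(self, tick: Tick) -> None:
        # Remember when the oldest buffered tick arrived.
        if not self._buffer:
            self._first_added = self._clock()
        self._buffer.append(tick)
        if self._should_flush():
            self.flush()

    def _should_flush(self) -> bool:
        # Flush when the batch is large enough or old enough.
        if len(self._buffer) >= self._max_rows:
            return True
        return self._clock() - self._first_added >= self._max_age_s

    def flush(self) -> None:
        if not self._buffer:
            return
        # Swap the buffer out before writing so new ticks start a fresh batch.
        batch, self._buffer = self._buffer, []
        self._first_added = None
        self._sink(batch)


# Usage with a list standing in for the database write.
batches: List[List[Tick]] = []
writer = BatchingWriter(batches.append, max_rows=3, max_age_s=3600.0)
for i in range(7):
    writer.add({"index": i, "price": 100.0 + i})
writer.flush()  # write whatever is left over
```

Note that the age check only runs when a new tick arrives, so a quiet stream would need a periodic call to `flush()` from a timer or scheduler.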
Thanks @bmoscon, it would be really useful to hear how you guys handle batching internally. Recently I've been worrying that because of the batching/buffering difficulties we might not be able to use Arctic after all. We don't really have a good way of batching writes prior to passing them to Arctic, and really want to treat it just like any database engine that will intelligently buffer writes. Which is a shame because it is otherwise so ideally suited to our purposes! Is the design of Arctic amenable to exposing the client as a lightweight service that can do buffering? There's no good way to do it purely from the client side. It seems to me that in the niche of lightweight timeseries databases, Arctic has little competition, and that some simple buffering would put it in solid competition with some of the much heavier solutions like OpenTSDB and Druid. I would be all too happy to help if it saves me from having to maintain an OpenTSDB deployment. |
That being said - version store will batch data and compress it at certain intervals so it may be fine for your use case |
VersionStore is pretty good up to minute-frequency data. It can stretch further depending on how many symbols you are writing simultaneously. The batching frequency is currently a hard-coded constant but can be changed to suit your use case. |
Thanks for the info. I have followed the same approach and am now using Kafka to buffer writes. Any chance you guys could comment on what best practices are for using Arctic in this way? What's the time complexity for appending to tick stores? |
@rueberger How did your implementation go using Kafka? Are you still using it? I'm attempting to capture order book data and I got a little concerned when I saw the comment above about
but I think this was only referring to VersionStore and not to TickStore. How does Arctic compare to, say, just dumping the tick stream to a Cassandra cluster, which is also fairly lightweight and has excellent write speeds? |
It's easy to set up something like Redis to batch the data for writes every minute. |
Presumably that would only work if your incoming stream only occasionally bursts beyond the DB write throughput. If it's constantly sending hundreds of rows a second, you'll eventually run out of memory in the Redis cache. |
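The memory concern can be made concrete with a little arithmetic: a buffer grows without bound only while rows arrive faster than they are written out and removed. The sketch below estimates how long a buffer would last in that case. All figures are hypothetical and `seconds_until_full` is an illustrative helper, not part of Redis or Arctic.

```python
import math


def seconds_until_full(in_rows_per_s: float, out_rows_per_s: float,
                       bytes_per_row: float, memory_bytes: float) -> float:
    """Seconds until the backlog fills `memory_bytes`, or inf if it never does."""
    backlog_growth = in_rows_per_s - out_rows_per_s  # net rows added per second
    if backlog_growth <= 0:
        # The consumer keeps up, so the backlog does not grow.
        return math.inf
    return memory_bytes / (backlog_growth * bytes_per_row)


# Hypothetical figures: 500 rows/s in, 400 rows/s drained, 200 bytes per row, 1 GiB.
falling_behind = seconds_until_full(500, 400, 200, 1024 ** 3)
keeping_up = seconds_until_full(300, 400, 200, 1024 ** 3)
print(f"falling behind: full after {falling_behind / 3600:.1f} hours")
print(f"keeping up: {keeping_up}")
```

With these made-up numbers the buffer would fill in roughly 15 hours; if the drain rate matches or exceeds the arrival rate, it never fills.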
I no longer use arctic at all - I built a lightweight timeseries store over mongo and do indeed use kafka to decouple data harvesting and insertion. Works great but it's a big bespoke solution so I'm not sure I can recommend that route unless you're doing exactly the same thing as me. |
That depends on what you're doing :) Sounds like we might be, though, given your initial comment about storing order data. |
I use redis to batch updates to arctic, it works just fine. You just need to periodically clean out the written data from redis. |
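The "write, then clean out" cycle described above can be sketched as follows. To stay self-contained, the example uses an in-memory stand-in rather than a Redis client. `InMemoryQueue`, `drain` and `sink` are hypothetical names; the comments note the Redis list commands (`RPUSH`, `LRANGE`, `LTRIM`) that each step roughly corresponds to, and a single consumer is assumed.

```python
from typing import Callable, List


class InMemoryQueue:
    """Stand-in for a Redis list used as a write buffer."""

    def __init__(self) -> None:
        self._items: List[str] = []

    def push(self, item: str) -> None:
        self._items.append(item)        # cf. RPUSH key item

    def peek(self, count: int) -> List[str]:
        return self._items[:count]      # cf. LRANGE key 0 count-1

    def drop(self, count: int) -> None:
        del self._items[:count]         # cf. LTRIM key count -1

    def __len__(self) -> int:
        return len(self._items)


def drain(queue: InMemoryQueue, sink: Callable[[List[str]], None],
          batch_size: int) -> int:
    """Write one batch via `sink`, then remove exactly the rows written."""
    batch = queue.peek(batch_size)
    if not batch:
        return 0
    # If the write raises, nothing is dropped and the rows are retried later.
    sink(batch)
    queue.drop(len(batch))
    return len(batch)


queue = InMemoryQueue()
for i in range(5):
    queue.push(f"row-{i}")

written: List[str] = []  # stands in for the database
first = drain(queue, written.extend, batch_size=3)
second = drain(queue, written.extend, batch_size=3)
third = drain(queue, written.extend, batch_size=3)
```

Dropping rows only after the write succeeds gives at-least-once delivery: a crash between the write and the drop would replay that batch, so the store has to tolerate duplicates.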
It might be worth having a look at Redis's new Streams functionality for producer/consumer usage (similar to using Kafka). I've started to use it to batch the data in preparation for dumping it to Arctic. |
First of all - my thanks to the maintainers. This library is exactly what I was looking for and looks very promising.
I've been having a bit of trouble figuring out how to optimally use `arctic` though. I've been following the examples in /howto which are... sparse. Is there somewhere else I might find examples or docs?
Now, some dumb questions about `VersionStore` and `TickStore`:
- [...] `VersionStore`, an entirely new version is created. Are finer-grained options for versioning available? For instance, I would like to write streaming updates to a single version, only incrementing the version when manually specified. I tried just passing `version=1` to `lib.write`, but this doesn't seem to be supported.
- [...] `VersionStore` vs `TickStore`? It's not clear to me what the differences are from the README or the code.
- [...] `TickStore` is recommended? Is there a reason one might want to use `VersionStore` for this?
- Is `TickStore` appropriate for data which may have more than one row for each timestamp (event data)? Nope, not allowed by `TickStore`.
Thanks in advance for your help and patience!