
Usage discussion: VersionStore vs TickStore, allowed options for VersionStore.write.. #473

Closed
rueberger opened this issue Dec 20, 2017 · 19 comments


@rueberger

rueberger commented Dec 20, 2017

First of all - my thanks to the maintainers. This library is exactly what I was looking for and looks very promising.

I've been having a bit of trouble figuring out how to use arctic optimally, though. I've been following the examples in /howto, which are... sparse. Is there somewhere else I might find examples or docs?

Now, some dumb questions about VersionStore and TickStore:

  • I've noticed that every time I write to a VersionStore, an entirely new version is created. Are finer-grained options for versioning available? For instance, I would like to write streaming updates to a single version, only incrementing the version when manually specified. I tried just passing version=1 to lib.write, but this doesn't seem to be supported.
  • In what scenarios might one want to use VersionStore vs TickStore? It's not clear to me what the differences are from the README or the code.
  • My current use case is primarily as a database for streams - is TickStore recommended for this use case? Is there a reason one might want to use VersionStore for this?
  • Is TickStore appropriate for data which may have more than one row for each timestamp (event data)? Nope, not allowed by TickStore

Thanks in advance for your help and patience!

@bmoscon
Collaborator

bmoscon commented Dec 20, 2017

  1. I don't think you can avoid writing a new version each time. You can tell it to remove old versions with prune_previous_version=True on writes (see the sketch after this list).

  2. tickstore is for constant streams of data; version store is for working with data (i.e. playing around with it). It keeps versions so you can 'undo' changes and keep track of updates over time. It sounds like you'd want to use tickstore.

  3. Tickstore is very limited (i.e. no strings, querying by date range is very specific, etc.), and it is only fast and efficient if you do the "bulking" yourself. If you append an update every time you get one, performance will be very bad. You need to cache updates and write them at an interval that makes sense for your retrieval. Version Store will internally cache updates in mongo and, when they are large enough, compress them and re-write the symbol's data.

  4. I thought tickstore supported multi indexes, but maybe not? (I don't use it much.) That being said, you can also write a new symbol for new data with overlapping timestamps.
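
To make point 1 concrete, a minimal sketch (host, library, and symbol names here are made up):

```python
# Minimal sketch: each write() still creates a new version, but
# prune_previous_version asks VersionStore to discard superseded
# versions as it goes. 'user.demo' and 'SYM' are placeholder names.
import pandas as pd
from arctic import Arctic, VERSION_STORE

store = Arctic('localhost')
store.initialize_library('user.demo', lib_type=VERSION_STORE)
lib = store['user.demo']

df = pd.DataFrame({'price': [1.0, 2.0]},
                  index=pd.date_range('2017-12-20', periods=2, freq='T'))
lib.write('SYM', df, prune_previous_version=True)
```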

I'll be working on our long-open issue #3 this weekend and over the holiday break, so look for some docs in the upcoming docs/ folder very soon...

I'm going to mark this as a dupe of #3 and close it

@bmoscon bmoscon closed this as completed Dec 20, 2017
@rueberger
Author

Thanks for the quick response!

Glad to hear there are more docs coming; looking forward to that!

One more question...

What would you recommend for storing streaming event data, where you may have more than one row for each timestamp (e.g. order data)? Is this use case supported by arctic?

@jamesblackburn
Contributor

What's the granularity of your timestamps? We use this library for tabular data here, so we can end up with relatively wide rows. You can append multiple times with the same timestamp if you like.
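
For example, something along these lines should work against a VersionStore library (library and symbol names below are illustrative):

```python
# Sketch: appending two rows that share one timestamp.
# 'user.events' and 'ORDERS' are made-up names.
import pandas as pd
from arctic import Arctic

lib = Arctic('localhost')['user.events']
ts = pd.Timestamp('2017-12-20 09:30:00')
batch = pd.DataFrame({'side': ['buy', 'sell'], 'qty': [100, 50]},
                     index=[ts, ts])      # duplicate index values
lib.append('ORDERS', batch, upsert=True)  # upsert creates the symbol if absent
```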

We use VersionStore for minute-bar data and lower frequencies. We append rows across thousands of instruments every minute using VersionStore. I wouldn't worry too much about incrementing the version count; the library should look after pruning old versions for you.

We use TickStore for streaming tick data (where we do our writes using a Java version of this API).

@rueberger
Author

Not exactly sure what you mean by granularity, but using timestamps with one-second resolution, there could potentially be thousands of rows sharing a timestamp. At microsecond resolution it should be more reasonable, but there's no guarantee of uniqueness.

> I thought tickstore supported multi indexes, but maybe not? (I don't use it much.) That being said, you can also write a new symbol for new data with overlapping timestamps.

I came to that conclusion on the basis of a really quick test where I got an error trying to insert such rows with a default-settings TickStore. I'd better take a closer look.

@bmoscon
Collaborator

bmoscon commented Dec 29, 2017

He means how frequent the ticks are.

@rueberger
Author

This is for a stream of event data, so there is no fixed frequency. The events have real-valued timestamps, although I'm afraid I couldn't say what the precision is.

@bmoscon
Collaborator

bmoscon commented Jan 9, 2018

I think tickstore is a fine choice then, but it's better to batch the writes up using something like kinesis/kafka. I think @jamesblackburn has some slides, or a link to slides, where he describes how this is done internally.
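
A rough sketch of that pattern with kafka-python; the topic, library, and message layout are assumptions, and the consumer config is stripped to a minimum:

```python
# Drain a Kafka topic and flush the buffered ticks to Arctic once a minute.
import json
import time

import pandas as pd
from arctic import Arctic
from kafka import KafkaConsumer   # pip install kafka-python

lib = Arctic('localhost')['user.ticks']   # assumed TICK_STORE library
consumer = KafkaConsumer('ticks', value_deserializer=json.loads)

buffer, last_flush = [], time.time()
for msg in consumer:
    tick = msg.value                       # e.g. {'index': ..., 'price': ...}
    # TickStore expects a tz-aware datetime under the 'index' key
    tick['index'] = pd.Timestamp(tick['index'], tz='UTC')
    buffer.append(tick)
    if buffer and time.time() - last_flush >= 60:
        lib.write('SYM', buffer)           # one bulk write per minute
        buffer, last_flush = [], time.time()
```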

@rueberger
Author

Thanks @bmoscon, it would be really useful to hear how you guys handle batching internally.

Recently I've been worrying that, because of the batching/buffering difficulties, we might not be able to use Arctic after all. We don't really have a good way of batching writes before passing them to Arctic, and we really want to treat it just like any database engine that intelligently buffers writes. Which is a shame, because it is otherwise so ideally suited to our purposes!

Is the design of arctic amenable to exposing the client as a lightweight service that can do buffering? There's no good way to do it purely from the client side.

It seems to me that in the niche of lightweight timeseries databases, Arctic has little competition, and that some simple buffering would put it in solid competition with some of the much heavier solutions like OpenTSDB and Druid.

I would be all too happy to help if it saves me from having to maintain an OpenTSDB deployment.

@bmoscon
Collaborator

bmoscon commented Jan 9, 2018

That being said, version store will batch data and compress it at certain intervals, so it may be fine for your use case.

@jamesblackburn
Contributor

VersionStore is pretty good up to minute-frequency data. It can stretch further depending on how many symbols you are writing simultaneously. The batching frequency is currently a hard-coded constant but can be changed to suit your use case.

@rueberger
Author

Thanks for the info.

I have followed the same approach and am now using Kafka to buffer writes. Any chance you could comment on the best practices for using Arctic in this way? What's the time complexity of appending to tick stores?

@sfkiwi

sfkiwi commented Aug 28, 2018

@rueberger How did your implementation go using Kafka? Are you still using it? I'm attempting to capture order book data, and I got a little concerned when I saw the comment above about

> it being good up to minute frequency data

but I think this was only referring to VersionStore and not to TickStore.

How does Arctic compare to, say, just dumping the tick stream to a Cassandra cluster, which is also fairly lightweight and has excellent write speeds?

@bmoscon
Collaborator

bmoscon commented Aug 28, 2018

It's easy to set up something like redis to batch the data for writes every minute.
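
For example (key, library, and symbol names are made up, and a production version would want a crash-safe flush):

```python
# Writer process: drain a redis list once a minute and bulk-write to Arctic.
import json
import time

import pandas as pd
import redis
from arctic import Arctic

r = redis.Redis()
lib = Arctic('localhost')['user.ticks']   # assumed TICK_STORE library

while True:
    time.sleep(60)
    pipe = r.pipeline()
    pipe.lrange('ticks', 0, -1)           # read everything buffered so far
    pipe.delete('ticks')                  # ...and clear it in the same step
    raw, _ = pipe.execute()
    if raw:
        ticks = [json.loads(t) for t in raw]
        for t in ticks:
            t['index'] = pd.Timestamp(t['index'], tz='UTC')
        lib.write('SYM', ticks)
```

Producers just RPUSH JSON-encoded ticks onto the same key; deleting the key in the pipeline doubles as the periodic cleanup.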

@sfkiwi

sfkiwi commented Aug 28, 2018

Presumably that would only work if your incoming stream is bursting beyond the DB write throughput; if it's constantly sending hundreds of rows a second, you'll eventually run out of memory in the redis cache.

@rueberger
Author

I no longer use arctic at all - I built a lightweight timeseries store on top of mongo and do indeed use kafka to decouple data harvesting and insertion. It works great, but it's a big bespoke solution, so I'm not sure I can recommend that route unless you're doing exactly the same thing as me.

@sfkiwi

sfkiwi commented Aug 28, 2018

That depends on what you're doing :) It sounds like we might be, though, given your initial comment about storing order data.

@bmoscon
Collaborator

bmoscon commented Aug 28, 2018

I use redis to batch updates to arctic; it works just fine. You just need to periodically clean out the written data from redis.

@saeedamen

It might be worth having a look at Redis's new streams functionality for producer/consumer usage (similar to using Kafka). I've started to use it to batch the data in preparation for dumping it to Arctic.
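
A sketch of that idea with redis-py (stream and field names are assumptions):

```python
# Redis streams (redis >= 5.0) as the tick buffer instead of a plain list.
import redis

r = redis.Redis()

# Producer side: append each tick as a stream entry.
r.xadd('ticks', {'ts': '2018-08-28T09:30:00Z', 'price': '101.5'})

# Consumer side: read everything newer than the last-seen id, then decode
# and batch-write to Arctic as in the earlier sketches.
last_id = '0-0'
for _stream, messages in r.xread({'ticks': last_id}, block=1000):
    for msg_id, fields in messages:
        last_id = msg_id          # remember progress between reads
        # ...decode `fields` and buffer for the next Arctic write
```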
