Store API design #184
For info, my current use case is importing rides from Strava, but it would be similar for importing tweets, fitbit activities, etc., that all have a well defined time of occurrence (started, posted, ...).
For the time series part of the API
A few clarifications to the details of the current proposal above, assuming it follows store-timeseries:
Currently the most-used store is the store-json (API documented there) which differs in a few ways including:
time series proposal
I would propose:
In addition I suggest:
key value API
I haven't really used this so I won't make a concrete proposal, but it has always seemed odd to me that from a datasource perspective the datasource id IS the key, so it is really just a single value store, not a key-value store! It would seem to make more sense if there were, e.g.:
And consequently also:
I would probably look at redis for inspiration. |
The next driver I'm writing is an IMAP email driver. A number of apps will want to process the stored data, filtering for addresses and/or keywords in the subject and body. The only way to achieve this with the current store would be to retrieve all the emails and parse the data in the app. So the question is: should the store support some kind of query beyond filtering on timestamps? |
Thanks Chris. Everything looks reasonable. I will comment further when I tackle the points in more detail in the new store. |
@jptmoore @cgreenhalgh Thoughts (some of which repeat Chris' comments):
(Other suggestions all sound reasonable though.) |
@mor1 I was going to use this https://github.com/emersion/go-imap, but if you really want an OCaml version, let me know and I won't waste any time implementing it. It looks like email may form part of the risk awareness/communication studies we will be starting here at some point soon. |
@mor1 Not so bothered about OCaml version per se-- there's an OCaml IMAP implementation (a couple I think in fact), but it's the MIME parsing that I was mentioning particularly. It's insanely complicated to get right, but absolutely necessary if you want to robustly process mail contents (rather than just transport mails to/from/between servers).
|
@mor1 so you're thinking about doing MIME parsing in the store? I was going to do it in the driver, then link using UUIDs for the binary parts. This is getting a bit off topic, so I will create an issue to discuss the details of the IMAP driver. |
@Toshbrown not in the store per se, but you may want to explicitly put the results of having extracted content from the mail into the store (attachments, mail headers, etc). or perhaps in a derived store rather than that associated with the imap (email) driver. even just extracting attachments robustly is a surprising pita. (agree this is off-topic though.) |
See also me-box/core-export-service#28 on a possible job queue store API that might also be supported, e.g. job queue store API. Also worth noting that the current store-json subscription API isn't shown here, and will need to be supported (or something like it) |
I made some changes below (needs testing and error handling, e.g. reporting path errors back to the client etc.). You can try out the changes from the docker client/server.
Using milliseconds since epoch now
You can post with URL: /ts/[id]/at/[time] to specify your own time
Timestamps are returned with data like this [1509564588450, [1,2,3,4,5]]
The end range is now inclusive. |
does this overwrite the internal timestamp? and are there any constraints on its format? |
Yes, it overwrites the internal one. It is an integer of epoch milliseconds. |
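As a rough illustration of the answer above, here is a minimal sketch of how a driver might turn an event time into integer epoch milliseconds and build the `/ts/[id]/at/[time]` path. The helper names and the datasource id `strava-rides` are hypothetical, chosen only for this example; they are not part of the store or any client library.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before converting")
    delta = moment - EPOCH
    # Integer arithmetic only, so no float rounding creeps into the result.
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


def write_path(datasource_id: str, moment: datetime) -> str:
    """Build the write path for an entry with a caller-supplied timestamp."""
    return f"/ts/{datasource_id}/at/{to_epoch_ms(moment)}"


# Hypothetical ride start time; the datasource id is an assumption for illustration.
started = datetime(2017, 11, 1, 19, 29, 48, 450_000, tzinfo=timezone.utc)
print(write_path("strava-rides", started))
```

Requiring a timezone-aware value avoids silently interpreting local time as UTC, which matters when importing activities recorded in different zones.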
@jptmoore On the updated API... Returning the time-value in a heterogeneous array (aka an array representing a tuple, rather than an object with named fields) makes it problematic to type in some languages and more complicated to marshal/unmarshal (e.g. in go, which I'm using at the moment). It's not impossible but it is a hassle. It may also reduce consistency with the notification type/values? The example value wasn't ever so clear, but I believe (hope) [1,2,3,4,5] is a single value and every value has its own timestamp.
@mor1 On the type of time: I know ns since (an) Epoch was mentioned, but a caution I would give about that is that a sufficient range of values can't be exactly represented in a float64, which is all that some languages will use for numbers (e.g. javascript, max int 2^53-1). Milliseconds is OK with me. I think microseconds might also fit but is rather non-standard.
When you say we can try them, I thought the new store only supported the zeromq transport, but afaik the go client library for this doesn't exist/isn't complete yet? ( @Toshbrown ?) |
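The float64 caution can be checked directly. The sketch below round-trips a millisecond, microsecond and nanosecond timestamp through a Python `float` (an IEEE 754 float64, the same representation JavaScript uses for numbers) and reports which survive unchanged. The sample values are derived from the timestamp quoted earlier in the thread; the function name is made up for this example.

```python
# Largest integer n such that every integer in [0, n] is exactly representable in float64.
MAX_SAFE_INTEGER = 2**53 - 1


def survives_float64(value: int) -> bool:
    """Return True if the integer is unchanged after a round trip through float64."""
    return int(float(value)) == value


ms = 1509564588450            # milliseconds since epoch (from the thread)
us = ms * 1_000               # the same instant in microseconds
ns = ms * 1_000_000 + 1       # the same instant in nanoseconds, plus 1 ns

for label, value in (("ms", ms), ("us", us), ("ns", ns)):
    print(label, value <= MAX_SAFE_INTEGER, survives_float64(value))

# How long microsecond timestamps stay within the safe range, in years after 1970.
print(MAX_SAFE_INTEGER / 1_000_000 / 86_400 / 365.25)
```

Nanosecond values near the present are already far beyond 2^53, so adjacent timestamps collapse onto the same float; millisecond and microsecond values remain exact.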
@cgreenhalgh I've started updating the go library Toshbrown/lib-go-databox. I've got basic KV and TS reads and writes working with tokens inside the databox; example code is in Toshbrown/driver-tplink-smart-plug. It's not ready to go yet. I need to add the observe API, think about the API exposed to app/driver developers, and turn the handle on the rest of the endpoints once they are stable. I'm thinking it will be mid next week before I get a chance to finish it (working on other projects until the 7th of Nov). By trying, I think @jptmoore is referring to the client and server he uses outside of the databox for testing here. It's all wrapped in docker containers and allows all the functionality to be tested. |
Do you have an example of some JSON you would like to be returned?
Yeh, [1,2,3,4,5] is the JSON data POSTed. The API takes any JSON as the value. |
Current store-json uses (for your example) |
I pushed a new image which returns in this format: {"timestamp":1509626879783,"data":[1,2,3]} |
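To show why the named-field form is easier to type than the earlier two-element array, here is a small sketch that decodes both shapes into one record type. The `Entry` class and function names are hypothetical; only the two JSON shapes come from the thread.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Entry:
    timestamp: int  # milliseconds since the Unix epoch
    data: Any       # any JSON value, as POSTed by the client


def from_object(text: str) -> Entry:
    """Decode the named-field form: {"timestamp": ..., "data": ...}."""
    raw = json.loads(text)
    return Entry(timestamp=int(raw["timestamp"]), data=raw["data"])


def from_tuple(text: str) -> Entry:
    """Decode the earlier array form: [timestamp, data]."""
    raw = json.loads(text)
    # The array form needs a manual shape check because positions carry the meaning.
    if not (isinstance(raw, list) and len(raw) == 2):
        raise ValueError("expected a two-element array [timestamp, data]")
    timestamp, data = raw
    return Entry(timestamp=int(timestamp), data=data)


print(from_object('{"timestamp":1509626879783,"data":[1,2,3]}'))
print(from_tuple('[1509564588450, [1,2,3,4,5]]'))
```

In statically typed languages the difference is larger than it looks here, since a mixed-type array usually cannot be mapped onto a struct without a custom unmarshaller.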
@jptmoore While updating lib-go-databox I was trying the new and requested permissions like this from the arbiter:
These are granted by the arbiter but rejected by the store. Do you parse wildcards in the macaroon caveats? requesting permissions like this:
works fine, but this means that the macaroons can't be cached when using this endpoint. |
Yeh, currently it is matching the exact path so will need to implement wildcards. |
I have pushed a new image which should support wildcards. |
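For readers unfamiliar with the issue, the difference between exact and wildcard path matching can be sketched as follows. This is purely illustrative: the thread does not show the caveat format or the wildcard syntax the store actually implements, so the trailing `*` convention, the function name and the example paths are all assumptions.

```python
def path_allowed(granted: str, requested: str) -> bool:
    """Check a requested path against a granted path.

    Assumption: a granted path ending in '*' permits any request that shares
    the prefix before the '*'; otherwise the two paths must match exactly.
    """
    if granted.endswith("*"):
        return requested.startswith(granted[:-1])
    return granted == requested


# With exact matching, a token is tied to one timestamp and cannot be reused.
print(path_allowed("/ts/sensor/at/1509564588450", "/ts/sensor/at/1509564588451"))

# With a wildcard, one cached token covers every timestamp under the prefix.
print(path_allowed("/ts/sensor/at/*", "/ts/sensor/at/1509564588451"))
```

The second case is what makes caching possible: the client requests one token for the prefix instead of one per write.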
@jptmoore I'd like to push hard for a bulk add operation in the timeseries API. I know @Toshbrown hit this as a (speed) limitation with the google takeout import, and I was struggling with a simple performance test (adding 1,000s of items in a reasonable time - even activity/heartrate at 1/minute = 1440/day...). Having this in the API amortises the overhead of the request/response communication and also opens the option of handling the set of values within a single transaction/commit in the datastore for further optimisation. Perhaps
I'm not sure about generating events: should each value generate an event, or only the last value, or should it generate a distinct It also raises a question (in my mind, at least) about whether the existing write entry point should change, e.g. to POST |
@cgreenhalgh could you give me a sample of the bulk JSON data you have to test with please. |
I assume the same kind of thing as you get back from a range query, e.g. for a simple value
or for a complex value
|
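A minimal sketch of what preparing bulk writes could look like on the client side, assuming the bulk body is a JSON array of `{"timestamp": ..., "data": ...}` objects as suggested above. The batch size, the helper name and the `bpm` field are assumptions for illustration; no bulk endpoint is defined in this thread.

```python
import json
from typing import Any, Dict, Iterator, List

Entry = Dict[str, Any]


def batches(entries: List[Entry], size: int) -> Iterator[List[Entry]]:
    """Yield consecutive slices of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


# One day of heart-rate readings at one per minute (1440 entries),
# starting at 2017-11-01 00:00:00 UTC. The values are made up.
readings = [
    {"timestamp": 1509494400000 + minute * 60_000, "data": {"bpm": 60 + minute % 5}}
    for minute in range(1440)
]

# Each payload would be sent as the body of one request instead of 500 separate ones.
payloads = [json.dumps(chunk) for chunk in batches(readings, 500)]
print(len(payloads), [len(json.loads(p)) for p in payloads])
```

Batching on the client keeps individual request bodies bounded while still amortising the per-request overhead; whether each batch maps to one store transaction is a server-side decision.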
The API in the new store currently looks like below.
The current implementation supports POST/GET of JSON, text and binary data.
Suggestions welcome on changes/additions.
Key/Value API
Write entry
Read entry
Time series API
Write entry
Read latest entry
Read last number of entries
Read all entries since a time
Read all entries in a time range