cache queries #12
we can probably save a document with the change sequence, keyLookup, and results; this would mean we'd need to create the key lookup every time.
I don't think we should save the results as a whole, just the output of the mapped objects; then all view lookups will be just range queries against a natural index (i.e. fast)
what do you mean by the output of mapped objects?
the map will emit objects with a key; just store the objects against that key
yeah, looking back, the key store doesn't give any benefits, but if I store them against their original doc ID instead of the mapped key, I can update them incrementally
Hmm, that reminds me, not sure if the way indexeddb / couchdb handle […]
they can; I was planning to do this in an adapter-neutral way so that that kind of thing doesn't matter.
my original plan for that was to use separate pouch stores to store the results. Duplicated keys make that slightly, but not a lot, more complicated. You will need 2 indexes either way, one by key and one by _id (_id at the least for deletions); the important part is that query({startkey: ...}) turns into a linear range query
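The two-index layout described above can be sketched as a plain in-memory model (an illustrative sketch only, not mapreduce's actual internals; `ViewCache` and its method names are made up):

```javascript
// Illustrative sketch of the two-index idea: rows grouped by source _id
// (so deletes/updates can drop only a doc's own rows) plus a key-sorted
// view (so query({startkey, endkey}) becomes a linear range scan).
class ViewCache {
  constructor() {
    this.byId = new Map(); // _id -> rows emitted by that doc
  }
  update(id, emitted) {    // emitted: [{key, value}, ...]
    this.byId.set(id, emitted.map(({ key, value }) => ({ id, key, value })));
  }
  remove(id) {             // on deletion, drop only that doc's rows
    this.byId.delete(id);
  }
  sortedRows() {           // key-sorted index (by key, then by id)
    return [].concat(...this.byId.values()).sort((a, b) =>
      a.key < b.key ? -1 : a.key > b.key ? 1 :
      a.id < b.id ? -1 : a.id > b.id ? 1 : 0);
  }
  query(startkey, endkey) {
    return this.sortedRows().filter(r => r.key >= startkey && r.key <= endkey);
  }
}
```

A real version would persist these two indexes in separate pouch stores instead of memory, but the update/remove/range-query shape is the same.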
Recently I've also been thinking about this, but I still don't feel like implementing a b+tree in js using indexeddb (and then leveldb and so on), and it looks like that's the only way to make reduce fast enough...
quick aside: have you seen @nolanlawson's merged pull #11 that adds key index lookup? So we can save the results array, the key lookup, and an id lookup recording which keys are produced by which id, which should make deleting and updating easier, though you'd have to rebuild both the results and the key lookup if you add a document, or update one and it emits new keys.
I'd have to take a deep look at mapreduce as a whole, and in particular at #11, thanks. But it looks like without trees it is impossible to do it right, and implementing trees using indexeddb looks bad.
Just piping in: I think indexeddb might actually offer a nice solution to building up the btree. It allows complex keys, and the sorting is pretty similar to how CouchDB does it. Scroll down from here to where it says: […]
There are a lot of edge cases to cover (e.g. CouchDB converts any appearance of null, undefined, NaN, Infinity (but not the empty string!) to just null, it converts dates to json, idb doesn't support objects), but it seems we can basically convert one complex key format to another. @daleharvey Yeah, Couch does allow duplicate keys, but we could get around this by making each key a […]. That being said, I think Calvin's solution is the simplest, and we'll probably have to do it anyway for websql.
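The coercions listed above could be sketched like this (a hypothetical helper mirroring the comment's list, not PouchDB's eventual implementation):

```javascript
// Hypothetical sketch of the CouchDB-style key coercions described above:
// null / undefined / NaN / Infinity collapse to null, dates become their
// JSON string, arrays are normalized recursively. Note the empty string
// is NOT coerced.
function normalizeKey(key) {
  if (key === null || key === undefined) return null;
  if (typeof key === 'number' && !isFinite(key)) return null; // NaN, +/-Infinity
  if (key instanceof Date) return key.toJSON();
  if (Array.isArray(key)) return key.map(normalizeKey);
  return key; // numbers, strings, booleans pass through ('' stays '')
}
```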
@nolanlawson can't a doc emit the same key multiple times?
yeh, I imagined it would be more like […]. I can't think of any way that saving the results array as a whole is going to be much of an optimisation; it still requires loading the entire resultset to be able to filter by key, which is the main part that needs removing
@calvinmetcalf oops, didn't think of that edge case! Testing it out, it seems CouchDB considers […]. @daleharvey that works too, but the only downside is that you'll need to do a secondary sort on […]. You're right about saving the entire results array; the only optimization is in terms of coding time. ;)
if […]
check the cache branch for a super rough way of implementing this (how rough? it doesn't actually work). The idea is to refactor it so that the buildIndices function builds a cacheable object and buildQuarry is the function that can use either a fresh or a cached version. Will take a fresh look in the morning.
@daleharvey: sorry, I made a mistake; our implementation actually seems to be correct. I'll write a unit test to confirm, but I don't think we need to write any code to chase this edge case.
Okay, the case of multiple emits with the same key was much less serious than I thought. It's just a one-line code change and doesn't really impact performance. See #18.
@daleharvey, would it be possible to implement […]?
oh yes, sorry, I am talking about the map side of things; the reduce is harder to optimise, particularly without a btree. But I think getting map-only queries fast for people is still a pretty big win; they are used far more often than reduce queries as far as I can tell
Same here, I was only thinking of the map side, although I did look around and find this breakdown of CouchDB's map/reduce implementation if anyone's curious. Unfortunately I also discovered that neither IE 10 nor 11 supports complex keys in IDB (SO question, a script to confirm). I also tried the spec's suggestion of using a […]
Ok, so we need a storage design for the cache. We can't use a straight-up object because it will coerce all the keys to strings, but we can't use an es6 map because it stores objects by reference, not by value. We also need it sorted on key based on the pouchCollate algorithm, updatable based on the document _id (a string), and possibly a joined document. I can start on that; it might end up living in another repo. Also, is there a data structure that is good for this that I'm not thinking of?
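For the sorted-on-key requirement, a simplified version of the pouchCollate type ordering (a rough sketch only, not the real pouchdb-collate module, which also handles objects, unicode collation, and more) looks like:

```javascript
// Simplified pouchCollate-style comparator: values are ordered first by
// type (null < booleans < numbers < strings < arrays), then within type.
function typeRank(x) {
  if (x === null) return 1;
  if (typeof x === 'boolean') return 2;
  if (typeof x === 'number') return 3;
  if (typeof x === 'string') return 4;
  if (Array.isArray(x)) return 5;
  throw new Error('sketch does not handle this type');
}
function collate(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 1) return 0;                        // nulls are equal
  if (ra === 2) return a === b ? 0 : a ? 1 : -1; // false < true
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  // arrays: compare element-wise, shorter array wins ties
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    const c = collate(a[i], b[i]);
    if (c !== 0) return c;
  }
  return a.length - b.length;
}
```

Any sorted container (or a plain array kept ordered with binary search) parameterized by this comparator would satisfy the "sorted by pouchCollate" requirement.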
After I learned we can't use complex keys in IE, I had a crazy idea: convert all the keys to a string representation that would sort lexically by pouchCollate, which would allow us to sort in websql/idb rather than in memory. Obviously there would be an upper limit on the size of strings we could use. And obviously, it's crazy. But other than that, I had no ideas.
Oh yeah, and don't forget that docs in the mapreduce index are technically sorted by […]
The problem there (and I considered it) is that it would consider [1] […]. I have the beginnings of an idea which could work.
I wasn't thinking of just sorting based on the stringified form; I'm thinking of going nuts and building up a base64-encoded string that sorts by pouchCollate. I've already started hacking up an example. Basically it builds up a string that looks like this: […]
where collationIndex is e.g. 1 for nulls, 2 for booleans, 3 for numbers, etc. The encoded value can just be […]. As long as we choose an alphabet that agrees with the database, though, and as long as the user doesn't have super-long strings or crazy nested arrays/objects or anything like that, I think sorting in the database on a string could be an easy duct-tape solution. This algorithm, btw, already correctly sorts this list of heterogeneous values, which is kind of neat.
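A toy version of the collationIndex-prefix idea (scalar keys only; the naive zero-padding here only works for non-negative integers, unlike the real encoding, which handles sign/magnitude, arrays, objects, and terminators):

```javascript
// Toy "indexable string": prefix each value with its collation index so
// that ordinary string comparison reproduces null < boolean < number <
// string. Numbers are naively zero-padded, so this sketch only supports
// non-negative integers below 1e10.
function toIndexableString(key) {
  if (key === null) return '1';
  if (typeof key === 'boolean') return '2' + (key ? '1' : '0');
  if (typeof key === 'number') return '3' + String(key).padStart(10, '0');
  if (typeof key === 'string') return '4' + key;
  throw new Error('toy version handles scalar keys only');
}
```

Because the output is a plain string, websql/idb can sort it natively, which is exactly the duct-tape property wanted here.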
@nolanlawson fyi, this conditional doesn't do what you think it does. I'm online most weekdays (irc) if you have javascript questions
Good to know; there are lots of subtleties with browserify I was missing. I can add some fixes to make btoa work cross-platform.
If we want to have this as separate stuff, we'd need to have some eventEmitter for […]
I've been meaning to open a pull to make PouchDB databases event emitters.
For anyone interested: I have a first implementation of cached views. It passes the tests but it's not incremental at all. I plan to add the incremental update feature tomorrow.
About these lines: I'm also +1 for using the […]
I'm also more inclined to think we should imitate CouchDB's API in terms of having an explicit split between "temporary views" and "saved views." Magically caching the views may be simpler from the user's point of view, but it's a leaky abstraction when dealing with the http adapter. I'd be more inclined to have something like:
query() // temp view or lookup by name, i.e. status quo
createIndex() // create a new saved view, equivalent to POSTing a _design doc
In general, though, awesome work, and I'm still psyched that you made that crazy indexable string actually work. I'm gonna keep posting comments here, but I think at this point you can also just make a PR and we'll proceed from there.
About these lines, a question: how can we look up a view row by its document (e.g. when updating/deleting docs)? Per my spec, it looks like you've implemented the first table type, but not the second one.
Yeah, that […]
The main part is already written, so I hope to do it soon!
I faced the following problem: […]
The only solution I came up with is adding some unique id to pouchdb so that someone can remember it and be sure that the dbs differ. It could be returned by db.info. Any ideas?
well, no, because there's nothing to prevent them from disabling map reduce
My mapreduce implementation uses the changes feed and is updated only when db.query is issued. So there's no such thing as disabling mapreduce, because it's not continuously looking for changes. The problem is that mapreduce has no mechanism for knowing whether it's talking with the same db or a different db with the same name, because it missed the destroy command.
you probably want to store a cache id of some sort in the db as a […]
Yeah, if using _local is safe (might someone be angry when I use their _local/_pouchdb_mapreduce key?) then it's probably a good idea.
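The cache-id idea could look roughly like this (a sketch: `makeStubDb` is a synchronous stand-in for a PouchDB instance, whose real get/put are async, and `_local/_pouchdb_mapreduce` is the key floated above; all other names are illustrative):

```javascript
// Sketch: persist a random token in a _local doc. If the db is destroyed
// and recreated under the same name, the token is gone, so the cached
// view index knows it must rebuild from scratch.
function getOrCreateCacheId(db) {
  try {
    return db.get('_local/_pouchdb_mapreduce').cacheId;
  } catch (err) {
    const cacheId = Math.random().toString(36).slice(2);
    db.put({ _id: '_local/_pouchdb_mapreduce', cacheId });
    return cacheId;
  }
}

// Minimal synchronous stub standing in for a PouchDB database.
function makeStubDb() {
  const store = new Map();
  return {
    get(id) {
      if (!store.has(id)) throw new Error('missing');
      return store.get(id);
    },
    put(doc) { store.set(doc._id, doc); },
  };
}
```

A fresh stub (simulating destroy + recreate under the same name) yields a new token, which is the signal to discard the old cache.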
_local had better be safe, because we use it during replication
@neojski Isn't there a global db […]?
@nolanlawson Are you talking about me using setSeq and getSeq? I have to remember that myself because it's neither the seq of the db nor the seq of the view index; I remember it so that I can retrieve changes starting from that seq (incremental update). If not, seq does not solve that problem, because it does not help me distinguish between dbs with the same name that are completely different (the problem described 5 comments above).
Yeah, I forgot that […]
Following daleharvey/pouchdb#1658 and pouchdb/mapreduce#12, this adds the toIndexableString method, which allows us to emulate CouchDB's standard collation to a reasonable degree of fidelity (e.g. no ICU ordering for strings).
#68. Congratulations @nolanlawson :-)
You too, @neojski. After three months of debate and about a half-dozen false starts, we finally figured it out. I think the key breakthrough was your insight with \u0000 in the indexable string; I was convinced it was a dumb idea until you fixed it. 😃
Currently all non-http queries are done from scratch each time. We could save the result from the map query to a _local document along with the sequence number, so subsequent queries avoid having to iterate through the whole database; we could even listen to the changes feed and update the cache every time a document is created/updated.
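The scheme described above can be sketched with a plain array standing in for the changes feed (`updateCache`, `rowsById`, and the change shape are hypothetical names; in real code the changes would come from something like db.changes({since: cache.seq})):

```javascript
// Sketch of incremental view maintenance: keep the map output plus the
// last processed sequence number; on the next query, replay only the
// changes since that seq instead of rescanning the whole database.
function updateCache(cache, changes, mapFun) {
  for (const change of changes) {
    if (change.deleted) {
      delete cache.rowsById[change.id];      // drop rows from deleted docs
    } else {
      const emitted = [];
      mapFun(change.doc, (key, value) => emitted.push({ key, value }));
      cache.rowsById[change.id] = emitted;   // replace the doc's old rows
    }
    cache.seq = change.seq;                  // remember where we got to
  }
  return cache;
}
```

Persisting `cache` to a _local document between queries gives exactly the "save the result and the sequence number" behaviour proposed here.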