Opt out of tree based RidBags at the query level #2315

Closed
phpnode opened this issue May 5, 2014 · 21 comments

@phpnode
Contributor

phpnode commented May 5, 2014

While cool in many ways, the tree based RidBag feature can be a pain to deal with for languages that enforce or encourage asynchronous APIs for IO (e.g. node.js). It's painful because if I want to offer a consistent API between embedded RidBags and tree based RidBags, I have to make the embedded API async even though I already have the data. This means that I cannot, for example, reliably JSON.stringify() a record containing a bag.
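
To make the mismatch concrete, here is a minimal TypeScript sketch. The `EmbeddedBag` and `TreeBag` classes and the `fetchAll` callback are hypothetical stand-ins, not Oriento's actual API. The point is only that `JSON.stringify()` calls `toJSON()` synchronously, so a bag whose contents live on the server has nothing useful to return.

```typescript
// Hypothetical types for illustration only; not the actual Oriento API.
type Rid = string; // e.g. "#12:0"

// An embedded bag already holds its RIDs, so it can serialize synchronously.
class EmbeddedBag {
  constructor(private readonly rids: Rid[]) {}

  all(): Rid[] {
    return this.rids.slice();
  }

  // JSON.stringify calls toJSON synchronously and uses its return value.
  toJSON(): Rid[] {
    return this.all();
  }
}

// A tree based bag only holds a pointer; its contents must be fetched.
class TreeBag {
  constructor(private readonly fetchAll: () => Promise<Rid[]>) {}

  all(): Promise<Rid[]> {
    return this.fetchAll();
  }

  // toJSON cannot await, so the best it can do is return a placeholder.
  toJSON(): null {
    return null;
  }
}

const embeddedRecord = { name: "a", out_knows: new EmbeddedBag(["#12:0", "#12:1"]) };
const treeRecord = { name: "b", out_knows: new TreeBag(async () => ["#12:0", "#12:1"]) };

const embeddedJson = JSON.stringify(embeddedRecord);
const treeJson = JSON.stringify(treeRecord);
```

In this sketch `embeddedJson` contains the RIDs, while `treeJson` contains `null` in place of the bag. Making `EmbeddedBag.all()` return a promise as well would give both classes the same interface, but it would not fix serialization.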

For the most common cases, I also think that the default bonsai tree threshold is too low. At 80 items and after base64 encoding, the bag weighs in at little more than 1 KB. This is not a lot of data, and from the client's point of view it's going to be more efficient to lazily decode that blob than to fetch it remotely from the server. This will hold true even with 10,000 records in the bag.
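
As a rough check of that size estimate, here is a small TypeScript sketch. The byte layout is an assumption made for illustration: 10 bytes per RID (a 2-byte cluster id plus an 8-byte cluster position) and a 5-byte header. The real wire format may differ.

```typescript
// Assumed layout, for illustration only; the real format may differ.
const BYTES_PER_RID = 10; // assumed: 2-byte cluster id + 8-byte cluster position
const HEADER_BYTES = 5; // assumed: 1 config byte + 4-byte entry count

// Padded base64 emits 4 characters for every 3 input bytes, rounded up.
function base64Length(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}

// Size in characters of a base64 encoded embedded bag holding ridCount RIDs.
function embeddedBagBase64Size(ridCount: number): number {
  return base64Length(HEADER_BYTES + ridCount * BYTES_PER_RID);
}

const sizeAt80 = embeddedBagBase64Size(80); // 1076
const sizeAt10000 = embeddedBagBase64Size(10000); // 133340
```

Under these assumptions an 80-item bag encodes to 1,076 characters, which matches "little more than 1 KB", and a 10,000-item bag to about 130 KB.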

It would be awesome if it was possible to selectively increase the threshold for the tree based RidBag feature or skip it entirely for certain clients or queries. I think it's really useful only for very large data sets.
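
A per-query opt-out could look roughly like the TypeScript sketch below. Everything in it is hypothetical: `QueryOptions`, `embeddedThreshold`, and `chooseBagKind` are invented for illustration and exist in neither OrientDB nor Oriento, and whether the switch happens at or just above the threshold is also an assumption.

```typescript
// Hypothetical sketch: none of these names exist in OrientDB or Oriento.
const DEFAULT_THRESHOLD = 80; // the default threshold mentioned in this issue

interface QueryOptions {
  // Hypothetical per-query override; Infinity means "never use a tree based bag".
  embeddedThreshold?: number;
}

type BagKind = "embedded" | "tree";

// Picks the representation for a bag of ridCount entries.
// Assumes a bag stays embedded up to and including the threshold.
function chooseBagKind(ridCount: number, options: QueryOptions = {}): BagKind {
  const threshold = options.embeddedThreshold ?? DEFAULT_THRESHOLD;
  return ridCount <= threshold ? "embedded" : "tree";
}

const byDefault = chooseBagKind(10000); // "tree"
const optedOut = chooseBagKind(10000, { embeddedThreshold: Infinity }); // "embedded"
```

With an option like this, a client that cannot handle tree based bags could ask for everything embedded and accept the larger payload.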

@phpnode
Contributor Author

phpnode commented Nov 26, 2014

@laa @lvca this is causing problems for basically everyone using Oriento, and it's not something we can trivially solve. I think increasing the default threshold would be a good step to at least mitigate the issue. From what I've seen, virtually everyone will hit the 80 item limit in production but not necessarily in testing/development, so it's a nasty surprise waiting to bite people.

@lvca lvca assigned laa Nov 26, 2014
@lvca
Member

lvca commented Nov 26, 2014

@phpnode you're right. @laa WDYT?

@StarpTech

@lvca @laa What is the status of this issue? Thanks.

@phpnode
Contributor Author

phpnode commented May 6, 2015

One unpleasant part of this issue is that if you specify a fetch plan that fetches a tree larger than the threshold, you'll get the fetched records back but have no way of reconstructing the structure, because the ridbag is still on the server.

@StarpTech

Status?

@seeden

seeden commented Jun 15, 2015

+1

@dehbmarques

+1

@IgitDanny

+1

@a-unite

a-unite commented Jul 25, 2015

We are dependent on this issue too.

@seeden

seeden commented Jul 25, 2015

+1

@whatyouhide

Gonna have to leave my 👍 here as well, this API is very painful to work with from the perspective of a binary driver.

@lvca
Member

lvca commented Jul 26, 2015

What if we provided a C/C++ driver with such an API, so that all the drivers could use it? It would also be super fast.

@whatyouhide

Mmm, I don't think adding a dependency can make things much better. Drivers would still have to work on bindings to such a C driver.

I'm sure a very big first step would be providing decent (if not thorough :D) documentation for this part, which as of now is nonexistent (LinkBag is an empty section in the schemaless serialization section of the guide). Updating the docs doesn't require any version bump and doesn't introduce any bugs, so I guess it could be done relatively quickly.

That said, I think that the problem is that the API is too low level and not customizable enough. For example, a way to tell the server the embedded threshold per query would already be a big step, but I realize the protocol changes wouldn't be trivial. I'm sorry, but this is a problem for which I only offer complaints, no solutions :).

@hilkeheremans

Another +1 here. This is quite a PITA.

@lvca I don't mind an extra dependency, as long as we get a driver with a reliable and consistent API.

@gustavolanna

Status?

@austinsmorris

Any update on this?

@smolinari
Contributor

👍

Scott

@saeedtabrizi
Contributor

@laa, @lvca, @maggiolo00 Is there any plan to close this issue? Is there any new status?

@andreafalzetti

+1

@laa
Member

laa commented Mar 6, 2017

Hi guys,

Could you explain how you see this working? Is it correct that the main idea of this change is to convert tree based ridbags into embedded ridbags on the fly, at the query level?

@smolinari
Contributor

smolinari commented Mar 6, 2017

@laa

I am no expert, but this is my take with some questions.

I don't think anyone was asking for an "on-the-fly" conversion between a tree-based and embedded ridbag. Could that be done and also perform well, if the ridbag is big?

When do embedded ridbags start to become a performance concern? That should be the (much higher) default threshold in ODB for switching the ridbag over to the tree type. That was the original request. Actually, the request was to selectively change the threshold. However, I can imagine the threshold would basically be based on when embedded ridbags become "too heavy".

Or, asking with another possibility in mind: how bad is the overhead when using the tree based ridbag compared to the embedded ridbag? If it is negligible for smaller data sets, then maybe the embedded ridbag should be dropped completely? This would require a migration of data from older ODB versions into 3.0, but it might also solve this problem completely for the future.

Scott

@laa laa removed the storage team label Sep 30, 2019
@laa laa closed this as completed Aug 4, 2021