
Enhancing Data Delivery Efficiency in chainHead_unstable_storage Calls #1600

Closed
josepot opened this issue Jan 23, 2024 · 4 comments · Fixed by #1605

Comments

@josepot
Contributor

josepot commented Jan 23, 2024

Issue Overview

While the current behavior of the chainHead_unstable_storage call in fetching descendantValues is compliant with the specification, I propose a discussion on improving the client-side experience. This is particularly relevant when dealing with large storage entries, such as Staking.Nominators on Polkadot.

Current Behavior

Contrary to my initial expectations, requesting the descendantValues of a large storage entry does not result in multiple operationStorageItems events over time. Instead, it yields a single operationStorageItems event with a vast array of items, followed immediately by an operationStorageDone event.
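As a hedged illustration of the event sequence described above, the notification shapes look roughly like the following (field names are based on my reading of the JSON-RPC interface spec and should be treated as assumptions, not authoritative definitions):

```typescript
// Approximate shapes of the chainHead follow-event notifications involved.
// Field names are assumptions based on the JSON-RPC interface spec.
interface StorageItem {
  key: string;    // hex-encoded storage key
  value?: string; // hex-encoded value, when values were requested
}

interface OperationStorageItems {
  event: "operationStorageItems";
  operationId: string;
  items: StorageItem[]; // today: one huge array; ideally: several small ones
}

interface OperationStorageDone {
  event: "operationStorageDone";
  operationId: string;
}

// The total payload is identical whether the server sends one huge `items`
// array or several small ones before `operationStorageDone` fires.
function totalItems(events: OperationStorageItems[]): number {
  return events.reduce((n, e) => n + e.items.length, 0);
}
```

The point of the issue is not the total amount of data (which is the same either way) but how it is split across notifications over time.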

On the client side, handling this extensive data in one go has proven to be challenging. The main issue I've encountered is the intensive resource requirement to decode a vast number of items simultaneously, which leads to browser performance issues. I recognize that part of this challenge stems from my approach to data handling. Optimizing my method to process such large datasets in a more efficient manner is a key area of improvement on my end.

Proposal for Progressive Data Delivery

Despite the above, I propose considering a more progressive approach to data delivery. Delivering data in smaller chunks could significantly enhance the efficiency of client-side processing. This method would allow for a more responsive user experience, as data can be presented and handled incrementally.
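Even with today's single-event behaviour, one client-side mitigation is to decode the big array in fixed-size chunks and yield to the event loop between chunks, so rendering and input handling stay responsive. A minimal sketch, where `decodeItem` is a hypothetical stand-in for the real decoder:

```typescript
// Hypothetical sketch: decode a large `items` array in fixed-size chunks,
// yielding to the event loop between chunks so the browser stays responsive.
// `RawItem` and `decodeItem` are stand-ins, not actual library APIs.
type RawItem = { key: string; value?: string };

async function decodeInChunks<T>(
  items: RawItem[],
  decodeItem: (item: RawItem) => T,
  chunkSize = 500,
): Promise<T[]> {
  const out: T[] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      out.push(decodeItem(item));
    }
    // Yield so rendering and other queued tasks can run between chunks.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return out;
}
```

This only spreads out the decoding cost, though; the 30-50 second wait for the first notification remains, which is why progressive delivery on the server side would still be the bigger win.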

Suggestion for Specification Enhancement

In light of this, revisiting the specification to encourage or facilitate progressive data delivery for large datasets might be beneficial. Such a change could help developers handle data more effectively, leading to better performance across various client applications.

Requesting Feedback

I'm interested in your thoughts on this suggestion and any additional insights you may have.

@tomaka
Contributor

tomaka commented Jan 23, 2024

Giving the data progressively is completely intended; it's simply not 100% properly implemented in smoldot at the moment.

In light of this, revisiting the specification to encourage or facilitate progressive data delivery for large datasets might be beneficial.

You mean reverting paritytech/json-rpc-interface-spec#47 ?

@josepot
Contributor Author

josepot commented Jan 23, 2024

Giving the data progressively is completely intended

This is what I'm requesting, and what I don't currently see happening 🙂.

it's simply not 100% properly implemented in smoldot at the moment

Which is why I'm opening this issue: even if the current behaviour is "spec compliant", it's far from ideal.

Just to be clear: the issue is that I don't receive any items notifications for a long time (between 30 and 50 seconds), and then I suddenly receive a single items notification containing thousands of rows, immediately followed by the operationStorageDone event.

Ideally, I would like to receive a notification with a few hundred items every now and then and/or whenever possible, rather than waiting to receive them all in one go once the whole data-set is ready.

You mean reverting paritytech/json-rpc-interface-spec#47 ?

There is no need to revert paritytech/json-rpc-interface-spec#47.

Even if it was reverted, if smoldot behaved in the same way: wait until it has all the items and then bomb the client with thousands of events all at once, then we would be in a situation even more difficult to deal with from the client side. So, it's not about reverting that, it's about asking the server to send the data progressively whenever possible.

IMO this should also be beneficial for smoldot, as it would be able to free up memory faster.

@tomaka
Contributor

tomaka commented Jan 24, 2024

Even if it was reverted, if smoldot behaved in the same way: wait until it has all the items and then bomb the client with thousands of events all at once, then we would be in a situation even more difficult to deal with from the client side.

If you received thousands of events, you could parse each event individually and add a delay between parsing.

What you seem to want is for smoldot to wait a little bit before sending chunks of items. This is not something I will ever implement. I will implement sending items as soon as they're received, which will indeed lead to multiple small chunks, but this is in no way a guarantee that there will be a delay between these chunks.
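The pattern suggested above (many small events, with the client pacing its own parsing) could be sketched as follows. This is an assumed client-side pattern, not smoldot code: buffer incoming notifications in a queue and drain it with a pause between events, so thousands of small chunks arriving back-to-back never monopolise the main thread.

```typescript
// Hypothetical client-side sketch: drain queued notifications one at a time,
// pausing between them, so a burst of small chunks is processed gradually.
class PacedQueue<T> {
  private queue: T[] = [];
  private draining = false;

  constructor(
    private handle: (item: T) => void,
    private pauseMs = 10, // delay inserted by the client, not the server
  ) {}

  push(item: T): void {
    this.queue.push(item);
    if (!this.draining) void this.drain();
  }

  private async drain(): Promise<void> {
    this.draining = true;
    while (this.queue.length > 0) {
      this.handle(this.queue.shift()!);
      await new Promise((resolve) => setTimeout(resolve, this.pauseMs));
    }
    this.draining = false;
  }
}
```

The key point in the comment above is that any delay between chunks is the client's responsibility; the server sending items as soon as they arrive gives no timing guarantees.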

@josepot
Contributor Author

josepot commented Jan 24, 2024

Even if it was reverted, if smoldot behaved in the same way: wait until it has all the items and then bomb the client with thousands of events all at once, then we would be in a situation even more difficult to deal with from the client side.

If you received thousands of events, you could parse each event individually and add a delay between parsing.

Not really, as each callback gets invoked by the event queue. Having ~50K callbacks invoked "all at once" (one immediately after the other) in a microtask queue whose size I have no means of knowing is a lot more annoying to deal with than one callback with 50K items. However, once again, the issue is IMO the fact that the data is not being delivered progressively.

EDIT: actually, with the perf optimization that I plan on using it would be about the same. However, the whole point is that as I said in my previous comment: "it's not about reverting that, it's about asking the server to send the data progressively".

What you seem to want is for smoldot to wait a little bit before sending chunks of items.

Nope, that's not what I want.

This is not something I will ever implement.

Phew!

I will implement sending items as soon as they're received

This is exactly what I would like to have! 🙌

Once again, as I already said in my initial message:

I recognize that part of this challenge stems from my approach to data handling. Optimizing my method to process such large datasets in a more efficient manner is a key area of improvement on my end.

So, yeah, I should improve the way I handle large data-sets, no doubt. However, that doesn't change the fact that ideally smoldot should try not to hoard all the items until it's done receiving them, that's all.
