RFC: add `items` notification for `chainHead_unstable_storage` #47
If it does indeed create a lot of overhead, then what you propose could make sense, but I'd very much prefer to have actual numbers rather than guesses.
Fair enough. I will create a sandboxed environment that compares the two solutions and I will share my findings.
Here: Stackblitz with the experiment. What I've done is: I've created a Node project with two files, one for a worker and one for the main thread. The worker generates all the messages up front. Once the worker has prepared all the data, it waits for a signal from the main thread. As soon as that signal arrives, the worker stores the starting time and synchronously sends all the messages that it prepared. The only thing that the client does is check the value carried by each message. Once the main thread/client has received the last message, it records the elapsed time. And then we do the exact same thing, but with the batched messages. The results of this experiment show that batching the messages is between 6 and 8 times faster, which IMO proves that there is a significant overhead that could easily be reduced with the addition of the `items` notification. Please feel free to tweak the code, review it, run it locally, whatever... Thanks!
You're measuring the overhead of the message passing itself. To give another example, I quickly wrote this, which measures just the JSON parsing:

```js
const nonBatched = [];
for (let i = 0; i < 50000; ++i) {
  nonBatched.push(JSON.stringify({ item: Math.random() }));
}

const batchedItems = [];
for (let i = 0; i < 50000; ++i) {
  batchedItems.push(Math.random());
}
const batched = JSON.stringify({ items: batchedItems });

const beforeNonBatched = performance.now();
let dummyCounter1 = 0;
for (const item of nonBatched) {
  const value = JSON.parse(item).item;
  if (value > 0.5) dummyCounter1 += 1; // Prevent the optimizer from eliminating this loop
}
console.log("Non batched took: " + (performance.now() - beforeNonBatched) + "ms (" + dummyCounter1 + ")");

const beforeBatched = performance.now();
let dummyCounter2 = 0;
const batchedParsed = JSON.parse(batched).items;
for (const value of batchedParsed) {
  if (value > 0.5) dummyCounter2 += 1; // Prevent the optimizer from eliminating this loop
}
console.log("Batched took: " + (performance.now() - beforeBatched) + "ms (" + dummyCounter2 + ")");
```

And the result on my machine is around 17ms for the non-batched and 7ms for the batched. If we were to look only at this other benchmark, I would say that the batching isn't worth it.
Is it, though? Because I'm afraid that what's actually slow is registering thousands of callbacks on the main thread when we could be registering just one. That is IMO what can create a significant overhead.
why isn't it relevant?
So, if we were using
I don't find it surprising that if you decide to just measure the overhead of
But to me you're registering thousands of callbacks because you've decided to do so. Why are you unregistering then re-registering the callback at every event? It seems that you're shooting yourself in the foot just to prove a point. If, for example, the JSON-RPC calls are done over WebSocket, you have one callback (on message) and you never unregister it. I adjusted the example to mimic roughly what smoldot does, and the results are almost exactly the same (20ms and 8ms instead of 17ms and 7ms):

```js
function fakeClientNonBatched() {
  const nonBatched = [];
  for (let i = 0; i < 50000; ++i) {
    nonBatched.push(JSON.stringify({ item: Math.random() }));
  }
  return {
    async nextMessage() {
      if (nonBatched.length !== 0)
        return nonBatched.pop();
      await new Promise((_resolve) => {}); // Never resolves
    }
  };
}

function fakeClientBatched() {
  const batchedItems = [];
  for (let i = 0; i < 50000; ++i) {
    batchedItems.push(Math.random());
  }
  let batched = JSON.stringify({ items: batchedItems });
  return {
    async nextMessage() {
      if (batched) {
        const b = batched;
        batched = null; // `delete` doesn't work on local variables
        return b;
      }
      await new Promise((_resolve) => {}); // Never resolves
    }
  };
}

(async () => {
  const cNonBatched = fakeClientNonBatched();
  const cBatched = fakeClientBatched();

  const beforeNonBatched = performance.now();
  let dummyCounter1 = 0;
  for (let i = 0; i < 50000; ++i) {
    const value = JSON.parse(await cNonBatched.nextMessage()).item;
    if (value > 0.5) dummyCounter1 += 1; // Prevent the optimizer from eliminating this loop
  }
  console.log("Non batched took: " + (performance.now() - beforeNonBatched) + "ms (" + dummyCounter1 + ")");

  const beforeBatched = performance.now();
  let dummyCounter2 = 0;
  const items = JSON.parse(await cBatched.nextMessage()).items;
  for (const value of items) {
    if (value > 0.5) dummyCounter2 += 1; // Prevent the optimizer from eliminating this loop
  }
  console.log("Batched took: " + (performance.now() - beforeBatched) + "ms (" + dummyCounter2 + ")");
})();
```
Extracting the JSON-RPC responses from the smoldot client wasm is very fast. Giving these JSON-RPC responses to the API user of smoldot is very fast. The only thing that might be slow (would need to benchmark first) is the situation where you pass a
This is just false. The code that I shared doesn't do that. Please have a closer look at it.
Apparently I didn't explain myself very well. What I meant is that when in JS we do this:

```js
const socket = new WebSocket("ws://localhost:8080");

function messageListener(event) {
  // SOME LOGIC HERE
}

socket.addEventListener("message", messageListener);
```

then we are attaching a single listener to the socket. What happens, though, is that every time the server sends a message, that message doesn't get processed immediately. Instead, the JS engine queues a callback into the queue of callbacks that have to be processed in the next iteration of the event loop, and it does that for each message that it receives from an external process. This is what I mean when I say that the overhead comes from the engine queuing thousands of callback invocations, rather than from registering one callback. Meaning that this overhead also happens with WebSocket messages.
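The point can be modeled with a synchronous toy sketch. This is an assumption: the plain arrays below stand in for the engine's internal macrotask queue, which queues one task per received WebSocket message.

```js
// Toy model: one listener is registered either way; what differs is how many
// event-loop tasks must be queued and run to deliver the same data.
const listener = (msg) => JSON.parse(msg);

function deliver(messages, queue) {
  // The engine queues one callback invocation per incoming message.
  for (const msg of messages) queue.push(() => listener(msg));
}

const payloads = Array.from({ length: 1000 }, () => JSON.stringify({ item: Math.random() }));

const queueNonBatched = [];
deliver(payloads, queueNonBatched); // 1000 separate WebSocket messages

const queueBatched = [];
const batch = JSON.stringify({ items: payloads.map((s) => JSON.parse(s).item) });
deliver([batch], queueBatched); // the same data as a single message

console.log(queueNonBatched.length + " queued tasks vs " + queueBatched.length);
// → "1000 queued tasks vs 1"
```

Each queued task means one more wake-up of the event loop before the data is fully delivered, which is the overhead being discussed.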
That's because you are using microtasks (rather than macrotasks) on the same main thread. However, when an external I/O process registers a callback into the event queue, that callback is always going to be a macrotask. If I change your code to use a macrotask instead, the results prove that there is indeed a very significant overhead:

```js
const createMacrotask = (response) =>
  new Promise((res) => setTimeout(() => res(response), 0));

function fakeClientNonBatched() {
  const nonBatched = [];
  for (let i = 0; i < 50000; ++i) {
    nonBatched.push(JSON.stringify({ item: Math.random() }));
  }
  return {
    async nextMessage() {
      if (nonBatched.length !== 0)
        return createMacrotask(nonBatched.pop());
      await new Promise((_resolve) => {}); // Never resolves
    }
  };
}

function fakeClientBatched() {
  const batchedItems = [];
  for (let i = 0; i < 50000; ++i) {
    batchedItems.push(Math.random());
  }
  let batched = JSON.stringify({ items: batchedItems });
  return {
    async nextMessage() {
      if (batched) {
        const b = batched;
        batched = null; // `delete` doesn't work on local variables
        return createMacrotask(b);
      }
      await new Promise((_resolve) => {}); // Never resolves
    }
  };
}

(async () => {
  const cNonBatched = fakeClientNonBatched();
  const cBatched = fakeClientBatched();

  const beforeBatched = performance.now();
  let dummyCounter2 = 0;
  const items = JSON.parse(await cBatched.nextMessage()).items;
  for (const value of items) {
    if (value > 0.5) dummyCounter2 += 1; // Prevent the optimizer from eliminating this loop
  }
  console.log("Batched took: " + (performance.now() - beforeBatched) + "ms (" + dummyCounter2 + ")");

  const beforeNonBatched = performance.now();
  let dummyCounter1 = 0;
  for (let i = 0; i < 50000; ++i) {
    const value = JSON.parse(await cNonBatched.nextMessage()).item;
    if (value > 0.5) dummyCounter1 += 1; // Prevent the optimizer from eliminating this loop
  }
  console.log("Non batched took: " + (performance.now() - beforeNonBatched) + "ms (" + dummyCounter1 + ")");
})();
```

I mean, in all fairness, this code that I shared is worse than what would actually happen in reality, because this code doesn't register the next message into the callback queue until the previous message has been processed. So, in reality, things wouldn't be as bad as they may seem with the code that I shared. Feel free to adjust it further to code that mimics registering thousands of macrotasks from an external I/O process. However, if you think about it, that's exactly what the code of my initial experiment is doing, so... 🤷♂️
Hopefully, my previous explanations have clarified that the overhead I'm referring to is the overhead of registering thousands of macrotasks into the event loop via I/O operations, which is something that even the most improved version of smoldot will run into if that instance is running outside of the main thread.
Well, no, because smoldot would batch the messages, send them between threads, then un-batch them. While I'm not convinced that it's an actual performance issue with smoldot, the WebSocket message reception thing might be a good reason to actually do the change.
🙌
Oh! I see what you are saying. I was under the wrong impression that smoldot sent each JSON-RPC response to the main thread as its own message; I didn't realize that the responses get batched when crossing the thread boundary. However, as you already mentioned, this is something quite specific to this implementation of smoldot, and I think it's fair to assume that's not going to be the case for other clients. Also, perhaps this change would allow smoldot to move even more logic into the worker? I'm not sure if that makes sense, though... Anyways, I think that we both agree now that this change does make sense.
The code running in the worker thread creates the JSON-RPC messages and the main thread only receives them. But smoldot knows how many messages are available in the queue and can send the entire queue of messages between threads at once, as an array of strings.
I disagree. I think that the slowness in the WebSocket situation is a JavaScript issue. From a purely theoretical standpoint, sending many small WebSocket messages or sending one big WebSocket message should be roughly the same speed. All you're doing when receiving WebSocket messages is parsing the data coming from a TCP stream. The parsing speed should be roughly proportional to the total number of bytes in all the messages combined, not to the number of messages. The fact that in JS the speed is roughly proportional to the number of WebSocket messages should be considered an issue, not something normal.
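To illustrate that theoretical point, here is a minimal sketch of frame parsing. It is an assumption-laden stand-in for real WebSocket framing (real frames also carry opcodes, flags, and masking; here a frame is just a 4-byte length prefix plus payload). Decoding many small frames from one contiguous buffer is a single linear scan, so the work tracks total bytes rather than frame count:

```js
// Encode a list of payloads as length-prefixed "frames" in one buffer.
function encodeFrames(payloads) {
  const chunks = [];
  for (const p of payloads) {
    const body = Buffer.from(p, "utf8");
    const header = Buffer.alloc(4);
    header.writeUInt32BE(body.length, 0); // 4-byte big-endian length prefix
    chunks.push(header, body);
  }
  return Buffer.concat(chunks);
}

// Decode every frame in a single pass over the bytes.
function decodeFrames(buf) {
  const out = [];
  let offset = 0;
  while (offset < buf.length) {
    const len = buf.readUInt32BE(offset);
    out.push(buf.toString("utf8", offset + 4, offset + 4 + len));
    offset += 4 + len;
  }
  return out;
}

const stream = encodeFrames(["a", "bb", "ccc"]);
const decoded = decodeFrames(stream);
console.log(decoded); // → [ 'a', 'bb', 'ccc' ]
```

In this model, cutting the same bytes into more frames only adds a few header bytes per frame; it doesn't queue extra event-loop tasks, which is where the JS-specific per-message cost comes from.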
Using `chainHead_unstable_storage` with the type `descendants-values` or `descendants-hashes` will in many cases yield thousands of results. Sending an `item` notification for each result creates a lot of unnecessary overhead. That's why I think that it would be a good idea to have an `items` notification for these 2 types.