RFC: add items notification for chainHead_unstable_storage #47

Closed
josepot opened this issue Jun 13, 2023 · 10 comments · Fixed by #51

Comments

@josepot
Contributor

josepot commented Jun 13, 2023

Using chainHead_unstable_storage with the type descendants-values or descendants-hashes will in many cases yield thousands of results. Sending a separate item notification for each result creates a lot of unnecessary overhead. That's why I think it would be a good idea to have an items notification for these two types.
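For illustration, here is a sketch of the two shapes being compared (the field names are assumptions for illustration, not taken from the spec):

```js
// Hypothetical event shapes (field names assumed, not taken from the spec).

// Today: one "item" notification per storage entry...
const itemNotification = { event: "item", key: "0x01ab...", value: "0x..." };
// ...repeated thousands of times, followed by:
const doneNotification = { event: "done" };

// Proposed: a single "items" notification carrying many entries at once:
const itemsNotification = {
  event: "items",
  items: [
    { key: "0x01ab...", value: "0x..." },
    { key: "0x01ac...", value: "0x..." },
    // ...potentially thousands more in one message...
  ],
};
```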

@tomaka
Contributor

tomaka commented Jun 14, 2023

If it does indeed create a lot of overhead, then what you propose could make sense, but I'd very much prefer to have actual numbers rather than guesses.

@josepot
Contributor Author

josepot commented Jun 14, 2023

Fair enough. I will create a sandboxed environment that compares the two solutions and share my findings.

@josepot
Contributor Author

josepot commented Jun 14, 2023

> If it does indeed create a lot of overhead, then what you propose could make sense, but I'd very much prefer to have actual numbers rather than guesses.

Here:

Stackblitz with the experiment.

What I've done: I've created a Node project with two files, one for a worker and one for the main thread.

The worker generates 50,000 random hexadecimal strings, which represent the items that are going to be sent to the client/main thread. Then the worker creates all the messages that will be sent (both the batched and the non-batched versions). Therefore, we are only measuring the overhead on the side of the consumer.

Once the worker has prepared all the data, it waits for a signal from the main thread. As soon as that signal arrives, the worker stores the starting time and synchronously sends all the messages that it previously prepared.

The only thing that the client does is check whether the done event has arrived and, if it hasn't, push the message onto an array. So, we are not taking into account the possible overhead of matching the subscription with its listener; we are just measuring the overhead of receiving each message, parsing it and storing its result, nothing more.

Once the main thread/client has received the done event, it sends a signal to the worker to inform it that all the data has been processed. When the worker receives that message, it stores the finish time and logs how long the whole thing took.

And then we do the exact same thing but for the batched messages.
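For concreteness, here is a minimal sketch of that harness, shown for the non-batched run (a hypothetical reconstruction using Node's worker_threads; the actual Stackblitz project may differ):

```js
// benchmark.js — hypothetical reconstruction of the harness described above.
// Run with: node benchmark.js
const { Worker, isMainThread, parentPort } = require("node:worker_threads");

const N = 50_000;

if (isMainThread) {
  // Client side: parse each message and store it, nothing more.
  const worker = new Worker(__filename);
  const received = [];
  worker.on("message", (msg) => {
    const parsed = JSON.parse(msg);
    if (parsed.event === "done") {
      worker.postMessage("processed"); // tell the worker to stop the clock
    } else {
      received.push(parsed);
    }
  });
  worker.postMessage("start");
} else {
  // Worker side: prepare every message up front, so the measurement only
  // covers sending/receiving/parsing, not message construction.
  const items = Array.from({ length: N }, () =>
    Math.random().toString(16).slice(2)
  );
  const nonBatched = items.map((item) => JSON.stringify({ event: "item", item }));
  const done = JSON.stringify({ event: "done" });
  // (The batched variant instead prepares a single
  //  JSON.stringify({ event: "items", items }) message.)

  let start;
  parentPort.on("message", (msg) => {
    if (msg === "start") {
      start = performance.now();
      for (const m of nonBatched) parentPort.postMessage(m);
      parentPort.postMessage(done);
    } else if (msg === "processed") {
      console.log(`non-batched took ${performance.now() - start}ms`);
      process.exit(0);
    }
  });
}
```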

The results of this experiment show that batching the messages is between 6 and 8 times faster, which IMO proves that there is a significant overhead that could easily be reduced with the addition of the items event.

Please, feel free to tweak the code, review it, run it locally, whatever...

Thanks!

@tomaka
Contributor

tomaka commented Jun 15, 2023

You're measuring the overhead of MessagePort, which is indeed very, very slow, but that's not really relevant here. Yes, smoldot uses a MessagePort internally, but this could easily be optimized by batching items internally in smoldot.

To give another example, I quickly wrote this, which measures just JSON.parse:

```js
const nonBatched = [];
for (var i = 0; i < 50000; ++i) {
  nonBatched.push(JSON.stringify({ item: Math.random() }));
}

const batchedItems = [];
for (var i = 0; i < 50000; ++i) {
  batchedItems.push(Math.random());
}
const batched = JSON.stringify({ items: batchedItems });

const beforeNonBatched = performance.now();
let dummyCounter1 = 0;
for (var item of nonBatched) {
  const value = JSON.parse(item).item;
  if (value > 0.5) dummyCounter1 += 1;  // Prevent the optimizer from optimizing this loop
}
console.log("Non batched took: " + (performance.now() - beforeNonBatched) + "ms (" + dummyCounter1 + ")");

const beforeBatched = performance.now();
let dummyCounter2 = 0;
const batchedParsed = JSON.parse(batched).items;
for (var value of batchedParsed) {
  if (value > 0.5) dummyCounter2 += 1;  // Prevent the optimizer from optimizing this loop
}
console.log("Batched took: " + (performance.now() - beforeBatched) + "ms (" + dummyCounter2 + ")");
```

And the result on my machine is around 17ms for the non-batched and 7ms for the batched. If we were to look only at this other benchmark, I would say that the batching isn't worth it.

@josepot
Contributor Author

josepot commented Jun 15, 2023

> You're measuring the overhead of MessagePort, which is indeed very, very slow,

Is it, though? Because I'm afraid that what's actually slow is registering thousands of callbacks on the main thread when we could be registering just one. That, IMO, is what creates the significant overhead.

> but that's not really relevant here

Why isn't it relevant?

> Yes, smoldot uses a MessagePort internally, but this could easily be optimized by batching items internally in smoldot.

So, if we were using smoldot in a Worker (which is what SC is about to land), what mechanism would smoldot use to pass the messages to the main thread? What good does it do for smoldot to batch messages if it then has to send them one by one to the main thread? 🤔 I'm not following. Could you please elaborate?

> To give another example, I quickly wrote this, which measures just JSON.parse:

I don't find it surprising that if you decide to measure just the overhead of JSON.parse, the overhead is not as significant. It's still somewhat significant IMO (>2× slower), but let's say that it is insignificant, because the real overhead IMO comes from registering thousands of callbacks on the main thread when we could be registering just one.

@tomaka
Contributor

tomaka commented Jun 15, 2023

> what's actually slow is registering thousands of callbacks on the main thread when we could be registering just one

But to me, you're registering thousands of callbacks because you've decided to do so. Why are you unregistering and then re-registering the callback at every event? It seems that you're shooting yourself in the foot just to prove a point.

If for example the JSON-RPC calls are done over WebSocket, you have one callback (on message) and you never unregister it.
In the case of smoldot, it returns the response within a Promise after a function call.
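For context, a sketch of what that pull-based model looks like from the API user's side (assuming the shape of smoldot's published JS API; check its docs for the exact signatures — `handleResponse` and `chainSpec` here are placeholders):

```js
// Sketch of smoldot's pull-based JSON-RPC model (API shape assumed).
import { start } from "smoldot";

const client = start();
const chainSpec = "..."; // placeholder: the chain specification JSON
const chain = await client.addChain({ chainSpec });

chain.sendJsonRpc(JSON.stringify({
  jsonrpc: "2.0", id: 1,
  method: "chainHead_unstable_follow", params: [false],
}));

for (;;) {
  // One function call, one Promise, one response: the listener is never
  // unregistered and re-registered.
  const response = await chain.nextJsonRpcResponse();
  handleResponse(JSON.parse(response)); // placeholder application logic
}
```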

I adjusted the example to mimic roughly what smoldot does, and the results are almost exactly the same (20ms and 8ms instead of 17ms and 7ms):

```js
function fakeClientNonBatched() {
    const nonBatched = [];
    for (var i = 0; i < 50000; ++i) {
      nonBatched.push(JSON.stringify({ item: Math.random() }));
    }

    return {
        async nextMessage() {
            if (nonBatched.length != 0)
                return nonBatched.pop();

            await new Promise((_resolve) => {})  // Infinite
        }
    }
}

function fakeClientBatched() {
    const batchedItems = [];
    for (var i = 0; i < 50000; ++i) {
      batchedItems.push(Math.random());
    }
    let batched = JSON.stringify({ items: batchedItems });

    return {
        async nextMessage() {
            if (batched) {
                const b = batched;
                batched = undefined;  // (was `delete batched`, a no-op on a `let` binding)
                return b;
            }

            await new Promise((_resolve) => {})  // Infinite
        }
    }
}

(async () => {
    const cNonBatchd = fakeClientNonBatched();
    const cBatched = fakeClientBatched();

    const beforeNonBatched = performance.now();
    let dummyCounter1 = 0;
    for (var i = 0; i < 50000; ++i) {
        const value = JSON.parse(await cNonBatchd.nextMessage()).item;
        if (value > 0.5) dummyCounter1 += 1;  // Prevent the optimizer from optimizing this loop
    }
    console.log("Non batched took: " + (performance.now() - beforeNonBatched) + "ms (" + dummyCounter1 + ")");

    const beforeBatched = performance.now();
    let dummyCounter2 = 0;
    const items = JSON.parse(await cBatched.nextMessage()).items;
    for (const value of items) {
        if (value > 0.5) dummyCounter2 += 1;  // Prevent the optimizer from optimizing this loop
    }
    console.log("Batched took: " + (performance.now() - beforeBatched) + "ms (" + dummyCounter2 + ")");
})();
```

> So, if we were using smoldot in a Worker (which is what SC is about to land), what mechanism would smoldot use to pass the messages to the main thread? What good does it do for smoldot to batch messages if it then has to send them one by one to the main thread? 🤔 I'm not following. Could you please elaborate?

Extracting the JSON-RPC responses from the smoldot client wasm is very fast. Giving these JSON-RPC responses to the API user of smoldot is very fast. The only thing that might be slow (I would need to benchmark first) is the situation where you pass a portToWorker in the config; in that case, smoldot sends items one by one through a MessagePort. This, however, could easily be changed in the code that I've linked so that it sends multiple responses at the same time.
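A sketch of that internal batching idea (hypothetical; this is not smoldot's actual code): the worker side coalesces whatever is in the queue into one postMessage, and the main-thread side un-batches on arrival.

```js
// Worker side: instead of one postMessage per JSON-RPC response, drain the
// queue and post the whole array in a single message.
const pending = [];
let flushScheduled = false;

function queueResponse(port, jsonResponse) {
  pending.push(jsonResponse);
  if (!flushScheduled) {
    flushScheduled = true;
    queueMicrotask(() => {
      flushScheduled = false;
      port.postMessage(pending.splice(0)); // one message, many responses
    });
  }
}

// Main-thread side: un-batch on arrival.
function attach(port, onResponse) {
  port.onmessage = (event) => {
    for (const response of event.data) onResponse(response);
  };
}
```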

@josepot
Contributor Author

josepot commented Jun 15, 2023

> Why are you unregistering and then re-registering the callback at every event? It seems that you're shooting yourself in the foot just to prove a point.

This is just false. The code that I shared doesn't do that. Please have a closer look at it.

> If for example the JSON-RPC calls are done over WebSocket, you have one callback (on message) and you never unregister it.

Apparently I didn't explain myself very well. What I meant is that when we do this in JS:

```js
const socket = new WebSocket("ws://localhost:8080");

function messageListener(event) {
  // SOME LOGIC HERE
}
socket.addEventListener("message", messageListener);
```

Then we are attaching a listener to the socket's message event. This (which is not what's relevant to this discussion) is only done once, yes.

What happens, though, is that every time the server sends a message, that message doesn't get processed immediately. Instead, the JS engine queues a callback into the queue of callbacks to be processed in a later iteration of the event loop, and it does that for each message that it receives from an external process. This is what I mean when I say that the overhead comes from enqueuing thousands of callbacks rather than just registering one callback. Meaning that this overhead also happens with WebSocket messages.
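A small illustration of that queuing behaviour, using a MessageChannel as a stand-in for any external message source (the output described in the comments assumes a standard event loop):

```js
// Each incoming message is delivered as its own task on the event loop,
// even though the listener itself is registered only once.
const { port1, port2 } = new MessageChannel();

port1.onmessage = (event) => {
  console.log("message task:", event.data);
  if (event.data === 2) port1.close(); // let the process exit
};

port2.postMessage(1);
port2.postMessage(2);

queueMicrotask(() => console.log("this microtask runs first"));
// Expected output: the microtask logs first; then "message task: 1" and
// "message task: 2" each run as a separate macrotask.
```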

> I adjusted the example to mimic roughly what smoldot does, and the results are almost exactly the same

That's because you are using microtasks (rather than macrotasks) on the same main thread. However, when an external I/O process registers a callback into the event queue, that callback is always going to be a macrotask.

If I change your code to use a macrotask instead, the results prove that there is indeed a very significant overhead:

```js
const createMacrotask = (response) => new Promise(res => setTimeout(() => res(response), 0));

function fakeClientNonBatched() {
  const nonBatched = [];
  for (var i = 0; i < 50000; ++i) {
    nonBatched.push(JSON.stringify({ item: Math.random() }));
  }

  return {
      async nextMessage() {
          if (nonBatched.length != 0)
              return createMacrotask(nonBatched.pop());

          await new Promise((_resolve) => {})  // Infinite
      }
  }
}

function fakeClientBatched() {
  const batchedItems = [];
  for (var i = 0; i < 50000; ++i) {
    batchedItems.push(Math.random());
  }
  let batched = JSON.stringify({ items: batchedItems });

  return {
      async nextMessage() {
          if (batched) {
              const b = batched;
              batched = undefined;  // (was `delete batched`, a no-op on a `let` binding)
              return createMacrotask(b);
          }

          await new Promise((_resolve) => {})  // Infinite
      }
  }
}

(async () => {
  const cNonBatchd = fakeClientNonBatched();
  const cBatched = fakeClientBatched();

  const beforeBatched = performance.now();
  let dummyCounter2 = 0;
  const items = JSON.parse(await cBatched.nextMessage()).items;
  for (const value of items) {
      if (value > 0.5) dummyCounter2 += 1;  // Prevent the optimizer from optimizing this loop
  }
  console.log("Batched took: " + (performance.now() - beforeBatched) + "ms (" + dummyCounter2 + ")");

  const beforeNonBatched = performance.now();
  let dummyCounter1 = 0;
  for (var i = 0; i < 50000; ++i) {
      const value = JSON.parse(await cNonBatchd.nextMessage()).item;
      if (value > 0.5) dummyCounter1 += 1;  // Prevent the optimizer from optimizing this loop
  }
  console.log("Non batched took: " + (performance.now() - beforeNonBatched) + "ms (" + dummyCounter1 + ")");
})();
```

I mean, in all fairness, the code that I shared is worse than what would actually happen in reality, because it doesn't register the next message into the callback queue until the previous message has been processed. So, in reality, things wouldn't be as bad as they may seem from this code. Feel free to adjust it further into code that mimics registering thousands of macrotasks from an external I/O process; although, if you think about it, that's exactly what the code of my initial experiment does, so... 🤷‍♂️
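For what it's worth, a sketch of one such adjustment (hypothetical; floodNonBatched is a made-up helper): schedule every message as its own macrotask up front, instead of only after the previous one was handled.

```js
// Mimic an external source flooding the task queue: register every message
// as a separate macrotask immediately, then wait for all of them to run.
function floodNonBatched(messages, onMessage) {
  return new Promise((resolve) => {
    if (messages.length === 0) return resolve();
    let remaining = messages.length;
    for (const msg of messages) {
      setTimeout(() => {               // one macrotask per message
        onMessage(JSON.parse(msg));
        if (--remaining === 0) resolve();
      }, 0);
    }
  });
}

// Usage, reusing the data from the snippets above:
// await floodNonBatched(nonBatched, (parsed) => { /* count/store */ });
```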

> Extracting the JSON-RPC responses from the smoldot client wasm is very fast. Giving these JSON-RPC responses to the API user of smoldot is very fast. The only thing that might be slow (I would need to benchmark first) is the situation where you pass a portToWorker in the config; in that case, smoldot sends items one by one through a MessagePort. This, however, could easily be changed in the code that I've linked so that it sends multiple responses at the same time.

Hopefully my previous explanations have clarified that the overhead I'm referring to is the overhead of registering thousands of macrotasks into the event loop via I/O operations, which is something that even the most improved version of smoldot will run into if the instance is running outside of the main thread.

@tomaka
Contributor

tomaka commented Jun 15, 2023

> Which is something that even the most improved version of smoldot will run into if the instance is running outside of the main thread.

Well, no, because smoldot would batch the messages, send them between threads, then un-batch them.

While I'm not convinced that it's an actual performance issue with smoldot, the WebSocket message reception thing might be a good reason to actually do the change.

@josepot
Contributor Author

josepot commented Jun 15, 2023

> the WebSocket message reception thing might be a good reason to actually do the change.

🙌

> Well, no, because smoldot would batch the messages, send them between threads, then un-batch them.

Oh! I see what you are saying.

I was under the wrong impression that, when using smoldot over a Worker, the messages passed from the worker to the main thread were JSON-RPC messages compliant with this spec.

I didn't realize that the smoldot code that's meant to run on the main thread is also responsible for creating the JSON-RPC messages... I see. I thought that the "only" responsibility of the code on the main thread was to proxy the WebRTC API to the worker, and that what was coming out of the worker were already these JSON-RPC messages. But now that I think about it, that doesn't have to be the case at all, because in the end only the code in the main thread can create the client instance. I now understand what you were saying. 🤝

However, as you already mentioned, this is something quite specific to this implementation of smoldot, and I think it's fair to assume that it won't be the case for other clients. Also, perhaps this change would allow smoldot to move even more logic into the worker? I'm not sure if that makes sense, though...

Anyway, I think that we both agree now that this change does make sense.

@tomaka
Contributor

tomaka commented Jun 15, 2023

> I didn't realize that the smoldot code that's meant to run on the main thread is also responsible for creating the JSON-RPC messages...

The code running in the worker thread creates the JSON-RPC messages, and the main thread only receives them. But smoldot knows how many messages are available in the queue and can send the entire queue between threads at once, as an array of strings.

> However, as you already mentioned, this is something quite specific to this implementation of smoldot, and I think it's fair to assume that it won't be the case for other clients.

I disagree. I think that the slowness in the WebSocket situation is a JavaScript issue.

From a purely theoretical standpoint, sending many small WebSocket messages or one big WebSocket message should be roughly the same speed. All you're doing when receiving WebSocket messages is parsing data coming from a TCP stream. The parsing speed should be roughly proportional to the total number of bytes in all the messages combined, not to the number of messages.

The fact that in JS the speed is roughly proportional to the number of WebSocket messages should be considered an issue, not something normal.
