
Threaded dispatcher loop #23

Closed

Conversation

@Archmonger (Contributor) commented Oct 27, 2021

This PR runs the dispatcher in a thread.

Merging this pull request will significantly increase scalability. For example, it would allow servicing requests in scenarios where there are hundreds of clients connected to a single webserver.

Details

  • Run the renderer within a single self._idom_dispatcher_thread (see the sketch after this list)
  • Kill off render threads upon websocket disconnection.
  • Replace asyncio.Queue with a thread-safe async queue, janus.Queue().async_q
  • Remove the redundant super() call in disconnect()
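
A minimal sketch of the approach, assuming a Channels AsyncJsonWebsocketConsumer (this is not the exact diff; the names mirror the bullet points above, and real code would need to handle events that arrive before the dispatcher's queue exists):

import asyncio
from threading import Thread

import janus
from channels.generic.websocket import AsyncJsonWebsocketConsumer


class IdomConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await super().connect()
        self._idom_recv_queue = None
        # Run the dispatch loop inside its own event loop on a daemon thread,
        # so rendering never blocks the consumer's event loop.
        self._idom_dispatcher_thread = Thread(
            target=asyncio.run, args=(self._run_dispatch_loop(),), daemon=True
        )
        self._idom_dispatcher_thread.start()

    async def disconnect(self, code):
        # A None sentinel is one simple way to make the dispatch loop (and
        # therefore its thread) exit on disconnect.
        if self._idom_recv_queue is not None:
            self._idom_recv_queue.sync_q.put_nowait(None)

    async def receive_json(self, content, **kwargs):
        if self._idom_recv_queue is not None:  # real code must handle this race
            # sync_q is the thread-safe side; put_nowait won't block this loop
            self._idom_recv_queue.sync_q.put_nowait(content)

    async def _run_dispatch_loop(self):
        # Created here so async_q is bound to the dispatcher thread's loop.
        self._idom_recv_queue = janus.Queue()
        while (event := await self._idom_recv_queue.async_q.get()) is not None:
            ...  # hand `event` to the IDOM dispatcher / layout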

@Archmonger (Contributor Author) commented Oct 27, 2021

It's a little tricky to do this, since we need a reliable way to exit _run_dispatch_loop upon disconnect.

I've been investigating wrappers for Thread that incorporate termination, but I'm still debugging some issues with that implementation.

@Archmonger (Contributor Author) commented Oct 28, 2021

Putting the dispatch into a Thread(target=asyncio.run, ...) appears to prevent components from updating when new items are added to the _idom_recv_queue.

My research shows that asyncio.Queue is not thread-safe.
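
For reference, a minimal illustration of the thread-safety point: the standard way to hand an item to an asyncio.Queue from another thread is to schedule the put on the loop that owns the queue rather than touching the queue directly (the helper name here is just for illustration):

import asyncio

def put_from_thread(loop: asyncio.AbstractEventLoop, queue: asyncio.Queue, item) -> None:
    # Schedules queue.put_nowait(item) on the loop that owns the queue,
    # instead of calling the queue directly from a foreign thread.
    loop.call_soon_threadsafe(queue.put_nowait, item)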

@Archmonger (Contributor Author) commented Oct 28, 2021

asyncio.Queue has been replaced with a thread-safe queue library (janus).

@Archmonger Archmonger marked this pull request as ready for review October 28, 2021 02:24
@Archmonger Archmonger requested a review from a team as a code owner October 28, 2021 02:24
@Archmonger Archmonger linked an issue Oct 28, 2021 that may be closed by this pull request
@Archmonger Archmonger self-assigned this Oct 28, 2021
Resolved review threads (outdated): requirements/pkg-deps.txt; src/django_idom/websocket_consumer.py (×2)
@Archmonger (Contributor Author)

After some testing, janus.Queue performance isn't too great...
I'll toy around with ways to use the asyncio queue in some thread-safe fashion.

@Archmonger (Contributor Author)

I'm wondering if the threading belongs inside IDOM core's render_json_patch rather than up here in Django land.

That would allow us to use the more performant queue in Django-IDOM and avoid the mess of running a full async coroutine inside a thread.

@Archmonger Archmonger marked this pull request as draft October 30, 2021 08:23
@rmorshea (Contributor)

You need to thread the whole dispatch loop because each render is inherently sequential.

I'm starting to wonder whether this is premature optimization. Perhaps it would be best to wait on this until there's a demonstrable need for it.

@Archmonger (Contributor Author) commented Oct 30, 2021

Summary

The current implementation takes roughly 1.4 - 2.3s to render the page while blocking (given the four modules on the test server).

The threaded version seems to render in 0.5 - 1.2s, and a quick test showed those numbers weren't impacted by adding a second client.

The first number in each estimate below was automatically generated by Chromium Lighthouse; the second was measured by comparing django-idom-client.js load completion against VictoryBar's first paint.

django-idom@main web module uncached

Takes 1.6 - 2.3s to render. During this time, further requests are blocked because the bulk of the work is sync code.

archmonger/django-idom@threaded-dispatcher-loop web module uncached

Takes 1.0 - 1.2s to render non-blocking.

django-idom@main web module cached

Takes 1.4 - 1.9s to render. This is the current implementation in the best case scenario.

archmonger/django-idom@threaded-dispatcher-loop web module cached

Takes 0.5 - 0.6s to render non-blocking. A rudimentary test of running two clients at once didn't seem to change these numbers.

@Archmonger (Contributor Author) commented Oct 30, 2021

> I'm starting to wonder whether this is premature optimization. Perhaps it would be best to wait on this until there's a demonstrable need for it.

Being limited to half a page load per second for a relatively simple web layout is pretty rough.

That would definitely limit Django-IDOM adoption.

@rmorshea (Contributor) commented Nov 1, 2021

This doesn't quite make sense to me - rendering the page with IDOM should be a CPU-limited task, so threading shouldn't make the page load any faster. Maybe we should meet to discuss these results?

@Archmonger (Contributor Author) commented Nov 1, 2021

Well, in the case of web modules that work is IO-bound. And any sync code completely stalls Django's WebSocket consumer event loop.

We can schedule some time to discuss.

Weekdays are going to be rough for me; I'm usually only free after 8. But I should have better availability on weekends.

@rmorshea (Contributor) commented Nov 1, 2021

I could meet after 8 at some point this week if you're able. I think I just want to understand exactly what's happening so I can come up with the best solution to this.

@Archmonger (Contributor Author)

Based on this gist I've determined I'm doing the whole asyncio-in-thread thing wrong. Will rewrite this PR to do it properly sometime this week.

I currently don't have any plans for after 8 M-Th, so feel free to throw something down on my calendar to discuss.

@Archmonger (Contributor Author) commented Nov 1, 2021

I'm not sure this threading implementation will ever get merged, mostly because thread-safe queue performance is awful.

First paint times are improved a lot by this, but IDOM component updates take a huge hit.

I believe the current performance boost comes from not blocking the WS event loop, which allows rendering to continue during IO operations such as WebSocket send/receive and file read/write.

I've tried writing an async wrapper for queue.Queue, but it had similarly bad performance. The thread-safe queue in aioprocessing also performed roughly the same as janus.

My last shot is going to be funneling all queue operations through a dedicated asyncio.Queue thread.
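
For context, an async wrapper over queue.Queue of the kind mentioned above might look like the sketch below (a hypothetical illustration, not the code that was tested). Every get pays for a hop through the default thread-pool executor, which lines up with the poor performance observed:

import asyncio
import queue


class AsyncQueueWrapper:
    def __init__(self):
        self._queue = queue.Queue()

    async def get(self):
        # queue.Queue.get blocks, so run it in a worker thread to keep the
        # event loop free; this executor hop is the per-item overhead.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self._queue.get)

    def put_nowait(self, item):
        # Puts on an unbounded queue.Queue never block, so no executor needed.
        self._queue.put_nowait(item)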

@rmorshea (Contributor) commented Nov 1, 2021

I wonder if this problem could be eased by introducing some utilities to make rendering some parts of the UI in a thread easier. For example, maybe introducing a use_thread hook:

@component
def MyComponent():
    view = use_thread(MainView, FallbackView)
    ...

that could be used in a higher-level decorator:

@component
@threaded(FallbackView)
def MainView():
    ...

@Archmonger (Contributor Author) commented Nov 1, 2021

I'm not sure that could solve the situation. If Thread.join() is called within those decorators/hooks, then we're back to square one, since join blocks the event loop.

We'd either need to write things in a way that doesn't rely on joining threads, or write some kind of custom thread wrapper that has a non-blocking join.
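
For what it's worth, a "non-blocking join" can be approximated without a custom wrapper by awaiting the join inside an executor; a minimal sketch (the helper name is hypothetical):

import asyncio
from threading import Thread


async def join_without_blocking(thread: Thread) -> None:
    # Thread.join blocks, so wait for it on the default executor instead of
    # blocking the event loop.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, thread.join)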

@rmorshea (Contributor) commented Nov 2, 2021

Ok, so I'm realizing the threading solution would have some limitations (in short, the decorator idea wouldn't work), but it could still work for some things. Basically, use_thread would look something like this:

import asyncio
from threading import Thread

from idom import use_effect, use_state  # hooks; exact import path may vary by idom version

def use_thread(default, function):
    state, set_state = use_state(default)

    @use_effect
    async def run_in_thread():
        loop = asyncio.get_event_loop()
        queue = asyncio.Queue()
        # Run `function` in a worker thread; hand its result back to this loop
        # via call_soon_threadsafe, since asyncio.Queue is not thread-safe.
        Thread(
            target=lambda: loop.call_soon_threadsafe(queue.put_nowait, function()),
            daemon=True,
        ).start()
        set_state(await queue.get())

    return state

Usage would look like:

@component
def MyComponent():
    value = use_thread(default=None, function=compute_expensive_value)
    if value is None:
        return FallbackView()
    else:
        return TheActualView(value)

Initially we'd display the FallbackView, but once compute_expensive_value completes, value will be updated to its result and we'll display TheActualView.

@rmorshea (Contributor) commented Nov 2, 2021

Another way to improve this situation could be to simply yield control back to the event loop more frequently while rendering. Presently, when the layout begins a rendering task, it does not yield control back to the event loop until the update is complete. We could, however, make it yield back after the render of each component completes. This would prevent the server from locking up once a render starts.

You'll note that this call within Layout.render is to a sync function, so we basically lock the event loop until that completes. By making that, and some other internal methods, async instead, we can yield back more often.
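
As a rough illustration of that idea (a simplified loop, not IDOM's actual Layout internals), yielding after each component render would look something like:

import asyncio


async def render_update(components):
    patches = []
    for component in components:
        patches.append(component.render())  # sync, CPU-bound render work
        await asyncio.sleep(0)  # yield so the event loop can service other tasks
    return patches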

@Archmonger (Contributor Author) commented Nov 2, 2021

I think we should handle performance concerns like this within the IDOM framework as best we can, rather than putting the burden on the developer to optimize. So the hook isn't something I'd prefer.

The options I would consider are:

  1. Continue down the path of threading specific to DJ IDOM
    • Only reasonable if I can figure out how to get cross-thread queues working in a performant manner without monkey-patching IDOM Core
    • Won't be able to provide any performance benefit for the rest of IDOM's supported servers
  2. Threading the entire TaskGroup within dispatch_single_view (if possible)
    • Good option that would also prevent bad developer decisions (such as sleeps) from impacting performance.
    • Would still run into the same limitation of asyncio.Queue not being thread-safe, but at least it's much easier to resolve that within IDOM Core (since that's where Queue.get() is actually being called) than up here in DJ IDOM
  3. Yielding to the event loop in between renders
    • Doesn't protect against "bad code" in component def (ex. sleeps, IO bound operations)
    • Doesn't allow renders to occur during WS receive/send
    • Note: The idea of protecting against bad code is the main reason the Django team is pushing to use thread pools for running sync HTTP views

@Archmonger (Contributor Author)

I'm going to put this PR on hiatus to focus on Conreq Core. Trying to get a release out by the end of December.

@Archmonger (Contributor Author) commented Nov 19, 2021

Just an idea...

A fix for this could also be done upstream in django/channels. It looks like the channels team expects async websockets to have no sync code within their methods. Perhaps WS receive/send execution belongs in a thread?

Django 4.0 comes out December 6th, and it improves the request handler by running each independent request within its own thread. I don't know if that extends to the websocket request handler though, so I suppose I'll wait and see.

@Archmonger (Contributor Author) commented Dec 22, 2021

The new implementation avoids janus.

It utilizes a backhaul thread (running its own asyncio event loop) to execute _run_dispatch_loop and _recv_queue_put.

Unfortunately, the results are the same as the previous implementation: it vastly improves concurrency, but adds latency to layout updates.
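
For reference, the general shape of a backhaul thread (a hedged sketch of the pattern, not the exact PR code) is a dedicated event loop running forever on a daemon thread, with coroutines submitted to it via asyncio.run_coroutine_threadsafe:

import asyncio
import concurrent.futures
from threading import Thread

# Dedicated "backhaul" event loop running on its own daemon thread.
backhaul_loop = asyncio.new_event_loop()


def _run_backhaul() -> None:
    asyncio.set_event_loop(backhaul_loop)
    backhaul_loop.run_forever()


Thread(target=_run_backhaul, daemon=True).start()


def run_on_backhaul(coro) -> concurrent.futures.Future:
    # Thread-safe: e.g. run_on_backhaul(self._run_dispatch_loop()) from the
    # consumer, or run_on_backhaul(_recv_queue_put(event)) on receive.
    return asyncio.run_coroutine_threadsafe(coro, backhaul_loop)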

@rmorshea (Contributor)

Given that this is a trade-off, can we add an option that lets someone choose which behavior they want? Maybe IDOM_RUN_IN_THREAD?

@Archmonger (Contributor Author) commented Dec 23, 2021

I'd say yes, depending on how things turn out with options two and three.

I'm going to put together a quick test to measure the performance gains of this implementation. The simplest form would be rendering 100+ components on the page and comparing the threaded and non-threaded implementations.

Side Note (pure speculation)

It seems like the first layout render (the dispatch_single_view execution stack) takes significantly longer than subsequent layout updates. Blocking the WS event loop for that extended duration appears to be the main concurrency problem here. Subsequent layout updates appear (untested) to be trivial in execution time.

Is there a significant amount of sync code that runs on the first layout paint? Could we asyncify some parts of the first-paint stack, or thread the parts that need to stay sync?

I might need to profile the dispatch loop using Yappi to dig deeper on this.
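
If it helps, a minimal Yappi setup for that would be wall-clock profiling (so the async waits are captured too), wrapped around a first-paint run:

import yappi

yappi.set_clock_type("wall")
yappi.start()
# ... exercise the first paint / dispatch loop here ...
yappi.stop()
yappi.get_func_stats().sort("ttot").print_all()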

@Archmonger (Contributor Author) commented Dec 23, 2021

Test Case

  <h1>IDOM Test Page</h1>
  <!-- repeat_val = range(25) -->
  {% for x in repeat_val %}
  <div>{% idom_component "test_app.components.HelloWorld" class="hello-world" %}</div>
  <div>{% idom_component "test_app.components.Button" class="button" %}</div>
  <div>{% idom_component "test_app.components.ParametrizedComponent" class="parametarized-component" x=123 y=456 %}</div>
  <div>{% idom_component "test_app.components.SimpleBarChart" class="simple-bar-chart" %}</div>
  {% endfor %}

Results

Test Case                    | Seconds Elapsed
-----------------------------|----------------
100 Components (Threaded)    | 5.22s
100 Components (Main branch) | 5.88s

Analysis

With a performance gain of only 11% on first paint, I'm going to close off this PR.

The results are more noticeable at smaller quantities (as seen in the previous test), but bulk testing is more important in this scenario. Perhaps it's the first WS handshake with the server that is expensive?

It's not worth adding noticeable latency to subsequent layout updates for a trivial improvement in first paint.

These results also tell us that Django-IDOM can first-paint roughly 18 components per second (using the development webserver).

@Archmonger Archmonger closed this Dec 23, 2021
@rmorshea (Contributor)

Thanks for looking into this!

@Archmonger Archmonger deleted the threaded-dispatcher-loop branch July 3, 2022 08:49
@Archmonger Archmonger mentioned this pull request Aug 13, 2023