
Poor Performance of MariaDB in Multi-thread Scenario #853

Closed
qijiax opened this issue Aug 22, 2022 · 42 comments

@qijiax

qijiax commented Aug 22, 2022

Description of the problem

We used sysbench to benchmark MariaDB performance on non-Gramine, gramine-direct, and gramine-sgx, varying the number of sysbench threads to change the workload. The TPS statistics show poor MariaDB performance in Gramine:

Threads                       1         2         4         8         16        32        64
Non-Gramine                   1471.55   2893.56   5793.04   11036     22239.17  43167.14
Gramine-Direct                1220.77   2395.46   4482.95   5936.35   5936.22   5359.13
Gramine-SGX                   802.45    1407.57   2873.46   4293.81   5897.95   3048.24   1549.59
Gramine-SGX w/ rpc_thread     1270.04             3698.22

With 1 or 2 threads, gramine-sgx shows a 40-50% performance drop compared with non-Gramine; we consider that the overhead of Gramine and SGX. However, as the thread count grows, Gramine's TPS reaches its top and then drops rapidly.
We tried setting up rpc_thread, but that did not help the maximum TPS.

Is the bottleneck in gramine or SGX enclave? How can we avoid that?

The perf top record:
gramine-sgx 8 threads:
Picture1
gramine-sgx 32 threads:
Picture2
gramine-sgx 64 threads:
Picture3

Steps to reproduce

System: CentOS8

Setup steps: refer to link.

Some setting in manifest:

loader.insecure__use_cmdline_argv = true
sys.enable_sigterm_injection = true
sgx.nonpie_binary = true
sgx.enclave_size = "64G"
sgx.thread_num = 64
sgx.rpc_thread_num = 64
sgx.require_avx = true
sgx.require_avx512 = true
sgx.require_pkru = true
sgx.require_amx = true
libos.check_invalid_pointers = false
sys.stack.size = "16M"
sgx.preheat_enclave = true
loader.pal_internal_mem_size = "32G"
sgx.file_check_policy = "allow_all_but_log"

@svenkata9
Contributor

I think this is due to frequent file-access-related system calls that need to cross the trusted boundary and be served by the untrusted side. Also, as we discussed via email, the unavailability of Unix domain sockets adds additional overhead.

Can you please capture an strace log to confirm?

@sahason

@dimakuv
Contributor

dimakuv commented Aug 22, 2022

Interesting performance numbers. Thanks for doing this effort @qijiax.

  1. First, I would concentrate on the poor performance of gramine-direct. Already with 2 threads, its max achievable performance is 2395.46 / 2893.56 = 0.83, or 83% of native performance. And it only gets worse with 4, 8, ... threads. So, I would analyze the performance of gramine-direct first, without the additional effect of gramine-sgx.
  2. You use the non-instrumented perf record on gramine-sgx, which only captures the non-SGX-enclave behavior. What you really want to do is to capture the SGX-enclave behavior, and for this you need to enable Gramine accordingly, please see https://gramine.readthedocs.io/en/latest/performance.html#sgx-profiling

@qijiax
Author

qijiax commented Aug 22, 2022

Interesting performance numbers. Thanks for doing this effort @qijiax.

  1. First, I would concentrate on the poor performance of gramine-direct. Already with 2 threads, its max achievable performance is 2395.46 / 2893.56 = 0.83, or 83% of native performance. And it only gets worse with 4, 8, ... threads. So, I would analyze the performance of gramine-direct first, without the additional effect of gramine-sgx.
  2. You use the non-instrumented perf record on gramine-sgx, which only captures the non-SGX-enclave behavior. What you really want to do is to capture the SGX-enclave behavior, and for this you need to enable Gramine accordingly, please see https://gramine.readthedocs.io/en/latest/performance.html#sgx-profiling

I grabbed the perf record on gramine-direct. There is a spinlock there.

1 thread:
direct 1thread
4 threads:
direct 4thread
16 threads:
direct 16thread

@boryspoplawski
Contributor

@qijiax What build are you using? Could you try current master with --buildtype=debugoptimized and then add --call-graph dwarf to perf record? Would be useful to have call graphs at the interesting points.

I grabbed the perf record on gramine-direct. There is a spinlock there.

This is a spinlock inside the host kernel. It really doesn't matter (at least for now). Btw, what kernel do you use exactly?

Judging from these traces it's most likely high contention on some lock in LibOS (which translates to PalEvent{Wait,Set} -> host futex)

@qijiax
Author

qijiax commented Aug 22, 2022

I think this is due to frequent file-access-related system calls that need to cross the trusted boundary and be served by the untrusted side. Also, as we discussed via email, the unavailability of Unix domain sockets adds additional overhead.

Can you please capture an strace log to confirm?

@sahason

I tried running sysbench in the same enclave as the server by wrapping the command into a script. The performance improved by about 10%, but is still far from non-Gramine.
As in my reply to dimakuv, the performance gap between non-Gramine and gramine-direct seems to be because of the spinlock.
From the gramine-sgx hotspots, the per-thread performance is highly related to the percentage of "do_syscall", so I am pretty sure the poor SGX performance has the same cause.
sgx multi-thread

@boryspoplawski
Contributor

As in my reply to dimakuv, the performance gap between non-Gramine and gramine-direct seems to be because of the spinlock.

What spinlock? Why do you keep talking about some spinlock?
Also, these traces do not make sense; they only measure the untrusted part. See @dimakuv's comment.

@qijiax
Author

qijiax commented Aug 22, 2022

@qijiax What build are you using? Could you try current master with --buildtype=debugoptimized and then add --call-graph dwarf to perf record? Would be useful to have call graphs at the interesting points.

I grabbed the perf record on gramine-direct. There is a spinlock there.

This is a spinlock inside the host kernel. It really doesn't matter (at least for now). Btw, what kernel do you use exactly?

Judging from these traces it's most likely high contention on some lock in LibOS (which translates to PalEvent{Wait,Set} -> host futex)

I'm now using the latest master branch. And I'm using the 5.17.0-intel-next+ kernel.
Can these flame graphs help you?
As the number of threads grows, the percentage of MariaDB samples drops.

Snipaste_2022-08-22_19-20-04
Snipaste_2022-08-22_19-19-25
Snipaste_2022-08-22_19-20-54

@boryspoplawski
Contributor

No, they don't really help. All we can see is that there are a lot of host-level futex calls, which we already knew before.

@dimakuv
Contributor

dimakuv commented Aug 22, 2022

@qijiax The flame graphs show that there is indeed some problem with futexes (it is clearly seen on the last graph with 16 threads).

However, these flame graphs do not show from where inside Gramine these futex usages originate. The graphs only show [unknown], which is not helpful. (I don't know why flame graphs can't "grab" Gramine debug symbols, but this is irrelevant for now.)

Could you try current master with --buildtype=debugoptimized and then add --call-graph dwarf to perf record? Would be useful to have call graphs at the interesting points.

I would kindly ask you to try this suggestion from @boryspoplawski. This will give us some insight into which part of Gramine calls these futexes.

@svenkata9
Contributor

@qijiax The MariaDB server can run with just a 2G enclave size. Could you reduce the enclave size in the manifest, and also the threads to 128? I think that could be the cause of the futexes.

@svenkata9
Contributor

I could repro the perf issue - adding strace and VTune data:
strace.txt
vtune_results_hotspots.txt

@qijiax
Author

qijiax commented Aug 24, 2022

@boryspoplawski @dimakuv Here's the hotspot on gramine-direct, with Gramine built with --buildtype=debugoptimized and using --call-graph dwarf in perf record.
direct perf call-graph

I don't know why flame graphs can't "grab" Gramine debug symbols, but this is irrelevant for now.

There is an error when I set sgx.debug = true, sgx.profile.enable = "main", and sgx.profile.with_stack = true in the manifest:
error: sgx_profile_report_elf([vdso_libos]): realpath failed

@dimakuv
Contributor

dimakuv commented Aug 24, 2022

Looks good now. Could you look deeper into the "16 threads" report? In particular, could you unwrap the first 2-3 items, so that we can see the stack traces (= the callers of these funcs libos_syscall_entry, libos_emulate_syscall, etc.)?

@qijiax
Author

qijiax commented Aug 24, 2022

Looks good now. Could you look deeper into the "16 threads" report? In particular, could you unwrap the first 2-3 items, so that we can see the stack traces (= the callers of these funcs libos_syscall_entry, libos_emulate_syscall, etc.)?

gramine-direct perf call-graph 16thread.txt

@dimakuv
Contributor

dimakuv commented Aug 24, 2022

Thanks @qijiax, now this gives us a lot of interesting info.

I am sure the problem is in our sub-optimal locking during send/recv on TCP/IP sockets. One excerpt from the perf report:

         - 41.87% libos_syscall_recvfrom
            - 38.67% do_recvmsg
               - 38.50% recv
                  - 16.67% malloc
                     - slab_alloc (inlined)
                        - 9.06% lock (inlined)

So we have this malloc in recv() callback of TCP/IP:

struct pal_iovec* pal_iov = malloc(iov_len * sizeof(*pal_iov));

Which calls the Slab allocator of LibOS, which grabs the lock:

SYSTEM_LOCK();

The SYSTEM_LOCK macro in this case is this global lock:

#define SYSTEM_LOCK() lock(&slab_mgr_lock)

This is an interesting avenue for optimizations. Our memory allocator is too dumb and uses global locking on every malloc/free, and we have a lot of malloc/free in the send/recv LibOS paths. @boryspoplawski @mkow @kailun-qin What do you think?
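To make the contention concrete, below is a minimal illustrative sketch (not Gramine's actual allocator; all names such as fast_alloc, global_slab_alloc, and tls_objs are hypothetical) of the kind of per-thread front cache that would let the common malloc/free path avoid the global slab lock, taking it only on the slow path:

/* Illustrative sketch only -- NOT Gramine's allocator. It shows the idea of a
 * small per-thread front cache so that the global slab lock (SYSTEM_LOCK /
 * slab_mgr_lock above) is taken only on the slow path instead of on every
 * malloc/free. For simplicity it handles a single fixed size class, and the
 * "global slab allocator" below is just a mutex-guarded malloc/free stand-in. */
#include <pthread.h>
#include <stdlib.h>

#define OBJ_SIZE    64
#define CACHE_BATCH 32

static pthread_mutex_t slab_mgr_lock = PTHREAD_MUTEX_INITIALIZER;

static void* global_slab_alloc(void) {
    pthread_mutex_lock(&slab_mgr_lock);   /* the contended global lock */
    void* obj = malloc(OBJ_SIZE);
    pthread_mutex_unlock(&slab_mgr_lock);
    return obj;
}

static void global_slab_free(void* obj) {
    pthread_mutex_lock(&slab_mgr_lock);
    free(obj);
    pthread_mutex_unlock(&slab_mgr_lock);
}

/* Per-thread cache: the fast path never touches slab_mgr_lock. */
static __thread void*  tls_objs[CACHE_BATCH];
static __thread size_t tls_count;

void* fast_alloc(void) {
    if (tls_count > 0)
        return tls_objs[--tls_count];     /* common case: lock-free */
    return global_slab_alloc();           /* slow path: global lock */
}

void fast_free(void* obj) {
    if (tls_count < CACHE_BATCH) {
        tls_objs[tls_count++] = obj;      /* common case: lock-free */
        return;
    }
    global_slab_free(obj);                /* slow path: global lock */
}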

@svenkata9
Contributor

I tried a couple of things -
Restricting to one NUMA node and using the OMP options OMP_NUM_THREADS=32 KMP_AFFINITY=granularity=fine,verbose,compact,1,0 KMP_BLOCKTIME=1 KMP_SETTINGS=1 DEBUG=0, 16T did a little better and 32T did 2x better (~11000 tps). But beyond that, with 64 threads there is no improvement and it comes down drastically. Even with this 2x improvement for 32T, gramine-sgx's throughput is only 25% of what the native run yields.

Further, when I used tcmalloc, that gave an additional boost of ~500 tps.

@svenkata9
Contributor

Let me add this observation also - the above one is with gramine-sgx. None of these helped gramine-direct; the numbers remain very poor. So, the bottlenecks for gramine-direct may not be the same as for gramine-sgx.

@qijiax
Author

qijiax commented Aug 24, 2022

Following the suggestion from @svenkata9, I re-benchmarked in my environment.

Threads         1        2        4        8         16        32        64
non-gramine     1625.24  3214.88  6360     12646.55  24800.75  46771.27  67288.27
gramine-direct  1457.85  2765.94  4896.05  7745.49   10147.87  8660.04   3959.38
gramine-sgx     914.35   1727.94  3162.01  5278.77   8319.01   10798     4520.98

This result matches Sankar's observation. The performance scales from 1-16 threads, but another bottleneck shows up when threads exceed 32.

@svenkata9
Contributor

svenkata9 commented Aug 24, 2022

This is an extract from lscpu in my system

NUMA node0 CPU(s):               0-35,72-107
NUMA node1 CPU(s):               36-71,108-143

There is a strong correlation here between the NUMA cores and the performance. When the number of threads passed to sysbench exceeds 35 (cores per socket is 36), I see the TPS start dropping.

@qijiax Could you try the same on your system? I think your system has 56 cores per socket. Perhaps you can try sysbench --threads=55 ... and see if this theory is correct on your side?

BTW, I could achieve the same results with KMP_AFFINITY=disabled numactl --cpunodebind=0 --membind=0 gramine-sgx mysqld

@boryspoplawski
Contributor

The problem is as @dimakuv mentioned; you cannot work around it using any kind of CPU pinning.

The most straightforward thing to do is to remove:

/* This could be just `struct iovec` from Linux, but we don't want to set a precedent of using Linux
(or at least make sure it's binary compatible and force a type cast). This would get rid of all the mallocs and should bring down the overhead. Let me create a PR for you to test.
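For illustration, here is a minimal sketch of the "binary compatible + cast" idea described above. It is not the actual PR: the layout checks and the pal_stream_recv_iov/recv_no_copy names are hypothetical stand-ins, and only show how matching layouts let the per-call malloc and copy of the iovec array be dropped.

/* Sketch only, not the actual patch: if `struct pal_iovec` is binary-compatible
 * with Linux's `struct iovec`, the caller's array can be passed straight
 * through after a compile-time layout check, removing the malloc seen in the
 * perf report. `pal_stream_recv_iov` is a hypothetical stand-in for the PAL call. */
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>   /* struct iovec */

struct pal_iovec {
    void*  iov_base;
    size_t iov_len;
};

static_assert(sizeof(struct pal_iovec) == sizeof(struct iovec),
              "pal_iovec must be binary-compatible with iovec");
static_assert(offsetof(struct pal_iovec, iov_base) == offsetof(struct iovec, iov_base),
              "iov_base offset mismatch");
static_assert(offsetof(struct pal_iovec, iov_len) == offsetof(struct iovec, iov_len),
              "iov_len offset mismatch");

/* Hypothetical PAL-level receive; a real one would issue the host recvmsg. */
static long pal_stream_recv_iov(struct pal_iovec* iov, size_t iov_cnt) {
    (void)iov; (void)iov_cnt;
    return 0;
}

/* Before: `struct pal_iovec* pal_iov = malloc(iov_len * sizeof(*pal_iov));`
 * plus a copy loop on every recv. After: no allocation, just a cast. */
long recv_no_copy(struct iovec* iov, size_t iov_cnt) {
    return pal_stream_recv_iov((struct pal_iovec*)iov, iov_cnt);
}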

@boryspoplawski
Contributor

@svenkata9
Contributor

In my quick testing, this doesn't change the numbers for gramine-direct. For gramine-sgx, it makes them a little worse than the numbers that @qijiax mentioned first. Even with the OMP_NUM_THREADS option, the numbers are a little worse than what I see on my system (with this branch).

@boryspoplawski
Contributor

It doesn't make any sense; this commit shouldn't worsen things, that shouldn't even be possible...
Can you provide a perf report with call graphs (similar to before)?

@qijiax
Author

qijiax commented Aug 25, 2022

Thanks for the patch, @boryspoplawski.
I am using the command numactl --cpubind=0 --membind=0 gramine-direct mariadbd --user=root
I have a similar result to Sankar's: the performance has a small drop, but only when threads exceed 32. With the patch, it reaches a higher maximum performance and the performance scaling is better.

Threads         1        2        4        8        16        32       64
gramine-direct  1473.38  2815.31  5400.74  9691.98  12125.77  7306.34  3682.22
gramine-sgx     917.08   1790.93  3505.13  6787.85  12531.68  7782.85  4211.91

I recorded the perf info of gramine-direct in 32T:

- 90.64% libos_syscall_entry                                                
   - 90.60% libos_emulate_syscall                                           
      - 51.46% libos_syscall_clock_gettime                                  
         - 50.64% is_in_adjacent_user_vmas                                  
              49.53% spinlock_lock (inlined)                                
            + 1.10% _traverse_vmas_in_range                                 
         - 0.81% _PalSystemTimeQuery                                        
            - do_syscall (inlined)                                          
               - 0.73% entry_SYSCALL_64_after_hwframe                       
                  - 0.72% do_syscall_64                                     
                       0.59% syscall_enter_from_user_mode                   
      - 24.51% libos_syscall_recvfrom                                       
         - 22.29% is_in_adjacent_user_vmas                                  
              21.81% spinlock_lock (inlined)                                
         + 1.89% do_recvmsg                                                 
      - 12.86% libos_syscall_sendto                                         
         - 10.26% is_in_adjacent_user_vmas                                  
              10.04% spinlock_lock (inlined)                                
         + 2.39% do_sendmsg                                                 
      + 1.64% libos_syscall_poll                                            

The percentages of do_sendmsg and do_recvmsg decreased a lot, but now the hotspot is is_in_adjacent_user_vmas.

@dimakuv
Contributor

dimakuv commented Aug 25, 2022

@qijiax Can you set libos.check_invalid_pointers = false in your manifest and re-run? The is_in_adjacent_user_vmas() function is called to check for invalid pointers in syscall arguments.

See https://gramine.readthedocs.io/en/latest/manifest-syntax.html#check-invalid-pointers
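For context, a purely illustrative sketch (not Gramine's implementation; the range_covered_by_vmas name and data layout are made up) of what a check like is_in_adjacent_user_vmas() conceptually has to do, and why it is expensive: every pointer argument of every syscall gets validated by walking the VMA list under a lock, which is the spinlock_lock seen in the perf output above.

/* Purely illustrative, NOT Gramine's code: conceptual version of checking that
 * a syscall argument's [addr, addr + size) range is fully covered by adjacent
 * user VMAs. The relevant point for this thread is that the walk happens under
 * a lock on every recvfrom/sendto/clock_gettime pointer argument. */
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vma {
    uintptr_t   begin, end;   /* [begin, end), sorted by address */
    struct vma* next;
};

static struct vma*     g_vmas;
static pthread_mutex_t g_vma_lock = PTHREAD_MUTEX_INITIALIZER;  /* the hot lock */

static bool range_covered_by_vmas(uintptr_t addr, size_t size) {
    uintptr_t cur = addr, end = addr + size;

    pthread_mutex_lock(&g_vma_lock);
    for (struct vma* v = g_vmas; v && cur < end; v = v->next) {
        if (cur < v->begin)
            break;            /* gap before this VMA: range not covered */
        if (cur < v->end)
            cur = v->end;     /* covered up to the end of this VMA */
    }
    pthread_mutex_unlock(&g_vma_lock);

    return cur >= end;
}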

@dimakuv
Contributor

dimakuv commented Aug 25, 2022

libos.check_invalid_pointers = false

Oh, wait, looks like you already have this option set? This cannot be true. Please verify your manifest file again. I don't think you correctly set this option.

@qijiax
Author

qijiax commented Aug 25, 2022

Oh, wait, looks like you already have this option set? This cannot be true. Please verify your manifest file again. I don't think you correctly set this option.

@dimakuv You are right, this param was not set; it was commented out last time for debugging.
After adding it to the manifest, the is_in_adjacent_user_vmas() hotspot is gone and performance improves. The TPS now is:

Threads                          8        16        32       64
before                           9691.98  12125.77  7306.34  3682.22
check_invalid_pointers = false   9701.95  14954.93  10903.3  5695.37

There is still a spinlock in 64T; the perf record:

93.62% libos_syscall_entry                                  
 - 93.59% libos_emulate_syscall                             
    - 60.96% libos_syscall_recvfrom                         
       - 60.82% do_recvmsg                                  
          - 60.46% recv                                     
             - 60.45% recv                                  
                - 30.09% malloc                             
                   - 30.07% slab_alloc (inlined)            
                        29.79% spinlock_lock (inlined)      
                - 29.72% free                               
                   - slab_free (inlined)                    
                        29.65% spinlock_lock (inlined)      
                + 0.64% do_syscall (inlined)                
    - 31.88% libos_syscall_sendto                           
       - 31.63% do_sendmsg                                  
          - 31.62% send                                     
             - 31.61% send                                  
                - 15.10% malloc                             
                   + 15.09% slab_alloc (inlined)            
                - 14.92% free                               
                   + slab_free (inlined)                    

The hotspot comes back to do_sendmsg and do_recvmsg, but it is different from the previous one: instead of lock (inlined), it is now spinlock_lock (inlined) under malloc/free.

@dimakuv
Contributor

dimakuv commented Aug 25, 2022

@boryspoplawski Any idea where these malloc/free come from now?

@boryspoplawski
Contributor

Ah, yes, I only removed the translation in LibOS, forgot about PAL. @qijiax please try the same branch, I've pushed additional changes

@svenkata9
Contributor

svenkata9 commented Aug 25, 2022

This improves the performance, along with libos.check_invalid_pointers = false and the numactl setting restricting it to one NUMA node. Up to 64T there is good scaling as well (but still way off in perf w.r.t. native numbers).

But without numactl, I don't see any improvement.

@qijiax
Author

qijiax commented Aug 25, 2022

I also tried it in my environment; the performance improves. The TPS is 17217 in 32T, but drops to 16417 in 64T.
Below is the perf record of 64T gramine-direct. The do_recvmsg and do_sendmsg hotspots are removed. However, get_fd_handle shows up, and libos_emulate_syscall is still high.

82.36% libos_syscall_entry                                       
 - 82.28% libos_emulate_syscall                                  
    - 52.49% libos_syscall_recvfrom                              
       - 51.36% get_fd_handle                                    
          - 28.61% lock (inlined)                                
             - 28.28% _PalEventWait                              
                - 20.74% do_syscall (inlined)                    
                   + 20.70% entry_SYSCALL_64_after_hwframe       
                  7.14% spinlock_lock (inlined)                  
          - 22.64% unlock (inlined)                              
             + 22.58% _PalEventSet                               
       - 1.10% do_recvmsg                                        
          + 0.99% recv                                           
    - 28.82% libos_syscall_sendto                                
       - 25.54% get_fd_handle                                    
          - 14.22% lock (inlined)                                
             - 14.04% _PalEventWait                              
                - 10.15% do_syscall (inlined)                    
                   - 10.12% entry_SYSCALL_64_after_hwframe       
                      + 10.12% do_syscall_64                     
                  3.70% spinlock_lock (inlined)                  
          - 11.27% unlock (inlined)                              
             + 11.23% _PalEventSet                               
       - 3.25% do_sendmsg                                        
          + 3.21% send                                           
    + 0.63% libos_syscall_clock_gettime                          

@boryspoplawski
Contributor

libos_emulate_syscall is the entry point of every syscall; it will always show up.
For the current perf graph, it's fd handling that is the issue (basically converting an fd to a handle). I can take a look at this some day, maybe convert it to some lockless structure, but unfortunately not now, I have other tasks to do.
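As a rough illustration of the "lockless structure" direction only (not what Gramine or any planned rework actually does; fd_table, get_fd_handle_lockfree, and put_fd_handle are hypothetical names), the fd-to-handle lookup could read an atomic per-fd slot and pin the handle with a refcount, so readers never contend on a single map lock:

/* Rough sketch only: a lock-free read path for fd -> handle resolution.
 * Readers load an atomic slot and take a reference; writers install/remove
 * handles with atomic stores. The hard part -- a handle being freed between
 * the load and the refcount increment on a concurrent close -- is NOT solved
 * here; a real design would need RCU, hazard pointers, or similar. */
#include <stdatomic.h>
#include <stddef.h>

#define MAX_FDS 1024

struct handle {
    atomic_int refcount;
    /* ... handle payload (socket state, etc.) ... */
};

static _Atomic(struct handle*) fd_table[MAX_FDS];

struct handle* get_fd_handle_lockfree(int fd) {
    if (fd < 0 || fd >= MAX_FDS)
        return NULL;
    struct handle* h = atomic_load_explicit(&fd_table[fd], memory_order_acquire);
    if (h)
        atomic_fetch_add_explicit(&h->refcount, 1, memory_order_acq_rel);
    return h;
}

void put_fd_handle(struct handle* h) {
    /* Dropping the last reference would free the handle; omitted here. */
    atomic_fetch_sub_explicit(&h->refcount, 1, memory_order_acq_rel);
}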

@qijiax
Author

qijiax commented Aug 26, 2022

libos_emulate_syscall is the entry point of every syscall; it will always show up. For the current perf graph, it's fd handling that is the issue (basically converting an fd to a handle). I can take a look at this some day, maybe convert it to some lockless structure, but unfortunately not now, I have other tasks to do.

Thanks for helping, @boryspoplawski. BTW, will this patch be merged into the master branch?

@svenkata9
Contributor

@qijiax How is it performing with gramine-sgx? In my setup, it performs well up to 76 threads, after which it starts dropping. Again, my results are with numactl restricting it to one node for CPU and memory. Without it, the performance remains poor.

sysbench --threads=76 --tables=8 --table-size=1000000 --rand-type=uniform --report-interval=10 --time=60  --mysql-user=root --mysql-password=test --mysql-port=3306 --mysql-host=127.0.0.1 --mysql-db=sbtest /usr/share/sysbench/oltp_read_only.lua run
sysbench 1.0.18 (using system LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 76
Report intermediate results every 10 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 10s ] thds: 76 tps: 23533.54 qps: 376589.27 (r/w/o: 329515.49/0.00/47073.78) lat (ms,95%): 5.37 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 76 tps: 25831.34 qps: 413309.30 (r/w/o: 361646.01/0.00/51663.29) lat (ms,95%): 3.62 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 76 tps: 25819.82 qps: 413117.80 (r/w/o: 361478.15/0.00/51639.65) lat (ms,95%): 3.55 err/s: 0.00 reconn/s: 0.00
[ 40s ] thds: 76 tps: 26062.72 qps: 417004.37 (r/w/o: 364879.04/0.00/52125.33) lat (ms,95%): 3.55 err/s: 0.00 reconn/s: 0.00
[ 50s ] thds: 76 tps: 25602.93 qps: 409640.21 (r/w/o: 358434.45/0.00/51205.76) lat (ms,95%): 3.62 err/s: 0.00 reconn/s: 0.00
[ 60s ] thds: 76 tps: 25759.87 qps: 412161.90 (r/w/o: 360642.56/0.00/51519.34) lat (ms,95%): 3.55 err/s: 0.00 reconn/s: 0.00
SQL statistics:
    queries performed:
        read:                            21369418
        write:                           0
        other:                           3052774
        total:                           24422192
    transactions:                        1526387 (25432.48 per sec.)
    queries:                             24422192 (406919.76 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          60.0111s
    total number of events:              1526387

Latency (ms):
         min:                                    1.59
         avg:                                    2.99
         max:                                   50.20
         95th percentile:                        3.75
         sum:                              4557869.54

Threads fairness:
    events (avg/stddev):           20084.0395/58.70
    execution time (avg/stddev):   59.9720/0.00

@qijiax
Author

qijiax commented Aug 26, 2022

@svenkata9 I can't reach your performance at 76 threads. I did detailed testing on the current patch.

Threads         1        2        4        8         16        32        64
non-gramine     1625.24  3214.88  6360     12646.55  24800.75  46771.27  67288.27
gramine-direct  1494.82  2901.3   5646.37  10371.68  12942.06  15539.51  17982.64
gramine-sgx     945.21   1847.56  3600.19  6924.44   12633.09  21653.21  15422.34

Note: Gramine runs in one NUMA domain: numactl --cpubind=0 --membind=0 gramine-direct mariadbd --user=root

For gramine-direct, performance scales well from 1-8 threads.
For gramine-sgx, performance scales well from 1-16 threads.

An interesting observation is that at 32T gramine-sgx performs better than gramine-direct.
The perf graph of gramine-direct shows a _libos_syscall_poll hotspot; that could be the next bottleneck in Gramine.
image

@dimakuv
Contributor

dimakuv commented Aug 26, 2022

Thank you @qijiax and @svenkata9 for this analysis! It will help us work on the bottlenecks, one by one.

I believe we can merge @boryspoplawski's patch into Gramine; we just need to make it better. In principle, I don't see problems with being binary-compatible with the Linux structs where it makes a significant perf difference.

@boryspoplawski
Contributor

What build are you using?

I'm now using the latest master branch.

@qijiax I was just made aware that you are using a patched version of Gramine, yet you've never mentioned that. Due to that we cannot reason about performance, or even correctness of Gramine.

@qijiax
Author

qijiax commented Aug 28, 2022

What build are you using?

I'm now using the latest master branch.

@qijiax I was just made aware that you are using a patched version of Gramine, yet you've never mentioned that. Due to that we cannot reason about performance, or even correctness of Gramine.

What do you mean by a patched version? I used to install Gramine from yum, but the Gramine I'm using now is built and installed from the master branch, with your patch.

@svenkata9
Contributor

@qijiax But haven't you faced an error with Gramine, for which I asked you to apply this workaround through email -

Also, please just comment out the checks in "libos/src/net/unix.c", if (force_nonblocking), lines 387-396 (for recv) and 327-336 (for send), on the current master, and you should be able to connect to the server with the mysql client.

@qijiax
Author

qijiax commented Sep 13, 2022

libos_emulate_syscall is the entry point of every syscall; it will always show up. For the current perf graph, it's fd handling that is the issue (basically converting an fd to a handle). I can take a look at this some day, maybe convert it to some lockless structure, but unfortunately not now, I have other tasks to do.

@boryspoplawski Do you have any update on the fd lock?

@boryspoplawski
Contributor

No, and unfortunately nothing will happen until after the next release, which is happening soon. Also, this might be a non-trivial amount of work.

@dimakuv
Contributor

dimakuv commented Sep 25, 2024

This issue is 2.5 years old, I'm closing it.

@dimakuv dimakuv closed this as completed Sep 25, 2024