Git storage file system watcher is leaked #750

geigerzaehler · 2021-08-11T09:21:21Z

librad leaks file system watchers. When a librad Peer is dropped, threads that watch the file system remain running. When a lot of Peers are created, e.g. in tests, this leads to “Too many open files” errors. This issue may also cause the “Bad file descriptor” errors upstream is seeing on CI.

The leak is caused by the threads spawned here

radicle-link/librad/src/net/protocol/cache.rs

Lines 89 to 92 in bfdc3ec

    
    thread::spawn({
        let filter = Arc::clone(&inner);
        move || recache_thread(storage, filter, events, observe)
    });

and here

radicle-link/librad/src/net/protocol/cache.rs

Lines 139 to 142 in bfdc3ec

    
    let bob = thread::spawn({
        let span = span.clone();
        let shutdown = Arc::clone(&shutdown);
        let rebuild = Arc::clone(&rebuild);


kim · 2021-08-11T10:38:26Z

Dropping a Peer does not drop the protocol state.

geigerzaehler · 2021-08-11T13:25:46Z

> Dropping a Peer does not drop the protocol state.

Not sure about the exact mechanism, but the daemon, if it is used correctly, does make an effort to properly shut down the protocol.

kim · 2021-08-11T14:10:39Z

> Not sure about the exact mechanism but the daemon, if it is used correctly, does make an effort to properly shutdown the protocol.

The `Peer` holds on to `Caches` in order to share it across protocol restarts. Since the protocol itself also holds a reference to it, the file watchers won't be dropped until the protocol is dropped. From the linked code, you can see that the thread stops once the events iterator is exhausted (which it is when the `Watcher` is dropped, and there is no other way because Rust developers like their RAII). In other words, if the protocol outlives the `Peer`, it will leak resources (and not only caches). Unfortunately, I don't know of a way to ensure this statically.

The only ways I can think of to fix this would be to either not share the caches (which can incur significant startup delay on large repositories), persist caches to disk (uh-oh), or tie the lifetime of the `Peer` handle to that of the protocol it spawned (i.e. disallow protocol restarts). The latter would mean that callers wishing to outlive the `Peer` need to manage a mutable reference to it, or guard it behind a mutex, which isn't great either.
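To illustrate the mechanism, here is a minimal sketch using only the standard library. It is not the librad code: `Watcher`, `Caches`, `notify` and `spawn_caches` are hypothetical stand-ins, and an `mpsc` channel takes the place of the real file system event stream. The point it tries to show is that the worker thread only exits once the last `Arc` holding the watcher is dropped, so dropping just one of two holders leaves the thread (and whatever it watches) alive.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for the file system watcher. Owning the `Sender`
/// keeps the event stream open; dropping it closes the stream.
pub struct Watcher {
    events: Sender<()>,
}

/// Hypothetical stand-in for `Caches`, shared through an `Arc` by both the
/// peer handle and the running protocol.
pub struct Caches {
    watcher: Watcher,
}

impl Caches {
    /// Simulates a file system event reaching the recache thread.
    pub fn notify(&self) {
        // Sending only fails if the receiving thread is already gone.
        let _ = self.watcher.events.send(());
    }
}

/// Runs until the events iterator is exhausted, which only happens once
/// every `Sender` (here: the single `Watcher`) has been dropped.
/// Returns the number of events it processed.
fn recache_thread(events: Receiver<()>) -> usize {
    let mut seen = 0;
    for () in events {
        seen += 1;
    }
    seen
}

/// Spawns the recache thread and returns the shared caches together with
/// the thread's join handle.
pub fn spawn_caches() -> (Arc<Caches>, JoinHandle<usize>) {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || recache_thread(rx));
    let caches = Arc::new(Caches {
        watcher: Watcher { events: tx },
    });
    (caches, handle)
}
```

If a caller clones the returned `Arc` (playing the role of the protocol) and then drops the original (playing the role of the `Peer`), the thread keeps running; `handle.join()` only returns after the clone is dropped as well.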

geigerzaehler · 2021-08-11T15:02:22Z

Thanks for clarifying this. I guess this means that the daemon doesn’t shut down the protocol properly. Am I right in assuming that dropping the Peer and calling the function returned by Bound::accept() while completing the future shuts down everything? Or is there something missing from radicle_daemon::peer::Peer::start()?

kim · 2021-08-11T15:20:42Z

> Am I right in assuming that dropping the Peer and calling the function returned by Bound::accept() while completing the future shuts down everything?

It should.

> Or is there something missing from radicle_daemon::peer::Peer::start()?

Hm, I see that this actually consumes `self`, which is not `Clone`. So I’m not sure how we could end up with a “dangling” Peer. Is this used somehow differently in upstream?

geigerzaehler · 2021-08-17T13:46:09Z

> It should.

And it looks like it does on latest master. I’m able to reproduce the “Too many open files” error on a previous version of radicle-proxy, but not on latest master anymore. Using lsof also confirmed that inotify was the issue.

geigerzaehler mentioned this issue Aug 11, 2021

CI fails with “Bad file descriptor” error radicle-dev/radicle-upstream#2216

Closed

geigerzaehler closed this as completed Aug 17, 2021
