File descriptor leak causing connection stalls #315

@shazow

After some period of time, new people are unable to connect to the server.

I suspect there's a socket we're not closing somewhere. Possibly related to banned connections? (Maybe not, just guessing since we're getting spammed a lot by some bots.)

Failed to accept connection: accept tcp [::]:22: accept4: too many open files

This has happened several times over the past couple of years, but I haven't been able to reproduce it reliably. I'll probably need to run with pprof enabled and try out various connection patterns to see what's leaking.

Also worth checking the open fds: lsof -nP -p $PID (thanks @Low-power)

Some related log messages from the offending IPs that might be the trigger:

Apr 18 10:55:55 ssh-chat2 ssh-chat[4726]: [sshd] 2019/04/18 10:55:55 [y.y.y.y:45830] Failed to handshake: read tcp x.x.x.x:22->y.y.y.y:45830: read: connection reset by peer
Apr 18 10:54:44 ssh-chat2 ssh-chat[4726]: [sshd] 2019/04/18 10:54:44 [y.y.y.y:59129] Failed to handshake: ssh: disconnect, reason 11:
Apr 18 11:11:51 ssh-chat2 ssh-chat[4726]: [sshd] 2019/04/18 11:11:51 [y.y.y.y:44004] Failed to handshake: read tcp x.x.x.x:22->y.y.y.y:44004: read: connection timed out
Apr 18 11:58:15 ssh-chat2 ssh-chat[4726]: [sshd] 2019/04/18 11:58:15 [y.y.y.y:41568] Failed to handshake: EOF
Apr 18 12:10:47 ssh-chat2 ssh-chat[4726]: [sshd] 2019/04/18 12:10:47 [y.y.y.y:58610] Failed to handshake: ssh: disconnect, reason 11:

There are a lot of connection reset by peer errors in general. Need to reproduce this.
