This repository has been archived by the owner on Dec 20, 2018. It is now read-only.

glibc: corrupted double-linked list #74

Closed
phillipp opened this issue Aug 3, 2015 · 19 comments

@phillipp
Contributor

phillipp commented Aug 3, 2015

In one of our deployments we saw that all SSL websites stopped working. Connection attempts failed:

curl -v https://www.[foobar].net/
* About to connect() to www.[foobar] port 443 (#0)
*   Trying 91.216.248.**... connected
* successfully set certificate verify locations:
*   CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* Unknown SSL protocol error in connection to www.[foobar].net:443 
* Closing connection #0
curl: (35) Unknown SSL protocol error in connection to www.[foobar].net:443 

The bud log showed the following output:

(wrn) [9845] client 0x4c9acd0 on frontend SNI from json failed: "Failed to load or parse JSON: <SNI Response>"
*** glibc detected *** bud: corrupted double-linked list: 0x00000000116daa00 ***

Kernel ring buffer/dmesg:

[345812.084141] traps: bud[3256] general protection ip:5cad48 sp:7fffdcf1ca30 error:0 in bud[400000+2ae000]

A restart of bud solved the problem. What do you need to dig further into this?
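
For reference, the `traps:` line from dmesg already narrows down where the fault happened: `ip` is the faulting instruction pointer, and `bud[400000+2ae000]` gives the base address and size of the mapping it fell into. A minimal sketch of decoding it follows. The helper name `decode_trap` is made up for this illustration, and the regular expression is only meant for lines shaped like the one above.

```python
import re

# Matches the tail of a kernel "traps:" line such as the one quoted above.
TRAP_RE = re.compile(
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>[0-9a-f]+)"
    r" in (?P<name>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_trap(line):
    """Extract the faulting address and its offset into the mapped binary."""
    m = TRAP_RE.search(line)
    if m is None:
        raise ValueError("not a recognised trap line")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    return {
        "binary": m.group("name"),
        "ip": ip,
        "offset": ip - base,  # position relative to the start of the mapping
        "inside_mapping": base <= ip < base + size,
    }


line = ("[345812.084141] traps: bud[3256] general protection "
        "ip:5cad48 sp:7fffdcf1ca30 error:0 in bud[400000+2ae000]")
info = decode_trap(line)
print(info["binary"], hex(info["offset"]))  # prints: bud 0x1cad48
```

If the binary is not position-independent and still carries symbols, passing the absolute `ip` to `addr2line -e bud 0x5cad48` may resolve it to a function or source line; whether that works depends on how bud was built.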

@indutny
Owner

indutny commented Aug 3, 2015

@phillipp thanks for reporting this!

Do you have a core dump of this crash?

@phillipp
Contributor Author

phillipp commented Aug 3, 2015

Unfortunately not. What would be the best way to set this up for next time?

@odeke-em
Contributor

odeke-em commented Aug 3, 2015

@phillipp if possible, would you mind sharing the config, or a redacted version of it? I would just like to be able to reproduce this.

@indutny
Owner

indutny commented Aug 3, 2015

@phillipp I would do something like http://askubuntu.com/questions/53956/how-can-i-enable-core-dump . Please be aware that it will create a large file for each crashing process. If you don't have many of them, it should not be a big deal, though.
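
As a rough illustration of what enabling core dumps involves at the process level: the kernel only writes a core file if the process's core-size resource limit allows it, and child processes inherit that limit. The sketch below checks the limit and raises the soft limit to the hard limit, which is roughly what `ulimit -c` does in a shell. The function names are made up for this example, and it says nothing about how bud itself is started.

```python
import resource


def describe(limit):
    """Render an rlimit value the way `ulimit -c` would describe it."""
    return "unlimited" if limit == resource.RLIM_INFINITY else "%d bytes" % limit


def raise_core_limit():
    """Raise the soft core-size limit to the hard limit for this process.

    Processes started afterwards from this one inherit the new limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


before = resource.getrlimit(resource.RLIMIT_CORE)
after = raise_core_limit()
print("soft core limit before:", describe(before[0]))
print("soft core limit after: ", describe(after[0]))
```

A soft limit of 0 means no core file is written at all, so that is the first thing worth checking in the environment that launches the daemon.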

@phillipp
Contributor Author

phillipp commented Aug 3, 2015

The config is:

{
  "workers": 4,

  "restart_timeout": 250,

  "log": {
    "level": "info",
    "facility": "user",
    "stdio": true,
    "syslog": true
  },

  "availability": {
    "max_retries": 5,
    "retry_interval": 250,
    "death_timeout": 1000,
    "revive_interval": 2500
  },

  "frontend": {
    "interfaces": [
      { "port": 443, "host": "::" }
    ],
    "keepalive": 3600,
    "server_preference": true,
    "cert": "default.crt",
    "key": "default.key"
  },

  "user": "bud",
  "group": "bud",

  "backend": [{
    "port": 10010,
    "host": "127.0.0.1",
    "keepalive": 3600,
    "x-forward": true
  }],

  "sni": {
    "enabled": true,
    "port": 9000,
    "host": "127.0.0.1",
    "url": "/bud/sni/%s"
  },

  "stapling": {
    "enabled": false,
    "port": 9000,
    "host": "127.0.0.1",

    "url": "/bud/stapling/%s"
  },

  "contexts": []
}
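
To relate this config to the `SNI from json failed` warning in the log: the `sni` section points at a local HTTP service, and the `%s` in `url` appears to be a placeholder for the requested server name. The sketch below only illustrates that reading. The function names, the `http://` scheme, and the assumption that the response body must be a JSON object are not taken from bud's source, and the error text merely mimics the wording of the warning.

```python
import json

# The "sni" section from the config above.
sni = {"enabled": True, "port": 9000, "host": "127.0.0.1", "url": "/bud/sni/%s"}


def sni_request_url(cfg, servername):
    """Build the lookup URL, assuming %s is replaced by the server name."""
    path = cfg["url"] % servername
    return "http://%s:%d%s" % (cfg["host"], cfg["port"], path)


def parse_sni_body(body):
    """Return (document, error); exactly one of the two is None."""
    try:
        doc = json.loads(body)
    except ValueError as exc:
        return None, "Failed to load or parse JSON: %s" % exc
    if not isinstance(doc, dict):
        return None, "Failed to load or parse JSON: expected an object"
    return doc, None


print(sni_request_url(sni, "www.example.net"))
# prints: http://127.0.0.1:9000/bud/sni/www.example.net

doc, err = parse_sni_body("<html>502 Bad Gateway</html>")
print(doc is None, err is not None)  # prints: True True
```

Under that reading, any non-JSON reply from the service on port 9000 (an error page, an empty body) would produce a warning like the one in the log, so the backend's responses are worth capturing alongside the bud output.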

@phillipp
Contributor Author

We saw this error message again, and although bud is running with the ulimit set, no core dump was taken. Will a core dump really be taken on this error? bud is not crashing, just showing the message and behaving incorrectly afterwards.
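
Besides the ulimit, two Linux kernel settings can keep a core file from appearing: `fs.suid_dumpable` (when it is 0, processes that have changed their user or group, as the `"user": "bud"` setting suggests bud does, do not dump core) and `kernel.core_pattern` (a value starting with `|` pipes the core to a helper program instead of writing a file). A small sketch for inspecting both follows; the function names are invented for this example, and it only reads the settings.

```python
def read_setting(path):
    """Return the stripped contents of a /proc setting, or None if unreadable."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return None


def core_dump_report():
    """Collect the kernel settings that influence whether a core file is written."""
    return {
        "suid_dumpable": read_setting("/proc/sys/fs/suid_dumpable"),
        "core_pattern": read_setting("/proc/sys/kernel/core_pattern"),
    }


report = core_dump_report()
for name, value in sorted(report.items()):
    print("%s = %s" % (name, value if value is not None else "(not available)"))

pattern = report["core_pattern"]
if pattern is not None and pattern.startswith("|"):
    print("cores are piped to a helper program rather than written directly")
```

If `suid_dumpable` reads 0 on the affected hosts, that alone could explain a missing core even with the limit raised, though this is only one possible cause.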

@indutny
Owner

indutny commented Sep 30, 2015

@phillipp is it always preceded by that JSON warning? Do you have multiple logs for this?

@phillipp
Contributor Author

phillipp commented Oct 1, 2015

Today we had another one. bud continues to run, as before.

(dbg) [24208] client 0x82dabe0 on backend ssl_cert_cb {2}
(dbg) [26505] client 0x11ff1f70 on frontend SSL_read() => -1
(dbg) [26505] client 0x11ff1f70 on frontend uv_write(1802) iovcnt: 1
(dbg) [26505] client 0x11ff1f70 on frontend immediate write
(dbg) [26505] client 0x11ff1f70 on frontend write_cb => 1802
(dbg) [26505] client 0x11ff1f70 on backend read_start
(dbg) [26505] client 0x11ff1f70 on frontend recycle
(dbg) [26505] client 0x11ff1f70 on frontend SSL_read() => -1
(dbg) [24208] client 0x82dabe0 on frontend SSL_read() => -1
(dbg) [24208] client 0x82dabe0 on frontend uv_write(5838) iovcnt: 1
(dbg) [24208] client 0x82dabe0 on frontend immediate write
(dbg) [24208] client 0x82dabe0 on frontend write_cb => 5838
(dbg) [24208] client 0x82dabe0 on backend read_start
(dbg) [24208] client 0x82dabe0 on frontend recycle
(dbg) [24208] client 0x82dabe0 on frontend SSL_read() => -1
*** glibc detected *** bud: corrupted double-linked list: 0x00000000084fe560 ***
(dbg) [26506] client 0x12dd6010 on frontend SSL_read() => -1
(dbg) [26506] client 0x12dd6010 on frontend uv_write(5275) iovcnt: 1
(dbg) [26506] client 0x12dd6010 on frontend immediate write
(dbg) [26506] client 0x12dd6010 on frontend write_cb => 5275
(dbg) [26506] client 0x12dd6010 on backend read_start

Core dumps are enabled and apport is running, but it didn't save a report. So I'm thinking this is just a glibc warning/error, and the core may not be dumped because the program is not crashing.

Stack Overflow answers suggest that this can be caused by a race condition where multiple callers free the same object at the same time.
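
For context on the message itself: glibc keeps free chunks in doubly linked lists, and before unlinking a chunk it checks that both neighbours still point back at it; `corrupted double-linked list` is printed when that check fails. The toy model below imitates that check in Python. It is not glibc's code, and the class and function names are made up. It also shows that a single-threaded double unlink is enough to trip the check, so a race between threads is only one of several possible causes (a use-after-free or a buffer overrun that overwrites the pointers would be others).

```python
class CorruptedList(Exception):
    """Raised when a chunk's neighbours no longer point back at it."""


class Chunk:
    def __init__(self, name):
        self.name = name
        self.fd = self  # forward pointer
        self.bk = self  # backward pointer


def link_after(head, chunk):
    """Insert chunk right after head in the circular list."""
    chunk.fd = head.fd
    chunk.bk = head
    head.fd.bk = chunk
    head.fd = chunk


def unlink(chunk):
    """Remove chunk, first verifying the neighbours' back-references."""
    fd, bk = chunk.fd, chunk.bk
    if fd.bk is not chunk or bk.fd is not chunk:
        raise CorruptedList("corrupted double-linked list")
    fd.bk = bk
    bk.fd = fd


head, a, b = Chunk("head"), Chunk("a"), Chunk("b")
link_after(head, a)
link_after(head, b)

unlink(a)       # first removal passes the consistency check
try:
    unlink(a)   # second removal: a's stale pointers no longer match the list
except CorruptedList as exc:
    print("detected:", exc)  # prints: detected: corrupted double-linked list
```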

@indutny
Owner

indutny commented Oct 1, 2015

@phillipp the problem is that I don't use threads in bud at all. I have one idea that might be worth exploring.

Let's try building bud with Address Sanitizer. I have just pushed 1c45275 to the master branch. Please give it a try:

./gyp_bud asan
make -C out/ -j8
./out/Release/bud # <- binary

@indutny
Owner

indutny commented Oct 1, 2015

Ideally I expect it to give us more information on where this error is happening.

@indutny
Owner

indutny commented Oct 1, 2015

Thanks!

@odeke-em
Contributor

odeke-em commented Oct 1, 2015

Ooh, this is so cool. I am acquainted with one or two of the Address Sanitizer authors, so I can't wait to see the results of using it.

@indutny
Owner

indutny commented Nov 14, 2016

@phillipp I believe we may have fixed this in the recent release.

indutny closed this as completed Nov 14, 2016

@phillipp
Contributor Author

I think I'll grep the logs for this again just to be sure. Do you think this could be related to the use-after-free?

@indutny
Owner

indutny commented Nov 14, 2016

@phillipp it is very very likely. Please let me know if we haven't fixed it!

@phillipp
Contributor Author

Yes, now I see that this looks like the same problem. LGTM!

@indutny
Owner

indutny commented Nov 15, 2016

Fantastic! Still no crashes, right?

@phillipp
Contributor Author

None so far, on any of the servers. Looks great, I think it's fixed! Thanks for the effort!

@indutny
Owner

indutny commented Nov 16, 2016

Hooray! Two issues in a row 👍
