double free or corruption (!prev) bug #26

Closed
brizzbane opened this issue Feb 19, 2016 · 19 comments


@brizzbane

Created by request from saghul/pyuv#214.

To your question in that thread:

saghul
Seems to be a pycares issue, can you please open the issue there? The problem seems to be caused by parsing the answer, can you please tell me what domain you're trying to resolve?

Is there a way that I can figure out the domain that crashes it...besides just systematically narrowing down the domains that don't?

Right now I'm just processing a large list of email addresses for many different domains.

@saghul
Owner

saghul commented Feb 19, 2016

Is there a way that I can figure out the domain that crashes it...besides just systematically narrowing down the domains that don't?

Right now I'm just processing a large list of email addresses for many different domains.

I'm afraid you'll have to "bisect" the list, since the crash happens in processing, which is triggered for more than one query at once.
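As a rough illustration of the bisection suggested above, here is a minimal sketch. The `fails` predicate is hypothetical (not part of pycares): it stands for "run the resolver on this sublist and report whether the process crashed". Because the crash may depend on several queries being in flight at once, the search stops and returns the remaining sublist when neither half crashes on its own.

```python
def bisect_failing(items, fails):
    """Shrink `items` to the smallest sublist this search can show still fails.

    `fails` is a caller-supplied (hypothetical) predicate: it takes a list and
    returns True when running the workload on that list crashes.
    """
    candidates = list(items)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        left, right = candidates[:mid], candidates[mid:]
        if fails(left):
            candidates = left
        elif fails(right):
            candidates = right
        else:
            # Neither half crashes alone: the crash needs items from both
            # halves, or it is not deterministic. Stop and report what is left.
            break
    return candidates


# Toy predicate standing in for "resolve this list and see whether it crashes".
domains = ["a.example", "b.example", "c.example", "d.example", "e.example"]
culprit = bisect_failing(domains, lambda subset: "d.example" in subset)
print(culprit)
```

With a real workload, `fails` would typically launch the script in a separate process for each sublist, so that one crash does not take the search down with it.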

@brizzbane
Author

Ok, will report back! Thanks.

@brizzbane
Author

I'm having trouble. I get down to a very small list of domains and it crashes as expected. Then I double-check before posting the list here, and it no longer crashes on that list.

Is it possible pycares is 'caching' the results of the domain or something?

Also, if it's not super important to narrow it down to one single domain, I can just post the list of domains that it definitely crashes on.

[edit] I can definitely find a single domain, or a small list of domains (i.e. I'm not being lazy); I'm just not sure whether the above is an issue. Before querying pycares, I ran a test over the domains in 'all.txt': I appended each domain to 'tried.txt' before its query and to 'successful.txt' after its response.

But when I double-check that set of domains (the one it crashed on), it no longer crashes for that set.
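The 'tried.txt' / 'successful.txt' bookkeeping described above could look roughly like the sketch below. The function names are made up for illustration. The one detail that matters is forcing each line to disk before the query is issued, since a hard crash would otherwise lose buffered writes.

```python
import os


def record(path, domain):
    """Append one domain to a log file and force it to disk immediately."""
    with open(path, "a") as fh:
        fh.write(domain + "\n")
        fh.flush()
        os.fsync(fh.fileno())  # so the line survives a hard crash right after


def read_domains(path):
    """Return the set of domains logged in `path` (empty if the file is missing)."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def unanswered(tried_path="tried.txt", ok_path="successful.txt"):
    """Domains that were queried but never produced a response."""
    return read_domains(tried_path) - read_domains(ok_path)
```

Call `record("tried.txt", domain)` just before issuing each query and `record("successful.txt", domain)` in the response callback. After a crash, `unanswered()` lists every domain that was still pending. Since many queries are in flight at once, that set is a list of suspects rather than a single culprit.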

@saghul
Owner

saghul commented Feb 20, 2016

Nope, there is no caching. If you run the script multiple times, does it crash?

@brizzbane
Author

When I get it narrowed down to about 8 domains, it will crash ~2 times, and then when I run it a third time it won't (for that set of domains). I'm trying to narrow down that small set by re-running it, so I can find the specific domain it happens on.
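When a crash is intermittent like this, a single run says little. One option is to run the same input several times and count the crashes. The sketch below assumes a POSIX system, where `subprocess` reports a process killed by a signal with a negative return code.

```python
import subprocess
import sys


def count_crashes(cmd, runs=10):
    """Run `cmd` `runs` times and count the runs that were killed by a signal.

    On POSIX, death by signal shows up as a negative return code
    (for example -11 for SIGSEGV, -6 for SIGABRT).
    """
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if result.returncode < 0:
            crashes += 1
    return crashes
```

A call such as `count_crashes([sys.executable, "resolve.py", "domains.txt"], runs=20)` would give a crash rate for one domain list; `resolve.py` and `domains.txt` are placeholder names, not files from this thread.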

@saghul
Owner

saghul commented Feb 22, 2016

Doh. Can you send me the list of domains and the test program? (feel free to email if you don't want to post it here)

@brizzbane
Author

So I'm working on putting together a self-contained example. It's not crashing when my custom client doesn't do anything in the pycares callback (i.e. when it only resolves MX records).

I'll figure it out. May have to email you some parts of it.

[edit] Yeah, it's definitely something to do with combining my client with the pycares callback. By itself, pycares runs fine. It also runs fine if I don't use the pycurl.PROXY options. It's going to take me a bit to put together something that you can actually reproduce on your end.

@brizzbane
Author

I have done more testing inside gdb and see this on every crash so far:

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff3a6d63f in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4

@brizzbane
Author

This backtrace is from the longest-running instance of my client so far (achieved by disabling most of libcurl's SSL verification options):

#0 0x00007ffff396d63f in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#1 0x00007ffff396d814 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#2 0x00007ffff396d1ae in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#3 0x00007ffff394b1db in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#4 0x00007ffff395bd76 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#5 0x00007ffff395c6c3 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#6 0x00007ffff395c867 in curl_multi_socket_action () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#7 0x00007ffff3bacd6c in ?? () from /usr/lib/python2.7/dist-packages/pycurl.x86_64-linux-gnu.so
#8 0x00000000004bbc2a in call_function (oparg=, pp_stack=0x7fffffffd420) at ../Python/ceval.c:4350
#9 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#10 0x00000000004b9416 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#11 0x00000000004d5269 in function_call.lto_priv () at ../Objects/funcobject.c:526
#12 0x00000000004edbce in PyObject_Call (kw=0x0, arg=0x7ffff0276638, func=0x7ffff3e4b050) at ../Objects/abstract.c:2546
#13 instancemethod_call.lto_priv () at ../Objects/classobject.c:2602
#14 0x00000000004aefc3 in PyObject_Call (kw=0x0, arg=0x7ffff0292d90, func=0x7ffff02b5730) at ../Objects/abstract.c:2546
#15 PyObject_CallFunctionObjArgs () at ../Objects/abstract.c:2773
#16 0x00007ffff6b23ad0 in pyuv__timer_cb (handle=0x7ffff3dc2ae8) at src/timer.c:15
#17 0x00007ffff6b4bc16 in uv__run_timers (loop=0x7ffff6d69180 <default_loop_struct>) at src/unix/timer.c:165
#18 0x00007ffff6b3bbef in uv_run (loop=0x7ffff6d69180 <default_loop_struct>, mode=UV_RUN_ONCE) at src/unix/core.c:355
#19 0x00007ffff6b2ecb8 in Loop_func_run (self=0xe90a60, args=) at src/loop.c:62
#20 0x00000000004bbc2a in call_function (oparg=, pp_stack=0x7fffffffda90) at ../Python/ceval.c:4350
#21 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#22 0x00000000004b9416 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#23 0x00000000004c16f3 in fast_function (nk=, na=, n=1, pp_stack=0x7fffffffdc90, func=0x7ffff3e4e578) at ../Python/ceval.c:4446
#24 call_function (oparg=, pp_stack=0x7fffffffdc90) at ../Python/ceval.c:4371
#25 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#26 0x00000000004b9416 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#27 0x00000000004e9f4f in PyEval_EvalCode (locals=0x7ffff7f63168, globals=0x7ffff7f63168, co=0x7ffff7e7da30) at ../Python/ceval.c:669
#28 run_mod.lto_priv () at ../Python/pythonrun.c:1370
#29 0x00000000004e4b42 in PyRun_FileExFlags () at ../Python/pythonrun.c:1356
#30 0x00000000004e3406 in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:948
#31 0x0000000000492de3 in Py_Main () at ../Modules/main.c:640
#32 0x00007ffff6f13870 in __libc_start_main (main=0x492830, argc=2, argv=0x7fffffffe0d8, init=, fini=, rtld_fini=, stack_end=0x7fffffffe0c8) at libc-start.c:291
#33 0x0000000000492759 in _start ()

Is the above helpful at all? Is it related to pyuv at all, or to libcurl?

@saghul
Owner

saghul commented Feb 23, 2016

The crash happens in a call to libcurl. I suggest you simplify your script to the bare minimum, and add components until it blows up; otherwise it's going to be tricky to narrow it down.

@brizzbane
Author

Ugh, sorry for all my posts. I think that it's one thing, then I test more to try to confirm it, and it ends up not holding true.

I'm almost certain (maybe 80%) that the issue is with my client trying to re-use connections (i.e. it does not hard crash when I set pycurl.FRESH_CONNECT to 1). If this assumption is valid, do you have any insight as to whether the problem is in my client implementation, or is a bug in libuv or libcurl?

Or any suggestions on where to look now? Since it does not hard crash with pycurl.FRESH_CONNECT, I believe I'm at a tipping point for finding where the problem lies.
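To test the connection re-use theory as an A/B experiment, it can help to keep the toggle in one place. The helper below is a hypothetical sketch, not code from this thread. It takes the handle and the `pycurl` module as arguments, and besides `FRESH_CONNECT` (mentioned above) it also sets `FORBID_REUSE`, which is an assumption added here so that connections are not cached after a transfer either.

```python
def apply_reuse_policy(handle, curl_module, reuse):
    """Turn connection re-use on or off for one easy handle.

    `handle` is expected to be a pycurl.Curl object and `curl_module` the
    imported pycurl module; both are passed in so the toggle lives in one place.
    """
    flag = 0 if reuse else 1
    # FRESH_CONNECT: do not pick an existing connection from the cache.
    handle.setopt(curl_module.FRESH_CONNECT, flag)
    # FORBID_REUSE: close the connection after the transfer instead of caching it.
    handle.setopt(curl_module.FORBID_REUSE, flag)
    return handle
```

Typical use would be `apply_reuse_policy(pycurl.Curl(), pycurl, reuse=False)` for the run that should not crash, and `reuse=True` for the run that should. If the crash follows the flag reliably, that points at the connection cache path rather than at DNS resolution.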

@saghul
Owner

saghul commented Feb 23, 2016

It's very hard to tell. The issue you initially opened was a crash in c-ares, maybe because you called some function which is not allowed to run in a callback, like changing the nameservers. Now you get a crash in libcurl-gnutls, ...

I suggest you start replacing the components until you find the culprit. I know it's hard, but without a reliable way of reproducing the issue I can't really tell if there is a bug and in which library :-)

@brizzbane
Author

Sending you an email now. It will include a zip file with the files. I also made you an account on my server, so you can just log in and run it (if you prefer). If you go that route and need me to install anything as root, let me know!

I really appreciate your help.

@brizzbane
Author

The IP/port combo in the code that I gave you is no longer valid. If you are OK w/using the account I created on my server, I'll make sure it always has one that crashes..

@saghul
Owner

saghul commented Feb 25, 2016

Will do, but it'll take me a few days to find the time.

@brizzbane
Author

Just curious, got your response in email.

From your experience debugging things: is it possible (or likely?) that the issue I originally posted in the pyuv repository and the backtrace from the program you ran on the server are the same 'root' bug? Originally in that thread you said that you thought it might be because of the way pycares was parsing the response. Then, as I removed components, pycares wasn't even a factor.

I guess I'm asking because I wonder whether, if I'm able to find someone to track down this bug, fixing it will resolve this hard crash, or whether I'm just in a giant rat's nest.

in case this is helpful to anyone in the future, this is that backtrace:

#0 0x00007ffff5b6e63f in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#1 0x00007ffff5b6e814 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#2 0x00007ffff5b6e1ae in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#3 0x00007ffff5b4c1db in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#4 0x00007ffff5b5cd76 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#5 0x00007ffff5b5d6c3 in ?? () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#6 0x00007ffff5b5d867 in curl_multi_socket_action () from /usr/lib/x86_64-linux-gnu/libcurl-gnutls.so.4
#7 0x00007ffff5dadd6c in ?? () from /usr/lib/python2.7/dist-packages/pycurl.x86_64-linux-gnu.so
#8 0x00000000004beb0a in call_function (oparg=, pp_stack=0x7fffffffd630) at ../Python/ceval.c:4350
#9 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#10 0x00000000004bc326 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#11 0x00000000004d7fa9 in function_call.lto_priv () at ../Objects/funcobject.c:523
#12 0x00000000004f09be in PyObject_Call (kw=0x0, arg=0x7ffff20e77e8, func=0x7ffff1ea9500) at ../Objects/abstract.c:2546
#13 instancemethod_call.lto_priv () at ../Objects/classobject.c:2602
#14 0x00000000004b0633 in PyObject_Call (kw=0x0, arg=0x7ffff7e89cd0, func=0x7ffff1eb8820) at ../Objects/abstract.c:2546
#15 PyObject_CallFunctionObjArgs () at ../Objects/abstract.c:2773
#16 0x00007ffff6b23ad0 in pyuv__timer_cb (handle=0x7ffff1e893b0) at src/timer.c:15
#17 0x00007ffff6b4bc16 in uv__run_timers (loop=0x7ffff6d69180 <default_loop_struct>) at src/unix/timer.c:165
#18 0x00007ffff6b3bbef in uv_run (loop=0x7ffff6d69180 <default_loop_struct>, mode=UV_RUN_ONCE) at src/unix/core.c:355
#19 0x00007ffff6b2ecb8 in Loop_func_run (self=0xbed050, args=) at src/loop.c:62
#20 0x00000000004beb0a in call_function (oparg=, pp_stack=0x7fffffffdca0) at ../Python/ceval.c:4350
#21 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#22 0x00000000004bc326 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#23 0x00000000004ecf9f in PyEval_EvalCode (locals=0x7ffff7f63168, globals=0x7ffff7f63168, co=0x7ffff7e78db0) at ../Python/ceval.c:669
#24 run_mod.lto_priv () at ../Python/pythonrun.c:1370
#25 0x00000000004e7b72 in PyRun_FileExFlags () at ../Python/pythonrun.c:1356
#26 0x00000000004e6416 in PyRun_SimpleFileExFlags () at ../Python/pythonrun.c:948
#27 0x0000000000494a87 in Py_Main () at ../Modules/main.c:640
#28 0x00007ffff6f13870 in __libc_start_main (main=0x494490, argc=2, argv=0x7fffffffe0e8, init=, fini=, rtld_fini=, stack_end=0x7fffffffe0d8) at libc-start.c:291
#29 0x00000000004943b9 in _start ()

@saghul
Owner

saghul commented Mar 1, 2016

I don't think they are related. The first thing I'd recommend is that you install the -dbg versions of gnutls and curl, so the backtrace has actual symbols instead of the ??, and start from there.
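Alongside a symbolized native backtrace, a Python-level traceback at the moment of the crash can show which client code path called into libcurl. The sketch below uses the `faulthandler` module, which is part of the standard library from Python 3.3. The backtraces in this thread come from Python 2.7, where it would have to be installed separately as a backport, which is an assumption here. The function name and log path are made up for illustration.

```python
import faulthandler


def enable_crash_traceback(path):
    """Dump the Python traceback of all threads to `path` on a fatal signal.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and
    SIGILL. The returned file object must stay open for the life of the process.
    """
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log
```

Calling `crash_log = enable_crash_traceback("crash_traceback.log")` once at startup, before the event loop runs, is enough. After a crash the log shows the Python frames that were active, which complements the `??` frames from gdb until debug symbols are installed.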

@brizzbane
Author

I will get an example if they are not related.

@saghul
Owner

saghul commented May 24, 2016

No follow-up, closing. If you manage to reliably reproduce it, holler and I'll reopen.

@saghul saghul closed this as completed May 24, 2016