Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread-local storage: handle failure cases #5722

Merged
merged 4 commits into from May 13, 2023
Merged

Conversation

ethomson
Copy link
Member

@ethomson ethomson commented Dec 9, 2020

We never error-check our "threadstate" thread-local storage, where we keep a few reusable buffers and the thread's current error message. Make sure that we have a threadstate data instance before reading to it or writing from it.

Depends on #5720

@arroz
Copy link
Contributor

arroz commented Dec 9, 2020

Hey Edward,

Since you're working on the thread-local stuff, I've hit recently a problem related to freeing a statically allocated error message when a thread is disposed of. I'm testing a small patch although I'm not sure if it could be made better (I'm not yet familiar enough with the code base). Maybe you can consider this for this patch as well. Here's my description of the problem:

Fix for thread cleanup crash.

The following sequence causes a crash:

Connect to an HTTPS server with an invalid certitficate;
Having the certCallback returning 0, meaning it’s actually valid;
Going ahead with an otherwise error free connection;
Waiting for the thread to be disposed of.

Freeing the global state message causes an error as it tries to free statically allocated memory (turns out to be
git_buf__initbuf). This is because after the buffer is cleaned, that value was copied to error->message.
This patch sets error->message to NULL in this situation, avoiding the crash.

And the patch I wrote:

index b34aa3abb..3cc003fdd 100644
--- a/src/errors.c
+++ b/src/errors.c
@@ -25,7 +25,11 @@ static void set_error_from_buffer(int error_class)
 	git_error *error = &GIT_GLOBAL->error_t;
 	git_buf *buf = &GIT_GLOBAL->error_buf;
 
-	error->message = buf->ptr;
+	if (buf->ptr != NULL && buf->ptr[0] == '\0') {
+        error->message = NULL;
+    } else {
+        error->message = buf->ptr;
+    }
 	error->klass = error_class;
 
 	GIT_GLOBAL->last_error = error;

It does prevent the crash, but it's probably not the best way to solve it.

@arroz
Copy link
Contributor

arroz commented Dec 9, 2020

Just to clarify: my doubt on the patch is I don't know if it's a more general problem (more statically allocated strings being placed on the thread-local error) or just a one off. Hence maybe it's appropriate to tackle it on this PR.

@ethomson
Copy link
Member Author

Hi @arroz - very interesting. Can you clarify what you mean when you say that the thread is disposed? Are you kicking off a new thread to run some libgit2 operations and then it's ending, and the thread exists...

What is calling git_error_last in this case? The original thread? Has that thread ever called a libgit2 operation?

@arroz
Copy link
Contributor

arroz commented Dec 10, 2020

Hey @ethomson!

Yeah: I'm using libgit2 in a macOS app. When a thread sits idle for a few seconds, the system (GCD-related stuff, I suppose) will dispose of it, and that will cause the cb__free_status to run. Here's a stack trace of the crash (the error.c line number is not accurate as I have my patch there commented):

Thread 3#0	0x00007fff2028f7d8 in __abort ()
#1	0x00007fff2028f72f in abort ()
#2	0x00007fff20170430 in malloc_vreport ()
#3	0x00007fff201734c8 in malloc_report ()
#4	0x0000000107d044cc in git__global_state_cleanup [inlined] at libgit2/src/global.c:63
#5	0x0000000107d044b9 in cb__free_status at libgit2/src/global.c:243
#6	0x000000010895319d in _pthread_tsd_cleanup ()
#7	0x000000010895584d in _pthread_exit ()
#8	0x0000000108952f94 in _pthread_wqthread_exit ()
#9	0x0000000108951bb1 in _pthread_wqthread ()
#10	0x0000000108950ae3 in start_wqthread ()

IIRC the error is added to the thread state in the httpclient.c check_certificate function. I think the lastError set by st->certificate contains a statically allocated '\0' that ends up being captured, and causes the crash when git__global_state_cleanup tries to deallocate it. I can't recall all the details, but it was something like that. :) Let me know if you want me to dig further.

I just confirmed the crash steps again. The callback needs to be called in a situation where openssl believes the certificate is invalid (meaning "valid" is false when the user callback is called) and the callback must override and return 0 to indicate the certificate is valid. All I have to do after that happens is wait a few seconds, and my app will crash with the above stack trace.

@arroz
Copy link
Contributor

arroz commented Dec 10, 2020

A little more detail: when it crashes in git__global_state_cleanup, specifically git__free(st->error_t.message);, st->error_t.message is pointing to the git_buf__initbuf. That causes the crash as git_buf__initbuf is statically allocated.

I don't know yet what puts that in the st->error_t.message. Maybe related to git_error_state_capture being called without the balancing git_error_state_restore, as the returned value from the callback is not GIT_PASSTHROUGH.

@arroz
Copy link
Contributor

arroz commented Dec 10, 2020

OK, here's the whole sequence:

  1. check_certificate calls git_stream_certificate which puts an error in the thread state, with a valid error message.
  2. git_stream_certificate calls git_error_state_capture.
  3. git_error_state_capture ends with a call to git_error_clear.
  4. git_error_clear checks if the global last_error is null. It's not, so it calls set_error(0, NULL).
  5. set_error ends with a call to set_error_from_buffer.
  6. set_error_from_buffer will grab &GIT_GLOBAL->error_buf->ptr, which at that point already points to git_buf__initbuf and puts it in the &GIT_GLOBAL->error_t.message.

If the certificate is approved by the callback, nothing else touches &GIT_GLOBAL->error_t. Later, when the OS decides to get rid of the thread, its cleanup method will try to dealloc error_t.message, which still points to git_buf__initbuf and crashes.

So, this is where I'm not sure what to do, since I still don't know the whole mental model around how libgit2 manages errors and the data stored in the thread local. My patch sort of works by chance. I probably should compare the ptr to git_buf__initbuf instead, but it's still kinda hacky. I'm sure there's a better fix, hence I'm taking this to the experts, and that's where you come in! :) I'm not sure if the fix should be part of this PR or not, it depends on the scope you want to give it.

@arroz
Copy link
Contributor

arroz commented Dec 10, 2020

And I forgot to answer part of your original question 😄 Yeah, I'm running a git operation (just a remote ls) on a separate thread (that gets created by Combine and the whole GCD system, but that shouldn't be important). I was testing the certificate handling, and have a web server with a self signed certificate running on a Pi. So the operation runs on that thread, getting the invalid certificate from the Pi which the callback then accepts, and then the thread sits there idle until the OS terminates it about 5-10 seconds later.

Base automatically changed from master to main January 7, 2021 10:09
Thread-local storage data may fail to initialize; in this case, do not
try to set the error message into it.  When the thread state has not
been initialized, return a hardcoded message to that affect.
git_oid_tostr_s could fail if thread-local state initialization fails.
In that case, it will now return `NULL`.  Callers should check for
`NULL` and propagate the failure.
Now that we've reduced the usage of GIT_THREADSTATE, remove it entirely
in favor of git_threadstate_get().
@ethomson ethomson merged commit 905e4d1 into main May 13, 2023
26 checks passed
@ethomson ethomson deleted the ethomson/tlsdata_fail branch May 13, 2023 13:33
@ethomson ethomson added the bug label Jul 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants