Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nmux crash (+fix) #40

Closed
szpajder opened this issue Sep 14, 2018 · 9 comments
Closed

nmux crash (+fix) #40

szpajder opened this issue Sep 14, 2018 · 9 comments

Comments

@szpajder
Copy link

Just came across this while deploying OpenWebRx on Arch Linux x86_64. The first client connects to nmux, then disconnects, the second client connects and then I get a 100% reproducible crash:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fd653522fcd in __memmove_ssse3 () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007fd653522fcd in __memmove_ssse3 () from /usr/lib/libc.so.6
#1  0x00005562be082fac in std::__copy_move<true, true, std::random_access_iterator_tag>::__copy_m<tsmthread_s*> (__result=<optimized out>,
    __last=<optimized out>, __first=<optimized out>) at /usr/include/c++/8.2.0/bits/stl_algobase.h:479
#2  std::__copy_move_a<true, tsmthread_s**, tsmthread_s**> (__result=<optimized out>, __last=<optimized out>, __first=<optimized out>)
    at /usr/include/c++/8.2.0/bits/stl_algobase.h:386
#3  std::__copy_move_a2<true, __gnu_cxx::__normal_iterator<tsmthread_s**, std::vector<tsmthread_s*, std::allocator<tsmthread_s*> > >, __gnu_cxx::__normal_iterator<tsmthread_s**, std::vector<tsmthread_s*, std::allocator<tsmthread_s*> > > > (__result=..., __last=..., __first=...)
    at /usr/include/c++/8.2.0/bits/stl_algobase.h:422
#4  std::move<__gnu_cxx::__normal_iterator<tsmthread_s**, std::vector<tsmthread_s*, std::allocator<tsmthread_s*> > >, __gnu_cxx::__normal_iterator<tsmthread_s**, std::vector<tsmthread_s*, std::allocator<tsmthread_s*> > > > (__result=..., __last=..., __first=...) at /usr/include/c++/8.2.0/bits/stl_algobase.h:487
#5  std::vector<tsmthread_s*, std::allocator<tsmthread_s*> >::_M_erase (__position=..., this=0x5562be48c1a0) at /usr/include/c++/8.2.0/bits/vector.tcc:163
#6  std::vector<tsmthread_s*, std::allocator<tsmthread_s*> >::erase (__position=..., this=0x5562be48c1a0) at /usr/include/c++/8.2.0/bits/stl_vector.h:1318
#7  tsmpool::remove_thread (this=0x5562be48c1a0, thread=0x5562be48c250) at tsmpool.cpp:52
#8  0x00005562be082873 in main (argc=<optimized out>, argv=<optimized out>) at nmux.cpp:206

This might be a clue:

tsmpool.cpp: In member function ‘int tsmpool::remove_thread(tsmthread_t*)’:
tsmpool.cpp:56:1: warning: no return statement in function returning non-void [-Wreturn-type]
 }

Indeed, I changed int tsmpool::remove_thread to void and voila, no more crashes.

@ha7ilm
Copy link
Owner

ha7ilm commented Sep 14, 2018

Thanks for finding that! Yes, I think remove_thread should be void. I don't know why I left that as int. Strange though that this makes the program crash, so maybe there's still a bug around. It looks like that it really crashes at the middle of the for loop, so maybe a nonexistent thread is deleted.

@mpbraendli
Copy link

Hi,
I also ran into this, changing the return value from int to void doesn't fix it however.

I compiled nmux with -g -fsanitize=address and get the following:

    #0 0x7fc31f51fcf8 in operator delete(void*, unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xebcf8)
    #1 0x55d3f34562bb in tsmpool::remove_thread(tsmthread_s*) /home/bram/ham/csdr/tsmpool.cpp:52
    #2 0x55d3f345449a in main /home/bram/ham/csdr/nmux.cpp:206
    #3 0x7fc31ed15b16 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x22b16)
    #4 0x55d3f34550e9 in _start (/usr/local/bin/nmux+0x50e9)

Address 0x0fff84cf31e0 is located in the high shadow area.
SUMMARY: AddressSanitizer: bad-free (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xebcf8) in operator delete(void*, unsigned long)
==15785==ABORTING

The following patch fixes this

diff --git a/nmux.h b/nmux.h
index 038bc51..1116075 100644
--- a/nmux.h
+++ b/nmux.h
@@ -24,18 +24,18 @@ typedef enum client_status_e
 } client_status_t;
 
 
-typedef struct client_s
+struct client_t
 {
 	struct sockaddr_in addr;
 	int socket;
 	int error; //set to non-zero on error (data transfer failed)
 	pthread_t thread;
-    tsmthread_t* tsmthread;
+	tsmthread_t tsmthread;
 	client_status_t status;
     //the following members are there to give access to some global variables inside the thread:
     tsmpool* lpool; 
     int sleeping;
-} client_t;
+};
 
 void print_exit(const char* why);
 void sig_handler(int signo);
diff --git a/tsmpool.cpp b/tsmpool.cpp
index d70e8ba..53ec340 100644
--- a/tsmpool.cpp
+++ b/tsmpool.cpp
@@ -1,4 +1,5 @@
 #include "tsmpool.h"
+#include <stdexcept>
 
 tsmpool::tsmpool(size_t size, int num) :
 	size(size), 
@@ -31,44 +32,46 @@ void* tsmpool::get_write_buffer()
 	return to_return;
 }
 
-tsmthread_t* tsmpool::register_thread()
+tsmthread_t tsmpool::register_thread()
 {
-	if(!ok) return NULL;
+	if(!ok) throw std::runtime_error("not ok");
 	pthread_mutex_lock(&this->mutex);
-	tsmthread_t* thread = new tsmthread_t();
-	thread->read_index = index_before(write_index);
+	tsmthread_t thread;
+	thread.id = next_free_id++;
+	thread.read_index = index_before(write_index);
 	threads.push_back(thread);
 	pthread_mutex_unlock(&this->mutex);
 	return thread;
 }
 
-int tsmpool::remove_thread(tsmthread_t* thread)
+void tsmpool::remove_thread(tsmthread_t& thread)
 {
 	pthread_mutex_lock(&this->mutex);
 	for(int i=0;i<threads.size();i++)
+	{
 		if(threads[i] == thread)
 		{
-			delete threads[i];
 			threads.erase(threads.begin()+i);
 			break;
 		}
+	}
 	pthread_mutex_unlock(&this->mutex);
 }
 
-void* tsmpool::get_read_buffer(tsmthread_t* thread)
+void* tsmpool::get_read_buffer(tsmthread_t& thread)
 {
 	pthread_mutex_lock(&this->mutex);
-	int* actual_read_index = (thread==NULL) ? &my_read_index : &thread->read_index;
-	if(*actual_read_index==index_before(write_index)) 
+	int& actual_read_index = thread.read_index;
+	if(actual_read_index==index_before(write_index))
 	{
 		if(TSM_DEBUG) fprintf(stderr, "grb: fail,"
-			"read_index %d is just before write_index\n", *actual_read_index);
+			"read_index %d is just before write_index\n", actual_read_index);
 		pthread_mutex_unlock(&this->mutex);
 		return NULL;
 	}
-	void* to_return = buffers[*actual_read_index];
-	*actual_read_index=index_next(*actual_read_index);
+	void* to_return = buffers[actual_read_index];
+	actual_read_index=index_next(actual_read_index);
 	pthread_mutex_unlock(&this->mutex);
-	if(TSM_DEBUG) fprintf(stderr, "grb: read_index = %d\n", *actual_read_index);
+	if(TSM_DEBUG) fprintf(stderr, "grb: read_index = %d\n", actual_read_index);
 	return to_return;
 }
diff --git a/tsmpool.h b/tsmpool.h
index d24cc4e..4f0205d 100644
--- a/tsmpool.h
+++ b/tsmpool.h
@@ -11,15 +11,21 @@
 
 using namespace std;
 
-typedef struct tsmthread_s
+struct tsmthread_t
 {
 	int read_index; //it always points to the next buffer to be read
-} tsmthread_t;
+    size_t id;
+
+    bool operator==(const tsmthread_t& other) {
+        return id == other.id;
+    }
+
+};
 
 class tsmpool
 {
 private:
-	vector<tsmthread_t*> threads;
+	vector<tsmthread_t> threads;
 	vector<void*> buffers;
 	int threads_cntr;
 	pthread_mutex_t mutex;
@@ -28,6 +34,7 @@ private:
 	int write_index; //it always points to the next buffer to be written
 	int lowest_read_index; //unused
 	int my_read_index; //it is used when tsmpool is used as a single writer - single reader circular buffer
+    size_t next_free_id = 0;
 
 public:
 	const size_t size;
@@ -35,9 +42,9 @@ public:
 	int is_ok();
 	tsmpool(size_t size, int num);
 	void* get_write_buffer();
-	tsmthread_t* register_thread();
-	int remove_thread(tsmthread_t* thread);
-	void* get_read_buffer(tsmthread_t* thread);
+	tsmthread_t register_thread();
+	void remove_thread(tsmthread_t& thread);
+	void* get_read_buffer(tsmthread_t& thread);
 	int index_next(int index) { return (index+1==num)?0:index+1; }
 	int index_before(int index) { return (index-1<0)?num-1:index-1; }
 };

I hope this compiles without having to specify -std=c++11 to the CXXFLAGS.

@fventuri
Copy link

fventuri commented Jan 4, 2019

I had a similar issue with Fedora 29 and nmux (in my case as a subprocess of OpenWebRX). My specific error message was: free(): double free detected in tcache 2
I applied the patch above by Matthias and now nmux does not crash and OpenWebRX runs as expected.
Thanks,
Franco

@ha7ilm
Copy link
Owner

ha7ilm commented Jan 24, 2019

My observations:

  • If I switch optimizations off (e.g. -O3), then this problem goes away.
  • If I fix all compiler warnings related to printf format mismatch, then this problem goes away.

Thanks @mpbraendli for debugging this, and the patch! To be honest, I don't understand yet why it works... I mean, why it fixes the problem.

I was thoroughly looking into my code, but I found no reason why couldn't I allocate memory dynamically here. The thread object is created once and erased once, so it looks correct. However, it seems like that with optimizations on and with printf warnings unfixed, the flow of the program is confused around remove_thread: the internal for loop is not executed correctly, and this somehow results in a double free or segfault. This looks a kind of problem that is especially hard to debug.

Back to printf-s: I've recently switched from a 32-bit distro to a 64-bit one. On the 32-bit distro %x printed a memory address correctly, and the program otherwise compiled without warnings and ran correctly. On 64-bit, it looks like that I'm expected to change this to %lx to correctly display memory addresses. The architecture-independent solution was to use %p everywhere.

As it is documented that format mismatch at fprintf might lead to undefined behaviour, I think it is possible this was the source of the problem. I've also fixed the void return value of remove_thread, thanks @szpajder for pointing that out. I've pushed the changes and please let me know if it works for you.

@mpbraendli
Copy link

I must admit, I also didn't understand what triggered the issue, but replacing manual resource management using pointers with C++ RAII already takes away many opportunities to shoot oneself in the foot. And the code is easier to reason about.

+1 on using %p to print memory addresses. Maybe that was causing UB. The compiler flag -Wformat=2 (gcc and clang) can be useful to detect such issues.

@cfenell
Copy link

cfenell commented Jan 24, 2019

Thanks, I have pulled the changes and it works perfectly now!

For info, the compiler warnings about vectorization and trigraphs still appear as in ### #19 but should not affect normal operation.

@szpajder
Copy link
Author

But does printing these pointer values serve any purpose in debugging in the first place?

And this:

fprintf(stderr, MSG_START "pthread_create() done, clients now: %d\n", (int)clients.size());

is a narrowing conversion. size_t won't always fit in an int. It probably won't ever fail in this particular place, however it's an indication that the programmer probably doesn't know what he's doing.

Seriously, build this with -Wall -Wextra at least once. It shows screens of warnings about various unsafe constructs still present in this code. The following seem to be of particular importance:

libcsdr.c: At top level:
libcsdr.c:1428:60: warning: multi-line comment [-Wcomment]
     { .code = 0b111101111,  .bitcount=9,    .ascii=0x5c }, //\

Yep, the next line is essentially a comment. It's not there.

In file included from libcsdr_wrapper.c:1:0:
libcsdr.c: In function ‘firdes_cosine_f’:
libcsdr.c:2480:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
libcsdr.c: In function ‘firdes_rrc_f’:
libcsdr.c:2497:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
libcsdr.c: In function ‘add_ff’:
libcsdr.c:2521:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
libcsdr.c: In function ‘add_const_cc’:
libcsdr.c:2530:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^

Should these be void or return something?

libcsdr.c: In function ‘pack_bits_8to1_u8_u8’:
libcsdr.c:1823:15: warning: ‘output’ is used uninitialized in this function [-Wuninitialized]
         output<<=1;
         ~~~~~~^~~~

This computation is made on a random garbage.

csdr.c: In function ‘errhead’:
csdr.c:207:1: warning: control reaches end of non-void function [-Wreturn-type]
 }
 ^
csdr.c: In function ‘parse_env’:
csdr.c:417:1: warning: control reaches end of non-void function [-Wreturn-type]
 }

As above.

etc, etc.

Maybe they don't bite today, but they might bite tomorrow.

@ha7ilm
Copy link
Owner

ha7ilm commented Jan 25, 2019

it's an indication that the programmer probably doesn't know what he's doing

@szpajder You are using this code free of charge with no warranty, and I'm volunteering my time to support it here. On the other hand, thank you for your suggestions.

size_t won't always fit in an int.

In this application, I don't expect more than INT_MAX clients (and thus threads). If there will be more clients, then yes, it will overflow. This is for my internal debugging purposes anyway.

But does printing these pointer values serve any purpose in debugging in the first place?

Yes.

The following seem to be of particular importance:

That's a good find! Probably the [ Varicode character could not be decoded as a result.

Update: this has just been fixed in 17b8c7b.

warning: control reaches end of non-void function

Yes, those should be fixed as well.

This computation is made on a random garbage.

On that particular occasion it is not a problem.

@ha7ilm
Copy link
Owner

ha7ilm commented Feb 27, 2019

I think the original problem has been solved.
Let me know if it still persists for someone.

@ha7ilm ha7ilm closed this as completed Feb 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants