Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi-dict and multi-thread tests crash with a segmentation fault on macOS when built with pcre2 #1514

Open
ryandesign opened this issue Apr 25, 2024 · 2 comments

Comments

@ryandesign
Copy link
Contributor

With the latest code in master, make check fails on macOS if link-grammar is built with pcre2 support. The multi-dict and multi-thread tests crash with a segmentation fault, which doesn't happen if I disable pcre2 with the --with-regexlib=c configure argument. From the crash log, multi-dict crashed here:

Thread 6 Crashed:
0   libpcre2-8.0.dylib            	       0x1029e08a7 match + 49451
1   libpcre2-8.0.dylib            	       0x1029d4324 pcre2_match_8 + 4536
2   liblink-grammar.5.dylib       	       0x102886fbf reg_match + 39 (regex-morph.c:239) [inlined]
3   liblink-grammar.5.dylib       	       0x102886fbf match_regex + 207 (regex-morph.c:428)
4   liblink-grammar.5.dylib       	       0x1028b4b06 regex_guess + 12 (tokenize.c:377) [inlined]
5   liblink-grammar.5.dylib       	       0x1028b4b06 separate_word + 886 (tokenize.c:2681)
6   liblink-grammar.5.dylib       	       0x1028b4489 separate_sentence + 1257 (tokenize.c:3090)
7   liblink-grammar.5.dylib       	       0x1028ae13a sentence_split + 74 (sentence.c:93)
8   multi-dict                    	       0x1026fd5ba parse_one_sent(char const*) + 31 (multi-dict.cc:40) [inlined]
9   multi-dict                    	       0x1026fd5ba parse_sents(int, int) + 122 (multi-dict.cc:82)
10  multi-dict                    	       0x1026fd7e0 decltype(static_cast<void (*>(fp)(static_cast<int>(fp0), static_cast<int>(fp0))) std::__1::__invoke<void (*)(int, int), int, int>(void (*&&)(int, int), int&&, int&&) + 4 (type_traits:3918) [inlined]
11  multi-dict                    	       0x1026fd7e0 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int, int), int, int, 2ul, 3ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int, int), int, int>&, std::__1::__tuple_indices<2ul, 3ul>) + 4 (thread:287) [inlined]
12  multi-dict                    	       0x1026fd7e0 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(int, int), int, int> >(void*) + 48 (thread:298)
13  libsystem_pthread.dylib       	    0x7ff800d7b4e1 _pthread_start + 125
14  libsystem_pthread.dylib       	    0x7ff800d76f6b thread_start + 15

while multi-thread crashed here:

Thread 3 Crashed:
0   libpcre2-8.0.dylib            	       0x10da218e6 match + 362
1   libpcre2-8.0.dylib            	       0x10da21324 pcre2_match_8 + 4536
2   liblink-grammar.5.dylib       	       0x10d8d3fbf reg_match + 39 (regex-morph.c:239) [inlined]
3   liblink-grammar.5.dylib       	       0x10d8d3fbf match_regex + 207 (regex-morph.c:428)
4   liblink-grammar.5.dylib       	       0x10d901b06 regex_guess + 12 (tokenize.c:377) [inlined]
5   liblink-grammar.5.dylib       	       0x10d901b06 separate_word + 886 (tokenize.c:2681)
6   liblink-grammar.5.dylib       	       0x10d901489 separate_sentence + 1257 (tokenize.c:3090)
7   liblink-grammar.5.dylib       	       0x10d8fb13a sentence_split + 74 (sentence.c:93)
8   multi-thread                  	       0x10d74a233 parse_one_sent(Dictionary_s*, Parse_Options_s*, char const*) + 51 (multi-thread.cc:34)
9   multi-thread                  	       0x10d74a042 parse_sents(Dictionary_s*, Parse_Options_s*, int, int) + 1378 (multi-thread.cc:125)
10  multi-thread                  	       0x10d74a445 decltype(static_cast<void (*>(fp)(static_cast<Dictionary_s*>(fp0), static_cast<Parse_Options_s*>(fp0), static_cast<int>(fp0), static_cast<int>(fp0))) std::__1::__invoke<void (*)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*, Parse_Options_s*, int, int>(void (*&&)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*&&, Parse_Options_s*&&, int&&, int&&) + 3 (type_traits:3918) [inlined]
11  multi-thread                  	       0x10d74a445 void std::__1::__thread_execute<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*, Parse_Options_s*, int, int, 2ul, 3ul, 4ul, 5ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*, Parse_Options_s*, int, int>&, std::__1::__tuple_indices<2ul, 3ul, 4ul, 5ul>) + 17 (thread:287) [inlined]
12  multi-thread                  	       0x10d74a445 void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, void (*)(Dictionary_s*, Parse_Options_s*, int, int), Dictionary_s*, Parse_Options_s*, int, int> >(void*) + 53 (thread:298)
13  libsystem_pthread.dylib       	    0x7ff800d7b4e1 _pthread_start + 125
14  libsystem_pthread.dylib       	    0x7ff800d76f6b thread_start + 15

@ampli said in #1505 (comment) that this is because:

The current regex-morph.c PCRE2 code doesn't support using multi-threading without threads.h.

Possible solutions:

  • Document that, maybe add a warning in configure, and modify the multi-threading tests to print a warning and exit.
  • Modify the regex-morph.c PCRE2 code to use C++ threads, and modify configure.ac and link-grammar/Makefile.am accordingly.
  • Modify the regex-morph.c PCRE2 code to use Pthreads.

My brief searching suggests that C11 threads (threads.h) are not well supported and pthreads is suggested as the recommended alternative. pthreads are already used elsewhere in the code:

Maybe using a single threading library for the entire code base would be a good idea. I can't help with that, however, as I haven't written any multithreaded code before.

@ampli
Copy link
Member

ampli commented Apr 25, 2024

The problem with PCRE2 in compilers that don't support c11 threads happens only with multi-threaded programs that use the LG library in different threads simultaneously. I think most users never do that, and since PCER2 is faster and better than the alternative libraries, it should remain the default.

If desired, configure can warn that multithreading library use is not supported with PCRE2 on such systems.

@linas
Copy link
Member

linas commented Apr 25, 2024

Hi @ryandesign -- if you will allow me, I'd like to provide some deep history and general, opinionated twitter-thread commentary ...

  • pthreads aka POSIX Threads dates back to about 1991. When it appeared, it was an important step forward, and was a technical masterpiece. Not without controversy: there were many opinions and arguments, with "green threads" being the most virulent and longest lasting.

  • The API was full, complete, well-designed and worked well. Most or "almost all" or literally "all"(?) other threading packages are built as a layer on top of POSIX threads. That's because its fast, optimized, and does everything. There's no mis-designed bulging crapola in there.

  • However, there are some common patterns of "things people typically do", and those are not in pthreads. They were common and predictable and regular enough that they became a part of the C and C++ language standards.

  • call_once is typical: make sure you call some function once and only once, no matter what. In C11 its a one-liner. The pthreads variant would be 10-20 lines, with a good chance that the programmer introduces a hard-to-find, subtle bug. The pseudocode for it would be something like:

bool did_we_run_yet = false;
pthread_mutex lock;
pthread_init(lock);
void func_to_call_once() {
    pthread_lock(lock)
    if (did_we_run_yet) return;
    did_we_run_yet = true;
    pthread_unlock(lock)
    rest of subr
}
  • The above could be added, with an #ifdef APPLE around it. But damned if I'll add it: I've written this code snippet maybe 10 or 20 times in my life and I really really have no desire to reinvent it, again.

  • The C11 standard is freakin 13 years old. That's a long time. Apple s a big company. It should get off its fat lazy bum and implement it. Why hasn't it done so yet? Well, typical "I'm the final boss" behavior commonly seen in larger companies. Microsoft did this in spades, and became widely hated for it. They screwed up html, on purpose. Kerberos, on purpose, SSL, on purpose. Both the C and the C++ standards, on purpose. I stopped counting there. Eventually, they came around: but it took ten years. Microsoft eventually shipped standard versions of these systems, but only after much torture and trade-press abuse and developer complaints.

  • I've worked at large companies, and understand exactly how this dynamic happens. I could write a long long blog post on exactly how and why executives and managers make these decisions, and why the executives/managers think they're doing the right thing, and what it was that they failed to anticipate or understand about how the world works. Heh. Actually small companies do crap like this too, usually because they miscalculate how expensive it will be. Pisses off their customers, which is why small companies struggle and go bankrupt... but I digress.

  • The right thing to do is to write to Apple and complain about C11 support. Pester them. Get the developer support team to pay attention. Stuff does bubble up, eventually.

If you want to learn multi-threaded programming, above is a good starter project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants