Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optionally disable thread-local storage. #735

Merged
merged 4 commits into from
Apr 3, 2023
Merged

Optionally disable thread-local storage. #735

merged 4 commits into from
Apr 3, 2023

Conversation

fgvanzee
Copy link
Member

Details:

  • Implemented a new configure option, --disable-tls, which allows the user to optionally disable the use of thread-local storage qualifiers on static variables in BLIS. This option will rarely be needed, but in some situations may allow BLIS to compile when TLS is unavailable. Like the --disable-system option, --disable-tls forcibly disables threading (e.g. forcibly uses --disable-threading). Thanks to Nick Knight for suggesting this option.
  • Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is defined to nothing when BLIS_ENABLE_TLS is not defined.
  • Edited --disable-system configure status output for clarity.

@nick-knight @hominhquan You may wish to discuss this feature since you both make use of a similar option, --disable-system. 馃檪

Details:
- Implemented a new configure option, --disable-tls, which allows the
  user to optionally disable the use of thread-local storage qualifiers
  on static variables in BLIS. This option will rarely be needed, but
  in some situations may allow BLIS to compile when TLS is unavailable.
  Like the --disable-system option, --disable-tls forcibly disables
  threading (e.g. forcibly uses --disable-threading). Thanks to Nick
  Knight for suggesting this option.
- Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is
  defined to nothing when BLIS_ENABLE_TLS is not defined.
- Edited --disable-system configure status output for clarity.
@hominhquan
Copy link
Contributor

Thanks @fgvanzee for the ping. I don't know if I have given my opinion before, if not, I'd like to express my disagreement to the use of TLS, not only in any highly parallel and performant code, but also generic one.

From experience and personal reasoning, I think we can always find an alternative to TLS, and building contextless API (all behaviors are defined by arguments and not by any global variable, or if global they must be Read-only) is the safest way to deal with parallel execution on unknown/future/novel/exotic architectures.

BLIS already defined a lot of logic information and parameters in arguments (cntl_t, cntx_t, rntm_t, ...) and I like it. And now I find it a bit strange to say: "We disable multi-threading because we don't have TLS". It's like leaving other low-end embedded architectures behind the door, just because they are not "well-equipped" enough.

Anyway, if we can't do differently, in the current state, this PR is OK to me.

@fgvanzee
Copy link
Member Author

And now I find it a bit strange to say: "We disable multi-threading because we don't have TLS". It's like leaving other low-end embedded architectures behind the door, just because they are not "well-equipped" enough.

@hominhquan Thank you for sharing your honest opinion! I'm happy to report that my choice to disable threading when disabling TLS was not one I was in any way confident of -- I only did it out of (a) caution, and (b) I was copying and pasting code from the part of configure that dealt with --disable-system. So I'm happy to remove that part if it makes sense to someone with more experience in these matters than me. 馃檪

@devinamatthews
Copy link
Member

@hominhquan Is there such as system that supports multithreading but not TLS? It seems like the latter is a prerequisite for most implementations of the former? I certainly have no objection to disabling it if possible, but there is some useful functionality which it enables. Even if the library were architected to be free of global state, a globally-stateful interface would be the most natural for a lot of people, especially when they have to access BLIS through the BLAS interface for legacy reasons.

Copy link

@nick-knight nick-knight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fgvanzee for adding this configure knob.

I don't have any strong opinions (yet) on the broader question of how BLIS uses TLS, since I don't fully understand how it's used.

I think it is reasonable to have this configuration option, since TLS support can be quite involved. For example, GNU docs say

It requires significant support from the linker (ld), dynamic linker (ld.so), and system libraries (libc.so and libpthread.so), so it is not available everywhere.

@leekillough
Copy link
Collaborator

leekillough commented Mar 28, 2023

@devinamatthews : pthreads does not support TLS as we think of it today, e.g., thread_local variables accessed through a thread pointer register defined in the ABI, but instead you have to jump through hoops like pthread_key_create, pthread_getspecific, pthread_setspecific, and pthread_key_delete.

But thread_local has been standard since C11/C++11 and interoperates with pthreads and OpenMP.

@hominhquan
Copy link
Contributor

I only did it out of (a) caution, and (b) I was copying and pasting code from the part of configure that dealt with --disable-system.

@fgvanzee You are right to care about race-condition, but here I'm curious about write access to bli_l3_ind_oper_st[] being already protected (I would protect the read-access as well, or use lock-free atomic), but the array is still put in TLS, why ? because it is not needed to protect a TLS variable, unless that TLS variable is shared to another thread ? (nooooo ... )

@hominhquan Is there such as system that supports multithreading but not TLS? It seems like the latter is a prerequisite for most implementations of the former?

TLS is not a must-have feature to run multithreading code. The proof is BLIS has run well in either OpenMP or Pthread on C6x, or MPPA and some others before. On another hand, a sequential single-thread code can totally put a variable in TLS, and that code will compile and work on most CPU today. Most, but not all.

Supporting TLS on new architectures/OS/compilers is often a hazardous and hackish task, which implies many architecture-specific handling, e.g. zephyrproject-rtos/zephyr#22185

I certainly have no objection to disabling it if possible, but there is some useful functionality which it enables. Even if the library were architected to be free of global state, a globally-stateful interface would be the most natural for a lot of people, especially when they have to access BLIS through the BLAS interface for legacy reasons.

Totally agree and I have no problem (for legacy and historical reasons) with these global structures between the BLAS and BLIS API.

My rationale is: Today we are writing code that will run on who-knows-how-many different architectures in the next years/decades. I don't know if they will be extremely sophisticated, or contrastingly very simple and poorly-equipped for power reason and/or application-domain. Using TLS is one way to make our code less portable.

@fgvanzee
Copy link
Member Author

@fgvanzee You are right to care about race-condition, but here I'm curious about write access to bli_l3_ind_oper_st[] being already protected (I would protect the read-access as well, or use lock-free atomic), but the array is still put in TLS, why ? because it is not needed to protect a TLS variable, unless that TLS variable is shared to another thread ? (nooooo ... )

IIRC, our collaborators at AMD once pointed out that TLS was desired to enforce some determinism on static variables such as bli_l3_ind_oper_st[] (the other example being error checking level I think?) because of shenanigans that could take place in application-level threading (rather than BLIS-level threading). It was certainly not a mainstream use case, but rather a situation where they wanted to prevent accidental misuse.

@devinamatthews
Copy link
Member

@hominhquan @leekillough thanks for pointing out that TLS is more difficult to support than multithreading. If TLS were to be disabled but not multithreading, as a user would you consider it acceptable for thinks like number of threads to spawn, 1m enable/disabled, error checking level, etc. to be global (and possibly subject to race conditions)? Is it instead possible to fake the TLS behavior using the ptheads functions mentions by @leekillough ?

@hominhquan
Copy link
Contributor

TLS was desired to enforce some determinism on static variables such as bli_l3_ind_oper_st[] (the other example being error checking level I think?) because of shenanigans that could take place in application-level threading (rather than BLIS-level threading)

OK, thanks for explanation @fgvanzee.

@hominhquan @leekillough thanks for pointing out that TLS is more difficult to support than multithreading. If TLS were to be disabled but not multithreading, as a user would you consider it acceptable for thinks like number of threads to spawn, 1m enable/disabled, error checking level, etc. to be global (and possibly subject to race conditions)?

@devinamatthews @fgvanzee Between (1) disable both TLS and multithreading for safety and (2) disable TLS but not multithreading at the risk of data-race. I would regrettingly choose the solution (1). In any case, functionality comes first. Debugging multithreading data-race is horribly a headache.

I still hope these runtime logic modifications can be reimplemented differently without recourse to TLS. At least it will be the first thing I do when if happens to me to rebase the latest BLIS version.

Is it instead possible to fake the TLS behavior using the ptheads functions mentions by @leekillough ?

IMHO, supporting explicit thread_local using pthread primitives mentioned by @leekillough is equivalent to using malloc/free or internal memory pool mechanism in BLIS, which is also probably what I would use to get rid of TLS.

@fgvanzee
Copy link
Member Author

fgvanzee commented Mar 29, 2023

@devinamatthews @fgvanzee Between (1) disable both TLS and multithreading for safety and (2) disable TLS but not multithreading at the risk of data-race. I would regrettingly choose the solution (1). In any case, functionality comes first. Debugging multithreading data-race is horribly a headache.

In the abstract, I would agree. But I could also make the case that disabling TLS is a very "I know what I'm doing" kind of thing. The only people who would be doing so should be aware of the possible consequences for multithreading, especially if I include a warning in the configure --help text, which I am happy to do.

I still hope these runtime logic modifications can be reimplemented differently without recourse to TLS. At least it will be the first thing I do when if happens to me to rebase the latest BLIS version.

I'm always open to making BLIS more stateless, where possible. Maybe such hypothetical changes don't completely eliminate the applicability of TLS, but they may meaningfully reduce it.

@hominhquan
Copy link
Contributor

hominhquan commented Mar 29, 2023

In the abstract, I would agree. But I could also make the case that disabling TLS is a very "I know what I'm doing" kind of thing. The only people who would be doing so should be aware of the possible consequences for multithreading, especially if I include a warning in the configure --help text, which I am happy to do.

So OK to keep multithreading and add a warning message about possible data-race, and accordingly recommend the developer to try --disable-threading if (s)he encounters any correctness or deadlock issue. Such message can be added in both configure --disable-tls and the test runtime inside the macro #ifdef BLIS_DISABLE_TLS (at the beginning ?)

Details:
- Added "warning labels" to the use of --disable-tls to both the
  "./configure --help" text as well as the configure output upon
  parsing the --disable-tls option. Thanks to Minh Quan Ho for his
  feedback on this.
@fgvanzee
Copy link
Member Author

@hominhquan I opted not to insert a compile-time warning since the warning would be emitted for every since file compilation (since everything goes through the one header, blis.h).

@hominhquan
Copy link
Contributor

@hominhquan I opted not to insert a compile-time warning since the warning would be emitted for every since file compilation (since everything goes through the one header, blis.h).

Oh I think I was not clear enough about "test runtime message". I meant an additional printf message in test_libblis.c at the beginning of execution, where all parameters are printed to stdout for information, in libblis_test_output_params_struct I guess. Something like:

void libblis_test_output_params_struct( FILE* os, test_params_t* params )
{
    ...
    #ifdef BLIS_DISABLE_TLS
    libblis_test_fprintf_c( os, 
        "[WARNING] This BLIS library was compiled with TLS disabled. We consider you know what you are doing.\n"
        "[WARNING] Multi-threading race-condition, correctness issues and deadlock may occur.\n"
        "[WARNING] If this happens, please consider reconfiguring and recompiling with --disable-threading before any further deep debug.\n"
    );
    #endif // BLIS_DISABLE_TLS
    ...
}

@fgvanzee
Copy link
Member Author

Oh I think I was not clear enough about "test runtime message". I meant an additional printf message in test_libblis.c at the beginning of execution, where all parameters are printed to stdout for information, in libblis_test_output_params_struct I guess.

@hominhquan Ohhh, yes. That would be fine. I'll work on it! Thanks for the clarification.

Details:
- Inserted logic into test_libblis.c so that a warning is printed to the
  informational header if TLS is disabled and threading (of any form) is
  enabled. Thanks to Minh Quan Ho for suggesting this warning.
- Defined bli_info_get_enable_tls(), which returns whether the cpp macro
  BLIS_ENABLE_TLS was defined.
- Whitespace updates.
@fgvanzee
Copy link
Member Author

Are we all okay now with this compromise? (Leave threading untouched when TLS is disabled, but warn of consequences in ./configure --help, in configure output, and in the testsuite informational header output.)

@devinamatthews
Copy link
Member

I guess.

@hominhquan
Copy link
Contributor

LGTM.

@fgvanzee fgvanzee merged commit aea8e1d into master Apr 3, 2023
@fgvanzee fgvanzee deleted the disable_tls branch April 3, 2023 17:17
leekillough pushed a commit to leekillough/blis that referenced this pull request Jun 7, 2023
Details:
- Implemented a new configure option, --disable-tls, which allows the
  user to optionally disable the use of thread-local storage qualifiers
  on static variables in BLIS. This option will rarely be needed, but
  in some situations may allow BLIS to compile when TLS is unavailable.
  Thanks to Nick Knight for suggesting this option.
- Unlike the --disable-system option, --disable-tls does not forcibly
  disable threading. Instead, warnings of the possible consequences of
  using threading with TLS disabled are added to:
  - the output of './configure --help';
  - the output of 'configure' the --disable-tls option is parsed;
  - the informational header output by the testsuite.
  Thanks to Minh Quan Ho for suggesting these warnings.
- Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is
  defined to nothing when BLIS_ENABLE_TLS is not defined.
- Defined bli_info_get_enable_tls(), which returns whether the cpp macro
  BLIS_ENABLE_TLS was defined.
- Edited --disable-system configure status output for clarity.
- Whitespace updates.
ct-clmsn pushed a commit to ct-clmsn/blis that referenced this pull request Jul 29, 2023
Details:
- Implemented a new configure option, --disable-tls, which allows the
  user to optionally disable the use of thread-local storage qualifiers
  on static variables in BLIS. This option will rarely be needed, but
  in some situations may allow BLIS to compile when TLS is unavailable.
  Thanks to Nick Knight for suggesting this option.
- Unlike the --disable-system option, --disable-tls does not forcibly
  disable threading. Instead, warnings of the possible consequences of
  using threading with TLS disabled are added to:
  - the output of './configure --help';
  - the output of 'configure' the --disable-tls option is parsed;
  - the informational header output by the testsuite.
  Thanks to Minh Quan Ho for suggesting these warnings.
- Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is
  defined to nothing when BLIS_ENABLE_TLS is not defined.
- Defined bli_info_get_enable_tls(), which returns whether the cpp macro
  BLIS_ENABLE_TLS was defined.
- Edited --disable-system configure status output for clarity.
- Whitespace updates.
fgvanzee added a commit that referenced this pull request May 22, 2024
- (cherry picked from 593d017)

CREDITS file update.

Details:
- Added attributions associated with commits:
  - 98d4678 9b1beec: @bartoldeman
  - 2b05948 059f151: @ct-clmsn
- Reordered attirubtion for @decandia50.
- (cherry picked from 259f684)

Optionally disable thread-local storage. (#735)

Details:
- Implemented a new configure option, --disable-tls, which allows the
  user to optionally disable the use of thread-local storage qualifiers
  on static variables in BLIS. This option will rarely be needed, but
  in some situations may allow BLIS to compile when TLS is unavailable.
  Thanks to Nick Knight for suggesting this option.
- Unlike the --disable-system option, --disable-tls does not forcibly
  disable threading. Instead, warnings of the possible consequences of
  using threading with TLS disabled are added to:
  - the output of './configure --help';
  - the output of 'configure' the --disable-tls option is parsed;
  - the informational header output by the testsuite.
  Thanks to Minh Quan Ho for suggesting these warnings.
- Modified frame/include/bli_lang_defs.h so that BLIS_THREAD_LOCAL is
  defined to nothing when BLIS_ENABLE_TLS is not defined.
- Defined bli_info_get_enable_tls(), which returns whether the cpp macro
  BLIS_ENABLE_TLS was defined.
- Edited --disable-system configure status output for clarity.
- Whitespace updates.
- (cherry picked from aea8e1d)

Add output.testsuite to .gitignore (#736)

Details:
- Added `output.testsuite` to .gitignore since it was previously not
  being matched by `output.testsuite.*`.
- (cherry picked from 3f1432a)

Added mm_algorithm pdf files (bp and pb).

Details:
- Added PDF versions of the PowerPoint files added in 17cd260.
- (cherry picked from 38fc523)

Added mm_algorithm pptx files (bp and pb).

Details:
- Added two PowerPoint files that contain slides depicting the classic
  Goto algorithm for matrix multiplication as well as its sister
  "panel-block" algorithm. These files reside in docs/diagrams.
- (cherry picked from 17cd260)

Move -fPIC insertion to subconfigs' make_defs.mk. (#738)

Details:
- Previously, common.mk was appending -fPIC to the CPICFLAGS variables
  set within the various subconfigurations' make_defs.mk files. This
  seemed somewhat unintuitive, and so now the -fPIC flag is assigned to
  the various subconfigs' CPICFLAGS variables in the respective
  make_defs.mk files.
- This also commit changes the logic in common.mk so that instead of
  appending, the variable is overwritten, but now *only* in the case
  of Windows (since apparently -fPIC needs to be omitted there). Thanks
  to Nick Knight for catching and reporting this weirdness.
- (cherry picked from 9d778e0)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants