Skip to content

Conversation

lntue
Copy link
Contributor

@lntue lntue commented Aug 28, 2024

Currently the nan* functions use nullptr dereferencing to crash with SIGSEGV if the input is nullptr. Both nan(nullptr) and nullptr dereferencing are undefined behaviors according to the C standard. Employing nullptr dereference in the nan function implementation is ok if users only linked against the pre-built library, but it might be completely removed by the compilers' optimizations if it is built from source together with the users' code.

See for instance: https://godbolt.org/z/fd8KcM9bx

This PR uses volatile load to prevent the undefined behavior if libc is built without sanitizers, and leave the current undefined behavior if libc is built with sanitizers, so that the undefined behavior can be caught for users' codes.

@llvmbot
Copy link
Member

llvmbot commented Aug 28, 2024

@llvm/pr-subscribers-libc

Author: None (lntue)

Changes

Currently the nan* functions use nullptr dereferencing to crash with SIGSEGV if the input is nullptr. Both nan(nullptr) and nullptr dereferencing are undefined behaviors according to the C standard. Employing nullptr dereference in the nan function implementation is ok if users only linked against the pre-built library, but it might be completely removed by the compilers' optimizations if it is built from source together with the users' code.

See for instance: https://godbolt.org/z/fd8KcM9bx

This PR uses volatile load to prevent the undefined behavior if libc is built without sanitizers, and leave the current undefined behavior if libc is built with sanitizers, so that the undefined behavior can be caught for users' codes.


Full diff: https://github.com/llvm/llvm-project/pull/106468.diff

11 Files Affected:

  • (modified) libc/src/__support/CMakeLists.txt (+3)
  • (modified) libc/src/__support/macros/sanitizer.h (+19-2)
  • (modified) libc/src/__support/str_to_float.h (+11)
  • (modified) libc/test/src/compiler/CMakeLists.txt (+1)
  • (modified) libc/test/src/compiler/stack_chk_guard_test.cpp (+3-3)
  • (modified) libc/test/src/math/smoke/CMakeLists.txt (+9-5)
  • (modified) libc/test/src/math/smoke/nan_test.cpp (+4-3)
  • (modified) libc/test/src/math/smoke/nanf128_test.cpp (+4-3)
  • (modified) libc/test/src/math/smoke/nanf16_test.cpp (+3-4)
  • (modified) libc/test/src/math/smoke/nanf_test.cpp (+4-3)
  • (modified) libc/test/src/math/smoke/nanl_test.cpp (+4-3)
diff --git a/libc/src/__support/CMakeLists.txt b/libc/src/__support/CMakeLists.txt
index 9bd1e29081a801..1d5c2b585ea32f 100644
--- a/libc/src/__support/CMakeLists.txt
+++ b/libc/src/__support/CMakeLists.txt
@@ -192,6 +192,9 @@ add_header_library(
     libc.src.__support.CPP.optional
     libc.src.__support.FPUtil.fp_bits
     libc.src.__support.FPUtil.rounding_mode
+    libc.src.__support.macros.config
+    libc.src.__support.macros.optimization
+    libc.src.__support.macros.sanitizer
     libc.src.errno.errno
 )
 
diff --git a/libc/src/__support/macros/sanitizer.h b/libc/src/__support/macros/sanitizer.h
index c4f8b5bce39755..c20412e0f8b69f 100644
--- a/libc/src/__support/macros/sanitizer.h
+++ b/libc/src/__support/macros/sanitizer.h
@@ -15,7 +15,25 @@
 // Functions to unpoison memory
 //-----------------------------------------------------------------------------
 
+#if LIBC_HAS_FEATURE(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
+#define LIBC_HAS_ADDRESS_SANITIZER
+#endif
+
 #if LIBC_HAS_FEATURE(memory_sanitizer)
+#define LIBC_HAS_MEMORY_SANITIZER
+#endif
+
+#if LIBC_HAS_FEATURE(undefined_behavior_sanitizer)
+#define LIBC_HAS_UNDEFINED_BEHAVIOR_SANITIZER
+#endif
+
+#if defined(LIBC_HAS_ADDRESS_SANITIZER) ||                                     \
+    defined(LIBC_HAS_MEMORY_SANITIZER) ||                                      \
+    defined(LIBC_HAS_UNDEFINED_BEHAVIOR_SANITIZER)
+#define LIBC_HAS_SANITIZER
+#endif
+
+#ifdef LIBC_HAS_MEMORY_SANITIZER
 // Only perform MSAN unpoison in non-constexpr context.
 #include <sanitizer/msan_interface.h>
 #define MSAN_UNPOISON(addr, size)                                              \
@@ -27,8 +45,7 @@
 #define MSAN_UNPOISON(ptr, size)
 #endif
 
-#if LIBC_HAS_FEATURE(address_sanitizer)
-#define LIBC_HAVE_ADDRESS_SANITIZER
+#ifdef LIBC_HAS_ADDRESS_SANITIZER
 #include <sanitizer/asan_interface.h>
 #define ASAN_POISON_MEMORY_REGION(addr, size)                                  \
   __asan_poison_memory_region((addr), (size))
diff --git a/libc/src/__support/str_to_float.h b/libc/src/__support/str_to_float.h
index ffd6ebf27c7726..4cae02d591078a 100644
--- a/libc/src/__support/str_to_float.h
+++ b/libc/src/__support/str_to_float.h
@@ -20,6 +20,8 @@
 #include "src/__support/detailed_powers_of_ten.h"
 #include "src/__support/high_precision_decimal.h"
 #include "src/__support/macros/config.h"
+#include "src/__support/macros/optimization.h"
+#include "src/__support/macros/sanitizer.h"
 #include "src/__support/str_to_integer.h"
 #include "src/__support/str_to_num_result.h"
 #include "src/__support/uint128.h"
@@ -1208,6 +1210,15 @@ template <class T> LIBC_INLINE StrToNumResult<T> strtonan(const char *arg) {
   using FPBits = typename fputil::FPBits<T>;
   using StorageType = typename FPBits::StorageType;
 
+#ifndef LIBC_HAS_SANITIZER
+  if (LIBC_UNLIKELY(arg == nullptr)) {
+    // Use volatile to prevent undefined behavior of dereferencing nullptr.
+    volatile const char *crashing = arg;
+    // Intentionally crashing with SIGSEGV.
+    return {static_cast<T>(crashing[0]), 0, 0};
+  }
+#endif
+
   FPBits result;
   int error = 0;
   StorageType nan_mantissa = 0;
diff --git a/libc/test/src/compiler/CMakeLists.txt b/libc/test/src/compiler/CMakeLists.txt
index 65a9acceb6f7f1..a45fa8c55e5128 100644
--- a/libc/test/src/compiler/CMakeLists.txt
+++ b/libc/test/src/compiler/CMakeLists.txt
@@ -7,6 +7,7 @@ add_libc_unittest(
   SRCS
     stack_chk_guard_test.cpp
   DEPENDS
+    libc.hdr.signal_macros
     libc.src.__support.macros.sanitizer
     libc.src.compiler.__stack_chk_fail
     libc.src.string.memset
diff --git a/libc/test/src/compiler/stack_chk_guard_test.cpp b/libc/test/src/compiler/stack_chk_guard_test.cpp
index 6b71e155fa3e4d..4ec8398c9fc95d 100644
--- a/libc/test/src/compiler/stack_chk_guard_test.cpp
+++ b/libc/test/src/compiler/stack_chk_guard_test.cpp
@@ -6,7 +6,7 @@
 //
 //===----------------------------------------------------------------------===//
 
-#include "include/llvm-libc-macros/signal-macros.h"
+#include "hdr/signal_macros.h"
 #include "src/__support/macros/sanitizer.h"
 #include "src/compiler/__stack_chk_fail.h"
 #include "src/string/memset.h"
@@ -18,7 +18,7 @@ TEST(LlvmLibcStackChkFail, Death) {
 
 // Disable the test when asan is enabled so that it doesn't immediately fail
 // after the memset, but before the stack canary is re-checked.
-#ifndef LIBC_HAVE_ADDRESS_SANITIZER
+#ifndef LIBC_HAS_ADDRESS_SANITIZER
 TEST(LlvmLibcStackChkFail, Smash) {
   EXPECT_DEATH(
       [] {
@@ -27,4 +27,4 @@ TEST(LlvmLibcStackChkFail, Smash) {
       },
       WITH_SIGNAL(SIGABRT));
 }
-#endif // LIBC_HAVE_ADDRESS_SANITIZER
+#endif // LIBC_HAS_ADDRESS_SANITIZER
diff --git a/libc/test/src/math/smoke/CMakeLists.txt b/libc/test/src/math/smoke/CMakeLists.txt
index 7271e933b9311d..e943d98256a97b 100644
--- a/libc/test/src/math/smoke/CMakeLists.txt
+++ b/libc/test/src/math/smoke/CMakeLists.txt
@@ -2895,9 +2895,10 @@ add_fp_unittest(
   SRCS
     nanf_test.cpp
   DEPENDS
-    libc.include.signal
+    libc.hdr.signal_macros
     libc.src.math.nanf
     libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.macros.sanitizer
   # FIXME: The nan tests currently have death tests, which aren't supported for
   # hermetic tests.
   UNIT_TEST_ONLY
@@ -2910,9 +2911,10 @@ add_fp_unittest(
   SRCS
     nan_test.cpp
   DEPENDS
-    libc.include.signal
+    libc.hdr.signal_macros
     libc.src.math.nan
     libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.macros.sanitizer
   # FIXME: The nan tests currently have death tests, which aren't supported for
   # hermetic tests.
   UNIT_TEST_ONLY
@@ -2925,9 +2927,10 @@ add_fp_unittest(
   SRCS
     nanl_test.cpp
   DEPENDS
-    libc.include.signal
+    libc.hdr.signal_macros
     libc.src.math.nanl
     libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.macros.sanitizer
   # FIXME: The nan tests currently have death tests, which aren't supported for
   # hermetic tests.
   UNIT_TEST_ONLY
@@ -2940,7 +2943,7 @@ add_fp_unittest(
   SRCS
     nanf16_test.cpp
   DEPENDS
-    libc.include.signal
+    libc.hdr.signal_macros
     libc.src.math.nanf16
     libc.src.__support.FPUtil.fp_bits
     libc.src.__support.macros.sanitizer
@@ -2956,9 +2959,10 @@ add_fp_unittest(
   SRCS
     nanf128_test.cpp
   DEPENDS
-    libc.include.signal
+    libc.hdr.signal_macros
     libc.src.math.nanf128
     libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.macros.sanitizer
   # FIXME: The nan tests currently have death tests, which aren't supported for
   # hermetic tests.
   UNIT_TEST_ONLY
diff --git a/libc/test/src/math/smoke/nan_test.cpp b/libc/test/src/math/smoke/nan_test.cpp
index 68c844181a1946..46b9e9aa9563ab 100644
--- a/libc/test/src/math/smoke/nan_test.cpp
+++ b/libc/test/src/math/smoke/nan_test.cpp
@@ -6,12 +6,13 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "hdr/signal_macros.h"
 #include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/macros/sanitizer.h"
 #include "src/math/nan.h"
 #include "test/UnitTest/FEnvSafeTest.h"
 #include "test/UnitTest/FPMatcher.h"
 #include "test/UnitTest/Test.h"
-#include <signal.h>
 
 class LlvmLibcNanTest : public LIBC_NAMESPACE::testing::FEnvSafeTest {
 public:
@@ -43,8 +44,8 @@ TEST_F(LlvmLibcNanTest, RandomString) {
   run_test("123 ", 0x7ff8000000000000);
 }
 
-#if !defined(LIBC_HAVE_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
+#if !defined(LIBC_HAS_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
 TEST_F(LlvmLibcNanTest, InvalidInput) {
   EXPECT_DEATH([] { LIBC_NAMESPACE::nan(nullptr); }, WITH_SIGNAL(SIGSEGV));
 }
-#endif // LIBC_HAVE_ADDRESS_SANITIZER
+#endif // LIBC_HAS_ADDRESS_SANITIZER
diff --git a/libc/test/src/math/smoke/nanf128_test.cpp b/libc/test/src/math/smoke/nanf128_test.cpp
index 015cc31e4be237..25dd2ef1d5b1ca 100644
--- a/libc/test/src/math/smoke/nanf128_test.cpp
+++ b/libc/test/src/math/smoke/nanf128_test.cpp
@@ -6,7 +6,9 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "hdr/signal_macros.h"
 #include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/macros/sanitizer.h"
 #include "src/__support/uint128.h"
 #include "src/math/nanf128.h"
 #include "test/UnitTest/FEnvSafeTest.h"
@@ -53,9 +55,8 @@ TEST_F(LlvmLibcNanf128Test, RandomString) {
            QUIET_NAN);
 }
 
-#if !defined(LIBC_HAVE_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
-#include <signal.h>
+#if !defined(LIBC_HAS_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
 TEST_F(LlvmLibcNanf128Test, InvalidInput) {
   EXPECT_DEATH([] { LIBC_NAMESPACE::nanf128(nullptr); }, WITH_SIGNAL(SIGSEGV));
 }
-#endif // LIBC_HAVE_ADDRESS_SANITIZER
+#endif // LIBC_HAS_ADDRESS_SANITIZER
diff --git a/libc/test/src/math/smoke/nanf16_test.cpp b/libc/test/src/math/smoke/nanf16_test.cpp
index 81b844bf6bb59c..ec640a3b9eef92 100644
--- a/libc/test/src/math/smoke/nanf16_test.cpp
+++ b/libc/test/src/math/smoke/nanf16_test.cpp
@@ -6,6 +6,7 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "hdr/signal_macros.h"
 #include "src/__support/FPUtil/FPBits.h"
 #include "src/__support/macros/sanitizer.h"
 #include "src/math/nanf16.h"
@@ -13,8 +14,6 @@
 #include "test/UnitTest/FPMatcher.h"
 #include "test/UnitTest/Test.h"
 
-#include <signal.h>
-
 class LlvmLibcNanf16Test : public LIBC_NAMESPACE::testing::FEnvSafeTest {
 public:
   using StorageType = LIBC_NAMESPACE::fputil::FPBits<float16>::StorageType;
@@ -44,8 +43,8 @@ TEST_F(LlvmLibcNanf16Test, RandomString) {
   run_test("123 ", 0x7e00);
 }
 
-#if !defined(LIBC_HAVE_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
+#if !defined(LIBC_HAS_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
 TEST_F(LlvmLibcNanf16Test, InvalidInput) {
   EXPECT_DEATH([] { LIBC_NAMESPACE::nanf16(nullptr); }, WITH_SIGNAL(SIGSEGV));
 }
-#endif // LIBC_HAVE_ADDRESS_SANITIZER
+#endif // LIBC_HAS_ADDRESS_SANITIZER
diff --git a/libc/test/src/math/smoke/nanf_test.cpp b/libc/test/src/math/smoke/nanf_test.cpp
index ff5823685225ce..dd3124ee9c5112 100644
--- a/libc/test/src/math/smoke/nanf_test.cpp
+++ b/libc/test/src/math/smoke/nanf_test.cpp
@@ -6,12 +6,13 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "hdr/signal_macros.h"
 #include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/macros/sanitizer.h"
 #include "src/math/nanf.h"
 #include "test/UnitTest/FEnvSafeTest.h"
 #include "test/UnitTest/FPMatcher.h"
 #include "test/UnitTest/Test.h"
-#include <signal.h>
 
 class LlvmLibcNanfTest : public LIBC_NAMESPACE::testing::FEnvSafeTest {
 public:
@@ -42,8 +43,8 @@ TEST_F(LlvmLibcNanfTest, RandomString) {
   run_test("123 ", 0x7fc00000);
 }
 
-#if !defined(LIBC_HAVE_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
+#if !defined(LIBC_HAS_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
 TEST_F(LlvmLibcNanfTest, InvalidInput) {
   EXPECT_DEATH([] { LIBC_NAMESPACE::nanf(nullptr); }, WITH_SIGNAL(SIGSEGV));
 }
-#endif // LIBC_HAVE_ADDRESS_SANITIZER
+#endif // LIBC_HAS_ADDRESS_SANITIZER
diff --git a/libc/test/src/math/smoke/nanl_test.cpp b/libc/test/src/math/smoke/nanl_test.cpp
index de9af05100c10a..ef3f9c15dafd9f 100644
--- a/libc/test/src/math/smoke/nanl_test.cpp
+++ b/libc/test/src/math/smoke/nanl_test.cpp
@@ -6,12 +6,13 @@
 //
 //===----------------------------------------------------------------------===//
 
+#include "hdr/signal_macros.h"
 #include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/macros/sanitizer.h"
 #include "src/math/nanl.h"
 #include "test/UnitTest/FEnvSafeTest.h"
 #include "test/UnitTest/FPMatcher.h"
 #include "test/UnitTest/Test.h"
-#include <signal.h>
 
 #if defined(LIBC_TYPES_LONG_DOUBLE_IS_FLOAT64)
 #define SELECT_LONG_DOUBLE(val, _, __) val
@@ -70,8 +71,8 @@ TEST_F(LlvmLibcNanlTest, RandomString) {
   run_test("123 ", expected);
 }
 
-#if !defined(LIBC_HAVE_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
+#if !defined(LIBC_HAS_ADDRESS_SANITIZER) && defined(LIBC_TARGET_OS_IS_LINUX)
 TEST_F(LlvmLibcNanlTest, InvalidInput) {
   EXPECT_DEATH([] { LIBC_NAMESPACE::nanl(nullptr); }, WITH_SIGNAL(SIGSEGV));
 }
-#endif // LIBC_HAVE_ADDRESS_SANITIZER
+#endif // LIBC_HAS_ADDRESS_SANITIZER


#ifndef LIBC_HAS_SANITIZER
if (LIBC_UNLIKELY(arg == nullptr)) {
// Use volatile to prevent undefined behavior of dereferencing nullptr.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that this problem is only likely to occur in the tests, does it make more sense to change the tests instead of adding this extra-explicit check here?

Copy link
Contributor Author

@lntue lntue Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the main purpose is not for the test, but hardening user code when sanitizers are not in used. This is especially important when this header function is included and built from source, as the compiler can detect undefined behavior and optimize the code away completely. The explicit check is for performance purpose, since you will not want everything to always be round-tripped through the volatile variable.

And we leave the undefined-behavior back if the code is built under sanitizers, so that if in user codes, nullptr does reach here, then the sanitizers will be able to catch it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, the only time this will matter is when all of the following conditions are met:
a) User code is built alongside libc code as source
b) The code is built without sanitizers
c) The code is built with optimizations
d) The code passes a nullptr to the nan function in a constant way.

In that case, the user code is incorrect (it should not pass nullptr to nan) and the compiler may do unexpected things. I agree that this check would help find the bug in this specific case, but it will have a penalty for every other case, since this check can't be optimized out due to the volatile. Additionally, if the user code does have this undefined behavior, then they can find it by changing any of the conditions (e.g. building with sanitizers) without us needing this check all the time.

My main point is: If the user wants hardening to find this sort of undefined behavior, they have access to it. If they didn't choose to use hardening then they may not want it.

Copy link
Contributor Author

@lntue lntue Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's triggered even without a & d. This got my attention when the nan tests didn't crash and trigger SIGSEGV on some riscv64 configs.

About your worry regarding the performance, if the inputs (runtime or compile-time) have nullptr, our original intention is to crash with SIGSEGV anyway, and the added nullptr check and volatile load is negligible compared to crashing part.
If the inputs do not have nullptr, the only extra cost is the nullptr check arg == nullptr, which always returns false under the condition / assumption, cost at most 1 CPU clock, and will be hidden away completely by the CPU's branch predictors.

So for this check arg == nullptr to incur observable performance penalty, you'll need to call strtonan for a large array with inputs alternating between very short string and nullptr in order to beat the CPU's branch predictor. But in that case, we already crash at the first nullptr encountered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point. In that case we should create a generic way to do nullptr checks so it can be used in other places, and also guard it behind a hardening macro so it can be turned off on space-sensitive systems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a config option to enable/disable null checks. At the current form it's not straightforward to refactor it to a generic macro. I'll leave adding a generic macro for later, after seeing how to do null checks in other functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest replacing the volatile stuff with just __builtin_trap(). It makes this more generic and makes it less dependent on optimization behavior. If you really want it to segfault, then you could do this:

volatile char *crashing = reinterpret_cast<char*>(arg); //might also need a const_cast to be properly generic
char a = *crashing;
__builtin_trap(); //avoid the return while also making it clear to the compiler that this is a terminal path

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactor it to a generic macro in __support/macros/null_check.h

@llvmbot llvmbot added the bazel "Peripheral" support tier build system: utils/bazel label Sep 5, 2024
Copy link
Contributor

@michaelrj-google michaelrj-google left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lntue lntue merged commit 1896ee3 into llvm:main Sep 11, 2024
9 checks passed
@lntue lntue deleted the nan branch September 11, 2024 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bazel "Peripheral" support tier build system: utils/bazel libc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants