Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Support] Add end/error to decode[US]LEB128AndInc #90006

Merged

Conversation

MaskRay
Copy link
Member

@MaskRay MaskRay commented Apr 24, 2024

Follow-up to #85739 to encourage error checking. We make end mandatory
and add decodeULEB128AndIncUnsafe to be used without end.

Created using spr 1.3.5-bogner
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 24, 2024

@llvm/pr-subscribers-llvm-support

Author: Fangrui Song (MaskRay)

Changes

Follow-up to #85739 to encourage error checking.


Full diff: https://github.com/llvm/llvm-project/pull/90006.diff

2 Files Affected:

  • (modified) llvm/include/llvm/Support/LEB128.h (+8-4)
  • (modified) llvm/unittests/Support/LEB128Test.cpp (+12)
diff --git a/llvm/include/llvm/Support/LEB128.h b/llvm/include/llvm/Support/LEB128.h
index c4e741549f3ff1..6090bfc3a1aeb8 100644
--- a/llvm/include/llvm/Support/LEB128.h
+++ b/llvm/include/llvm/Support/LEB128.h
@@ -200,16 +200,20 @@ inline int64_t decodeSLEB128(const uint8_t *p, unsigned *n = nullptr,
   return Value;
 }
 
-inline uint64_t decodeULEB128AndInc(const uint8_t *&p) {
+inline uint64_t decodeULEB128AndInc(const uint8_t *&p,
+                                    const uint8_t *end = nullptr,
+                                    const char **error = nullptr) {
   unsigned n;
-  auto ret = decodeULEB128(p, &n);
+  auto ret = decodeULEB128(p, &n, end, error);
   p += n;
   return ret;
 }
 
-inline int64_t decodeSLEB128AndInc(const uint8_t *&p) {
+inline int64_t decodeSLEB128AndInc(const uint8_t *&p,
+                                   const uint8_t *end = nullptr,
+                                   const char **error = nullptr) {
   unsigned n;
-  auto ret = decodeSLEB128(p, &n);
+  auto ret = decodeSLEB128(p, &n, end, error);
   p += n;
   return ret;
 }
diff --git a/llvm/unittests/Support/LEB128Test.cpp b/llvm/unittests/Support/LEB128Test.cpp
index 08b8c5573ce637..c5eea7cfc3731d 100644
--- a/llvm/unittests/Support/LEB128Test.cpp
+++ b/llvm/unittests/Support/LEB128Test.cpp
@@ -155,6 +155,12 @@ TEST(LEB128Test, DecodeInvalidULEB128) {
     EXPECT_NE(Error, nullptr);                                                 \
     EXPECT_EQ(0ul, Actual);                                                    \
     EXPECT_EQ(ERROR_OFFSET, ErrorOffset);                                      \
+    Value = reinterpret_cast<const uint8_t *>(VALUE);                          \
+    Error = nullptr;                                                           \
+    Actual = decodeULEB128AndInc(Value, Value + strlen(VALUE), &Error);        \
+    EXPECT_NE(Error, nullptr);                                                 \
+    EXPECT_EQ(0ul, Actual);                                                    \
+    EXPECT_EQ(ERROR_OFFSET, Value - reinterpret_cast<const uint8_t *>(VALUE)); \
   } while (0)
 
   // Buffer overflow.
@@ -224,6 +230,12 @@ TEST(LEB128Test, DecodeInvalidSLEB128) {
     EXPECT_NE(Error, nullptr);                                                 \
     EXPECT_EQ(0ul, Actual);                                                    \
     EXPECT_EQ(ERROR_OFFSET, ErrorOffset);                                      \
+    Value = reinterpret_cast<const uint8_t *>(VALUE);                          \
+    Error = nullptr;                                                           \
+    Actual = decodeSLEB128AndInc(Value, Value + strlen(VALUE), &Error);        \
+    EXPECT_NE(Error, nullptr);                                                 \
+    EXPECT_EQ(0ul, Actual);                                                    \
+    EXPECT_EQ(ERROR_OFFSET, Value - reinterpret_cast<const uint8_t *>(VALUE)); \
   } while (0)
 
   // Buffer overflow.

Copy link
Collaborator

@adrian-prantl adrian-prantl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this still lets users do the inherently unsafe endless scan of a non-normalized value 0x8080808080808080808080... that exceeds the size of the buffer. Maybe we should require passing the end of the buffer in?

@MaskRay
Copy link
Member Author

MaskRay commented Apr 26, 2024

Some decode[US]LEB128AndInc occurrences in llvm/utils/TableGen/DecoderEmitter.cpp omit end.
Making end non-default would need to update them where there isn't a good parameter...

@dwblaikie
Copy link
Collaborator

This is certainly the minimum I'd like to see (as problematic as the API it's wrapping, rather than worse - so it doesn't cause people to be even less inclined to do error checking due to the convenience of a wrapper that lacks the underlying error handling).

Though I certainly wouldn't mind more robust error checking/improving code when call sites are revisited/updated to use the newer API (without necessarily the motivation to cleanup all uses of the older more relaxed API)

I'm not sure what cases/where one wouldn't have a good end parameter? Lack of one seems like a problem (except in something narrow like the tablegen case, where the buffer being read is baked into the program and known-good - at least in that case the end isn't hard to know/find, if technically unnecessary, but I still wouldn't mind if that got some bounds checking too, I can't imagine that it'd substantially harm performance, though I could be wrong)

Created using spr 1.3.5-bogner
Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 18268ac0f48d93c2bcddb69732761971669c09ab 7d97a0c0649c29df1c4c7ff3bdfb115ab4df6132 -- llvm/include/llvm/Support/LEB128.h llvm/unittests/Support/LEB128Test.cpp llvm/utils/TableGen/DecoderEmitter.cpp
View the diff from clang-format here.
diff --git a/llvm/include/llvm/Support/LEB128.h b/llvm/include/llvm/Support/LEB128.h
index 3877898e28..a15b73bc14 100644
--- a/llvm/include/llvm/Support/LEB128.h
+++ b/llvm/include/llvm/Support/LEB128.h
@@ -200,8 +200,7 @@ inline int64_t decodeSLEB128(const uint8_t *p, unsigned *n = nullptr,
   return Value;
 }
 
-inline uint64_t decodeULEB128AndInc(const uint8_t *&p,
-                                    const uint8_t *end,
+inline uint64_t decodeULEB128AndInc(const uint8_t *&p, const uint8_t *end,
                                     const char **error = nullptr) {
   unsigned n;
   auto ret = decodeULEB128(p, &n, end, error);
@@ -209,8 +208,7 @@ inline uint64_t decodeULEB128AndInc(const uint8_t *&p,
   return ret;
 }
 
-inline int64_t decodeSLEB128AndInc(const uint8_t *&p,
-                                   const uint8_t *end,
+inline int64_t decodeSLEB128AndInc(const uint8_t *&p, const uint8_t *end,
                                    const char **error = nullptr) {
   unsigned n;
   auto ret = decodeSLEB128(p, &n, end, error);

@dwblaikie
Copy link
Collaborator

Some decode[US]LEB128AndInc occurrences in llvm/utils/TableGen/DecoderEmitter.cpp omit end. Making end non-default would need to update them where there isn't a good parameter...

Not sure I follow this - the good parameter to use for this case would be the end of the buffer.

It looks like the not code-generated parts of this code already do such error checking (even if they only assert when there is an error)

unsigned Start = decodeULEB128(Table.data() + Pos + 1, nullptr,
Table.data() + Table.size(), &ErrMsg);
assert(ErrMsg == nullptr && "ULEB128 value too large!");

It looks like if decodeInstruction took an ArrayRef instead of a pointer it wouldn't require updates to the caller (they pass in global arrays anyway, so ArrayRef's length deduction would kick in) and then the buffer length would be known/could be passed in (well, length - (ptr - data) I suppose, which is admittedly a bit annoying and maybe there's a better way to manage that) to the decodeULEB128 calls.

Anyway, at least this'll do for now I guess - thanks!

Copy link
Collaborator

@dwblaikie dwblaikie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Created using spr 1.3.5-bogner
@MaskRay MaskRay merged commit efad149 into main May 8, 2024
3 of 4 checks passed
@MaskRay MaskRay deleted the users/MaskRay/spr/support-add-enderror-to-decodeusleb128andinc branch May 8, 2024 16:22
Actual = decodeULEB128AndInc(Value, Value + strlen(VALUE), &Error); \
EXPECT_NE(Error, nullptr); \
EXPECT_EQ(0ul, Actual); \
EXPECT_EQ(ERROR_OFFSET, Value - reinterpret_cast<const uint8_t *>(VALUE)); \
Copy link
Member

@aganea aganea May 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @MaskRay!
There's no guarantee that VALUE here at L163 will be evaluated to the same value as VALUE when it was assigned at L158. These tests fail when building with latest MSVC using the Debug target. Same comment applies to other cases below.
Are you able to fix this or should I send a PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: Google Test filter = LEB128Test.DecodeInvalidULEB128
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from LEB128Test
[ RUN      ] LEB128Test.DecodeInvalidULEB128
C:\src\git\llvm-project\llvm\unittests\Support\LEB128Test.cpp(167): error: Expected equality of these values:
  0u
    Which is: 0
  Value - reinterpret_cast<const uint8_t *>("")
    Which is: -5

C:\src\git\llvm-project\llvm\unittests\Support\LEB128Test.cpp(168): error: Expected equality of these values:
  1u
    Which is: 1
  Value - reinterpret_cast<const uint8_t *>("\x80")
    Which is: -167

C:\src\git\llvm-project\llvm\unittests\Support\LEB128Test.cpp(171): error: Expected equality of these values:
  9u
    Which is: 9
  Value - reinterpret_cast<const uint8_t *>("\x80\x80\x80\x80\x80\x80\x80\x80\x80\x02")
    Which is: -167

C:\src\git\llvm-project\llvm\unittests\Support\LEB128Test.cpp(172): error: Expected equality of these values:
  10u
    Which is: 10
  Value - reinterpret_cast<const uint8_t *>("\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x02")
    Which is: -166

[  FAILED  ] LEB128Test.DecodeInvalidULEB128 (2 ms)
[----------] 1 test from LEB128Test (2 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (4 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] LEB128Test.DecodeInvalidULEB128

 1 FAILED TEST

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I did not see the problem. If you have a PR, that will be appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants