[Support] Add end/error to decode[US]LEB128AndInc #90006

MaskRay · 2024-04-24T23:57:29Z

Follow-up to #85739 to encourage error checking. We make end mandatory
and add decodeULEB128AndIncUnsafe to be used without end.

Created using spr 1.3.5-bogner

llvmbot · 2024-04-24T23:57:59Z

@llvm/pr-subscribers-llvm-support

Author: Fangrui Song (MaskRay)

Changes

Follow-up to #85739 to encourage error checking.

Full diff: https://github.com/llvm/llvm-project/pull/90006.diff

2 Files Affected:

(modified) llvm/include/llvm/Support/LEB128.h (+8-4)
(modified) llvm/unittests/Support/LEB128Test.cpp (+12)

diff --git a/llvm/include/llvm/Support/LEB128.h b/llvm/include/llvm/Support/LEB128.h
index c4e741549f3ff1..6090bfc3a1aeb8 100644
--- a/llvm/include/llvm/Support/LEB128.h
+++ b/llvm/include/llvm/Support/LEB128.h
@@ -200,16 +200,20 @@ inline int64_t decodeSLEB128(const uint8_t *p, unsigned *n = nullptr,
   return Value;
 }
 
-inline uint64_t decodeULEB128AndInc(const uint8_t *&p) {
+inline uint64_t decodeULEB128AndInc(const uint8_t *&p,
+                                    const uint8_t *end = nullptr,
+                                    const char **error = nullptr) {
   unsigned n;
-  auto ret = decodeULEB128(p, &n);
+  auto ret = decodeULEB128(p, &n, end, error);
   p += n;
   return ret;
 }
 
-inline int64_t decodeSLEB128AndInc(const uint8_t *&p) {
+inline int64_t decodeSLEB128AndInc(const uint8_t *&p,
+                                   const uint8_t *end = nullptr,
+                                   const char **error = nullptr) {
   unsigned n;
-  auto ret = decodeSLEB128(p, &n);
+  auto ret = decodeSLEB128(p, &n, end, error);
   p += n;
   return ret;
 }
diff --git a/llvm/unittests/Support/LEB128Test.cpp b/llvm/unittests/Support/LEB128Test.cpp
index 08b8c5573ce637..c5eea7cfc3731d 100644
--- a/llvm/unittests/Support/LEB128Test.cpp
+++ b/llvm/unittests/Support/LEB128Test.cpp
@@ -155,6 +155,12 @@ TEST(LEB128Test, DecodeInvalidULEB128) {
     EXPECT_NE(Error, nullptr);                                                 \
     EXPECT_EQ(0ul, Actual);                                                    \
     EXPECT_EQ(ERROR_OFFSET, ErrorOffset);                                      \
+    Value = reinterpret_cast<const uint8_t *>(VALUE);                          \
+    Error = nullptr;                                                           \
+    Actual = decodeULEB128AndInc(Value, Value + strlen(VALUE), &Error);        \
+    EXPECT_NE(Error, nullptr);                                                 \
+    EXPECT_EQ(0ul, Actual);                                                    \
+    EXPECT_EQ(ERROR_OFFSET, Value - reinterpret_cast<const uint8_t *>(VALUE)); \
   } while (0)
 
   // Buffer overflow.
@@ -224,6 +230,12 @@ TEST(LEB128Test, DecodeInvalidSLEB128) {
     EXPECT_NE(Error, nullptr);                                                 \
     EXPECT_EQ(0ul, Actual);                                                    \
     EXPECT_EQ(ERROR_OFFSET, ErrorOffset);                                      \
+    Value = reinterpret_cast<const uint8_t *>(VALUE);                          \
+    Error = nullptr;                                                           \
+    Actual = decodeSLEB128AndInc(Value, Value + strlen(VALUE), &Error);        \
+    EXPECT_NE(Error, nullptr);                                                 \
+    EXPECT_EQ(0ul, Actual);                                                    \
+    EXPECT_EQ(ERROR_OFFSET, Value - reinterpret_cast<const uint8_t *>(VALUE)); \
   } while (0)
 
   // Buffer overflow.

adrian-prantl

Technically this still lets users do the inherently unsafe endless scan of a non-normalized value 0x8080808080808080808080... that exceeds the size of the buffer. Maybe we should require passing the end of the buffer in?

MaskRay · 2024-04-26T01:58:42Z

Some decode[US]LEB128AndInc occurrences in llvm/utils/TableGen/DecoderEmitter.cpp omit end.
Making end non-default would need to update them where there isn't a good parameter...

dwblaikie · 2024-04-30T04:08:05Z

This is certainly the minimum I'd like to see (as problematic as the API it's wrapping, rather than worse - so it doesn't cause people to be even less inclined to do error checking due to the convenience of a wrapper that lacks the underlying error handling).

Though I certainly wouldn't mind more robust error checking/improving code when call sites are revisited/updated to use the newer API (without necessarily the motivation to cleanup all uses of the older more relaxed API)

I'm not sure what cases/where one wouldn't have a good end parameter? Lack of one seems like a problem (except in something narrow like the tablegen case, where the buffer being read is baked into the program and known-good - at least in that case the end isn't hard to know/find, if technically unnecessary, but I still wouldn't mind if that got some bounds checking too, I can't imagine that it'd substantially harm performance, though I could be wrong)

Created using spr 1.3.5-bogner

github-actions · 2024-04-30T05:49:22Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 18268ac0f48d93c2bcddb69732761971669c09ab 7d97a0c0649c29df1c4c7ff3bdfb115ab4df6132 -- llvm/include/llvm/Support/LEB128.h llvm/unittests/Support/LEB128Test.cpp llvm/utils/TableGen/DecoderEmitter.cpp

View the diff from clang-format here.

diff --git a/llvm/include/llvm/Support/LEB128.h b/llvm/include/llvm/Support/LEB128.h
index 3877898e28..a15b73bc14 100644
--- a/llvm/include/llvm/Support/LEB128.h
+++ b/llvm/include/llvm/Support/LEB128.h
@@ -200,8 +200,7 @@ inline int64_t decodeSLEB128(const uint8_t *p, unsigned *n = nullptr,
   return Value;
 }
 
-inline uint64_t decodeULEB128AndInc(const uint8_t *&p,
-                                    const uint8_t *end,
+inline uint64_t decodeULEB128AndInc(const uint8_t *&p, const uint8_t *end,
                                     const char **error = nullptr) {
   unsigned n;
   auto ret = decodeULEB128(p, &n, end, error);
@@ -209,8 +208,7 @@ inline uint64_t decodeULEB128AndInc(const uint8_t *&p,
   return ret;
 }
 
-inline int64_t decodeSLEB128AndInc(const uint8_t *&p,
-                                   const uint8_t *end,
+inline int64_t decodeSLEB128AndInc(const uint8_t *&p, const uint8_t *end,
                                    const char **error = nullptr) {
   unsigned n;
   auto ret = decodeSLEB128(p, &n, end, error);

dwblaikie · 2024-05-06T19:36:23Z

Some decode[US]LEB128AndInc occurrences in llvm/utils/TableGen/DecoderEmitter.cpp omit end. Making end non-default would need to update them where there isn't a good parameter...

Not sure I follow this - the good parameter to use for this case would be the end of the buffer.

It looks like the not code-generated parts of this code already do such error checking (even if they only assert when there is an error)

llvm-project/llvm/utils/TableGen/DecoderEmitter.cpp

Lines 859 to 861 in e68d505

    
           unsigned Start = decodeULEB128(Table.data() + Pos + 1, nullptr, 
        
                                          Table.data() + Table.size(), &ErrMsg); 
        
           assert(ErrMsg == nullptr && "ULEB128 value too large!");

It looks like if decodeInstruction took an ArrayRef instead of a pointer it wouldn't require updates to the caller (they pass in global arrays anyway, so ArrayRef's length deduction would kick in) and then the buffer length would be known/could be passed in (well, length - (ptr - data) I suppose, which is admittedly a bit annoying and maybe there's a better way to manage that) to the decodeULEB128 calls.

Anyway, at least this'll do for now I guess - thanks!

dwblaikie

Thanks!

Created using spr 1.3.5-bogner

aganea · 2024-05-22T21:01:13Z

llvm/unittests/Support/LEB128Test.cpp

+    Actual = decodeULEB128AndInc(Value, Value + strlen(VALUE), &Error);        \
+    EXPECT_NE(Error, nullptr);                                                 \
+    EXPECT_EQ(0ul, Actual);                                                    \
+    EXPECT_EQ(ERROR_OFFSET, Value - reinterpret_cast<const uint8_t *>(VALUE)); \


Hello @MaskRay!
There's no guarantee that VALUE here at L163 will be evaluated to the same value as VALUE when it was assigned at L158. These tests fail when building with latest MSVC using the Debug target. Same comment applies to other cases below.
Are you able to fix this or should I send a PR?

Note: Google Test filter = LEB128Test.DecodeInvalidULEB128 [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from LEB128Test [ RUN ] LEB128Test.DecodeInvalidULEB128 C:\src\git\llvm-project\llvm\unittests\Support\LEB128Test.cpp(167): error: Expected equality of these values: 0u Which is: 0 Value - reinterpret_cast<const uint8_t *>("") Which is: -5 C:\src\git\llvm-project\llvm\unittests\Support\LEB128Test.cpp(168): error: Expected equality of these values: 1u Which is: 1 Value - reinterpret_cast<const uint8_t *>("\x80") Which is: -167 C:\src\git\llvm-project\llvm\unittests\Support\LEB128Test.cpp(171): error: Expected equality of these values: 9u Which is: 9 Value - reinterpret_cast<const uint8_t *>("\x80\x80\x80\x80\x80\x80\x80\x80\x80\x02") Which is: -167 C:\src\git\llvm-project\llvm\unittests\Support\LEB128Test.cpp(172): error: Expected equality of these values: 10u Which is: 10 Value - reinterpret_cast<const uint8_t *>("\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x02") Which is: -166 [ FAILED ] LEB128Test.DecodeInvalidULEB128 (2 ms) [----------] 1 test from LEB128Test (2 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test suite ran. (4 ms total) [ PASSED ] 0 tests. [ FAILED ] 1 test, listed below: [ FAILED ] LEB128Test.DecodeInvalidULEB128 1 FAILED TEST

Hi, I did not see the problem. If you have a PR, that will be appreciated!

[𝘀𝗽𝗿] initial version

98181cf

Created using spr 1.3.5-bogner

llvmbot added the llvm:support label Apr 24, 2024

MaskRay requested review from adrian-prantl and dwblaikie April 24, 2024 23:57

MaskRay mentioned this pull request Apr 24, 2024

[Support] Add decodeULEB128AndInc/decodeSLEB128AndInc #85739

Merged

adrian-prantl reviewed Apr 25, 2024

View reviewed changes

make end mandatory and add Unsafe variant

7d97a0c

Created using spr 1.3.5-bogner

dwblaikie approved these changes May 6, 2024

View reviewed changes

clang-format

1218d5b

Created using spr 1.3.5-bogner

MaskRay merged commit efad149 into main May 8, 2024
3 of 4 checks passed

MaskRay deleted the users/MaskRay/spr/support-add-enderror-to-decodeusleb128andinc branch May 8, 2024 16:22

aganea reviewed May 22, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Support] Add end/error to decode[US]LEB128AndInc #90006

[Support] Add end/error to decode[US]LEB128AndInc #90006

MaskRay commented Apr 24, 2024 •

edited

Loading

llvmbot commented Apr 24, 2024

adrian-prantl left a comment

MaskRay commented Apr 26, 2024

dwblaikie commented Apr 30, 2024

github-actions bot commented Apr 30, 2024

dwblaikie commented May 6, 2024

dwblaikie left a comment

aganea May 22, 2024 •

edited

Loading

aganea May 22, 2024

MaskRay May 23, 2024

[Support] Add end/error to decode[US]LEB128AndInc #90006

[Support] Add end/error to decode[US]LEB128AndInc #90006

Conversation

MaskRay commented Apr 24, 2024 • edited Loading

llvmbot commented Apr 24, 2024

adrian-prantl left a comment

Choose a reason for hiding this comment

MaskRay commented Apr 26, 2024

dwblaikie commented Apr 30, 2024

github-actions bot commented Apr 30, 2024

dwblaikie commented May 6, 2024

dwblaikie left a comment

Choose a reason for hiding this comment

aganea May 22, 2024 • edited Loading

Choose a reason for hiding this comment

aganea May 22, 2024

Choose a reason for hiding this comment

MaskRay May 23, 2024

Choose a reason for hiding this comment

MaskRay commented Apr 24, 2024 •

edited

Loading

aganea May 22, 2024 •

edited

Loading