[LLVM][ADT] Explicitly convert size_t values to SmallVector's size type #77939

andrey-golubev · 2024-01-12T15:33:04Z

Multiple places rely on implicit conversion when assigning 'size_t' values to the member fields (size or capacity) of SmallVector.

Depending on the platform / compiler configuration, this may result in narrowing conversion warnings (especially given that the size type of SmallVector's member fields is determined based on type T - in SmallVector). To avoid the problem altogether, make the conversions explicit.

llvmbot · 2024-01-12T15:33:37Z

@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-llvm-adt

Author: Andrei Golubev (andrey-golubev)

Changes

Multiple places rely on implicit conversion when assigning 'size_t' values to the member fields (size or capacity) of SmallVector.

Depending on the platform / compiler configuration, this may result in narrowing conversion warnings (especially given that the size type of SmallVector's member fields is determined based on type T - in SmallVector<T>). To avoid the problem altogether, make the conversions explicit.

Full diff: https://github.com/llvm/llvm-project/pull/77939.diff

1 Files Affected:

(modified) llvm/include/llvm/ADT/SmallVector.h (+5-4)

diff --git a/llvm/include/llvm/ADT/SmallVector.h b/llvm/include/llvm/ADT/SmallVector.h
index 2e6d2dc6ce90a2..65e862911522a8 100644
--- a/llvm/include/llvm/ADT/SmallVector.h
+++ b/llvm/include/llvm/ADT/SmallVector.h
@@ -61,7 +61,7 @@ template <class Size_T> class SmallVectorBase {
 
   SmallVectorBase() = delete;
   SmallVectorBase(void *FirstEl, size_t TotalCapacity)
-      : BeginX(FirstEl), Capacity(TotalCapacity) {}
+      : BeginX(FirstEl), Capacity(static_cast<Size_T>(TotalCapacity)) {}
 
   /// This is a helper for \a grow() that's out of line to reduce code
   /// duplication.  This function will report a fatal error if it can't grow at
@@ -99,8 +99,9 @@ template <class Size_T> class SmallVectorBase {
   ///
   /// This does not construct or destroy any elements in the vector.
   void set_size(size_t N) {
-    assert(N <= capacity());
-    Size = N;
+    auto n = static_cast<Size_T>(N);
+    assert(n <= capacity());
+    Size = n;
   }
 };
 
@@ -468,7 +469,7 @@ void SmallVectorTemplateBase<T, TriviallyCopyable>::takeAllocationForGrow(
     free(this->begin());
 
   this->BeginX = NewElts;
-  this->Capacity = NewCapacity;
+  this->Capacity = static_cast<SmallVectorSizeType<T>>(NewCapacity);
 }
 
 /// SmallVectorTemplateBase<TriviallyCopyable = true> - This is where we put

andrey-golubev · 2024-01-12T15:34:43Z

CC @dexonsmith @alinas @nikic @browneee you seem to be working on / around SmallVector! feel free to add other reviewers as you see fit.

felipepiovezan · 2024-01-12T15:52:03Z

llvm/include/llvm/ADT/SmallVector.h

-    assert(N <= capacity());
-    Size = N;
+    auto n = static_cast<Size_T>(N);
+    assert(n <= capacity());


would it make sense to also assert that n == N?

capacity() also returns size_t, so probably better to keep the assertion first. e.g.

assert(N <= capacity()); Size = static_cast<Size_T>(N); // As Felipe suggests, you could add: assert(static_cast<size_t>(Size) == N);

IMO, @browneee 's suggestion here is more clear / easier to follow. n vs. N is pretty subtle.

Thinking a little more: N <= capacity() implies static_cast<size_t>(Size) == N because capacity() is from a Size_T variable.

capacity() also returns size_t, so probably better to keep the assertion first.

good point. I'm not sure why our downstream had it differently. "reverted" this part.

overall, rewrote this part as per @browneee 's advice. I think it makes sense to keep the assert(static_cast<size_t>(Size) == N); at the end as well. if anything, it's a function's postcondition validation so should be no harm (also, wondering if it might actually catch any data-losing size_t -> uint32_t narrowings).

I think it makes sense to keep the assert(static_cast<size_t>(Size) == N); at the end as well.

This assertion is correct, but not necessary because it is implied by assert(N <= capacity());.

I would avoid an unnecessary assertion, because this is performance sensitive code.

If you want to keep it for clarity then I'd suggest only doing it in debug:

LLVM_DEBUG(assert(static_cast<size_t>(Size) == N));

My vote would be to omit it, because it is covered by the earlier assertion.

I thought assert() is by definition debug-only (exception being if one explicitly sets LLVM_ENABLE_ASSERTIONS). do you think people use assert() via LLVM_ENABLE_ASSERTIONS in production?

Since it's covered by the earlier assertion, seems like all we need is a comment explaining that. Anyone sufficiently interested can read the comment.

added the comment.

browneee · 2024-01-12T18:27:52Z

llvm/include/llvm/ADT/SmallVector.h

@@ -468,7 +469,7 @@ void SmallVectorTemplateBase<T, TriviallyCopyable>::takeAllocationForGrow(
    free(this->begin());

  this->BeginX = NewElts;
-  this->Capacity = NewCapacity;
+  this->Capacity = static_cast<SmallVectorSizeType<T>>(NewCapacity);


Should we add a set_capacity(size_t) method to SmallVectorBase, to make this cast a bit neater?

added. please also take a look at the new function (i'm not a native English speaker so maybe there are errors in the docs. albeit I pretty much copy-pasted the relevant set_size wording).

Thanks.

After seeing this change, I think it would probably make sense to include this line in the new function:

this->BeginX = NewElts;

... setting the allocation pointer and the capacity together makes sense.

This would also change the name of the new function:

/// Set the allocation to \p A and the capacity to \p M. /// /// This does not construct or destroy any elements in the vector. /// This does not clean up any existing allocation. void set_allocation_and_capacity(void *A, size_t M) {

i'm actually hesitant a bit: the set_capacity() has a much broader usage scope than the new function. i guess there are also cases when we only want to set the capacity?

i guess there are also cases when we only want to set the capacity?

Such as?

I think setting a capacity is intrinsically linked to setting a new storage pointer, which is why it makes sense to do both at once..

FWIW, I agree with @browneee; these are intrinsically linked; the point of capacity is to understand the storage pointer.

i guess there are also cases when we only want to set the capacity?

Such as?

Was thinking of realloc() but we'd anyway reset the pointer (since same memory block being "reused" is an implementation detail).

I think setting a capacity is intrinsically linked to setting a new storage pointer, which is why it makes sense to do both at once..

yeah, fair point. I was wondering whether data start != allocation start but these seem to always be equal (I guess the case when they are not is the custom alignment requirements but I haven't found anything like this).

one thing is whether set_allocation_and_capacity is a good name. to me, an "allocation" would mean both the start and the capacity (or end). so named the function set_allocation_range (as a tribute to "counted range"). There's however already replaceAllocation() so maybe set_allocation is enough? anyways, guessing it's for you guys to decide what makes sense here since you seem to interact with this code more often than me.

I prefer set_allocation_and_capacity, but I think any of these options are fine.

browneee · 2024-01-15T18:04:49Z

llvm/include/llvm/ADT/SmallVector.h

+  ///
+  /// This does not construct or destroy any elements in the vector.
+  void set_capacity(size_t M) {
+    assert(size() <= M);


I think this assertion is not useful, and may be misleading.

An assertion that would be clear and useful here would be:

assert(M <= SizeTypeMax());

...before assigning the elements and capacity.

You also wouldn't need assert(capacity() == M); as this assertion implies it will be successful. So remove that.

thanks, the assertion you propose would indeed be useful!

size() <= M is to just guarantee that we don't end up in a weird container state where Size > Capacity. but then using set_size() together with set_capacity() in the current sense would be problematic:

// capacity = 50, size = 40 // grow: // set_size(70); // asserts: 70 <= capacity() fails set_capacity(100); // OK set_size(70); // OK // shrink: // set_capacity(10); // asserts: size() <= 10 fails set_size(2); // OK set_capacity(10); // OK

I guess due to this "swap" in the order, it might make it harder to write generic code? but then I again think of Size > Capacity problem.

assert(M <= SizeTypeMax());

btw, I guess this could be added to set_size() as well. then equality check (size() == N) at the end also gets superseded.

No, if the existing assert(N <= capacity()); passes then I think assert(M <= SizeTypeMax()); must be true.

I think:

set_size should only check assert(N <= capacity());

set_capacity should only check assert(M <= SizeTypeMax());

Using set_capacity to shrink (regardless of the size) doesn't make sense, because set_capacity would not actually handle shrinking (only a small piece of shrinking - switching to a smaller allocation).

Multiple places rely on implicit conversion when assigning 'size_t' values to the member fields (size or capacity) of SmallVector. Depending on the platform / compiler configuration, this may result in narrowing conversion warnings (especially given that the size type of SmallVector's member fields is determined based on type T - in SmallVector<T>). To avoid the problem altogether, make the conversions explicit. Co-authored-by: Orest Chura <orest.chura@intel.com>

andrey-golubev · 2024-01-19T14:09:49Z

@dexonsmith added the comment but also squashed the changes for further merge. could you click the merge (if you think the changes are correct)? I lack permissions for this.

dexonsmith

LGTM too.

dexonsmith · 2024-01-19T16:06:40Z

@dexonsmith added the comment but also squashed the changes for further merge. could you click the merge (if you think the changes are correct)? I lack permissions for this.

Looks like the buildkit/github-pull-requests check is still pending... if you ping me when it's complete, I'd be happy to merge!

andrey-golubev · 2024-01-22T08:23:26Z

@dexonsmith added the comment but also squashed the changes for further merge. could you click the merge (if you think the changes are correct)? I lack permissions for this.

Looks like the buildkit/github-pull-requests check is still pending... if you ping me when it's complete, I'd be happy to merge!

@dexonsmith ping. The build finished over the weekends.

llvmbot added the llvm:adt label Jan 12, 2024

felipepiovezan reviewed Jan 12, 2024

View reviewed changes

browneee reviewed Jan 12, 2024

View reviewed changes

andrey-golubev requested review from dexonsmith, browneee and felipepiovezan January 15, 2024 11:24

browneee reviewed Jan 15, 2024

View reviewed changes

andrey-golubev requested a review from browneee January 16, 2024 13:29

llvmbot added the llvm:support label Jan 17, 2024

browneee approved these changes Jan 17, 2024

View reviewed changes

andrey-golubev force-pushed the small_vector_explicit_cast branch from 9eebfbc to 5e2626b Compare January 19, 2024 14:07

dexonsmith approved these changes Jan 19, 2024

View reviewed changes

dexonsmith merged commit 5fb39ef into llvm:main Jan 22, 2024
4 checks passed

andrey-golubev deleted the small_vector_explicit_cast branch January 22, 2024 08:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[LLVM][ADT] Explicitly convert size_t values to SmallVector's size type #77939

[LLVM][ADT] Explicitly convert size_t values to SmallVector's size type #77939

andrey-golubev commented Jan 12, 2024

llvmbot commented Jan 12, 2024 •

edited

andrey-golubev commented Jan 12, 2024

felipepiovezan Jan 12, 2024

browneee Jan 12, 2024

dexonsmith Jan 12, 2024

browneee Jan 12, 2024

andrey-golubev Jan 15, 2024 •

edited

browneee Jan 16, 2024

andrey-golubev Jan 17, 2024

dexonsmith Jan 17, 2024

andrey-golubev Jan 19, 2024

browneee Jan 12, 2024

andrey-golubev Jan 15, 2024 •

edited

browneee Jan 15, 2024

andrey-golubev Jan 16, 2024

browneee Jan 16, 2024

dexonsmith Jan 16, 2024

andrey-golubev Jan 17, 2024

andrey-golubev Jan 17, 2024 •

edited

browneee Jan 17, 2024

browneee Jan 15, 2024

andrey-golubev Jan 16, 2024

andrey-golubev Jan 16, 2024 •

edited

browneee Jan 16, 2024

andrey-golubev Jan 17, 2024

andrey-golubev commented Jan 19, 2024 •

edited

dexonsmith left a comment

dexonsmith commented Jan 19, 2024

andrey-golubev commented Jan 22, 2024

[LLVM][ADT] Explicitly convert size_t values to SmallVector's size type #77939

[LLVM][ADT] Explicitly convert size_t values to SmallVector's size type #77939

Conversation

andrey-golubev commented Jan 12, 2024

llvmbot commented Jan 12, 2024 • edited

andrey-golubev commented Jan 12, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrey-golubev Jan 15, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrey-golubev Jan 15, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrey-golubev Jan 17, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrey-golubev Jan 16, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andrey-golubev commented Jan 19, 2024 • edited

dexonsmith left a comment

Choose a reason for hiding this comment

dexonsmith commented Jan 19, 2024

andrey-golubev commented Jan 22, 2024

llvmbot commented Jan 12, 2024 •

edited

andrey-golubev Jan 15, 2024 •

edited

andrey-golubev Jan 15, 2024 •

edited

andrey-golubev Jan 17, 2024 •

edited

andrey-golubev Jan 16, 2024 •

edited

andrey-golubev commented Jan 19, 2024 •

edited