Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVM][ADT] Explicitly convert size_t values to SmallVector's size type #77939

Merged
merged 1 commit into from
Jan 22, 2024

Conversation

andrey-golubev
Copy link
Contributor

Multiple places rely on implicit conversion when assigning 'size_t' values to the member fields (size or capacity) of SmallVector.

Depending on the platform / compiler configuration, this may result in narrowing conversion warnings (especially given that the size type of SmallVector's member fields is determined based on type T - in SmallVector). To avoid the problem altogether, make the conversions explicit.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 12, 2024

@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-llvm-adt

Author: Andrei Golubev (andrey-golubev)

Changes

Multiple places rely on implicit conversion when assigning 'size_t' values to the member fields (size or capacity) of SmallVector.

Depending on the platform / compiler configuration, this may result in narrowing conversion warnings (especially given that the size type of SmallVector's member fields is determined based on type T - in SmallVector<T>). To avoid the problem altogether, make the conversions explicit.


Full diff: https://github.com/llvm/llvm-project/pull/77939.diff

1 Files Affected:

  • (modified) llvm/include/llvm/ADT/SmallVector.h (+5-4)
diff --git a/llvm/include/llvm/ADT/SmallVector.h b/llvm/include/llvm/ADT/SmallVector.h
index 2e6d2dc6ce90a2..65e862911522a8 100644
--- a/llvm/include/llvm/ADT/SmallVector.h
+++ b/llvm/include/llvm/ADT/SmallVector.h
@@ -61,7 +61,7 @@ template <class Size_T> class SmallVectorBase {
 
   SmallVectorBase() = delete;
   SmallVectorBase(void *FirstEl, size_t TotalCapacity)
-      : BeginX(FirstEl), Capacity(TotalCapacity) {}
+      : BeginX(FirstEl), Capacity(static_cast<Size_T>(TotalCapacity)) {}
 
   /// This is a helper for \a grow() that's out of line to reduce code
   /// duplication.  This function will report a fatal error if it can't grow at
@@ -99,8 +99,9 @@ template <class Size_T> class SmallVectorBase {
   ///
   /// This does not construct or destroy any elements in the vector.
   void set_size(size_t N) {
-    assert(N <= capacity());
-    Size = N;
+    auto n = static_cast<Size_T>(N);
+    assert(n <= capacity());
+    Size = n;
   }
 };
 
@@ -468,7 +469,7 @@ void SmallVectorTemplateBase<T, TriviallyCopyable>::takeAllocationForGrow(
     free(this->begin());
 
   this->BeginX = NewElts;
-  this->Capacity = NewCapacity;
+  this->Capacity = static_cast<SmallVectorSizeType<T>>(NewCapacity);
 }
 
 /// SmallVectorTemplateBase<TriviallyCopyable = true> - This is where we put

@andrey-golubev
Copy link
Contributor Author

CC @dexonsmith @alinas @nikic @browneee you seem to be working on / around SmallVector! feel free to add other reviewers as you see fit.

assert(N <= capacity());
Size = N;
auto n = static_cast<Size_T>(N);
assert(n <= capacity());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to also assert that n == N?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capacity() also returns size_t, so probably better to keep the assertion first. e.g.

    assert(N <= capacity());
    Size = static_cast<Size_T>(N);

    // As Felipe suggests, you could add:
    assert(static_cast<size_t>(Size) == N);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, @browneee 's suggestion here is more clear / easier to follow. n vs. N is pretty subtle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking a little more: N <= capacity() implies static_cast<size_t>(Size) == N because capacity() is from a Size_T variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

capacity() also returns size_t, so probably better to keep the assertion first.

good point. I'm not sure why our downstream had it differently. "reverted" this part.

overall, rewrote this part as per @browneee 's advice. I think it makes sense to keep the assert(static_cast<size_t>(Size) == N); at the end as well. if anything, it's a function's postcondition validation so should be no harm (also, wondering if it might actually catch any data-losing size_t -> uint32_t narrowings).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to keep the assert(static_cast<size_t>(Size) == N); at the end as well.

This assertion is correct, but not necessary because it is implied by assert(N <= capacity());.

I would avoid an unnecessary assertion, because this is performance sensitive code.

If you want to keep it for clarity then I'd suggest only doing it in debug:

LLVM_DEBUG(assert(static_cast<size_t>(Size) == N));

My vote would be to omit it, because it is covered by the earlier assertion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought assert() is by definition debug-only (exception being if one explicitly sets LLVM_ENABLE_ASSERTIONS). do you think people use assert() via LLVM_ENABLE_ASSERTIONS in production?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it's covered by the earlier assertion, seems like all we need is a comment explaining that. Anyone sufficiently interested can read the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added the comment.

@@ -468,7 +469,7 @@ void SmallVectorTemplateBase<T, TriviallyCopyable>::takeAllocationForGrow(
free(this->begin());

this->BeginX = NewElts;
this->Capacity = NewCapacity;
this->Capacity = static_cast<SmallVectorSizeType<T>>(NewCapacity);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a set_capacity(size_t) method to SmallVectorBase, to make this cast a bit neater?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added. please also take a look at the new function (i'm not a native English speaker so maybe there are errors in the docs. albeit I pretty much copy-pasted the relevant set_size wording).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

After seeing this change, I think it would probably make sense to include this line in the new function:

this->BeginX = NewElts;

... setting the allocation pointer and the capacity together makes sense.

This would also change the name of the new function:

  /// Set the allocation to \p A and the capacity to \p M.
  ///
  /// This does not construct or destroy any elements in the vector.
  /// This does not clean up any existing allocation.
void set_allocation_and_capacity(void *A, size_t M) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm actually hesitant a bit: the set_capacity() has a much broader usage scope than the new function. i guess there are also cases when we only want to set the capacity?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess there are also cases when we only want to set the capacity?

Such as?

I think setting a capacity is intrinsically linked to setting a new storage pointer, which is why it makes sense to do both at once..

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I agree with @browneee; these are intrinsically linked; the point of capacity is to understand the storage pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess there are also cases when we only want to set the capacity?

Such as?

Was thinking of realloc() but we'd anyway reset the pointer (since same memory block being "reused" is an implementation detail).

I think setting a capacity is intrinsically linked to setting a new storage pointer, which is why it makes sense to do both at once..

yeah, fair point. I was wondering whether data start != allocation start but these seem to always be equal (I guess the case when they are not is the custom alignment requirements but I haven't found anything like this).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing is whether set_allocation_and_capacity is a good name. to me, an "allocation" would mean both the start and the capacity (or end). so named the function set_allocation_range (as a tribute to "counted range"). There's however already replaceAllocation() so maybe set_allocation is enough? anyways, guessing it's for you guys to decide what makes sense here since you seem to interact with this code more often than me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer set_allocation_and_capacity, but I think any of these options are fine.

///
/// This does not construct or destroy any elements in the vector.
void set_capacity(size_t M) {
assert(size() <= M);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this assertion is not useful, and may be misleading.

An assertion that would be clear and useful here would be:

assert(M <= SizeTypeMax());

...before assigning the elements and capacity.

You also wouldn't need assert(capacity() == M); as this assertion implies it will be successful. So remove that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, the assertion you propose would indeed be useful!

size() <= M is to just guarantee that we don't end up in a weird container state where Size > Capacity. but then using set_size() together with set_capacity() in the current sense would be problematic:

// capacity = 50, size = 40

// grow:
// set_size(70); // asserts: 70 <= capacity() fails
set_capacity(100); // OK
set_size(70); // OK

// shrink:
// set_capacity(10); // asserts: size() <= 10 fails
set_size(2); // OK
set_capacity(10); // OK

I guess due to this "swap" in the order, it might make it harder to write generic code? but then I again think of Size > Capacity problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert(M <= SizeTypeMax());

btw, I guess this could be added to set_size() as well. then equality check (size() == N) at the end also gets superseded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, if the existing assert(N <= capacity()); passes then I think assert(M <= SizeTypeMax()); must be true.

I think:

  • set_size should only check assert(N <= capacity());
  • set_capacity should only check assert(M <= SizeTypeMax());

Using set_capacity to shrink (regardless of the size) doesn't make sense, because set_capacity would not actually handle shrinking (only a small piece of shrinking - switching to a smaller allocation).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Multiple places rely on implicit conversion when assigning 'size_t' values
to the member fields (size or capacity) of SmallVector.

Depending on the platform / compiler configuration, this may result in
narrowing conversion warnings (especially given that the size type of
SmallVector's member fields is determined based on type T - in SmallVector<T>).
To avoid the problem altogether, make the conversions explicit.

Co-authored-by: Orest Chura <orest.chura@intel.com>
@andrey-golubev
Copy link
Contributor Author

andrey-golubev commented Jan 19, 2024

@dexonsmith added the comment but also squashed the changes for further merge. could you click the merge (if you think the changes are correct)? I lack permissions for this.

Copy link
Collaborator

@dexonsmith dexonsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too.

@dexonsmith
Copy link
Collaborator

@dexonsmith added the comment but also squashed the changes for further merge. could you click the merge (if you think the changes are correct)? I lack permissions for this.

Looks like the buildkit/github-pull-requests check is still pending... if you ping me when it's complete, I'd be happy to merge!

@andrey-golubev
Copy link
Contributor Author

@dexonsmith added the comment but also squashed the changes for further merge. could you click the merge (if you think the changes are correct)? I lack permissions for this.

Looks like the buildkit/github-pull-requests check is still pending... if you ping me when it's complete, I'd be happy to merge!

@dexonsmith ping. The build finished over the weekends.

@dexonsmith dexonsmith merged commit 5fb39ef into llvm:main Jan 22, 2024
4 checks passed
@andrey-golubev andrey-golubev deleted the small_vector_explicit_cast branch January 22, 2024 08:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants