Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Span length should be unsigned #1242

Closed
JosephBialekMsft opened this issue Jul 19, 2018 · 11 comments
Closed

Span length should be unsigned #1242

JosephBialekMsft opened this issue Jul 19, 2018 · 11 comments
Assignees

Comments

@JosephBialekMsft
Copy link

It doesn't make a whole lot of sense to me for span's length to be a signed value. The length can never be negative in the first place, nor are negative values used for any special purpose.

Signed lengths for span have the following annoyances:

  1. Many common interfaces that return buffers that one may want to wrap in a span have a size_t or unsigned size. For example: malloc, ExAllocatePoolWithTag (Windows kernel allocator). Many things you may want to use span with also operate on unsigned sizes (for example, network functions such as send/recv).
  2. Span code-gen is less efficient since there are checks all over the place (for example, span creation, [] overload, etc.) that must check that size/index is not negative.
  3. We've been working with the VS backend compiler team on better span optimizations. They've noted that certain optimizations aren't possible if the span size type and index type are different (signed vs. unsigned). It is very common for people to use unsigned types when looping through arrays. In the Windows kernel as an example, signed values are virtually never used to iterate through an array.
@neilmacintosh
Copy link
Contributor

@JosephBialekMsft the design decision to use a signed type for span's indexing and size is discussed in the original proposal and follows general thinking described in the Core Guidelines.

It might also be worth looking at: https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es102-use-signed-types-for-arithmetic and https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#es107-dont-use-unsigned-for-subscripts-prefer-gslindex for context.

The fact that unsigned values are used for indexing inside loops by the Windows kernel could be considered a reliability issue. Signed arithmetic allows the compiler to invoke undefined behavior on overflow/underflow. Unsigned arithmetic simply masks the bug.

@JosephBialekMsft
Copy link
Author

Thanks Neil, I gave that a read.

In the context of span, many of examples listed (integer underflow) would matter in a particularly meaningful way. A negative value is invalid just like a value of 4294967294 is almost certain to be invalid in a span.

In the case of integer overflows, it's certainly possible that you overflow your index while iterating through a span, although again, such situations are likely to be pretty rare based on my experience reviewing code that uses unsigned indexes.

The downside is you double the the amount of checks that are required (>= 0 and < size), and checks are being made for situations that are logically nonsensical (a negative index being passed). Binary protocols intentionally use unsigned sizes since it allows for a greater range of meaningful values per bit of storage used. Many pointer/size interfaces use unsigned sizes since it makes more logical sense and cuts the amount of checks in half.

This may just devolve to a religious debate of signed vs unsigned. In the world I live in, unsigned is everywhere. It'd be nice to have an unsigned span to interop cleanly since porting all this unsigned code to signed is impossible in many cases (i.e. we are talking about publicly documented interfaces/structures) and the port itself would certainly lead to bugs even if it were possible.

@annagrin
Copy link

@JosephBialekMsft , @neilmacintosh

I decided to give this a summary, let me know if I missed anything.

span's index and size are currently signed (ptrdiff_t)

Pros:

  • eliminated some potential bugs (see links above posted by @neilmacintosh)
  • conforms to standard range of accessible contiguous memory
  • removes undefined behavior cases (ie makes code portable)

Cons:

  • not convenient to use in old code where it replaces other data structure
  • not convenient to use with standard libraries where compiler warns about signed/unsigned mismatch
  • potentially changes behavior - for example, a too large index (undefined behavior) wraps around after cast to ptrdiff_t and therefore becomes unexpected behavior
  • affects performance and optimizations because we need to ensure that index is non-negative. This is arguable - to have a signed index and to ensure c++ standard, we would need to check that the index is not too large and does not wrap around, which is no better than a non-negative check.

Conclusion:

  • we don't want to remove benefits of converting to standard c++ behavior
  • we do want to alleviate the cons above, which is possible by creating either wrapper/transition span type or helper functions for size() and operator[] that would ensure that the conversion is done correctly.

@JosephBialekMsft
Copy link
Author

@annagrin

I don't agree that it removes undefined behavior cases. An unsigned value wrapping is defined behavior, even if it isn't expected by the developer (but that falls in to the bullet point of "eliminated some potential bugs"). In that case we still have memory safety, but the logic could be incorrect. Maybe I'm not thinking of an undefined behavior case you're thinking of though.

While pro #2 is true for most practical scenarios, but it's not always true. Windows on x86 supports partitioning 3GB of user-mode virtual address space and 1GB of kernel-mode virtual address space. I believe Linux has similar support. Thus on x86 it would be technically possible for someone to make an allocation that doesn't fit in to a span. Although it is unlikely. If 64bit OS's ever allow full use of the 64bit virtual address space (57bit is on the way from Intel) we'd have similar problems. Just bringing this up for completeness.

I think it'd be useful to just allow span to specify if it's going to use a signed or unsigned size span. I haven't looked at all the code but I imagine the optimizer should just eliminate the <=0 checks for unsigned since that condition is impossible. You may not need to change anything else about the class.

@robert-andrzejuk
Copy link
Contributor

For the Core Guidelines this was commented by Herb Sutter: #1115

@neilmacintosh
Copy link
Contributor

Thanks @robert-andrzejuk for pointing to Herb's nice exposition of the Core Guidelines thinking on signed vs. unsigned integer types for indexing.

Given that span has already been standardized with a signed index type, making the sort of change @JosephBialekMsft suggests would represent a significant fork for this GSL implementation. I see no reason for such a fork, given it does not follow the thinking contained in the Core Guidelines.

@hsutter
Copy link
Contributor

hsutter commented Jul 30, 2018

Editors call: If it isn't already said, we should say that unsigned overflow is well defined but defined to something that is wrong for what we use indices for.

@neilmacintosh and @GabrielDosReis could you please add links to the comment threads in the GSL repo that also cover this with additional detail?

@neithernut
Copy link

I think it was this conversation: microsoft/GSL#322

@neilmacintosh
Copy link
Contributor

neilmacintosh commented Jul 30, 2018

Thanks @neithernut, there is another, much earlier discussion here too: microsoft/GSL#171.

Let me also briefly summarize our thinking again here, just for clarity.

  1. std::ptrdiff_t is the type that is provided for indexing into arrays by the language (as per the standard). It is a signed integer type. You might happen to have an operating system that allows you to allocate a quantity of memory larger than can be indexed by std::ptrdiff_t, but you can only legally access that memory using std::ptrdiff_t. If your compiler does otherwise then it is offering a non-portable extension.

  2. Unsigned integers (unfortunately) have well-defined behavior in the language to "wrap around". This behavior is error-prone and undesirable in the context of calculating sizes and indices for access into memory.

@hsutter
Copy link
Contributor

hsutter commented Jul 30, 2018

Also remember that gsl::index is now the approved way to refer to std::ptrdiff_t for indexing porpoises (see #1115).

@neilmacintosh
Copy link
Contributor

Ok. I think this discussion is done at this point, and "the way of the Guidelines" on this topic is pretty clear so I'm closing the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants