Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unsigned conversion regression #12554

Closed
mratsim opened this issue Oct 29, 2019 · 38 comments
Closed

unsigned conversion regression #12554

mratsim opened this issue Oct 29, 2019 · 38 comments

Comments

@mratsim
Copy link
Collaborator

mratsim commented Oct 29, 2019

The following used to give 44 on 0.20.2.

It now raises an error range error on 1.0.

let x = 300'u
echo byte(x)

This broke our upcoming networking code: vacp2p/nim-libp2p@541f0f2. In general, I don't think people expect range checking on unsigned integers, especially on converting to raw byte.

This may have a large impact on low-level code for example: graphics, crypto, networking, emulation ...

@kraptor
Copy link
Contributor

kraptor commented Oct 30, 2019

For low-level code, what I always do is using proper casting to avoid this:

let x = 300'u
echo cast[byte](x)

@Araq
Copy link
Member

Araq commented Oct 30, 2019

Here is what the spec says about it:

`Automatic type conversion`:idx: is performed in expressions where different
kinds of integer types are used: the smaller type is converted to the larger.

A `narrowing type conversion`:idx: converts a larger to a smaller type (for
example ``int32 -> int16``. A `widening type conversion`:idx: converts a
smaller type to a larger type (for example ``int16 -> int32``). In Nim only
widening type conversions are *implicit*:

Type conversions
----------------
Syntactically a `type conversion` is like a procedure call, but a
type name replaces the procedure name. A type conversion is always
safe in the sense that a failure to convert a type to another
results in an exception (if it cannot be determined statically).

So it now behaves as the spec requires and previously it didn't. But I agree the wording is not explicit about unsigned types and worse there was no changelog entry for this change.

@Araq
Copy link
Member

Araq commented Oct 30, 2019

This may have a large impact on low-level code for example: graphics, crypto, networking, emulation ...

We will add a changelog entry and we might add a switch to turn it off again, but looking at it without the history, it seems to be an improvement, use bit masking or cast to convert to bytes. Nim got more consistent in its behaviour.

@dryajov
Copy link

dryajov commented Oct 30, 2019

I generally agree with the behavior - it is safer, but the problem is that it fails at runtime, it should be caught at compile time and warn at the very least, tho failing compilation is much more preferable.

@Araq
Copy link
Member

Araq commented Oct 30, 2019

Alright, good. Let's make this create a new warning.

@dryajov
Copy link

dryajov commented Oct 30, 2019

Dope!

@krux02
Copy link
Contributor

krux02 commented Oct 30, 2019

@Araq what kind of warning do you have in mind?

@Araq
Copy link
Member

Araq commented Oct 30, 2019

Warning: This type conversion now produces a runtime check that can raise

@dryajov
Copy link

dryajov commented Oct 30, 2019

How about:

Warning: narrowing type conversions are [deprecated/not allowed] 🚲 🏠

@krux02
Copy link
Contributor

krux02 commented Oct 30, 2019

What if this runtime check is exactly what I need?

@dryajov
Copy link

dryajov commented Oct 30, 2019

I think this isn't for checks, its for assignments specifically, thats where I would expect it to warn.

let myByte: byte = 300 should warn

@krux02
Copy link
Contributor

krux02 commented Oct 30, 2019

that one isn't allowed, it is checked at compile time.

@dryajov
Copy link

dryajov commented Oct 30, 2019

You're right, that one is caught, this one isn't (https://play.nim-lang.org/#ix=20lj):

let val: uint = 300
let myByte: byte = byte(val)

@Araq
Copy link
Member

Araq commented Oct 30, 2019

Then you turn off this warning? or rewrite your code to have a raise statement and a range check.

@krux02
Copy link
Contributor

krux02 commented Oct 30, 2019

How about: "Everything is fine as it is."? When the network code crashes because of an unhandled case, that is a bug that should be fixed. It is a good crash. Don't hide it.

@dryajov
Copy link

dryajov commented Oct 30, 2019

I don't see how this is network specific, any sort of bit shifting will land you this error, moreover, I bet that there's a huge amount of code out there that (unknowingly) relies on this behavior that will now start to randomly crash all over. The intent of the warning is to uncover all those places that need fixing before they crash your app.

@Clyybber
Copy link
Contributor

Another case for a third integer type, the safe uint. (a uint with range and overflow checks)

@mratsim
Copy link
Collaborator Author

mratsim commented Oct 30, 2019

unsigned is not supposed to have overflow checks, when a number goes over the range it should roll over, including when narrowing the type.

Assuming we want to add all of that on the Nim unsigned integer, I would need new memory int types that behaves like the current unsigned int for the purposes of implementing VM and cryptography. Just like pointers and ptr UncheckedArray they would mean "I know what I'm doing".

Otherwise unsigned would be the first Nim feature with no low-level escape hatch, this also has:

  • runtime costs
  • code auditing cost
  • code size cost

Unlike signed integers which are general purpose, unsigned main use-case is having a tight control over the memory layout and the code generated.

@krux02
Copy link
Contributor

krux02 commented Oct 30, 2019

@mratsim I am entirely sure what you want to say. Especially with this memory int types.

But I can ensure you, that only the conversion between the types has a range check.

Otherwise unsigned would be the first Nim feature with no low-level escape hatch

Remember there is also always the conversion with cast that never has a range check.

... including when narrowing the type

Are you proposing that cast[byte](x) and byte(x) should be the same thing? What should I do when I want the old behavior of byte(x) back?

@mratsim
Copy link
Collaborator Author

mratsim commented Oct 30, 2019

I was under the impression that casting was for reinterpreting the raw memory i.e. it would only be valid between types of the exact same size.

The old behaviour of byte in 0.20.2 and before was the one I was describing.
The current behaviour is a regression introduced in 1.0 that has the potential of breaking all the past networking, graphics, VM, emulators, crypto and probably kernel in Nim projects codes written.

What is even worse is that it's a change that will not be caught at compile-time but at runtime, potentially only after a long time until we have an overflow.

And the ugly is that someone who wants to not use this behavior has no recourse. I don't have alternative types that behave like a machine integer anymore. This is not expected from a system programming language, you need an escape hatch for low-level programming.

Lastly, in many cases and especially for cryptography, you might want not to throw any exceptions at all on certain part of the codebase. Now as soon as you use int or uint, you have an OverflowError tag that will pollute the effect system which makes one of the most promising usage of the effect system useless.


If you want the current 1.0 behavior you can use:

proc safe_byte(x: int): byte =
  if x > int(high(int8)):
    raise newError(RangeError, "Not in the 0 .. 255 range: " & $x)
  byte(x)

I want to emphasize that the issue is not just byte, it's all unsigned integer types as well.

Unsigned -> unsigned

Unsigned -> unsigned conversion should be runtime check free

Works in 0.20.2 // Fails in 1.0.x

let x = 1'u64 shl 32 + 1
echo uint32(x)

Signed -> unsigned conversion

Signed -> unsigned should have range check because signed integers are checked.

This was the case also in the past

let x = -1'i64
echo uint32(x)

Signed -> signed conversion

Signed -> signed should have range check because signed integers are checked.

This was not the case in 0.20.2 and has been fixed in 1.0 (and introduced the current regression)

let x = 1'u64 shl 32 + 1
echo int32(x)

@Araq
Copy link
Member

Araq commented Oct 30, 2019

I'm changing my mind about this issue and agree with @mratsim more and more.

@dryajov
Copy link

dryajov commented Oct 30, 2019

I'm on the same boat, after some cursory literature review, e.g. https://en.wikipedia.org/wiki/Integer_overflow (citation 1) - a computation involving unsigned operands can never overflow, as per C11 standard, I tend to agree with this being at the very least surprising behavior. Nim is ofcourse free to make its own choices in this regard, but this seems to be a well established practice.

My suggestion with regards to the warning where mostly to enable early detection of issues that this change introduced, other than that I now agree with @mratsim assessment.

@krux02
Copy link
Contributor

krux02 commented Oct 30, 2019

I don't think that conversions to unsigned integers were not checked in the past. They were half ass range checked with special cases where range checking could not be implemented. Finally we have the int128 type in the compiler and we could implement all range checks correctly. uint64 was such an exception type that was left out for range checks.

My suggestion is that you at least write an RFC for the behavior that you want, because you want to change the spec.

@dryajov we are just talking about conversations into unsigned integer types. Not about computations within unsigned types. These already have the expected mod2 behavior.

@krux02
Copy link
Contributor

krux02 commented Oct 30, 2019

Btw, I really think we should not touch this thing anymore. It was a big pain in the ass to get it all working correctly to the spec. It all works according to the spec. Using cast[byet](x) if you don't want a really dosn't seem to be a big deal. and btw, I made sure that casting between integer types of different sizes works correctly on the VM as well.

@dryajov
Copy link

dryajov commented Oct 30, 2019

@dryajov we are just talking about conversations into unsigned integer types. Not about computations within unsigned types. These already have the expected mod2 behavior.

Then I would say that this is even less consistent, if computation already behaves in this way I would expect conversion to work the same.

@dryajov
Copy link

dryajov commented Oct 30, 2019

Btw, I really think we should not touch this thing anymore. It was a big pain in the ass to get it all working correctly to the spec. It all works according to the spec. Using castbyet if you don't want a really dosn't seem to be a big deal. and btw, I made sure that casting between integer types of different sizes works correctly on the VM as well.

If we don't do something about it, existing code is going to trip on this pretty badly. At the very least this would require everybody to audit their code or expect random crashes, I'm not sure this is the best experience for end users.

To avoid this we either a) revert to the previous behavior and allow conversion to behave just like computation does b) at the very least introduce a warning that allows people to find and fix their code.

@krux02
Copy link
Contributor

krux02 commented Oct 31, 2019

@dryajov you want the old behavior where var x: uint64 = -1 is fine, but var y: uint8 = 300 is not?

Just a reminder let x = byte(300) was always illegal.

@dryajov
Copy link

dryajov commented Oct 31, 2019

@krux02 I would go with what @mratsim has suggested so far, I.e. making sure that it is at least consistent with 0.20.2.

On a broader note, maybe arithmetic types need some revising, but I'm certainly not an expert on Nim nor type systems to have a super strong opinion. At this point tho, anything that touches this would probably be a breaking change and won't come before the next major release.

Btw, thanks for the awesome job on getting to 1.0. Nim is truly a special language and you've all done an amazing job!

@Araq
Copy link
Member

Araq commented Oct 31, 2019

My suggestion is that you at least write an RFC for the behavior that you want, because you want to change the spec.

Correct, the bugfixes requires a spec update. I'm willing to do that because people have to come to rely on the old behaviour, so the spec should reflect the behavior of 0.20.

@mratsim
Copy link
Collaborator Author

mratsim commented Oct 31, 2019

@dryajov you want the old behavior where var x: uint64 = -1 is fine, but var y: uint8 = 300 is not?

Just a reminder let x = byte(300) was always illegal.

This is about conversion between unsigned types. Your second example is using signed int.

This used to work

let x = 300'u64
let y = byte(x)

Note that the fact that this doesn't let x = byte(300'u64) is due to the VM representing everything as an int internally for constant folding.

@krux02
Copy link
Contributor

krux02 commented Oct 31, 2019

let x = byte(300'u64) isn't computed in the VM. It is computed in semfold.nim.

@arnetheduck
Copy link
Contributor

I would generally expect all conversions to unsigned not to raise - that's why I use unsigned in the first place, to get wrapping behaviour and I don't see why the function that converts to unsigned should be any different. the mirror operation, uint->int should raise however - int is a checked type

By this logic, byte(-1) is a totally permissible expression that resolves to the byte 0xff both at runtime and at compile time (on a two's complement machine).

generally, I'd also think that byte(300'u64) should be allowed - why should runtime and compile time have different behaviour? it is useful when wanting to find the bit pattern of some large number without calculating it in your head, and it's quite odd that var x = 300'u64; byte(x) would be handled differently.

@zah
Copy link
Member

zah commented Oct 31, 2019

There is a way to follow the spec and to address @mratsim's concerns at the same time.

The narrowing conversion can be modelled to have a pre-condition that the value must be in range. When the compiler is not sure that this pre-condition is satisfied, it must insert a range check before the narrowing. But if the control flow demonstrates that the pre-condition is satisfied, the range check can be omitted.

How does the compiler know that the pre-condition is satisfied? Well, this is exactly the type of data-flow analysis that will become possible with the planned integration of Z3. It boils down to noticing that expressions such as intVal and 0xff are within a certain range and the same is also ensured by branches in the code such as if value < 256: ....

This will allow us to have a simple non-casting template looking like this:

template truncate*(val: SomeUnsigned, TargetType: typedesc): TargetType =
  TargetType(val and high(TargetType))

The bit-masking should be included in the peephole optimisation patterns of the back-end compiler and it produces exactly the same code as the "machine truncation" byte(int_val). Check out the compiled code here:

https://godbolt.org/z/DD-uJc

Unguarded narrowing can become a warning for now and an error in the future. truncate can use a cast until the more sophisticated data-flow analysis is in place.

@mratsim
Copy link
Collaborator Author

mratsim commented Oct 31, 2019

It is fine to implement such an optimization for signed integers.
However unsigned integers should not have any check at all, and this should be guaranteed.

This is an escape hatch required to write any low-level code were the developer knows better than the default language.

In particular, crypto developers already have to jump through some hoops to ensure that the C compiler never produce an "if" instruction see: https://github.com/veorq/cryptocoding.
Nim shouldn't add its own difficulties on top.
It needs a family of integer types that guarantees never to branch or throw an exception to be usable for cryptography.

Additionally, I would argue that any change that adds or remove thrown exceptions should be considered a breaking API change (in that case it was announced for signed int, and is a regression for unsigned).

Note that this is not only true for crypto, if we want to add GPU code generation in the future we will need to not generate such checks/exceptions as well.

@Araq
Copy link
Member

Araq commented Oct 31, 2019

Here you go, nim-lang/RFCs#175

@Araq
Copy link
Member

Araq commented Nov 16, 2019

This one works for me fwiw:

let t: uint = 3
assert 0 < t

Same for:

let t: uint32 = 0
echo t xor 0o777 # fine
echo 0o777 xor t # nope

@disruptek
Copy link
Contributor

Never mind, I must have had 245a954 in my compiler, which means this shouldn't work in 1.0. Duh.

@Araq
Copy link
Member

Araq commented Nov 20, 2019

Fixed, 1.0.4 will have it too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants