Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Unsigned integer casting overflowing as if signed when using int() or UInt32() #3065

Closed
fnands opened this issue Jun 17, 2024 · 13 comments
Assignees
Labels
bug Something isn't working mojo-repo Tag all issues with this label

Comments

@fnands
Copy link

fnands commented Jun 17, 2024

Bug description

Migrating this here after a bit of discussion in Discord.

It seems like casting to unsigned integers actually just casts to signed integers, but has different behaviour in different cases.

I.e. I would expect that if I took a UInt32 value, set it to 129, cast it to UInt8 and then call int(), I would still get 129.

However, this seem to end up as 4294967169, i.e. 2**32 - 127

If I instead call UInt32() I get 1, so it overflows as 2**32 + 1? Not sure about the last one.

So despite the fact that no overflow is expected (129 < 255), it seems to overflow, i.e. act like an signed integer.

The same goes if I cast to UInt16, i.e. casting 32768 (2**16/2 + 1) to UInt16 and then calling int() I get 4294934528 i.e. 2**32 - 2**16/2

So it looks like the cast to UInt8 and UInt16 actually perform casts to Int8 and Int16, and then overflow.

Even more confusingly, this behaviour is different based on where the original UInt32 is from, but behaviour differs when initialising using var and alias (see below).

Steps to reproduce

Most minimal example

def main():
 
    var b: UInt32 = 129
    print(b.cast[DType.uint8]())
    print(int(b.cast[DType.uint8]()))
    print(UInt32(b.cast[DType.uint8]()))

Produces:

129
-127
1

Doesn't have the same effect in the REPL, but the REPL does seem to have a clue:

  1> var a: UInt32 = 129 
  2.  
(SIMD[uint32, 1]) a = {
  (ui32) [0] = 129
}
  2> var b = a.cast[DType.uint8]() 
  3.  
(SIMD[uint8, 1]) b = {
  (ui8) [0] = -127
}

A little bit less of a minimal example:

fn fill_super_minimal() -> List[UInt32]:
    var table = List[UInt32](capacity=4)
    table.append(129)
 
    # This overflows with alias, but not var
    table.append(int(table[0].cast[DType.uint8]()))
 
    # This overflows in both cases, as if uint8 overflowed.
    var a: UInt32 = 129
    table.append(int(a.cast[DType.uint8]()))
 
    # This overflows as if uint32 overflows
    var b: UInt32 = 129
    table.append(UInt32(b.cast[DType.uint8]()))
 
    return table
 
 
def main():
 
    var var_table = fill_super_minimal()
    alias alias_table = fill_super_minimal()
 
    for i in range(4):
        print(var_table[i], alias_table[i])
 

The expected output would be:

129 129
129 129
129 129
129 129

But I get:

129 129
129 4294967169
4294967169 4294967169
1 1

System information

- What OS did you do install Mojo on ?
Ubuntu 22.04.4 LTS
- Provide version information for Mojo by pasting the output of `mojo -v`
mojo 24.4.0 (2cb57382)
- Provide Modular CLI version by pasting the output of `modular -v`
modular 0.8.0 (39a426b5)
@fnands fnands added bug Something isn't working mojo-repo Tag all issues with this label labels Jun 17, 2024
@fnands
Copy link
Author

fnands commented Jun 17, 2024

Just tested on nightly and behaviour is identical.
Nightly version: mojo 2024.6.1705 (79838f00)

@soraros
Copy link
Contributor

soraros commented Jun 17, 2024

Related to #2860 and #3045.

The 1 is caused by implicit conversion. UInt32(...) doesn't do what you think it does.

@fnands
Copy link
Author

fnands commented Jun 17, 2024

So just to be clear:

The var vs alias differences seem to only appear when reading a value from a table.

So there might be two issues: the table one and the int() one.

@soraros
Copy link
Contributor

soraros commented Jun 17, 2024

Smaller repro:

fn main():
    print(int(UInt8(128)))  # prints -128

It's very curious that the following runs fine:

fn main():
  var n = UInt8(128)
  print(n)       # prints 128
  print(int(n))  # prints 128

@fnands
Copy link
Author

fnands commented Jun 17, 2024

Smaller repro:

fn main():
    print(int(UInt8(128)))  # prints -128

Similarly, running:

  var a: UInt32 = 129

  print(int(a.cast[DType.uint8]()))

in the REPL returns 129.

Running:

def main():
    var a: UInt32 = 129

    print(int(a.cast[DType.uint8]()))

main()

In the REPL returns -127

martinvuyk added a commit to martinvuyk/mojo that referenced this issue Jul 3, 2024
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@martinvuyk
Copy link
Contributor

I have a bit of a quick and dirty solution proposal for this in #3172 . I would love some help since MLIR won't let me do rebind to another type and I'm out of my depth when it comes to IR and compiler stuff. As far as I understand we only have access to pop and index dialect and we'd need a way to bitcast the value itself and I've only found __mlir_op.pop.pointer.bitcast which I imagine only works in the heap
@soraros any idea who in the Mojo team could jump in here?

martinvuyk added a commit to martinvuyk/mojo that referenced this issue Jul 10, 2024
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
@soraros
Copy link
Contributor

soraros commented Jul 12, 2024

@soraros any idea who in the Mojo team could jump in here?

IDK. Let's just summon the all knowing @JoeLoser.

@laszlokindrat
Copy link
Contributor

I'm taking a look at this now:

def main():
 
    var b: UInt32 = 129
    print(b.cast[DType.uint8]())
    print(int(b.cast[DType.uint8]()))
    print(UInt32(b.cast[DType.uint8]()))

There is something flat out wrong with the conversion in print(int(b.cast[DType.uint8]())), but we can probably fix it easily. The other one, however, is actually due to an implicit bool conversion: print(UInt32(b.cast[DType.uint8]())). I'm looking at options right now for this.

@martinvuyk
Copy link
Contributor

print(UInt32(b.castDType.uint8)). I'm looking at options right now for this.

In PR #3172 I thought about adding a constructor that casts WDYT?

    fn __init__[A: DType, //](inout self, value: SIMD[A, size]):
        """Cast the other SIMD vector into self.

        Parameters:
            A: The DType of the other SIMD.

        Args:
            value: The value to cast into self.
        """
        self = value.cast[type]()

@laszlokindrat
Copy link
Contributor

In PR #3172 I thought about adding a constructor that casts WDYT?

    fn __init__[A: DType, //](inout self, value: SIMD[A, size]):
        """Cast the other SIMD vector into self.

        Parameters:
            A: The DType of the other SIMD.

        Args:
            value: The value to cast into self.
        """
        self = value.cast[type]()

I think we should make casting like this explicit. One way to do that is to make an always-failing constructor. I'm working on that now.

@laszlokindrat laszlokindrat self-assigned this Jul 12, 2024
@soraros
Copy link
Contributor

soraros commented Jul 12, 2024

@laszlokindrat I had the idea of adding a all-failing constructor to catch this case. However, with implicit conversion to Bool, it's still less ideal than what we had before. The LSP can't warn us about mismatched types:

fn f(x: SIMD) -> SIMD:
  return x  # We can only find out about the dtype/shape mismatch after the first compiling attempt

@soraros
Copy link
Contributor

soraros commented Jul 14, 2024

The implicit casting part is traced in #3045.

@laszlokindrat
Copy link
Contributor

@laszlokindrat I had the idea of adding a all-failing constructor to catch this case. However, with implicit conversion to Bool, it's still less ideal than what we had before. The LSP can't warn us about mismatched types:

fn f(x: SIMD) -> SIMD:
  return x  # We can only find out about the dtype/shape mismatch after the first compiling attempt

I agree this is not optimal, but the LSP currently has a general limitation that it doesn't work with constrained. IMO That's what really should be fixed, in conjunction with a tighter story around constraints.

modularbot pushed a commit that referenced this issue Jul 17, 2024
The `pop.cast` op would sign extend when casting a smaller unsigned type
to the `index` type, which would cause incorrect behavior for
`int(UInt8(128))` and similar code. This patch fixes that by first
upcasting to a larger unsigned scalar value and then converting to `Int`.

Fixes #3065

MODULAR_ORIG_COMMIT_REV_ID: 111bb5545c02663ac0e7bf7cdb3678ea23c261fd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working mojo-repo Tag all issues with this label
Projects
None yet
Development

No branches or pull requests

4 participants