Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL driver cannot handle bigint unsigned in execute( ) #233

Closed
bbkr opened this issue Aug 12, 2022 · 14 comments
Closed

MySQL driver cannot handle bigint unsigned in execute( ) #233

bbkr opened this issue Aug 12, 2022 · 14 comments

Comments

@bbkr
Copy link

bbkr commented Aug 12, 2022

CREATE TABLE foo ( id bigint unsigned ) engine=InnoDB;
INSERT INTO foo ( id ) VALUES ( 18446744073709551615 );    -- 2^64 -1

This value can be SELECTed without any issues:

my $id = $connection.execute( 'SELECT id FROM foo' ).allrows( )[ 0 ][ 0 ];
say $id;

will print 18446744073709551615

But somehow cannot be used as execute() argument:

$connection.execute( 'SELECT id FROM foo WHERE id = ?', $id );
Cannot unbox 64 bit wide bigint into native integer
  in block  at /home/bbkr/.raku/sources/A15CA24FF2518653680BE63142CB6661532CAA84 (DBDish::mysql::StatementHandle) line 119
  in block  at /home/bbkr/.raku/sources/A15CA24FF2518653680BE63142CB6661532CAA84 (DBDish::mysql::StatementHandle) line 114
  in method execute at /home/bbkr/.raku/sources/A15CA24FF2518653680BE63142CB6661532CAA84 (DBDish::mysql::StatementHandle) line 111
  in method execute at /home/bbkr/.raku/sources/B092781CCB9B75F51B0389A686A61C391F70F118 (DBDish::mysql::Connection) line 57
@bbkr bbkr changed the title MySQL driver cannot INSERT bigint unsigned MySQL driver cannot handle bigint unsigned in execute( ) Aug 12, 2022
@bbkr
Copy link
Author

bbkr commented Aug 12, 2022

https://github.com/raku-community-modules/DBIish/blob/main/lib/DBDish/mysql/StatementHandle.rakumod#L119

This logic is not valid, int64 has one bit less and cannot store maximum unsigned value.

I'm not sure what is the best fix without performance penalty:

when Int  {
    $st = MYSQL_TYPE_LONGLONG;
    $_ > 0 ?? Blob[uint64].new($_) !! Blob[int64].new($_);
}

?

rbt added a commit that referenced this issue Nov 23, 2022
Values over 2^63 were not handled correctly. We could try uint64 or int128 for handling some larger values, but very very large values will still need to be treated as a string. Since the vast majority of integer values tend to be small, just treat anything outside int64 as a string.

Fixes #233
@rbt rbt closed this as completed in 642c062 Nov 23, 2022
@bbkr
Copy link
Author

bbkr commented Nov 24, 2022

I think this needs proper solution. Current one is doing wrong typization for HALF of available range based on very risky assumption.

Yes, most autoincremented IDs are small. But bigints are also used as compact way to store binary flags for example. Or geo data. Driver should not manipulate data type like that. This is error prone and may create weird side effects.

@rbt
Copy link
Collaborator

rbt commented Nov 24, 2022

It's impossible to fix properly as MySQL doesn't have any way of handling numbers over 2**64 other than as a string (that I've found), so string needs to be the final fallback.

At best, we can improve 2^63 to 2^64.

So you prefer this coding?

if $_ > -2**63 and $_ < 2**63 {
    $st = MYSQL_TYPE_LONGLONG;
    Blob[int64].new($_);
} elsif $_ > 0 and $_ < 2**64 - 1 {
    $st = MYSQL_TYPE_LONGLONG;
    Blob[uint64].new($_);
} else {
    .Str.encode;
}

Also, can you provide a test case where it shows a change? If this has a semantic impact, we should have a test to ensure the whole BIGINT range has similar semantics.

The round-trip test I added to t/24-mysql-large-value.t doesn't detect any difference between a string encoded number and a LONGLONG encoded number.

@bbkr
Copy link
Author

bbkr commented Nov 24, 2022

The issue refers to biggest MySQL numeric type that MySQL can handle. IMO this else part is not test-coverable (or is it?) and simple if >0 then use uint64 else use int64 should be enough (and faster).

The functional difference is on data consumer side. One expects numeric type from numeric column. This value may get propagated for example incorrectly as string to JSON and crash some strong typed consumer of such payload. Round trip MySQL->MySQL does not show whole story.

In my particular case https://github.com/bbkr/UpRooted/blob/master/t/31-reader-mysql.rakutest#L42 - this hit me here when I was checking how dumped data is presented for consumption and was unable to truly test whole field range.

@rbt
Copy link
Collaborator

rbt commented Nov 24, 2022

The issue refers to biggest MySQL numeric type that MySQL can handle. IMO this else part is not test-coverable (or is it?) and simple if >0 then use uint64 else use int64 should be enough (and faster).

Unfortunately this is not adequate. Raku Ints such as 2^80 do not fit into either uint64 or MYSQL_TYPE_LONGLONG, and need to be transmitted as a string. The MySQL numeric type can store much larger numbers than BigInt, and even if they couldn't this generating a suitable error message (not a complaint about unboxing) requires more logic.

The functional difference is on data consumer side. One expects numeric type from numeric column.

Agreed, and everything in the test I added all values are returned as a Rat as expected from a Numeric column.

If you can identify, and we code around a semantic difference, we need to add a DBIish test for it to ensure it doesn't change in the future.

@rbt
Copy link
Collaborator

rbt commented Nov 24, 2022

In my particular case https://github.com/bbkr/UpRooted/blob/master/t/31-reader-mysql.rakutest#L42 - this hit me here when I was checking how dumped data is presented for consumption and was unable to truly test whole field range.

This appears to be a store and read round-trip? You'll get back an Int as the Raku datatype from a MySQL bigint column regardless of whether it's transmitted across the protocol as a fixed-length 8 bytes or as a Unicode encoded string.

@bbkr
Copy link
Author

bbkr commented Nov 24, 2022

OK, then the fastest way seems to be (also with corrected range edges):

if -2**63 <= $_ <= 2**63  -1 {      # most stuff goes here
    $st = MYSQL_TYPE_LONGLONG;
    Blob[int64].new($_);
}
elsif 2**63 <= $_ <= 2**64 - 1 {     # half of unsigned int range goes here and does not need costly String conversion
    $st = MYSQL_TYPE_LONGLONG;
    Blob[uint64].new($_);
} else {    # super rare cases in real world use
    .Str.encode;
}

That is mix of your and mine solution that handles most common cases cheaply, does not have string casting penalty for unsigned int top range and has fallback for overflows. Am I correct?

@bbkr
Copy link
Author

bbkr commented Nov 24, 2022

BTW: My project does round trip when writing data here: https://github.com/bbkr/UpRooted/blob/master/t/42-writer-mysql.rakutest#L28

So it is currently tested "up to int signed". I think your solution alone will work just fine, however with slight performance penalty mentioned in my previous post.

rbt added a commit that referenced this issue Nov 24, 2022
As per discussion in #233, this code should have better performance and fixes the boundaries (which I messed up).
@rbt
Copy link
Collaborator

rbt commented Nov 24, 2022

The code you suggest seems reasonable to me but it doesn't quite work.

In this branch:
ed66c7e

We get this test result:
https://github.com/raku-community-modules/DBIish/actions/runs/3541415105/jobs/5945651204#step:12:201

These values:

9223372036854775808
18446744073709551615

Come back as these values:

-1
-9223372036854775808

@bbkr
Copy link
Author

bbkr commented Nov 24, 2022

Weird, on logical level this fix should be correct (Blobs do not overflow). Haven't looked in the test itself yet (I'm still in my $dayjob).

@rbt
Copy link
Collaborator

rbt commented Nov 24, 2022

Are you sure MYSQL_TYPE_LONGLONG handles those values? I'm reading it as an int64_t .

@bbkr
Copy link
Author

bbkr commented Nov 24, 2022

@bbkr
Copy link
Author

bbkr commented Nov 24, 2022

So my fix proposal were based on wrong assumption, sorry about that. Looks like your version is the valid way to solve it and there will always be string casting performance penalty in upper bigint unsigned range.

rbt added a commit that referenced this issue Nov 24, 2022
rbt added a commit that referenced this issue Nov 24, 2022
@rbt
Copy link
Collaborator

rbt commented Nov 24, 2022

No worries. I'll merge in the lower boundary issue you identified as soon as the tests pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants