
(dev2.0) strange behaviour of printf "%x" #1530

Closed · tomcl opened this issue Aug 21, 2018 · 26 comments

@tomcl (Contributor) commented Aug 21, 2018

Description

printf "%x" sometimes prints 32 it ints wrong

Repro code

    let xs = (1 <<< 31) 
    let xu = (1u <<< 31)
    printfn "(1 <<< 31) signed: %x (%d); (1 <<< 31) unsigned: %x (%d)" xs xs xu xu

Expected and actual results

expected: %x is 0x80000000 for both signed and unsigned
actual: %x prints ff-7f000000 for both signed and unsigned (wrong); %d prints -2147483648 for both signed and unsigned (correct).

EDIT: %d for unsigned should be 2147483648 according to the spec, so this is also unexpected!

Related information

  • Fable version (dotnet fable --version): 2.00-beta-001
  • Operating system: windows

NB - 2.00 is significantly better than 1.37 on this test, since the decimal printed values are correct now, and the signed and unsigned hex values are the same. However, they are still not quite right!

@xdaDaveShaw (Contributor)

I get the following in FSI:

(1 <<< 31) signed: 80000000 (-2147483648); (1 <<< 31) unsigned: 80000000 (2147483648)

Above it says that the expectation is:

(1 <<< 31) signed: 80000000 (-2147483648); (1 <<< 31) unsigned: 80000000 (-2147483648)

Note that the second %d is negative.

Is there a reason that it wouldn't be the same as FSI?

What I actually get in REPL2 is:

(1 <<< 31) signed: ff-7f000000 (-2147483648); (1 <<< 31) unsigned: ff-7f000000 (-2147483648)

@tomcl (Contributor, Author) commented Aug 21, 2018

Thanks Dave,

The %d does not worry me so much. It could be interpreted as %d always printing signed and doing an unsigned to signed coercion when given an unsigned parameter - printf does do such things.
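A minimal sketch of what such a coercion looks like in JS terms (my own illustration, not Fable source; toSigned32 is a hypothetical helper):

// Illustration only: reinterpreting an unsigned 32-bit value as signed,
// the way a %d formatter might coerce an unsigned argument.
const toSigned32 = (x: number) => x | 0; // bitwise ops yield a signed int32
console.log(toSigned32(2147483648));     // -2147483648
console.log(toSigned32(4294967295));     // -1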

OTOH you are right, the F# printf spec for %d says:

Formats any basic integer type formatted as a decimal integer, signed if the basic integer type is signed.

However, the %x output is very confusing, as well as wrong.

I'll edit the OP to note the unsigned %d error.

@xdaDaveShaw (Contributor) commented Aug 21, 2018

I'm guessing someone needs to find the implementation of .NET's "x" format and implement it in JavaScript. Not sure whether this lives in mscorlib or inside the CLR, though.

The current Fable implementation is:

function toHex(value: number) {
  return value < 0
    ? "ff" + (16777215 - (Math.abs(value) - 1)).toString(16)
    : value.toString(16);
}
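Tracing the current implementation with the repro value shows where the odd output comes from: 16777215 is 0xFFFFFF, so the negative branch only ever made sense for 24-bit magnitudes (my own trace, not part of the thread's code):

// Tracing toHex(1 <<< 31), i.e. toHex(-2147483648), by hand:
const value = -2147483648;
const inner = 16777215 - (Math.abs(value) - 1); // -2130706432
console.log("ff" + inner.toString(16));         // "ff-7f000000", as observed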

@tomcl (Contributor, Author) commented Aug 21, 2018

The %d problem is more difficult to fix: I think it requires printf parameters to be processed differently according to whether they are signed or unsigned. That could be worked around for 32-bit values if they are always sign-extended when signed and zero-extended when unsigned, but the problem would remain for 64-bit.
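A sketch of why the workaround stops at 32 bits (my own illustration, not Fable code): JS bitwise operators truncate their operands to 32 bits, so sign/zero extension tricks cannot reach 64-bit widths.

// Illustration only: 32-bit extension works via bitwise ops; 64-bit cannot.
console.log((0xFFFF << 16) >> 16); // -1: sign-extend a 16-bit pattern
console.log(0xFFFFFFFF >>> 0);     // 4294967295: zero-extend stays unsigned
console.log(2 ** 35 >>> 0);        // 0: values wider than 32 bits wrap mod 2^32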

The %x problem on its own could be fixed quickly by changing the hex converter function toHex.

A complete solution would be much more complex, because it would have to implement all the width specifiers for %x and %d (the two could, however, be handled uniformly). Personally I find things like %08x pretty useful, and lots of output will print wonky if %6d etc. do not work!

Do you think a partial solution, with common width characters implemented, would be useful?

@alfonsogarciacaro (Member)

The toHex function was added by a contributor a while ago and we can change it if there's a better alternative. Do you know any? (There are plenty here but I'm not sure which is the best one).

About %d: we're not doing any actual formatting. The problem seems to happen because we're not discriminating shift operations between signed and unsigned integers, so we'll probably need to add a check somehow.

@tomcl (Contributor, Author) commented Aug 22, 2018

The code now seems much more difficult to break than 1.37 was! But I don't quite understand how you are internally implementing the different widths and signedness of numbers.

One inconsistency with dotnet F# is that conversion of negative numbers from signed to unsigned results in 0, whereas it should produce a large unsigned number: (-1 |> int16 |> uint16) = 65535.

However I suspect this may be difficult to track properly, and other than this things seem pretty good.

Suggestion for toHex

This is always supposed to be unsigned, so I suggest:

function toHex(value: number) {
  return value.toString(16);
}

I think that would always be better than the current version.

@alfonsogarciacaro (Member)

Number conversions (mainly implemented by @ncave) generally work: -1 |> int16 |> uint16 gives 65535 in both Fable REPL2 and FSI. I'm now wrapping left shifts of unsigned integers with >>> 0, and it seems to fix this case. However, replacing toHex as suggested yields:

(1 <<< 31) signed: -80000000 (-2147483648); (1 <<< 31) unsigned: 80000000 (2147483648)

This is not right yet (the first value prints as -80000000, with a unary minus), and it also makes this test fail.
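For context, a quick sketch of the plain JS behaviour behind both observations (illustration only):

// << returns a signed int32, so an unsigned F# shift needs >>> 0;
// and plain toString(16) prints sign-and-magnitude hex for negatives.
console.log(1 << 31);                    // -2147483648
console.log((1 << 31) >>> 0);            // 2147483648
console.log((-2147483648).toString(16)); // "-80000000"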

@xdaDaveShaw (Contributor)

I suppose it comes down to the question: do we want %x formatting to match .NET or JavaScript?

I can see both sides...

Using JS version

If I write and test in F#, I get different results at runtime.

Using .NET version

Lots of effort to port chunks of the BCL to JavaScript.

I guess it could be one of the caveats mentioned in the Docs?

@alfonsogarciacaro (Member)

In general, if it's easy to match the F#/.NET semantics we try to do it, but if it takes too much work or makes fable-core size explode we usually settle for a good approximation. String formatting does not match the F#/.NET specs in full, so yes, we can add this caveat to the docs. Besides that, I'm open to modifying toHex if there's a better alternative. Would value.toString(16) be preferred over the current implementation?

@xdaDaveShaw (Contributor)

Given that the current implementation matches neither the native JS nor the .NET behaviour (it is its own thing), and reverting to native JS is easy enough, I'd suggest we do that and update the docs.

I can submit a PR tonight if no one else does before then.

@alfonsogarciacaro (Member)

That'd be awesome @xdaDaveShaw, thank you! Could you please also update the tests accordingly? (StringTests.fs)

@tomcl (Contributor, Author) commented Aug 22, 2018

From my POV we should at least have something compatible with bitwise operations, so that a 32-bit two's complement number will display correctly as hex digits. That is what .NET does, and it is just common sense. The code for this is very easy for an unsigned 32-bit number:

(x >>> 0).toString(16).toUpperCase() (the toUpperCase is provided elsewhere in the print code)

In .NET a signed int32 displays as its unsigned bit-pattern equivalent, and that can also be obtained from the native JS printing via an F# conversion (xs |> uint32).
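A sketch of that proposed 32-bit path as a helper (my own restatement of the snippet above; hex32 is a hypothetical name):

// Reinterpret the bit pattern as uint32 first, then print, so a
// two's-complement negative comes out as its unsigned hex equivalent.
const hex32 = (x: number) => (x >>> 0).toString(16).toUpperCase();
console.log(hex32(-2147483648)); // "80000000"
console.log(hex32(255));         // "FF"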

I'd agree with Dave's solution, simply because beyond 32 bits there is no nice way to solve this problem in general: we do not know whether the number is supposed to be int32 or int64, and hence how many leading '1's to print; and for int64, thinking in bits (and hence a hex representation) does not work for large numbers (abs value > 2^53), which get approximated by JS anyway.

So if the JS native form is a '-' sign (for negative numbers) followed by the hex representation of the absolute value, then that will be usable and also not confusing. We need a caveat in the docs:

(1) that int64 integers lose precision above 2^53 (or whatever the exact limit is)
(2) that %x will display as sign and magnitude for negative numbers.

I'll update the documentation with a full description of the consequences of using JS FP numbers, and how conversions do/don't work (e.g. the fact that signed to unsigned of larger size zeros negative numbers). Which branch of the repo should I patch?

@tomcl (Contributor, Author) commented Aug 22, 2018

Re the documentation, this definitely needs updating since I still do not quite understand what is what:

All numeric types including decimal become JS number (64-bit floating type), except for int64, uint64 and bigint. From: compatibility.md.

That makes sense, since int64 and uint64 don't fit in Number. But in that case, (1) how are they represented, and (2) what does printf do with them? I got the impression they were encoded as Number (losing precision for larger values), in which case the documentation is currently wrong; or, if it is right, it needs a few more details to clarify.

Should these necessary details be documented under compatibility.md, or in a link from compatibility.md, or somewhere else?

PS: from my tests it certainly looks as though int64 and uint64 are encoded as Number, since I can see the precision loss for larger numbers.

@alfonsogarciacaro (Member)

Thanks a lot for offering to help update the documentation @tomcl. If you do, it's probably better to use the dev2.0 branch (which is now the default branch). Before that, it'd be great if you could PR some samples with cases where there's a difference between JS and .NET, so we can identify them and see if it's possible to fix them in a future update.

Actually, int64 is the only case where we're using a different structure than JS numbers to better reflect .NET semantics (this was also contributed by @ncave, and it's very important for correctly reading assemblies in the REPL). It's in the fable-core/Long.js module, and it already includes a toString method with different radices; I just forgot to call it from toHex 😅 Your proposal seems to work fine; maybe something like this could cover most of the cases:

import Long, { toString as longToString } from "./Long";

function toHex(x) {
    return x instanceof Long ? longToString(x, 16) : (x >>> 0).toString(16);
}
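A rough usage sketch of that dispatch (the fromInt call is an assumption about fable-core's long.js-derived API, shown for illustration only):

// Hypothetical usage, assuming Long exposes a long.js-style fromInt:
// toHex(-2147483648)        // "80000000" via (x >>> 0).toString(16)
// toHex(fromInt(255, true)) // dispatches to longToString(x, 16) -> "ff"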

@tomcl (Contributor, Author) commented Aug 22, 2018

OK, great! So in fact .NET compatibility is pretty good: the 64/32-bit distinction can be determined dynamically from the types provided. When I've got straight in my mind what is supposed to happen, I'll check whether I can find any corner cases with semantics different from .NET. I think having a working hex printout will make that a lot easier!

I'd dearly like an automatic testbench that runs identical code fragments via Fable and .NET, comparing results...

I'm very impressed with Fable 2.0: it looks like it has cleaner semantics than Fable 1, as well as being faster.

I noted when porting 5K lines of code from 1.37 that a whole load of jsinterop stuff with dynamic types interfacing to the browser/Electron needed minor rewriting to pass the compiler; not in a bad way, but maybe systematic enough to warrant a guide. Is this what everyone finds, or just me?

@matthid (Contributor) commented Aug 22, 2018

In general, if it's easy to match the F#/.NET semantics we try to do it, but if it takes too much work or makes fable-core size explode we usually settle for a good approximation.

Couldn't Fable tell us, by emitting a warning, when we use such APIs?

@tomcl (Contributor, Author) commented Aug 22, 2018

OK, so here are the expected .NET numeric conversions (from FSI), and what Fable 2.0 does.

(-1 |> any unsigned numeric type) should be 2^N - 1, where N is the type width.
(2^N - 1 in any unsigned type of width N |> any signed type) should be -1.

Examples from FSI:

(-1 |> char |> uint) = 65535u
(-1 |> char |> int) = 65535 (char is unsigned 16-bit, so that conversion determines the value)
(-1 |> uint64) = 18446744073709551615uL
(-1L |> uint64) = 18446744073709551615uL
(18446744073709551615uL |> int64) = -1L
(18446744073709551615uL |> int32) = -1
(-1 |> uint32) = 4294967295u
(4294967295u |> int) = -1

Basically, in .NET, -1 is preserved to and from numeric types, although it looks like 2^N - 1 in an unsigned type of width N. This is pretty clean.
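For reference, those .NET rules map onto JS bit operations for widths up to 32 (a sketch of my own, not Fable source; the 64-bit types need the custom Long representation instead):

// Illustration only: emulating .NET integer conversions with JS bit ops.
const toUInt32 = (x: number) => x >>> 0;         // -1 -> 4294967295
const toInt32  = (x: number) => x | 0;           // 4294967295 -> -1
const toUInt16 = (x: number) => x & 0xFFFF;      // -1 -> 65535 (char-like)
const toInt16  = (x: number) => (x << 16) >> 16; // 65535 -> -1
console.log(toUInt32(-1), toInt32(4294967295));  // 4294967295 -1
console.log(toUInt16(-1), toInt16(65535));       // 65535 -1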

Fable 2 gives answers that depend on BOTH the source and result types. Notable from REPL2:

uint64 -> int64 seems to go via uint.
int -> uint64: negative numbers are set to 0uL.

(-1 |> int32) = -4294967296 (correct)
(-1 |> uint32) = 4294967295u (correct - and this is the most important one!)
(4294967295u |> int32) = -1 (correct, also important)
(-1L |> uint64) = 18446744069414584320uL (correct)
(18446744069414584320uL |> int64) = -4294967296L (incorrect)
(-1 |> uint64) = 0uL (incorrect)

Conversions from char to int32 or uint32 go wrong in a way that is not easy to understand:

(-1 |> char |> uint32) = -4294967295u (incorrect)
(-1 |> char |> int32) = -4294967296 (incorrect)

Any conversion between int64/uint64 and another integer type is not standard JS, since the 64-bit types are custom, so I think we should do these like .NET?

char does not seem to exist in JS. Characters can be converted to Unicode code points, which I think are normally 16-bit unsigned, so the F# conversions to char should result in positive Number values. Maybe -1 is converted to char as an all-ones bit pattern, which then gets converted to Number as ±(2^N - 1) rather than the positive value it should be?

What I don't understand in Fable is: when are int32/uint32 held as a full Number, and when are they wrapped to 32 bits (as they will be after any bitwise op)?

@tomcl (Contributor, Author) commented Aug 22, 2018

Back to @alfonsogarciacaro's proposal re %x. I agree that it makes sense and is understandable if:

int32/uint32 is done via the JS .toString(16) (printing the absolute value converted to hex, with a - in front for negative numbers).

int64/uint64 is done via the .toString(16) function already written in Long.js, which seems to do things the same way as standard JS: hex with a - for negative numbers.

%x checks the type and dispatches to the 64-bit or the 32-bit version.

That is very simple, and would also make it easier to work out what is going on with all the other conversions.

:)

@xdaDaveShaw (Contributor)

I'm beginning to work on some tests for %x and will open a PR soon.

For the Docs around the numeric conversions @tomcl is talking about, would it make sense to open a separate issue?

@tomcl (Contributor, Author) commented Aug 22, 2018

Good idea; done: #1532

@alfonsogarciacaro (Member)

Just a quick update to let you know that the dev2.0 branch has become master now :)

@alfonsogarciacaro (Member)

@matthid We already try to do that in most cases, though it's tricky to find the right balance (e.g. it may not make sense to issue a warning every time you use a decimal to say it's being compiled to a JS number and so may lose precision in some situations, although we do that when you explicitly convert a float into a decimal). We're open to PRs for the places you think this can be improved :)

@xdaDaveShaw (Contributor)

I made some progress last night, but noticed some other issues with hex formatting (String.Format doesn't work on Long, for example). I'll continue with it tonight.

@alfonsogarciacaro (Member)

@xdaDaveShaw Please check this comment for an example of how to make formatting (toHex in this case) work with Long 👍

@xdaDaveShaw (Contributor)

I saw that and kept it in mind; however, I had yet to write a failing test that needed longToString to get the correct value.

I'll submit the PR tonight and we can see where it's at.

@alfonsogarciacaro (Member)

Let's close this, as it should be fixed by @xdaDaveShaw's PR #1535. Let's deal with the conversion issues in #1532. Thanks a lot to you all for your help!
