New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve String.toInt and String.toFloat #818

Merged
merged 6 commits into from Jan 23, 2017

Conversation

Projects
None yet
7 participants
@evancz
Member

evancz commented Jan 20, 2017

With some parser work I’m doing, I wanted to expand the set of numbers that Elm can easily turn into Int and Float values.

Changes

  • String.toFloat accepts scientific notation like 1e+40
  • String.toInt accepts hex like 0xBEEF
  • Added a bunch of tests for these things

evancz added some commits Jan 20, 2017

Redo String.toInt and String.toFloat
Two main changes:

  - `String.toFloat` accepts scientific notation like `1e+40`
  - `String.toInt` accepts hex like `0xBEEF`
@process-bot

This comment has been minimized.

Show comment
Hide comment
@process-bot

process-bot Jan 20, 2017

Thanks for the pull request! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

process-bot commented Jan 20, 2017

Thanks for the pull request! Make sure it satisfies this checklist. My human colleagues will appreciate it!

Here is what to expect next, and if anyone wants to comment, keep these things in mind.

Show outdated Hide outdated src/Native/String.js
{
if (len === 1)
// must be hex
if (s[1] !== 'x')

This comment has been minimized.

@brian-carroll

brian-carroll Jan 20, 2017

If this is a number of minutes that I've extracted from a time-formatted string, like "09", it will fail. Is that desired behaviour?

@brian-carroll

brian-carroll Jan 20, 2017

If this is a number of minutes that I've extracted from a time-formatted string, like "09", it will fail. Is that desired behaviour?

@mbylstra

This comment has been minimized.

Show comment
Hide comment
@mbylstra

mbylstra Jan 20, 2017

Any chance you could also accept binary literals? eg 0b1010101 or 0B1010101. These are really helpful if you have to deal with parsing or generating binary file formats. Eg: If I just want the last 5 bits of a byte:

Bitwise.and myByte (String.toInt "0b11111")

With hex it's

Bitwise.and myByte 0x1F

which is far less readable. Parsing binary formats is really hard - this makes it a little easier!

mbylstra commented Jan 20, 2017

Any chance you could also accept binary literals? eg 0b1010101 or 0B1010101. These are really helpful if you have to deal with parsing or generating binary file formats. Eg: If I just want the last 5 bits of a byte:

Bitwise.and myByte (String.toInt "0b11111")

With hex it's

Bitwise.and myByte 0x1F

which is far less readable. Parsing binary formats is really hard - this makes it a little easier!

, goodFloat "6.022e+23" 6.022e23
, badFloat "6.022e"
, badFloat "6.022n"
, badFloat "6.022.31"

This comment has been minimized.

@eeue56

eeue56 Jan 21, 2017

Contributor

🎨 this is a good place for having fuzz tests, e.g

fuzz int "All ints are correctly converted to a string and back" (\int ->
  Expect.equal (Ok int) (String.toInt (toString int))
@eeue56

eeue56 Jan 21, 2017

Contributor

🎨 this is a good place for having fuzz tests, e.g

fuzz int "All ints are correctly converted to a string and back" (\int ->
  Expect.equal (Ok int) (String.toInt (toString int))
Show outdated Hide outdated src/Native/String.js
}
return _elm_lang$core$Result$Ok(parseFloat(s));
var n = +s;
return n === n ? _elm_lang$core$Result$Ok(n) : floatErr(s);

This comment has been minimized.

@eeue56

eeue56 Jan 21, 2017

Contributor

💭 Why use this over isNaN? is it meant to be faster? If so, a comment might be worthwhile here since it's not obvious

@eeue56

eeue56 Jan 21, 2017

Contributor

💭 Why use this over isNaN? is it meant to be faster? If so, a comment might be worthwhile here since it's not obvious

Show outdated Hide outdated src/Native/String.js
}
var dotCount = 0;
for (var i = start; i < len; ++i)
if (s.length === 0 || /[\sxbo]/.test(s))

This comment has been minimized.

@eeue56

eeue56 Jan 21, 2017

Contributor

🎨 might be worth having a comment stating what the regex test does here (as it's not obvious unless you think about it/are familiar with non-decimal number strings)

@eeue56

eeue56 Jan 21, 2017

Contributor

🎨 might be worth having a comment stating what the regex test does here (as it's not obvious unless you think about it/are familiar with non-decimal number strings)

This comment has been minimized.

@crazymykl

crazymykl Jan 23, 2017

Also, many languages (e. g. JavaScript, Ruby) allow capital letters for indicating base, 0XF and similar.

@crazymykl

crazymykl Jan 23, 2017

Also, many languages (e. g. JavaScript, Ruby) allow capital letters for indicating base, 0XF and similar.

@justinmimbs

This comment has been minimized.

Show comment
Hide comment
@justinmimbs

justinmimbs Jan 21, 2017

The toInt function could be simplified with regexes. For example, the version below performs the same checks (but allows strings with leading zeros), and is a bit faster in some brief timing tests.

I think @brian-carroll brings up a good point. I would want "09" to parse without error. 👍

function toInt(s)
{
  // decimal
  if (/^[+-]?\d+$/.test(s))
  {
    return _elm_lang$core$Result$Ok(parseInt(s, 10));
  }

  // hex
  if (/^0[Xx][\dA-Fa-f]+$/.test(s))
  {
    return _elm_lang$core$Result$Ok(parseInt(s, 16));
  }

  return intErr(s);
}

(Edited hex regex to allow "0X" prefix, as pointed out by @crazymykl.)

justinmimbs commented Jan 21, 2017

The toInt function could be simplified with regexes. For example, the version below performs the same checks (but allows strings with leading zeros), and is a bit faster in some brief timing tests.

I think @brian-carroll brings up a good point. I would want "09" to parse without error. 👍

function toInt(s)
{
  // decimal
  if (/^[+-]?\d+$/.test(s))
  {
    return _elm_lang$core$Result$Ok(parseInt(s, 10));
  }

  // hex
  if (/^0[Xx][\dA-Fa-f]+$/.test(s))
  {
    return _elm_lang$core$Result$Ok(parseInt(s, 16));
  }

  return intErr(s);
}

(Edited hex regex to allow "0X" prefix, as pointed out by @crazymykl.)

@mbylstra

This comment has been minimized.

Show comment
Hide comment
@mbylstra

mbylstra Jan 21, 2017

Something to keep in mind with parsing numbers like 09 is that a leading zero is often used as syntax for Octal representations. I'm not saying 09 should be parsed as an Octal number, it's just something to consider.

mbylstra commented Jan 21, 2017

Something to keep in mind with parsing numbers like 09 is that a leading zero is often used as syntax for Octal representations. I'm not saying 09 should be parsed as an Octal number, it's just something to consider.

evancz added some commits Jan 23, 2017

Be permissive with leading zeros in toInt
(This is primarily so that the new changes are fully backwards
compatible, not because I think this is ideal. But this way it can come
out as a patch release.)
@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Jan 23, 2017

Member

The alternate toInt implementation is not faster by my tests. The results reverse by swapping the order of the tests, so I suspect this has more to do with "VM warm up" than the actual details.

About parsing 0b1111, the code you write does not type check. toInt produces a Result. It'd be better to just allow binary digit literals, but that's a separate discussion that I do not want to have right now.

About leading zeros, I only know of JS making that octal, and it's getting phased out. I would prefer not to allow "too many" zeros at the front, but to be backwards compatible and release this as a core patch, I need to keep allowing it for now.

I've responded to other things with some code changes.

Member

evancz commented Jan 23, 2017

The alternate toInt implementation is not faster by my tests. The results reverse by swapping the order of the tests, so I suspect this has more to do with "VM warm up" than the actual details.

About parsing 0b1111, the code you write does not type check. toInt produces a Result. It'd be better to just allow binary digit literals, but that's a separate discussion that I do not want to have right now.

About leading zeros, I only know of JS making that octal, and it's getting phased out. I would prefer not to allow "too many" zeros at the front, but to be backwards compatible and release this as a core patch, I need to keep allowing it for now.

I've responded to other things with some code changes.

@evancz evancz merged commit f2c29f5 into master Jan 23, 2017

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@evancz

This comment has been minimized.

Show comment
Hide comment
@evancz

evancz Jan 23, 2017

Member

Thanks for the reviews everyone!

Member

evancz commented Jan 23, 2017

Thanks for the reviews everyone!

@evancz evancz deleted the string-to-number branch Mar 7, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment