Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Number to/from string API #6220

Closed
huonw opened this Issue May 3, 2013 · 10 comments

Comments

Projects
None yet
6 participants
@huonw
Copy link
Member

huonw commented May 3, 2013

This is an umbrella issue following on from/working with #4819. The number <-> string conversion needs work and thought.

For example, the current to_str version allocates, and the current from_str doesn't work with the compiler optimisations very well: none of the (for example) exponent parsing code is removed when the function is specialised for ints.

Another important thing to consider is how non-primitive numbers (e.g. bigint/rational) are to be supported/helped (including not at all, other than the normal to/from_str traits)

@huonw

This comment has been minimized.

Copy link
Member Author

huonw commented May 3, 2013

For comparison, glibc's scanf: floating point, and integers (these are entirely separate).

And printf: general number printing and specifically floating point (it actually uses (stack-allocated) GMP bignums). Notably the general number printing is one huge (500 line) macro that gets called twice.

@Aatch

This comment has been minimized.

Copy link
Contributor

Aatch commented May 3, 2013

Also, I have been working on a generic parser for printf-like format strings. The current grammar is this:

 Placeholder := Indicator Position? Flag* Width? Precision? Specifier
 Position := '{' '-'? [0-9]+ '}'
 Width := [1-9] | '[' ('-'? [0-9]+ | '*') ']'
 Precision := '.' ([1-9] | '[' ('-'? [0-9]+ | '*' ) ']')
 Flag := <From supplied set>
 Specifier := <From supplied set>

So thinks are specified roughly like: %{1} [2].9f, which is where the values is in argument 1 and the width is retrieved from argument 2.

@huonw

This comment has been minimized.

Copy link
Member Author

huonw commented May 4, 2013

A possibility for code sharing would be having a munch_number<Num: Integer>(&str) -> (Option<Num>, uint) function that parses a number at the start of the string, so from_str would do this then check that the whole string has been parsed.

This would allow float parsing to use the int parsing code to read subsections.

(This function might be actually useful for any from_str value.)

@Kimundi

This comment has been minimized.

Copy link
Member

Kimundi commented May 14, 2013

I started on writing specialised replacements for ints, uints and floats. Currently the new (u)int versions use a stack allocated vector and a callback function. For the float versions, I started looking into papers and the links here for an optimal algorithm. I will also look into making the number munching thing work.

@huonw

This comment has been minimized.

brendanzab added a commit to brendanzab/rust that referenced this issue Feb 19, 2014

Decouple integer formatting from std::num::strconv
This is working towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::{pad_integral, with_padding}` methods have also been refactored make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

brendanzab added a commit to brendanzab/rust that referenced this issue Feb 19, 2014

Decouple integer formatting from std::num::strconv
This is working towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::{pad_integral, with_padding}` methods have also been refactored make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

brendanzab added a commit to brendanzab/rust that referenced this issue Feb 19, 2014

Decouple integer formatting from std::num::strconv
This is working towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::{pad_integral, with_padding}` methods have also been refactored make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

brendanzab added a commit to brendanzab/rust that referenced this issue Feb 19, 2014

Decouple integer formatting from std::num::strconv
This is working towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::{pad_integral, with_padding}` methods have also been refactored make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

brendanzab added a commit to brendanzab/rust that referenced this issue Feb 19, 2014

Decouple integer formatting from std::num::strconv
This is working towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::{pad_integral, with_padding}` methods have also been refactored make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

brendanzab added a commit to brendanzab/rust that referenced this issue Feb 19, 2014

Decouple integer formatting from std::num::strconv
This is working towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::{pad_integral, with_padding}` methods have also been refactored make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

brendanzab added a commit to brendanzab/rust that referenced this issue Feb 21, 2014

Decouple integer formatting from std::num::strconv
This is working towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::{pad_integral, with_padding}` methods have also been refactored make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

brendanzab added a commit to brendanzab/rust that referenced this issue Feb 21, 2014

Decouple integer formatting from std::num::strconv
This works towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::pad_integral` method has also been refactored make it easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

The benchmarks have been standardised between std::num::strconv and std::num::fmt to make it easier to compare the performance of the different implementations.

Arbitrary radixes are now easier to use in format strings. For example:

~~~
assert_eq!(format!("{:04}", radix(3, 2)), ~"0011");
~~~

bors added a commit that referenced this issue Feb 22, 2014

auto merge of #12382 : bjz/rust/fmt-int, r=alexcrichton
This is PR is the beginning of a complete rewrite and ultimate removal of the `std::num::strconv` module (see #6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation more comprehensible .

The `Formatter::{pad_integral, with_padding}` methods have also been refactored make things easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

Arbitrary radixes are now easier to use in format strings. For example:

~~~rust
assert_eq!(format!("{:04}", radix(3, 2)), ~"0011");
~~~

The benchmarks have been standardised between `std::num::strconv` and `std::num::fmt` to make it easier to compare the performance of the different implementations.

~~~
 type | radix | std::num::strconv      | std::num::fmt
======|=======|========================|======================
 int  | bin   | 1748 ns/iter (+/- 150) | 321 ns/iter (+/- 25)
 int  | oct   |  706 ns/iter (+/- 53)  | 179 ns/iter (+/- 22)
 int  | dec   |  640 ns/iter (+/- 59)  | 207 ns/iter (+/- 10)
 int  | hex   |  637 ns/iter (+/- 77)  | 205 ns/iter (+/- 19)
 int  | 36    |  446 ns/iter (+/- 30)  | 309 ns/iter (+/- 20)
------|-------|------------------------|----------------------
 uint | bin   | 1724 ns/iter (+/- 159) | 322 ns/iter (+/- 13)
 uint | oct   |  663 ns/iter (+/- 25)  | 175 ns/iter (+/- 7)
 uint | dec   |  613 ns/iter (+/- 30)  | 186 ns/iter (+/- 6)
 uint | hex   |  519 ns/iter (+/- 44)  | 207 ns/iter (+/- 20)
 uint | 36    |  418 ns/iter (+/- 16)  | 308 ns/iter (+/- 32)
~~~

jxs added a commit to jxs/rust that referenced this issue Apr 2, 2014

Decouple integer formatting from std::num::strconv
This works towards a complete rewrite and ultimate removal of the `std::num::strconv` module (see rust-lang#6220), and the removal of the `ToStrRadix` trait in favour of using the `std::fmt` functionality directly. This should make for a cleaner API, encourage less allocation, and make the implementation far more comprehensible.

The `Formatter::pad_integral` method has also been refactored make it easier to understand.

The formatting tests for integers have been moved out of `run-pass/ifmt.rs` in order to provide more immediate feedback when building using `make check-stage2-std NO_REBUILD=1`.

The benchmarks have been standardised between std::num::strconv and std::num::fmt to make it easier to compare the performance of the different implementations.

Arbitrary radixes are now easier to use in format strings. For example:

~~~
assert_eq!(format!("{:04}", radix(3, 2)), ~"0011");
~~~
@nrc

This comment has been minimized.

Copy link
Member

nrc commented May 4, 2014

I filed a vaguely related RFC for adding scanf-like functionality to Rust - rust-lang/rfcs#67

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Aug 12, 2014

cc me

@gsingh93

This comment has been minimized.

Copy link
Contributor

gsingh93 commented Jan 20, 2015

Here's a library that's doing something like this: https://github.com/mahkoh/scan

@aturon

This comment has been minimized.

Copy link
Member

aturon commented Feb 16, 2015

@huonw Is there anything actionable here right now, or should this perhaps move to the RFCs repo?

@lifthrasiir lifthrasiir referenced this issue Apr 19, 2015

Merged

New floating-to-decimal formatting routine #24612

1 of 3 tasks complete

bors added a commit that referenced this issue May 9, 2015

Auto merge of #24612 - lifthrasiir:flt2dec, r=pnkfelix
This is a direct port of my prior work on the float formatting. The detailed description is available [here](https://github.com/lifthrasiir/rust-strconv#flt2dec). In brief,

* This adds a new hidden module `core::num::flt2dec` for testing from `libcoretest`. Why is it in `core::num` instead of `core::fmt`? Because I envision that the table used by `flt2dec` is directly applicable to `dec2flt` (cf. #24557) as well, which exceeds the realm of "formatting".
* This contains both Dragon4 algorithm (exact, complete but slow) and Grisu3 algorithm (exact, fast but incomplete).
* The code is accompanied with a large amount of self-tests and some exhaustive tests. In particular, `libcoretest` gets a new dependency on `librand`. For the external interface it relies on the existing test suite.
* It is known that, in the best case, the entire formatting code has about 30 KBs of binary overhead (judged from strconv experiments). Not too bad but there might be a potential room for improvements.

This is rather large code. I did my best to comment and annotate the code, but you have been warned.

For the maximal availability the original code was licensed in CC0, but I've also dual-licensed it in MIT/Apache as well so there should be no licensing concern.

This is [breaking-change] as it changes the float output slightly (and it also affects the casing of `inf` and `nan`). I hope this is not a big deal though :)

Fixes #7030, #18038 and #24556. Also related to #6220 and #20870.

## Known Issues

- [x] I've yet to finish `make check-stage1`. It does pass main test suites including `run-pass` but there might be some unknown edges on the doctests.
- [ ] Figure out how this PR affects rustc.
- [ ] Determine which internal routine is mapped to the formatting specifier. Depending on the decision, some internal routine can be safely removed (for instance, currently `to_shortest_str` is unused).

bors added a commit that referenced this issue May 9, 2015

Auto merge of #24612 - lifthrasiir:flt2dec, r=pnkfelix
This is a direct port of my prior work on the float formatting. The detailed description is available [here](https://github.com/lifthrasiir/rust-strconv#flt2dec). In brief,

* This adds a new hidden module `core::num::flt2dec` for testing from `libcoretest`. Why is it in `core::num` instead of `core::fmt`? Because I envision that the table used by `flt2dec` is directly applicable to `dec2flt` (cf. #24557) as well, which exceeds the realm of "formatting".
* This contains both Dragon4 algorithm (exact, complete but slow) and Grisu3 algorithm (exact, fast but incomplete).
* The code is accompanied with a large amount of self-tests and some exhaustive tests. In particular, `libcoretest` gets a new dependency on `librand`. For the external interface it relies on the existing test suite.
* It is known that, in the best case, the entire formatting code has about 30 KBs of binary overhead (judged from strconv experiments). Not too bad but there might be a potential room for improvements.

This is rather large code. I did my best to comment and annotate the code, but you have been warned.

For the maximal availability the original code was licensed in CC0, but I've also dual-licensed it in MIT/Apache as well so there should be no licensing concern.

This is [breaking-change] as it changes the float output slightly (and it also affects the casing of `inf` and `nan`). I hope this is not a big deal though :)

Fixes #7030, #18038 and #24556. Also related to #6220 and #20870.

## Known Issues

- [x] I've yet to finish `make check-stage1`. It does pass main test suites including `run-pass` but there might be some unknown edges on the doctests.
- [ ] Figure out how this PR affects rustc.
- [ ] Determine which internal routine is mapped to the formatting specifier. Depending on the decision, some internal routine can be safely removed (for instance, currently `to_shortest_str` is unused).
@huonw

This comment has been minimized.

Copy link
Member Author

huonw commented Jan 5, 2016

I think this is basically solved/irrelevant, e.g. #24612 & #30175 makes us handle floats right, and the new formatting infrastructure allows avoiding allocations (i.e. we have more than just the old to_str). I think other aspects are better handled elsewhere (e.g. RFC repo, or an external crate).

@huonw huonw closed this Jan 5, 2016

nivkner added a commit to nivkner/rust that referenced this issue Sep 30, 2017

address some `FIXME`s whose associated issues were marked as closed
remove FIXME(rust-lang#13101) since `assert_receiver_is_total_eq` stays.
remove FIXME(rust-lang#19649) now that stability markers render.
remove FIXME(rust-lang#13642) now the benchmarks were moved.
remove FIXME(rust-lang#6220) now that floating points can be formatted.
remove FIXME(rust-lang#18248) and write tests for `Rc<str>` and `Rc<[u8]>`
remove reference to irelevent issues in FIXME(rust-lang#1697, rust-lang#2178...)
update FIXME(rust-lang#5516) to point to getopts issue 7
update FIXME(rust-lang#7771) to point to RFC 628
update FIXME(rust-lang#19839) to point to issue 26925

nivkner added a commit to nivkner/rust that referenced this issue Sep 30, 2017

address some `FIXME`s whose associated issues were marked as closed
remove FIXME(rust-lang#13101) since `assert_receiver_is_total_eq` stays.
remove FIXME(rust-lang#19649) now that stability markers render.
remove FIXME(rust-lang#13642) now the benchmarks were moved.
remove FIXME(rust-lang#6220) now that floating points can be formatted.
remove FIXME(rust-lang#18248) and write tests for `Rc<str>` and `Rc<[u8]>`
remove reference to irelevent issues in FIXME(rust-lang#1697, rust-lang#2178...)
update FIXME(rust-lang#5516) to point to getopts issue 7
update FIXME(rust-lang#7771) to point to RFC 628
update FIXME(rust-lang#19839) to point to issue 26925

nivkner added a commit to nivkner/rust that referenced this issue Sep 30, 2017

address some `FIXME`s whose associated issues were marked as closed
remove FIXME(rust-lang#13101) since `assert_receiver_is_total_eq` stays.
remove FIXME(rust-lang#19649) now that stability markers render.
remove FIXME(rust-lang#13642) now the benchmarks were moved.
remove FIXME(rust-lang#6220) now that floating points can be formatted.
remove FIXME(rust-lang#18248) and write tests for `Rc<str>` and `Rc<[u8]>`
remove reference to irelevent issues in FIXME(rust-lang#1697, rust-lang#2178...)
update FIXME(rust-lang#5516) to point to getopts issue 7
update FIXME(rust-lang#7771) to point to RFC 628
update FIXME(rust-lang#19839) to point to issue 26925
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.