Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make string_of_float and float_of_string locale-independent #1185

Merged
merged 6 commits into from Mar 28, 2018

Conversation

@ygrek
Copy link
Contributor

commented May 28, 2017

Problem:

see https://caml.inria.fr/mantis/view.php?id=6701 and ocaml-community/yojson#13

Solution:

  1. Use uselocale around calls to vsnprintf.
  2. Use strtod_l if available. If not available - use uselocale around calls to strtod.

Possible improvement - use vsnprintf_l if available.

Test output for 4.04.1 :

locale from environment
  print 1.23 : 1.23
  parse "12345.6789" : 12345.6789
  roundtrip 1.23 : 1.23
locale nl_NL
  print 1.23 : 1,23
  parse "12345.6789" : exn Failure("float_of_string")
  roundtrip 1.23 : 1,23
locale POSIX
  print 1.23 : 1.23
  parse "12345.6789" : 12345.6789
  roundtrip 1.23 : 1.23

With this patch :

$ ./q.locale 
locale from environment
  print 1.23 : 1.23
  parse "12345.6789" : 12345.6789
  roundtrip 1.23 : 1.23
locale nl_NL
  print 1.23 : 1.23
  parse "12345.6789" : 12345.6789
  roundtrip 1.23 : 1.23
locale POSIX
  print 1.23 : 1.23
  parse "12345.6789" : 12345.6789
  roundtrip 1.23 : 1.23

Speed overhead of uselocale around vsnprintf (x86_64, glibc).
Test program :

let () =
  for _ = 1 to 1_000_000 do
    ignore @@ Printf.sprintf "%f %f" 123456.789 (-8888.8888)
  done
$ sudo perf stat ./speed.4.04.1 |& grep instructions
     9,684,761,712      instructions              #    1.72  insn per cycle         
$ sudo perf stat ./speed.locale |& grep instructions
     9,780,738,205      instructions              #    1.73  insn per cycle         

i.e. each call to vsnprintf (~4800 instructions) bears additional cost of ~48 insns for two uselocale calls, i.e. 1% slowdown for every %f conversion in Printf.printf.

@gasche

This comment has been minimized.

Copy link
Member

commented May 28, 2017

The code looks sensible to me, and the benchmark is reassuring (I wondered about the performance impact), but the Windows CI seems to be failing.

@damiendoligez
Copy link
Member

left a comment

Reviewed and made a few comments.

/*
OCaml runtime itself doesn't call setlocale, i.e. it is using
standard "C" locale by default, but it is possible that
thirt-party code loaded into process does.

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez May 29, 2017

Member

typo: "third-party"


#endif

locale_t caml_locale = 0;

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez May 29, 2017

Member

Isn't locale_t a pointer type? I'd much rather make that explicit and use NULL instead of 0 (here and in a few other places).

This comment has been minimized.

Copy link
@ygrek

ygrek May 29, 2017

Author Contributor

man page uses (locale_t)0 - I will use that too

caml_locale = _create_locale(LC_NUMERIC, "C");
#else
caml_locale = duplocale(LC_GLOBAL_LOCALE);
caml_locale = newlocale(LC_NUMERIC_MASK,"C",caml_locale);

This comment has been minimized.

Copy link
@damiendoligez

damiendoligez May 29, 2017

Member

This will leak memory on OSX (and presumably the BSDs), whose duplocale/newlocale API is much more reasonable than Linux's.

This comment has been minimized.

Copy link
@ygrek

ygrek May 29, 2017

Author Contributor

Good catch!
I didn't know these api is such a mess. Funny thing is BSD claims it conforms to POSIX 2008 too, go figure :]
I believe I can get away with dropping duplocale usage and doing newlocale(LC_NUMERIC_MASK,"C",(locale_t)0)

@ygrek ygrek force-pushed the ygrek:uselocale branch from c4d64c8 to 8dd2a3f May 29, 2017

@ygrek ygrek force-pushed the ygrek:uselocale branch from 8dd2a3f to 6d2778f May 29, 2017

@ygrek

This comment has been minimized.

Copy link
Contributor Author

commented May 29, 2017

AFAIU mingw port links against old msvcrt which doesn't have _create_locale, but thankfully has per-thread setlocale

@ygrek ygrek force-pushed the ygrek:uselocale branch from f550707 to 747f969 May 29, 2017

@ygrek ygrek force-pushed the ygrek:uselocale branch from 747f969 to 6658798 May 29, 2017

@ygrek

This comment has been minimized.

Copy link
Contributor Author

commented Jun 23, 2017

ftr my work here is done at this point, waiting for review/decision

@xavierleroy

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2017

I think we're at a fork in the road: either we go the way of this PR, with all the nonsensical APIs offered by various C standard libraries, or we find and reuse good code that performs float<->string conversions the way we want them. I'm still punting on this decision. Opinions welcome.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2017

It would generally make it easier to embed the runtime system in exotic settings (e.g. for unikernels), the hard libc requirement of the ocaml runtime system are very minimal (string.h, snprintf and malloc, the rest can be stubbed by returning ENOSYS) and that's one of the painful one to provide (I and others have been using this code for that).

There was a bit of discussion in MPR7218 with links to recent floating point conversion routines implementations. Also following the related issues shows that floating point conversion issue seem to show up quite recurrently.

@avsm

This comment has been minimized.

Copy link
Member

commented Sep 11, 2017

I'd like to second the desire to not depend on the OS runtime for this if at all possible. It's been responsible for a surprising amount of the complexity of embedding OCaml in non-libc environments. The dtoa implementation that @dbuenzli linked above has worked pretty well, and is hopefully more readable when unifdef'ed.

@gasche

This comment has been minimized.

Copy link
Member

commented Sep 11, 2017

I like the gained consistency/portability, but string_of_float can be a performance-sensitive operation for many script-like data-moving programs so one should also ensure that there is no notable performance regression compared to the libc we care about.

@dbuenzli

This comment has been minimized.

Copy link
Contributor

commented Sep 11, 2017

Regarding performance it seems both the V8 and SpiderMonkey JavaScript engines replaced Gay's code linked above with this code for improving performance, given the setting in which these operate it should also give a good confidence w.r.t. correctness, but the problem is that it's written in C++. Someone here seems to have benchmarked a few dtoa implementations.

@ygrek

This comment has been minimized.

Copy link
Contributor Author

commented Sep 12, 2017

FTR I would also prefer embedding code given that system api is so braindead and inconsistent (but I will not be working on it - too much work)

changes approved but merging is now conditional on the policy decision of using the system library or embedding the conversion function

@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 25, 2017

@alainfrisch alainfrisch added the stdlib label Feb 28, 2018

@alainfrisch

This comment has been minimized.

Copy link
Contributor

commented Feb 28, 2018

I would also like to embed battle-tested implementation of string<->float conversion functions, but this won't be completely straightforward (need to choose which implementation, discuss if the license is ok, benchmark, etc) and nobody has volunteered to drive this project. So I'm in favor of already accepting this PR, which fixes an actual problem and has already been reviewed.
@damiendoligez : do you agree?

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Mar 1, 2018

@alainfrisch You're right. Let's take this branch of the fork and we can later backtrack and embed the functions when we find good implementations with an acceptable license.

@damiendoligez
Copy link
Member

left a comment

Still looks good.

@damiendoligez

This comment has been minimized.

Copy link
Member

commented Mar 1, 2018

Formally approved. I'll merge in a few days if there are no objections.

@damiendoligez damiendoligez merged commit b0181bf into ocaml:trunk Mar 28, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@garrigue

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2018

On MacOs, I get errors and cannot compile float.c.
The compiler says that locale_t and use locale are undefined.

@objmagic

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2018

+1 on Jacque's comment.

Just merged trunk and float.c cannot compile now. I am on Mac OS X 10.13.3.

gasche added a commit that referenced this pull request Mar 29, 2018

Revert "make string_of_float and float_of_string locale-independent (#…
…1185)"

This reverts commit b0181bf.

The revert comes from the fact that it breaks the build on OSX, see

  #1185 (comment)
@gasche

This comment has been minimized.

Copy link
Member

commented Mar 29, 2018

I just reverted the PR ( revert commit 682aceb ) so that everyone can keep using trunk. I will re-submit a PR so that we don't forget that it has to be re-merged.

gasche added a commit to gasche/ocaml that referenced this pull request Mar 29, 2018

make string_of_float and float_of_string locale-independent (ocaml#1185)
notes:

* ifdef fest for MSVC
https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale

* avoid duplocale due to different semantics on BSD
https://www.freebsd.org/cgi/man.cgi?query=newlocale&sektion=3

* mingw links against old msvcrt and doesn't have _create_locale

* this is the second merge of this change, which broke OSX build
  on the first try, see
  ocaml#1185 (comment)
@gasche

This comment has been minimized.

Copy link
Member

commented Mar 29, 2018

@garrigue, @objmagic: just to be sure, you did re-run ./configure before trying to build from the new trunk?

@objmagic

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2018

@gasche Yes. make distclean -> ./configure -> make world.opt

@shindere

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2018

@shindere

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2018

@shindere

This comment has been minimized.

Copy link
Contributor

commented Mar 29, 2018

@gasche

This comment has been minimized.

Copy link
Member

commented Jun 1, 2018

This is now re-merged as #1803.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.