Add third party dtoa library #272
Comments
Added the Milo Yip DTOA library (emyg_dtoa), which is used to avoid issues where the standard sprintf() dtoa function changes output based on locale settings. It is also 40-50% faster than the standard dtoa for raw numeric data. If you do not wish to use this third party library you can compile libxlsxwriter without it by passing `USE_STANDARD_DOUBLE=1` to make. The USE_DOUBLE_FUNCTION build variable is no longer used. Imported source from https://github.com/miloyip/dtoa-benchmark Feature request #272
Great to see that you started to work on this feature. 😄 To make the emyg_dtoa code work with older C compilers too, I had to make some further adjustments (see emyg_dtoa.c). Maybe you could consider adopting those modifications.
I'm still testing it out. I need to add some double specific tests as well to see if there are any issues.
Do you mean including a version of stdint.h? Are there other changes as well? I don't plan to include a copy of stdint.h, so people with Windows compilers older than VS 2010 will either need to figure out a way of including it or use the standard dtoa formatting. I think that this will be a small subset of potential users.
Sure, I understand that very well.
Yes, at several places I had to move variable definitions to the top of a code block, because older compilers (not compatible with C99) complain otherwise.
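For readers unfamiliar with the C89 restriction mentioned here, a minimal sketch of the pattern follows. `sum_digits()` is a hypothetical function made up for illustration; it is not taken from emyg_dtoa.c. The point is only that every declaration sits at the top of its block, before the first statement, which is what pre-C99 compilers require.

```c
#include <stddef.h>

/* Hypothetical example, not code from emyg_dtoa.c. */
static unsigned sum_digits(const char *s)
{
    /* C89 style: all declarations come first in the block. A C99-only
     * form such as "for (size_t i = 0; ...)" is rejected by compilers
     * that do not support C99. */
    unsigned total = 0;
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] >= '0' && s[i] <= '9')
            total += (unsigned)(s[i] - '0');
    }
    return total;
}
```

Calling `sum_digits("1.25")` adds up the digits 1, 2 and 5 and skips the decimal point.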
Fair comment. However, quite a few people are still using VS 2010, even though it is a rather dated compiler, and VS 2010 for example does not support C99 (as far as I know, no VS compiler version does fully support C99). So, if you want to support VS 2010 you need to address its incompatibilities (like not having |
Why choose an outdated less performant version of Since 2018, most C++ compilers switched to the brand new Visual Studio incorporated this algorithm back in 2017. Release notes here.
I mentioned Ryu already in January in one of my comments to issue 64. It is true that Visual C++ 2017 adopted Ryu for Of course, it would be possible to replace the emyg_dtoa code by Ulf Adams' ryu C code. However, I found out that the conversion results differ, although only slightly. While emyg_dtoa tries to produce the shortest possible string, ryu always appends the exponent. Examples:
double x1 = 1.23456789012345678;
/* emyg_dtoa(x1) => "1.2345678901234568" */
/* ryu(x1) => "1.2345678901234567E0" */
double x2 = 0.61728394506172835;
/* emyg_dtoa(x2) => "0.6172839450617284" */
/* ryu(x2) => "6.172839450617283E-1" */
The example shows another effect: obviously the last significant digit is (at least sometimes) rounded differently by the two algorithms. Most likely both differences don't matter much in practice.
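One way to check whether such differences matter is to parse both strings back and compare the resulting doubles. The sketch below assumes the standard `strtod()` is an acceptable parser for this purpose; `same_double()` is a hypothetical helper and not part of emyg_dtoa, ryu or libxlsxwriter. Note that `strtod()` itself is locale dependent, so this only behaves as described in the default "C" locale.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if both decimal strings parse to the
 * same double, 0 otherwise. Plain and exponent notation are both
 * accepted by strtod(). */
static int same_double(const char *a, const char *b)
{
    return strtod(a, NULL) == strtod(b, NULL);
}
```

Feeding it the two outputs quoted for `x1` or `x2` would show whether the different last digit still maps to the same stored value. The result for those pairs is not stated here because it has not been verified.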
Interesting. And Excel seems to format the latter as |
In the GUI Excel displays at most 15 significant digits. However, internally (that is, for the representation in the file itself) up to 17 significant digits are used. In the GUI the decimal separator depends on the user's locale (or the user's settings). In the file always a point is used as the decimal separator. |
(I used a custom cell decimal format with 30 decimals. That didn't change the UI.) |
Such a custom cell format allows you to display a value like |
In the meantime I added Resulting file sizes are slightly bigger, on average 0.2 percent - that is, negligible. There was no measurable effect on runtime, but that was to be expected for the small test cases. IMHO replacing |
@utelle Can you push the RYU alternative to a branch of your repo so that I can test it?
How many? 😛 Will it have an impact on, let's say, 200 000 values?
@jmcnamara I'll do that later today. Since RYU adds an exponent field to all floating point numbers, the resulting Excel files differ from the ones that are used for comparison in the test cases. That is, all tests fail in that respect. Nevertheless, Excel can successfully open all generated files and from the user's perspective they are identical.
Good question. I have not done any performance tests yet.
Probably yes, but my guess is that it will be smaller than you may expect. In respect to speed the |
@jmcnamara I created branch ryu_test in my repository. On invoking |
Added the optional Milo Yip DTOA library (emyg_dtoa) to avoid issues where the standard sprintf() dtoa function changes output based on locale settings. It is also 40-50% faster than the standard dtoa for raw numeric data. If you wish to use this third party library you can compile libxlsxwriter with it by passing `USE_DTOA_LIBRARY=1` to make. The USE_DOUBLE_FUNCTION build variable is no longer used. Imported source from https://github.com/miloyip/dtoa-benchmark Feature request #272
@utelle I've dusted off this work again with the EMYG library on the dtoa branch and rebased it to main. Can you try it when/if you get a chance and let me know if you encounter any issues? It is an optional compilation so you will need to pass "USE_DTOA_LIBRARY=1" to If there are no issues I'll merge it into main and put it in the next release.
I tested the Lines 429 and 437 should be removed and the array index has to be adjusted in line 430 to WriteExponent(kk - 1, &buffer[2]); resp line 438 to WriteExponent(kk - 1, &buffer[0 + length + 2]); If you want an explicit plus sign in the exponent, you should modify function |
Thanks. I'll fix that.
@utelle I've pushed a fix, with a test case. I used a force push so you will need to get the latest code from the branch again.
The new version works now as expected.
This has been merged to main and released in libxlsxwriter version 1.1.1.
I have started to look at using a third party library to do dtoa (double to string) formatting. Currently this is on the dtoa branch.
This is in order to avoid locale issues with double sprintf() formatting (for example, getting "1,234" instead of the "1.234" required by Excel). For more background and a discussion of other workarounds see #64
The code is working on the dtoa branch with Mac/Linux/Windows but I'm still testing it. If you would like to test it then please let me know how you get on.
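To make the locale problem concrete, here is a minimal sketch of one possible workaround that stays with the standard library: format with `snprintf()` and then force the decimal separator back to a point. `format_double_c_locale()` is a hypothetical helper, not a libxlsxwriter function, and the `"%.16g"` format is an assumption chosen for illustration. It also needs a C99 `snprintf()`, which older compilers may lack.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libxlsxwriter: write value into buf
 * and make sure the decimal separator is '.', whatever the current
 * locale uses. */
static void format_double_c_locale(char *buf, size_t size, double value)
{
    const struct lconv *lc = localeconv();
    const char *sep = lc->decimal_point;   /* "," in many European locales */
    char *pos;

    snprintf(buf, size, "%.16g", value);

    /* Only a single-byte separator is handled in this sketch. */
    if (sep && sep[0] != '\0' && sep[0] != '.' && sep[1] == '\0') {
        pos = strchr(buf, sep[0]);
        if (pos)
            *pos = '.';
    }
}
```

A dedicated dtoa routine avoids this post-processing step entirely, since it never consults the locale in the first place.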