New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-33731: Implement support for locale specific format (WIP) #8612

Open
wants to merge 4 commits into
base: master
from

Conversation

Projects
None yet
6 participants
@james-emerton
Copy link

james-emerton commented Aug 2, 2018

Adds support for 'l' and 'L' which will format a string as per 'f'
except that they will use locale specific grouping, separators, and
decimal point. In the case of 'L' the LC_MONETARY values will be used.

https://bugs.python.org/issue33731

Implement support for locale specific decimal format
Adds support for 'l' and 'L' which will format a string as per 'f'
except that they will use locale specific grouping, separators, and
decimal point. In the case of 'L' the LC_MONETARY values will be used.

@james-emerton james-emerton requested review from rhettinger and skrah as code owners Aug 2, 2018

@the-knights-who-say-ni

This comment has been minimized.

Copy link

the-knights-who-say-ni commented Aug 2, 2018

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

When your account is ready, please add a comment in this pull request
and a Python core developer will remove the CLA not signed label
to make the bot check again.

You can check yourself
to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

Improve Decimal format tests
Compare the output of the 'l' and 'L' formats against the output of
`locale.format_string()`. The locale data seems to differ between
platforms, so testing against literals is challenging. (Other format
tests are explicitly providing locale data, but that approach would fail
to properly test my modifications.)

Also added a test against literals for the en_US locale in the hopes
that it's consistent across platforms.
@james-emerton

This comment has been minimized.

Copy link

james-emerton commented Aug 4, 2018

I think I need some guidance on testing this correctly. I see that the existing format tests for the Decimal type are passing locale data explicitly via an undocumented parameter. Unfortunately, using this approach doesn't really test my changes, which were actually to libmpdec.

I've run tests locally on both MacOS and Windows, and they pass on my local machine, but the Windows CI build is failing. I haven't yet tested a Linux build locally, for which I'll need to spin up another VM.

Finally, since this requires changes to libmpdec, is it acceptable to commit those changes here or do they first need to be incorporated into the upstream repository?

@skrah

This comment has been minimized.

Copy link
Member

skrah commented Aug 4, 2018

_decimal has the undocumented parameter, see test_n_format.

If I approve the changes, the patch can include libmpdec. It seems that there's still some discussion in the issue though. An immediate observation is that I'd prefer 'm' for monetary -- There is at least one open issue for the 'm' parameter, too.

@james-emerton

This comment has been minimized.

Copy link

james-emerton commented Aug 4, 2018

I can certainly use the aforementioned parameter, but I feel that doing so doesn't really exercise the changes. This changes mpd_parse_format_str to treat the format as 'f' with the addition of locale data. Using the parameter to provide the locale data would test that we switch the format to 'f' but not that the correct locale data is being loaded. Maybe that's okay/the best we can do in this case?

The use of 'l' and 'L' was the suggestion of Eric Smith in his comment on bpo-34311. The intention being 'l' for locale and 'L' for monetary locale, as this format would not be exclusively for the monetary context. At any rate, I'm certainly open to changing the letters being used.

@ericvsmith

This comment has been minimized.

Copy link
Member

ericvsmith commented Aug 4, 2018

I don't feel strongly about l and L. m certainly works for me, too. Has any other language broken ground on this? Can we follow their example?

@james-emerton

This comment has been minimized.

Copy link

james-emerton commented Aug 4, 2018

I did a bit more looking around, and the Single UNIX Specification for printf provides a modifier that performs grouping. From http://man7.org/linux/man-pages/man3/printf.3.html

  '      For decimal conversion (i, d, u, f, F, g, G) the output is to
         be grouped with thousands' grouping characters if the locale
         information indicates any.  (See setlocale(3).)  Note that
         many versions of gcc(1) cannot parse this option and will
         issue a warning.  (SUSv2 did not include %'F, but SUSv3 added
         it.)

Thus, n should be equivalent to 'g and we should also support the modifier for f, o, x, d, and their uppercase equivalents. I think this is a better approach than introducing another type.

It appears that the C99 implementation provides no mechanism to use the values from LC_MONETARY in place of LC_NUMERIC. This distinction seems exceedingly rare in practice, and I'd personally be okay with leaving it out.

@skrah

This comment has been minimized.

Copy link
Member

skrah commented Aug 4, 2018

Currently we're using uppercase to mean "print an uppercase exponent". So:

'n' => regular_locale + 'g'
'N' => regular_locale + 'G'

'l' => regular_locale + 'f'
'L' => monetary_locale + 'f'?

Here I'd expect 'F', even though 'F' doesn't really do anything:

'L' => regular_locale + 'F' 

libmpdec actually makes use of the regular convention:

if (isupper((uchar)type)) {
    type = tolower((uchar)type);
    flags |= MPD_FMT_UPPER;
}

Perhaps we can use a modifier like $n, $l for use monetary?

james-emerton added some commits Aug 4, 2018

Reverting Decimal format changes
Upon further discussion and research it appears that we should be supporting grouping via the `'` modifier as per C99
bpo-33731: Add locale modifier to Decimal.__format__
This implements the `'` modifier in place of the thousands separator to enable locale specific grouping and decimal point.
@james-emerton

This comment has been minimized.

Copy link

james-emerton commented Aug 5, 2018

I've now implemented this (for just Decimal so far) by accepting ' in place of the , or _ characters. (I noticed while was in here that Decimal doesn't currently support _)

I also see that those other two modifiers are documented in PEP 378 and PEP 515. Should I be writing this up as a PEP as well?

@steelman

This comment has been minimized.

Copy link

steelman commented Jan 3, 2019

Interesting. Without knowing about this PR, I whipped up something similar (#11405). Clearly there is a demand for this feature.

@skrah

This comment has been minimized.

Copy link
Member

skrah commented Jan 4, 2019

@james-emerton Perhaps a mini-PEP that briefly lists the motivation and syntax alternatives (I still like $n) is a good way forward. Discussion is currently scattered among two GitHub issues and two bugs.python.org issues.

This is just my opinion, @ericvsmith is the format-language expert.

@skrah

This comment has been minimized.

Copy link
Member

skrah commented Jan 4, 2019

Also @steelman of course.

@steelman

This comment has been minimized.

Copy link

steelman commented Jan 4, 2019

I've sent an e-mail to python-ideas, however, I can't see it in the archive yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment