bpo-33731: Implement support for locale specific format (WIP) #8612

Closed


@james-emerton james-emerton commented Aug 2, 2018

Adds support for the 'l' and 'L' format types, which format a number as
per 'f' except that they use the locale-specific grouping, thousands
separator, and decimal point. In the case of 'L', the LC_MONETARY values
are used instead of the LC_NUMERIC ones.

https://bugs.python.org/issue33731
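
For readers unfamiliar with the LC_NUMERIC / LC_MONETARY split described above, here is a rough sketch using only the existing `locale` module. It does not use the proposed 'l' and 'L' codes, which exist only in this PR. The helper names `show_fixed` and `describe_locale` are made up for illustration.

```python
import locale

def show_fixed(value, places=2, monetary=False):
    """Format value like 'f', with separators taken from the current locale.

    monetary=False reads the LC_NUMERIC fields (what 'l' is described as doing);
    monetary=True reads the LC_MONETARY fields (what 'L' is described as doing).
    """
    return locale.format_string(
        f"%.{places}f", value, grouping=True, monetary=monetary
    )

def describe_locale():
    """Return the localeconv() fields that the two variants would draw on."""
    conv = locale.localeconv()
    keys = (
        "decimal_point", "thousands_sep", "grouping",               # LC_NUMERIC
        "mon_decimal_point", "mon_thousands_sep", "mon_grouping",   # LC_MONETARY
    )
    return {key: conv[key] for key in keys}
```

After `locale.setlocale(locale.LC_ALL, "")`, calling `show_fixed(1234567.891)` and `show_fixed(1234567.891, monetary=True)` shows whether the two categories differ in the user's locale. In many locales they produce the same text.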

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

When your account is ready, please add a comment in this pull request
and a Python core developer will remove the CLA not signed label
to make the bot check again.

You can check yourself
to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

Compare the output of the 'l' and 'L' formats against the output of
`locale.format_string()`. The locale data seems to differ between
platforms, so testing against literals is challenging. (Other format
tests are explicitly providing locale data, but that approach would fail
to properly test my modifications.)

Also added a test against literals for the en_US locale in the hopes
that it's consistent across platforms.
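
The comparison strategy in this commit message can be sketched as follows. Since 'l' and 'L' are not available outside this PR, the sketch substitutes the existing 'n' float format and compares it with `locale.format_string("%g", ...)`. The function name and the list of locale names are assumptions, and locales that are not installed are skipped.

```python
import locale

# Assumed locale names; which ones exist depends on the platform.
CANDIDATE_LOCALES = ["C", "en_US.UTF-8", "de_DE.UTF-8"]

def compare_n_with_format_string(value, names=CANDIDATE_LOCALES):
    """Return (locale name, format(value, 'n'), locale.format_string result)
    for every locale in names that can be activated."""
    saved = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    results = []
    try:
        for name in names:
            try:
                locale.setlocale(locale.LC_NUMERIC, name)
            except locale.Error:
                continue  # locale not installed on this machine
            got = format(value, "n")
            expected = locale.format_string("%g", value, grouping=True)
            results.append((name, got, expected))
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)  # always restore
    return results
```

Comparing two code paths that read the same platform locale data avoids hard-coding literals that differ between operating systems, at the cost of not catching an error shared by both paths.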
@james-emerton
Author

I think I need some guidance on testing this correctly. I see that the existing format tests for the Decimal type are passing locale data explicitly via an undocumented parameter. Unfortunately, using this approach doesn't really test my changes, which were actually to libmpdec.

I've run the tests locally on both macOS and Windows, and they pass, but the Windows CI build is failing. I haven't yet tested a Linux build locally; for that I'll need to spin up another VM.

Finally, since this requires changes to libmpdec, is it acceptable to commit those changes here or do they first need to be incorporated into the upstream repository?

@skrah
Contributor

skrah commented Aug 4, 2018

_decimal has the undocumented parameter, see test_n_format.

If I approve the changes, the patch can include libmpdec. It seems that there's still some discussion in the issue, though. An immediate observation is that I'd prefer 'm' for monetary; there is at least one open issue for the 'm' parameter, too.

@james-emerton
Author

james-emerton commented Aug 4, 2018

I can certainly use the aforementioned parameter, but I feel that doing so doesn't really exercise the changes. This changes mpd_parse_format_str to treat the format as 'f' with the addition of locale data. Using the parameter to provide the locale data would test that we switch the format to 'f' but not that the correct locale data is being loaded. Maybe that's okay/the best we can do in this case?

The use of 'l' and 'L' was the suggestion of Eric Smith in his comment on bpo-34311. The intention was 'l' for locale and 'L' for monetary locale, as this format would not be exclusively for the monetary context. At any rate, I'm certainly open to changing the letters being used.

@ericvsmith
Member

I don't feel strongly about l and L. m certainly works for me, too. Has any other language broken ground on this? Can we follow their example?

@james-emerton
Author

james-emerton commented Aug 4, 2018

I did a bit more looking around, and the Single UNIX Specification for printf provides a modifier that performs grouping. From http://man7.org/linux/man-pages/man3/printf.3.html

  '      For decimal conversion (i, d, u, f, F, g, G) the output is to
         be grouped with thousands' grouping characters if the locale
         information indicates any.  (See setlocale(3).)  Note that
         many versions of gcc(1) cannot parse this option and will
         issue a warning.  (SUSv2 did not include %'F, but SUSv3 added
         it.)

Thus, n should be equivalent to 'g and we should also support the modifier for f, o, x, d, and their uppercase equivalents. I think this is a better approach than introducing another type.

It appears that printf provides no mechanism to use the values from LC_MONETARY in place of LC_NUMERIC. This distinction seems exceedingly rare in practice, and I'd personally be okay with leaving it out.

@skrah
Contributor

skrah commented Aug 4, 2018

Currently we're using uppercase to mean "print an uppercase exponent". So:

'n' => regular_locale + 'g'
'N' => regular_locale + 'G'

'l' => regular_locale + 'f'
'L' => monetary_locale + 'f'?

Here I'd expect 'F', even though 'F' doesn't really do anything:

'L' => regular_locale + 'F' 

libmpdec actually makes use of the regular convention:

if (isupper((uchar)type)) {
    type = tolower((uchar)type);
    flags |= MPD_FMT_UPPER;
}

Perhaps we can use a modifier, as in $n or $l, to mean "use the monetary locale"?
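
The uppercase convention mentioned in this comment can be seen in the format types Python already has. A small sketch follows, using built-in float formatting rather than libmpdec. The helper name `upper_variant_matches` is made up for illustration.

```python
def upper_variant_matches(value, code):
    """True if the uppercase type code gives the lowercase output, uppercased."""
    return format(value, code).upper() == format(value, code.upper())

if __name__ == "__main__":
    samples = [1234.5, 1e20, float("inf"), float("nan")]
    for code in "efg":
        for value in samples:
            # For finite values 'f' and 'F' print the same text; the case only
            # shows up in the exponent marker and in inf/nan.
            print(code, format(value, code), format(value, code.upper()))
```

This is why an 'L' that switches to the monetary locale would be a departure: for every existing pair, the uppercase letter changes only the case of the output.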

Upon further discussion and research, it appears that we should be supporting grouping via the `'` modifier, as printf does under the Single UNIX Specification.
This implements the `'` modifier in place of the thousands separator to enable locale-specific grouping and decimal point.
@james-emerton
Author

I've now implemented this (for just Decimal so far) by accepting ' in place of the , or _ characters. (I noticed while I was in here that Decimal doesn't currently support _.)

I also see that those other two modifiers are documented in PEP 378 and PEP 515. Should I be writing this up as a PEP as well?
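
For comparison, the two existing grouping options referred to above (`,` from PEP 378 and `_` from PEP 515) behave as sketched below. The `'` option is only a proposal in this PR, so released interpreters are expected to reject it. The helper name `is_accepted` is made up for illustration.

```python
from decimal import Decimal

def is_accepted(spec, value=1234567):
    """Return True if format() accepts spec for value, False on ValueError."""
    try:
        format(value, spec)
    except ValueError:
        return False
    return True

if __name__ == "__main__":
    print(format(1234567, ","))                     # fixed comma grouping
    print(format(1234567, "_"))                     # fixed underscore grouping
    print(format(Decimal("1234567.891"), ",.2f"))   # Decimal with comma grouping
    print(is_accepted("'d"))                        # proposed syntax, not in released Python
```

Both existing options always group in threes with a fixed character, regardless of the locale; the proposal here differs in taking the separator and group sizes from the locale.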

@steelman

steelman commented Jan 3, 2019

Interesting. Without knowing about this PR, I whipped up something similar (#11405). Clearly there is a demand for this feature.

@skrah
Contributor

skrah commented Jan 4, 2019

@james-emerton Perhaps a mini-PEP that briefly lists the motivation and syntax alternatives (I still like $n) is a good way forward. Discussion is currently scattered among two GitHub issues and two bugs.python.org issues.

This is just my opinion, @ericvsmith is the format-language expert.

@skrah
Contributor

skrah commented Jan 4, 2019

Also @steelman of course.

@steelman

steelman commented Jan 4, 2019

I've sent an e-mail to python-ideas, however, I can't see it in the archive yet.

@james-emerton
Author

Last I was working on this (months ago!) I started but never finished drafting a PEP. I've finished that off and added a bit about the alternative suggestions I've seen.

python/peps#886

@skrah
Contributor

skrah commented Jan 30, 2019

@james-emerton Thanks, in the meantime Eric Smith has posted his preferred version here:

https://mail.python.org/pipermail/python-ideas/2019-January/054837.html

I now support that version. PEPs should probably be announced and discussed on python-ideas.

@skrah
Contributor

skrah commented Jan 30, 2019

I suggest continuing the discussion at https://bugs.python.org/issue35638 and on python-ideas, so I'm closing this one now (we can reopen it later as appropriate).
