Datetime field names use different normalization than calendar #22342

Open
TomAugspurger opened this issue Aug 14, 2018 · 2 comments
Labels
Bug Needs Discussion Requires discussion from core team before further action Timeseries Unicode Unicode strings

Comments

@TomAugspurger
Contributor

TomAugspurger commented Aug 14, 2018

At least under some conditions?

>>> import locale
>>> import calendar
>>> import unicodedata
>>> from pandas._libs.tslibs import fields
>>> locale_name = "crh_UA.UTF-8"
>>> x = fields.get_locale_names('f_month', locale_name)[6].capitalize()
>>> y = calendar.month_name[6].capitalize()

>>> x == y
# False

>>> unicodedata.normalize("NFD", x) == unicodedata.normalize("NFD", y)
# True

This manifests as failures in #21814 (comment).
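To make the mismatch easier to see, here is a minimal sketch that needs neither pandas nor the `crh_UA.UTF-8` locale. The two strings are hypothetical stand-ins for `x` and `y` (they are not the actual month names), and `describe` is a helper made up for this illustration. It assumes the difference is composed versus decomposed code points, which is consistent with the NFD comparison above succeeding.

```python
import unicodedata

# Hypothetical stand-ins for x and y: the same text in two Unicode forms.
composed = "\u00e9t\u00e9"        # precomposed "e with acute" (NFC form)
decomposed = "e\u0301te\u0301"    # "e" followed by a combining acute (NFD form)


def describe(s):
    """Return the Unicode name of every code point in s."""
    return [unicodedata.name(ch) for ch in s]


# The raw strings differ even though they render identically.
print(composed == decomposed)

# Listing the code points shows where they differ.
print(describe(composed))
print(describe(decomposed))

# After normalizing both sides to the same form, they compare equal.
for form in ("NFC", "NFD"):
    same = unicodedata.normalize(form, composed) == unicodedata.normalize(form, decomposed)
    print(form, same)
```

Running `describe` on the real `x` and `y` should show which side is composed and which is decomposed.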

@TomAugspurger
Contributor Author

So two questions:

  1. Do we care enough to change this?
  2. Do we make any guarantees about which normalization we use, and whether it matches Python's?
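If the answer to the second question is "no guarantee", one option for callers and tests is to normalize both sides before comparing. The following is only a sketch of that idea: `normalize_names` and `same_names` are hypothetical helpers, not pandas API, and the choice of NFC as the default form is an assumption rather than anything decided in this issue.

```python
import unicodedata


def normalize_names(names, form="NFC"):
    """Return the names with every string normalized to one Unicode form."""
    return [unicodedata.normalize(form, name) for name in names]


def same_names(left, right, form="NFC"):
    """Compare two sequences of names, ignoring normalization differences."""
    return normalize_names(left, form) == normalize_names(right, form)


# Hypothetical inputs: one composed and one decomposed spelling.
print(same_names(["\u00e9t\u00e9"], ["e\u0301te\u0301"]))
```

This keeps the comparison independent of whichever form pandas or `calendar` happens to return, at the cost of hiding the inconsistency rather than fixing it.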

@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Aug 14, 2018
@TomAugspurger
Contributor Author

TomAugspurger commented Aug 14, 2018

Once #21814 is in, this should be reproducible with

docker run -it --name=py36_locale continuumio/miniconda:latest /bin/bash 
git clone https://github.com/pandas-dev/pandas
cd pandas
sh ci/install_circle.sh

@mroeschke mroeschke added Bug Needs Discussion Requires discussion from core team before further action labels Apr 1, 2020
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022