ENH: Support μs Greek small letter mu #58473

Closed · covracer wants to merge 1 commit



@covracer commented Apr 29, 2024


Ensure that both the micro sign and the Greek letter are accepted as aliases for microseconds, and prefer the Greek character in comments. In the UTF-8 test data in test_clipboard.py, the degree and registered trademark symbols already cover the Latin-1 Supplement block, so prefer the Greek character there too.

Recommended changes for when the RUF001 updates graduate from preview to stable (they can't be applied yet, because until then RUF100 would report the `noqa` directives as unused):

```diff
diff --git a/pandas/_libs/tslibs/timedeltas.pyi b/pandas/_libs/tslibs/timedeltas.pyi
index 9fcea5e32d..5cbf061859 100644
--- a/pandas/_libs/tslibs/timedeltas.pyi
+++ b/pandas/_libs/tslibs/timedeltas.pyi
@@ -55,7 +55,7 @@ UnitChoices: TypeAlias = Literal[
     "us",
     "microseconds",
     "microsecond",
-    "µs",  # 0x0b5 Latin Extended micro sign
+    "µs",  # noqa: RUF001 # 0x0b5 Latin Extended micro sign
     "μs",  # 0x3bc Greek small letter mu
     "micro",
     "micros",
diff --git a/pandas/tests/scalar/timedelta/test_constructors.py b/pandas/tests/scalar/timedelta/test_constructors.py
index 138546eb07..adeff7b77a 100644
--- a/pandas/tests/scalar/timedelta/test_constructors.py
+++ b/pandas/tests/scalar/timedelta/test_constructors.py
@@ -294,7 +294,7 @@ def test_construction():
     assert Timedelta("1 millisecond") == timedelta(milliseconds=1)
     assert Timedelta("1 us") == timedelta(microseconds=1)
     # 0x0b5 Latin Extended micro sign
-    assert Timedelta("1 µs") == timedelta(microseconds=1)
+    assert Timedelta("1 µs") == timedelta(microseconds=1)  # noqa: RUF001
     # 0x3bc Greek small letter mu
     assert Timedelta("1 μs") == timedelta(microseconds=1)
     assert Timedelta("1 micros") == timedelta(microseconds=1)
```
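To show why the `noqa` comments matter, here is a simplified stand-in for a confusable-character check. It is a hypothetical sketch, not Ruff's RUF001 implementation: the one-entry `CONFUSABLES` map, the `check_line` helper and the `noqa` regex are assumptions, and it does not distinguish strings from comments, which Ruff reports under separate codes.

```python
import re

# Hypothetical one-entry confusables map for illustration; Ruff ships a far
# larger table, and this is not its implementation.
CONFUSABLES = {
    "\u00b5": "\u03bc",  # MICRO SIGN -> GREEK SMALL LETTER MU
}

# Matches "# noqa: CODE" or "# noqa: CODE1, CODE2" anywhere on the line.
_NOQA = re.compile(r"#\s*noqa:\s*(?P<codes>[A-Z]+\d+(?:\s*,\s*[A-Z]+\d+)*)")


def check_line(line: str, code: str = "RUF001") -> list[tuple[int, str, str]]:
    """Return (column, found, suggested) for each confusable character,
    or an empty list if the line suppresses `code` with a noqa comment."""
    match = _NOQA.search(line)
    if match and code in re.split(r"\s*,\s*", match.group("codes")):
        return []
    return [
        (col, ch, CONFUSABLES[ch])
        for col, ch in enumerate(line)
        if ch in CONFUSABLES
    ]


before = '    "\u00b5s",  # 0x0b5 Latin Extended micro sign'
after = '    "\u00b5s",  # noqa: RUF001 # 0x0b5 Latin Extended micro sign'
print(check_line(before))  # [(5, 'µ', 'μ')]
print(check_line(after))   # []
```

A suppression only helps once the rule actually fires on that line; if it is added while the rule is silent, a checker for unused suppressions (RUF100 in Ruff) would flag it instead.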

Characters that look the same but have different code points (homographs) are problematic. The abbreviation for microseconds is best represented as μs, using the 0x3bc Greek small letter mu, instead of the 0xb5 micro sign from the Latin-1 Supplement block. This aligns with:

- [The International System of Units (SI) brochure](https://www.bipm.org/documents/20126/41483022/SI-Brochure-9-EN.pdf)
- NFKC normalization, which maps 0xb5 to 0x3bc, as applied to [Python identifiers](https://peps.python.org/pep-3131/) and [domain names](https://unicode.org/reports/tr36/)
- Section 2.5 Duplicated Characters of [Unicode Technical Report 25](https://www.unicode.org/reports/tr25/)
- The microfarads abbreviation in [pandas tests](https://github.com/pandas-dev/pandas/tree/2.2.x/pandas/tests/computation/test_eval.py#L1914)
- Ruff confusable mapping [updates](https://github.com/astral-sh/ruff/pull/4430/files) (currently in the "preview" stage)
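The normalization point in the list above can be checked with Python's standard `unicodedata` module. This is a short sketch using only the standard library; the constant names are illustrative.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # U+00B5, renders as µ
GREEK_MU = "\u03bc"    # U+03BC, renders as μ

# The two spellings look the same but compare unequal.
print(MICRO_SIGN + "s" == GREEK_MU + "s")  # False
print(unicodedata.name(MICRO_SIGN))        # MICRO SIGN
print(unicodedata.name(GREEK_MU))          # GREEK SMALL LETTER MU

# NFC keeps the micro sign; NFKC folds it onto the Greek letter.
print(unicodedata.normalize("NFC", MICRO_SIGN) == MICRO_SIGN)  # True
print(unicodedata.normalize("NFKC", MICRO_SIGN) == GREEK_MU)   # True

# Python applies NFKC to identifiers (PEP 3131), so a variable spelled
# with the micro sign is stored under the Greek-letter name.
namespace = {}
exec(MICRO_SIGN + " = 1", namespace)
print(GREEK_MU in namespace, MICRO_SIGN in namespace)  # True False
```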

Ensure that both the micro sign and the Greek letter are accepted as aliases for microseconds. Prefer the Greek character in comments and in the UTF-8 test data.
@mroeschke (Member)

Thanks for the PR, but we require a bit more discussion from the core team on the enhancement issue before moving forward with a PR, so I'm closing this for now. We can reopen it if there's support for the enhancement request.

@mroeschke closed this Apr 29, 2024
Linked issue: ENH: Support μs Greek small letter mu