

@parthkandharkar

This PR fixes incorrect lowercasing of Excel number-format string literals during CSS parsing and atomization. Pandas previously lowercased all CSS values, which corrupted custom number formats (e.g., "M" → "m").

The fix treats number-format as a special case in both parse() and atomize(), preserving user-provided casing. Additional unit tests verify this behavior and check that existing CSS resolution is unaffected.
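A minimal sketch of the case-preserving step, assuming string literals in a number format are delimited only by double quotes (mirroring the in_string toggle discussed in the review). normalize_number_format_value is a hypothetical name, not the PR's actual helper, and the sketch does not handle Excel's backslash escapes (e.g. \M), which would still be lowercased.

```python
def normalize_number_format_value(value: str) -> str:
    """Strip *value* and lowercase it everywhere except inside "..." literals.

    Hypothetical helper modelled on the in_string loop discussed in this PR;
    it is not the pandas implementation.
    """
    out = []
    in_string = False
    for ch in value.strip():
        if ch == '"':
            # A double quote opens or closes an Excel string literal.
            in_string = not in_string
            out.append(ch)
        elif in_string:
            out.append(ch)  # keep literal text exactly as the user wrote it
        else:
            out.append(ch.lower())
    return "".join(out)


print(normalize_number_format_value('  0.00 "M"  '))  # prints: 0.00 "M"
```

Text outside the quotes is still lowercased, so only the quoted literal segments change behavior.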

@rhshadrach (Member) left a comment

Thanks for the PR!

in_string = False

for ch in value:
    if ch == '"':
Member


I believe this is not reliable: quotes can also be single quotes, no?

Author


Made the changes as requested, although I don't think Excel supports single-quoted literals in number formats.
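To cover the single-quote concern without assuming Excel semantics either way, a variant can remember which quote character opened the literal. This is again a hypothetical sketch, not the merged code: a single quote inside a double-quoted literal (or vice versa) does not close it, and an unterminated quote keeps the rest of the value verbatim.

```python
def normalize_number_format_value(value: str) -> str:
    """Lowercase *value* except inside '...' or "..." literals.

    Hypothetical sketch: the opening quote character is remembered so that
    the other quote style inside a literal does not end it early.
    """
    out = []
    quote = None  # quote character that opened the current literal, if any
    for ch in value.strip():
        if quote is None:
            if ch in ("'", '"'):
                quote = ch  # entering a literal
                out.append(ch)
            else:
                out.append(ch.lower())
        else:
            if ch == quote:
                quote = None  # matching quote closes the literal
            out.append(ch)  # literal text (and its quotes) kept verbatim
    return "".join(out)


print(normalize_number_format_value("0 'KM' UNITS"))  # prints: 0 'KM' units
```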

Comment on lines 435 to +439
  # TODO: don't lowercase case sensitive parts of values (strings)
- val = val.strip().lower()
+ if prop == "number-format":
+     val = _normalize_number_format_value(val)
+ else:
+     val = val.strip().lower()
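The dispatch in this diff can be sketched inside a toy declaration parser. This is a simplified, hypothetical parse_declarations, not pandas' own parser; for brevity its number-format branch only strips whitespace instead of calling the PR's _normalize_number_format_value.

```python
def parse_declarations(declarations: str) -> dict[str, str]:
    """Parse 'prop: val; prop: val' into a dict (hypothetical, simplified).

    Property names and ordinary values are lowercased; number-format values
    are only stripped, so their case survives.
    """
    result = {}
    for decl in declarations.split(";"):
        # partition splits on the first colon only, so "hh:mm" stays intact
        prop, sep, val = decl.partition(":")
        if not sep:
            continue  # skip empty or malformed declarations
        prop = prop.strip().lower()
        if prop == "number-format":
            val = val.strip()  # stand-in for the PR's case-preserving helper
        else:
            val = val.strip().lower()
        result[prop] = val
    return result


print(parse_declarations('FONT-WEIGHT: BOLD; number-format: 0.00 "M"'))
```

Splitting each declaration with str.partition(":") keeps format codes that contain colons, such as hh:mm, in one piece.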
Member


One question I have is why are we doing .lower() at all? What breaks if we remove this?

Author

@parthkandharkar, Nov 16, 2025


We lowercase CSS properties and values because CSS keywords are defined as case-insensitive, and the entire Styler formatting engine is built on that assumption. If we remove .lower(), core parts of the parser break immediately, because the implementation expects all tokens to be normalized before comparison. Font weights, border styles, colors, and unit names are all matched against internal tables that contain only lowercase entries. Without .lower(), inputs like "BOLD", "SOLID", "Red", "PX", "None", or "THIN" stop matching and fall through the matching logic.
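The point above can be illustrated with a toy keyword table. The set below is hypothetical (it is not one of pandas' internal tables), but it shares the relevant property: its entries are lowercase only, so matching depends on normalizing the input first.

```python
# Hypothetical keyword table; like the tables described above, it only
# contains lowercase entries.
BORDER_KEYWORDS = {"none", "thin", "medium", "thick", "solid", "dashed"}


def is_border_keyword(value: str, normalize: bool = True) -> bool:
    """Return True if *value* matches a known keyword.

    With normalize=False the input is only stripped, not lowercased,
    mimicking what would happen if the .lower() call were removed.
    """
    key = value.strip().lower() if normalize else value.strip()
    return key in BORDER_KEYWORDS


print(is_border_keyword("THIN"))                   # prints: True
print(is_border_keyword("THIN", normalize=False))  # prints: False
```

This is why lowercasing stays in place for ordinary values and only number-format is exempted.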


Labels: None yet

Projects: None yet

Development

Successfully merging this pull request may close these issues.

BUG: Strings in Excel number formats do not preserve case

2 participants