Add IniConfig.parse() with inline comment stripping and Unicode whitespace handling #70

RonnyPfannschmidt · 2025-10-18T21:49:44Z

Summary

This PR adds backward-compatible solutions for issue #55 (inline comment handling) and issue #4 (Unicode whitespace).

Changes

1. Add `IniConfig.parse()` classmethod (Fixes #55)

- New `parse()` classmethod with `strip_inline_comments` parameter
  - Default: `True` - properly strips inline comments from values
  - Opt-out: `False` - preserves old behavior if needed
- `IniConfig()` constructor maintains backward compatibility (doesn't strip comments)
- Users should migrate to `IniConfig.parse()` for correct comment handling

Example:

# Recommended: strips inline comments
config = IniConfig.parse("setup.cfg")
# "name = value # comment" → value is "value"

# Backward compatible: preserves old behavior  
config = IniConfig("setup.cfg")
# "name = value # comment" → value is "value # comment"
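
To make the difference between the two behaviors concrete, here is a minimal sketch of inline comment stripping on a single `name = value` line. The helper `split_value` and the constant `COMMENT_CHARS` are hypothetical names used for illustration only; this is not the code in `_parse.py`, and it assumes `#` and `;` are the only comment markers.

```python
# Assumption: "#" and ";" are the comment markers, as named in the commit message.
COMMENT_CHARS = "#;"


def split_value(line: str, strip_inline_comments: bool = True) -> tuple[str, str]:
    """Split a 'name = value' line into a (name, value) pair.

    Hypothetical helper for illustration; not part of iniconfig's API.
    """
    name, value = line.split("=", 1)
    name = name.strip()
    value = value.strip()
    if strip_inline_comments:
        # Cut the value at the first occurrence of each comment marker,
        # then drop the whitespace that preceded the comment.
        for marker in COMMENT_CHARS:
            value = value.split(marker, 1)[0].rstrip()
    return name, value


stripped = split_value("name = value # comment")
legacy = split_value("name = value # comment", strip_inline_comments=False)
print(stripped)  # ('name', 'value')
print(legacy)    # ('name', 'value # comment')
```

Note that a split this naive also truncates values that legitimately contain `#` or `;`, which is one reason an opt-out flag is useful.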

2. Add `strip_section_whitespace` parameter (Addresses #4)

- Opt-in parameter for `IniConfig.parse()` to strip Unicode whitespace from section names
  - Default: `False` - preserves existing behavior
  - When `True`: strips Unicode whitespace (U+00A0, U+2000, U+3000, etc.) from section names
- Documents that Python 3's `str.strip()` has handled Unicode whitespace since Python 3.0 (2008)
- Since iniconfig 2.0.0 (Python 3 only), values and key names already benefit from Unicode-aware stripping

Example:

# Opt-in to Unicode whitespace stripping for section names
config = IniConfig.parse("setup.cfg", strip_section_whitespace=True)
# "[section\u00a0]" → section name is "section" (NO-BREAK SPACE stripped)
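
The claim about `str.strip()` can be checked with the standard library alone. The sketch below does not use iniconfig: `section_name` is a hypothetical helper that only mimics the opt-in flag, on the assumption that the section name is the text between the brackets.

```python
NBSP = "\u00a0"               # NO-BREAK SPACE
EN_QUAD = "\u2000"            # EN QUAD
IDEOGRAPHIC_SPACE = "\u3000"  # IDEOGRAPHIC SPACE

# Python 3 treats all three characters as whitespace.
for ch in (NBSP, EN_QUAD, IDEOGRAPHIC_SPACE):
    assert ch.isspace()


def section_name(header: str, strip_section_whitespace: bool = False) -> str:
    """Return the text between '[' and ']' of a section header.

    Hypothetical helper for illustration; not part of iniconfig's API.
    """
    inner = header[1:-1]
    # str.strip() with no arguments removes Unicode whitespace, not just ASCII.
    return inner.strip() if strip_section_whitespace else inner


header = f"[section{NBSP}]"
print(repr(section_name(header)))                                 # 'section\xa0'
print(repr(section_name(header, strip_section_whitespace=True)))  # 'section'
```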

3. Code Refactoring

- Consolidated `__init__` to accept optional `_sections` and `_sources` parameters
- Simplified `parse()` to call constructor with pre-parsed data
- Removed complex `__new__` logic for cleaner, more maintainable code
- Added `parse_ini_data()` helper function to eliminate code duplication
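
The shape of this refactoring can be sketched roughly as follows. `MiniConfig` is a hypothetical stand-in class, and the `parse_ini_data` below is a simplified toy written for this example that parses a string rather than a file; neither reproduces the real signatures or parsing rules in iniconfig. The point is only the control flow: one shared helper, a constructor that accepts pre-parsed data, and a classmethod that delegates to it.

```python
from typing import Optional

Sections = dict[str, dict[str, str]]
Sources = dict[tuple[str, Optional[str]], int]


def parse_ini_data(data: str, *, strip_inline_comments: bool) -> tuple[Sections, Sources]:
    """Toy shared helper (assumption: well-formed 'name = value' lines only)."""
    sections: Sections = {}
    sources: Sources = {}
    current: Optional[str] = None
    for lineno, raw in enumerate(data.splitlines()):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and full-line comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = {}
            sources[(current, None)] = lineno
            continue
        if current is None:
            raise ValueError(f"line {lineno}: key outside of a section")
        name, value = (part.strip() for part in line.split("=", 1))
        if strip_inline_comments:
            for marker in "#;":
                value = value.split(marker, 1)[0].rstrip()
        sections[current][name] = value
        sources[(current, name)] = lineno
    return sections, sources


class MiniConfig:
    """Hypothetical stand-in showing the constructor/classmethod split."""

    def __init__(self, data: str, *, _sections: Optional[Sections] = None,
                 _sources: Optional[Sources] = None) -> None:
        if _sections is not None and _sources is not None:
            # Pre-parsed data supplied by parse(): use it directly.
            sections, sources = _sections, _sources
        else:
            # Legacy path: parse without stripping inline comments.
            sections, sources = parse_ini_data(data, strip_inline_comments=False)
        self.sections = sections
        self.sources = sources

    @classmethod
    def parse(cls, data: str, *, strip_inline_comments: bool = True) -> "MiniConfig":
        sections, sources = parse_ini_data(
            data, strip_inline_comments=strip_inline_comments
        )
        return cls(data, _sections=sections, _sources=sources)


text = "[metadata]\nname = value # comment\n"
print(MiniConfig(text).sections["metadata"]["name"])        # value # comment
print(MiniConfig.parse(text).sections["metadata"]["name"])  # value
```

Because both entry points go through the same helper, the only difference between them is the flag value passed in, so there is no need for custom `__new__` logic.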

Testing

✅ All 49 tests pass (42 existing + 7 new)
✅ 100% backward compatible
✅ All pre-commit hooks pass (ruff, mypy)

Documentation

- Updated CHANGELOG with detailed notes for v2.3.0
- Documented Python 3's Unicode whitespace handling history
- Added comprehensive docstrings with examples

Closes

Fixes #55
Addresses #4 (opt-in solution with full Unicode whitespace documentation)

🤖 Generated with Claude Code

Fixes pytest-dev#55 - Inline comments were incorrectly included in parsed values

The bug: Inline comments (# or ;) were being included as part of values instead of being stripped, inconsistent with how section comments are handled.

Example of the bug:
name = value # comment
Result was: "value # comment" (incorrect)
Should be: "value" (correct)

Changes:
- Add IniConfig.parse() classmethod with strip_inline_comments parameter
  - Default: strip_inline_comments=True (correct behavior - strips comments)
  - Can set strip_inline_comments=False if old buggy behavior needed
- IniConfig() constructor preserves old behavior for backward compatibility (calls parse_ini_data with strip_inline_comments=False)
- Add parse_ini_data() helper in _parse.py to avoid code duplication
- Update _parseline() to support strip_inline_comments parameter
- Add comprehensive tests for both correct and legacy behavior

Backward compatibility: Existing code using IniConfig() continues to work unchanged. Users should migrate to IniConfig.parse() for correct behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Add opt-in Unicode whitespace stripping for section names (issue pytest-dev#4)

Changes:
- Add strip_section_whitespace parameter to IniConfig.parse()
  - Default: False (preserves backward compatibility)
  - When True: strips Unicode whitespace from section names
- Document Unicode whitespace handling in CHANGELOG
  - Python 3's str.strip() has handled Unicode since Python 3.0 (2008)
  - iniconfig 2.0.0+ benefits from this automatically
  - Values and key names already strip Unicode whitespace correctly
- Add tests for Unicode whitespace handling

Background: Since iniconfig moved to Python 3 only in version 2.0.0, all strings are Unicode by default. Python 3's str.strip() handles Unicode whitespace characters (NO-BREAK SPACE, EN QUAD, IDEOGRAPHIC SPACE, etc.) automatically. This addresses the core concern in issue pytest-dev#4 for values and key names.

The new strip_section_whitespace parameter provides opt-in stripping for section names, which were not previously stripped for backward compatibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Consolidate __init__ to accept optional _sections and _sources parameters, allowing parse() to simply call the constructor.

Changes:
- Add _sections and _sources optional parameters to __init__
- Compute sections and sources first, then assign once to Final attributes
- When pre-parsed data provided, use it directly (called from parse())
- Otherwise, parse the data normally (backward compatible path)
- Simplify parse() to just call constructor with pre-parsed data

This makes the code cleaner and easier to understand while maintaining the exact same functionality and backward compatibility. All 49 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

RonnyPfannschmidt and others added 3 commits October 18, 2025 23:00

RonnyPfannschmidt merged commit 7faed13 into pytest-dev:main Oct 18, 2025
15 checks passed

RonnyPfannschmidt mentioned this pull request Oct 18, 2025

strip unicode whitespace #4

Closed
