Add support for chapter ("avdelning") level in document structure #27

Merged
marcarl merged 6 commits into main from feature/add-avdelning-support
Jan 5, 2026

@marcarl (Collaborator) commented Jan 4, 2026

Summary

This PR adds full support for documents containing AVDELNING (division) structures. The formatting infrastructure was already 90% in place - this change removes the blocking condition and improves ID generation.

Changes

1. Remove AVDELNING Block

  • File: sfs_processor.py
  • Change: Removed the blocking condition in ignore_rules() that prevented documents with AVDELNING from being processed
  • Documents with AVDELNING structures now process automatically

2. Improve AVDELNING Section ID Generation

  • File: formatters/format_sfs_text.py
  • Change: Generate clean, concise IDs matching the existing chapter format
  • Convert Roman numerals (I, II, III) to Arabic numbers (1, 2, 3)
  • Convert Swedish ordinals (FÖRSTA, ANDRA) to Arabic numbers (1, 2)
  • Use "avd" prefix for consistency with "kap" prefix

3. Add Integration Tests

  • File: test/test_format_sfs_text.py
  • Added comprehensive tests for:
    • AVDELNING formatting with Roman numerals
    • AVDELNING formatting with Swedish ordinals
    • Section tagging with CSS class 'avdelning'
    • ID generation for both formats

Example Output

<section id="avd1" class="avdelning">
  ## AVDELNING I. SOCIALTJÄNSTENS MÅL
</section>

<section id="kap1" class="kapitel">
  ## 1 kap. Lagens innehåll
</section>

AVDELNING Patterns Supported

  • Roman numerals: AVDELNING I, AVDELNING II, AVD. III -> id="avd1", id="avd2", id="avd3"
  • Swedish ordinals: FÖRSTA AVDELNING, ANDRA AVDELNINGEN -> id="avd1", id="avd2"
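
For readers who want to see how the pattern-to-ID mapping above could work, here is a minimal sketch. It is not the code in formatters/format_sfs_text.py: the function names `roman_to_int` and `avdelning_id`, both regular expressions, and every ordinal beyond FÖRSTA and ANDRA are assumptions made for illustration.

```python
import re
from typing import Optional

# Hypothetical lookup tables. Only FÖRSTA and ANDRA are named in this PR;
# the remaining ordinals are an assumption about what a full table might hold.
_ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50}
_ORDINALS = {
    "FÖRSTA": 1, "ANDRA": 2, "TREDJE": 3, "FJÄRDE": 4, "FEMTE": 5,
    "SJÄTTE": 6, "SJUNDE": 7, "ÅTTONDE": 8, "NIONDE": 9, "TIONDE": 10,
}

# Case-sensitive on purpose: only all-uppercase headers should match.
_ROMAN_HEADER = re.compile(r"^(?:AVDELNING|AVD\.)\s+([IVXL]+)\b")
_ORDINAL_HEADER = re.compile(r"^([A-ZÅÄÖ]+)\s+AVDELNING(?:EN)?\b")


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral made of I, V, X, L to an integer."""
    total = 0
    largest_seen = 0
    for ch in reversed(numeral):
        value = _ROMAN_VALUES[ch]
        if value < largest_seen:
            total -= value  # subtractive notation, e.g. the I in IV
        else:
            total += value
            largest_seen = value
    return total


def avdelning_id(header: str) -> Optional[str]:
    """Return an ID such as 'avd3' for a division header, else None."""
    match = _ROMAN_HEADER.match(header)
    if match:
        return f"avd{roman_to_int(match.group(1))}"
    match = _ORDINAL_HEADER.match(header)
    if match and match.group(1) in _ORDINALS:
        return f"avd{_ORDINALS[match.group(1)]}"
    return None


print(avdelning_id("AVD. III"))           # avd3
print(avdelning_id("ANDRA AVDELNINGEN"))  # avd2
```

A title-case line such as "Avdelning I" returns `None` in this sketch, because neither pattern uses `re.IGNORECASE`.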

Testing

✅ All 33 tests pass
✅ Tested with document 2025:400 (Socialtjänstlag) containing 9 AVDELNING sections
✅ All sections processed correctly with proper formatting and IDs

Impact

  • No breaking changes
  • Enables processing of previously blocked documents
  • Consistent ID format across document structures

marcarl and others added 4 commits January 4, 2026 23:17
Enable automatic processing of documents containing AVDELNING (division)
structures. The formatting infrastructure was already in place - this change
removes the blocking condition that prevented these documents from being
processed.

Changes:
- Remove AVDELNING block from ignore_rules() in sfs_processor.py
- Add integration tests for AVDELNING formatting and section tagging
- Verify AVDELNING headers are formatted as H2 with class="avdelning"

Tested with document 2025:400 (Socialtjänstlag) which contains 9 AVDELNING
sections, all processed correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Generate clean, concise IDs for AVDELNING sections using the format
"avd1", "avd2", etc., matching the existing "kap1", "kap2" format
for chapters.

Changes:
- Convert Roman numerals (I, II, III) to Arabic numbers (1, 2, 3)
- Convert Swedish ordinals (FÖRSTA, ANDRA) to Arabic numbers (1, 2)
- Use "avd" prefix instead of "avdelning" for brevity
- Add tests for both Roman numeral and Swedish ordinal formats

Example output:
- "AVDELNING I" -> id="avd1"
- "ANDRA AVDELNINGEN" -> id="avd2"
- "AVD. III" -> id="avd3"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
AVDELNING sections now properly wrap all their child KAPITEL sections and
their descendants. Previously, AVDELNING sections would close immediately
after their heading, with KAPITEL sections as siblings instead of children.

Changes:
- Track when inside an AVDELNING section
- Increase effective level by +1 for all headings inside AVDELNING
  - H2 KAPITEL becomes effective level 3
  - H3 subsections become effective level 4
  - H4 paragraphs become effective level 5
- AVDELNING only closes when encountering another AVDELNING or document end

Example structure:
```
<section id="avd1" class="avdelning">
  ## AVDELNING I
  <section id="kap1" class="kapitel">
    ## 1 kap.
    ...child sections...
  </section>
  <section id="kap2" class="kapitel">
    ## 2 kap.
  </section>
</section>
```

Verified with document 2025:400:
- 647 opening tags match 647 closing tags
- All KAPITEL sections properly nested inside AVDELNING

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
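
The nesting rule in the commit above (effective level +1 inside an AVDELNING, and an AVDELNING closing only at the next AVDELNING or at document end) can be sketched with a stack of open sections. This is an illustrative reconstruction, not the PR's code: the `wrap_sections` name and the `(level, id, css class)` tuple input are assumptions.

```python
from typing import List, Tuple

# (markdown heading level, section id, css class); a hypothetical input shape.
Heading = Tuple[int, str, str]


def wrap_sections(headings: List[Heading]) -> List[str]:
    """Emit opening/closing <section> tags so KAPITEL nest inside AVDELNING."""
    out: List[str] = []
    open_levels: List[int] = []  # effective levels of currently open sections
    inside_avdelning = False

    for level, section_id, css_class in headings:
        if css_class == "avdelning":
            effective = level  # the AVDELNING header keeps its own level
            inside_avdelning = True
        else:
            # Everything under an AVDELNING is pushed one level deeper.
            effective = level + 1 if inside_avdelning else level

        # Close every open section at the same or a deeper effective level.
        while open_levels and open_levels[-1] >= effective:
            open_levels.pop()
            out.append("</section>")

        out.append(f'<section id="{section_id}" class="{css_class}">')
        open_levels.append(effective)

    # Document end closes whatever is still open, AVDELNING included.
    while open_levels:
        open_levels.pop()
        out.append("</section>")
    return out


tags = wrap_sections([
    (2, "avd1", "avdelning"),
    (2, "kap1", "kapitel"),
    (2, "kap2", "kapitel"),
    (2, "avd2", "avdelning"),
    (2, "kap3", "kapitel"),
])
print("\n".join(tags))
```

Because a KAPITEL at H2 gets effective level 3, it never closes the enclosing AVDELNING at level 2. Only another AVDELNING, also at level 2, does.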
Fix issue where title-case references to divisions in "Lagens disposition"
(e.g., "Avdelning I") were incorrectly treated as division headers, creating
spurious AVDELNING sections.

Changes:
- Remove re.IGNORECASE from is_chapter_header() pattern matching
- Remove re.IGNORECASE from generate_section_id() pattern matching
- Only all-uppercase "AVDELNING" is now recognized as a division header
- Title-case "Avdelning" is treated as regular paragraph content

Impact:
- Document 2025:400 now has exactly 9 AVDELNING sections (not 18)
- "Lagens disposition" section correctly lists divisions as text content
- All section tags remain balanced (638 opening, 638 closing)

Before: "AVDELNING I" and "Avdelning I" both matched
After: Only "AVDELNING I" matches (case-sensitive)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
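
The before/after behaviour described in the commit above can be reproduced with a small, self-contained comparison. The pattern here is a simplified stand-in, not the actual expression used in is_chapter_header() or generate_section_id(). The only point is the effect of dropping `re.IGNORECASE`.

```python
import re

# Simplified, hypothetical header pattern; the real one in the PR may differ.
PATTERN = r"^AVDELNING\s+[IVX]+\b"

before = re.compile(PATTERN, re.IGNORECASE)  # behaviour before the fix
after = re.compile(PATTERN)                  # behaviour after the fix

for line in ["AVDELNING I", "Avdelning I"]:
    print(
        f"{line!r}: before={bool(before.match(line))}, "
        f"after={bool(after.match(line))}"
    )
```

With the flag removed, a title-case reference such as "Avdelning I" in "Lagens disposition" no longer matches and falls through as ordinary paragraph text.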
@marcarl marcarl changed the title from "Add support for AVDELNING document structure" to "Add support for chapter ("avdelning") level in document structure" on Jan 4, 2026
Content under AVDELNING sections now uses lower heading levels to
reflect proper hierarchy:
- AVDELNING headers remain H2 (##)
- KAPITEL headers become H3 (###) when inside AVDELNING
- Subsection titles become H4 (####)
- Paragraph markers (§) become H5 (#####) when inside AVDELNING

This ensures the Markdown heading hierarchy properly represents the
document structure, where AVDELNING is the top-level organizational
unit containing KAPITEL and their paragraphs.

Example structure:
  ## AVDELNING I. (H2)
    ### 1 kap. (H3 - shifted from H2)
      #### Section title (H4)
        ##### 1 § (H5 - shifted from H4)
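
A rough sketch of the heading shift described above, under simplifying assumptions: `shift_headings` is a hypothetical helper, it detects only headers that begin with "AVDELNING" (ordinal forms such as "FÖRSTA AVDELNINGEN" would need the fuller matching used elsewhere in this PR), and it caps output at H6.

```python
import re
from typing import List

_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def shift_headings(lines: List[str]) -> List[str]:
    """Demote headings by one level once an AVDELNING header has been seen."""
    out: List[str] = []
    inside_avdelning = False
    for line in lines:
        match = _HEADING.match(line)
        if not match:
            out.append(line)  # body text is left untouched
            continue
        hashes, title = match.groups()
        if title.startswith("AVDELNING"):
            inside_avdelning = True
            out.append(line)  # the AVDELNING header itself stays at H2
        elif inside_avdelning and len(hashes) < 6:
            out.append("#" + line)  # H2 -> H3, H3 -> H4, H4 -> H5
        else:
            out.append(line)
    return out


print("\n".join(shift_headings([
    "## AVDELNING I.",
    "## 1 kap.",
    "### Section title",
    "#### 1 §",
])))
```

Documents without any AVDELNING header pass through unchanged in this sketch, which matches the "no breaking changes" claim in the PR description.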
@marcarl marcarl merged commit 25bf65b into main Jan 5, 2026
5 checks passed
@marcarl marcarl deleted the feature/add-avdelning-support branch January 5, 2026 09:41