
FEAT: Add ArabicPresentationFormConverter for Arabic isolated-form substitution#1888

Merged: romanlutz merged 4 commits into microsoft:main from Raulster24:raulster24/add-arabic-presentation-form-converter on Jun 2, 2026


@Raulster24
Contributor

Description

Adds ArabicPresentationFormConverter, a deterministic PromptConverter that substitutes Arabic letters with their isolated Arabic Presentation Forms-B glyphs (for example ALEF U+0627 -> U+FE8D). A reader still recognizes the same letters, shown in their non-joining isolated shapes, while the underlying code points, and therefore the token sequence, change. The substitution map is derived from Unicode decomposition data (not hand-maintained), so it stays correct across Unicode versions. Characters with no Arabic isolated presentation form (including Arabic digits and punctuation) are left unchanged.
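
For illustration, the derivation described above can be sketched with Python's standard `unicodedata` module. This is a minimal stand-alone approximation, not the converter's actual code: the names `build_isolated_form_map` and `to_isolated_forms` are hypothetical, and restricting the scan to the Presentation Forms-B block (U+FE70 to U+FEFF) and to single-code-point `<isolated>` decompositions is an assumption about how such a map could be built.

```python
import unicodedata


def build_isolated_form_map() -> dict[str, str]:
    """Map base Arabic characters to their isolated Presentation Forms-B glyphs.

    Hypothetical helper: scans U+FE70..U+FEFF and keeps entries whose
    decomposition is "<isolated> XXXX" with exactly one base code point.
    Ligatures and spacing diacritics decompose to two code points and are skipped.
    """
    mapping: dict[str, str] = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == "<isolated>":
            base = chr(int(parts[1], 16))
            # Keep the first isolated form found for each base character.
            mapping.setdefault(base, chr(cp))
    return mapping


ISOLATED_FORMS = build_isolated_form_map()


def to_isolated_forms(text: str) -> str:
    """Substitute each mapped character; leave everything else unchanged."""
    return "".join(ISOLATED_FORMS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    converted = to_isolated_forms("\u0627\u0628 abc \u0661")
    print([f"U+{ord(ch):04X}" for ch in converted])
```

Under these assumptions, ALEF (U+0627) maps to U+FE8D, while Latin text and the Arabic-Indic digit U+0661 pass through untouched, which matches the behavior the description claims for the converter.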

Third in a small set of atomic Arabic-script converters, following BidiConverter (#1832) and TatweelConverter (#1869). This is a character-substitution converter and can later migrate to a shared CharacterSubstitutionConverter base alongside the planned ArabiziConverter.

cc @romanlutz

Tests and Documentation

  • Added tests/unit/prompt_converter/test_arabic_presentation_form_converter.py: exact mapping, non-Arabic passthrough, mixed text, Arabic-non-letter (digit/punctuation) passthrough, empty input, determinism, and unsupported-input-type rejection. All pass: uv run pytest tests/unit/prompt_converter/test_arabic_presentation_form_converter.py
  • Registered in pyrit/prompt_converter/__init__.py (import + __all__).
  • Added a usage example to doc/code/converters/1_text_to_text_converters.py and regenerated the paired .ipynb plus the converter modality table in 0_converters.ipynb via JupyText.
  • ruff and ty are clean; the converter-documentation conformance test passes.

@romanlutz added this pull request to the merge queue Jun 2, 2026
Merged via the queue into microsoft:main with commit a6ce3d4 Jun 2, 2026
52 checks passed