Handle multilingual strings to improve text shaping results (fix #1187) #1193

andersonhc · 2024-06-04T12:51:49Z

As pointed out in #1187, if the text contains more than one language, HarfBuzz auto-detects the script and shapes the whole string using the first script it finds.

This change adds the Unicode Scripts table to fpdf2. When multiple scripts are found, the input string is broken into fragments, one per script, and each fragment is shaped individually.
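To illustrate the idea of splitting a string into per-script fragments, here is a minimal sketch. It is not the code in this PR: `SCRIPT_RANGES`, `script_of` and `split_by_script` are hypothetical names, the range table is a tiny hand-written excerpt rather than the full Unicode Scripts table, and attaching neutral ("Common") characters such as spaces to the neighbouring run is a simplifying assumption.

```python
# Hypothetical, heavily truncated range table: (first, last, script).
# The real Unicode Scripts table contains far more entries.
SCRIPT_RANGES = (
    (0x0041, 0x005A, "Latin"),  # A-Z
    (0x0061, 0x007A, "Latin"),  # a-z
    (0x1780, 0x17DD, "Khmer"),
)


def script_of(char):
    """Return the script of a single character, or "Common" if not listed."""
    codepoint = ord(char)
    for first, last, script in SCRIPT_RANGES:  # linear search
        if first <= codepoint <= last:
            return script
    return "Common"


def split_by_script(text):
    """Split text into (script, fragment) runs.

    Assumption: "Common" characters (spaces, punctuation, ...) join the
    current run, and leading "Common" characters adopt the first real script.
    """
    runs = []  # list of [script, fragment]
    for char in text:
        script = script_of(char)
        if runs and script in ("Common", runs[-1][0]):
            runs[-1][1] += char
        elif runs and runs[-1][0] == "Common":
            runs[-1][0] = script
            runs[-1][1] += char
        else:
            runs.append([script, char])
    return [(script, fragment) for script, fragment in runs]
```

Each resulting fragment could then be handed to the shaper on its own, so that a Latin prefix no longer decides how a following Khmer run is shaped.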

Making fragments script-aware will also be useful in the future for implementing automatic text wrapping.

Checklist:

The GitHub pipeline is OK (green),
meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

codecov-commenter · 2024-06-04T17:14:03Z

Codecov Report

Attention: Patch coverage is 99.46524% with 1 line in your changes missing coverage. Please review.

Project coverage is 93.29%. Comparing base (2b866d8) to head (acb6af1).
Report is 15 commits behind head on master.

❗ Current head acb6af1 differs from pull request most recent head 45074fd

Please upload reports for the commit 45074fd to get more accurate results.

Files	Patch %	Lines
fpdf/unicode_script.py	99.43%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1193      +/-   ##
==========================================
+ Coverage   93.25%   93.29%   +0.03%     
==========================================
  Files          30       31       +1     
  Lines        9253     9524     +271     
  Branches     2104     2135      +31     
==========================================
+ Hits         8629     8885     +256     
- Misses        385      393       +8     
- Partials      239      246       +7


gmischler

👍

Painful having to do a linear search, but I guess it's not worth the effort here to build a range tree (given the caching).

Oh, and in case you care about 100% coverage, I think Codecov would like to see a test case that actually exhausts UNICODE_RANGE_TO_SCRIPT.
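For comparison with the linear search mentioned above, this is a rough sketch of a cached binary-search lookup, a simpler alternative to a full range tree. It assumes a sorted table of non-overlapping ranges; `RANGES`, `RANGE_STARTS` and `lookup_script` are hypothetical names and the table is a small excerpt, not the actual layout of `UNICODE_RANGE_TO_SCRIPT`.

```python
from bisect import bisect_right
from functools import lru_cache

# Hypothetical excerpt: non-overlapping (first, last, script) ranges,
# sorted by their first code point.
RANGES = (
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0410, 0x044F, "Cyrillic"),
    (0x1780, 0x17DD, "Khmer"),
)
RANGE_STARTS = [first for first, _, _ in RANGES]


@lru_cache(maxsize=1024)
def lookup_script(codepoint):
    """Find the script of a code point in O(log n), caching repeated lookups."""
    # Index of the last range whose start is <= codepoint (or -1 if none).
    index = bisect_right(RANGE_STARTS, codepoint) - 1
    if index >= 0:
        _, last, script = RANGES[index]
        if codepoint <= last:
            return script
    return "Unknown"
```

With a cache in front of the lookup, repeated characters never reach the search at all, which is why the linear version is likely good enough in practice.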

automatically detect unicode script

04228c9

andersonhc requested a review from gmischler as a code owner June 4, 2024 12:51

andersonhc changed the title ~~Handle multilingual string to improve text shaping results (fix #1187)~~ Handle multilingual strings to improve text shaping results (fix #1187) Jun 4, 2024

andersonhc added 2 commits June 4, 2024 10:21

Add files via upload

b9442c6

Fix fragment width with text shaping

45074fd

gmischler approved these changes Jun 5, 2024

add test and changelog

c036915

andersonhc merged commit fbbb3f7 into py-pdf:master Jun 6, 2024
11 checks passed

andersonhc mentioned this pull request Jun 6, 2024

Issue on Khmer Unicode Font Subscripts #1187

Closed