Handle preceding whitespaces in write() for line break #947

SaiHarshaK · 2023-10-08T06:23:18Z

Fixes #902

This PR adds support for text that starts with whitespace and causes a line break. When text with preceding whitespace is added to a line that already contains text, it can trigger a line break.
The issue is that when the line break is triggered, an exception is thrown: the fragment is empty, so perform_harfbuzz_shaping() returns None and shaped_text_width() cannot compute the width.

To solve this issue, we add a conditional check for when perform_harfbuzz_shaping() returns None, in which case shaped_text_width() simply returns (0, 0), i.e. (char len, char width).
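A minimal sketch of that guard, for illustration only. StubFont and GlyphPosition are hypothetical stand-ins rather than fpdf2 classes, and the width formula is simplified (it assumes glyph advances in 1/1000 em units and ignores any font scaling); only the early return on None reflects the change described above.

```python
from dataclasses import dataclass


@dataclass
class GlyphPosition:
    """Hypothetical stand-in for one shaped glyph position."""

    x_advance: float


class StubFont:
    """Hypothetical stand-in for a font object; this is not fpdf2's TTFFont."""

    def perform_harfbuzz_shaping(self, text, font_size_pt, text_shaping_parms):
        # Assumption: an empty fragment yields no glyph data at all.
        if not text:
            return None, None
        # Pretend every character shapes to one glyph with an advance of 500 units.
        return list(text), [GlyphPosition(x_advance=500) for _ in text]

    def shaped_text_width(self, text, font_size_pt, text_shaping_parms):
        _, glyph_positions = self.perform_harfbuzz_shaping(
            text, font_size_pt, text_shaping_parms
        )
        # The guard: nothing was shaped, so report zero characters and zero width
        # instead of iterating over None.
        if glyph_positions is None:
            return (0, 0)
        text_width = 0
        for pos in glyph_positions:
            # Simplified conversion from font units to points.
            text_width += pos.x_advance * font_size_pt / 1000
        return (len(glyph_positions), text_width)


font = StubFont()
empty_result = font.shaped_text_width([], 40, {})
shaped_result = font.shaped_text_width("ab", 40, {})
```

With the guard in place, the empty fragment (text = [] in the traceback below) yields a zero-width result, so the caller can carry on with the line break instead of failing.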

Using the repro from the issue, I have added a unit test for this case.
Before the code changes:

test\text_shaping\test_text_shaping.py:169: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
fpdf\fpdf.py:218: in wrapper
    return fn(self, *args, **kwargs)
fpdf\fpdf.py:3674: in write
    new_page = self._render_styled_text_line(
fpdf\fpdf.py:2831: in _render_styled_text_line
    unscaled_width = frag.get_width(initial_cs=i != 0)
fpdf\line_break.py:179: in get_width
    (char_len, w) = self.font.get_text_width(
fpdf\fonts.py:199: in get_text_width
    return self.shaped_text_width(text, font_size_pt, text_shaping_parms)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = TTFFont(i=2, fontkey=mangal), text = [], font_size_pt = 40
text_shaping_parms = {'direction': None, 'features': {}, 'language': None, 'script': None, ...}

    def shaped_text_width(self, text, font_size_pt, text_shaping_parms):
        """
        When texts are shaped, the length of a string is not always the sum of all individual character widths
        This method will invoke harfbuzz to perform the text shaping and return the sum of "x_advance"
        and "x_offset" for each glyph. This method works for "left to right" or "right to left" texts.
        """
        _, glyph_positions = self.perform_harfbuzz_shaping(
            text, font_size_pt, text_shaping_parms
        )
        text_width = 0
>       for pos in glyph_positions:
E       TypeError: 'NoneType' object is not iterable

fpdf\fonts.py:212: TypeError

After code changes:

test/text_shaping/test_text_shaping.py::test_mixed_text_shaping PASSED                                       [100%]

Checklist:

The GitHub pipeline is OK (green),
meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
[] In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
A mention of the change is present in CHANGELOG.md

Since this is a bugfix and not a feature, not adding anything to docs folder.

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

gmischler · 2023-10-08T08:49:14Z

To solve this issue, we add a whitespace so that the fragment is not empty anymore and it is able to print whitespace, go ahead with the linebreak, and continue rendering further output.

I'm really not sure if this is the correct approach. Why add extra text data that is not actually supposed to appear on page? How can you be sure that this never has any unintended side effects?

It would be much simpler and more logical to just add a check in shaped_text_width() and return 0.0 if the fragment is empty.

As a potential source of unwanted interaction: In #897 I am adding functionality to MultiLineBreak() so that it can remove any space characters at the start of each line. This is necessary to render converted HTML text correctly.

fpdf/line_break.py

andersonhc · 2023-10-08T11:54:12Z

@SaiHarshaK
I agree with gmischler, changing shaped_text_width() to return 0, 0 when harfbuzz returns None is a much simpler and cleaner solution.

SaiHarshaK · 2023-10-08T14:46:13Z

Got it, that makes sense. I will do that and update this PR.

SaiHarshaK · 2023-10-08T14:58:52Z

@andersonhc @gmischler Please check now

SaiHarshaK · 2023-10-09T06:52:51Z

@gmischler @andersonhc is there a way to rerun the tests? I think it's complaining that an existing test took longer than the time limit set.

Not sure why it complains only for a specific version of Python, since it ran successfully on some versions on Windows.

SaiHarshaK · 2023-10-09T11:26:13Z

Thanks, it seems the reruns got these passed :)

Lucas-C · 2023-10-09T11:29:21Z

@gmischler @andersonhc is there a way to rerun the tests? I think it's complaining that an existing test took longer than the time limit set.

Not sure why it complains only for a specific version of Python, since it ran successfully on some versions on Windows.

This is a recurring problem that we have, tracked in #923.

I re-ran the pipeline, and everything is ✅ now.

SaiHarshaK · 2023-10-09T11:31:49Z

Can you get this merged @Lucas-C

Lucas-C · 2023-10-09T11:41:21Z

Can you get this merged @Lucas-C

I'll defer to @gmischler on this, as he is much more expert than me on the subject,
and I would not want this PR to conflict with his work in #897.

By the way, are you taking part in Hacktoberfest @SaiHarshaK?
Is it important for you that this PR gets merged or validated with the hacktoberfest-accepted label in October, if possible?

SaiHarshaK · 2023-10-09T12:06:01Z

Yes, I am taking part in it, so it would be great if it gets merged.

Maybe the tag itself is sufficient, but I'm not sure.

gmischler · 2023-10-09T18:54:17Z

I'll defer to @gmischler on this, as he is much more expert than me on the subject,
and I would not want this PR to conflict with his work in #897.

Actually, @andersonhc is the real expert here, as I haven't really touched the text shaping stuff.
Neither does #897 touch it, so there won't be any conflict.

andersonhc · 2023-10-11T03:36:35Z

@allcontributors please add SaiHarshaK for code

allcontributors · 2023-10-11T03:36:44Z

@andersonhc

I've put up a pull request to add @SaiHarshaK! 🎉

SaiHarshaK · 2023-10-11T04:35:47Z

Thank you!

Lucas-C · 2023-10-13T11:17:11Z

Thank YOU for your contribution to fpdf2 @SaiHarshaK 🙂

This has been released in 2.7.6

SaiHarshaK added 2 commits October 8, 2023 11:16

new UT and fix mixed_text_preceing_whitespace bug

17ba697

typo

ba4a12b

SaiHarshaK requested a review from gmischler as a code owner October 8, 2023 06:23

SaiHarshaK changed the title ~~Handle preceding whitespaces for line break and add corresponding UT~~ Handle preceding whitespaces for line break Oct 8, 2023

SaiHarshaK changed the title ~~Handle preceding whitespaces for line break~~ Handle preceding whitespaces in write() for line break Oct 8, 2023

add changelog

2e5943f

gmischler requested changes Oct 8, 2023


SaiHarshaK added 2 commits October 8, 2023 20:20

update format

6b5ec7a

remove unused file

e9f4bc5

gmischler approved these changes Oct 8, 2023


andersonhc added the hacktoberfest-accepted label Oct 11, 2023

allcontributors bot mentioned this pull request Oct 11, 2023

add SaiHarshaK as a contributor for code #952

Merged

Update CHANGELOG.md

520191e

andersonhc merged commit e918119 into py-pdf:master Oct 11, 2023
10 checks passed
