Mixed scripts with text shaping failure #902

Closed
gmischler opened this issue Aug 18, 2023 · 6 comments · Fixed by #947

@gmischler
Collaborator

Error details
When trying to use text shaping with text regions combining different scripts, I tripped over one combination that throws an error.

Minimal code
This code executes in the context of "test_text_shaping.py":

def test_mixed_text_shaping(tmp_path):
    pdf = FPDF()
    pdf.add_page()
    pdf.r_margin = 100

    pdf.add_font(
        family="KFGQPC", fname=HERE / "KFGQPC Uthmanic Script HAFS Regular.otf"
    )
    pdf.set_font("KFGQPC", size=36)
    pdf.set_text_shaping(True)
    pdf.write(txt="مثال على اللغة العربية. محاذاة لليمين.")
    pdf.add_font(family="Mangal", fname=HERE / "Mangal 400.ttf")
    pdf.set_font("Mangal", size=40)
    pdf.write(txt=" इण्टरनेट पर हिन्दी के साधन")

    assert_pdf_equal(pdf, HERE / "text_mixed_text_shaping.pdf", tmp_path)

The result is this:

test\text_shaping\test_text_shaping.py:169: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
fpdf\fpdf.py:218: in wrapper
    return fn(self, *args, **kwargs)
fpdf\fpdf.py:3674: in write
    new_page = self._render_styled_text_line(
fpdf\fpdf.py:2831: in _render_styled_text_line
    unscaled_width = frag.get_width(initial_cs=i != 0)
fpdf\line_break.py:179: in get_width
    (char_len, w) = self.font.get_text_width(
fpdf\fonts.py:199: in get_text_width
    return self.shaped_text_width(text, font_size_pt, text_shaping_parms)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = TTFFont(i=2, fontkey=mangal), text = [], font_size_pt = 40
text_shaping_parms = {'direction': None, 'features': {}, 'language': None, 'script': None, ...}

    def shaped_text_width(self, text, font_size_pt, text_shaping_parms):
        """
        When texts are shaped, the length of a string is not always the sum of all individual character widths
        This method will invoke harfbuzz to perform the text shaping and return the sum of "x_advance"
        and "x_offset" for each glyph. This method works for "left to right" or "right to left" texts.
        """
        _, glyph_positions = self.perform_harfbuzz_shaping(
            text, font_size_pt, text_shaping_parms
        )
        text_width = 0
>       for pos in glyph_positions:
E       TypeError: 'NoneType' object is not iterable

fpdf\fonts.py:212: TypeError

Note the space character in front of the Devanagari text. Without it, the code runs through.

@SaiHarshaK

I'd like to have a go at this issue.

@andersonhc
Collaborator

> I'd like to have a go at this issue

Awesome!

Take a look at our development guidelines and feel free to reach out if you have any questions.

@SaiHarshaK

I just tried out this code.

A few things I noticed:

  1. When I call pdf.set_text_shaping(False) instead of True, it runs.
  2. When I comment out pdf.write(txt="مثال على اللغة العربية. محاذاة لليمين.") it runs, so the problem may come from combining Arabic and Hindi text here.
  3. When I replace the leading whitespace with a zero-width space, pdf.write(txt="\u200Bइण्टरनेट पर हिन्दी के साधन"), it runs.

I think I should look at the code flow when pdf.write() is called, and at how it ends up in the shaped_text_width() call.
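One possible reading of observation 3, stated as an assumption and not checked against fpdf2's line-break code: if the breaker only treats the plain space U+0020 as a break opportunity, then U+200B would never trigger the break that leaves an empty fragment. The snippet below only shows how Python itself classifies the two characters; the `SPACE` constant is an illustrative name here.

```python
import unicodedata

SPACE = " "      # plain U+0020; assumed to be the only break character
ZWSP = "\u200b"  # zero-width space used in observation 3

for name, ch in (("SPACE", SPACE), ("ZWSP", ZWSP)):
    # Print the Unicode category, whether Python counts it as whitespace,
    # and whether an equality test against a plain space would match.
    print(name, unicodedata.category(ch), ch.isspace(), ch == SPACE)
```

If that assumption holds, the zero-width space hides the problem instead of fixing it, since the empty-fragment path is simply never reached.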

@SaiHarshaK

Looking at the code, the problem boils down to automatic_break_possible() evaluating to true while the line evaluates to an empty line: characters=[]

But when the print() starts on a new line (if I insert a pdf.ln(), the code passes), the line evaluates to something like characters=[' ', 'इ', 'ण', '्', 'ट', 'र', 'न', 'े', 'ट'], as it goes via this code.

So I think the issue is that it takes a line with no characters, which causes styled_txt_width to evaluate to 0 and thereby throws this exception.
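To make the mechanism concrete, here is a toy model of a break-at-last-space line breaker. It is not fpdf2's actual code: `break_line` and `max_chars` are hypothetical names, and character counts stand in for real glyph widths. It only illustrates how a leading space on an almost full line can produce a fragment with no characters.

```python
def break_line(text, max_chars):
    """Toy breaker: return (line_characters, remaining_text).

    Breaks at the last space seen once `max_chars` is reached.
    """
    last_space = None
    for i, ch in enumerate(text):
        if ch == " ":
            last_space = i  # remember the latest break opportunity
        if i >= max_chars:
            if last_space is not None:
                # Automatic break: keep only what precedes the space.
                return list(text[:last_space]), text[last_space + 1:]
            # No break opportunity: cut the text where it overflows.
            return list(text[:i]), text[i:]
    return list(text), ""


# Little room left on the line: the only space seen is the leading one,
# so the part before the break is empty.
print(break_line(" abcd efgh", 3))

# Plenty of room (like after pdf.ln()): the break lands on the second space.
print(break_line(" abcd efgh", 7))
```

The first call mirrors the characters=[] case, the second mirrors the case after pdf.ln(), where the fragment keeps the leading space plus the first word.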

@andersonhc
Collaborator

I guess the empty fragment is the problem. It contains only a character that doesn't exist in the font, so there is nothing to render.
We just need to make sure it proceeds without throwing the exception.
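A minimal sketch of what "proceed without throwing" could look like, assuming the shaping step returns None for the glyph positions when there is nothing to shape (as the traceback suggests). This is not necessarily the fix that was merged: `shape` and `fake_shape` are hypothetical stand-ins for the HarfBuzz call, and the position dictionaries are a simplification.

```python
def shaped_text_width(text, shape):
    """Sum x_advance and x_offset per glyph; empty input has zero width."""
    if not text:
        return 0  # nothing to shape, so skip the shaper entirely
    _, glyph_positions = shape(text)
    if glyph_positions is None:
        return 0  # shaper produced no glyphs; treat as zero width
    return sum(pos["x_advance"] + pos["x_offset"] for pos in glyph_positions)


def fake_shape(text):
    """Hypothetical shaper: returns (infos, positions), or None for no glyphs."""
    if not text:
        return None, None
    return list(text), [{"x_advance": 10, "x_offset": 0} for _ in text]


print(shaped_text_width([], fake_shape))    # empty fragment, no exception
print(shaped_text_width("ab", fake_shape))  # two glyphs, 10 units each
```

Guarding in the width calculation keeps the line breaker unchanged; whether the empty fragment should be created at all is a separate question.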

@SaiHarshaK

My current train of thought is to update add_character(): if character == SPACE and this is the first space, I add another space to the fragment (so that it's not empty) before it saves the state into SpaceHint().

I tried this out and it does seem to work (it renders things). Do you think this is an acceptable approach?
