Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

While using Unicode fonts if markdown in multi_cell is on, then the rendering of muliti_cell in a subsequent line is broken #359

Closed
nova-03 opened this issue Mar 8, 2022 · 20 comments · Fixed by #384

Comments

@nova-03
Copy link

nova-03 commented Mar 8, 2022

This code renders a "broken" table if Unicode fonts are used:

    data = (
            ("First name", "Last name", "Age", "City"),
            ("Jules", "Smith", "34", "San Juan"),
               )
    l_height = pdf.font_size * 1.2
    cell_width = pdf.epw / 4

    pdf.set_fill_color(240)

    for j, row in enumerate(data):
        for cell in row:
            if j == 0:
                pdf.multi_cell(cell_width, l_height, '**' + cell + '**', border=1, fill=True, align='L', ln=3, max_line_height=l_height, markdown=True)
            else:
                pdf.multi_cell(cell_width, l_height, cell, border=1, align='L', ln=3, max_line_height=l_height)
        pdf.ln()

If you set "markdown=False" in the code above, the rendering is not broken any more.

This bug emerged after some latest changes in the master branch.

I am using Python 3.9.2 with the latest version of the master fpdf2 branch (2.5.1)

@nova-03 nova-03 added the bug label Mar 8, 2022
@nova-03 nova-03 changed the title If markdown in multi_cell is on, then the rendering of muliti_cell in a subsequent line is broken While using Unicode fonts if markdown in multi_cell is on, then the rendering of muliti_cell in a subsequent line is broken Mar 9, 2022
@gmischler
Copy link
Collaborator

That only seems to happen when the txt argument to multi_cell ends with a markdown item, like "xyz **abc**".
With just one other trailing character ("**abc** ") it will work correctly.

Apparently the character mapping algorithm "forgets" to switch back to the regular font after it is done, so the next invocation will look up the characters in the wrong set.
But I don't really understand that part, so this is where my analysis ends... 😉

@nova-03
Copy link
Author

nova-03 commented Mar 10, 2022

Thanks @gmischler for taking a look at this.

It seems to me that this problem actually has two parts:

Apparently the character mapping algorithm "forgets" to switch back to the regular font after it is done,
so the next invocation will look up the characters in the wrong set.

This is the first part which occurs also with non Unicode fonts.
When I run the code above with Helvetica font set, I get:
image
where the second line is also rendered bold although it shouldn't be.

The second part of the problem is displaying wrong characters.
When I run the code above with Arial (Unicode font) set like this:

    pdf.add_font('Arial', '', 'arial.ttf', True)
    pdf.set_font('Arial', '', 10)

I get:
image
Where not only that the second line is also rendered bold but the characters are completely random.

I hope these details help a bit.

@gmischler
Copy link
Collaborator

I think I found a solution, although I'm not sure if it's the correct one.

If we give the contents of the text handling part of ._render_styled_cell_text() their own graphics state stack level, then the symptoms go away.

        ... # ca. line 2185
        s_width, underlines = 0, []
        if text_line.fragments:
            s += " q "  # issue #359 <<<<<<
            if align == "R":
                dx = w - self.c_margin - styled_txt_width

        ...
        ...
        ...

                    link,
                )
            s += " Q "  # issue #359 <<<<<<
        if s:
            self._out(s)
        self.lasth = h
        ... # ca. line 2320

The " q ... Q " nearby issued depending on self.fill_color (#348?) may or may not be still needed after that.
The change would invalidate practically all reference PDFs containing text, all of which then need to be checked for unchanged appearance.
Maybe there's a less intrusive solution by checking first if there are any style changes in the fragments.
Or maybe this just hides the symptoms, and the real problem is located somewhere else entirely...

@Lucas-C
Copy link
Member

Lucas-C commented Mar 11, 2022

Thank you two for your investigations!

The change you suggested @gmischler would increase by 4 bytes the size of any text line rendered by fpdf2:
this is not ideal, but it its a solution that can be considered.

Is the next step here to submit a PR?

@gmischler
Copy link
Collaborator

gmischler commented Mar 11, 2022

The change you suggested @gmischler would increase by 4 bytes the size of any text line rendered by fpdf2:

We'd have to check if we can do it more selectively, and/or if we can remove the other q/Q pair from #348 in exchange.
So the 4 bytes are only the worst case scenario.
They are also a lot cheaper than explicitly resetting the font style to regular after each line...

Is the next step here to submit a PR?

I can include it in my currently open one... 😉

@Lucas-C
Copy link
Member

Lucas-C commented Mar 11, 2022

So the 4 bytes are only the worst case scenario.

Alright, but the "worst case scenario" is also the most common one...
Best case = user was setting the fill_color - Worst case = any other situation!

Also, practically speaking, all PDF reference tests files containing text will have to be updated.
I'd really prefer this to happen in a dedicated PR.
In fact, I'd really prefer that we find a better solution 🙂
I'll try to get a look at it next week.

@gmischler
Copy link
Collaborator

I'd really prefer that we find a better solution...

On closer inspection, there are several aspects of _render_styled_cell_text() that could be improved:

  1. Nitpick A: In favour of readability, I'd prefer to rename it to _render_styled_text_line().
  2. Nitpick B: In the section using self.ws, the resulting "# Tw" is inconsistently sent to the PDF directly via self._out() instead of adding it to s. Fixing that will change the sequence of PostScript commands in a small number of test files.
  3. Nitpick C: If we want the code to be efficient, we should not use s += "..." all the time, but rather add the substrings to a list and do a s = ' '.join(sl) at the end.
  4. I'm not sure why self.current_font and self.font_style etc. are ever changed here, since the changed values are only used locally (as far as I can see).
  5. If the very last fragment of the line has any font/style other than the original settings, or if we had to change the fill color, we can use "q ... Q" on the whole line to prevent leakage.
  6. Alternatively, under the same conditions, we can explicitly reset to the original settings at the end (which probably takes more space in the PDF).
  7. After looping over the fragments, there currently are two checks that can both independently add f" /F{self.current_font['i']} {self.font_size_pt:.2f} Tf" to the PDF. They both look like seperate attempts to fix the same leaking problem we're discussing here. They will likely become redundant if we implement either 5. or 6. I also don't understand why the second one is made dependent on text color.
  8. Somewhat related: After the refactor to introduce MultiLineBreak(), I don't quite understand why _multi_cell(split_only=True) still calls _render_styled_cell_text(). Simply skipping that would eliminate the need for _markdown_leak_end_style and most of the other cleanup there.

@gmischler
Copy link
Collaborator

gmischler commented Mar 16, 2022

Except for 6., I implemented/fixed all the points from the above list (based on #350), and found that the result works quite nicely, without any leaked formatting or random characters:

grafik

With just 5. and 8., there are 14 PDF files that this would change. Some of them get slightly larger by the added "q ... Q", some slightly smaller due to eliminated font commands, but in most cases it just changes the sequence of PostScript commands. Point 2. affected some more files, because the "Tw #" commands will end up in a different position, bringing the total to 27.
All in all, I don't see a general trend towards increased file sizes, other than where there was necessary information missing from the output, resulting in incorrect rendering.

The code ended up a bit shorter and much more straightforward than the current version, because several special case kludges became unnecessary. Some parts were a bit tricky because fpdf carries around an unhealthy amount of global state. I circumvented that where necessary, which makes the solution slightly less elegant than it otherwise could have been.

Still, adding this directly to #350 would be a lot of feature creep, so I'll put it to the side for the moment. I already have two open PRs after all...

@Lucas-C
Copy link
Member

Lucas-C commented Mar 19, 2022

Thank you @gmischler for those excellents suggestions!

Nitpick C: If we want the code to be efficient, we should not use s += "..." all the time, but rather add the substrings to a list and do a s = ' '.join(sl) at the end.

I'm really not sure about that... I think calling self._out should actually be faster, because it uses a bytearray internally. cf. issue #164 in reingart/pyfpdf.

You're right, we just get your already-open PRs merged first.

@nova-03
Copy link
Author

nova-03 commented Mar 19, 2022

Thank you both @gmischler and @Lucas-C for working on this issue!

Although we have a workaround, I'm still interested in getting this issue fixed since after this bug is being provoked also the unbreakable() does not work in our code. (There is no exception thrown but output() generates an empty document. We tried to create a trivial code example with this outcome, but had no success. - so no bug report on this.)

Can you please give me a rough estimation when we will have the fix in the master branch?

@Lucas-C
Copy link
Member

Lucas-C commented Mar 20, 2022

the unbreakable() does not work in our code. There is no exception thrown but output() generates an empty document. We tried to create a trivial code example with this outcome, but had no success. - so no bug report on this

Damn. If ever you can get some minimal code reproducing this, I'd be curious to see it

Can you please give me a rough estimation when we will have the fix in the master branch?

I'd say within a month or so.

@nova-03
Copy link
Author

nova-03 commented Mar 20, 2022

the unbreakable() does not work in our code. There is no exception thrown but output() generates an empty document. We tried to create a trivial code example with this outcome, but had no success. - so no bug report on this

Damn. If ever you can get some minimal code reproducing this, I'd be curious to see it

I've finally succeeded to create a minimal example - which uses built-in font and has markdown set to False:

    pdf = FPDF('P', 'mm', 'A4')
    pdf.set_auto_page_break(True, 20)
    pdf.add_page()

    pdf.set_font('Helvetica', '', 10)

    data = (
        ("First name", "Last name", "Age", "City"),
        ("Jules", "Smith", "34", "San Juan"),
        ("Mary", "Ramos", "45", "Orlando"),
        ("Carlson", "Banks", "19", "Los Angeles"),
        ("Lucas", "Cimon", "31", "Angers"),
        ("Jules", "Smith", "34", "San Juan"),
        ("Mary", "Ramos", "45", "Orlando"),
        ("Carlson", "Banks", "19", "Los Angeles"),
        ("Lucas", "Cimon", "31", "Angers"),
        ("Jules", "Smith", "34", "San Juan"),
        ("Mary", "Ramos", "45", "Orlando"),
        ("Carlson", "Banks", "19", "Los Angeles"),
        ("Lucas", "Cimon", "31", "Angers"),
        )
    l_height = pdf.font_size * 1.2
    cell_width = pdf.epw / 4
    no_of_lines_list = []

    for row in data:
        max_no_of_lines_in_cell = 1
        for cell in row:
            result = pdf.multi_cell(cell_width, l_height, cell, border=1, align='L', ln=3, max_line_height=l_height, split_only=True)
            no_of_lines_in_cell = len(result)
            if no_of_lines_in_cell > max_no_of_lines_in_cell:
                max_no_of_lines_in_cell = no_of_lines_in_cell
        no_of_lines_list.append(max_no_of_lines_in_cell)

    for j, row in enumerate(data):
        cell_height = no_of_lines_list[j] * l_height
        for cell in row:
            if j == 0:
                pdf.multi_cell(cell_width, cell_height, '**' + cell + '**', border=1, fill=False, align='L', ln=3, max_line_height=l_height, markdown=False)
            else:
                pdf.multi_cell(cell_width, cell_height, cell, border=1, align='L', ln=3, max_line_height=l_height)
        pdf.ln(cell_height)

    pdf.ln()

    with pdf.unbreakable() as pdf:
        for i in range(4):
            for row in data:
                max_no_of_lines_in_cell = 1
                for cell in row:
                    result = pdf.multi_cell(cell_width, l_height, cell, border=1, align='L', ln=3, max_line_height=l_height, split_only=True)
                    no_of_lines_in_cell = len(result)
                    if no_of_lines_in_cell > max_no_of_lines_in_cell:
                        max_no_of_lines_in_cell = no_of_lines_in_cell
                no_of_lines_list.append(max_no_of_lines_in_cell)

            for j, row in enumerate(data):
                cell_height = no_of_lines_list[j] * l_height
                for cell in row:
                    if j == 0:
                        pdf.multi_cell(cell_width, cell_height, '**' + cell + '**', border=1, fill=False, align='L', ln=3, max_line_height=l_height, markdown=False)
                    else:
                        pdf.multi_cell(cell_width, cell_height, cell, border=1, align='L', ln=3, max_line_height=l_height)
                pdf.ln(cell_height)

    return pdf.output()

The code above works fine if the table in the unbreakable context has been rendered only three or less times.

Can you please give me a rough estimation when we will have the fix in the master branch?

I'd say within a month or so.

Great, thank you.

@Lucas-C
Copy link
Member

Lucas-C commented Mar 22, 2022

Thank you for the reproducing case @nova-03

I found a fix regarding this issue with unbrekable(): #371

Let's jump to this other thread if any more discussion is needed, and keep this issue about the original Markdown-related problem.

@nova-03
Copy link
Author

nova-03 commented Mar 22, 2022

Thank you for the reproducing case @nova-03

You are welcome.
Thank you for taking your time to look at this and fix it.

I found a fix regarding this issue with unbrekable(): #371

Great!

Let's jump to this other thread if any more discussion is needed, and keep this issue about the original Markdown-related problem.

Agree. Actually I also wanted to open a separate issue for this, but was not sure what is your preference and if it is closely tied to markdwon problem since this is where we discovered it. :)

@Lucas-C
Copy link
Member

Lucas-C commented Apr 12, 2022

The PR fixing this has just been merged

@nova-03
Copy link
Author

nova-03 commented Apr 12, 2022

I can confirm, that the fix in the PR works great. Thank you both @Lucas-C and @gmischler!

But the PDF version is now set back to 1.3.
Has this been done deliberately, or is it a side effect of the merge?

@Lucas-C
Copy link
Member

Lucas-C commented Apr 12, 2022

This is the default version of PDFs producedby fpdf2.
You can change it through the .pdf_version property

@nova-03
Copy link
Author

nova-03 commented Apr 13, 2022

Yes, that's what I've thought, but when I implement a header which contains a SVG, then the version is automatically set to 1.4. If a header contains a PNG, then the version is left to 1.3.
Is this a feature or a bug?

Here the simple implementation of a header in a subclass:

class PDF(FPDF, HTMLMixin):

    def header(self):
        self.set_xy(100, 10)
        #self.image('fa_app/static/logo.png', 165, 5, 40, 0, '', 'https://test.com')
        self.image('fa_app/static/logo.svg', 165, 5, 40, 0, '', 'https://test.com')
        self.set_y(20)

@Lucas-C
Copy link
Member

Lucas-C commented Apr 13, 2022

Oh, OK, I understand.

Whenever a drawing_context is used, we bump the minimal PDF version to 1.4:
https://github.com/PyFPDF/fpdf2/blob/2.5.2/fpdf/fpdf.py#L1012

The is because the drawing API makes use of features (notably transparency and blending modes) that were introduced in PDF 1.4.

You can still reset the document PDF version to 1.3 if you wish, before calling .output()

@nova-03
Copy link
Author

nova-03 commented Apr 13, 2022

Thanks for making this clear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants