Indent HTML lists correctly (Issue 1073) #1170

lcgeneralprojects · 2024-05-17T05:14:11Z

Fixes #1073
Implements paragraph indentation via adjustments of pdf.x instead of using a natural number of whitespaces.
Breaks up list items into individual paragraphs.
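To illustrate why shifting the cursor tends to be more predictable than prepending spaces, here is a minimal, self-contained sketch. It does not use the fpdf2 API: `ToyCursor`, `indent_by_x` and `indent_by_spaces` are hypothetical names, and all numbers are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class ToyCursor:
    """Hypothetical stand-in for a PDF cursor; NOT the fpdf2 API."""

    l_margin: float = 10.0  # left page margin, in user units
    x: float = 10.0  # current horizontal position


def indent_by_x(cursor: ToyCursor, level: int, indent: float) -> float:
    """Move the cursor to the left edge of a paragraph nested `level` deep."""
    cursor.x = cursor.l_margin + level * indent
    return cursor.x


def indent_by_spaces(level: int, spaces_per_level: int, space_width: float) -> float:
    """Offset obtained by prefixing spaces; it scales with the font's space width."""
    return level * spaces_per_level * space_width


cursor = ToyCursor()
# The x-based position does not depend on the current font.
x_indent = indent_by_x(cursor, level=2, indent=5.0)
# The whitespace-based offset changes as soon as the space width changes.
narrow = indent_by_spaces(level=2, spaces_per_level=4, space_width=1.0)
wide = indent_by_spaces(level=2, spaces_per_level=4, space_width=1.5)
```

In this toy model the whitespace offset differs between the two space widths, while the x-based position stays at the same place for a given nesting level.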

Checklist:

- The GitHub pipeline is OK (green), meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
- A unit test is covering the code added / modified by this PR
- This PR is ready to be merged
- In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
- A mention of the change is present in CHANGELOG.md

Not sure how to actually name the HTML2FPDF.list_pseudo_margin attribute. It is used for determining the height of the \n line created when a <ul> or <ol> starting tag is handled.

Should the re-implementation of paragraph indentation be reflected in a docstring, even though the tag_indents parameter of write_html() has not been touched? If so, where should it be placed?

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

Some debugging and polish needed.

Some variables need tweaking. Needs testing. Code reuse unsatisfactory.

Potentially significant issues with tests: 1. test_html_ln_outside_p - IndexError: list index out of range. 2. test_html_ol_ul_line_height - actual distance between lines differs slightly from expected. Code reuse unsatisfactory.

Need feedback for handling <dd> and <blockquote>. Potentially significant issues with tests: 1. test_html_ol_ul_line_height - actual distance between lines differs slightly from expected. Need feedback for whether or not the new indentation that contradicts old tests is satisfactory. Code reuse unsatisfactory. Need feedback.

Bug present: bullets are made one per line instead of one per paragraph. Saving progress before introducing a `Bullet` class.
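The "one bullet per paragraph" behaviour can be pictured with a plain-text analogy. This is only a sketch using the standard library `textwrap` module, not the code from this PR; `render_list_item` and its parameters are hypothetical.

```python
import textwrap


def render_list_item(text: str, bullet: str = "-", width: int = 20, indent: int = 2) -> list[str]:
    """Wrap one list item and emit the bullet only on its first line."""
    lines = textwrap.wrap(text, width=width)
    rendered = []
    for i, line in enumerate(lines):
        # Continuation lines get blank padding of the bullet's width,
        # so the text stays aligned without repeating the bullet.
        prefix = bullet if i == 0 else " " * len(bullet)
        rendered.append(" " * indent + prefix + " " + line)
    return rendered


item_lines = render_list_item("one two three four five six seven", width=10)
```

The buggy variant described above would correspond to using `bullet` as the prefix for every wrapped line instead of only the first.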

Feature implemented. Testing and adjustments of tests needed.

Prevented `Paragraph.top_margin` from being added to `pdf.y` for the first lines of paragraphs with bullets.

Prevented `Paragraph.top_margin` from being added to `pdf.y` for the first lines of paragraphs with bullets. `<ul>` and `<ol>` tags now create a paragraph in which the string `\n` generates a fragment of height `list_pseudo_margin`. Adjusted defaults for `li_tag_indent`.

fpdf/fpdf.py

fpdf/text_region.py

Changed `Paragraph.generate_bullet_frag()` into `generate_bullet_frag_and_tl`, and made it also generate the bullet text line. Dealing with the issue of inappropriately large distance between `<dt>` and their child `<dd>` elements when `Paragraph.top_margin` is 0.

Changed `Paragraph.generate_bullet_frag()` into `generate_bullet_frag_and_tl`, and made it also generate the bullet text line.

# Conflicts: # fpdf/html.py

Adjusted old tests.

gmischler

Nice work so far, but the devil is in the detail...

As I'm sure you've noticed, the interplay between the HTML parser, text regions, line wrapping, and rendering is non-trivial. I've added some pointers on how to fix the parts that don't quite add up yet.

fpdf/line_break.py

fpdf/html.py

fpdf/text_region.py

fpdf/html.py

test/html/test_html.py

…`list_vertical_margin`. Removed the `MultiLineBreak.indent` attribute. Added a test for long `<ol>` bullets.

…ic purposes.

# Conflicts: # CHANGELOG.md

fpdf/html.py

test/html/test_html.py

CHANGELOG.md

test/html/test_html.py

Added `test_html_list_vertical_margin`. Fixed the non-assignment of `HTML2FPDF.list_vertical_margin` when the constructor argument `list_vertical_margin` is not None.
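The kind of bug described here, where a constructor argument is silently dropped unless it is `None`, can be sketched as follows. The classes and `DEFAULT_MARGIN` are hypothetical and only illustrate the pattern; the actual code in `HTML2FPDF` may have looked different.

```python
DEFAULT_MARGIN = 2.0  # hypothetical default, for illustration only


class BuggyRenderer:
    """Hypothetical class: only the None case is handled."""

    def __init__(self, list_vertical_margin=None):
        self.list_vertical_margin = DEFAULT_MARGIN
        if list_vertical_margin is None:
            self.list_vertical_margin = DEFAULT_MARGIN
        # Bug: a caller-supplied value is never stored.


class FixedRenderer:
    """Hypothetical class: the argument is stored whenever it is given."""

    def __init__(self, list_vertical_margin=None):
        # Compare against None explicitly so that an explicit 0 is kept.
        self.list_vertical_margin = (
            DEFAULT_MARGIN if list_vertical_margin is None else list_vertical_margin
        )
```

A test along the lines of `test_html_list_vertical_margin` would presumably pass a non-default value and check that the rendered output reflects it.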

fpdf/text_region.py

test/html/test_html.py

…agraph()`. Updated relevant docstring and `TextRegion.md` documentation. Adjusted `test_bulleted_paragraphs` to remove mentions of `rel_y_displacement`, changed instances of string `"rel_x_displacement"` to `bullet_r_margin` and introduced usage of `case["bullet_r_margin"]` in the test.

gmischler

All right, just two last clean-up items left, and then this will be ready to merge!

Quite an impressive list of fixes in the change log, which resulted in many intricate details to take care of.

docs/TextRegion.md

test/html/test_html.py

lcgeneralprojects · 2024-06-15T11:45:10Z

Not sure how that happened, considering that I was accepting changes from master when merging.

gmischler · 2024-06-15T11:54:55Z

> Not sure how that happened, considering that I was accepting changes from master when merging.

I think I've merged #1198 after your last rebase. Just do another one.

gmischler · 2024-06-15T12:46:27Z

Wait, in 69c4107 you seem to have reverted all of your changes to test_html.py. You'll want to restore those, except for test_html_customize_ul().

gmischler · 2024-06-15T16:16:59Z

When you do a rebase and there are conflicts, you usually need to resolve those manually.

The files will contain marked sections that show the differences, and you can choose the right parts for each section.
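As a small aid for this step, the sketch below scans a file's text for leftover Git conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) before committing. The helper name `find_conflict_markers` and the sample text are made up for illustration.

```python
def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for lines that look like Git conflict markers."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.startswith(("<<<<<<< ", ">>>>>>> ")) or line == "=======":
            hits.append((lineno, line))
    return hits


# Made-up example of a file left in a conflicted state.
sample = "\n".join(
    [
        "def test_a():",
        "<<<<<<< HEAD",
        "    assert 1",
        "=======",
        "    assert 2",
        ">>>>>>> issue_1073",
    ]
)
markers = find_conflict_markers(sample)
```

An empty result suggests every marked section has been resolved by hand; it does not tell you whether the right side was kept.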

lcgeneralprojects · 2024-06-15T17:33:46Z

I am aware. I opted for blanket-accepting changes from master because I thought that it would resolve only that particular conflict rather than replace the entire file.

Trying to figure out how to re-do the merge without destroying the history. (Although, yeah, I'm aware that I can just copy the relevant stuff and paste it manually. I would still rather try to find out how to deal with situations like these.)

gmischler · 2024-06-15T17:38:19Z

Don't worry about the history. We flatten everything into a single commit anyway when merging a PR.

# Conflicts: # test/html/html_customize_ul.pdf # test/html/test_html.py

Correctly this time.

gmischler · 2024-06-15T20:05:39Z

Merged.

Thanks for the useful contribution, @lcgeneralprojects ! 👍

Lucas-C · 2024-06-17T07:51:26Z

@allcontributors please add @lcgeneralprojects for bug, code

allcontributors · 2024-06-17T07:51:36Z

@Lucas-C

I've put up a pull request to add @lcgeneralprojects! 🎉

andersonhc · 2024-06-17T12:18:23Z

@lcgeneralprojects

Congrats on successfully completing this pull request. This was no small feat! You changed some complex parts of the code and despite the numerous iterations you showed patience and dedication to push through.

Thank you for your hard work and perseverance. Looking forward to more collaborations in the future!

lcgeneralprojects added 13 commits May 5, 2024 05:34

intermediate commit to save progress. Debugging needed.

f794193

Feature mostly implemented.

20c035e

Some debugging and polish needed.

Fixed the issue with indentation of nested lists.

eb93711

Feature implemented.

f8f17a5

Some variables need tweaking. Needs testing. Code reuse unsatisfactory.

Feature implemented.

77a1a31

Some variables need tweaking. Needs testing. Code reuse unsatisfactory.

Feature implemented for <li>.

e80d8d9

Potentially significant issues with tests: 1. test_html_ln_outside_p - IndexError: list index out of range. 2. test_html_ol_ul_line_height - actual distance between lines differs slightly from expected. Code reuse unsatisfactory.

Merge branch 'refs/heads/master' into issue_1073

dbcce1f

Issue mostly fixed.

fb59849

Bug present: bullets are made one per line instead of one per paragraph. Saving progress before introducing a `Bullet` class.

Issue fixed.

bc1fab8

Feature implemented. Testing and adjustments of tests needed.

Changed <ol> bullets to not introduce an extra whitespace.

d487f7d

Added the li_pseudo_margin attribute to HTML2FPDF.

2caa750

Prevented `Paragraph.top_margin` from being added to `pdf.y` for the first lines of paragraphs with bullets.

lcgeneralprojects commented May 17, 2024

fpdf/fpdf.py (outdated, resolved)

Merge branch 'refs/heads/master' into issue_1073

070a41d

lcgeneralprojects commented May 17, 2024

fpdf/text_region.py (outdated, resolved)

lcgeneralprojects added 6 commits May 19, 2024 11:19

Merge branch 'refs/heads/master' into issue_1073

4ab204e

Fixed the inappropriate TextMode importation.

1e1eb29

Changed `Paragraph.generate_bullet_frag()` into `generate_bullet_frag_and_tl`, and made it also generate the bullet text line.

Merge remote-tracking branch 'origin/issue_1073' into issue_1073

dc3d8f8

# Conflicts: # fpdf/html.py

Introduced new test test_html_long_list_entries.

3f56811

Adjusted old tests.

Adjusted Changelog.md and relevant docstrings.

ce7cb9b

lcgeneralprojects marked this pull request as ready for review May 20, 2024 07:37

lcgeneralprojects requested a review from gmischler as a code owner May 20, 2024 07:37

gmischler changed the title ~~Issue 1073~~ Indent HTML lists correctly (Issue 1073) May 20, 2024

gmischler requested changes May 20, 2024

gmischler mentioned this pull request May 24, 2024

Nested HTML lists start with a newline #1148

Closed

lcgeneralprojects added 3 commits May 25, 2024 13:55

Changed the name of the relevant variables from list_top_margin to …

24626f9

…`list_vertical_margin`. Removed the `MultiLineBreak.indent` attribute. Added a test for long `<ol>` bullets.

Adjusted html code strings in test_hmtl_long_ol_bullets for aesthet…

208e3b3

…ic purposes.

Merge branch 'refs/heads/master' into issue_1073

8cceb1d

# Conflicts: # CHANGELOG.md

Changes to test_html_measurement_units.

cc7f247

gmischler reviewed Jun 10, 2024

fpdf/html.py (outdated, resolved)

gmischler reviewed Jun 10, 2024

test/html/test_html.py (outdated, resolved)

gmischler reviewed Jun 10, 2024

CHANGELOG.md (outdated, resolved)

gmischler mentioned this pull request Jun 10, 2024

Get rid of FPDF class/instance attributes getting passed to HTML2PDF. #1198

Closed

gmischler requested changes Jun 11, 2024

test/html/test_html.py (outdated, resolved)

Adjusted CHANGELOG.md.

37a8d81

Added `test_html_list_vertical_margin`. Fixed the non-assignment of `HTML2FPDF.list_vertical_margin` when the constructor argument `list_vertical_margin` is not None.

gmischler reviewed Jun 12, 2024

fpdf/text_region.py (outdated, resolved)

gmischler reviewed Jun 12, 2024

test/html/test_html.py (outdated, resolved)

lcgeneralprojects and others added 2 commits June 12, 2024 22:05

Merge branch 'master' into issue_1073

d847eb8

gmischler approved these changes Jun 15, 2024

docs/TextRegion.md (outdated, resolved)

test/html/test_html.py (outdated, resolved)

lcgeneralprojects added 2 commits June 15, 2024 22:53

Merge branch 'refs/heads/master' into issue_1073

14ddcc4

# Conflicts: # test/html/html_customize_ul.pdf # test/html/test_html.py

Merged changes from master.

691ad2b

Correctly this time.

lcgeneralprojects force-pushed the issue_1073 branch from b880e1d to 691ad2b Compare June 15, 2024 18:04

Update TextRegion.md

2c475ab

gmischler merged commit 1547d4d into py-pdf:master Jun 15, 2024
11 checks passed

allcontributors bot mentioned this pull request Jun 17, 2024

add lcgeneralprojects as a contributor for bug, and code #1208

Merged

gmischler mentioned this pull request Jun 25, 2024

HTML: indent of lists on new line not flush #1212

Open

Lucas-C mentioned this pull request Jun 28, 2024

Split handling of HTML attributes & style CSS properties #1211

Merged

5 tasks
