Indent HTML lists correctly (Issue 1073) #1170

Merged 45 commits on Jun 15, 2024
Commits
f794193
intermediate commit to save progress. Debugging needed.
lcgeneralprojects May 5, 2024
20c035e
Feature mostly implemented.
lcgeneralprojects May 5, 2024
eb93711
Fixed the issue with indentation of nested lists.
lcgeneralprojects May 6, 2024
f8f17a5
Feature implemented.
lcgeneralprojects May 11, 2024
77a1a31
Feature implemented.
lcgeneralprojects May 11, 2024
e80d8d9
Feature implemented for <li>.
lcgeneralprojects May 12, 2024
9bddeca
Feature implemented for <li>.
lcgeneralprojects May 12, 2024
dbcce1f
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 13, 2024
fb59849
Issue mostly fixed.
lcgeneralprojects May 16, 2024
bc1fab8
Issue fixed.
lcgeneralprojects May 16, 2024
d487f7d
Changed `<ol>` bullets to not introduce an extra whitespace.
lcgeneralprojects May 16, 2024
2caa750
Added the `li_pseudo_margin`attribute to `HTML2FPDF`.
lcgeneralprojects May 17, 2024
ceaa6d3
Added the `list_pseudo_margin`attribute to `HTML2FPDF`.
lcgeneralprojects May 17, 2024
070a41d
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 17, 2024
4ab204e
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 19, 2024
7b8923c
Fixed the inappropriate `TextMode` importation.
lcgeneralprojects May 19, 2024
1e1eb29
Fixed the inappropriate `TextMode` importation.
lcgeneralprojects May 19, 2024
dc3d8f8
Merge remote-tracking branch 'origin/issue_1073' into issue_1073
lcgeneralprojects May 19, 2024
3f56811
Introduced new test `test_html_long_list_entries`.
lcgeneralprojects May 20, 2024
ce7cb9b
Adjusted `Changelog.md` and relevant docstrings.
lcgeneralprojects May 20, 2024
24626f9
Changed the name of the relevant variables from `list_top_margin` to …
lcgeneralprojects May 25, 2024
208e3b3
Adjusted html code strings in `test_hmtl_long_ol_bullets` for aesthet…
lcgeneralprojects May 25, 2024
8cceb1d
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 25, 2024
bf5f0fa
Added `self.pdf.normalize_text(bullet_string)` to `Paragraph.generate…
lcgeneralprojects May 25, 2024
82fbfda
Added `self.pdf.normalize_text(bullet_string)` to `Paragraph.generate…
lcgeneralprojects May 25, 2024
a75a948
Merge remote-tracking branch 'origin/issue_1073' into issue_1073
lcgeneralprojects May 25, 2024
fc38846
Adjusted handling of `fragment`s in the `Paragraph.generate_bullet_fr…
lcgeneralprojects May 25, 2024
2f69001
Added docstring to `Paragraph`.
lcgeneralprojects May 26, 2024
f7908e4
Used `black` on `html.py`
lcgeneralprojects May 26, 2024
5afb935
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects May 27, 2024
4e7118b
Merged changes to the branch `master` into the branch `issue_1073`.
lcgeneralprojects May 27, 2024
1ead6a3
Introduced conversion of 'magic numbers', and default tag indent and …
lcgeneralprojects May 31, 2024
619c250
Merge branch 'master' into issue_1073
lcgeneralprojects Jun 6, 2024
418f213
Introduced unit conversion for `li_tag_indent`.
lcgeneralprojects Jun 6, 2024
c6c8d8b
Updated test files
lcgeneralprojects Jun 6, 2024
18dddd7
Renamed `bullet_rel_x_displacement`, `bullet_rel_y_displacement` and …
lcgeneralprojects Jun 6, 2024
b345631
Undone changes to handling non-default values for `li_tag_indent` and…
lcgeneralprojects Jun 6, 2024
fb3305b
Requested changes to conversion of default values implemented.
lcgeneralprojects Jun 9, 2024
cc7f247
Changes to `test_html_measurement_units`.
lcgeneralprojects Jun 9, 2024
37a8d81
Adjusted `CHANGELOG.md`.
lcgeneralprojects Jun 11, 2024
f6eb85b
Added the 'bullet_r_margin' parameter to `ParagraphCollectorMixin.par…
lcgeneralprojects Jun 12, 2024
d847eb8
Merge branch 'master' into issue_1073
lcgeneralprojects Jun 14, 2024
14ddcc4
Merge branch 'refs/heads/master' into issue_1073
lcgeneralprojects Jun 15, 2024
691ad2b
Merged changes from `master`.
lcgeneralprojects Jun 15, 2024
2c475ab
Update TextRegion.md
gmischler Jun 15, 2024
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -24,13 +24,18 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
* feature to identify the Unicode script of the input text and break it into fragments when different scripts are used, improving text shaping results
* [`FPDF.image()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.image): now handles `keep_aspect_ratio` in combination with an enum value provided to `x`
* file names are mentioned in errors when `fpdf2` fails to parse a SVG image
* feature to adjust spacing before lists via the `HTML2FPDF.list_vertical_margin` attribute
### Fixed
* [`fpdf.drawing.DeviceCMYK`](https://py-pdf.github.io/fpdf2/fpdf/drawing.html#fpdf.drawing.DeviceCMYK) objects can now be passed to [`FPDF.set_draw_color()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_draw_color), [`FPDF.set_fill_color()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_fill_color) and [`FPDF.set_text_color()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.set_text_color) without raising a `ValueError`: [documentation](https://py-pdf.github.io/fpdf2/Text.html#text-formatting).
* individual `/Resources` directories are now properly created for each document page. This change ensures better compliance with the PDF specification but results in a slight increase in the size of PDF documents. You can still use the old behavior by setting `FPDF().single_resources_object = True`
* line size calculation for fragments when text shaping is used
* fixed incoherent indentation of long list entries - _cf._ [issue #1073](https://github.com/py-pdf/fpdf2/issues/1073)
* default values for `top_margin` and `bottom_margin` in `HTML2FPDF._new_paragraph()` calls are now correctly converted into chosen document units.
### Changed
* Removed an obscure and undocumented [feature](https://github.com/py-pdf/fpdf2/issues/1198) of [`FPDF.write_html()`](https://py-pdf.github.io/fpdf2/fpdf/fpdf.html#fpdf.fpdf.FPDF.write_html), which used to magically pass local variables as arguments.
* [`FPDF.table()`](https://py-pdf.github.io/fpdf2/Tables.html) now raises an error when a single row is too high to be rendered on a single page
* `HTML2FPDF.tag_indents` can now be non-integer. Indentation of HTML elements is now independent of font size and bullet strings.
* No spacing controlled by `HTML2FPDF.list_vertical_margin` is created for nested HTML `<li>` elements in contrast with prior respect for `Paragraph.top_margin` when handling `Paragraph`s created when handling `<ul>` and `<ol>` start tags.

## [2.7.9] - 2024-05-17
### Added
@@ -46,7 +51,7 @@ This can also be enabled programmatically with `warnings.simplefilter('default',
* a bug when rendering vector images with dashed lines that caused a warning message in Adobe Acrobat Reader
* ordering RTL fragments on bidirectional texts
* fixed type hint of member `level` in class [`OutlineSection`](https://py-pdf.github.io/fpdf2/fpdf/outline.html#fpdf.outline.OutlineSection) from `str` to `int`.
* SVG clipping paths being incorrectly painted - _cf._ [issue #1147](https://github.com/py-pdf/fpdf2/issues/1147)]
* SVG clipping paths being incorrectly painted - _cf._ [issue #1147](https://github.com/py-pdf/fpdf2/issues/1147)
* new translation of the tutorial in [Polski](https://py-pdf.github.io/fpdf2/Tutorial-pl.html) - thanks to @DarekRepos
### Changed
* improved the performance of `FPDF.start_section()` - _cf._ [issue #1092](https://github.com/py-pdf/fpdf2/issues/1092)
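The `Changed` entry about `HTML2FPDF.tag_indents` says indentation no longer depends on font size. A minimal sketch of why, assuming the pre-PR indent was a run of non-breaking spaces (as in the removed `"\u00a0" * self.tag_indents["li"] * self.indent` line) and the new indent is a plain length. The helper names and the `nbsp_width_ratio` value are hypothetical, chosen only for illustration:

```python
PT_PER_MM = 72 / 25.4  # PDF points per millimetre


def old_indent_pt(n_spaces, depth, font_size_pt, nbsp_width_ratio=0.278):
    """Width of an indent built from non-breaking spaces.

    nbsp_width_ratio is an assumed glyph width as a fraction of the font
    size; the real value depends on the font in use.
    """
    return n_spaces * depth * nbsp_width_ratio * font_size_pt


def new_indent_pt(indent_mm, depth):
    """Width of an indent expressed as a length, independent of the font."""
    return indent_mm * depth * PT_PER_MM


for size in (8, 12, 24):
    # The first width grows with the font size, the second stays constant.
    print(size, round(old_indent_pt(5, 1, size), 2), round(new_indent_pt(5, 1), 2))
```

With the space-based approach, doubling the font size doubles the indent; with a length-based indent, nested list text lines up regardless of the font.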
7 changes: 5 additions & 2 deletions docs/TextRegion.md
@@ -82,8 +82,11 @@ For more typographical control, you can use the following arguments. Most of tho

* text_align (Align, optional) - The horizontal alignment of the paragraph.
* line_height (float, optional) - factor by which the line spacing will be different from the font height. (default: by region)
* top_margin (float, optional) - how much spacing is added above the paragraph. No spacing will be added at the top of the paragraph if the current y position is at (or above) the top margin of the page. (Default: 0.0)
* bottom_margin (float, optional) - Those two values determine how much spacing is added below the paragraph. No spacing will be added at the bottom if it would result in overstepping the bottom margin of the page. (Default: 0.0)
* top_margin (float, optional) - how much spacing is added above the paragraph. No spacing will be added at the top of the paragraph if the current y position is at (or above) the top margin of the page. (Default: 0.0 mm)
* bottom_margin (float, optional) - how much spacing is added below the paragraph. No spacing will be added at the bottom if it would result in overstepping the bottom margin of the page. (Default: 0.0 mm)
* indent (float, optional) - determines the indentation of the paragraph. (Default: 0.0 mm)
* bullet_r_margin (float, optional) - determines the relative displacement of the bullet along the x-axis. The distance is measured between the rightmost point of the bullet and the leftmost point of the paragraph's text. (Default: 2.0 mm)
* bullet_string (str, optional) - determines the fragments and text lines of the bullet. (Default: "")
* skip_leading_spaces (bool, optional) - removes all space characters at the beginning of each line.
* wrapmode (WrapMode, optional)
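One way to read the `indent` and `bullet_r_margin` descriptions above is as simple horizontal geometry. The sketch below is an interpretation of the documented wording, not the library's rendering code; `bullet_layout`, `left_margin` and `bullet_width` are hypothetical names introduced for illustration:

```python
def bullet_layout(left_margin, indent, bullet_width, bullet_r_margin=2.0):
    """Return (bullet_x, text_x) in document units.

    Assumes the paragraph text starts at left_margin + indent and that the
    bullet's right edge sits bullet_r_margin to the left of the text.
    """
    text_x = left_margin + indent
    bullet_x = text_x - bullet_r_margin - bullet_width
    return bullet_x, text_x


# A 1.5-unit-wide bullet, 5 units of indent, page margin at x=10.
print(bullet_layout(10, 5, 1.5))
```

Under this reading, a longer bullet string moves the bullet's left edge further left, while the text's start position depends only on `indent`.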

106 changes: 80 additions & 26 deletions fpdf/html.py
@@ -16,7 +16,7 @@
from .errors import FPDFException
from .fonts import FontFace
from .table import Table
from .util import int2roman
from .util import int2roman, get_scale_factor

LOGGER = logging.getLogger(__name__)
BULLET_WIN1252 = "\x95" # BULLET character in Windows-1252 encoding
@@ -34,7 +34,7 @@
"h6": FontFace(color=(150, 0, 0), size_pt=8),
"pre": FontFace(family="Courier"),
}
DEFAULT_TAG_INDENTS = {
DEFAULT_TAG_INDENTS_MM = {
"blockquote": 0,
"dd": 10,
"li": 5,
@@ -270,8 +270,8 @@ def __init__(
self,
pdf,
image_map=None,
li_tag_indent=5,
dd_tag_indent=10,
li_tag_indent=None,
dd_tag_indent=None,
table_line_separators=False,
ul_bullet_char=BULLET_WIN1252,
li_prefix_color=(190, 0, 0),
@@ -280,6 +280,7 @@ def __init__(
warn_on_tags_not_matching=True,
tag_indents=None,
tag_styles=None,
list_vertical_margin=None,
**_,
):
"""
@@ -296,8 +297,11 @@ def __init__(
heading_sizes (dict): [**DEPRECATED since v2.7.9**] font size per heading level names ("h1", "h2"...) - Set tag_styles instead
pre_code_font (str): [**DEPRECATED since v2.7.9**] font to use for <pre> & <code> blocks - Set tag_styles instead
warn_on_tags_not_matching (bool): control warnings production for unmatched HTML tags
tag_indents (dict): mapping of HTML tag names to numeric values representing their horizontal left identation
tag_indents (dict): mapping of HTML tag names to numeric values representing their horizontal left indentation.
The indent values are in the chosen pdf document units.
tag_styles (dict): mapping of HTML tag names to colors
list_vertical_margin (float): size of margins that precede lists.
The margin value is in the chosen pdf document units.
"""
super().__init__()
self.pdf = pdf
@@ -334,8 +338,17 @@ def __init__(
self.align = ""
self.style_stack = [] # list of FontFace
self.indent = 0
self.line_height_stack = []
self.ol_type = [] # when inside a <ol> tag, can be "a", "A", "i", "I" or "1"
self.bullet = []
self.default_conversion_factor = (
get_scale_factor("mm") / self.pdf.k
) # factor for converting default values from mm to document units
if list_vertical_margin is None:
# Default value of 2 to be multiplied by the conversion factor
# for list_vertical_margin is given in mm
list_vertical_margin = 2 * self.default_conversion_factor
self.list_vertical_margin = list_vertical_margin
self.font_color = pdf.text_color.colors255
self.heading_level = None
self.heading_above = 0.2 # extra space above heading, relative to font size
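The `default_conversion_factor` introduced in this hunk divides the points-per-millimetre scale by the document's own scale `k`, so that defaults given in mm end up in document units. A standalone sketch of that arithmetic follows; `POINTS_PER_UNIT` and `mm_to_document_units` are hypothetical stand-ins for `get_scale_factor()` and `self.pdf.k`, assuming the usual 72 points per inch:

```python
# Points per unit for the usual document units (assumed: 72 pt per inch).
POINTS_PER_UNIT = {"pt": 1.0, "mm": 72 / 25.4, "cm": 72 / 2.54, "in": 72.0}


def mm_to_document_units(value_mm, document_unit):
    """Convert a default expressed in mm into the document's unit."""
    k = POINTS_PER_UNIT[document_unit]      # plays the role of pdf.k
    factor = POINTS_PER_UNIT["mm"] / k      # plays the role of default_conversion_factor
    return value_mm * factor


for unit in ("mm", "cm", "pt", "in"):
    # The 2 mm default list margin, expressed in each document unit.
    print(unit, round(mm_to_document_units(2, unit), 4))
```

For a document created in mm the factor is 1, so the defaults are unchanged; for other units the same physical distance is preserved.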
@@ -352,8 +365,11 @@ def __init__(
# "inserted" is a special attribute indicating that a cell has be inserted in self.table_row

if not tag_indents:
tag_indents = {}
if dd_tag_indent != DEFAULT_TAG_INDENTS["dd"]:
tag_indents = {
k: v * self.default_conversion_factor
for k, v in DEFAULT_TAG_INDENTS_MM.items()
}
if dd_tag_indent is not None:
warnings.warn(
(
"The dd_tag_indent parameter is deprecated since v2.7.9 "
@@ -364,7 +380,7 @@ def __init__(
stacklevel=get_stack_level(),
)
tag_indents["dd"] = dd_tag_indent
if li_tag_indent != DEFAULT_TAG_INDENTS["li"]:
if li_tag_indent is not None:
warnings.warn(
(
"The li_tag_indent parameter is deprecated since v2.7.9 "
@@ -376,11 +392,11 @@ def __init__(
)
tag_indents["li"] = li_tag_indent
for tag in tag_indents:
if tag not in DEFAULT_TAG_INDENTS:
if tag not in DEFAULT_TAG_INDENTS_MM:
raise NotImplementedError(
f"Cannot set indent for HTML tag <{tag}> (contributions are welcome to add support for this)"
)
self.tag_indents = {**DEFAULT_TAG_INDENTS, **tag_indents}
self.tag_indents = {**DEFAULT_TAG_INDENTS_MM, **tag_indents}

if not tag_styles:
tag_styles = {}
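The resolution order for tag indents in the hunks above (converted defaults, then deprecated keyword overrides, then validation and merge) can be followed in isolation. This is a simplified sketch that mirrors the diff as shown; `resolve_tag_indents` is a hypothetical free function, and it copies the caller's dict, which the diff does not do:

```python
import warnings

DEFAULT_TAG_INDENTS_MM = {"blockquote": 0, "dd": 10, "li": 5}


def resolve_tag_indents(factor, tag_indents=None, li_tag_indent=None, dd_tag_indent=None):
    """Return the effective tag indents, following the order used in the diff."""
    if not tag_indents:
        # No user mapping: convert every mm default into document units.
        tag_indents = {k: v * factor for k, v in DEFAULT_TAG_INDENTS_MM.items()}
    else:
        tag_indents = dict(tag_indents)  # copy, to leave the caller's dict untouched
    for name, value in (("dd", dd_tag_indent), ("li", li_tag_indent)):
        if value is not None:
            warnings.warn(
                f"The {name}_tag_indent parameter is deprecated, set tag_indents instead",
                DeprecationWarning,
                stacklevel=2,
            )
            tag_indents[name] = value
    for tag in tag_indents:
        if tag not in DEFAULT_TAG_INDENTS_MM:
            raise NotImplementedError(f"Cannot set indent for HTML tag <{tag}>")
    # As in the diff, keys missing from a partial mapping fall back to the mm defaults.
    return {**DEFAULT_TAG_INDENTS_MM, **tag_indents}


print(resolve_tag_indents(1.0))
print(resolve_tag_indents(1.0, tag_indents={"li": 8}))
```

Switching the deprecated parameters' defaults to `None` is what lets the code tell "not passed" apart from "passed the old default value", so the deprecation warning fires only on explicit use.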
@@ -420,7 +436,13 @@ def __init__(
)

def _new_paragraph(
self, align=None, line_height=1.0, top_margin=0, bottom_margin=0
self,
align=None,
line_height=1.0,
top_margin=0,
bottom_margin=0,
indent=0,
bullet="",
):
self._end_paragraph()
self.align = align or ""
@@ -432,6 +454,8 @@ def _new_paragraph(
skip_leading_spaces=True,
top_margin=top_margin,
bottom_margin=bottom_margin,
indent=indent,
bullet_string=bullet,
)
self.follows_trailing_space = True
self.follows_heading = False
@@ -545,10 +569,20 @@ def handle_starttag(self, tag, attrs):
parse_style(attrs)
self._tags_stack.append(tag)
if tag == "dt":
self._write_paragraph("\n")
self._new_paragraph(
line_height=(
self.line_height_stack[-1] if self.line_height_stack else None
),
)
tag = "b"
if tag == "dd":
self._write_paragraph("\n" + "\u00a0" * self.tag_indents["dd"])
self.follows_heading = True
self._new_paragraph(
line_height=(
self.line_height_stack[-1] if self.line_height_stack else None
),
indent=self.tag_indents["dd"] * (self.indent + 1),
)
if tag == "strong":
tag = "b"
if tag == "em":
@@ -659,38 +693,52 @@ def handle_starttag(self, tag, attrs):
size=tag_style.size_pt or self.font_size,
)
self.indent += 1
self._new_paragraph(top_margin=3, bottom_margin=3)
if self.tag_indents["blockquote"]:
self._write_paragraph("\u00a0" * self.tag_indents["blockquote"])
self._new_paragraph(
# Default values to be multiplied by the conversion factor
# for top_margin and bottom_margin here are given in mm
top_margin=3 * self.default_conversion_factor,
bottom_margin=3 * self.default_conversion_factor,
indent=self.tag_indents["blockquote"] * self.indent,
)
if tag == "ul":
self.indent += 1
bullet_char = (
ul_prefix(attrs["type"]) if "type" in attrs else self.ul_bullet_char
)
self.bullet.append(bullet_char)
line_height = None
if "line-height" in attrs:
try:
# YYY parse and convert non-float line_height values
line_height = float(attrs.get("line-height"))
self.line_height_stack.append(float(attrs.get("line-height")))
except ValueError:
pass
self._new_paragraph(line_height=line_height)
else:
self.line_height_stack.append(None)
if self.indent == 1:
self._new_paragraph(top_margin=self.list_vertical_margin, line_height=0)
self._write_paragraph("\u00a0")
self._end_paragraph()
if tag == "ol":
self.indent += 1
start = int(attrs["start"]) if "start" in attrs else 1
self.bullet.append(start - 1)
self.ol_type.append(attrs.get("type", "1"))
line_height = None
if "line-height" in attrs:
try:
# YYY parse and convert non-float line_height values
line_height = float(attrs.get("line-height"))
self.line_height_stack.append(float(attrs.get("line-height")))
except ValueError:
pass
self._new_paragraph(line_height=line_height)
else:
self.line_height_stack.append(None)
if self.indent == 1:
self._new_paragraph(top_margin=self.list_vertical_margin, line_height=0)
self._write_paragraph("\u00a0")
self._end_paragraph()
if tag == "li":
self._ln(2)
# Default value of 2 for h to be multiplied by the conversion factor
# in self._ln(h) here is given in mm
self._ln(2 * self.default_conversion_factor)
self.set_text_color(*self.li_prefix_color)
if self.bullet:
bullet = self.bullet[self.indent - 1]
@@ -701,9 +749,14 @@ def handle_starttag(self, tag, attrs):
bullet += 1
self.bullet[self.indent - 1] = bullet
ol_type = self.ol_type[self.indent - 1]
bullet = f"{ol_prefix(ol_type, bullet)}. "
indent = "\u00a0" * self.tag_indents["li"] * self.indent
self._write_paragraph(f"{indent}{bullet} ")
bullet = f"{ol_prefix(ol_type, bullet)}."
self._new_paragraph(
line_height=(
self.line_height_stack[-1] if self.line_height_stack else None
),
indent=self.tag_indents["li"] * self.indent,
bullet=bullet,
)
self.set_text_color(*self.font_color)
if tag == "font":
# save previous font state:
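In the `<li>` branch above, the bullet for an ordered list becomes a short string ending in a period, and the paragraph indent is the `li` indent times the nesting depth. The sketch below shows that computation with hypothetical helpers (`to_roman`, `ol_bullet`, `li_indent`) standing in for the library's `int2roman` and `ol_prefix`, whose exact behavior is not shown in this diff:

```python
from string import ascii_lowercase, ascii_uppercase

ROMAN_NUMERALS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
    (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Convert a positive integer to an upper-case Roman numeral."""
    parts = []
    for value, symbol in ROMAN_NUMERALS:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def ol_bullet(ol_type, index):
    """Build the bullet string for item number `index` (1-based) of an <ol>."""
    if ol_type == "1":
        label = str(index)
    elif ol_type == "a":
        label = ascii_lowercase[index - 1]  # only covers items 1..26
    elif ol_type == "A":
        label = ascii_uppercase[index - 1]  # only covers items 1..26
    elif ol_type == "I":
        label = to_roman(index)
    elif ol_type == "i":
        label = to_roman(index).lower()
    else:
        raise ValueError(f"unsupported <ol> type: {ol_type!r}")
    return f"{label}."  # no trailing space: spacing comes from the bullet margin


def li_indent(li_tag_indent, depth):
    """Indent of a list item's text: the per-level indent times the nesting depth."""
    return li_tag_indent * depth


print(ol_bullet("I", 4), li_indent(5, 2))
```

Because the bullet string no longer carries padding spaces, the text start position stays the same whether the bullet reads `1.` or `xviii.`.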
@@ -902,6 +955,7 @@ def handle_endtag(self, tag):
self.indent -= 1
if tag == "ol":
self.ol_type.pop()
self.line_height_stack.pop()
self.bullet.pop()
if tag == "table":
self.table.render()
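The `line_height_stack` added in this PR lets each `<li>` pick up the `line-height` of its enclosing list. A self-contained sketch of the idea, using only the standard library's `html.parser`; `ListTracker` is a hypothetical class, and unlike the diff it pushes `None` when the attribute cannot be parsed, so that every start tag has a matching entry to pop:

```python
from html.parser import HTMLParser


class ListTracker(HTMLParser):
    """Record (nesting depth, line height) for every <li> encountered."""

    def __init__(self):
        super().__init__()
        self.line_height_stack = []  # one entry per open <ul>/<ol>
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            raw = dict(attrs).get("line-height")
            try:
                value = float(raw) if raw is not None else None
            except ValueError:
                value = None  # assumption: fall back to "no override"
            self.line_height_stack.append(value)
        elif tag == "li":
            depth = len(self.line_height_stack)
            line_height = self.line_height_stack[-1] if self.line_height_stack else None
            self.items.append((depth, line_height))

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self.line_height_stack:
            self.line_height_stack.pop()


tracker = ListTracker()
tracker.feed('<ul line-height="1.5"><li>a<ol><li>b</li></ol></li><li>c</li></ul>')
print(tracker.items)
```

Keeping pushes and pops balanced matters here: if a start tag pushed nothing, the pop on the matching end tag would remove the enclosing list's entry instead.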