Enhanced block rendering (floats, collapsing margins...) #299

Merged 10 commits on Jul 13, 2019

poire-z (Contributor) commented Jul 13, 2019

Adds alternative block rendering code, which provides the following features/enhancements over legacy:

  • better/proper handling of forced and unforced page splits (no more break between a top border and the first text line)
  • collapsing vertical margins
  • "auto" and negative vertical and horizontal margins
  • block widths and heights (traditional or W3C box model)
  • support for floats and clear
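
To make the "traditional or W3C box model" item concrete, here is a minimal sketch of how a CSS width maps to box widths under each model. `BoxMetrics`, `borderBoxWidth()` and `contentBoxWidth()` are hypothetical names used only for illustration; they are not crengine functions.

```cpp
#include <algorithm>

// Hypothetical helper type, not crengine code: horizontal metrics of one
// block, all in pixels.
struct BoxMetrics {
    int cssWidth;      // value of the CSS "width" property
    int paddingLeft;
    int paddingRight;
    int borderLeft;
    int borderRight;
};

// Sum of horizontal paddings and borders.
int horizontalExtras(const BoxMetrics& b) {
    return b.paddingLeft + b.paddingRight + b.borderLeft + b.borderRight;
}

// Width of the border box (content + paddings + borders).
int borderBoxWidth(const BoxMetrics& b, bool useW3cBoxModel) {
    // W3C model: "width" is the content width, so paddings and borders
    // are added to it. Traditional model: "width" already includes them.
    return useW3cBoxModel ? b.cssWidth + horizontalExtras(b) : b.cssWidth;
}

// Width left for the content itself.
int contentBoxWidth(const BoxMetrics& b, bool useW3cBoxModel) {
    return useW3cBoxModel ? b.cssWidth
                          : std::max(0, b.cssWidth - horizontalExtras(b));
}
```

With a 200px width, 10px paddings and 2px borders on each side, the W3C model gives a 224px border box around 200px of content, while the traditional model gives a 200px border box around 176px of content.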

(Some previous discussions about this happened in #234 (comment) and #294.)

The available features can be toggled with the following flags (lvrend.h):

// Enhanced rendering flags
ENHANCED
ALLOW_PAGE_BREAK_WHEN_NO_CONTENT   // Allow consecutive page breaks when only separated
                                   // by margin/padding/border.
// Vertical margins
COLLAPSE_VERTICAL_MARGINS          // Collapse vertical margins
ALLOW_VERTICAL_NEGATIVE_MARGINS    // Allow individual negative margins in the calculation, the
                                   // final collapsed margin is ensured to be zero or positive.
ALLOW_NEGATIVE_COLLAPSED_MARGINS   // Allow the final vertical collapsed margin to be negative
                                   // (may mess with page splitting and text selection).
// Horizontal margins
ENSURE_MARGIN_AUTO_ALIGNMENT       // Ensure CSS "margin: auto", for aligning blocks.
ALLOW_HORIZONTAL_NEGATIVE_MARGINS  // Allow negative margins (otherwise, they are set to 0)
ALLOW_HORIZONTAL_BLOCK_OVERFLOW    // Allow block content to overflow its block container.
ALLOW_HORIZONTAL_PAGE_OVERFLOW     // Allow block content to overflow the page rect, showing
                                   // in the margin, and possibly clipped out.
// Widths and heights
USE_W3C_BOX_MODEL                  // Use W3C box model (CSS width and height do not include
                                   // paddings and borders)
ALLOW_STYLE_W_H_ABSOLUTE_UNITS     // Allow widths and heights in absolute units (when ensured)
ENSURE_STYLE_WIDTH                 // Ensure CSS widths and heights on all elements (otherwise
ENSURE_STYLE_HEIGHT                // only on <HR> and images, and when sizing floats).
// Floats
WRAP_FLOATS                        // Wrap floats in an internal floatBox element.
PREPARE_FLOATBOXES                 // Avoid style hash mismatch when toggling FLOAT_FLOATBOXES,
                                   // but make embedded floats inline when no more floating.
FLOAT_FLOATBOXES                   // Actually render floatBoxes floating.
// These 2, although allowing a more correct rendering of floats, can impact drawing performance and text/link selection:
DO_NOT_CLEAR_OWN_FLOATS            // Prevent blocks from clearing their own floats.
ALLOW_EXACT_FLOATS_FOOTPRINTS      // When 5 or fewer outer floats impact a final block,
                                   // store their ids instead of the 2 top left/right
                                   // rectangles, allowing a staircase-like text layout.
// Enable everything
FULL_FEATURED
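
As an illustration of how the three vertical margin flags above interact, here is a minimal sketch of the CSS 2.1 collapsing rule (largest positive margin plus most negative margin). `collapseVerticalMargins()` and its parameters are hypothetical names, and treating a disallowed negative margin as 0 is an assumption of this sketch, not something taken from the crengine code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch, not crengine code: collapse a run of adjoining
// vertical margins (in pixels) into a single margin.
int collapseVerticalMargins(const std::vector<int>& margins,
                            bool allowNegativeMargins,     // ALLOW_VERTICAL_NEGATIVE_MARGINS
                            bool allowNegativeCollapsed) { // ALLOW_NEGATIVE_COLLAPSED_MARGINS
    int maxPositive = 0;  // largest positive margin seen
    int minNegative = 0;  // most negative margin seen
    for (int m : margins) {
        if (m >= 0) {
            maxPositive = std::max(maxPositive, m);
        } else if (allowNegativeMargins) {
            minNegative = std::min(minNegative, m);
        }
        // Assumption: a negative margin that is not allowed counts as 0.
    }
    int collapsed = maxPositive + minNegative;
    if (collapsed < 0 && !allowNegativeCollapsed) {
        collapsed = 0;  // final collapsed margin kept zero or positive
    }
    return collapsed;
}
```

For margins of 10 and -30, this gives 0 when only individual negative margins are allowed, -20 when negative collapsed margins are allowed too, and 10 when negative margins are not allowed at all.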

We'll provide 4 sets of features from frontend code, currently set as:

--                                               legacy flat book web
-- ENHANCED                           0x00000001          x    x   x
-- ALLOW_PAGE_BREAK_WHEN_NO_CONTENT   0x00000002                   x
--
-- COLLAPSE_VERTICAL_MARGINS          0x00000010          x    x   x
-- ALLOW_VERTICAL_NEGATIVE_MARGINS    0x00000020          x    x   x
-- ALLOW_NEGATIVE_COLLAPSED_MARGINS   0x00000040                   x
--
-- ENSURE_MARGIN_AUTO_ALIGNMENT       0x00000100               x   x
-- ALLOW_HORIZONTAL_NEGATIVE_MARGINS  0x00000200                   x
-- ALLOW_HORIZONTAL_BLOCK_OVERFLOW    0x00000400                   x
-- ALLOW_HORIZONTAL_PAGE_OVERFLOW     0x00000800                   x
--
-- USE_W3C_BOX_MODEL                  0x00001000          x    x   x
-- ALLOW_STYLE_W_H_ABSOLUTE_UNITS     0x00002000                   x
-- ENSURE_STYLE_WIDTH                 0x00004000               x   x
-- ENSURE_STYLE_HEIGHT                0x00008000                   x
--
-- WRAP_FLOATS                        0x00010000          x    x   x
-- PREPARE_FLOATBOXES                 0x00020000          x    x   x
-- FLOAT_FLOATBOXES                   0x00040000               x   x
-- DO_NOT_CLEAR_OWN_FLOATS            0x00100000               x   x
-- ALLOW_EXACT_FLOATS_FOOTPRINTS      0x00200000               x   x

local BLOCK_RENDERING_FLAGS = {
    0x00000000, -- legacy block rendering
    0x00030031, -- flat mode (with prepared floatBoxes, so inlined, to avoid display hash mismatch)
    0x00375131, -- book mode (floating floatBoxes, limited widths support)
    0x7FFFFFFF, -- web mode, all features/flags
}
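
For readers who want to see how these masks decompose, here is a small C++ sketch that rebuilds the flat and book values from the bit values listed in the table. The namespace `brflags` and the helper `hasFlag()` are hypothetical; the flag names and values are copied from the table above.

```cpp
#include <cstdint>

namespace brflags {  // hypothetical namespace, for illustration only

// Bit values as listed in the table above.
constexpr std::uint32_t ENHANCED                        = 0x00000001;
constexpr std::uint32_t COLLAPSE_VERTICAL_MARGINS       = 0x00000010;
constexpr std::uint32_t ALLOW_VERTICAL_NEGATIVE_MARGINS = 0x00000020;
constexpr std::uint32_t ENSURE_MARGIN_AUTO_ALIGNMENT    = 0x00000100;
constexpr std::uint32_t USE_W3C_BOX_MODEL               = 0x00001000;
constexpr std::uint32_t ENSURE_STYLE_WIDTH              = 0x00004000;
constexpr std::uint32_t WRAP_FLOATS                     = 0x00010000;
constexpr std::uint32_t PREPARE_FLOATBOXES              = 0x00020000;
constexpr std::uint32_t FLOAT_FLOATBOXES                = 0x00040000;
constexpr std::uint32_t DO_NOT_CLEAR_OWN_FLOATS         = 0x00100000;
constexpr std::uint32_t ALLOW_EXACT_FLOATS_FOOTPRINTS   = 0x00200000;

// True when every bit of `flag` is set in `mask`.
constexpr bool hasFlag(std::uint32_t mask, std::uint32_t flag) {
    return (mask & flag) == flag;
}

// Flat mode: floats are wrapped and prepared, but not floating.
constexpr std::uint32_t FLAT_MODE =
    ENHANCED | COLLAPSE_VERTICAL_MARGINS | ALLOW_VERTICAL_NEGATIVE_MARGINS |
    WRAP_FLOATS | PREPARE_FLOATBOXES;
static_assert(FLAT_MODE == 0x00030031, "matches the flat mode value");

// Book mode: flat mode plus floating floatBoxes and limited widths support.
constexpr std::uint32_t BOOK_MODE =
    FLAT_MODE | ENSURE_MARGIN_AUTO_ALIGNMENT | USE_W3C_BOX_MODEL |
    ENSURE_STYLE_WIDTH | FLOAT_FLOATBOXES | DO_NOT_CLEAR_OWN_FLOATS |
    ALLOW_EXACT_FLOATS_FOOTPRINTS;
static_assert(BOOK_MODE == 0x00375131, "matches the book mode value");

}  // namespace brflags
```

Note that, as written, 0x00030031 does not include the 0x00001000 (USE_W3C_BOX_MODEL) bit, although the table marks that flag for flat mode; the sketch follows the numeric value.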

The code contains many comments about the specs and my observations.
A few more here:

There are 2 different sections of code for positioning floats, because there are two contexts for floats:

  • block floats: DIVs among DIVs, that don't immediately interact with any text.
  • "embedded" floats: SPANs inside paragraph/formatted text, that do interact with previously laid out text and upcoming text, and whose positioning may need to be delayed.

Given how crengine works:

  • Block floats, once rendered, are absolutely positioned and their x/y saved in the cache.
  • Final blocks (paragraphs of text) are too, but every time they are used (for drawing, or for searching or selecting text or links), they are formatted again. So are embedded floats, which means their layout must be reproducible between the first rendering and subsequent ones.

Initially, I had two limitations that made things simple:

  • blocks and paragraphs did "clear" their own floats, so there was no overflow and drawing was straightforward.
  • the footprint of upper block floats over a final paragraph block was simply 2 rectangles, one at top left, one at top right, which can leave blank holes in the text when multiple small outer floats are merged into a single bounding rectangle. These footprints need to be saved in the cache so they are known when that paragraph is formatted again.

Removing these 2 limitations needed more complex code (which makes drawing and selection less straightforward); it can be enabled with DO_NOT_CLEAR_OWN_FLOATS and ALLOW_EXACT_FLOATS_FOOTPRINTS. There should be no reason after that to want floats without these :) but keeping them toggleable may help identify which feature code is at play when debugging.
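
The "blank holes" effect of the 2-rectangle footprint can be sketched as follows, for left floats only. `Rect`, `mergedTopLeftFootprint()`, `leftInsetMerged()` and `leftInsetExact()` are hypothetical names, and the sketch assumes the container's left edge is at x = 0; it is not the actual lvrend.cpp/lvtextfm.cpp logic.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical type: a float rectangle in container coordinates
// (right and bottom are exclusive).
struct Rect {
    int left, top, right, bottom;
};

// Simple footprint: the bounding rectangle of all left floats.
Rect mergedTopLeftFootprint(const std::vector<Rect>& leftFloats) {
    assert(!leftFloats.empty());
    Rect merged = leftFloats.front();
    for (const Rect& f : leftFloats) {
        merged.left = std::min(merged.left, f.left);
        merged.top = std::min(merged.top, f.top);
        merged.right = std::max(merged.right, f.right);
        merged.bottom = std::max(merged.bottom, f.bottom);
    }
    return merged;
}

// Where a text line at vertical position y may start with the merged footprint.
int leftInsetMerged(const Rect& merged, int y) {
    return (y >= merged.top && y < merged.bottom) ? merged.right : 0;
}

// Where it may start when each float's own rectangle is known
// (staircase-like layout).
int leftInsetExact(const std::vector<Rect>& leftFloats, int y) {
    int inset = 0;
    for (const Rect& f : leftFloats) {
        if (y >= f.top && y < f.bottom) {
            inset = std::max(inset, f.right);
        }
    }
    return inset;
}
```

With a 100px-wide float stacked above a 40px-wide one, a line next to the lower float starts at x = 100 with the merged footprint, but at x = 40 with exact footprints; the 60px difference is the blank hole.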

The following screenshot illustrates these limitations, as the 2nd one still applies when more than 5 floats are involved (1st part, with the footprint rectangles in red), but not when there are 5 or fewer (2nd part):

[screenshot: float_footprints]

Everything was checked against Firefox (and Prince for page breaks), and we render nearly everything quite similarly, including floats within floats or table cells:

[screenshot]

There are still a few edge cases where we don't follow the specs:

  • when floats are involved, we may emit the collapsed vertical margin too early (because we need the relation between outer floats and inner text to be fixed when rendering the text)
  • when a block container contains only floats, and so gets a 0 height, we may push some vertical margins early, and drop the margins of subsequent involved blocks until some real non-float content is emitted.
  • after a forced page break, Prince does not keep the top margin (of a H1 for example, which looks better to me), while the specs say we should. So we do keep it on the page.
  • negative collapsed vertical margins break text selection
  • in-page footnotes may split a page in the middle of a float. We otherwise try to avoid that as much as we can, although if a bunch of floats are stacked above each other and their boundaries do not coincide with the boundaries of the surrounding text lines, some may be cut to split pages.

I'll update the Wikipedia EPUB stylesheet to have a different set of styles depending on the block rendering mode, so we can decide to have thumbnails floating or not, with a fixed width or not, which looks pretty neat:

[screenshot: wikipedia1]

Thumbnail galleries can be made floating, but it can look a bit ugly (but that's how floats work :)

[screenshot: wikigallery2]

It would be better to use display: inline-block for these, which shouldn't be too hard to implement after all this.

Should allow closing:
koreader/koreader#2652 Floating image with caption does not respect width/float
koreader/koreader#2843 CSS auto margins are ignored, cannot center element
koreader/koreader#2858 CSS: Margins do not collapse properly
koreader/koreader#2878 Negative margins are ignored
koreader/koreader#3432 Some CSS weaknesses

Sharing my huge test-float-misc.html with many test cases (as an HTML file + images, and not an EPUB, so we can easily load it in browsers for comparison): test-float-misc.zip

(The "EBR5/9 Implement enhanced renderBlockElement" commit diff is a bit ugly because it mixes DrawBorder (which I didn't touch) and my FlowState/RenderBlockElementEnhanced code. It is better to just read the added code in lvrend.cpp without the diff, starting with FlowState.)

Dumping a few URLs I kept for reference, or just had a look at for early inspiration:

Page breaks:
https://www.w3.org/TR/CSS2/page.html#allowed-page-breaks
https://www.w3.org/TR/CSS22/page.html#page-breaks
https://www.w3.org/TR/css-break-3/
Previous crengine work on page breaks: #33 #40 #49

Collapsing margins:
https://www.w3.org/TR/CSS21/box.html#collapsing-margins
https://www.w3.org/TR/CSS21/visudet.html#Computing_widths_and_margins
https://www.w3.org/TR/css-break-3/#break-margins
https://www.w3.org/TR/css-break-4/#propdef-margin-break
https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Box_Model/Mastering_margin_collapsing
https://www.princexml.com/forum/topic/462/margin-collision-and-page-break-before (not always appreciated)

Floats:
https://pavpanchekha.com/blog/css-floats.html
https://webkit.org/blog/118/webcore-rendering-v-floats/
http://book.mixu.net/css/1-positioning.html
https://bugzilla.mozilla.org/show_bug.cgi?id=630181
Other float handling implementations:
https://github.com/philborlin/CSSBox/blob/master/src/main/java/org/fit/cssbox/layout/FloatList.java
https://github.com/silexlabs/Cocktail/blob/master/cocktail/core/floats/FloatsManager.hx
https://github.com/litehtml/litehtml (html_tag.cpp, types.h)
http://source.netsurf-browser.org/netsurf.git/tree/content/handlers/html/layout.c

By keeping track of the static buffer being used by an upper LVFormatter,
and using dynamic buffers in re-entrant calls if it is.
Adds BLOCK_RENDERING_* flags and macros to allow
enabling/disabling various block rendering features.
Saved in a global gRenderBlockRenderingFlags variable,
made part of the global hash so a re-rendering is
triggered on change.

Main available features/enhancements over legacy:
- better/proper handling of forced and unforced page splits
  (no more break between a top border and the first text line)
- collapsing vertical margins
- "auto" and negative vertical and horizontal margins
- block widths and heights (traditional or W3C box model)
- support for floats and clear
Also add support for CSS3 "break-before" property names
(synonym of the CSS2 "page-break-before") and friends.
Also enable parsing negative margins.
initNodeRendMethod() and lvrend.cpp setNodeStyle():
handle floats, wrap them in a <floatBox> element, update
the ->float_ and ->display property if needed, and decide
the appropriate rendering method of nodes.

getNodeListMarker(): handle possible new floatBox between UL > LI
lvstsheet.cpp: skip floatBox nodes (like autoBoxing nodes are) in
parent/ancestor rules checking.
It was storing the border box top left position (relative to
parent container border box), and its width and height.
Add slots to also store:
- various useful flags.
- for erm_final nodes, the inner content box top and left
  position (relative to this node border box) and width.
- some extra slots to store outer floats footprint over
  an erm_final node.
- top and bottom overflows (when inner floats overflow
  this node's box), for erm_final and erm_block nodes.

These will help rendering, drawing, and elementFromPoint(),
createXPointer(pt) and similar functions.

This increases RenderRectAccessor size by a factor of 4, but
it seems to compress really well as most of these slots will
stay zero in regular nodes when no floats are involved.
Renamed renderBlockElement() to renderBlockElementLegacy().
Added new renderBlockElementEnhanced() with the new block
rendering code, tunable with BLOCK_RENDERING_* flags.
Added a wrapper renderBlockElement() to decide which of
renderBlockElementLegacy() or renderBlockElementEnhanced()
to use depending on gRenderBlockRenderingFlags.
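
A minimal sketch of the wrapper idea described above. The real functions take rendering context, node and geometry arguments that are omitted here; the stand-ins below only report which path was taken. The constant name `BLOCK_RENDERING_ENHANCED` and the test on the ENHANCED bit are assumptions of this sketch.

```cpp
#include <cstdint>
#include <string>

// Value of the ENHANCED flag from the flags table; the constant name is
// hypothetical.
constexpr std::uint32_t BLOCK_RENDERING_ENHANCED = 0x00000001;

// Stand-in for the global flags variable named in the commit message.
std::uint32_t gRenderBlockRenderingFlags = 0;

// Stand-ins for the two rendering paths (real signatures omitted).
std::string renderBlockElementLegacy() { return "legacy"; }
std::string renderBlockElementEnhanced() { return "enhanced"; }

// Wrapper: pick the rendering path from the global flags.
// Assumption: the decision is made on the ENHANCED bit alone.
std::string renderBlockElement() {
    if (gRenderBlockRenderingFlags & BLOCK_RENDERING_ENHANCED) {
        return renderBlockElementEnhanced();
    }
    return renderBlockElementLegacy();
}
```

With the flags set to 0x00000000 (legacy) the wrapper takes the legacy path; with any of the other three presets, which all have the ENHANCED bit set, it takes the enhanced path.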

getRenderedWidths(): handle measurement rules depending
on selected BLOCK_RENDERING_* flags.

Added a FlowState object (block formatting context manager) to
be used by renderBlockElementEnhanced() that deals with:
- vertical margins collapsing.
- block floats positioning and clearing, and float overflows
  of their container.
- allow or not page splitting (in these margins and regular text
  lines) by being a proxy to lvpagesplitter's context.AddLine().
- transfer of block floats footprints to erm_final nodes formatted
  by lvtextfm.
- transfer of overflowing embedded floats from lvtextfm to main flow

lvtextfm.cpp: accept BlockFloatFootprint (implementation is part
of next commit).

lvpagesplitter.cpp:
- handle lines fully or partially backward (can happen when negative
  collapsed margins are enabled).
- handle new RN_SPLIT_DISCARD_AT_START flag for margins after an
  unforced break.

ldomNode::renderFinalBlock(): store/restore BlockFloatFootprint
in RenderRectAccessor, as this function is called on many occasions
after rendering.
ldomNode::getAbsRect(): adds inner= parameter to optionally get the
abs rect of the content box instead of the border box.

Tables: some tweaks when called from enhanced_rendering.

Fix (in legacy and enhanced modes): when list-style-position=inside
and marker propagated to first erm_final child, store the LI node
index in RenderRectAccessor so it is cached (unlike previous hack,
which wasn't cached, so markers weren't displayed on next book loads
from cache).
Handle embedded floats (floats defined inside erm_final nodes, whose
positioning interacts with surrounding text), and <BR clear=>.

Handle BlockFloatFootprint (passed from RenderBlockElementEnhanced,
and stored in RenderRectAccessor) when formatting or reformatting
from cache.

alignLine(): small rewrite because of floats.
DrawDocument(): draw blocks with overflows (overflowing their
container), possibly switching to 2-steps drawing (background
first, then content) if needed.

LFormattedText::Draw(): handle drawing of embedded floats,
with adjusted native selection marks (when text selection
is in progress).
Update the various methods used to get an XPointer to the
element at a screen/document point/page, or a screen Rect
from an XPointer/XRange, to handle and navigate through floats
and boxes' overflows:

    LVDocView::getPageDocumentRange()
    LVDocView::getBookmark()
    ldomDocument::createXPointer(lvPoint)
    ldomXPointer::getRect()
    ldomXRange::getRectEx()
    ldomNode::elementFromPoint(lvPoint)

Make them simpler in enhanced rendering mode by using
the saved content box fields in RenderRectAccessor
(fmt.getInnerWidth() and friends).

For ldomNode::elementFromPoint(): alternate logic when in
enhanced rendering mode because of collapsed margins, also
checking in bottom overflow before checking next sibling.
New private CSS property "-cr-only-if:" to activate next
properties depending on gRenderBlockRenderingFlags features
or document type.
Conditions can be AND'ed with:
    -cr-only-if: legacy epub-document;
A condition can be negated by prefixing it with a hyphen:
    -cr-only-if: -epub-document;
It applies until the end of the rule, or until
a new "-cr-only-if:" is met.

Update epub.css:
- remove "text-align: center" from H1...H6, because it looks
  like publishers rarely specify/reset it to "text-align: left"
  even if they obviously want that (centering can be forced
  with a style tweak).
- remove "text-align: justify" from P: this should just be
  inherited (eg. from a <CENTER> container), possibly from
  our BODY "text-align: justify".
- remove "page-break-before: always" on headings (H1...H6)
  in EPUB documents (because publishers don't expect it and
  most reading software only break page on DocFragment).
  Keep it on H1 only on non-EPUB documents (and keep it
  on H2 and H3 in legacy rendering mode).
- remove horizontal margins on TABLE as a table with
  "width: 100%" would otherwise overflow block/page.
- add clear/float support for some tags/attributes like:
  <BR clear=all>, <IMG align=left>, <TABLE align=right>.

Frenzie (Member) left a comment

Most impressive.

/* Don't break page on headings in EPUBs: publishers may not expect it,
* as most EPUB renderers only break page on a new DocFragment. */
h1 {
-cr-only-if: -epub-document; /* only if NOT EPUB document */

Frenzie (Member):

This faux-property applies to the entire selector?

poire-z (Contributor, Author):

From last commit's message :)

New private CSS property "-cr-only-if:" to activate next
properties depending on gRenderBlockRenderingFlags features
or document type.
Conditions can be AND'ed with:
    -cr-only-if: legacy epub-document;
A condition can be negated by prefixing it with a hyphen:
    -cr-only-if: -epub-document;
It applies until the end of the rule, or until
a new "-cr-only-if:" is met.

poire-z (Contributor, Author):

You can combine them into a single selector.
For example, I'm experimenting with this for Wikipedia EPUBs:

body > div > div.thumb {
    /* Allow main thumbnails to float (only the main
     * ones, not the deeper ones in gallery, which is
     * complicated to get right) */
    float: right !important;
    /* Change some of their styles when floating */
    -cr-only-if: float-floatboxes;
        clear: right;
        margin:  0em 0em 0.2em 0.5em !important;
        font-size: 80% !important;
    /* Floats' inner elements' widths would still be used to
     * size this float - but if none, text length would be.
     * Ensure we have a fixed width for thumbnails when not
     * in "web" mode */
    -cr-only-if: float-floatboxes -allow-style-w-h-absolute-units;
        width: 33% !important; /* or 40% ? */
}
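
The "-cr-only-if:" semantics described above (conditions AND'ed, a leading hyphen negating one, the condition applying until the end of the rule or the next "-cr-only-if:") can be sketched as below. `onlyIfMatches()`, `applicableDeclarations()` and the `Declaration` alias are hypothetical names for illustration, not the lvstsheet.cpp implementation; the rule is modelled as an already parsed list of property/value pairs.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Evaluate the value of one "-cr-only-if:" declaration against the set of
// currently true condition names. Whitespace-separated conditions are
// AND'ed; a leading '-' negates a condition.
bool onlyIfMatches(const std::string& value,
                   const std::set<std::string>& active) {
    std::istringstream in(value);
    std::string cond;
    while (in >> cond) {
        bool negated = cond.size() > 1 && cond[0] == '-';
        std::string name = negated ? cond.substr(1) : cond;
        bool present = active.count(name) > 0;
        if (present == negated) {
            return false;  // a required condition is missing, or a negated one holds
        }
    }
    return true;
}

// One parsed declaration: (property, value).
using Declaration = std::pair<std::string, std::string>;

// Keep only the declarations of a rule that apply: a "-cr-only-if" gates
// every declaration that follows it, until the end of the rule or the
// next "-cr-only-if".
std::vector<Declaration> applicableDeclarations(
        const std::vector<Declaration>& rule,
        const std::set<std::string>& active) {
    std::vector<Declaration> kept;
    bool enabled = true;  // declarations before any "-cr-only-if" always apply
    for (const Declaration& d : rule) {
        if (d.first == "-cr-only-if") {
            enabled = onlyIfMatches(d.second, active);
            continue;
        }
        if (enabled) {
            kept.push_back(d);
        }
    }
    return kept;
}
```

Applied to the thumbnail rule above, this keeps only float when no condition holds, all five properties when only float-floatboxes holds, and everything except width when allow-style-w-h-absolute-units holds as well.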

poire-z (Contributor, Author) commented Jan 4, 2020

And only now I find this wonderful site with tons of small HTML files with sample CSS cases and documentation - that we can feed to KOReader to see how well it does!
https://www.brunildo.org/test/

Frenzie (Member) commented Jan 4, 2020

I'd imagine those are all part of the W3C test suites these days though. :-)
