Better `span.markdown-embed` parity with native Obisidan print to pdf #180

dleeftink · 2024-04-17T16:56:52Z

This took a while to track down, after noticing my @media print { ... } css files where rendering differently using your plugin compared to Obsidian's native print functionality.

Issue: broken block contexts

A better (but still not ideal) solution to have span.markdown-embeds behave similarly to native Obsidian, requires nested block-level elements to be inserted after the document.body.innerHTML has been set in line 206. Namely, certain block level elements (divs, blockquotes, etc.) force a break in the block context of a parent element (most notably <p>) which messes up the resulting HTML if set using document.body.innerHTML:

<p>
  Hello
  <span class="internal-embed">
    <div>broken</div>
  </span>
  world
</p>

Leads to the following output with orphaned nodes when parsed by .innerHTML:

<p>
  Hello
  <span class="internal-embed"></span>
</p>
<div>broken</div>
world
<p></p>

A possible solution is base64 encoding the content of span.markdown-embed elements starting from the last element by changing

obsidian-better-export-pdf/src/render.ts

Lines 234 to 246 in 16229e5

    
           export function fixEmbedSpan(doc: Document) { 
        
             const spans = doc.querySelectorAll("span.markdown-embed"); 
        
             spans.forEach((span: HTMLElement) => { 
        
               const newDiv = document.createElement("div"); 
        
               copyAttributes(newDiv, span.attributes); 
        
               newDiv.innerHTML = span.innerHTML; 
        
               span.parentNode?.replaceChild(newDiv, span); 
        
             }); 
        
           }

to this encoding function and then recursively decoding the base64 encoded elements after document.body.innerHTML has been set:

obsidian-better-export-pdf/src/modal.ts

Line 206 in 16229e5

    
                   document.body.innerHTML = decodeURIComponent(\`${encodeURIComponent(this.doc.body.innerHTML)}\`);

using this decoding function.

Workaround

This workaround maintains nested block level elements without forcing breaks in the block context of the parent element and messing up the layout. The following functions are added

encodeEmbeds(doc)
- Starts from the last element using [...document.querySelectorAll('span.markdown-embeds')].reverse();
- Encodes the innerHTML of span.markdown-embed elements using span.innerHTML = btoa(span.innerHTML)
decodeAndReplaceEmbed(element)
- Utility within the appendWebview function
- Decodes the innerHTML of span.markdown-embeds recursively

Note: the encodeEmbeds(doc) function may need to be rewritten to encode the deepest nested (sibling) elements first instead of starting with the last element. However, simply reversing the order of iteration seems fine in my testing, as deeply nested elements always come later when selected using document.querySelectorAll().

dleeftink · 2024-04-18T03:16:40Z

Before: broken blocks

After: original blocks

l1xnan · 2024-04-18T03:35:23Z

Before: broken blocks

After: original blocks

Can you compare it with the official export effect?

dleeftink · 2024-04-18T03:40:36Z

As, display without the css borders? Or the result from the 'Export/Debug' actions?

l1xnan · 2024-04-18T03:46:02Z

As, display without the css borders? Or the result from the 'Export/Debug' actions?

Use the official Export to PDF feature to generate the PDF, then compare the results with the current output. We aim to maintain consistency with the official rendering as much as possible.

dleeftink · 2024-04-18T03:48:49Z

Will do. Here is the prettified HTML from the debugger in the meantime (notice the difference at 'some trailing text'):

Before

<div class="print">
  <div class="markdown-preview-view markdown-rendered show-properties">
    <h3 data-heading="hello">
      hello<a href="af://h3-0" class="md-print-anchor"></a>
    </h3>
    \n
    <p>
      Paragraph 1 with
      <span
        alt="bpdf(nest) > Nested block"
        src="bpdf(nest)#Nested block"
        class="internal-embed markdown-embed inline-embed is-loaded"
      ></span>
    </p>
    <div class="markdown-embed-title"></div>
    <div class="markdown-preview-view markdown-rendered show-indentation-guide">
      <h2 data-heading="Nested block">
        Nested block<a href="af://h2-1" class="md-print-anchor"></a>
      </h2>
      \n
      <p>Embedded paragraph.</p>
    </div>
    some trailing text. <!--broken block-->
    <p></p>
    \n
    <p>Paragraph 2</p>
    \n
    <h2 data-heading="world">
      world<a href="af://h2-2" class="md-print-anchor"></a>
    </h2>
    \n
    <p>Paragraph 3</p>
  </div>
</div>

After

<div class="print">
  <div class="markdown-preview-view markdown-rendered show-properties">
    <h3 data-heading="hello">
      hello<a href="af://h3-0" class="md-print-anchor"></a>
    </h3>
    \n
    <p>
      Paragraph 1 with
      <span
        alt="bpdf(nest) > Nested block"
        src="bpdf(nest)#Nested block"
        class="internal-embed markdown-embed inline-embed is-loaded"
        ><div class="markdown-embed-title"></div>
        <div class="markdown-preview-view markdown-rendered show-indentation-guide">
          <h2 data-heading="Nested block">
            Nested block<a href="af://h2-1" class="md-print-anchor"></a>
          </h2>
          \n
          <p>Embedded paragraph.</p>
        </div></span
      >
      some trailing text. <!--original block-->
    </p>
    \n
    <p>Paragraph 2</p>
    \n
    <h2 data-heading="world">
      world<a href="af://h2-2" class="md-print-anchor"></a>
    </h2>
    \n
    <p>Paragraph 3</p>
  </div>
</div>

l1xnan · 2024-04-18T03:56:15Z

Great, could you please test non-ASCII characters?

dleeftink · 2024-04-18T04:02:40Z

Here is the PDF output:

Before

After

dleeftink · 2024-04-18T04:26:48Z

Great, could you please test non-ASCII characters?

I have added two btoa_utf8 and atob_utf8 utility functions to handle unicode characters.

dleeftink · 2024-04-18T04:49:59Z

In general however, I think it would be better to move away from this.preview.executeJavascript(...) to build the PDF DOM. This would require a more significant rewrite along the lines of:

// Setting up containers
const printEl = document.body.createDiv("print");
const viewEl = printEl.createDiv({
  cls: "markdown-preview-view markdown-rendered " + cssclasses.join(" ")
});

...

// Rendering markdown
await MarkdownRenderer.render(app, lines.join("\n"), viewEl, file.path, comp);
await MarkdownRenderer.postProcess(...);

...

// Cloning elements                                  
const printDoc = document.implementation.createHTMLDocument("document");
printDoc.body.appendChild(printEl.cloneNode(true));

// Copy Obsidian styles
printDoc.head =  document.head.innerHTML;

// Preparing data transfer
const blob = new Blob([printDoc], {
  type: "text/html",
});

// Create object URL
const printUrl = window.URL.createObjectURL(blob);

...

// Transfer HTML to print preview
this.preview.loadUrl(printUrl);
this.preview.webContents.once("did-finish-load", async () => { 

  // Clean up after --> not sure URL's can be revoked from within another webview
  URL.revokeObjectURL(printUrl);
  ... additional logic ...

})

But even then, the issue of .innerHTML breaking the block context of nested blocks may persist. The reason why it works in native Obisidian is that embeds are inserted after the initial DOM has been laid out, which circumvents block context breakage. Whether this method used by the Obsidian devs follows the proper HTML5 spec is a discussion I'll leave for another day...

dleeftink · 2024-04-18T08:39:20Z

In fact, the way Obsidian handles internal embeds via nested divs is inherently incompatible with how pagedjs lays out pages, as its layouting algorithm requires HTML to adhere to the spec. Even when implementing a mechanism where block level elements are replaced with inline elements, this would result in user css styles likely becoming incompatible with the post-processed DOM.

I've been investigating integrating pagedjs and Obsidian for a while now, and am happy to draw up a roadmap or set of requirements if you're interested.

l1xnan · 2024-04-18T09:15:18Z

In fact, the way Obsidian handles internal embeds via nested divs is inherently incompatible with how pagedjs lays out pages, as its layouting algorithm requires HTML to adhere to the spec. Even when implementing a mechanism where block level elements are replaced with inline elements, this would result in user css styles likely becoming incompatible with the post-processed DOM.

I've been investigating integrating pagedjs and Obsidian for a while now, and am happy to draw up a roadmap or set of requirements if you're interested.

I initially attempted to integrate Paged.js and Obsidian, but failed (blank pages couldn't display). Do you have any good ideas?

dleeftink · 2024-04-18T10:35:01Z

I initially attempted to integrate Paged.js and Obsidian, but failed (blank pages couldn't display). Do you have any good ideas?

The latest beta has a lot of pagination fixes, but still has some issues when elements have display: none, overflow: hidden or nested <br> elements inside complex trees (e.g. tables). Generally, pagedjs works best with a 'flat' DOM structure where each element has an intrinsic width and height (hence why Obsidian embeds/code block replacements may cause issues).

However, the devs are quick to respond to your particular issues as I haven't encountered the blank page issue previously. My guess is that the chunker and paginator have not properly finished. It is also good practice to isolate the pagedJSPolyFill inside a webview or <iframe> and have it run unaffected by styles and scripts present in the parent context.

In my view, the biggest hurdle to make pagedJS work with Obsidian is agreeing on how nested markdown is transformed into a single document: how to signal which notes are chapters, sections or paragraphs? How is the table of contents generated? How to best collect footnotes/sidesnotes and multi-column markdown? Resolve inconsistent heading levels? Merge/split list and tables? Dataview elements? The list goes on.

My current approach is generating a DAG and an AST of each note in the DAG and have a list of custom markdown transformers that are able to handle various cases. Making this work for all edge cases is no small feat.

dleeftink added 2 commits April 18, 2024 01:56

Modify fixEmbeds to encodeEmbeds

c9b6526

Add recursive decodeAndReplaceEmbed() function

9ce1173

dleeftink added 2 commits April 18, 2024 14:17

Add btoa_utf8(string) function

84efe23

Add atob_utf8(string)

78e1532

l1xnan merged commit ca09acc into l1xnan:master Apr 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Better `span.markdown-embed` parity with native Obisidan print to pdf #180

Better `span.markdown-embed` parity with native Obisidan print to pdf #180

dleeftink commented Apr 17, 2024 •

edited

dleeftink commented Apr 18, 2024

l1xnan commented Apr 18, 2024

Before: broken blocks

After: original blocks

dleeftink commented Apr 18, 2024

l1xnan commented Apr 18, 2024

dleeftink commented Apr 18, 2024

l1xnan commented Apr 18, 2024

dleeftink commented Apr 18, 2024

dleeftink commented Apr 18, 2024

dleeftink commented Apr 18, 2024 •

edited

dleeftink commented Apr 18, 2024

l1xnan commented Apr 18, 2024

dleeftink commented Apr 18, 2024

	export function fixEmbedSpan(doc: Document) {
	const spans = doc.querySelectorAll("span.markdown-embed");

	spans.forEach((span: HTMLElement) => {
	const newDiv = document.createElement("div");

	copyAttributes(newDiv, span.attributes);

	newDiv.innerHTML = span.innerHTML;

	span.parentNode?.replaceChild(newDiv, span);
	});
	}

Better span.markdown-embed parity with native Obisidan print to pdf #180

Better span.markdown-embed parity with native Obisidan print to pdf #180

Conversation

dleeftink commented Apr 17, 2024 • edited

Issue: broken block contexts

Workaround

dleeftink commented Apr 18, 2024

Before: broken blocks

After: original blocks

l1xnan commented Apr 18, 2024

Before: broken blocks

After: original blocks

dleeftink commented Apr 18, 2024

l1xnan commented Apr 18, 2024

dleeftink commented Apr 18, 2024

Before

After

l1xnan commented Apr 18, 2024

dleeftink commented Apr 18, 2024

Before

After

dleeftink commented Apr 18, 2024

dleeftink commented Apr 18, 2024 • edited

dleeftink commented Apr 18, 2024

l1xnan commented Apr 18, 2024

dleeftink commented Apr 18, 2024

Better `span.markdown-embed` parity with native Obisidan print to pdf #180

Better `span.markdown-embed` parity with native Obisidan print to pdf #180

dleeftink commented Apr 17, 2024 •

edited

dleeftink commented Apr 18, 2024 •

edited