Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better span.markdown-embed parity with native Obisidan print to pdf #180

Merged
merged 4 commits into from Apr 18, 2024

Conversation

dleeftink
Copy link
Contributor

@dleeftink dleeftink commented Apr 17, 2024

This took a while to track down, after noticing my @media print { ... } css files where rendering differently using your plugin compared to Obsidian's native print functionality.

Issue: broken block contexts

A better (but still not ideal) solution to have span.markdown-embeds behave similarly to native Obsidian, requires nested block-level elements to be inserted after the document.body.innerHTML has been set in line 206. Namely, certain block level elements (divs, blockquotes, etc.) force a break in the block context of a parent element (most notably <p>) which messes up the resulting HTML if set using document.body.innerHTML:

<p>
  Hello
  <span class="internal-embed">
    <div>broken</div>
  </span>
  world
</p>

Leads to the following output with orphaned nodes when parsed by .innerHTML:

<p>
  Hello
  <span class="internal-embed"></span>
</p>
<div>broken</div>
world
<p></p>

A possible solution is base64 encoding the content of span.markdown-embed elements starting from the last element by changing

export function fixEmbedSpan(doc: Document) {
const spans = doc.querySelectorAll("span.markdown-embed");
spans.forEach((span: HTMLElement) => {
const newDiv = document.createElement("div");
copyAttributes(newDiv, span.attributes);
newDiv.innerHTML = span.innerHTML;
span.parentNode?.replaceChild(newDiv, span);
});
}

to this encoding function and then recursively decoding the base64 encoded elements after document.body.innerHTML has been set:

document.body.innerHTML = decodeURIComponent(\`${encodeURIComponent(this.doc.body.innerHTML)}\`);

using this decoding function.

Workaround

This workaround maintains nested block level elements without forcing breaks in the block context of the parent element and messing up the layout. The following functions are added

  1. encodeEmbeds(doc)

    • Starts from the last element using [...document.querySelectorAll('span.markdown-embeds')].reverse();
    • Encodes the innerHTML of span.markdown-embed elements using span.innerHTML = btoa(span.innerHTML)
  2. decodeAndReplaceEmbed(element)

    • Utility within the appendWebview function
    • Decodes the innerHTML of span.markdown-embeds recursively

Note: the encodeEmbeds(doc) function may need to be rewritten to encode the deepest nested (sibling) elements first instead of starting with the last element. However, simply reversing the order of iteration seems fine in my testing, as deeply nested elements always come later when selected using document.querySelectorAll().

@dleeftink
Copy link
Contributor Author

Before: broken blocks

betterPDF(24 04 18)nesting(issue)

After: original blocks

betterPDF(24 04 18)nesting(issue)fix

@l1xnan
Copy link
Owner

l1xnan commented Apr 18, 2024

Before: broken blocks

betterPDF(24 04 18)nesting(issue)

After: original blocks

betterPDF(24 04 18)nesting(issue)fix

Can you compare it with the official export effect?

@dleeftink
Copy link
Contributor Author

As, display without the css borders? Or the result from the 'Export/Debug' actions?

@l1xnan
Copy link
Owner

l1xnan commented Apr 18, 2024

As, display without the css borders? Or the result from the 'Export/Debug' actions?

Use the official Export to PDF feature to generate the PDF, then compare the results with the current output. We aim to maintain consistency with the official rendering as much as possible.

@dleeftink
Copy link
Contributor Author

Will do. Here is the prettified HTML from the debugger in the meantime (notice the difference at 'some trailing text'):

Before

<div class="print">
  <div class="markdown-preview-view markdown-rendered show-properties">
    <h3 data-heading="hello">
      hello<a href="af://h3-0" class="md-print-anchor"></a>
    </h3>
    \n
    <p>
      Paragraph 1 with
      <span
        alt="bpdf(nest) > Nested block"
        src="bpdf(nest)#Nested block"
        class="internal-embed markdown-embed inline-embed is-loaded"
      ></span>
    </p>
    <div class="markdown-embed-title"></div>
    <div class="markdown-preview-view markdown-rendered show-indentation-guide">
      <h2 data-heading="Nested block">
        Nested block<a href="af://h2-1" class="md-print-anchor"></a>
      </h2>
      \n
      <p>Embedded paragraph.</p>
    </div>
    some trailing text. <!--broken block-->
    <p></p>
    \n
    <p>Paragraph 2</p>
    \n
    <h2 data-heading="world">
      world<a href="af://h2-2" class="md-print-anchor"></a>
    </h2>
    \n
    <p>Paragraph 3</p>
  </div>
</div>

After

<div class="print">
  <div class="markdown-preview-view markdown-rendered show-properties">
    <h3 data-heading="hello">
      hello<a href="af://h3-0" class="md-print-anchor"></a>
    </h3>
    \n
    <p>
      Paragraph 1 with
      <span
        alt="bpdf(nest) > Nested block"
        src="bpdf(nest)#Nested block"
        class="internal-embed markdown-embed inline-embed is-loaded"
        ><div class="markdown-embed-title"></div>
        <div class="markdown-preview-view markdown-rendered show-indentation-guide">
          <h2 data-heading="Nested block">
            Nested block<a href="af://h2-1" class="md-print-anchor"></a>
          </h2>
          \n
          <p>Embedded paragraph.</p>
        </div></span
      >
      some trailing text. <!--original block-->
    </p>
    \n
    <p>Paragraph 2</p>
    \n
    <h2 data-heading="world">
      world<a href="af://h2-2" class="md-print-anchor"></a>
    </h2>
    \n
    <p>Paragraph 3</p>
  </div>
</div>

@l1xnan
Copy link
Owner

l1xnan commented Apr 18, 2024

Great, could you please test non-ASCII characters?

@dleeftink
Copy link
Contributor Author

Here is the PDF output:

Before

betterPDF(24 04 18)nesting(issue)native-vs-plugin(before)

After

betterPDF(24 04 18)nesting(issue)native-vs-plugin(after)

@dleeftink
Copy link
Contributor Author

Great, could you please test non-ASCII characters?

I have added two btoa_utf8 and atob_utf8 utility functions to handle unicode characters.

betterPDF(24 04 18)nesting(issue)fix(unicode)

@dleeftink
Copy link
Contributor Author

dleeftink commented Apr 18, 2024

In general however, I think it would be better to move away from this.preview.executeJavascript(...) to build the PDF DOM. This would require a more significant rewrite along the lines of:

// Setting up containers
const printEl = document.body.createDiv("print");
const viewEl = printEl.createDiv({
  cls: "markdown-preview-view markdown-rendered " + cssclasses.join(" ")
});

...

// Rendering markdown
await MarkdownRenderer.render(app, lines.join("\n"), viewEl, file.path, comp);
await MarkdownRenderer.postProcess(...);

...

// Cloning elements                                  
const printDoc = document.implementation.createHTMLDocument("document");
printDoc.body.appendChild(printEl.cloneNode(true));

// Copy Obsidian styles
printDoc.head =  document.head.innerHTML;

// Preparing data transfer
const blob = new Blob([printDoc], {
  type: "text/html",
});

// Create object URL
const printUrl = window.URL.createObjectURL(blob);

...

// Transfer HTML to print preview
this.preview.loadUrl(printUrl);
this.preview.webContents.once("did-finish-load", async () => { 

  // Clean up after --> not sure URL's can be revoked from within another webview
  URL.revokeObjectURL(printUrl);
  ... additional logic ...

})

But even then, the issue of .innerHTML breaking the block context of nested blocks may persist. The reason why it works in native Obisidian is that embeds are inserted after the initial DOM has been laid out, which circumvents block context breakage. Whether this method used by the Obsidian devs follows the proper HTML5 spec is a discussion I'll leave for another day...

@l1xnan l1xnan merged commit ca09acc into l1xnan:master Apr 18, 2024
@dleeftink
Copy link
Contributor Author

In fact, the way Obsidian handles internal embeds via nested divs is inherently incompatible with how pagedjs lays out pages, as its layouting algorithm requires HTML to adhere to the spec. Even when implementing a mechanism where block level elements are replaced with inline elements, this would result in user css styles likely becoming incompatible with the post-processed DOM.

I've been investigating integrating pagedjs and Obsidian for a while now, and am happy to draw up a roadmap or set of requirements if you're interested.

@l1xnan
Copy link
Owner

l1xnan commented Apr 18, 2024

In fact, the way Obsidian handles internal embeds via nested divs is inherently incompatible with how pagedjs lays out pages, as its layouting algorithm requires HTML to adhere to the spec. Even when implementing a mechanism where block level elements are replaced with inline elements, this would result in user css styles likely becoming incompatible with the post-processed DOM.

I've been investigating integrating pagedjs and Obsidian for a while now, and am happy to draw up a roadmap or set of requirements if you're interested.

I initially attempted to integrate Paged.js and Obsidian, but failed (blank pages couldn't display). Do you have any good ideas?

@dleeftink
Copy link
Contributor Author

I initially attempted to integrate Paged.js and Obsidian, but failed (blank pages couldn't display). Do you have any good ideas?

The latest beta has a lot of pagination fixes, but still has some issues when elements have display: none, overflow: hidden or nested <br> elements inside complex trees (e.g. tables). Generally, pagedjs works best with a 'flat' DOM structure where each element has an intrinsic width and height (hence why Obsidian embeds/code block replacements may cause issues).

However, the devs are quick to respond to your particular issues as I haven't encountered the blank page issue previously. My guess is that the chunker and paginator have not properly finished. It is also good practice to isolate the pagedJSPolyFill inside a webview or <iframe> and have it run unaffected by styles and scripts present in the parent context.

In my view, the biggest hurdle to make pagedJS work with Obsidian is agreeing on how nested markdown is transformed into a single document: how to signal which notes are chapters, sections or paragraphs? How is the table of contents generated? How to best collect footnotes/sidesnotes and multi-column markdown? Resolve inconsistent heading levels? Merge/split list and tables? Dataview elements? The list goes on.

My current approach is generating a DAG and an AST of each note in the DAG and have a list of custom markdown transformers that are able to handle various cases. Making this work for all edge cases is no small feat.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants