
PDF generation #1

Closed
victorklos opened this issue Jan 18, 2019 · 11 comments

2 participants
@victorklos

commented Jan 18, 2019

Expected Behavior

A generated PDF document after running pdoc3 --pdf time.

Actual Behavior

usage: pdoc3 [-h] [--version] [--filter STRING] [--html] [--html-dir DIR]
             [--html-no-source] [--overwrite] [--external-links]
             [--template-dir DIR] [--link-prefix STRING] [--http HOST:PORT]
             MODULE [MODULE ...]
pdoc3: error: unrecognized arguments: --pdf

Steps to Reproduce

  1. Read the homepage https://pdoc3.github.io/pdoc/
  2. Execute the command above

Additional info

  • pdoc version: 0.5.1
@victorklos

Author

commented Jan 18, 2019

BTW, I have some experience with pandoc, so if you need help, please let me know...

@kernc

Contributor

commented Jan 18, 2019

Thanks. How would you implement PDF generation using pandoc?

@victorklos

Author

commented Jan 18, 2019

The first step would be to generate single-page output (even HTML would be interesting in itself, e.g. if you want to email the documentation). Pandoc needs an input and a template. The input could be that HTML document, or Markdown, or whatever it is you currently generate. The template is in LaTeX. Pandoc would then be required on the PATH.

Creating PDF files through LaTeX is a bit of a pain, though, as it depends on TeX Live, which is over 2 GB when fully installed. Maybe offer the option to compile through Docker? Most developers have that installed nowadays, I guess...
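Both options (a local pandoc on the PATH, or pandoc run inside Docker so TeX Live need not be installed) boil down to building one command. Below is a minimal sketch. The pandoc/latex image name and its /data working directory are assumptions about the official pandoc Docker images; any image that bundles pandoc and TeX Live would do.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def pandoc_pdf_command(source: str, output: str,
                       template: Optional[str] = None,
                       use_docker: bool = False) -> List[str]:
    """Build a pandoc command converting `source` into the PDF `output`.

    With use_docker=True, pandoc runs in the (assumed) pandoc/latex image,
    so no local TeX Live install is needed. The current directory is
    mounted at /data, so all paths must be relative to it.
    """
    args = [source, "-o", output]
    if template:
        # Custom LaTeX template; without it pandoc uses its default one.
        args.append("--template=" + template)
    if use_docker:
        return ["docker", "run", "--rm",
                "-v", f"{Path.cwd()}:/data",
                "pandoc/latex"] + args
    return ["pandoc"] + args


def run(cmd: List[str]) -> None:
    # Fail with a clear message if pandoc (or docker) is not on the PATH.
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, run(pandoc_pdf_command("docs.md", "docs.pdf", use_docker=True)) would only need Docker on the host.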

Some alternatives are possible:

  • focus on a single-page HTML output and convert it through a browser or local PDF printer
  • write your own PDF output generator, e.g. based on reportlab or pdfkit (not recommended)
@kernc

Contributor

commented Jan 18, 2019

I tried using pandoc to convert pdoc's documentation index.html, as well as some other HTML. It didn't work. The LaTeX converters seem picky about everything, including non-ASCII characters and referenced SVG images. There are some indications (jgm/pandoc#1793 (comment)) that alternative engines should be preferred for better results. The html5 engine, though, requires wkhtmltopdf, which is itself a largish binary the user would need to install and could then just as well use standalone.
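To act on the "prefer alternative engines" point, one option is to probe for whichever engine is installed and pass it to pandoc's --pdf-engine option. A minimal sketch follows. The preference order (Unicode-capable xelatex and lualatex before pdflatex, HTML-based engines last) is an assumption, not something pdoc or pandoc prescribes.

```python
import shutil
from typing import Callable, Optional, Sequence

# Assumed order of preference among pandoc --pdf-engine values: LaTeX
# engines with native Unicode support first, since pdflatex handles
# non-ASCII input less gracefully; HTML-based engines need an extra binary.
PREFERRED_ENGINES = ("xelatex", "lualatex", "pdflatex", "wkhtmltopdf", "weasyprint")


def choose_pdf_engine(candidates: Sequence[str] = PREFERRED_ENGINES,
                      which: Callable[[str], Optional[str]] = shutil.which
                      ) -> Optional[str]:
    """Return the first engine found on the PATH, or None if none is installed."""
    for engine in candidates:
        if which(engine):
            return engine
    return None
```

The result would be passed as f"--pdf-engine={engine}"; the which parameter only exists so the lookup can be faked in tests.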

Pdoc indeed already contains some provisions for printing:

<style media="print">${css.print()}</style>

<%def name="print()" filter="minify_css">
  @media print {
    #sidebar h1 {
      page-break-before: always;
    }
    .source {
      display: none;
    }
  }
  @media print {
    * {
      background: transparent !important;
      color: #000 !important; /* Black prints faster: h5bp.com/s */
      box-shadow: none !important;
      text-shadow: none !important;
    }
    a,
    a:visited {
      text-decoration: underline;
    }
    a[href]:after {
      content: " (" attr(href) ")";
    }
    abbr[title]:after {
      content: " (" attr(title) ")";
    }
    /*
     * Don't show links for images, or javascript/internal links
     */
    .ir a:after,
    a[href^="javascript:"]:after,
    a[href^="#"]:after {
      content: "";
    }
    pre,
    blockquote {
      border: 1px solid #999;
      page-break-inside: avoid;
    }
    thead {
      display: table-header-group; /* h5bp.com/t */
    }
    tr,
    img {
      page-break-inside: avoid;
    }
    img {
      max-width: 100% !important;
    }
    @page {
      margin: 0.5cm;
    }
    p,
    h2,
    h3 {
      orphans: 3;
      widows: 3;
    }
    h1,
    h2,
    h3,
    h4,
    h5,
    h6 {
      page-break-after: avoid;
    }
  }
</%def>

If we could somehow leverage common web browsers, an instance of which exists in almost every environment, that'd be great! I looked into the Selenium/WebDriver API and whether it supports printing to a file, but it appears this is not commonly the case.

I found this Reddit thread comparing several possible approaches; of those listed, I'd most prefer running Chrome:

chromium --headless --disable-gpu --print-to-pdf=output.pdf input.html
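The same headless Chrome invocation can be driven from Python. A minimal sketch, assuming the browser is on the PATH under one of a few common executable names (the list is a guess; names differ across platforms and distributions). The flags are exactly those in the command above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence

# Assumed executable names; actual names vary by platform and packaging.
CHROME_NAMES = ("chromium", "chromium-browser", "google-chrome", "google-chrome-stable")


def find_chrome(names: Sequence[str] = CHROME_NAMES) -> Optional[str]:
    """Return the full path of the first Chrome/Chromium found, or None."""
    for name in names:
        found = shutil.which(name)
        if found:
            return found
    return None


def chrome_print_command(chrome: str, html_file: str, pdf_file: str) -> List[str]:
    # Absolute paths (and a file:// URI for the input) avoid any ambiguity
    # about which directory Chrome resolves relative paths against.
    uri = Path(html_file).resolve().as_uri()
    out = Path(pdf_file).resolve()
    return [chrome, "--headless", "--disable-gpu", f"--print-to-pdf={out}", uri]


def html_to_pdf(html_file: str, pdf_file: str) -> None:
    chrome = find_chrome()
    if chrome is None:
        raise FileNotFoundError("no Chrome/Chromium executable found on PATH")
    subprocess.run(chrome_print_command(chrome, html_file, pdf_file), check=True)
```

Since Chrome honours media="print" stylesheets when printing, the print CSS already shipped in pdoc's template would apply here.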

Second to that, maybe WeasyPrint, but it has a list of dependencies that may not be easy to support in all environments (e.g. Windows without a C/C++ compiler).

@kernc kernc changed the title PDF generation is mentioned in docs but seems unavailable PDF generation Jan 18, 2019

@victorklos

Author

commented Jan 18, 2019

I like the Chromium route best. How much work would it be to generate a single HTML file? The current index file could become the TOC and the other pages the chapters, with all links internal.
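A rough sketch of that layout, assuming each module's HTML body has already been rendered. The Chapter pairs and the single_page helper are hypothetical, not part of pdoc: the TOC links to in-page fragments, and each module becomes a section whose id is its dotted name.

```python
import html
from typing import List, Tuple

# Hypothetical input: (module name, rendered HTML body) pairs, e.g. collected
# by walking module.submodules() the way write_html_files() recurses.
Chapter = Tuple[str, str]


def single_page(title: str, chapters: List[Chapter]) -> str:
    """Join per-module HTML into one document: a TOC, then one section per module."""
    toc = "\n".join(
        f'<li><a href="#{html.escape(name)}">{html.escape(name)}</a></li>'
        for name, _ in chapters)
    # Each section's id is the module name, so links become in-page fragments.
    sections = "\n".join(
        f'<section id="{html.escape(name)}">\n{body}\n</section>'
        for name, body in chapters)
    return ('<!doctype html>\n<html>\n<head><meta charset="utf-8">'
            f"<title>{html.escape(title)}</title></head>\n<body>\n"
            f"<nav><ul>\n{toc}\n</ul></nav>\n{sections}\n</body>\n</html>\n")
```

The output is a single file that can be emailed as-is or fed to the headless-Chrome command above.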

@kernc

Contributor

commented Jan 18, 2019

Currently, every module is rendered and written out separately:

pdoc/pdoc/cli.py

Lines 253 to 275 in e7868e2

def write_html_files(m: pdoc.Module):
    f = module_html_path(m)
    dirpath = path.dirname(f)
    if not os.access(dirpath, os.R_OK):
        os.makedirs(dirpath)
    try:
        with open(f, 'w+', encoding='utf-8') as w:
            w.write(m.html(
                external_links=args.external_links,
                link_prefix=args.link_prefix,
                source=not args.html_no_source,
            ))
    except Exception:
        try:
            os.unlink(f)
        except Exception:
            pass
        raise
    for submodule in m.submodules():
        write_html_files(submodule)

So not too much, but it would certainly require a new Mako template that handles a list of modules and in which all pdoc.Doc.url() calls are trimmed to URL fragments. Then, I guess, a new command-line switch can be added.
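The URL trimming could look roughly like the hypothetical helper below. It assumes the current multi-page layout (pkg/mod.html for modules, pkg/index.html for packages, member links of the form ...#refname) and URLs that are not made relative to another page; pdoc.Doc.url() itself is not called here.

```python
from posixpath import splitext
from urllib.parse import urlsplit


def url_to_fragment(url: str) -> str:
    """Turn a per-page URL into an in-page fragment (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.fragment:
        # Member link: the fragment is already the fully qualified refname.
        return "#" + parts.fragment
    # Module link: "pkg/sub.html" -> "pkg.sub"; "pkg/index.html" -> "pkg".
    stem, _ = splitext(parts.path.lstrip("/"))
    segments = [s for s in stem.split("/") if s]
    if segments and segments[-1] == "index":
        segments.pop()
    return "#" + ".".join(segments)
```

Module fragments produced this way match section ids named after the dotted module name.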

Is this something you would care to work on?

@kernc

Contributor

commented Jan 18, 2019

Or, probably better yet, adapt the existing HTML template. It already handles a list of modules, in a way, when pdoc is run as an --http web server:

<%def name="show_module_list(modules)">
  <h1>Python module list</h1>
  % if not modules:
    <p>No modules found.</p>
  % else:
    <dl id="http-server-module-list">
      % for name, desc in modules:
        <div class="flex">
          <dt><a href="${link_prefix}${name}">${name}</a></dt>
          <dd>${desc | glimpse, to_html}</dd>
        </div>
      % endfor
    </dl>
  % endif
</%def>
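Adapted for a single page, the same structure would link to #name fragments instead of link_prefix + name. It is sketched here in plain Python rather than Mako, purely to show the output shape; first_line is a hypothetical stand-in for pdoc's glimpse and to_html filters.

```python
import html
from typing import Iterable, Tuple


def show_module_list(modules: Iterable[Tuple[str, str]]) -> str:
    """Render the module list with in-page links (sketch of the Mako def)."""
    def first_line(desc: str) -> str:
        # Stand-in for `glimpse, to_html`: first docstring line, escaped.
        return html.escape(desc.strip().split("\n", 1)[0]) if desc else ""

    modules = list(modules)
    out = ["<h1>Python module list</h1>"]
    if not modules:
        out.append("<p>No modules found.</p>")
    else:
        out.append('<dl id="http-server-module-list">')
        for name, desc in modules:
            out.append('<div class="flex">')
            # Fragment link instead of ${link_prefix}${name}.
            out.append(f'<dt><a href="#{html.escape(name)}">{html.escape(name)}</a></dt>')
            out.append(f"<dd>{first_line(desc)}</dd>")
            out.append("</div>")
        out.append("</dl>")
    return "\n".join(out)
```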

@kernc kernc added this to the 0.6.0 milestone Jan 24, 2019

@kernc kernc added the enhancement label Jan 27, 2019

@kernc

Contributor

commented Feb 3, 2019

@victorklos I made some progress #20 using markdown and pandoc. If you're still interested, please have a look.

@victorklos

Author

commented Feb 5, 2019

Great! I will, but this weekend at the earliest...

@victorklos

Author

commented Feb 5, 2019

The generated PDF already looks great though! Maybe add some pdoc3 styling later on.

@kernc

Contributor

commented Feb 6, 2019

It now prints straight to Markdown, so any styling would need to be overridden at the pandoc/LaTeX level, or a new set of CSS would have to be written and included as raw HTML for conversion through intermediate HTML.
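The intermediate-HTML route could look like this minimal sketch: prepend a raw style block to the generated Markdown, then have pandoc render the PDF through an HTML-based engine. It assumes pandoc's Markdown reader keeps raw HTML (its raw_html extension, enabled by default) and that wkhtmltopdf is installed. PRINT_CSS is a placeholder, and a LaTeX-based engine would simply drop the block.

```python
from typing import List

# Placeholder stylesheet (hypothetical); any print CSS could go here.
PRINT_CSS = "body { font-family: sans-serif; } pre { border: 1px solid #999; }"


def with_raw_css(markdown: str, css: str = PRINT_CSS) -> str:
    """Prepend a raw <style> block to pdoc's Markdown output.

    Pandoc passes raw HTML through to HTML output, so the block should
    reach an HTML-based PDF engine; LaTeX output discards it.
    """
    return f"<style>\n{css}\n</style>\n\n{markdown}"


def pandoc_html_pdf_command(md_file: str, pdf_file: str,
                            engine: str = "wkhtmltopdf") -> List[str]:
    # An HTML-based engine makes pandoc go through HTML, not LaTeX,
    # which is what lets the CSS take effect.
    return ["pandoc", md_file, "-o", pdf_file, f"--pdf-engine={engine}"]
```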

@kernc kernc closed this in #20 Apr 22, 2019
