PDF output size #848

Open
neodynamic opened this issue May 22, 2019 · 30 comments

Comments

@neodynamic

Running this sample https://github.com/mono/SkiaSharp/blob/master/samples/Gallery/Shared/Samples/CreatePdfSample.cs, which creates a simple two-page PDF file, produces a file (under Windows) of about 510KB.

Is there any compression setting to make the output PDF file smaller? 510KB for a two-page PDF with a bit of simple text seems rather heavy... Any hints?

@Redth added this to Needs Triage in Triage May 23, 2019
@charlesroddie

@neodynamic Can you see what Skia produces?

@neodynamic
Author

@charlesroddie no, and you?

@mattleibow
Contributor

You could have a look at setting the quality down: https://docs.microsoft.com/en-us/dotnet/api/skiasharp.skdocumentpdfmetadata

@neodynamic
Author

We've been reviewing this matter, and we conclude that the PDF output file size cannot be improved, for the following reason: it seems that the Skia (native lib) PDF backend, by design, embeds every font file needed to render the text on the target device. The Skia PDF backend design page states the following:

> We can't assume that an arbitrary font will be available at PDF view time, so we embed all fonts in accordance with modern PDF guidelines.

The sample here https://github.com/mono/SkiaSharp/blob/master/samples/Gallery/Shared/Samples/CreatePdfSample.cs uses the system default font, which under Windows is likely to be Segoe UI, whose TTF file is about 900KB.
The output PDF for that simple test, where a single piece of text is drawn, is about 510KB. The file is that big because Skia, by design, embeds the Segoe UI font file in it. We ran another test drawing Chinese text with the Yu Gothic font, whose file is about 13MB, and the output PDF is about 8MB, which confirms that the size comes from the embedded font files. Linked (non-embedded) fonts do not seem to be supported; they could make the output PDF smaller.

If no one here has more comments on this matter, then @mattleibow you can close this issue.

@Gillibald
Contributor

You could use HarfBuzz's subsetting to reduce the font's size. Then load that font to produce the PDF. Sadly this isn't supported by HarfBuzzSharp. Yet....

@neodynamic
Author

Yes, that could be the only way to reduce pdf output file size...

@mattleibow
Contributor

Looking at the Skia code, it seems there are two subsetters built in. But they are disabled because we are not building with either icu or harfbuzz/sfntly.

However, there is a hook that makes subsetting work, but it is not a "public API". But, since it is fairly simple, we might be able to do something. The API hasn't changed much, so it might just be safe to do something.

I'll have a look at what we can do. Can't promise anything as I haven't looked at exactly how the PDF is constructed, but it seems to only write the fonts when the PDF is closed, so we could potentially add an argument there, or in the metadata in the constructor. They actually have an enum there that allows you to pick either harfbuzz or sfntly. It doesn't seem too hard to add one for us, and then we can use any font subsetter.

@mattleibow
Contributor

mattleibow commented Jul 10, 2020

Started a thing on the skia bugtracker. I want to do this right: https://bugs.chromium.org/p/skia/issues/detail?id=10491
And discussion: https://groups.google.com/g/skia-discuss/c/XIvDEEwZrAM

@Alexbits

Alexbits commented Aug 26, 2020

> You could have a look at setting the quality down: https://docs.microsoft.com/en-us/dotnet/api/skiasharp.skdocumentpdfmetadata

Hi @mattleibow. I've tried setting lower EncodingQuality and RasterDpi values, and they have no impact on the output file at all. The file size and quality stay the same. Latest SkiaSharp on Windows 10.

@reinux

reinux commented Aug 23, 2021

Any progress on this? Japanese/Chinese fonts are easily 10MB+ (per weight), so this becomes nigh unusable.

@johmarjac

johmarjac commented Oct 19, 2021

Can't believe this is an issue. You should let the developer choose whether to embed the font file or not.

@jeffska

jeffska commented Oct 27, 2021

@mattleibow I was able to build the Windows libSkiaSharp using Skia's support for Harfbuzz subsetting. It seems to work fine. My test PDF that was over 280 KB went down to less than 10 KB with the subsetting. Other than changing the Skia build switches, the only thing I had to do was edit Skia's Harfbuzz BUILD.gn since the forked version appears to be out of sync with the Harfbuzz commit in the DEPS.

Can you think of any reason that this wouldn't be a viable solution?

@wstaelens

any updates regarding the file size/fonts?

@reinux

reinux commented Jan 14, 2022

Should I assume this has been abandoned and rebuild my project using another PDF library?

@johmarjac

> Should I assume this has been abandoned and rebuild my project using another PDF library?

In case you go for a different library, don't use QuestPDF as it uses SkiaSharp under the hood and suffers from the same big file sizes.

@reinux

reinux commented Jan 14, 2022

Thanks for the tip. I don't understand how something like this wouldn't be recognized as a fatal issue.

> If no one here has more comments on this matter, then @mattleibow you can close this issue.

Like, what.

@johmarjac

> Thanks for the tip. I don't understand how something like this wouldn't be recognized as a fatal issue.
>
> > If no one here has more comments on this matter, then @mattleibow you can close this issue.
>
> Like, what.

Depends on the use case really. If you only generate a single PDF, no one cares about a 2 MB PDF on their PC. But I needed it for part protocols in series production, where a part comes off the line every 2 seconds, so every 2 seconds I need to save a PDF to a network share to archive that part's measurement results. A 2 MB file every 2 seconds then costs a hell of a lot of storage, and that's just not going to work.
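
To put that rate in numbers, here is a rough back-of-the-envelope sketch. It assumes the line runs around the clock and uses decimal units (1 GB = 1000 MB); the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_storage_mb(file_size_mb: float, interval_seconds: float) -> float:
    """Storage consumed per day when one file is written every `interval_seconds`."""
    files_per_day = SECONDS_PER_DAY / interval_seconds
    return files_per_day * file_size_mb


# One 2 MB PDF every 2 seconds, as described in the comment above.
per_day_mb = daily_storage_mb(2.0, 2.0)
print(f"{per_day_mb:.0f} MB/day, about {per_day_mb / 1000:.1f} GB/day")
```

Under those assumptions that is 43,200 files and roughly 86 GB per day.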

@wstaelens

wstaelens commented Jan 14, 2022

Indeed, thousands of small files are processed. 1000 * 2MB (while it is normally more like 159KB - 236KB per file) really makes a big difference in network traffic, processing time, disk space, etc.

It is related to fonts, but should be investigated by Skia...

Another reason: most ISP mailboxes and corporate policies still limit email size to 10MB or 15MB, meaning 5 attachments vs. 15 - 20....

@jeffska

jeffska commented Jan 14, 2022

Even though I got the native Skia subsetting working with a custom build, I wasn't happy with it. It's a very naive implementation, and it doesn't perform well for larger (like CJK) fonts. It's better than nothing, but it wasn't sufficient for my use.

I ended up using a two-pass approach by building the font subsets before rendering and then passing those in to SkiaSharp.
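
The first pass of such a two-pass approach could look roughly like the sketch below. This is not jeffska's code: the run list, the font names, and `collect_codepoints` are hypothetical, and the subsetting and rendering steps themselves are left out.

```python
from collections import defaultdict


def collect_codepoints(runs):
    """Pass 1: map each font name to the set of code points it has to draw."""
    needed = defaultdict(set)
    for font_name, text in runs:
        needed[font_name].update(ord(ch) for ch in text)
    return dict(needed)


# Hypothetical text runs: (font name, text) pairs gathered from the document.
runs = [
    ("Segoe UI", "Part 0042"),
    ("Yu Gothic", "測定結果"),
    ("Segoe UI", "OK"),
]

for font_name, codepoints in collect_codepoints(runs).items():
    # Pass 2 (not shown) would subset each font to these code points and
    # render the document with the subset font instead of the full one.
    print(font_name, len(codepoints))
```

A CJK font would then only need to carry the handful of glyphs the document actually uses, rather than the whole file.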

@KillyMXI

KillyMXI commented Jan 14, 2022

Is there a way to force Skia to render all text as paths?
Is there a way to make a path from a particular text?

@KillyMXI

KillyMXI commented Jan 17, 2022

I'm trying to run pdf checker on a file generated with SkiaSharp, with no strings in it.
The only non-empty notice in the report:

Cleanup Results
    Errors:
        None
    Information:
        Contains conservatively compressed streams:
            Uncompressed: (141 instances)
    Checks Completed:
        suboptimal-compression

Optimizing it with third-party tools brought it down from ~500kb to ~100kb.

Looks like there is something besides embedded fonts that could be optimized in Skia.

My sample is generated with Svg.Skia and the source only consists of vector lines. I've no idea what can be so inefficient there.

Trying to mess with SKDocumentPdfMetadata actually results in a bigger file size. I would expect it to be a no-op, but if I supply any RasterDpi value or a non-default EncodingQuality value, the file size jumps up by another ~400kb. This doesn't make sense.
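
To illustrate why uncompressed streams matter for vector-only content, the sketch below compresses a made-up content stream with zlib, the same deflate format that PDF's FlateDecode filter uses. The stream contents are hypothetical and are not taken from any Skia output.

```python
import zlib

# Hypothetical content stream: 200 vertical line segments written with the
# PDF path operators m (move to), l (line to) and S (stroke).
segments = [f"{x} 0 m {x} 100 l S" for x in range(200)]
raw = "\n".join(segments).encode("ascii")

# Deflate-compress the stream at the highest compression level.
compressed = zlib.compress(raw, level=9)

print(f"raw: {len(raw)} bytes, deflated: {len(compressed)} bytes")

# Compression is lossless: decompressing restores the original stream.
assert zlib.decompress(compressed) == raw
```

Repetitive drawing operators like these compress very well, which is consistent with an external optimizer shrinking the file substantially.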

@domagojmedo

Any updates for this?
@mattleibow

@Greybird

Greybird commented Jul 18, 2022

> I ended up using a two-pass approach by building the font subsets before rendering and then passing those in to SkiaSharp.

@jeffska: would you have a gist or some place where we could take a look at what you put in place to build the font subsets externally?

@TimLee88

TimLee88 commented Apr 4, 2023

So, how is the progress?

@wstaelens

Wondering the same @TimLee88

@wstaelens

Images in PDFs don't seem to support 1bpp, which also increases the PDF size, correct?

@giz303

giz303 commented Feb 2, 2024

Hi, has anything happened here? We are looking for a solution.
We had been using https://github.com/Sicos1977/ChromeHtmlToPdf to convert from SVG to PDF and moved to SkiaSharp to get rid of the Google Chrome processes.

But the PDF files, which had been between 20 and 40 KB, are now over 500 KB.
Since we convert a lot of files in production, and need to send these PDF files over Ethernet to terminals, we would like the file sizes to be lower again.
All used fonts are available on the terminals and there is no need to embed them in the PDF file.

So, is there someone working on this issue or will this not be implemented at all?

@jeffska

jeffska commented Feb 2, 2024

> Hi, has anything happened here? We are looking for a solution. We had been using https://github.com/Sicos1977/ChromeHtmlToPdf to convert from SVG to PDF and moved to SkiaSharp to get rid of the Google chrome processes.
>
> But now the PDF files which had been between 20 and 40 KB are now over 500 KB big. Since we convert a lot of files in production, and need to send these PDF files over ethernet to terminals, we would like the file sizes to be lower again. All used fonts are available on the terminals and there is no need to embed them in the PDF file.

Have you inspected the PDF to make sure the SVG isn't just being rasterized?

@wstaelens

We've seen an increase because 1bpp images are not supported, which results in larger PDFs (every 1bpp image is converted to 24bpp). The 1bpp PDFs come from multifunction devices making scans.

We would also like to see support for 1bpp, as the lack of it makes PDFs much, much bigger (especially when 1bpp glyph bitmaps are used).
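
As a rough illustration of the 1bpp versus 24bpp difference, this sketch computes uncompressed sample-data sizes with each row padded to a whole byte. The page dimensions are an assumed A4 scan at roughly 300 dpi, and real PDFs usually compress image data, so actual ratios will differ.

```python
def raw_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed image size, with every row padded to a whole byte."""
    row_bytes = (width * bits_per_pixel + 7) // 8
    return row_bytes * height


# Assumed scan size: roughly A4 at 300 dpi.
width, height = 2480, 3508

one_bpp = raw_image_bytes(width, height, 1)   # black-and-white scan
rgb = raw_image_bytes(width, height, 24)      # same page stored as 24bpp RGB

print(f"1bpp: {one_bpp} bytes, 24bpp: {rgb} bytes, ratio: {rgb / one_bpp:.0f}x")
```

Before compression, the same page is 24 times larger when stored at 24bpp.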

@flensrocker

I don't care so much about performance or lib size and I already use HarfBuzzSharp for measuring text widths.
Is there any way to (optionally) enable font subsetting with HarfBuzz?
