Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is there an example of streaming directly to the response stream in ASP.NET Core? #52

Open
casperOne opened this issue Nov 5, 2021 · 16 comments · Fixed by #65
Open
Assignees
Milestone

Comments

@casperOne
Copy link

casperOne commented Nov 5, 2021

Version: 2021.10.1

First, I want to say this popped up on my radar because of the Reddit thread that was posted:

https://www.reddit.com/r/csharp/comments/ox3klz/questpdf_my_opensource_c_library_for_creating_pdf/

And it's exactly what I was looking for.

That said, I've generated a document, and can get the results by calling the GeneratePdf method to return a byte array or write to an instance of a MemoryStream.

However, in an attempt to save memory allocations, I'm looking to write directly to the result stream in ASP.NET Core.

When I do this:

// document is an IDocument implementation
document.GeneratePdf(HttpContext.Response.Body);

I get a NullReferenceException with the following stack trace:

System.NullReferenceException
  HResult=0x80004003
  Message=Object reference not set to an instance of an object.
  Source=QuestPDF
  StackTrace:
   at QuestPDF.Drawing.PdfCanvas.EndDocument()
   at QuestPDF.Drawing.DocumentGenerator.RenderPass[TCanvas](PageContext pageContext, TCanvas canvas, Container content, DocumentMetadata documentMetadata)
   at QuestPDF.Drawing.DocumentGenerator.RenderDocument[TCanvas](TCanvas canvas, IDocument document)
   at QuestPDF.Drawing.DocumentGenerator.GeneratePdf(Stream stream, IDocument document)
   at QuestPDF.Fluent.GenerateExtensions.GeneratePdf(IDocument document, Stream stream)

I've looked in the closed issues and seen that there was an issue at one point around closing the stream, but it's been resolved.

Is there anything special that needs to be done when writing directly to the response stream in ASP.NET Core?

As mentioned, when making the following calls instead:

// Render to bytes
var result = document.GeneratePdf();

// Render to a stream.
document.GeneratePdf(new MemoryStream());

It does not throw.

@MarcinZiabek
Copy link
Member

Hello,

I think that you are incorrectly utilizing the streaming capability within the asp mvc. Please try this and let me know if it works 😁

public async Task<ActionResult> DownloadReport()
{
    await using var stream = new MemoryStream();
    myReport.GeneratePdf(stream);
    return File(stream, "application/pdf", "myReport.pdf");
}

@casperOne
Copy link
Author

@MarcinZiabek Thanks for the prompt response.

That does work, but as mentioned, I'm looking to avoid allocations by writing to an intermediate stream (the new MemoryStream allocation is the same as returning to a new byte array).

I suspect that it may be due to the fact that the GeneratePdf overload that writes to a stream does not do so async, and that is a requirement if a stream is being written to directly in ASP.NET core.

It would be great if this was supported, but I can make do with copying for now.

@MarcinZiabek
Copy link
Member

That does work, but as mentioned, I'm looking to avoid allocations by writing to an intermediate stream (the new MemoryStream allocation is the same as returning to a new byte array).

Can you please provide more details and justification for why it may work this way? So far, I have been using this method to stream large files up to several gigabytes (from blob storage, through asp core, to the client) and never saw any significant memory allocations, in contrast to using just a byte array. Maybe I just miss some important detail here...

@MarcinZiabek
Copy link
Member

I suspect that it may be due to the fact that the GeneratePdf overload that writes to a stream does not do so async, and that is a requirement if a stream is being written to directly in ASP.NET core.

That is correct, QuestPDF does not support async operations. The only place where the async pattern would be useful is streaming images. However, because of the layouting algorithm, the library needs to know the exact size of the image at the very beginning of the process (which usually require just loading the image into memory) and then keep it to nearly the very end when the PDF file is produced with SkiaSharp. Alternatively, the library would need to load the image into memory twice which is no good either.

@casperOne
Copy link
Author

That does work, but as mentioned, I'm looking to avoid allocations by writing to an intermediate stream (the new MemoryStream allocation is the same as returning to a new byte array).

Can you please provide more details and justification for why it may work this way? So far, I have been using this method to stream large files up to several gigabytes (from blob storage, through asp core, to the client) and never saw any significant memory allocations, in contrast to using just a byte array. Maybe I just miss some important detail here...

I have been able to get around my use case by rendering to a local array; because of the size of the PDFs, I worry about allocations building up on the LOH.

@MarcinZiabek
Copy link
Member

I have been able to get around my use case by rendering to a local array; because of the size of the PDFs, I worry about allocations building up on the LOH.

Can you please provide more information? I was thinking about this use case and indeed, you may be right, the stream API does not help much when used with the MomeryStream object. It would be great to improve the library to reduce its memory consumption and GA pressure :)

@casperOne
Copy link
Author

@MarcinZiabek The documents that I'm generating are currently ~1.6 MB, so greater than 85K which is required to go on the LOH.

In a high-throughput situation, this can easily cause problems, as the LOH:

  • Does not get collected until generation 1 and 2 are collection
  • Is not always compacted (which leads potential difficulty with further allocations on the LOH later on)

Source:

https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap#when-is-a-large-object-collected

@MarcinZiabek
Copy link
Member

This is truly an interesting case, thank you for sharing this link! I am planning to read more about this concept. We also need to make sure that the core platform operates in a similar fashion. It would be great to have any real benchmarks showing this is indeed a problem.

  1. The overall size of the PDF document is a result of embedded fonts. It is possible to manually perform font subsetting to reduce PDF file size. It can be done by any online tool. It creates a new font file with only required glyphs. This may reduce PDF file size tenfold or more. I am slowly working on the process of performing this automatically.
  2. You can also reduce the file size by using image compression - by default, QuestPDF uses PNG file format.

I am still not sure if optimizing this streaming access will result in any significant results at this point. When you generate the document with images, those images need to be present in memory during the entire generation process. So, by average, we can reduce LOH overhead only by half in the most optimistic scenario. And this is assuming that SkiaSharp does not introduce any buffering or workload on its own.

I am not against any optimisation. In fact, any help is always more than welcome 😁

@schulz3000
Copy link
Contributor

I can reproduce the issue with HttpContext.Response.Body stream.
The problem is that the underlying Stream implementation does not support the Position property.
But the SkiaSharp implementation call this property in SkiaSharp.SKManagedWStream.OnBytesWritten

I think there are two possible solutions.

  • throw an exception early if we detect an input stream has set CanSeek to false
  • write a wrapper to provide the Position property to SkiaSharp

I will provide a PR for the wrapper solution.

@MarcinZiabek
Copy link
Member

Thank you for providing this solution and improvement. I will analyse it as soon as I can. I understand your explanation.

Is there any possibility that the wrapper idea may introduce any bugs in other areas? After all:

  1. It hides real position property.
  2. It disables/removes some stream capabilities (e.g. CanRead, CanSeek, etc.) that may be required by SkiaSharp in the future (e.g. for performance optimization scenarios).

@Usergitbit
Copy link

Is this fixed? Trying to pass HttpContext.Response.Body as the stream still throws a null reference exception for me.

@MarcinZiabek
Copy link
Member

I will take a look within a couple of days. Maybe something was incorrectly merged. Thank you for rising my attention!

@MarcinZiabek MarcinZiabek reopened this Feb 11, 2022
@MarcinZiabek MarcinZiabek self-assigned this Feb 18, 2022
@MarcinZiabek MarcinZiabek added this to the 2022.2.4 milestone Feb 18, 2022
@MarcinZiabek
Copy link
Member

MarcinZiabek commented Feb 18, 2022

I decided to roll back changes developed in #65 in Quest 2022.2.5.

Indeed, in some cases it allows streaming directly to the Response.Body stream. However, this also introduces a significant risk. When an exception is thrown, the QuestPDF library attempts to close and dispose the SkiaSharp SkDocument object. This involves disposing the managed stream provided as an argument. If the provided stream happens to be Response.Body, a momery security issue happens and the FatalException is thrown. This is unaccaptable on production environments.

In other cases, the solution works but the "Synchronous operations are disallowed. Call WriteAsync or set AllowSynchronousIO to true instead" exception is thrown - this is caused by ASP.NET. Basically, it excepts that when we directly stream to the response stream, we should use the stream.WriteAsync method. Unfortunatelly, the SkiaSharp calls just the stream.Write method internally and this cannot be easily changed.

It is possible that I do not fully understand the whole problem. For now, I want to mitigate the risk. If anyone wants to help me analyse this problem with more details and on all environments, please do 😁

@snow2zhou
Copy link

Maybe you can use function GeneratePdf() to get byte[] first:
byte[] data = document.GeneratePdf();
Then you can transfer byte[] to stream:
Stream stream = new MemoryStream(data);
Finally, you can return a FileStreamResult to user :
FileStreamResult actionresult = new FileStreamResult(stream, "application/pdf");
actionresult.FileDownloadName = "myPDF.pdf";

@DerAlbertCom
Copy link

You can use https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream as MemoryStream Replacement for less LOH Fragmentation.

@DerAlbertCom
Copy link

Here an Example:

public class QuestPdfResult : ActionResult
{
    private readonly IDocument _document;
    private readonly string _filename;


    public QuestPdfResult(IDocument document, string filename)
    {
        _document = document;
        _filename = filename;
    }

    public override async Task ExecuteResultAsync(ActionContext context)
    {
        var httpContext = context.HttpContext;
        var streamManager = httpContext.RequestServices.GetRequiredService<RecyclableMemoryStreamManager>();

        using var memoryStream = streamManager.GetStream();
        _document.GeneratePdf(memoryStream);

        httpContext.Response.ContentType = "application/pdf";
        httpContext.Response.Headers.ContentDisposition = $"attachment; filename=\"{_filename}\"";
        memoryStream.Position = 0;
        await memoryStream.CopyToAsync(httpContext.Response.Body);
    }
}

you have to Add the RecyclableMemoryStreamManager to DI.

services.AddSingleton<RecyclableMemoryStreamManager>();

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants