Is there an example of streaming directly to the response stream in ASP.NET Core? #52

casperOne · 2021-11-05T12:46:36Z

Version: 2021.10.1

First, I want to say this popped up on my radar because of the Reddit thread that was posted:

https://www.reddit.com/r/csharp/comments/ox3klz/questpdf_my_opensource_c_library_for_creating_pdf/

And it's exactly what I was looking for.

That said, I've generated a document, and can get the results by calling the GeneratePdf method to return a byte array or write to an instance of a MemoryStream.

However, in an attempt to save memory allocations, I'm looking to write directly to the result stream in ASP.NET Core.

When I do this:

```csharp
// document is an IDocument implementation
document.GeneratePdf(HttpContext.Response.Body);
```

I get a NullReferenceException with the following stack trace:

```
System.NullReferenceException
  HResult=0x80004003
  Message=Object reference not set to an instance of an object.
  Source=QuestPDF
  StackTrace:
   at QuestPDF.Drawing.PdfCanvas.EndDocument()
   at QuestPDF.Drawing.DocumentGenerator.RenderPass[TCanvas](PageContext pageContext, TCanvas canvas, Container content, DocumentMetadata documentMetadata)
   at QuestPDF.Drawing.DocumentGenerator.RenderDocument[TCanvas](TCanvas canvas, IDocument document)
   at QuestPDF.Drawing.DocumentGenerator.GeneratePdf(Stream stream, IDocument document)
   at QuestPDF.Fluent.GenerateExtensions.GeneratePdf(IDocument document, Stream stream)
```

I've looked in the closed issues and seen that there was an issue at one point around closing the stream, but it's been resolved.

Is there anything special that needs to be done when writing directly to the response stream in ASP.NET Core?

As mentioned, when making the following calls instead:

```csharp
// Render to bytes
var result = document.GeneratePdf();

// Render to a stream.
document.GeneratePdf(new MemoryStream());
```

It does not throw.


MarcinZiabek · 2021-11-05T12:52:53Z

Hello,

I think that you are incorrectly using the streaming capability of ASP.NET MVC. Please try this and let me know if it works 😁

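```csharp
public IActionResult DownloadReport()
{
    // No using block here: FileStreamResult disposes the stream
    // after the response has been written.
    var stream = new MemoryStream();
    myReport.GeneratePdf(stream);
    stream.Position = 0; // rewind, otherwise the copy starts at the end of the stream
    return File(stream, "application/pdf", "myReport.pdf");
}
```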

casperOne · 2021-11-05T13:04:46Z

@MarcinZiabek Thanks for the prompt response.

That does work, but as mentioned, I'm looking to avoid the allocations that come from writing to an intermediate stream (allocating a new MemoryStream is effectively the same as returning a new byte array).

I suspect this is because the GeneratePdf overload that writes to a stream does so synchronously, while ASP.NET Core requires asynchronous writes when writing directly to the response stream.

It would be great if this was supported, but I can make do with copying for now.

MarcinZiabek · 2021-11-05T13:30:04Z

> That does work, but as mentioned, I'm looking to avoid allocations by writing to an intermediate stream (the new MemoryStream allocation is the same as returning to a new byte array).

Can you please provide more details and justification for why it may work this way? So far, I have been using this method to stream large files of up to several gigabytes (from blob storage, through ASP.NET Core, to the client) and have never seen any significant memory allocations, in contrast to using a plain byte array. Maybe I am just missing some important detail here...
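For context on why streaming large files rarely shows up as significant allocations: a stream-to-stream copy only needs one fixed-size buffer, regardless of payload size. The following is a minimal sketch, assuming .NET Core 2.1 or later for the `Memory<byte>` overloads; `ChunkedCopy` and `CopyInChunksAsync` are hypothetical names, and `Stream.CopyToAsync` already behaves this way in practice. The default of 81,920 bytes matches the default copy buffer size of `Stream.CopyTo` and stays under the 85,000-byte large object heap threshold.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ChunkedCopy
{
    // Copies source to destination through a single rented buffer, so the memory
    // used for the copy stays at roughly bufferSize no matter how large the payload is.
    public static async Task<long> CopyInChunksAsync(
        Stream source,
        Stream destination,
        int bufferSize = 81_920,
        CancellationToken cancellationToken = default)
    {
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        // Rent may return a larger array; only the first bufferSize bytes are used.
        byte[] buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
        long total = 0;
        try
        {
            int read;
            while ((read = await source.ReadAsync(buffer.AsMemory(0, bufferSize), cancellationToken)) > 0)
            {
                await destination.WriteAsync(buffer.AsMemory(0, read), cancellationToken);
                total += read;
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
        return total;
    }
}
```

The difference in the PDF case is that the whole document is first materialized in a `MemoryStream` or byte array, so that one allocation is as large as the file itself.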

MarcinZiabek · 2021-11-05T13:36:37Z

> I suspect that it may be due to the fact that the GeneratePdf overload that writes to a stream does not do so async, and that is a requirement if a stream is being written to directly in ASP.NET core.

That is correct: QuestPDF does not support async operations. The only place where the async pattern would be useful is streaming images. However, because of the layout algorithm, the library needs to know the exact size of each image at the very beginning of the process (which usually requires loading the image into memory) and then has to keep it until nearly the very end, when the PDF file is produced with SkiaSharp. Alternatively, the library would need to load each image into memory twice, which is no good either.

casperOne · 2021-11-07T23:54:32Z

> That does work, but as mentioned, I'm looking to avoid allocations by writing to an intermediate stream (the new MemoryStream allocation is the same as returning to a new byte array).

> Can you please provide more details and justification for why it may work this way? So far, I have been using this method to stream large files up to several gigabytes (from blob storage, through asp core, to the client) and never saw any significant memory allocations, in contrast to using just a byte array. Maybe I just miss some important detail here...

I have been able to work around this by rendering to a local array, but because of the size of the PDFs, I worry about allocations building up on the large object heap (LOH).

MarcinZiabek · 2021-11-08T00:03:43Z

> I have been able to get around my use case by rendering to a local array; because of the size of the PDFs, I worry about allocations building up on the LOH.

Can you please provide more information? I have been thinking about this use case and indeed, you may be right: the stream API does not help much when used with a MemoryStream object. It would be great to improve the library to reduce its memory consumption and GC pressure :)

casperOne · 2021-11-20T16:26:08Z

@MarcinZiabek The documents that I'm generating are currently ~1.6 MB, well above the 85,000-byte threshold at which objects are allocated on the LOH.

In a high-throughput situation, this can easily cause problems, as the LOH:

- Is only collected during generation 2 collections
- Is not compacted by default (which can lead to fragmentation and difficulty with further allocations on the LOH later on)

Source:

https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap#when-is-a-large-object-collected
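A quick way to observe the threshold is `GC.GetGeneration`: an array of 85,000 bytes or more is allocated on the LOH and reported as generation 2 right away. This is a sketch; `LohDemo` is a hypothetical helper, and the 1.6 MB array simply mirrors the document size mentioned above. It also shows `GCSettings.LargeObjectHeapCompactionMode`, which requests a one-off LOH compaction; that is a blunt tool and is shown for illustration, not as a recommendation for request-handling code.

```csharp
using System;
using System.Runtime;

public static class LohDemo
{
    // Objects of 85,000 bytes or more go to the large object heap (LOH).
    // LOH objects belong to generation 2 from the moment they are allocated,
    // so they are only reclaimed by generation 2 collections.
    public static (int SmallGeneration, int LargeGeneration) AllocateAndInspect()
    {
        var small = new byte[1_000];      // small object heap allocation
        var large = new byte[1_600_000];  // ~1.6 MB, similar to the PDFs discussed here
        return (GC.GetGeneration(small), GC.GetGeneration(large));
    }

    // The LOH is not compacted by default; this requests a single compaction.
    public static void CompactLargeObjectHeapOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(); // blocking full collection; the setting reverts to Default once the LOH is compacted
    }
}
```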

MarcinZiabek · 2021-11-22T23:20:39Z

This is truly an interesting case; thank you for sharing this link! I am planning to read more about this concept. We also need to make sure that .NET Core behaves in a similar fashion. It would be great to have real benchmarks showing that this is indeed a problem.

The overall size of the PDF document is a result of embedded fonts. It is possible to perform font subsetting manually to reduce the PDF file size; this can be done with various online tools. Subsetting creates a new font file containing only the required glyphs, which may reduce the PDF file size tenfold or more. I am slowly working on performing this automatically.
You can also reduce the file size by compressing images; by default, QuestPDF uses the PNG format.

I am still not sure whether optimizing this streaming access will bring any significant improvement at this point. When you generate a document with images, those images need to be present in memory during the entire generation process. So, even in the most optimistic scenario, we can reduce the LOH overhead by only about half. And this assumes that SkiaSharp does not introduce any buffering or workload of its own.

I am not against any optimisation. In fact, any help is always more than welcome 😁

schulz3000 · 2021-12-10T20:08:25Z

I can reproduce the issue with the HttpContext.Response.Body stream.
The problem is that the underlying Stream implementation does not support the Position property,
but SkiaSharp accesses this property in SkiaSharp.SKManagedWStream.OnBytesWritten.

I think there are two possible solutions.

- Throw an exception early if we detect that the input stream has CanSeek set to false
- Write a wrapper that provides the Position property to SkiaSharp

I will provide a PR for the wrapper solution.
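A minimal sketch of the wrapper option: a write-only stream that forwards writes to the real stream and reports the running byte count as `Position`, which is the property `SKManagedWStream` reads according to the comment above. `WriteOnlyPositionStream` is a hypothetical name and not the code from the eventual PR. It also takes a `leaveOpen` flag so that disposing the wrapper does not dispose the caller's stream. Reporting `Length` as the byte count is an assumption made for convenience.

```csharp
using System;
using System.IO;

// Hypothetical wrapper (not a QuestPDF or SkiaSharp API).
public sealed class WriteOnlyPositionStream : Stream
{
    private readonly Stream _inner;
    private readonly bool _leaveOpen;
    private long _written;

    public WriteOnlyPositionStream(Stream inner, bool leaveOpen = true)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        if (!inner.CanWrite) throw new ArgumentException("Stream must be writable.", nameof(inner));
        _leaveOpen = leaveOpen;
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => _written; // assumption: bytes written so far

    // Reading Position returns the running byte count; setting it is not supported.
    public override long Position
    {
        get => _written;
        set => throw new NotSupportedException();
    }

    // Span-based Write and WriteByte fall back to this overload in the base Stream class.
    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        _written += count;
    }

    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        // With leaveOpen (the default), the caller's stream, e.g. a response body, stays open.
        if (disposing && !_leaveOpen) _inner.Dispose();
        base.Dispose(disposing);
    }
}
```

A hypothetical call site would be `document.GeneratePdf(new WriteOnlyPositionStream(HttpContext.Response.Body));`. It still performs synchronous writes, so on its own it does not get around ASP.NET Core's restriction on synchronous I/O.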

MarcinZiabek · 2021-12-13T16:00:42Z

Thank you for providing this solution and improvement. I will analyse it as soon as I can. I understand your explanation.

Is there any possibility that the wrapper idea may introduce any bugs in other areas? After all:

- It hides the real Position property.
- It disables or removes some stream capabilities (e.g. CanRead, CanSeek) that SkiaSharp may require in the future (e.g. for performance optimizations).

Usergitbit · 2022-02-11T14:59:13Z

Is this fixed? Trying to pass HttpContext.Response.Body as the stream still throws a null reference exception for me.

MarcinZiabek · 2022-02-11T18:52:59Z

I will take a look within a couple of days. Maybe something was incorrectly merged. Thank you for bringing this to my attention!

MarcinZiabek · 2022-02-18T21:10:26Z

I decided to roll back the changes developed in #65 in QuestPDF 2022.2.5.

Indeed, in some cases it allows streaming directly to the Response.Body stream. However, it also introduces a significant risk. When an exception is thrown, the QuestPDF library attempts to close and dispose the SkiaSharp SKDocument object. This involves disposing the managed stream provided as an argument. If the provided stream happens to be Response.Body, a memory safety issue occurs and a FatalException is thrown. This is unacceptable in production environments.

In other cases, the solution works, but the "Synchronous operations are disallowed. Call WriteAsync or set AllowSynchronousIO to true instead" exception is thrown. This comes from ASP.NET Core: when we stream directly to the response stream, it expects us to use the stream.WriteAsync method. Unfortunately, SkiaSharp only calls the stream.Write method internally, and this cannot be easily changed.

It is possible that I do not fully understand the whole problem. For now, I want to mitigate the risk. If anyone wants to help me analyse this problem in more detail and across all environments, please do 😁
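The buffer-then-copy workaround for the synchronous I/O restriction can be sketched without ASP.NET Core. `AsyncOnlyStream` is a hypothetical stand-in for a response body with `AllowSynchronousIO` disabled (synchronous `Write` throws), and `BufferThenCopy.WriteAsync` is a hypothetical helper that runs a synchronous generator, such as a `GeneratePdf(stream)` call, against a `MemoryStream` and then copies it out using asynchronous writes only. The trade-off is that the whole document is held in memory, which is the allocation concern raised earlier in the thread.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a response body that disallows synchronous I/O.
public sealed class AsyncOnlyStream : Stream
{
    public MemoryStream Inner { get; } = new MemoryStream();

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position { get => throw new NotSupportedException(); set => throw new NotSupportedException(); }

    public override void Write(byte[] buffer, int offset, int count) =>
        throw new InvalidOperationException("Synchronous operations are disallowed.");

    // Both async overloads forward directly, so they never fall back to the sync Write.
    public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) =>
        Inner.WriteAsync(buffer, offset, count, cancellationToken);

    public override ValueTask WriteAsync(ReadOnlyMemory<byte> buffer, CancellationToken cancellationToken = default) =>
        Inner.WriteAsync(buffer, cancellationToken);

    public override void Flush() { }
    public override Task FlushAsync(CancellationToken cancellationToken) => Task.CompletedTask;
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}

public static class BufferThenCopy
{
    // Runs a synchronous generator against an in-memory buffer, then copies
    // the buffer to the destination using asynchronous writes only.
    public static async Task WriteAsync(Action<Stream> generate, Stream destination)
    {
        using var buffer = new MemoryStream();
        generate(buffer);
        buffer.Position = 0;
        await buffer.CopyToAsync(destination);
    }
}
```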

snow2zhou · 2022-08-05T09:31:09Z

Maybe you can use the GeneratePdf() method to get a byte[] first, wrap the byte[] in a stream, and finally return a FileStreamResult to the user:

```csharp
byte[] data = document.GeneratePdf();
Stream stream = new MemoryStream(data);
FileStreamResult actionResult = new FileStreamResult(stream, "application/pdf")
{
    FileDownloadName = "myPDF.pdf"
};
return actionResult;
```

DerAlbertCom · 2022-09-02T14:11:13Z

You can use https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream as a MemoryStream replacement to reduce LOH fragmentation.

DerAlbertCom · 2022-09-02T14:34:51Z

Here is an example:

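```csharp
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.IO;
using QuestPDF.Fluent;
using QuestPDF.Infrastructure;

public class QuestPdfResult : ActionResult
{
    private readonly IDocument _document;
    private readonly string _filename;

    public QuestPdfResult(IDocument document, string filename)
    {
        _document = document;
        _filename = filename;
    }

    public override async Task ExecuteResultAsync(ActionContext context)
    {
        var httpContext = context.HttpContext;
        var streamManager = httpContext.RequestServices.GetRequiredService<RecyclableMemoryStreamManager>();

        // Render synchronously into a pooled buffer...
        using var memoryStream = streamManager.GetStream();
        _document.GeneratePdf(memoryStream);

        httpContext.Response.ContentType = "application/pdf";
        httpContext.Response.ContentLength = memoryStream.Length;
        httpContext.Response.Headers.ContentDisposition = $"attachment; filename=\"{_filename}\"";

        // ...then copy it to the response body asynchronously.
        memoryStream.Position = 0;
        await memoryStream.CopyToAsync(httpContext.Response.Body, httpContext.RequestAborted);
    }
}
```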

You have to add the RecyclableMemoryStreamManager to DI:

```csharp
services.AddSingleton<RecyclableMemoryStreamManager>();
```

schulz3000 mentioned this issue on Dec 10, 2021: fix rendering direct to aspnet core response stream #65 (Merged)

MarcinZiabek mentioned this issue on Jan 21, 2022: Null reference exception generating the PDF #105 (Closed)

MarcinZiabek closed this as completed in cb49356 Jan 30, 2022

MarcinZiabek reopened this Feb 11, 2022

MarcinZiabek self-assigned this Feb 18, 2022

MarcinZiabek added this to the 2022.2.4 milestone Feb 18, 2022
