Support remote assets and additional image formats#46
Conversation
Co-authored-by: jongalloway <68539+jongalloway@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR expands the PPTX renderer’s asset pipeline to support remote HTTP(S) images and additional raster formats, improving format detection and making failures more actionable during rendering.
Changes:
- Add a
RemoteAssetResolverthat downloads/caches HTTP(S) images to temporary files during a render. - Extend image format handling (BMP/WebP) and add magic-byte MIME detection for extensionless/unknown files.
- Update rendering behavior and tests to cover remote asset success/failure, disabled remote assets, and unsupported formats.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/MarpToPptx.Tests/PptxRendererTests.cs | Adds coverage for remote image resolution, HTTP failures, remote-disabled behavior, and BMP/WebP/unsupported formats. |
| src/MarpToPptx.Pptx/Rendering/RemoteAssetResolver.cs | Introduces a resolver that fetches remote images via HttpClient, caches by URL, and cleans up temp files. |
| src/MarpToPptx.Pptx/Rendering/PptxRenderOptions.cs | Adds AllowRemoteAssets and injectable RemoteAssetHandler for configurability/testability. |
| src/MarpToPptx.Pptx/Rendering/OpenXmlPptxRenderer.cs | Plumbs remote asset resolution into rendering and updates image content-type detection/error reporting. |
| src/MarpToPptx.Pptx/Rendering/ImageMetadataReader.cs | Adds BMP/WebP size parsing and stream-based magic-byte content-type detection. |
Comments suppressed due to low confidence (1)
tests/MarpToPptx.Tests/PptxRendererTests.cs:283
- These new remote-asset tests assert picture presence/text, but they don’t run the OpenXmlPackageValidator like other tests in this file. Adding validation here would better catch regressions (e.g., incorrect content types/relationships) introduced by remote image embedding.
using var document = PresentationDocument.Open(outputPath, false);
var slidePart = document.PresentationPart!.SlideParts.First();
Assert.Single(slidePart.Slide!.Descendants<P.Picture>());
Assert.DoesNotContain("Missing image", slidePart.Slide!.Descendants<A.Text>().Select(t => t.Text));
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// When <c>true</c> (the default), HTTP and HTTPS image URLs are fetched at render time. | ||
| /// Set to <c>false</c> to treat all remote references as missing assets. | ||
| /// </summary> | ||
| public bool AllowRemoteAssets { get; init; } = true; |
There was a problem hiding this comment.
AllowRemoteAssets defaults to true, which changes rendering from “no network access” to “performs outbound HTTP(S) requests” unless callers opt out. That’s a security/operational concern for library consumers (SSRF and unexpected network dependency). Consider defaulting to false (opt-in) or very explicitly documenting the risk/expected usage in server contexts.
| /// When <c>true</c> (the default), HTTP and HTTPS image URLs are fetched at render time. | |
| /// Set to <c>false</c> to treat all remote references as missing assets. | |
| /// </summary> | |
| public bool AllowRemoteAssets { get; init; } = true; | |
| /// When <c>true</c>, HTTP and HTTPS image URLs are fetched at render time. | |
| /// The default is <c>false</c>, which treats all remote references as missing assets | |
| /// and avoids outbound HTTP(S) requests unless explicitly enabled. | |
| /// </summary> | |
| public bool AllowRemoteAssets { get; init; } = false; |
|
|
||
| private readonly HttpClient _client; | ||
| private readonly bool _ownsClient; | ||
| private readonly Dictionary<string, string> _cache = new(StringComparer.OrdinalIgnoreCase); |
There was a problem hiding this comment.
RemoteAssetResolver caches by URL using StringComparer.OrdinalIgnoreCase. URL paths can be case-sensitive, so this may incorrectly treat distinct resources as the same cache key. Use a case-sensitive comparer (Ordinal) or normalize URLs in a way that preserves semantics.
| private readonly Dictionary<string, string> _cache = new(StringComparer.OrdinalIgnoreCase); | |
| private readonly Dictionary<string, string> _cache = new(StringComparer.Ordinal); |
| using var response = _client.GetAsync(url).ConfigureAwait(false).GetAwaiter().GetResult(); | ||
| if (!response.IsSuccessStatusCode) | ||
| { | ||
| errorMessage = $"HTTP {(int)response.StatusCode} {response.ReasonPhrase}"; | ||
| return null; | ||
| } | ||
|
|
||
| var contentType = response.Content.Headers.ContentType?.MediaType; | ||
| var extension = GetExtensionFromUrl(url) | ||
| ?? GetExtensionFromContentType(contentType) | ||
| ?? ".bin"; | ||
|
|
||
| var tempPath = Path.Combine(Path.GetTempPath(), $"marp2pptx_{Guid.NewGuid():N}{extension}"); | ||
|
|
||
| using (var fileStream = File.Create(tempPath)) | ||
| using (var contentStream = response.Content.ReadAsStream()) | ||
| { | ||
| contentStream.CopyTo(fileStream); | ||
| } |
There was a problem hiding this comment.
Resolve() performs a synchronous wait on HttpClient.GetAsync and uses the default completion option, which buffers the entire response content in memory. This can cause thread-pool starvation/deadlocks in some sync contexts and can spike memory for large images. Prefer a streaming request (ResponseHeadersRead) and avoid sync-over-async (e.g., make resolution async or isolate sync waits carefully), ideally with an upper bound on bytes copied.
| var extension = GetExtensionFromUrl(url) | ||
| ?? GetExtensionFromContentType(contentType) |
There was a problem hiding this comment.
Extension selection prefers the URL’s extension over the HTTP Content-Type, which can produce temp files with a misleading extension (and later a mismatched content type in the PPTX). For remote downloads, consider preferring Content-Type (or magic-byte detection after download) over URL extension, and/or validating that the bytes match the selected type.
| var extension = GetExtensionFromUrl(url) | |
| ?? GetExtensionFromContentType(contentType) | |
| var extension = GetExtensionFromContentType(contentType) | |
| ?? GetExtensionFromUrl(url) |
|
@copilot review and address comments |
…, streaming download, prefer Content-Type over URL extension Co-authored-by: jongalloway <68539+jongalloway@users.noreply.github.com>
Addressed all four review comments in commit
Tests that exercise remote resolution were also updated to set |
ImageMetadataReaderTryDetectContentTypetoImageMetadataReaderfor magic-byte detectionRemoteAssetResolverclass that downloads/caches remote HTTP images to temp filesAllowRemoteAssetsandRemoteAssetHandlertoPptxRenderOptionsfor configurability/testabilityOpenXmlPptxRendererto resolve remote URLs viaRemoteAssetResolverGetImageContentTypeto detect format from bytes when extension is absent/unknown; returnsnullfor unsupported formatsAllowRemoteAssetsdefaults tofalse(opt-in, avoids unexpected outbound HTTP)StringComparer.Ordinal(URL paths are case-sensitive)GetAsyncusesHttpCompletionOption.ResponseHeadersReadfor streaming (avoids buffering full response in memory)Content-Typeheader over URL path extension for remote downloadsOriginal prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.