File corruption with markdown to pptx conversion having PNG data url and template #9113

Closed
BrianDiggs opened this issue Sep 29, 2023 · 4 comments

@BrianDiggs

Summary

When converting a Markdown file to PowerPoint (pptx), if there is an embedded PNG image which is included using a data: URL and a template is used for the output, a "corrupted" pptx file is created. "Corrupted" is in quotes because upon opening the file in PowerPoint there is a dialog box stating "PowerPoint found a problem with content in XXX.pptx. PowerPoint can attempt to repair the presentation. If you trust the source of this presentation, click Repair." Clicking "Repair" gives the presentation that is expected.

Software Versions

  • OS: Windows 10 Enterprise
  • Pandoc 3.1.8 from pandoc-3.1.8-windows-x86_64.zip
  • Microsoft® PowerPoint® for Microsoft 365 MSO (Version 2302 Build 16.0.16130.20690) 32-bit

Reproducible example

I have attempted to use files from the Pandoc testing source as much as possible to aid in reproduction. However, I assembled the files into one place for the purposes of making this example. The files used are:

  • test/pptx/reference-depth.pptx
  • test/command/chap2/spider.png

Command:

pandoc.exe test-dataurl.md --reference-doc=reference-depth.pptx -o test-dataurl-templated.pptx

test-dataurl.md is included.

Expected result

A pptx file with a single slide showing the picture of the spider, which opens without requiring repair.

Actual result

A pptx file which has the correct content, but requires "Repair" when opening; file attached.

Notes

This requires a very specific combination of things; changing any one of them results in a pptx file which opens without requiring "Repair".

The reference to the image must be a data: URL. If it just points to a local file, there are no problems. I have verified that the base64 encoded string is correct for the image. I have generated it directly as well as using Pandoc itself by creating a --self-contained HTML version of the Markdown file and copying the data URL from that.
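One way to double-check that a data: URL really matches the source image is to decode it and compare bytes. The following is a minimal sketch, not the method used in this report; the function names (`to_data_url`, `from_data_url`, `matches_file`) are hypothetical, and it only handles the plain `data:<mime>;base64,<payload>` form.

```python
import base64
from typing import Tuple

# Every valid PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data: URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def from_data_url(url: str) -> Tuple[str, bytes]:
    """Split a base64 data: URL into (mime type, decoded bytes)."""
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URL")
    mime = header[len("data:"):-len(";base64")]
    # validate=True rejects characters outside the base64 alphabet.
    return mime, base64.b64decode(payload, validate=True)


def matches_file(url: str, original: bytes) -> bool:
    """True if the URL is a PNG data: URL that decodes to `original`."""
    mime, decoded = from_data_url(url)
    return (
        mime == "image/png"
        and decoded.startswith(PNG_SIGNATURE)
        and decoded == original
    )
```

To use it, read the image with `open("spider.png", "rb").read()` and pass those bytes, together with the data: URL copied from the Markdown file, to `matches_file`.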

The reference to the image must be a PNG file. Data URLs of GIF and JPEG files were tested and did not create a file requiring "Repair". I have tried it with other PNG files and the problem is not specific to this PNG file.

A customized template file must be used. If no --reference-doc is given, the generated file does not require "Repair". In my reproducible example, I used the reference-depth.pptx file because it is used in the automated testing and shows the errant behavior. However, I can create the same issue by taking the reference template (as described in the Pandoc manual)

pandoc -o custom-reference.pptx --print-default-data-file reference.pptx

opening custom-reference.pptx up in PowerPoint, going to the Master Slide view, and adding a background image to the Master slide. I think an image has to be involved. When I only added a colored box, I did not have an issue. I have attached that template as well.

I have versions of the output pptx showing that each of these three requirements is necessary, if those would aid in diagnosing or debugging. I have not included them initially to reduce the number of attachments.

Workaround

For now, I am going to attempt to change the upstream process which generates my Markdown files to encode a GIF image format instead of a PNG. If possible, that should avoid, but not solve, the issue.

test-dataurl.md
test-dataurl-templated.pptx
custom-reference.pptx

@BrianDiggs BrianDiggs added the bug label Sep 29, 2023
@BrianDiggs
Author

I verified that this bug is present on Mac as well.

macOS 12.6.8
Pandoc 3.1.8 installed via homebrew

Tested using the attached test-dataurl.md and custom-reference.pptx.

pandoc test-dataurl.md --reference-doc=custom-reference.pptx -o test-dataurl-templated-mac.pptx

Microsoft Office (PowerPoint) was not available on the Mac, but the "Repair" request appeared when attempting to open the resulting file on the original Windows machine. I have attached the output file from that run.

test-dataurl-templated-mac.pptx

@jgm
Owner

jgm commented Oct 17, 2023

I notice that in [Content_Types].xml we have

<Default Extension="png" ContentType="image/png;base64" /><Default Extension="png" ContentType="image/png" />

And this might be the problem -- conflicting specifications of the default extension for png.

Can you reproduce the issue without a --reference-doc, if you have a simple document with one regular png image and one included via a data uri?
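To check whether a given pptx has this kind of conflict, the package's `[Content_Types].xml` part can be inspected directly, since a pptx file is a ZIP archive. The sketch below is an illustration only: `conflicting_defaults` is a hypothetical helper, and it reports any extension that appears in more than one `Default` element without judging which entry is the right one.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List

# Namespace used by the OPC content types part.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def conflicting_defaults(pptx) -> Dict[str, List[str]]:
    """Return extensions declared by more than one <Default> element.

    `pptx` may be a file path or a binary file-like object.
    The result maps each such extension to its declared content types.
    """
    with zipfile.ZipFile(pptx) as zf:
        xml_bytes = zf.read("[Content_Types].xml")
    root = ET.fromstring(xml_bytes)

    seen = defaultdict(list)
    for el in root.findall(f"{{{CT_NS}}}Default"):
        # Compare extensions case-insensitively.
        ext = el.get("Extension", "").lower()
        seen[ext].append(el.get("ContentType", ""))

    return {ext: types for ext, types in seen.items() if len(types) > 1}
```

Calling `conflicting_defaults("test-dataurl-templated.pptx")` on an affected file should, if the diagnosis above holds, list `png` with both `image/png;base64` and `image/png`.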

@BrianDiggs
Author

That's a combination I had not even considered. But you were correct; this combination also gives a PowerPoint file which claims it needs to be repaired. That greatly simplifies the reproducible example by completely removing the --reference-doc.

New Reproducible Example

Needed files from source:

  • test/command/chap2/spider.png

Source file test-dataurl-and-file.md included below

pandoc.exe test-dataurl-and-file.md -o test-dataurl-and-file.pptx

Output file test-dataurl-and-file.pptx is attached.
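The contents of test-dataurl-and-file.md are not reproduced in this thread. As a rough sketch of what such an input could look like, the helper below builds a Markdown document with one slide that references the PNG as a local file and one that embeds the same bytes as a data: URL. The function name, slide titles, and alt text are all assumptions made for illustration, not the contents of the attached file.

```python
import base64


def build_test_markdown(png_bytes: bytes, local_path: str = "spider.png") -> str:
    """Build Markdown with one file-based and one data: URL PNG image.

    Each image sits under its own level-1 heading so that it lands on
    its own slide (hypothetical layout, chosen for this sketch).
    """
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return (
        "# Image from local file\n\n"
        f"![from file]({local_path})\n\n"
        "# Image from data URL\n\n"
        f"![from data URL](data:image/png;base64,{encoded})\n"
    )
```

Writing the returned string to test-dataurl-and-file.md next to spider.png gives an input that can be passed to the command above.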

Notes

I tried this with a .jpg file (using lalune.jpg from the source). Using a markdown file with both a local file and a data uri included, the created PowerPoint file did not request a repair when opened. It does appear to be limited to .png files.

test-dataurl-and-file.md
test-dataurl-and-file.pptx

@jgm jgm closed this as completed in 1529ff4 Oct 18, 2023
@jgm
Owner

jgm commented Oct 18, 2023

Fixed!
