Can't export SVG from docx file #7244

Closed
tjschuler opened this issue Apr 24, 2021 · 5 comments

Comments

@tjschuler

When a docx file contains an SVG image, the --extract-media option extracts only the PNG preview image, not the SVG itself.

I am using pandoc 2.13.

A command that reproduces it: pandoc --extract-media bananas -f docx example.docx

I have attached a sample file to reproduce it with: example.docx
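
To check, independently of pandoc, that the SVG really is stored in the file, one can list the media entries of the docx archive. This is a minimal sketch, assuming the usual docx layout in which embedded media live under word/media/; the helper name list_docx_media is hypothetical and not part of pandoc.

```python
import zipfile


def list_docx_media(docx):
    """Return the names of all entries under word/media/ in a docx archive.

    `docx` may be a filesystem path or a binary file-like object,
    since a docx file is an ordinary zip archive.
    """
    with zipfile.ZipFile(docx) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Skip directory entries, keep only real media files.
            if name.startswith("word/media/") and not name.endswith("/")
        )
```

Calling list_docx_media("example.docx") on the attached file should show both a .png and a .svg entry if the SVG is embedded as described.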

@jgm
Owner

jgm commented Apr 26, 2021

For convenience, here's how the image is represented in the docx XML:

        <w:drawing>
          <wp:inline distT="0" distB="0" distL="0" distR="0"
          wp14:anchorId="6189F2C1" wp14:editId="2D9EC143">
            <wp:extent cx="5334000" cy="3552825" />
            <wp:effectExtent l="0" t="0" r="0" b="9525" />
            <wp:docPr id="1" name="Graphic 1" />
            <wp:cNvGraphicFramePr>
              <a:graphicFrameLocks xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
              noChangeAspect="1" />
            </wp:cNvGraphicFramePr>
            <a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">

              <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/picture">

                <pic:pic xmlns:pic="http://schemas.openxmlformats.org/drawingml/2006/picture">

                  <pic:nvPicPr>
                    <pic:cNvPr id="1" name="Graphic 1" />
                    <pic:cNvPicPr />
                  </pic:nvPicPr>
                  <pic:blipFill>
                    <a:blip r:embed="rId8">
                      <a:extLst>
                        <a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}">

                          <a14:useLocalDpi xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main"
                          val="0" />
                        </a:ext>
                        <a:ext uri="{96DAC541-7B7A-43D3-8B79-37D633B846F1}">

                          <asvg:svgBlip xmlns:asvg="http://schemas.microsoft.com/office/drawing/2016/SVG/main"
                          r:embed="rId9" />
                        </a:ext>
                      </a:extLst>
                    </a:blip>
                    <a:stretch>
                      <a:fillRect />
                    </a:stretch>
                  </pic:blipFill>
                  <pic:spPr>
                    <a:xfrm>
                      <a:off x="0" y="0" />
                      <a:ext cx="5334000" cy="3552825" />
                    </a:xfrm>
                    <a:prstGeom prst="rect">
                      <a:avLst />
                    </a:prstGeom>
                  </pic:spPr>
                </pic:pic>
              </a:graphicData>
            </a:graphic>
          </wp:inline>
        </w:drawing>
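
In the fragment above, the a:blip element's r:embed ("rId8") points at the PNG fallback, while the asvg:svgBlip extension's r:embed ("rId9") points at the SVG. The following is a sketch of how the two relationship ids could be read out, not pandoc's implementation. It assumes the XML is parsed with its namespace declarations in place (the fragment above omits the ones for r:, wp: and w:, which are normally declared on the document root), and the function name blip_relationship_ids is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "asvg": "http://schemas.microsoft.com/office/drawing/2016/SVG/main",
    "r": "http://schemas.openxmlformats.org/officeDocument/2006/relationships",
}
# ElementTree spells a namespaced attribute as "{uri}localname".
R_EMBED = "{%s}embed" % NS["r"]


def blip_relationship_ids(xml_text):
    """Return a list of (fallback_id, svg_id) pairs, one per a:blip.

    svg_id is None when the blip has no asvg:svgBlip extension.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for blip in root.iter("{%s}blip" % NS["a"]):
        svg = blip.find(".//asvg:svgBlip", NS)
        svg_id = svg.get(R_EMBED) if svg is not None else None
        pairs.append((blip.get(R_EMBED), svg_id))
    return pairs
```

Each id would then still have to be resolved to a file name through the document's relationships part (word/_rels/document.xml.rels).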

@fsoedjede
Sponsor

Hello @jgm

Are you still considering implementing this?

My suggestion would be to add an extension, similar to docx+styles, that enables exporting the SVG file to the media directory. In the Pandoc AST, the SVG path could be added as an attribute of the Image element.

The new extension could be used like this: docx+extract_svgs.

Without extract_svgs, everything would work as it does now.

Example

The native structure without the option:

Image
    ( "" , [] , [("width","30"),("height","40")] )
    [ Str "Image" , Space , Str "title" ]
    ( "media/image1.png" , "fig:" )

The native structure with the option:

Image
    ( "" , [] , [("width","30"),("height","40"),("svgpath", "media/image1.svg")] )
    [ Str "Image" , Space , Str "title" ]
    ( "media/image1.png" , "fig:" )

With that, one can follow the EPUB best practice described in #2766 and go from docx to markdown to epub (or latex) without losing the SVG file.
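
A filter consuming the proposed attribute could look roughly like this. It is a sketch only: the "svgpath" attribute is the hypothetical one suggested above, not something pandoc emits, and the function name prefer_svg is made up for illustration. It operates on pandoc's JSON AST, where an Image is encoded as {"t": "Image", "c": [[id, classes, key-value pairs], alt inlines, [url, title]]}.

```python
def prefer_svg(node):
    """Recursively rewrite Image nodes of a pandoc JSON AST, in place.

    If an Image carries the (hypothetical) "svgpath" attribute, its
    target URL is replaced by that path and the attribute is dropped.
    Returns the same node for convenience.
    """
    if isinstance(node, list):
        for child in node:
            prefer_svg(child)
    elif isinstance(node, dict):
        if node.get("t") == "Image":
            attr, _alt, target = node["c"]
            keyvals = attr[2]
            svg_paths = [value for key, value in keyvals if key == "svgpath"]
            if svg_paths:
                target[0] = svg_paths[0]
                attr[2] = [[k, v] for k, v in keyvals if k != "svgpath"]
        # Descend into every value so nested blocks and inlines are visited.
        for value in node.values():
            prefer_svg(value)
    return node
```

To run it as a pandoc JSON filter, one would wrap it so that it loads the document with json.load(sys.stdin), applies prefer_svg, and writes the result with json.dump(doc, sys.stdout).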

What do you think?

@jgm
Owner

jgm commented Jul 14, 2023

I'm actually more tempted just to (by default) extract the svg instead of the png, since the png has the role of a fallback.
Then no filter would be required. Not sure if this would break anything.

@fsoedjede
Sponsor

fsoedjede commented Jul 14, 2023

I'm totally OK with your suggestion.
It can be considered an improvement.

Since many formats support SVG, it should be fine.
If a format does not handle it well, rsvg-convert could be used, as is currently done for PDF (https://github.com/jgm/pandoc/blob/3.1.5/src/Text/Pandoc/PDF.hs#L223)
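
As a rough illustration of that fallback, an extracted SVG can be rasterized with rsvg-convert from a script. This is a sketch under assumptions, not a copy of what pandoc's PDF.hs does (the linked source is authoritative); the helper names rsvg_command and convert_svg are hypothetical, and only the -f (output format) and -o (output file) options of rsvg-convert are used.

```python
import shutil
import subprocess


def rsvg_command(svg_path, out_path, fmt="png"):
    """Build the argument list for converting an SVG with rsvg-convert."""
    return ["rsvg-convert", "-f", fmt, "-o", out_path, svg_path]


def convert_svg(svg_path, out_path, fmt="png"):
    """Convert an SVG if rsvg-convert is on PATH.

    Returns True after a successful conversion and False when the tool
    is not installed; a failing conversion raises CalledProcessError.
    """
    if shutil.which("rsvg-convert") is None:
        return False
    subprocess.run(rsvg_command(svg_path, out_path, fmt), check=True)
    return True
```

For example, convert_svg("bananas/media/image1.svg", "bananas/media/image1.png") would produce a PNG next to the extracted SVG; the paths here are illustrative.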

jgm closed this as completed in c6ac174 Jul 14, 2023

@fsoedjede
Sponsor

Thanks @jgm for the quick implementation
