download remote images in converting markdown to latex/pdf #750

Closed
cboettig opened this issue Feb 14, 2013 · 14 comments

Comments

@cboettig

Markdown (or possibly HTML) files with remote images cannot be converted directly to PDF. Since the --self-contained option already supports downloading image files, perhaps a similar trick could be used to download remote images and store them locally for embedding in the LaTeX or PDF document?

@nghuuphuoc

+1 for this feature

@jgm
Owner

jgm commented Sep 16, 2013

This is already implemented in 1.12. I guess I forgot to close this bug.

@jgm jgm closed this as completed Sep 16, 2013
@andreaskoch

Is it possible that this feature has been lost again?

I am trying to convert a web page (a remote HTML file) to a rich text file (RTF) using pandoc version 1.12.4.2 on Windows, and pandoc does not download and embed the images in the output file:

pandoc -s "http://example.com/some-page-with-images.html" -o output.rtf

I have also experimented with the --self-contained and --data-dir flags, but these don't seem to make a difference for this scenario:

pandoc -s "http://example.com/some-page-with-images.html" -o output.rtf --self-contained
pandoc -s "http://example.com/some-page-with-images.html" -o output.rtf --self-contained --data-dir="C:\data"

All I get is a red-colored reference to the image that has not been downloaded:

[image: http://example.com/image_thumb.png]

The rest of the document is converted correctly.

If I download the web page and its images to disk and then convert the HTML file to RTF locally, it works just fine, and all images are properly included in the rich text document:

pandoc -s "C:\temp\some-page-with-images.html" -o output.rtf

It also doesn't work if the images are embedded in the web page as base64 data URIs.
Am I missing something or is this just not possible at the moment?
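
The local workaround described above can be scripted. The following is a minimal sketch, assuming GNU wget is installed; the function name `convert_offline` and the default directory `page_copy` are made up for illustration, and pages whose images live on another host may additionally need wget's `-H` (span hosts) option.

```shell
#!/bin/sh
# Sketch: mirror one page plus its images locally, then convert the local copy.
# convert_offline and page_copy are hypothetical names, not part of pandoc.
convert_offline() {
  url=$1
  out=$2
  dir=${3:-page_copy}

  # Make the output path absolute, because pandoc is run from inside $dir below.
  case $out in
    /*) ;;
    *) out=$PWD/$out ;;
  esac

  # -p  fetch page requisites (images, stylesheets)
  # -k  rewrite links in the saved page to point at the local copies
  # -E  save HTML pages with an .html extension
  # -nd put every file directly into the target directory
  # -P  choose the target directory
  # wget exits non-zero if any single requisite fails, so the result is
  # checked by looking for the saved page instead of the exit status.
  wget -p -k -E -nd -P "$dir" "$url" || echo "wget reported errors; continuing" >&2

  # Pick the first saved HTML file; a page with frames may produce several.
  page=
  for f in "$dir"/*.html; do
    if [ -e "$f" ]; then
      page=$f
      break
    fi
  done
  [ -n "$page" ] || return 1

  # Run pandoc from inside $dir so relative image paths resolve.
  ( cd "$dir" && pandoc -s "$(basename "$page")" -o "$out" )
}

# Usage (not run here):
#   convert_offline "http://example.com/some-page-with-images.html" output.rtf
```

This only automates the manual steps; whether the images end up in the output still depends on the pandoc version and the image formats involved.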

@jgm
Owner

jgm commented Aug 6, 2014

I just tried it with the dev version and the images came
through. I've made some improvements here recently (though I would
have thought that 1.12.4.2 would have worked). Anyway, I think
you'll be satisfied with the coming release. If you want to send
the URL you were trying, I can test.

@andreaskoch

You can reproduce the steps with the Haskell website: http://www.haskell.org/platform/

pandoc -s http://www.haskell.org/platform/ -o out.rtf

[screenshot: remote]

Converting the website locally works:

pandoc -s .\Haskell.htm -o out.rtf

[screenshot: local]

[screenshot: environment]

jgm added a commit that referenced this issue Aug 6, 2014
We need this information for relative URLs!

This should resolve the continuing problem noted in #750.
@andreaskoch

The problem is solved in the latest 1.13 release. All remote images are properly embedded into the RTF document :-)

Thank you very much for this great tool!

@puterleat

This problem seems to have reappeared, at least in version 1.16.0.2. This example again illustrates:

pandoc -s http://www.haskell.org/platform/ -o out.rtf

@jgm
Owner

jgm commented Sep 20, 2016

The problem is that we can only embed raster images in
RTF, and these images are SVG.
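
One way to check whether a page is affected is to list the image sources that point at SVG files before converting. The following is a rough sketch for a locally saved copy of the page; the function name `list_svg_sources` is made up for illustration, and the pattern matching is deliberately naive (it looks at every `src` attribute, including those on `<script>` tags).

```shell
#!/bin/sh
# Sketch: print every src value in an HTML file that ends in .svg.
# list_svg_sources is a hypothetical helper, not a pandoc feature.
list_svg_sources() {
  # 1. Pull out each src="..." or src='...' attribute (case-insensitive).
  # 2. Strip the attribute name and the surrounding quotes.
  # 3. Keep values ending in .svg, optionally followed by a query string.
  # "|| true" keeps the pipeline going when grep finds nothing.
  { grep -oiE "src=[\"'][^\"']+[\"']" "$1" || true; } \
    | sed -E "s/^[sS][rR][cC]=[\"']//; s/[\"']\$//" \
    | { grep -iE '\.svg(\?.*)?$' || true; }
}

# Usage (not run here): list_svg_sources Haskell.htm
```

Any path printed here is an image that, per the comment above, cannot be embedded in RTF as-is.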

@puterleat

I don't think it's just an SVG issue though, e.g.:

pandoc -s http://oldschool.runescape.com -o out.rtf

@jgm
Owner

jgm commented Sep 20, 2016

That works fine. Are you looking at it in OSX's TextEdit by
chance? That does not show images. Try opening it in Word
or LibreOffice or something.

If you look at out.rtf in an editor, you'll see the data for
the images.
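
That check can also be done from the command line. RTF stores embedded pictures in groups introduced by the `\pict` control word, so counting those gives a quick indication of whether any image data made it into the file. A minimal sketch follows; the function name `count_rtf_pictures` is made up for illustration.

```shell
#!/bin/sh
# Sketch: count embedded picture groups in an RTF file.
# count_rtf_pictures is a hypothetical helper, not a pandoc feature.
count_rtf_pictures() {
  # Match \pict only when it is not the prefix of a longer control word.
  # "|| true" keeps the pipeline going when grep finds nothing.
  { grep -oE '\\pict([^a-zA-Z]|$)' "$1" || true; } | wc -l | tr -d ' '
}

# Usage (not run here): count_rtf_pictures out.rtf
```

A count of 0 suggests that no image data was embedded; a positive count with nothing visible points at the viewer instead.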

@Lambik

Lambik commented Sep 20, 2016

Hi,

I don't know if this is related, or if I'm hijacking this thread, but I just installed pandoc via Ubuntu's apt-get and tried this command:

pandoc -s --toc -o test.epub -r html 'http://www.squidi.net/three/entry.php?id=1'

The output does not contain the images. Maybe because they are relative (like <img src='set01/img/entry001-illusion.gif' alt='[illusion.gif]'>)?

The errors I see are:

pandoc: Could not find media `poster/poster001.png', skipping...
pandoc: Could not find media `set01/img/entry001-illusion.gif', skipping...
(and so on for the other images)

I've tried adding the --self-contained option, but that doesn't seem to work (probably because it's epub output).

Is this related? (and is there a solution?)

@jgm
Owner

jgm commented Sep 20, 2016

Not directly related. I'd suggest opening a new issue.
Pandoc can handle relative links, but I think it's doing
it wrong in this case.
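
For reference, a relative image path is normally resolved against the URL of the page that contains it. The sketch below shows that resolution for the simple cases only (it does not handle `../` segments or `<base>` tags) and is not a description of pandoc's own code; the function name `resolve_url` is made up for illustration.

```shell
#!/bin/sh
# Sketch: resolve a relative reference against the URL of the containing page.
# resolve_url is a hypothetical helper covering only the simple cases.
resolve_url() {
  base=$1
  rel=$2

  # Already absolute: return unchanged.
  case $rel in
    http://*|https://*)
      printf '%s\n' "$rel"
      return ;;
  esac

  # Drop the query string and fragment of the page URL.
  base=$(printf '%s\n' "$base" | sed -E 's|[?#].*$||')

  # A bare host such as http://example.com gets a trailing slash.
  case $base in
    *://*/*) ;;
    *) base=$base/ ;;
  esac

  case $rel in
    /*)
      # Root-relative: keep only scheme and host of the page URL.
      origin=$(printf '%s\n' "$base" | sed -E 's|^([a-z]+://[^/]+).*|\1|')
      printf '%s%s\n' "$origin" "$rel" ;;
    *)
      # Path-relative: replace the last path segment of the page URL.
      dir=$(printf '%s\n' "$base" | sed -E 's|[^/]*$||')
      printf '%s%s\n' "$dir" "$rel" ;;
  esac
}

# Usage (not run here):
#   resolve_url 'http://www.squidi.net/three/entry.php?id=1' 'set01/img/entry001-illusion.gif'
#   → http://www.squidi.net/three/set01/img/entry001-illusion.gif
```

With absolute URLs in hand, the images can be fetched separately or the HTML rewritten before conversion.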

@Lambik

Lambik commented Sep 20, 2016

Thanks, sorry for the hijack

@rushid93

I am facing the same issue when converting an HTML file to RTF using pandoc 1.18.
I am getting red coloured text instead of the image.
