# converting to %23 in links to external PDFs #110

Closed
apkawel opened this issue Nov 26, 2019 · 20 comments

@apkawel

apkawel commented Nov 26, 2019

I am trying to link to a particular page of a PDF file. If you cut and paste the following link in your browser, it loads the file and scrolls to the correct page:

https://edca.3dca.flcourts.org/DcaDocs/2019/0248/2019-248_Brief_230675_RC09202D20Record20on20Appeal.pdf#page=298

I try to link to the file in LaTeX / hyperref:

\href{https://edca.3dca.flcourts.org/DcaDocs/2019/0248/2019-248_Brief_230675_RC09202D20Record20on20Appeal.pdf#page=298}{R.~295 {PDF 298}} (stipulation).

But when the document compiles and I click the link, the # gets converted to %23 in the browser, and the PDF does not load on the external server. The same thing happens if I escape #.

I don't recall this happening before. Shouldn't the hyperlink above render exactly as written?

@davidcarlisle
Member

davidcarlisle commented Nov 26, 2019

You didn't provide a test document but I tried

\documentclass{article}
\usepackage{hyperref}
\begin{document}
\href{https://edca.3dca.flcourts.org/DcaDocs/2019/0248/2019-248_Brief_230675_RC09202D20Record20on20Appeal.pdf#page=298}{R.~295 {PDF 298}} (stipulation).
\end{document}

with pdflatex from TeX Live 2019; using Firefox to read the PDF, the link works and opens the document on the requested page.

It may be a failing in your PDF viewer; it is hard to say with no information.

@u-fischer
Member

Please always show a complete example.
I extended your snippet and tested on windows 10 with sumatra and adobe reader and the link is fine.

@tchlux

tchlux commented Mar 1, 2020

Minimum working example

\documentclass{article}
\usepackage{hyperref}

\begin{document}

\href{https://www.google.com/#searchform}{Google followed by /\#searchform.}

\end{document}

Result

(two screenshots, not preserved)

System

MacOS Catalina, Version 10.15.3.

Preview as PDF viewer, version 11.0 (999.4).

Safari as browser, version 13.0.5 (15608.5.11).

$ pdflatex --version

pdfTeX 3.14159265-2.6-1.40.17 (TeX Live 2016)
kpathsea version 6.2.2
Copyright 2016 Han The Thanh (pdfTeX) et al.
There is NO warranty.  Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the pdfTeX source.
Primary author of pdfTeX: Han The Thanh (pdfTeX) et al.
Compiled with libpng 1.6.21; using libpng 1.6.21
Compiled with zlib 1.2.8; using zlib 1.2.8
Compiled with xpdf version 3.04

@u-fischer
Member

@tchlux sorry but texlive 2016 is old. In a current texlive with a current hyperref it works fine for me.

@tchlux

tchlux commented Mar 1, 2020

(continued from my post above)

I installed TeX Live 2019 (brew cask install mactex-no-gui) and get the same error as before when viewing through the Safari and Preview applications.

@u-fischer did you test with the PDF viewers in Safari or Preview?

Installed the Firefox web browser (brew cask install firefox). When viewing the document through Firefox the problem disappears and the link works correctly. This might be an issue with the translation of PDF embedded links in Safari and Preview.

System

firefox web browser, version 73.0.1 (64-bit).

$ pdflatex --version

pdfTeX 3.14159265-2.6-1.40.20 (TeX Live 2019)
kpathsea version 6.3.1
Copyright 2019 Han The Thanh (pdfTeX) et al.
There is NO warranty.  Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the pdfTeX source.
Primary author of pdfTeX: Han The Thanh (pdfTeX) et al.
Compiled with libpng 1.6.36; using libpng 1.6.36
Compiled with zlib 1.2.11; using zlib 1.2.11
Compiled with xpdf version 4.01

@muzimuzhi
Contributor

Using this pdf file (hyperref-hash-in-url.pdf) produced by running pdflatex (pdftex 3.14159265-2.6-1.40.20) on your example, I find that

  • Adobe Acrobat Reader DC v2020.006.20034 always keeps the # and the URL works with every browser, but
  • Preview.app v11.0 and Skim v1.5.6 always replace the # in the URL with %23 and the URL fails in every browser,

no matter which of the three browsers (Chrome v79.0.3945.130, Safari v13.0.4, or Firefox v73.0.1) is set as the default browser. I am on macOS 10.15.2.

So this might be an issue with the encoding used in url, or with different pdf readers.


@tchlux

> When viewing the document through Firefox the problem disappears and the link works correctly.

On my side, when using Preview.app with Firefox, the link does not work correctly.


Other info

With the help of the Python library PyPDF2, the following script prints https://www.google.com/#searchform, so the URL written to the PDF appears to be encoded correctly.

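#!/usr/bin/env python3
# Print the target of the first link annotation on the first page.
# Uses the older camelCase PyPDF2 API (PdfFileReader, getPage, getObject).

from PyPDF2 import PdfFileReader

f_in = 'hyperref-hash-in-url.pdf'
pdf = PdfFileReader(f_in)
page = pdf.getPage(0)                    # first page
annot = page['/Annots'][0].getObject()   # first annotation, dereferenced

print(annot['/A']['/URI'])               # URI entry of the annotation's action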

According to the URL Standard, sec. 4.3, the U+0023 (#) that precedes the URL-fragment string in a valid URL must appear literally, not percent-encoded. Hence the three tested browsers are also behaving correctly when they do not treat %23 as a fragment delimiter.

A valid URL string must be either a relative-URL-with-fragment string or an absolute-URL-with-fragment string.

  • A relative-URL-with-fragment string must be a relative-URL string, optionally followed by U+0023 (#) and a URL-fragment string.
  • An absolute-URL-with-fragment string must be an absolute-URL string, optionally followed by U+0023 (#) and a URL-fragment string.

@u-fischer
Member

> @u-fischer did you test with the PDF viewers in Safari or Preview?

I'm on Windows. But as @muzimuzhi wrote, the URL in the PDF is correctly encoded, so the problem must be due to a bug in your PDF viewer.

@PhelypeOleinik
Member

This issue also popped up a few days ago in stackexchange (link), where the
user reported the TeXShop viewer doing the same thing, and also a comment
under the question says that this happened with "a Mac user who used some
PDF viewer which had a tendency to escape # in URLs". I think this might
be a valid bug report for both TeXShop and Preview...

@tchlux

tchlux commented Mar 2, 2020

> When viewing the document through Firefox the problem disappears and the link works correctly.

> On my side, when using Preview.app with Firefox, the link does not work correctly.

@muzimuzhi when I say "viewing the document", I mean that I am viewing the PDF within Firefox, and clicking the linked text simply traverses the link in the same tab. Our outcomes agree. Thanks for the thorough tests!

I agree with @PhelypeOleinik: this appears to be a bug in Preview, Safari, TeXShop, and Skim, where the # symbols in embedded links are incorrectly converted to %23 in the corresponding URLs.

@davidcarlisle
Member

@tchlux while it may primarily be a PDF reader issue, that doesn't necessarily mean nothing can be done from this side. In particular, is it possible to make (or find) a PDF produced by another application whose # fragment-ID links do work in these viewers? If so, we could look at what internal PDF markup it uses. (I don't have easy access to a Mac to try myself.)

@muzimuzhi
Contributor

muzimuzhi commented Mar 2, 2020

In brief

The following info indicates that

  • it is highly likely that pdftex writes the correct URI, correctly encoded, to the output PDF;
  • the problem is specific to certain PDF readers, and the maintainer of Skim.app suggested it is a bug from Apple.

I have therefore reported it to Apple through its "Feedback Assistant.app". It seems this report is not publicly accessible.

Accumulated test results, under macOS

| PDF Reader | Version | Browser | Is hash (#) converted? |
|---|---|---|---|
| Adobe Acrobat Reader DC | 2020.006.20034 | All | No |
| TeXstudio's internal PDF viewer | 2.12.22 | All | No |
| TeXworks' internal PDF viewer | 0.6.3 | All | No |
| Firefox | 73.0.1 | Firefox | No |
| Preview.app | 11.0 | All | Yes |
| Skim | 1.5.6 | All | Yes |
| Safari | 13.0.4 | Safari | Yes |

  • Here "All" means Chrome, Firefox, and Safari.

Other version info

  • macOS 10.15.2
  • Chrome v79.0.3945.130

About Skim, the macOS only pdf reader

I found a similar bug reported to Skim in Nov 2019; the maintainer of Skim responded that it is a bug from Apple:

It is certainly not a problem specific of Skim and nothing we can fix. We
simply pass the URL verbatim on to the system to handle. So it is either
PDFKit escaping characters erroneously, or the system code handling the
URL. Either are bugs from Apple, which can only be fixed by Apple.
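
As a purely illustrative sketch of the kind of error described in that reply: if some layer percent-encodes a complete URL as though it were a single path component, the # delimiter becomes %23. Nothing here reflects PDFKit's actual code. The function names `over_escape` and `careful_escape` are hypothetical, and Python's `urllib.parse.quote` merely stands in for whatever escaping routine is really involved.

```python
from urllib.parse import quote

URL = "https://www.google.com/#searchform"

def over_escape(url: str) -> str:
    """Hypothetical over-eager escaping: only ':' and '/' are left alone."""
    return quote(url, safe=":/")

def careful_escape(url: str) -> str:
    """Same call, but '#' is also treated as a delimiter to preserve."""
    return quote(url, safe=":/#")

print(over_escape(URL))     # https://www.google.com/%23searchform
print(careful_escape(URL))  # https://www.google.com/#searchform
```

The first result matches the broken URL reported in this thread, which is at least consistent with an escaping step that does not treat # as reserved.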

About encodings used in URL (or URI)

From the PDF Reference, the URI entry of a URI action

  • in PDF 1.7 has type "ASCII string", should be encoded in 7-bit ASCII, and is described in RFC 2396
    (Table 8.56 in PDF Ref. v1.7, Sec. 8.5.3);
  • in PDF 2.0 has type "ASCII string", should be encoded in UTF-8, and is described in RFC 3986
    (Table 210 in PDF Ref. v2.0, Sec. 12.6.4.8).

According to RFC 3986 (which obsoletes RFC 2396), sec. 2.2, the hash (U+0023, #) is a reserved character and should not be percent-encoded when it is used as a delimiter:

URIs that differ in the replacement of a reserved character with its
corresponding percent-encoded octet are not equivalent. Percent-
encoding a reserved character, or decoding a percent-encoded octet
that corresponds to a reserved character, will change how the URI is
interpreted by most applications.
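
The non-equivalence can be seen with Python's standard `urllib.parse.urlsplit`, used here only as a convenient URL parser. The two URLs are the ones discussed in this thread; the variable names are illustrative.

```python
from urllib.parse import urlsplit

intended = urlsplit("https://www.google.com/#searchform")
escaped = urlsplit("https://www.google.com/%23searchform")

# With a literal '#', the text after it is the fragment,
# which the client keeps and does not send to the server.
print(repr(intended.path), repr(intended.fragment))

# With '%23', the same text is part of the path, so the server is
# asked for a different resource and there is no fragment at all.
print(repr(escaped.path), repr(escaped.fragment))
```

This is why a viewer that rewrites # as %23 produces a link that fails on the remote server rather than one that merely ignores the fragment.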

Also, if one prepends \pdfcompresslevel=0 to the example TeX file provided in #110 (comment), compiles it with pdflatex, and opens the output PDF in a text editor using UTF-8 encoding, the content of the PDF annotation can be read directly:

% 4 0 obj
<<
/Type /Annot
/Border[0 0 1]/H/I/C[0 1 1]
/Rect [147.716 653.748 298.775 665.704]
/Subtype/Link/A<</Type/Action/S/URI/URI(https://www.google.com/#searchform)>>
>>
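
For a file compiled with \pdfcompresslevel=0, the same check can be scripted without a PDF library. The following is a rough sketch, not a PDF parser: it assumes the annotation dictionary is stored uncompressed and that the URI is a literal string without nested or escaped parentheses, as in the object above. The function name `find_uris` and the `sample` bytes are illustrative.

```python
import re
from typing import List

# Matches "/URI(...)" literal strings. Deliberately simplistic:
# escaped or nested parentheses inside the string are not handled.
URI_PATTERN = re.compile(rb"/URI\s*\(([^()\\]*)\)")

def find_uris(pdf_bytes: bytes) -> List[str]:
    """Return URI action targets found in uncompressed PDF bytes."""
    return [m.group(1).decode("latin-1") for m in URI_PATTERN.finditer(pdf_bytes)]

# Stand-in for the bytes of a real file, copied from the annotation above.
sample = (
    b"<<\n/Type /Annot\n"
    b"/Subtype/Link/A<</Type/Action/S/URI/URI(https://www.google.com/#searchform)>>\n>>"
)

for uri in find_uris(sample):
    # A literal '#' here means the file itself is fine and any '%23'
    # seen in the browser was introduced later, by the viewer.
    print(uri, "| literal '#':", "#" in uri, "| '%23':", "%23" in uri)
```

To check a real file, pass its contents instead of `sample`, for example `find_uris(open("hyperref-hash-in-url.pdf", "rb").read())`.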

@tchlux

tchlux commented Mar 2, 2020

Update: This post just further verifies what @muzimuzhi says above.


@davidcarlisle good point. I have run some more tests involving "printing" a web page to a PDF with different browsers. Here is my test webpage:

<a href="https://www.google.com/#searchform">Google fragment ID link</a>

When generating PDFs with different browsers, here are the results:

'Printed' with Google Chrome, Version 80.0.3987.122 (Official Build) (64-bit)

PDF 'Viewed' with:

  • Google Chrome – link works as expected
  • Firefox – link works as expected
  • Safari – link breaks, # is replaced by %23
  • Preview – link breaks, # is replaced by %23

'Printed' with Firefox, Version 73.0.1 (64-bit)

Link is not included in the output PDF.

'Printed' with Safari, Version 13.0.5 (15608.5.11)

The URL within the file has had # replaced by %23, so every viewing application opens an invalid link. I verified this with the same method used by @muzimuzhi above. It looks like this bug runs even deeper in Apple applications.


I see this as further evidence that Safari, Preview, TeXShop, and Skim are bugged and there isn't an easy way around the issue.

If someone has a reason to believe they have a PDF with working # fragment ID links after considering these results, post it somewhere and I can test it. It appears that any # symbol in a URL will be (incorrectly) replaced with %23 by some applications.

@i-chaochen

i-chaochen commented May 28, 2020

Same problem here.

# in the href becomes %23

MacOS with Skim as pdf-reader

Does anyone know how to solve this?

@WSDeWitt

I'm also struggling with this issue (# to %23 in urls clicked from the pdf in Mac Preview or Safari, but no problem clicking from the pdf loaded in Chrome). I found that using the url package (\usepackage{url}) instead of hyperref does not result in this behavior (link opens correctly using Preview and Safari).

@u-fischer
Member

@WSDeWitt if you use only the url package there are no real links (annotations) in the pdf. The pdf viewer will then simply guess that some text is perhaps meant to be a link. This can also work if you simply type https://... in your document.

@WSDeWitt

OIC, so that's just Preview parsing the link.

@Teagum

Teagum commented Aug 2, 2020

I have the same issue with a document compiled with xelatex from TeX Live 2019 on macOS 10.15.5. # in \hyperref{}{} targets is incorrectly converted to %23 in the document's links. The behaviour is the same in Preview and TeXShop's PDF viewer. The link is correct when I open the document in Adobe Acrobat Reader.

@u-fischer
Member

As the discussion shows, this is not a hyperref issue: hyperref creates the correct link, but some PDF viewers on the Mac don't handle it correctly. The problem should be reported elsewhere, so I'm closing this issue.

@ShaggyMunky

I encountered this issue as well. I was not using hyperref, but it was a major cause for concern because most of the audience we were reaching used Safari on iOS. The workaround we came to was using a bitly link.

@Teagum

Teagum commented Aug 4, 2020

> the workaround we came to was using a bitly link.

That's a great idea. Thanks!
