Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relative filepath to local file broken in PDF #5941

Closed
s-cassidy opened this issue Mar 12, 2020 · 8 comments · Fixed by #5945
Closed

Relative filepath to local file broken in PDF #5941

s-cassidy opened this issue Mar 12, 2020 · 8 comments · Fixed by #5945
Labels
Milestone

Comments

@s-cassidy
Copy link

s-cassidy commented Mar 12, 2020

  • KOReader version: 2020.03
  • Device: Linux

Issue

Quite a specific use-case, but currently it seems KOreader only supports absolute file paths for links to other files on the device. If I create a pdf file with a hyperlink to another pdf file within the same folder (using LaTeX hyperref package) giving the relative filepath ./filename.pdf then other pdf readers I have tried have the correct behaviour of opening filename.pdf if it exists in the folder. However, KOreader gives an error saying it is an invalid link. The link does work in KOreader if I replace it with an absolute file path.

This would be a useful feature/fix for me because I would like the same links to work whether I read the pdfs on my laptop or Kobo device.

@Frenzie
Copy link
Member

Frenzie commented Mar 12, 2020

Would you be able to attach a couple of test PDFs to illustrate the problem? It's a matter of when someone gets a few minutes to look at it they don't have to spend them making a test case. ;-)

@poire-z
Copy link
Contributor

poire-z commented Mar 12, 2020

We're supposed to handle them correctly (at least, I checked that on local html files linking to other html files):

-- For relative local file links
local directory, filename = util.splitFilePathName(self.ui.document.file) -- luacheck: no unused
self.document_dir = directory

function ReaderLink:onGotoLink(link, neglect_current_location, allow_footnote_popup)
logger.dbg("onGotoLink:", link)
local link_url
if self.ui.document.info.has_pages then
-- internal pdf links have a "page" attribute, while external ones have an "uri" attribute
if link.page then -- Internal link
logger.dbg("Internal link:", link)
if not neglect_current_location then
self:addCurrentLocationToStack()
end
self.ui:handleEvent(Event:new("GotoPage", link.page + 1))
return true
end
link_url = link.uri -- external link
else

logger.dbg("External link:", link_url)
local is_http_link = link_url:find("^https?://") ~= nil
if is_http_link and self:onGoToExternalLink(link_url) then
return true
end
-- Check if it is a link to a local file
local linked_filename = link_url:gsub("^file:", "") -- remove local file protocol if any
local anchor
if linked_filename:find("?") then -- remove any query string (including any following anchor)
linked_filename, anchor = linked_filename:match("^(.-)(%?.*)$")
elseif linked_filename:find("#") then -- remove any anchor
linked_filename, anchor = linked_filename:match("^(.-)(#.*)$")
end
linked_filename = ffiutil.joinPath(self.document_dir, linked_filename) -- get full path
linked_filename = ffiutil.realpath(linked_filename) -- clean full path from ./ or ../
if linked_filename and lfs.attributes(linked_filename, "mode") == "file" then
local DocumentRegistry = require("document/documentregistry")
local provider = DocumentRegistry:getProvider(linked_filename)
if provider then
-- Display filename with anchor or query string, so the user gets
-- this information and can manually go to the appropriate place
local display_filename = linked_filename
if anchor then
display_filename = display_filename .. anchor
end
UIManager:show(ConfirmBox:new{
text = T(_("Would you like to read this local document?\n\n%1\n"), BD.filepath(display_filename)),
ok_callback = function()
UIManager:scheduleIn(0.1, function()
self.ui:switchDocument(linked_filename)
end)
end
})
else
UIManager:show(InfoMessage:new{
text = T(_("Link to unsupported local file:\n%1"), BD.url(link_url)),
})
end
return true
end

So, might be worth switching to debug mode to see what kind of link string we get from the PDF.

@s-cassidy
Copy link
Author

s-cassidy commented Mar 13, 2020

linktest.pdf
linktest2.pdf

Here is the example. The link points to ./linktest2.pdf. I have tested the link in the evince pdf viewer and it works. However this is what happens in KOreader
image

@Frenzie
Copy link
Member

Frenzie commented Mar 13, 2020

03/13/20-10:11:24 DEBUG onGotoLink: {
    ["y1"] = 143.85998535156,
    ["x1"] = 280.88299560547,
    ["y0"] = 132.16998291016,
    ["x0"] = 261.25399780273,
    ["uri"] = "file://./linktest2.pdf"
}
03/13/20-10:11:24 DEBUG External link: file://./linktest2.pdf

Works like this (which seems more accurate anyway… does CREngine just say file:?):

diff --git a/frontend/apps/reader/modules/readerlink.lua b/frontend/apps/reader/modules/readerlink.lua
index 26378f3b..98a405f0 100644
--- a/frontend/apps/reader/modules/readerlink.lua
+++ b/frontend/apps/reader/modules/readerlink.lua
@@ -524,7 +524,7 @@ function ReaderLink:onGotoLink(link, neglect_current_location, allow_footnote_po
     end
 
     -- Check if it is a link to a local file
-    local linked_filename = link_url:gsub("^file:", "") -- remove local file protocol if any
+    local linked_filename = link_url:gsub("^file://", "") -- remove local file protocol if any
     local anchor
     if linked_filename:find("?") then -- remove any query string (including any following anchor)
         linked_filename, anchor = linked_filename:match("^(.-)(%?.*)$")

@Frenzie Frenzie added this to the 2020.04 milestone Mar 13, 2020
@Frenzie Frenzie added the bug label Mar 13, 2020
@Frenzie Frenzie changed the title Allow relative filepath in links to local files [FR?] Relative filepath to local file broken in PDF Mar 13, 2020
@poire-z
Copy link
Contributor

poire-z commented Mar 13, 2020

does CREngine just say file:?

I think it gives what's in the HTML file untouched.
But I think we should support both file:somepath and file://somepath - one might be more per-specs than the other, but I think I've seen both.
May be link_url:gsub("^file:/?/?", "") would be enough ?

That was my text file:

<!DOCTYPE html>
<html>
<head>
<style>
p { color: red; text-indent: 1.2em;}
</style>
</head>
<body>

This is some text.<br/>
<a href="http://www.eazeazeaz.com">This is some text</a>.<br/>This is some text.<br/>This is some text.<br/>This is some text.<br/>
This is some text.<br/>
<a href="/">/</a>.<br/>
<a href="etc">etc</a>.<br/>
<a href="file:">file:</a>.<br/>
<a href="/etc/passwd">/etc/passwd</a>.<br/>
<a href="file:/etc/passwd">file:/etc/passwd</a>.<br/>
<a href="/etc/X11/rgb.txt">/etc/X11/rgb.txt</a>.<br/>
<a href="file:/etc/X11/rgb.txt">file://etc/X11/rgb.txt</a>.<br/>
<a href="/tmp/test.txt.txt">/tmp/test.txt.txt</a>.<br/>
<a href="file:/tmp/test.txt.txt">file:/tmp/test.txt.txt</a>.<br/>
<a href="file://tmp/test.txt.txt">file://tmp/test.txt.txt</a>.<br/>
<a href="file:///tmp/test.txt.txt">file:///tmp/test.txt.txt</a>.<br/>

<hr/>

<div class=thisisaDIV style="color: green">
This is some text.<br/>
This is <a href="Wikipedia/test-link2.html">test-link2</a> text.<br/>
This is <a href="Wikipedia/test-link2.html#toto tutu#titi tata">test-link2 with #</a> text.<br/>
This is <a href="Wikipedia/test-link2.html?q=eza&rez=eza#toto tutu#titi tata">test-link2 with ?</a> text.<br/>
This is some text.
</div>

</body>
</html>

@Frenzie
Copy link
Member

Frenzie commented Mar 13, 2020

<a href="file:/tmp/test.txt.txt">file:/tmp/test.txt.txt</a>.<br/>
<a href="file://tmp/test.txt.txt">file://tmp/test.txt.txt</a>.<br/>

Note that Firefox normalizes these to the correct file:///tmp/etc.. That's file:// as the protocol, and /tmp/etc as the path.

If Firefox does that then presumably so should we, but for PDF the incorrect varieties are probably less relevant.

The pattern you suggested should do the trick for that.

Frenzie added a commit to Frenzie/koreader that referenced this issue Mar 13, 2020
Only borked ones were supported. The new behavior mimics Firefox.

The following incorrect varieties will be treated the same as the correct `file:///tmp/test.txt`.
* file:/tmp/test.txt
* file://tmp/test.txt

Fixes <koreader#5941>.

See <koreader#5941 (comment)> for discussion.
@poire-z
Copy link
Contributor

poire-z commented Mar 13, 2020

Even the 2nd one <a href="file://tmp/test.txt.txt"> which, with the per-specs file:// scheme looks like it is a relative path tmp/text.txt.txt, FF normalizes it to a fullpath /tmp/test.txt.txt ?!

I'd may be be inclined to see file: and file:// as possible supported schemes, and what's after as the path - and file:/ as invalid/non supported , so, may be, it it works: link_url:gsub("^file:(//)?", "")

@Frenzie
Copy link
Member

Frenzie commented Mar 13, 2020

Ah, indeed. My bad. https://en.wikipedia.org/wiki/File_URI_scheme

Frenzie added a commit that referenced this issue Mar 13, 2020
Only one style was supported.

Fixes <#5941>.

See <#5941 (comment)> for discussion.
@Frenzie Frenzie modified the milestones: 2020.04, 2020.03.2 Mar 13, 2020
mwoz123 pushed a commit to mwoz123/koreader that referenced this issue Mar 29, 2020
Only one style was supported.

Fixes <koreader#5941>.

See <koreader#5941 (comment)> for discussion.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants