
Unable to open external links in Athena zim file #3485

Closed
MohitMaliFtechiz opened this issue Sep 20, 2023 · 9 comments

@MohitMaliFtechiz
Collaborator

Describe the bug
The Kiwix application is unable to open external URLs in the Athena ZIM file, because external URLs in this ZIM file are formatted differently.

  • External links in this file have the format https://kiwix.app/A/mp_/https://www.unige.ch/sciences/terre/research/Groups/mineral_resources/archive/duparc/duparc.php. This looks like a URL inside the ZIM, which is why the application tries to open it internally, even though it is an external URL.
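The link above embeds the original absolute URL after a `/mp_/` segment. A minimal sketch of recovering that embedded URL, assuming the original URL always directly follows `/mp_/` as in this example; the helper name `embedded_original_url` is hypothetical and not part of Kiwix:

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: the original URL always follows a "/mp_/" segment, as in the
# example link above. This is not a documented Kiwix or warc2zim contract.
MARKER = "/mp_/"


def embedded_original_url(viewer_url: str) -> Optional[str]:
    """Return the absolute URL embedded after "/mp_/", or None if there is none.

    Hypothetical helper for illustration only.
    """
    index = viewer_url.find(MARKER)
    if index == -1:
        return None
    candidate = viewer_url[index + len(MARKER):]
    # Only treat an absolute http(s) URL as an embedded original link.
    parts = urlsplit(candidate)
    if parts.scheme in ("http", "https") and parts.netloc:
        return candidate
    return None


link = ("https://kiwix.app/A/mp_/https://www.unige.ch/sciences/terre/research/"
        "Groups/mineral_resources/archive/duparc/duparc.php")
print(embedded_original_url(link))
```

Recovering the target alone does not say whether that page was scraped into the ZIM; that still has to be checked separately.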

Expected behavior
It should open these external URLs in the external browser instead of within the application, since they point to content that is not inside the ZIM file.

Steps to reproduce the behavior:

  • Download athena_fr_all_2023-05.zim.

  • Open it in the application.

  • Try to open the "also Archive at SEES" and "Earth and Environmental Sciences" articles.

  • You will see the error.

  • I tested the Athena ZIM file on https://pwa.kiwix.org/ and found that these URLs are treated as external there and open in the external browser, as shown in the screenshots below.

    Screenshots (2023-09-20): "also Archive at SEES" and "Earth and Environmental Sciences"

Environment

  • Version of Kiwix Android: 3.8.0
  • Device: Redmi Note 9
  • OS version: Android 12
Screencast: external.link.issue.in.athena.mp4

@kelson42 Can you please share which methodology is used in https://pwa.kiwix.org/ to check the external URLs?

@kelson42
Collaborator

@MohitMaliFtechiz Is this not related to #2519?

@MohitMaliFtechiz
Collaborator Author

@kelson42 It is a different issue: #2519 is about the long-click functionality not working with the service worker. This issue is about external URLs being formatted differently from normal URLs. As you can see, this URL, https://kiwix.app/A/mp_/https://www.unige.ch/sciences/terre/research/Groups/mineral_resources/archive/duparc/duparc.php, looks like a normal ZIM file URL.

@MohitMaliFtechiz MohitMaliFtechiz self-assigned this Sep 20, 2023
@kelson42 kelson42 added this to the 3.9.0 milestone Sep 20, 2023
@kelson42
Collaborator

@rgaudin @mgautierfr I'm puzzled by the fact that external links look like internal links! How should we deal with that?

@rgaudin
Member

rgaudin commented Sep 20, 2023

In warc2zim ZIMs, the original links are stored in the HTML entries of the ZIM. At run time, once the Service Worker (SW) is installed, it injects Wombat.js, which rewrites all the links into the form depicted above. At this stage, there is no knowledge of what's in the ZIM and what's not.
It's only upon request (i.e. when the link is clicked) that the SW queries the backend and, should it receive a 404, renders a special page.
That page for warc2zim is 404.html. It includes a few things:

  • a hack to remove the (previous) kiwix-serve toolbar
  • check that it's either in the root document or in a first-level iframe (should work for old kiwix-serve, new kiwix-serve and Android)
    • then, only if the URL and the current URL are on different origins/domains:
      • if on kiwix-serve with the block external feature, redirect to kiwix-serve block handler
      • otherwise, redirect to that URL

If these conditions are not met (deeper iframe, same origin/domain), it displays the unstyled error message “Sorry, the url xxx is not found on this server”.
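The 404.html decision described above can be summarised as a small function. This is only a sketch, not the actual 404.html code (which runs as JavaScript in the page), and it omits the toolbar hack. The names `Action` and `not_found_action`, and the encoding of frame depth as an integer (0 = root document, 1 = first-level iframe), are assumptions for illustration:

```python
from enum import Enum
from typing import Optional, Tuple
from urllib.parse import urlsplit


class Action(Enum):
    BLOCK_HANDLER = "redirect to the kiwix-serve block handler"
    REDIRECT = "redirect to the original URL"
    SHOW_ERROR = "show 'Sorry, the url ... is not found on this server'"


def origin(url: str) -> Tuple[str, Optional[str], Optional[int]]:
    # Approximates a web origin as (scheme, host, port); default ports are not
    # normalised, which is good enough for this sketch.
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


def not_found_action(requested_url: str, current_url: str,
                     frame_depth: int, block_external: bool) -> Action:
    """Hypothetical model of the 404.html decision.

    frame_depth: 0 = root document, 1 = first-level iframe, >1 = deeper.
    block_external: True when kiwix-serve's block-external feature is on.
    """
    # Only the root document or a first-level iframe may redirect.
    if frame_depth > 1:
        return Action.SHOW_ERROR
    # Same origin/domain: not treated as an external link.
    if origin(requested_url) == origin(current_url):
        return Action.SHOW_ERROR
    if block_external:
        return Action.BLOCK_HANDLER
    return Action.REDIRECT


print(not_found_action("https://www.unige.ch/x", "https://kiwix.app/A/index.html", 0, False))
```

In the Android case from the screencast, the viewer origin is kiwix.app while the original link points elsewhere, so under this model the root document would be redirected.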

In the screencast, we see an Android WebView error message for a Wombat-rewritten, SW-only URL. We should check whether the SW is still installed at this moment, and with which prefix. I don't quite recall, but I think we don't multitask and the prefix is /A on kiwix.app for the current view.

@Jaifroid
Member

Jaifroid commented Nov 1, 2023

Can you please share which methodology is used in https://pwa.kiwix.org/ to check the external URLs?

That's a question for me, as I wrote the PWA implementation. The methodology I used was very similar to the one described by @rgaudin. Basically, you can't tell in advance whether a link in a warc2zim ZIM archive is in the archive or not, because it has exactly the same format as other links, whether or not the Wombat script and Service Worker are used to transform it. In the case of the PWA, I had to emulate the transforms, as I couldn't get the Service Worker to run (due to a conflict with our own Service Worker). The only solution was to search the ZIM for the referenced link (it is stored as something like C/A/www.example.com/etc/etc/) and, if it is not found, decide that it was not scraped and can therefore be treated as an external link. The "not found" part is complex, because redirects inside the ZIM can be stored as headers like C/H/www.example.com/etc/etc/, which have to be inspected before you can be sure that the requested resource is not in the ZIM.
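The lookup described above (content entry first, then header entries that may record a redirect) can be sketched as follows. This is not the PWA's code: the ZIM is modelled as a plain dict, and the header-entry structure (a dict with an optional `"location"` key) and the helper names `zim_key` and `is_in_zim` are hypothetical assumptions; the real header format is not reproduced here:

```python
from typing import Any, Mapping
from urllib.parse import urljoin, urlsplit


def zim_key(url: str) -> str:
    """Map an absolute URL to the "host/path[?query]" form used in entry paths.

    Assumption: mirrors the C/A/www.example.com/... layout quoted above;
    the exact query-string handling in real warc2zim ZIMs may differ.
    """
    parts = urlsplit(url)
    key = parts.netloc + parts.path
    if parts.query:
        key += "?" + parts.query
    return key


def is_in_zim(url: str, zim: Mapping[str, Any], max_redirects: int = 5) -> bool:
    """Return True if url (possibly via recorded redirects) resolves to content.

    zim is a hypothetical stand-in for a ZIM archive: entry path -> entry.
    Header entries ("C/H/...") are modelled as dicts with an optional
    "location" key (an assumed structure, not the real header format).
    """
    for _ in range(max_redirects + 1):
        key = zim_key(url)
        if "C/A/" + key in zim:
            return True
        header = zim.get("C/H/" + key)
        if not header or "location" not in header:
            return False  # no content and no redirect: not scraped
        # Follow the redirect recorded in the header entry.
        url = urljoin(url, header["location"])
    return False  # redirect chain too long or circular: give up


zim = {
    "C/A/www.example.com/page": b"<html>...</html>",
    "C/H/www.example.com/old": {"location": "/page"},
}
for link in ("https://www.example.com/old", "https://www.unige.ch/x"):
    print(link, "in ZIM" if is_in_zim(link, zim) else "external")
```

Capping the number of redirects followed keeps a circular redirect recorded in the ZIM from hanging the reader.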

@kelson42 kelson42 self-assigned this Jan 13, 2024
@kelson42 kelson42 modified the milestones: 3.12.0, 3.10.0 Jan 13, 2024
@kelson42
Collaborator

@rgaudin @benoit74 @mgautierfr Will Zimit 2.0 fix the issue (perhaps with openzim/warc2zim#122)? Should we just close this ticket and move forward with Zimit 2.0?

@Jaifroid
Member

My guess is we'll find a number of edge cases that may need special treatment. There are some weird cases like openzim/warc2zim#209, which are really hard to account for without a Service Worker to trap the requests (and even then, resolving the URL that way doesn't play nicely with some implementations of CSP, unless you know exactly which event to cancel in order to test a URL in the reader).

@rgaudin
Member

rgaudin commented Jan 13, 2024

I don't know what's specific to Android, but I see that openzim/warc2zim#137 is closed.
It should already be possible to create a zimit2 Athena ZIM and check. That would be the reasonable way to close this.

Regarding potential edge cases, I'd be in favor of opening specific tickets so that they are clearly scoped and actionable by developers.

@mgautierfr
Member

We agree that there is nothing more to do here.


5 participants