We should tag/remove the "external" links somehow #65

Open
kelson42 opened this issue Oct 27, 2020 · 7 comments
Labels
enhancement New feature or request stale

@kelson42
Contributor

It is pretty annoying not to know, before clicking on a link, whether we will get a 404 (because we are offline) or whether the resource is in the ZIM. This hurts the user experience.

I see two solutions to solve the problem:

  • Do as Wikipedia does: add a small icon beside the link to indicate that it is an external link
  • Remove the external link altogether.

I suspect that if one can be implemented, the other can be as well, pretty easily; we should therefore maybe let the user decide how they want to deal with the problem.
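To make the two options concrete, here is a minimal sketch of how a single check could drive either behavior. It assumes the set of archived URLs is available as a plain `Set<string>`; the names `LinkPolicy`, `resolveHref` and `planLinkAction` are hypothetical and not part of warc2zim.

```typescript
// Hypothetical names throughout; none of this is warc2zim API.
type LinkPolicy = "tag" | "remove";
type LinkAction = "keep" | "tag" | "remove";

// Resolve an href against the page URL and drop the fragment, so that
// "page.html#top" and "page.html" count as the same resource.
function resolveHref(href: string, pageUrl: string): string | null {
  try {
    const url = new URL(href, pageUrl);
    url.hash = "";
    return url.href;
  } catch {
    return null; // unparsable href: the caller leaves the link alone
  }
}

// Decide what to do with one link: keep it if its target is in the
// archive (or cannot be parsed), otherwise apply the user's policy.
function planLinkAction(
  href: string,
  pageUrl: string,
  archived: Set<string>,
  policy: LinkPolicy
): LinkAction {
  const target = resolveHref(href, pageUrl);
  if (target === null || archived.has(target)) {
    return "keep";
  }
  return policy;
}
```

The sketch only decides what to do with a link; how the tagging or removal is applied to the page is left open.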

@kelson42 kelson42 changed the title We should marc/handle the "external" links somehow We should tag/remove the "external" links somehow Oct 27, 2020
@kelson42 kelson42 added the enhancement New feature or request label Oct 27, 2020
@rgaudin
Member

rgaudin commented Oct 27, 2020

I am not in favor of either proposal, for the same reason: it's too fragile. Links can take many visual forms, and attempting to attach a visual artifact to them will most likely degrade the experience. The same goes for removing elements: you never know what you might break by doing this.

WP is radically different in that it uses a known, minimal and stable structure and style, so comparing the two is not relevant.

I am fine with the current behavior, though it's good to know that in a warc2zim ZIM, all navigation goes through the service worker, which looks up the resource and, if it is not present, serves 404.html. The current implementation forwards different-domain requests to the kiwix-serve blocking feature (if running on kiwix-serve) and otherwise just does a redirect.

It is possible to have a different behavior here (that is, post-click).
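The post-click behavior described above can be sketched as a small decision function. This is one reading of the description, not the actual service worker code: the order of the checks, the `Outcome` labels, and the names `routeNavigation`, `archiveOrigin` and `onKiwixServe` are all assumptions made for illustration, and "different domain" is approximated by comparing URL origins.

```typescript
// Hypothetical outcome labels; the real service worker is not shown here.
type Outcome = "serve" | "not-found-page" | "kiwix-serve-block" | "redirect";

function routeNavigation(
  requestUrl: string,
  archiveOrigin: string, // origin of the archived site, e.g. "https://example.org"
  archived: Set<string>, // URLs known to be in the ZIM (assumed available)
  onKiwixServe: boolean
): Outcome {
  // Resource found in the archive: serve it.
  if (archived.has(requestUrl)) {
    return "serve";
  }
  // Not found and pointing at another origin: hand over to the
  // kiwix-serve blocking feature if available, otherwise redirect.
  const differentDomain = new URL(requestUrl).origin !== archiveOrigin;
  if (differentDomain) {
    return onKiwixServe ? "kiwix-serve-block" : "redirect";
  }
  // Not found on the archived site itself: fall back to the 404 page.
  return "not-found-page";
}
```

A different post-click behavior would amount to changing what happens in the last two branches, without touching the page's markup.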

@kelson42
Contributor Author

I confirm: my complaint is about having so many links pointing to a 404 page. This is tolerable from time to time, but not if it happens every few clicks.

I agree about the visual marker.

I disagree about removing the link itself. We do that with MWoffliner, where, by the way, you can have a link on basically anything, and this works fine. Considering that removing a link has no visual effect, I hardly see how this could be worse than keeping it.

@rgaudin
Member

rgaudin commented Oct 27, 2020

If it happens too often, it might be a good indicator that the ZIM is poorly scoped or has too many external links.

Again, WP is a very specific case. Websites in the wild have links everywhere, containing images, text, lists, etc. What do you remove and what do you keep?

Let's say you have

```html
<a href="broken"><img src="somehugeimage.svg" /></a>
```

with a CSS rule as

```css
a img { height: 1em; }
```

What happens when you remove the <a> element but keep its innerHTML? You get an improperly styled image because, as the name implies, CSS is cascading: the rule no longer matches once the <a> ancestor is gone. That's why it's not guaranteed to be safe to remove a link on a random/generic website.
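The effect can be illustrated with a deliberately simplified model of descendant-selector matching. It is a sketch, not a CSS engine: it only handles tag names separated by spaces, and the helper names `matchesDescendantSelector` and `unwrap` are hypothetical.

```typescript
// Simplified model: an element is described by the tag names on the path
// from the document root down to it, e.g. ["body", "a", "img"].
function matchesDescendantSelector(path: string[], selector: string): boolean {
  const parts = selector.trim().split(/\s+/);
  let i = path.length - 1;
  let j = parts.length - 1;
  // The last part of the selector must match the element itself.
  if (path.length === 0 || path[i] !== parts[j]) {
    return false;
  }
  i--;
  j--;
  // The remaining parts must match ancestors in order, with gaps allowed.
  while (j >= 0 && i >= 0) {
    if (path[i] === parts[j]) {
      j--;
    }
    i--;
  }
  return j < 0;
}

// Model "remove the link but keep its content" by dropping the tag
// from the element's ancestor path.
function unwrap(path: string[], tag: string): string[] {
  return path.filter((t) => t !== tag);
}

const before = ["body", "a", "img"];
const after = unwrap(before, "a");
console.log(matchesDescendantSelector(before, "a img")); // true: the rule applies
console.log(matchesDescendantSelector(after, "a img")); // false: the rule is lost
```

Whether a given site has such rules cannot be known in advance, which is the fragility being pointed out.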

@kelson42
Contributor Author

kelson42 commented Oct 27, 2020

yes, but

```css
a img { height: 1em; }
```

You are right, there is an impact here. But this is a silly rule; it does not really convince me.

@ikreymer
Collaborator

This is actually a rather big challenge, which we've called the 'boundary problem': how to communicate to users what the boundary of a particular archive is. To make this work, I think there needs to be a tool that runs client-side, is able to respond to changes in the DOM, and follows certain rules. (Many tags are generated dynamically, so it is not possible to do this at scraper time.)

This has been a general issue. My colleagues at Rhizome ran a research project over the summer to try out an approach, and created a prototype tool to address this problem: https://github.com/Rhizome-Conifer/Periphery

While this was created for the python wayback, it runs on the client and much of it would apply here. The prototype included various options, including UI overlays, tooltips, and dynamically checking links on a page to see if they are valid.

Currently, this is a prototype, but I can reach out to Matt, the developer of this tool, to see if perhaps it can be adapted to work with wabac.js and then integrated into warc2zim.

@stale

stale bot commented Dec 28, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.

@stale stale bot added stale and removed stale labels Dec 28, 2020
@stale

stale bot commented Mar 19, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
