We should tag/remove the "external" links somehow #65
I am not in favor of either proposition, for the same reason: it's too fragile. Links can take many visual forms, and attempting to attach a visual artifact to them will most likely degrade the experience. The same goes for removing elements: you never know what you might break by doing this. WP is radically different in that it uses a known, minimal and stable structure and style, so comparing the two is not relevant. I am fine with the current behavior, though it's good to know that in a warc2zim ZIM, all navigation goes through the service worker, which looks up the resource and, if it is not present, uses the 404.html. The current implementation takes care of forwarding different-domain requests to the kiwix-serve blocking feature (if on kiwix-serve) and otherwise just does a redirect. It would be possible to have a different behavior at this point (i.e. post-click).
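The post-click routing described above can be summarized as a small decision function. This is a minimal sketch, written in Python for illustration only (in a warc2zim ZIM this logic lives in the JavaScript service worker). The function name, the action labels, the single-host assumption, and the use of a plain set as the "archive lookup" are all hypothetical; it also assumes the lookup happens first and the different-domain handling only applies to resources that are missing.

```python
from urllib.parse import urlsplit

# Hypothetical action labels; the real service worker has its own logic.
SERVE = "serve"
NOT_FOUND = "404"
KIWIX_BLOCK = "kiwix-external-block"
REDIRECT = "redirect"


def route_navigation(url: str, zim_host: str, archived: set[str],
                     on_kiwix_serve: bool) -> str:
    """Decide what to do with a navigation request inside a ZIM.

    url            -- absolute URL the user navigated to
    zim_host       -- host of the archived site (assumption: a single host)
    archived       -- URLs present in the ZIM (stand-in for the real lookup)
    on_kiwix_serve -- whether the ZIM is being served by kiwix-serve
    """
    # 1. Look the resource up in the archive first.
    if url in archived:
        return SERVE
    # 2. Missing and on another domain: hand it to kiwix-serve's
    #    external-link blocking feature when available, else redirect.
    if urlsplit(url).hostname != zim_host:
        return KIWIX_BLOCK if on_kiwix_serve else REDIRECT
    # 3. Missing but on the archived domain: fall back to 404.html.
    return NOT_FOUND
```

Any alternative post-click behavior would slot into steps 2 or 3 without touching the page content itself.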
I confirm: I also complain about having so many links pointing to a 404 page. This is tolerable from time to time, but not if it happens every few clicks. I agree about the visual marker. I disagree about removing the link itself. We do that with MWoffliner (where, by the way, you can have a link on basically anything), and it works fine. Considering that removing a link has no visual effect, I hardly see how this could be worse than keeping it.
If it happens too often, it might be a good indicator that the ZIM is poorly scoped or has too many external links. Again, WP is a very specific case. Wild websites have links everywhere, wrapping images, text, lists, etc. What do you remove and what do you keep? Let's say you have <a href="broken"><img src="somehugeimage.svg" /></a> with a CSS rule such as a img { height: 1em; }. What happens when you remove the
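The side effect in that example can be shown with a tiny check. This sketch assumes "removing" a link means renaming the <a> to a <span> while keeping its children (one possible approach, not necessarily what MWoffliner does), and it approximates the CSS descendant selector a img by plain tree ancestry rather than a real CSS engine. The helper names unlink and a_img_rule_applies are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# Markup from the comment above; the CSS rule in play is: a img { height: 1em; }
SNIPPET = '<p><a href="broken"><img src="somehugeimage.svg" /></a></p>'


def unlink(root: ET.Element, is_broken: Callable[[str], bool]) -> None:
    """'Remove' broken links by renaming <a> to <span>, keeping the children."""
    for el in list(root.iter("a")):
        if is_broken(el.get("href", "")):
            el.tag = "span"
            el.attrib.pop("href", None)


def a_img_rule_applies(root: ET.Element) -> bool:
    """Rough stand-in for the selector 'a img': is any <img> inside an <a>?"""
    return any(a.find(".//img") is not None for a in root.iter("a"))


root = ET.fromstring(SNIPPET)
print(a_img_rule_applies(root))  # True: the image is capped at 1em
unlink(root, lambda href: href == "broken")
print(a_img_rule_applies(root))  # False: the rule no longer matches
```

Once the <a> is gone, the rule that kept the SVG at 1em stops matching, so the image would render at its natural (huge) size.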
Yes, but you are right: here there is an impact. Still, this is a silly rule, not really convincing to me.
This is actually a rather big challenge, and it is what we've called the 'boundary problem': how to communicate to users what the boundary of a particular archive is. To make this work, I think there needs to be a tool that runs client-side, is able to respond to changes in the DOM, and follows certain rules (many tags are generated dynamically, so it is not possible to do this at scraper time). This has been a general issue, and over the summer my colleagues at Rhizome ran a research project to try to implement an approach, which produced a prototype tool that addresses this problem: https://github.com/Rhizome-Conifer/Periphery. While it was created for the Python wayback, it runs on the client, and much of it would apply here. The prototype included various options, including UI overlays, tooltips, and dynamically checking the links on a page to see if they are valid. It is currently a prototype, but I can reach out to Matt, the developer of this tool, to see if it can be adapted to work with wabac.js and then integrated into warc2zim.
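The "dynamically checking links" option boils down to a per-link lookup against the archive. The sketch below does not reproduce Periphery's code or API; it is a hypothetical illustration in Python of the check itself. In a real ZIM it would run as JavaScript in the page and be re-run whenever the DOM changes (for example from a MutationObserver callback). The function name, the labels, and the set standing in for the archive index are assumptions.

```python
from urllib.parse import urldefrag, urljoin


def classify_links(page_url: str, hrefs: list[str],
                   archived: set[str]) -> dict[str, str]:
    """Label each href as 'in-archive' or 'outside-archive'.

    Relative hrefs are resolved against page_url, and fragments are dropped
    before the lookup, so '#top' or 'b.html#sec' match their base page.
    """
    result = {}
    for href in hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        result[href] = "in-archive" if absolute in archived else "outside-archive"
    return result
```

A UI layer (overlay, tooltip, marker) could then be driven from these labels before the user clicks.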
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
Postponing this to "later" because it is clear that we want and need to do something, but it is vastly unclear how to do it properly. Transferring Matt's prototype into warc2zim would probably make sense (at least it looks like a promising solution to me), but it does not look like a small thing we can do without funding; more research and fine-tuning are probably still needed.
It is pretty annoying not to know, before clicking on a link, whether we will get a 404 (because the resource is offline) or whether the resource is in the ZIM. This somewhat kills the user experience.
I see two solutions to this problem: tagging the "external" links with a visual marker, or removing them.
I suspect that if one can be implemented, the other can be as well, pretty easily; therefore we should perhaps let users decide how they want to deal with the problem.
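Letting the user choose could look like a single switch that selects between marking and removing out-of-archive links. This is a hypothetical sketch in Python: the enum, the function name, and the "zim-external" CSS class are invented for illustration, "removing" is modeled as renaming <a> to <span> while keeping the content, and, as noted earlier in the thread, a real implementation would have to run client-side because many tags are generated dynamically.

```python
import xml.etree.ElementTree as ET
from enum import Enum
from typing import Callable


class ExternalLinkMode(Enum):
    KEEP = "keep"      # current behavior: leave links, handle them after the click
    TAG = "tag"        # add a visual marker through a CSS class
    REMOVE = "remove"  # drop the link but keep its content


def apply_link_policy(root: ET.Element, is_archived: Callable[[str], bool],
                      mode: ExternalLinkMode,
                      marker_class: str = "zim-external") -> int:
    """Apply the user's chosen policy to links missing from the archive.

    Returns the number of out-of-archive links the policy was applied to.
    """
    changed = 0
    for a in list(root.iter("a")):
        href = a.get("href")
        if href is None or is_archived(href) or mode is ExternalLinkMode.KEEP:
            continue
        if mode is ExternalLinkMode.TAG:
            # Append the marker class without clobbering existing classes.
            classes = a.get("class", "").split()
            if marker_class not in classes:
                a.set("class", " ".join(classes + [marker_class]))
        else:  # ExternalLinkMode.REMOVE
            a.tag = "span"
            del a.attrib["href"]
        changed += 1
    return changed
```

Both options share the same detection step, which supports the point that if one is implemented, the other comes almost for free.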