zimdump produces invalid path in HTML-based redirect #224
Comments
This is a workaround to fix issue described in: openzim/zim-tools#224
@kelson42 fwiw I've worked around this in ipfs/distributed-wikipedia-mirror@11fe184 by leveraging `find ./wikipedia_tr_all_maxi_2021-02/A -type f -size -800c -exec fgrep -l "0;url=A/" {} + -exec sed -i "s|0;url=A/|0;url=|" {} >> fixed_redirects.log +`. Inspecting all articles takes too much time, so it looks for …
@veloman-yunkan may I ask you to please look at this ticket if you have the time? It is a pretty small one, but it somewhat impairs the new Wikipedia snapshot release effort we are currently running on IPFS.
@kelson42 OK, I will investigate it right away.
@veloman-yunkan Thank you! I looked at the code yesterday, and the problem is clearly that there is no code at all to compute the relative path. The full path URL of the redirect's target article is just put there as-is. I believe we have a function somewhere (in …
I looked for such a function only in …
Assuming I understood this correctly, the HTML-based redirect created by `zimdump` points at an invalid path.

How to reproduce

`wikipedia_cr_all_maxi_2021-02.zim` is small enough to be a good demo: the file dumped for a redirect at `A/{name1}` contains a `<meta>` refresh whose target is `0;url=A/{name2}`.

So while in `A/{name1}` we are redirected to `A/{name2}`.

This looks like a bug, because the redirect is resolved relative to `{name1}`, so it points at `A/A/{name2}`, which does not exist.

I was able to reproduce the same issue with:
- `wikipedia_cu_all_maxi_2021-02` – example: `A/Жена` effectively redirects to `A/A/%D0%96%D1%94%D0%BD%D0%B0`
- `wikipedia_tr_all_maxi_2021-02` – example: `A/Fatih_Sultan_Mehmed` effectively redirects to `A/A/II._Mehmed`
How to fix?
Relative redirect
Ideally, a relative path would be used, without namespace prefix, something like:
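
A rough sketch of what that could look like for the simple case, where source and target are both flat names in the same namespace directory. `metaRefreshFor` is a hypothetical helper, not a zimdump function, and it assumes the target path is already URL-encoded the way it appears in the dump.

```javascript
// Build the <meta> refresh for a redirect whose source and target are both
// flat names in the same namespace directory (e.g. A/name1 -> A/name2).
// Assumption: targetFullPath is already URL-encoded, as in the dumped files.
function metaRefreshFor(targetFullPath) {
  // "A/II._Mehmed" -> "II._Mehmed": drop the leading namespace segment.
  const relative = targetFullPath.slice(targetFullPath.indexOf("/") + 1);
  return `<meta http-equiv="refresh" content="0;url=${relative}" />`;
}

console.log(metaRefreshFor("A/II._Mehmed"));
// → <meta http-equiv="refresh" content="0;url=II._Mehmed" />
```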
Open problem: subdirectories

The caveat is an edge case where a name containing `/` redirects to something else. Something like `A/some/name` needs to point at an article one level up, so its redirect target has to start with `../`.

In case that adds too much complexity to the `zimdump` code, we could do that in JS.

If we replace the `<meta>` tag with a `<script>`, then we are able to redirect correctly, no matter whether the name includes `/` or not.

@kelson42 lmk if there is a better way (or if I misunderstood how things work :))
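
Both options for the subdirectory case can be sketched as follows. This is only an illustration under stated assumptions, not the code used by `zimdump`: `relativeRedirect` and `scriptRedirectHref` are hypothetical names, and the script variant assumes that the first `/A/` in the page URL is the namespace directory of the dump.

```javascript
// Option 1: compute the relative URL at dump time.
// Both arguments are full paths from the dump root, e.g. "A/some/name".
function relativeRedirect(fromPath, toPath) {
  const fromDirs = fromPath.split("/").slice(0, -1); // directories containing the source
  const toParts = toPath.split("/");
  // Count leading directories shared by source and target.
  let common = 0;
  while (
    common < fromDirs.length &&
    common < toParts.length - 1 &&
    fromDirs[common] === toParts[common]
  ) {
    common++;
  }
  // One "../" per source directory that is not shared with the target.
  const up = "../".repeat(fromDirs.length - common);
  return up + toParts.slice(common).join("/");
}

// Option 2: what an inline <script> could compute at view time.
// Assumption: the first "/A/" in the URL path is the namespace directory.
function scriptRedirectHref(currentHref, targetFullPath) {
  const url = new URL(currentHref);
  const nsStart = url.pathname.indexOf("/A/");
  if (nsStart === -1) return null; // not inside the A namespace: do nothing
  // Keep everything up to the dump root, then append the target's full path.
  url.pathname = url.pathname.slice(0, nsStart + 1) + targetFullPath;
  return url.href;
}

console.log(relativeRedirect("A/name1", "A/name2"));     // → name2
console.log(relativeRedirect("A/some/name", "A/name2")); // → ../name2
console.log(scriptRedirectHref("http://localhost:8080/dump/A/some/name", "A/name2"));
// → http://localhost:8080/dump/A/name2
```

In a dumped page, the second option would be used roughly as `window.location.replace(scriptRedirectHref(window.location.href, "A/name2"))`. It needs no knowledge of how deep the source name is nested, at the cost of requiring JavaScript in the reader's browser.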