-
-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URL encoding suggestion pathname's unintended side effect #958
Comments
We url encode as we may have
At least zimit stores the querystring as part of the path ( At first sight, I would said that we should not url encode |
@veloman-yunkan This is a must fix before releasing |
Building suggestion links using URI-encoded components (book name and the article path) was introduced in #860. The main targets were only a few dangerous symbols, however in the absence of a standard function for selective URI-encoding all reserved symbols were encoded leading to this issue.
@kelson42 The non-homogeneous structure of URLs but the natural desire to keep things simple has resulted in an encoding scheme that makes the world go nuts when one URL has to be embedded in another URL. Therefore there is no surprise that a human mind cannot comprehend all possible interactions of various scenarios (especially when pieces of code from different people having different assumptions meet in one place). Hence, please be prepared for a few more issues of this kind with exponentially growing weirdness until fixing the last one will produce a space-time singularity that will destroy our Universe. 😄 |
Before this fix suggestion links were built out of fully URI-encoded book name and article path components despite the fact that this measure was taken against only a few dangerous symbols such as '#', '?', '"' and '\'. However, URI-encoding the slash symbols in the path has some undesirable side-effects (see #958). Henceforth only the problematic symbols are encoded in the article path component. The book name is still fully URI-encoded since I don't see any counter-arguments.
Before this fix suggestion links were built out of fully URI-encoded book name and article path components despite the fact that this measure was taken against only a few dangerous symbols such as '#', '?', '"' and '\'. However, URI-encoding the slash symbols in the path has some undesirable side-effects (see #958). Henceforth only the problematic symbols are encoded in the article path component. The book name is still fully URI-encoded since I don't see any counter-arguments.
Here's another tricky behavior.
In this ZIM, because it's a single page app, we have ZIM front entries that are stub HTML documents simply redirecting to the SPA's URL with appropriate fragment.
For instance, one content is available from:
index.html#english/python-for-everybody/comparing-and-sorting-tuples
We thus have a front article at
redirect/english/python-for-everybody/comparing-and-sorting-tuples
.This works fine with and without the viewer. The random feature works as expected, serving up the redirects.
The problem is with the Suggestions. As you can see, the suggestion engines returns correct results and those are displayed fine but click on any leads to a
404
.Instead of arriving (search for “comp”) at the expected location, one arrives at
https://dev.library.kiwix.org/viewer#index.html
As you can see, the bookName part is gone and one would expect the redirect to have gone to far. It's not the case.
It's not a problem with how the location is changed (https://github.com/kiwix/libkiwix/blob/main/static/skin/viewer.js#L50).
What happens is that the suggestion code urlencodes the result turning
redirect/english/python-for-everybody/comparing-and-sorting-tuples
intoredirect%2Fenglish%2Fpython-for-everybody%2Fcomparing-and-sorting-tuples
.Now read carefully:
This is not linked to the viewer and the iframe. You can reproduce on
/content
:/content/freecodecamp.org_en_custom_2023-06/
✅/content/freecodecamp.org_en_custom_2023-06/redirect/english/python-for-everybody/comparing-and-sorting-tuples
✅/content/freecodecamp.org_en_custom_2023-06/redirect%2Fenglish%2Fpython-for-everybody%2Fcomparing-and-sorting-tuples
❌Counter-intuitively, the browser does not translates the
%2F
into/
and moves there but instead it queries the server with the%2F
, gets the response and renders it from it'sredirectXXX
location, not accounting those%2F
as path segments.That's exactly the purpose of URL-encoding and all browsers do the same 👍
My question at this point would be: why are we URL-encoding this information? From what I understand, this is a ZIM-Entry path that is sent to us via the Suggestion API and we just use it as source of the iframe (prefixed with root if any).
Need for URL-encoding would be to prevent characters from that ZIM entry path to be interpreted as special-meaning for being reserved characters.
While ZIM path are entirely free, ZIM creators already limit themselves and will do as long as ZIM are mainly run in a Web context.
␣
!
"
#
$
%
&
'
(
)
*
+
,
/
:
;
=
?
@
[
]
The text was updated successfully, but these errors were encountered: