Full URI-encoding of URLs returned by kiwix-serve #890

Merged
merged 11 commits into from Feb 9, 2023


veloman-yunkan (Collaborator)

This PR tries to ensure that the full path component of URLs returned by kiwix-serve is properly URI-encoded. In particular, all of the following constituents of the path component are now encoded:

  • root location
  • book name
  • path of the entry within the book

Fixes #441
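
As an illustration of the idea (not the actual libkiwix code), the sketch below percent-encodes each constituent of the path before joining them. The names `encodeComponent` and `buildPath` are hypothetical, and the set of characters left unencoded (the RFC 3986 "unreserved" set, plus `/` inside the root location and entry path) is an assumption; the real implementation may keep more characters as-is.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not libkiwix's API): percent-encode every byte that is
// not an RFC 3986 "unreserved" character. With keepSlashes set, '/' is kept
// so that multi-segment paths stay readable.
std::string encodeComponent(const std::string& s, bool keepSlashes = false)
{
  std::string out;
  for (unsigned char c : s) {
    const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                         || (c >= '0' && c <= '9')
                         || c == '-' || c == '.' || c == '_' || c == '~';
    if (unreserved || (keepSlashes && c == '/')) {
      out += static_cast<char>(c);
    } else {
      char buf[4];  // '%' + two hex digits + terminating NUL
      std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
      out += buf;
    }
  }
  return out;
}

// Hypothetical helper: encode root location, book name and entry path
// separately, then join them with '/'.
std::string buildPath(const std::string& root,
                      const std::string& book,
                      const std::string& entryPath)
{
  return encodeComponent(root, true) + "/"
       + encodeComponent(book) + "/"
       + encodeComponent(entryPath, true);
}
```

With these assumptions, `buildPath("/ROOT#?", "corner cases#&", "A/some article?")` yields `/ROOT%23%3F/corner%20cases%23%26/A/some%20article%3F`. Encoding only the last argument, as was the case before this PR, would leave the `#`, `?` and `&` of the first two components in the URL unescaped.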

- Before this change `InternalServer::build_redirect()` only URI-encoded the
  article path, ignoring the book name and/or the root location components of
  the URL.

- In order to be able to test this fix, corner_cases.zim was renamed to
  contain a couple of special URL symbols in its filename. The
  `create_corner_cases_zim_file` script was updated accordingly.

Testing of this functionality revealed that a query part containing +
symbols (as replacements for spaces in the parameter values) isn't
forwarded properly, because the + symbols get URI-encoded (this is a bug
in `RequestContext::get_query()`, whose result already contains
URI-encoded +'s).

This change doesn't make much sense on its own - the real goal is to
prepare some ground for easier implementation of URI-encoding of the root
location.

This silly optimization in fact helps to avoid a somewhat more serious
waste of CPU cycles that would otherwise result in the next commit.
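
To make the query-forwarding problem with + symbols more concrete, here is a minimal sketch of why an extra round of encoding changes the meaning of a parameter. Both functions are hypothetical stand-ins written for this illustration: `decodeQueryValue` assumes form-style decoding (`+` means space, `%XX` is an encoded byte), and `encodePluses` merely mimics a component that URI-encodes `+` on the way through.

```cpp
#include <cctype>
#include <string>

// Hypothetical decoder for a query parameter value, assuming form-style
// rules: '+' stands for a space and %XX for a percent-encoded byte.
std::string decodeQueryValue(const std::string& s)
{
  std::string out;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == '+') {
      out += ' ';
    } else if (s[i] == '%' && i + 2 < s.size()
               && std::isxdigit(static_cast<unsigned char>(s[i + 1]))
               && std::isxdigit(static_cast<unsigned char>(s[i + 2]))) {
      out += static_cast<char>(std::stoi(s.substr(i + 1, 2), nullptr, 16));
      i += 2;  // skip the two hex digits just consumed
    } else {
      out += s[i];
    }
  }
  return out;
}

// Hypothetical stand-in for the faulty step: every '+' becomes "%2B".
std::string encodePluses(const std::string& s)
{
  std::string out;
  for (char c : s) {
    if (c == '+') {
      out += "%2B";
    } else {
      out += c;
    }
  }
  return out;
}
```

Under these assumptions, `decodeQueryValue("a+b")` gives `a b`, whereas `decodeQueryValue(encodePluses("a+b"))` gives `a+b`: after the extra encoding the receiver sees a literal plus instead of the space the client meant.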

codecov bot commented Feb 8, 2023

Codecov Report

Base: 71.88% // Head: 71.95% // Increases project coverage by +0.07% 🎉

Coverage data is based on head (51206f4) compared to base (2f41999).
Patch coverage: 93.75% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #890      +/-   ##
==========================================
+ Coverage   71.88%   71.95%   +0.07%     
==========================================
  Files          54       54              
  Lines        3752     3748       -4     
  Branches     2100     2100              
==========================================
  Hits         2697     2697              
+ Misses       1053     1049       -4     
  Partials        2        2              
| Impacted Files | Coverage Δ |
|---|---|
| src/server/internalServer.h | 89.47% <ø> (ø) |
| src/server/request_context.h | 100.00% <ø> (ø) |
| src/server/internalServer.cpp | 88.64% <92.30%> (+0.11%) ⬆️ |
| src/server/request_context.cpp | 86.60% <100.00%> (+0.54%) ⬆️ |
| src/server/response.cpp | 94.64% <100.00%> (ø) |
| src/tools/stringTools.cpp | 63.55% <0.00%> (+0.93%) ⬆️ |


@mgautierfr (Member) left a comment

Two comments that are more open questions than issues.
I'm not at all sure we should implement what is suggested.

test/server.cpp (outdated)
Comment on lines +16 to +18
# Assuming that tests are NOT run under Windows, above symbols can be included
# in testing if the file is renamed while copying to the build directory (see
# test/meson.build), though that would make maintenance slightly more confusing.
Member:

We will compile libkiwix on Windows and it would be a good thing to test it there too.

One thing we could do is rename the file only on Linux and not on Windows
(and adapt the code accordingly). But in that case, maintenance would become more than slightly confusing.

Collaborator Author:

Should we do anything about it now/in this PR?

Member:

No, we can leave it as it is.

Now the root location is URI-encoded too.

In order to properly test this change the root location in the tests was
changed from "/ROOT" to "/ROOT#?" (or "/ROOT%23%3F" in URI-encoded form),
which is why this commit is so big.
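
The choice of `#` and `?` for the test root location makes sense because both are URL delimiters. The sketch below is a simplified splitter written for this illustration (the `UrlParts` struct and `splitUrl` function are hypothetical, loosely following the generic syntax of RFC 3986); it shows how a client would misread a URL built from the raw root location. The book name `zimfile` and the entry path are made up for the example.

```cpp
#include <string>

// Hypothetical container for the pieces of a relative URL reference.
struct UrlParts {
  std::string path;
  std::string query;
  std::string fragment;
};

// Simplified splitter: the fragment starts at the first '#', and the query
// starts at the first '?' found before that '#'.
UrlParts splitUrl(const std::string& url)
{
  UrlParts parts;
  std::string rest = url;
  const auto hashPos = rest.find('#');
  if (hashPos != std::string::npos) {
    parts.fragment = rest.substr(hashPos + 1);
    rest.erase(hashPos);
  }
  const auto queryPos = rest.find('?');
  if (queryPos != std::string::npos) {
    parts.query = rest.substr(queryPos + 1);
    rest.erase(queryPos);
  }
  parts.path = rest;
  return parts;
}
```

For `/ROOT#?/zimfile/A/index` this splitter reports the path `/ROOT` and treats everything after `#` as a fragment, so the book and entry would never reach the server. For `/ROOT%23%3F/zimfile/A/index` the whole string remains the path.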
The alleged bug seems rather to be an issue with httplib, which appears to
URI-encode any + present in query parameters.
@mgautierfr merged commit fa80be8 into main on Feb 9, 2023
@mgautierfr deleted the url_encoding_of_redirects branch on February 9, 2023 10:22
Successfully merging this pull request may close these issues.

Kiwix-serve random feature returns partly broken URLs