Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Fix string_manip.h #159

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
3 participants

pyhalov commented Oct 4, 2013

copy_escape_html() shouldn't depend on locale. It should encode
every string which is not ascii digit or alphabetic symbol.

Fix string_manip.h
copy_escape_html() shouldn't depend on locale. It should encode
every string which is not ascii digit or alphabetic symbol.
Contributor

lotheac commented Jan 3, 2015

I ran into this as well, on OmniOS. Announce requests to certain trackers would
fail due to peer_id not being correctly encoded: partial hexdump of such a GET
request below:

     112: 4225 3543 3569 2670 6565 725f 6964 3d2d    B%5C5i&peer_id=-
     128: 6c74 3044 3430 2dec 2541 39ce e825 4146    lt0D40-.%A9..%AF
     144: 2541 36e8 e825 3839 d9f1 6c26 6b65 793d    %A6..%89..l&key=

Applying this patch to both libtorrent and rtorrent seems to have resolved the
issue: bytes with the high bit set in the local id appear correctly encoded
both in the rtorrent UI and on the HTTP requests on the wire.

Owner

rakshasa commented Mar 14, 2016

Closing this as I'm not going to include hacks to support systems that don't properly implement C++ standard.

@rakshasa rakshasa closed this Mar 14, 2016

Contributor

lotheac commented Mar 14, 2016

On Mon, Mar 14 2016 10:30:45 -0700, Jari Sundell wrote:

Closing this as I'm not going to include hacks to support systems that don't properly implement C++ standard.

That is not the issue. Depending on the current locale to decide whether or not
percent-encode characters in HTTP requests is just wrong. Observe the following
on Linux (in C, because I really suck at C++):

fractran ~ % cat alpha.c 
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

int main(void) {
        setlocale(LC_ALL, "");
        printf("is alpha: %d\n", isalpha(0xe4));
        return 0;
}
fractran ~ % gcc -o alpha alpha.c
fractran ~ % LC_CTYPE=C ./alpha
is alpha: 0
fractran ~ % LC_CTYPE=fi_FI.ISO-8859-1 ./alpha
is alpha: 1024

So if you happen to need to encode the byte 0xe4 in a URL, it will be sent
as-is on the wire if you happen to be using a ISO-8859-1 locale, but not
otherwise. In the HTTP request context you should encode anything that is not
an "unreserved character" per https://tools.ietf.org/html/rfc3986#section-2.3

Lauri Tirkkonen | lotheac @ IRCnet

Owner

rakshasa commented Mar 15, 2016

No, it always uses 'std::locale::classic()' when doing std::isalpha, etc, and will ALWAYS use the standard 'C' locale.

That is why I'm saying their system is broken, as their implementation of either the above 'classic' method or std::isalpha does not follow the C++ standard. This should not be an issue on Linux or OS X, etc.

Contributor

lotheac commented Mar 15, 2016

On Tue, Mar 15 2016 01:30:16 -0700, Jari Sundell wrote:

No, it always uses 'std::locale::classic()' when doing std::isalpha, etc, and will ALWAYS use the standard 'C' locale.

That is why I'm saying their system is broken, as their implementation of either the above 'classic' method or std::isalpha does not follow the C++ standard. This should not be an issue on Linux or OS X, etc.

Ok, fair enough, I misread the definition of std::locale::classic (I
thought it meant it following the C semantics for locales). I'll report
a bug in illumos, thanks.

Lauri Tirkkonen | lotheac @ IRCnet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment