Forced encoding change in ada_set_*() functions #66
Thanks for reporting this. This seems to be a bug that might be present in all functions.
library("adaR")
examples <- c(
"http://xn--53-6kcainf4buoffq.xn--p1ai/pood/junior-electrical-engineer-jobs-remote.html",
"http://xn--80abb0biooohbv.xn--p1ai/",
"http://xn--alicantesueo-khb.com/insomnio",
"https://normal-url.com/this-path-will-be-fine",
"http://xn--53-6kcainf4buoffq.xn--p1ai/this-path-will-not-be-fine"
)
ada_url_parse(examples,decode = FALSE)
#> href
#> 1 http://xn--53-6kcainf4buoffq.p1aǢi/pood/junior-electǢricaǢl-engǡineer-jobs.html
#> 2 http://xn--80abb0biooohbv.xn--p1ai/
#> 3 http://xn--alicantesueo-khb.com/insomnio
#> 4 https://normal-url.com/this-path-will-be-fine
#> 5 http://xn--53-6kcainf4buoffq.p1ai/this-˘path˘-will-not-be
#> protocol username password host hostname port
#> 1 http: поверкадома53.рф поверкадома53.рф
#> 2 http: бамбукхутор.рф бамбукхутор.рф
#> 3 http: alicantesueño.com alicantesueño.com
#> 4 https: normal-url.com normal-url.com
#> 5 http: поверкадома53.рф поверкадома53.рф
#> pathname search hash
#> 1 /pood/junior-electrical-engineer-jobs-remote.html
#> 2 /
#> 3 /insomnio
#> 4 /this-path-will-be-fine
#> 5 /this-path-will-not-be-fine
ada_url_parse(examples, decode = TRUE)
#> href
#> 1 http://xn--53-6kcainf4buoffq.p1aǢi/pood/junior-electǢricaǢl-engǡineer-jobs.html
#> 2 http://xn--80abb0biooohbv.xn--p1ai/
#> 3 http://xn--alicantesueo-khb.com/insomnio
#> 4 https://normal-url.com/this-path-will-be-fine
#> 5 http://xn--53-6kcainf4buoffq.p1ai/this-˘path˘-will-not-be
#> protocol username password host hostname port
#> 1 http: поверкадома53.рф поверкадома53.рф
#> 2 http: бамбукхутор.рф бамбукхутор.рф
#> 3 http: alicantesueño.com alicantesueño.com
#> 4 https: normal-url.com normal-url.com
#> 5 http: поверкадома53.рф поверкадома53.рф
#> pathname search hash
#> 1 /pood/junior-electrical-engineer-jobs-remote.html
#> 2 /
#> 3 /insomnio
#> 4 /this-path-will-be-fine
#> 5 /this-path-will-not-be-fine
Created on 2024-01-10 with reprex v2.0.2
First guess is the code at lines 5 to 13 in 3b43e60, specifically the call to ada_idna_to_unicode.
@schochastics I think it only affects some URLs with punycode and therefore
Reduced to the smallest case, the problem is this:
ada_get_href("http://xn--53-6kcainf4buoffq.xn--p1ai/") ## ok
ada_get_href("http://xn--53-6kcainf4buoffq.xn--p1ai/doof") ## ok
ada_get_href("http://xn--53-6kcainf4buoffq.xn--p1ai/doof/junior.html") ## ok
ada_get_href("http://xn--53-6kcainf4buoffq.xn--p1ai/doof/juniorprogrammer.html") ## ok
ada_get_href("http://xn--53-6kcainf4buoffq.xn--p1ai/doof/junior_programmer.html") ## ok
ada_get_href("http://xn--53-6kcainf4buoffq.xn--p1ai/doof/junior-programmer.html") ## BEEEEEEP!
Just to be sure, this works (modified from the C demo).
#include "ada_c.h"
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
static void ada_print(ada_string string) {
  printf("%.*s\n", (int)string.length, string.data);
}
int main(int c, char* arg[]) {
  const char* input =
      "http://xn--53-6kcainf4buoffq.xn--p1ai/doof/junior-programmer.html";
  ada_url url = ada_parse(input, strlen(input));
  if (!ada_is_valid(url)) {
    puts("failure");
    return EXIT_FAILURE;
  }
  ada_print(ada_get_href(url));
  ada_free(url);
  return EXIT_SUCCESS;
}
## with the single-header distribution: ada.cpp and ada.h
c++ -c ada.cpp -std=c++17
cc -c demo.c
c++ demo.o ada.o -o cdemo
./cdemo
Like you said, @schochastics, a thing that I found is that there are Lines 109 to 110 in 3b43e60
Maybe one solution is not always force
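A side note on the `ada_print` helper in the C demo above: it prints with `%.*s` and an explicit length because the string is handed over as a pointer plus a length, so it should not be treated as a NUL-terminated C string. The sketch below illustrates that idea with a hypothetical stand-in struct (`str_view`) and hypothetical helpers (`view_prefix`, `view_to_cstr`); none of these names come from ada or adaR. Whether length handling has anything to do with the bug in this issue is not established here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a pointer-plus-length string such as the one
   ada_print receives: there is no guarantee of a terminating NUL byte. */
typedef struct {
  const char* data;
  size_t length;
} str_view;

/* Build a view over the first n bytes of a larger buffer. */
static str_view view_prefix(const char* buffer, size_t n) {
  str_view v = {buffer, n};
  return v;
}

/* Copy exactly view.length bytes into out and NUL-terminate the copy.
   Returns the number of bytes copied, or 0 if out is too small. */
static size_t view_to_cstr(str_view view, char* out, size_t out_size) {
  assert(view.data != NULL && out != NULL);
  if (out_size == 0 || view.length >= out_size) {
    return 0;
  }
  memcpy(out, view.data, view.length);
  out[view.length] = '\0';
  return view.length;
}
```

Calling `strlen` on `view.data` directly would run past `view.length` whenever the underlying buffer continues, which is why the copy is bounded by the stored length.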
@chainsawriot do you want to give it a try to fix it? I am fine with any solution that does not affect other parts negatively.
Obviously there is no stress and this can wait till March.
Hi!
I've encountered a bug in the adaR::ada_set_* family of functions related to pathname processing.
In cases where a URL is in punycode (domain starting with xn--), using adaR's set family of functions changes the pathname encoding, and I don't know how to prevent (or revert) this behavior.
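For triage, it can help to flag which hosts are in punycode at all. The following is a minimal C sketch that assumes a bare hostname as input (no scheme, port, or path); the helper name `host_has_punycode_label` is hypothetical and not part of ada or adaR. Note that this is only a coarse filter: in the reprex elsewhere in this thread, some punycode URLs come back intact, and the reduced test case there suggests the path contents also play a role.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if any dot-separated label of `host`
   starts with the ACE prefix "xn--" (compared case-insensitively). */
static bool host_has_punycode_label(const char* host) {
  const char* label = host;
  while (label != NULL && *label != '\0') {
    /* Short-circuit evaluation stops at the first mismatch or NUL byte,
       so this never reads past the end of the string. */
    if (tolower((unsigned char)label[0]) == 'x' &&
        tolower((unsigned char)label[1]) == 'n' &&
        label[2] == '-' && label[3] == '-') {
      return true;
    }
    /* Advance to the start of the next label, if there is one. */
    const char* dot = strchr(label, '.');
    label = (dot != NULL) ? dot + 1 : NULL;
  }
  return false;
}
```

Checking every label rather than only the first one covers hosts where just the top-level domain is encoded.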
For example:
will return:
Notice the 1st and 5th URLs.
even though adaR::ada_get_pathname(examples, decode = FALSE) returns the correct output:
The same behavior is present even when the pathname isn't changed, for example:
Also, it's worth noting that the hostnames look different (are encoded), even though the function call above didn't change the hostname at all.
My sessionInfo()