Utf8 double encoding problem - ü rendered as Ã¼ #178

Open
firepol opened this issue Jan 1, 2015 · 1 comment

firepol commented Jan 1, 2015

Hi, I've spent some hours investigating this issue and it is giving me a headache.

I'm trying to render this URL: http://airolo.ch/impianti/details.php?lang=ita&season=winter

Like this:

var initialHtml = CQ.CreateFromUrl("http://airolo.ch/impianti/details.php?lang=ita&season=winter");
var cssTarget = initialHtml[".container"];
string cssResult = cssTarget.FirstOrDefault().Render();

In the cssResult string I expected to get "Pesciüm" encoded like this:

Pesciüm

What I get, instead, is:

PesciÃ¼m

I also tried:

string cssResult = cssTarget.FirstOrDefault().Render(OutputFormatters.HtmlEncodingNone);

In the cssResult string I expected to get "Pesciüm" (as in the original file); what I get instead is "PesciÃ¼m".

I think CsQuery is double-encoding UTF-8. The same problem is described in this blog post:
http://www.bardecode.com/en1/double-encoded-utf-8-strings-in-c/
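To make the suspected mechanism concrete, here is a minimal sketch of how this kind of garbling arises. It does not use CsQuery at all. It only assumes that somewhere the UTF-8 bytes of the page are decoded with a single-byte encoding (ISO-8859-1 here). The class and method names are hypothetical.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // ISO-8859-1 maps every byte 0x00-0xFF to the code point with the same value.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Simulates the suspected fault: UTF-8 bytes decoded as a single-byte encoding.
    public static string MisdecodeUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Latin1.GetString(utf8Bytes);
    }

    static void Main()
    {
        // "Pesci\u00FCm" is "Pesciüm"; ü is the two bytes 0xC3 0xBC in UTF-8.
        string garbled = MisdecodeUtf8AsLatin1("Pesci\u00FCm");

        // 0xC3 becomes U+00C3 (Ã) and 0xBC becomes U+00BC (¼).
        Console.WriteLine(garbled == "Pesci\u00C3\u00BCm"); // True
        Console.WriteLine(garbled.Length);                  // 8
    }
}
```

The one-character ü turning into the two characters Ã¼ matches what I see in the rendered output, which is why I suspect the encoding is detected incorrectly for this page.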

I tried another URL, a German website full of words with umlauts, but I don't experience the same problem there. So maybe in this case the encoding is not detected properly? Is there a proper way to deal with such cases automatically (without knowing the encoding of the original website)?
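One possible way to handle unknown encodings, sketched here under assumptions: download the raw bytes yourself, try a strict UTF-8 decode, and fall back to a single-byte encoding only if the bytes are not valid UTF-8. `DecodeHtml` is a hypothetical helper, not part of CsQuery, and the ISO-8859-1 fallback is an assumption that suits Western European pages only.

```csharp
using System;
using System.Text;

static class HtmlDecoding
{
    // Hypothetical helper: decode raw response bytes without knowing the charset.
    public static string DecodeHtml(byte[] raw)
    {
        // throwOnInvalidBytes: true makes invalid UTF-8 raise an exception
        // instead of silently producing U+FFFD replacement characters.
        var strictUtf8 = new UTF8Encoding(
            encoderShouldEmitUTF8Identifier: false,
            throwOnInvalidBytes: true);
        try
        {
            return strictUtf8.GetString(raw);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume a single-byte Western encoding (an assumption).
            return Encoding.GetEncoding("ISO-8859-1").GetString(raw);
        }
    }
}
```

The decoded string could then be handed to the parser as text, for example `CQ.Create(HtmlDecoding.DecodeHtml(new System.Net.WebClient().DownloadData(url)))`, so that no encoding detection has to happen on the CsQuery side. I have not verified how CsQuery treats a charset declared in a meta tag in that case.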

I tried to use the method suggested in the blog, however that produces other problems (non breaking spaces converted to strange characters).
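A likely reason for that side effect: if a repair of this kind is applied unconditionally, a character that was decoded correctly, such as a non-breaking space (U+00A0), turns into a lone 0xA0 byte, which is not valid UTF-8. A guarded variant could look like the sketch below. `RepairIfDoubleEncoded` is a hypothetical name, and the sketch assumes the garbling went through ISO-8859-1.

```csharp
using System;
using System.Text;

static class MojibakeRepair
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Strict decoder: throws on byte sequences that are not valid UTF-8.
    static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Undo a UTF-8-read-as-Latin-1 misdecode, but only when the whole string
    // round-trips cleanly; otherwise return the input unchanged.
    public static string RepairIfDoubleEncoded(string text)
    {
        // A character above U+00FF cannot come from a Latin-1 misdecode.
        foreach (char c in text)
        {
            if (c > '\u00FF') return text;
        }

        byte[] bytes = Latin1.GetBytes(text);
        try
        {
            return StrictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            // E.g. a correctly decoded non-breaking space: leave the text alone.
            return text;
        }
    }
}
```

With this guard, "PesciÃ¼m" is repaired to "Pesciüm", while a string containing a genuine non-breaking space is returned untouched. The check is all-or-nothing per string, so text that mixes garbled and correct non-ASCII characters is not repaired, and text that legitimately contains a sequence like "Ã¼" would be altered.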

I've seen that CsQuery makes it possible to write a custom implementation of an OutputFormatter, but maybe you already have a solution for this?

I'm not sure whether this is a CsQuery bug or some other problem...
I'd really appreciate it if you could help, thank you.

Contributor

rufanov commented May 23, 2015

I can't reproduce your problem. Has it been fixed already? For me, the output of your code contains Pesciüm as expected, not PesciÃ¼m.
