This repository has been archived by the owner on Feb 10, 2024. It is now read-only.

Encoding issues on latest windows build ? #1636

Open
Xyl2k opened this issue Mar 13, 2016 · 29 comments

@Xyl2k

Xyl2k commented Mar 13, 2016

Hello everyone,
I'm wondering if anyone else is experiencing this problem after updating to the latest Windows build of HexChat (2.12.0).
from: 97af149822e11c53385b2358a4419972 (HexChat 2.12.0 x86 installer)
Some characters seem not to be rendered properly.

[Screenshot: 2016-03-13_18-40-09]

[Screenshot: 2016-03-13_18-48-26]

I'm always in UTF-8 (Unicode) on my channels and I never had this problem with previous HexChat versions.
Another guy (who's using the x64 version) got the same problem after updating to 2.12.0, which is why I opened this thread to see if anyone else is having trouble.
As French speakers we have a lot of accents and such, and it's a bit frustrating to read text with the '�' char instead of accents.
I don't know why some chars are parsed badly like that, and depending on who is talking, the accents come through fine:

[Screenshot: 2016-03-13_19-09-28]

I'm looking at my settings but I have no idea what I can do to fix this kind of problem.
Regards

@Arnavion
Contributor

The people who you're getting invalid characters from are not using UTF-8. Tell them to use UTF-8.

@stallemanden

I am sorry to see that this issue has been closed, as I believe it is valid.
After the latest update of HexChat, I see a lot of this.
This is an issue that did not exist in the previous release/build.

"Tell them to use UTF-8" is, to me, a rather ignorant approach, as this is an issue introduced by the latest HexChat build.

@Arnavion
Contributor

The reverse of the issue (UTF-8 text incorrectly getting interpreted as system-locale-encoding text and thus breaking) was present in the previous builds. You were just lucky you never encountered it.

If two sides of a conversation pass text to each other, they have to agree on the encoding. This is common sense, not ignorance.
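
To make the two failure directions concrete, here is a minimal sketch in Python. It is only an illustration of the general encoding mismatch, not HexChat code; the sample word and variable names are made up for the example.

```python
# Illustration only: how one word looks when sender and receiver disagree.
text = "café"

latin1_bytes = text.encode("iso-8859-1")  # b'caf\xe9'
utf8_bytes = text.encode("utf-8")         # b'caf\xc3\xa9'

# Sender uses ISO-8859-1, receiver decodes as UTF-8:
# the lone 0xE9 byte is not valid UTF-8, so it becomes U+FFFD.
seen_by_utf8_client = latin1_bytes.decode("utf-8", errors="replace")

# Sender uses UTF-8, receiver decodes as ISO-8859-1:
# every byte is "valid", so the two bytes of "é" show up as two characters.
seen_by_latin1_client = utf8_bytes.decode("iso-8859-1")

print(seen_by_utf8_client)    # caf followed by the U+FFFD replacement character
print(seen_by_latin1_client)  # cafÃ©
```

The first case is what the 2.12.0 reports in this thread look like; the second is the kind of breakage described for the previous builds.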

@stallemanden

So it is pure coincidence that the last 3 months of usage within the same channel, with the same people, have not shown any sign of the problem?
As you are hinting (UTF-8 required on both sides), not all people's text shows the issue after the update, but I have not seen any sign of the issue within the last 3 months of usage.
It seems rather strange to me...

@Arnavion
Contributor

> So it is pure coincidence that the last 3 months of usage within the same channel, with the same people, have not shown any sign of the problem?

Yes, since I and many others encountered it constantly with words with quotes getting treated incorrectly.

> It seems rather strange to me...

I'm sure it does. Common sense is rather hard to be found these days.

@stallemanden

> Yes, since I and many others encountered it constantly with words with quotes getting treated incorrectly.

I see, and I have not seen any issues with quotes getting treated incorrectly.
What I am seeing, after the latest update, is æ, ø, å being displayed as �
Again, not something I have seen before, across multiple systems.

> I'm sure it does. Common sense is rather hard to be found these days.

Seriously ?

@TingPing
Member

> I see, and I have not seen any issues with quotes getting treated incorrectly.
> What I am seeing, after the latest update, is æ, ø, å being displayed as �
> Again, not something I have seen before, across multiple systems.

It is as simple as this: they are sending an encoding that is not the encoding you use. Previous releases had various hacks that attempted multiple encodings. This was removed because it is an awful idea that is client-specific and introduces various corruption issues that can be solved by everybody agreeing on an encoding.
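
One way such a "try another encoding" hack can corrupt text is sketched below in Python. This is a hypothetical stand-in, not the code that was removed from HexChat: the function name, the ISO-8859-1 default, and the truncated sample line are all assumptions for illustration.

```python
def decode_trying_two(raw: bytes, second: str = "iso-8859-1") -> str:
    """Hypothetical 'try UTF-8, else try something else' decoder."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The whole line is reinterpreted, not just the offending byte.
        return raw.decode(second)


# A correct UTF-8 line whose last character lost its final byte,
# for example because the line was cut at a byte limit.
full = "très bien, déjà".encode("utf-8")
truncated = full[:-1]

with_fallback = decode_trying_two(truncated)
strict_utf8 = truncated.decode("utf-8", errors="replace")

print(with_fallback)  # trÃ¨s bien, dÃ©jÃ
print(strict_utf8)    # très bien, déj plus one U+FFFD replacement character
```

With the second guess, one bad byte garbles every accented character on the line; decoding as UTF-8 only loses the single damaged character.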

@Arnavion
Contributor

> Seriously ?

Says the guy who called me ignorant for knowing what I'm talking about.

@stallemanden

> It is as simple as this: they are sending an encoding that is not the encoding you use. Previous releases had various hacks that attempted multiple encodings. This was removed because it is an awful idea that is client-specific and introduces various corruption issues that can be solved by everybody agreeing on an encoding.

Thank you for actually explaining what we are seeing and why.
Thank you very much, TingPing.

> Says the guy who called me ignorant for knowing what I'm talking about.

I commented on the "Tell them to use UTF-8" statement. I am aware of the technical reasons for this happening. I just never got any response as to why something that had worked previously for me, and obviously for other users, had stopped working.
In other words, it was not clear what had been changed in the latest build to introduce this behavior, or why.

@TingPing
Member

This change was mentioned in the release notes because it is a "regression" for some users.

@Arnavion
Contributor

All right, I see there was a language problem, so I apologize for assuming the worst.

FYI, when you say "an ignorant approach", the word "ignorant" applies to the person who suggested the approach. It doesn't mean "an approach for which there is no justification". So

> "Tell them to use UTF-8" is, to me, a rather ignorant approach, as this is an issue introduced by the latest HexChat build.

should've been

> "Tell them to use UTF-8" is, to me, without justification, as this is an issue introduced by the latest HexChat build.

@Grui

Grui commented Mar 22, 2016

Hi,
Same problem here. I use HexChat at my office. Since the last update it has been impossible to work because of problems with special characters, so I uninstalled HexChat.

Best regards.

@Arnavion
Contributor

The time to uninstall HC, sign up for Github, search for the relevant issue and comment on it could've been better spent setting the network encoding correctly.

@Grui

Grui commented Mar 22, 2016

Hi Arnavion,

Sadly impossible: 2 encodings are used on the network, UTF-8 + ISO-8859-15.

@Xyl2k
Author

Xyl2k commented Mar 22, 2016

The problem is people who don't want to change their clients or server configuration, and as HexChat is now stricter, I simply downgraded to 2.10. It may be buggy for some, but it still works fine for me and gets rid of all this mess.

@Arnavion
Contributor

They're all mIRC users, aren't they?

Anyway, when everyone in this thread and #1198 can come up with a solution that's acceptable to all of you, I'm happy to consider it.

@TingPing
Member

I think not falling back to ISO-8859-1 on every line is just objectively the right thing to do. A more generic version of the "IRC" encoding is common in other clients, though, where a user can explicitly configure a fallback encoding which handles most situations (it does re-add old problems, but would be more explicit and opt-in).
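
A rough sketch of what an explicit, opt-in fallback could look like, written in Python for brevity. The `NetworkEncoding` class and its fields are hypothetical names for this example and do not correspond to an existing HexChat setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetworkEncoding:
    """Hypothetical per-network setting; not an existing HexChat option."""
    primary: str = "utf-8"
    fallback: Optional[str] = None  # None means: no second guess

    def decode(self, raw: bytes) -> str:
        try:
            return raw.decode(self.primary)
        except UnicodeDecodeError:
            if self.fallback is None:
                # Default: keep what is valid, mark the rest with U+FFFD.
                return raw.decode(self.primary, errors="replace")
            # Opt-in: reinterpret the whole line in the fallback encoding.
            return raw.decode(self.fallback, errors="replace")


line = "smørrebrød".encode("iso-8859-15")

default_view = NetworkEncoding().decode(line)
opted_in_view = NetworkEncoding(fallback="iso-8859-15").decode(line)

print(default_view)   # the two ø characters appear as U+FFFD
print(opted_in_view)  # smørrebrød
```

The default stays strict, so nobody gets the old corruption unless they choose a fallback themselves.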

@lichtmetzger

> Previous releases had various hacks that attempted multiple encodings. This was removed because it is an awful idea

If this feature worked fine for some users (like me btw) and they have problems now, this feature shouldn't be removed. A better idea would be to make it optional and let the user choose what he wants.

Telling all users in a chat to change their encoding when some of them use web clients is impossible.

@TingPing
Member

> Telling all users in a chat to change their encoding when some of them use web clients is impossible.

Which web client uses an awful encoding by default and doesn't support changing the encoding?

> A better idea would be to make it optional and let the user choose what he wants.

I'm not totally against that but the previous solution needed to be ripped out as everything was hardcoded to be broken.

@lichtmetzger

> Which web client uses an awful encoding by default and doesn't support changing the encoding?

That's not the point. Users who use web clients generally have no idea about IRC at all, otherwise they would use a real client.
Imagine joining a chat with 100 users and telling them all to change the encoding settings. That's like asking a basic Windows user to edit their registry.

> I'm not totally against that but the previous solution needed to be ripped out as everything was hardcoded to be broken.

And by unbreaking one feature you broke another feature. I think you should fix it.

@Arnavion
Contributor

> That's not the point.

Yes it is. If such a web client exists then tell us who it is, so we can tell them to fix it.

@lichtmetzger

> Yes it is. If such a web client exists then tell us who it is, so we can tell them to fix it.

The problem seems to originate from here:
https://webirc2.iz-smart.net/?channel=#fernsehkritik-tv

I can't see all characters from users connecting through this webclient and there is no option to change encoding in this client (at least I can't see an obvious one).

This chat is used for livestreams that are watched by hundreds of users. Given that size, your solution of telling the members of a chatroom to change their settings is not applicable to the real world.

@Arnavion
Contributor

That looks like a German network, so it's likely everyone there is using a German encoding. Set HC to use the same encoding.

@lichtmetzger

It is a German network, but with the same situation that Grui has posted here:

> Sadly impossible: 2 encodings are used on the network, UTF-8 + ISO-8859-15.

So whichever encoding I set in HexChat, I can only lose. WebChat users cannot change their encoding.
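
The "can only lose" situation can be sketched in Python as follows. The nicks and sample lines are invented, and the second encoding is assumed to be ISO-8859-15 as mentioned earlier in the thread.

```python
from typing import List

# Two hypothetical users on the same channel: (nick, typed text, sender encoding).
senders = [
    ("utf8_user", "Grüße aus Köln", "utf-8"),
    ("latin9_user", "Schöne Grüße, 5 €", "iso-8859-15"),
]


def garbled_nicks(client_encoding: str) -> List[str]:
    """Return the nicks whose lines do not display as they were typed."""
    broken = []
    for nick, typed, sender_encoding in senders:
        raw = typed.encode(sender_encoding)
        shown = raw.decode(client_encoding, errors="replace")
        if shown != typed:
            broken.append(nick)
    return broken


print(garbled_nicks("utf-8"))        # ['latin9_user']
print(garbled_nicks("iso-8859-15"))  # ['utf8_user']
```

With a single encoding per network, one of the two groups is always displayed incorrectly.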

@jhkl

jhkl commented Aug 20, 2016

Is it possible to add an option like "[ ] Use heuristic to guess non-UTF8 characters" that would enable the old behavior?

@TingPing
Member

> Is it possible to add an option like "[ ] Use heuristic to guess non-UTF8 characters" that would enable the old behavior?

To be clear, the previous solution did nothing smart: it simply tried a second encoding and then just used that result, which produced corrupt garbage just as often as the correct result.

As I've mentioned, I would be OK with a feature that allowed selecting a second encoding to just always try. The same problem would basically exist, but at least the user would be opting into that broken behavior.

@tmannerm

tmannerm commented Sep 3, 2016

Technically, forcing UTF-8 only on IRC is a good thing and I agree with that goal. However, there are not enough time or support resources in the world to educate the dozens of people who send broken characters. I tried to work with one using irssi (on a Linux shell machine) but they claim they have everything set right and I'm the only one with problems, so it's "my problem". There are also various IRC clients, especially on mobile platforms, that don't even support UTF-8, so they are unfixable.

So it will be impossible to get everybody else to fix their characters, and we are stuck with the ugly UTF-8 unknown character symbols. This is especially a problem with country-specific channels on IRCnet. I have a feeling all the other IRC clients are doing this detection, as nobody else is having problems (not only mIRC users) except users of the latest HexChat version. I guess there are no provisions in the IRC protocol for the encoding and the "standard" still assumes pure ASCII? Or has that changed in later revisions?

Is there any other work going on to address this very real problem, or is reverting the patch and/or downgrading the only way? I feel there's a possibility that something else was broken while removing this feature, as it doesn't seem to work against some IRC clients. Has a core developer verified the latest version on IRCnet with non-ASCII characters? If not, does anybody have time to check? I'll try to verify what settings this particular irssi user has and can provide any help I can to fix this issue.

In any case, please reopen and reconsider this issue. Otherwise I'm afraid lots of people will be stuck on an older version with any security problems.

@TingPing
Member

TingPing commented Sep 3, 2016

> I tried to work with one using irssi (on a Linux shell machine) but they claim they have everything set right and I'm the only one with problems, so it's "my problem".

/set recode_out_default_charset UTF-8

The number of clients that don't support setting an encoding and default to something other than UTF-8 is pretty small, I believe. I'll repeat myself yet again and say that an option to opt in to being broken would be fine; I don't know of anybody who plans on doing that work, though.

@Arnavion
Contributor

Reopening because a PR to add a fallback encoding option is welcomed.

@Arnavion Arnavion reopened this Jan 23, 2017