Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

geth console special character/encoding problem #16286

Closed
philsmd opened this issue Mar 8, 2018 · 18 comments
Closed

geth console special character/encoding problem #16286

philsmd opened this issue Mar 8, 2018 · 18 comments

Comments

@philsmd
Copy link

philsmd commented Mar 8, 2018

System information

Geth version: 1.7.2
OS & Version: Windows 10
Commit hash : release version downloaded by Mist (1db4ecd)

Expected behaviour

Copy pasting text into the geth console should leave the copied text unmodified and should not strip any characters.

Actual behaviour

Special characters (non-ascii) will be ignored/stripped by the geth console (it seems that this has to do with certain keyboard languages, but it works outside of the geth console) and this could result that passwords are set incorrectly. Even unlocking accounts with passwords including special characters might fail even if it shouldn't, see also ethereum/mist#3513 (comment) (search for "Let's talk about problem number 2:" within that post to fast forward to problem number 2 within that post)

Steps to reproduce the behaviour

  1. this was tested/confirmed under windows, but it could be that it has nothing to do with the OS (but the main reason could be the differences between certain keyboard languages)
  2. download/run geth (it seems that a lot/all versions are affected, but this needs to be confirmed/tested)
  3. set your keyboard language to e.g. English US, my default one (I also tested with German keyboard language)
  4. run "geth console" (without quotes)
  5. try to paste non-ASCII characters into the console or password prompt (e.g. the character with hex value 0xd3, i.e. Ó)
  6. the special characters will be ignored, which could be very dangerous because passwords might be set incorrectly and unlocking (for instance if the user just wants to see if the password is correct) might fail even if the password specified was correct

Note: I want to emphasize that the copy pasting only fails to work within geth, other tools and the same terminal (cmd) are able to get the whole input including special characters

Confirmation video (also see screenshots from ethereum/mist#3513 (comment)); this is a screen record to show what is happening:

ethereum_geth_console_special_characters

Note: before I started geth.exe the copy pasting of all characters worked (even with my default English US keyboard layout), but it failed within the geth console... it suddently started working when changing to the English United States International keyboard layout... it seems to be geth related because it works outside geth with each and every keyboard layout.
It is needless to say that this same behaviour can be observed with any other geth commands and also with the password prompts (which is very dangerous! and the main reason that I've discovered this problem, see ethereum/mist#3513) etc. I decided to paste the character as a command (or let's say directly within the geth console, without using the password prompt etc) because it makes it more clear what is going on because this way we can see which characters are actually pasted and which ones are ignored

What is the effect of this problem:

  1. the user could unintentionally set a password that s/he doesn't want to set (personal.newAccount ()), the password could be very different depending on how many special characters s/he used, it could even be empty in the worse case even if the user wanted to set a very difficult password with many special characters
  2. the user can't unlock this newly created keystore files with the correct password (if it includes special characters) whenever s/he tries to use 3rd party tools (which accept the special chars)
  3. the user might not be able to unlock keyfiles generated by 3rd party tools (if they accepted the special characters but geth fails to accept them)
  4. other commands (besides the password cmd) might be affected the same way and could also lead to strange/unintended outcomes
    .... etc

CC: @evertonfraga, @karalabe, @holiman

@holiman
Copy link
Contributor

holiman commented Mar 8, 2018

Geth uses https://github.com/peterh/liner under the hood. I'm curious if how the little app here https://github.com/peterh/liner#getting-started would behave on your system.

@philsmd
Copy link
Author

philsmd commented Mar 8, 2018

Hey @holiman, thanks for your fast reply and the hint about liner.

I tested the code from https://github.com/peterh/liner#getting-started now. I compiled it with GOOS=windows GOARCH=amd64 go build -o peterh_liner.exe peterh_liner.go

This test shows that liner alone is not affected by this problem. Maybe geth is using some special modes etc, but the little app does allow posting special characters.
I have a screenshot to proof this for you. BTW: it doesn't matter which keyboard layout I set, the peterh liner little app test always work (similar to the pasting directly within cmd), but it fails when I paste in geth.

liner_test

As you can see the keyboard layout in the screenshot above is my default one which failed with geth. With the little liner app I have no problems pasting special chars

@holiman
Copy link
Contributor

holiman commented Mar 8, 2018

Thanks for checking!

@philsmd
Copy link
Author

philsmd commented Mar 11, 2018

I decided that this problem is significant and risky enough (it could mean a lot of trouble for many users, because I guess a lot of users also use geth on windows with non-international keyboard layouts) so that I dedicated quite a lot of time today to troubleshoot this problem.
The following is a little bit of a story telling, please forgive me:
I decided to try if I can compile geth entirely on windows (without cross-compilation and without running the pre-compiled binaries) for the sake of being able to make modification to the golang source files etc.
I actually managed to compile the whole geth on windows without many problems with these instructions: https://github.com/ethereum/go-ethereum/wiki/Installation-instructions-for-Windows#compiling-geth-with-tools-from-chocolatey ( I think this guide is pretty good, except that I think after the "cd src\github.com\ethereum\go-ethereum" command the prompt in the guide should be "C:\Users\xxx\src\github.com\ethereum\go-ethereum>" already. Furthermore, the guide should tell you where the geth binaries are: they are located in %USERPROFILE%\bin\, therefore the go-ethereum binary is in %USERPROFILE%\bin\geth.exe for instance)

After seeing that also the newly compiled geth binaries fails @ encoding, I inserted the example code from liner: https://github.com/peterh/liner#getting-started into the geth source code at a very early stage where geth is still initializing (I think I inserted it after this line

func newTerminalPrompter() *terminalPrompter {
).
The outcome: a total surprise! even if I inserted the code at the beginning of geth it still didn't work. no special characters pasted/allowed.
My idea now was to compile the little liner standalone app (https://github.com/peterh/liner#getting-started) without any geth code, just to make sure that it still works (I compiled it within the go-ethereum directory. on windows directly).
Again a surprise: now the little liner app also didn't work anymore!

I tried to see if the old liner exe file I compiled (see previous posts) still worked, or if it was some terminal/keyboard language problem. I moved to the other geth directory and run it. Surprise again: it worked perfectly!
First, I thought it was the cross-compilation, because I compiled the old liner exe file with a linux (my usual/preferred os). next idea: try to compile the liner app again within this changed directory.
Now comes the funny part:
I got the error that I can't compile the little liner app and that I need to "go get" the perterh liner app source code! WTH? I just compiled it without any problems. It told me that I have to "go get github.com/peterh/liner", why didn't it tell me previously that I had to do so!?

Now I of course realized pretty quickly what was going on, geth ships its own source files of the peterh liner app see: https://github.com/ethereum/go-ethereum/tree/master/vendor/github.com/peterh/liner
That explains why the first attempt today was successful in compiling without the "go get" and why that version was still misbehaving while the other/newer version of liner directly downloaded from https://github.com/peterh/liner was working perfectly fine and I had to "go get" it.

Funny coincidence that I was going through this changing-directory-decides-if-it-fails-or-succeeds mess, right?

Anyway, putting the newer source code of liner (from github.com/peterh/liner) within the vendor/github.com/peterh/liner/ directory of geth made the geth binary working (with special character support) again for me! Success

I also took some time, just for fun now, to locate the commit that fixed the problem in liner (this is the first version that works for me):

peterh/liner@161ea9c

(it's currently the latest commit in liner and the commit message talks about fixing problems with surrogate pairs etc. It's actually funny that this problem was fixed just 21 days ago in the liner repository. I think we should use the latest versions of liner if no problems are detected within geth during testing with the newest liner version)

There is one big problem now that you shouldn't ignore. There are potentially many accounts now out there that can only be unlocked with specific versions of geth, i.e. those that ignore the special characters etc. The problem is similar to this one #2371. You can't just fix the problem, but you also need to make sure that the old accounts still can be unlocked. One possibility would be to test unlocking also with the "stripped special character" version of the password if the typed password (with special chars) did not work. I think this needs to be done on each and every supported platform even if the problem seems to be windows-specific, because it could happen that a user created the file with geth on windows and later on tries to unlock it with geth on linux/mac. I would also suggest to show some warnings and hints for instance when geth detects that the password did only work with the modified password and also that geth tells the user (it should explain why) it also tries with a modification because of this bug etc!

I hope you appreciate this contribution / troubleshooting. It was actually a lot of messing around and trying to figure out wth is going on with this liner lib. If also just some people have less problems with passwords/encoding in the future because of this "research"/fixing it was definitely worth it. I even tried to test the encryption/decryption with password recovery tools just to make sure that with the newer liner version also the whole set of bytes of the password are used by the encryption algorithm as we would expect (UTF 8 encoded password with all special characters)... cracking the UTC-- files worked with all my tests now if the files were produced with the modified geth that was compiled with the liner fix!

BTW: I just found out that this problem was actually already mentioned (and people were struggling with it) in other issues (a little bit hidden therefore I didn't find this post beforehand) and I think the root cause is exactly the same (namely the old liner code!): see ethereum/mist#3176 (comment)

What we can learn from this: it often makes sense to update the included/imported libs, because there could be a lot of problems that were fixed with newer versions of them.

Can you guys please suggest how you plan to properly fix this? liner update + code which also tries the modified password (at least for several future versions of geth) cc: @karalabe, @holiman

@holiman
Copy link
Contributor

holiman commented Mar 11, 2018

Thanks for the investigation!

We do update the 'vendored' libraries from time to time -- and I guess updating 'liner' would be good. IIUC, the TLDR is "Copy-pasting geth console in windows may have dropped certain special characters".

Also, if I'm reading it right, this does not concern accounts that have been created through Mist.

@philsmd
Copy link
Author

philsmd commented Mar 11, 2018

yeah exactly (but it also happens if you typed it of course, it's not a clipboard problem only)
but what if newer versions of geth suddently start accepting the special characters that the user have typed/pasted ? the stored/remembered password does not work anymore !
that's a huge problem and you can unfortunately only fix it by trying both version of the password when unlocking accounts

@philsmd
Copy link
Author

philsmd commented May 31, 2018

are there any updates here ?

I find the thought quite disturbing that dozens (maybe much, much more) of users could potentially create accounts with geth each and every day that are generated with a different password compared to the password the user set. This is horrible and this should be addressed pretty quickly.
I mean this isn't a rare situation at all. At least all windows binary (and the master source code) are affected by this bug. This problem kind of affects every windows user if they didn't set an international keyboard layout (i.e. only the English United States International layout seems to be an exception for some reason, which might be only used by very few users) and used non-ASCII characters within their password.

As already mentioned, this problem also leads to other problems like the user can't use this keystore file with 3rd party tools like mycrypto/myetherwallet (because they don't mangle the password input in the same way the geth-liner-combination does) and you also can't use mist UI etc if the key was generated with geth beforehand (because the password, when we look at this specific bug, only gets screwed-up by the geth-liner combo).

I think there is absolutely no plausible reason to wait until even more (hundreds/thousands?) of keystore files are generated "incorrectly" (with wrong password)... I reported the problem about 3 months ago and it seems the users noticed some strange behaviour even much, much earlier (though nobody seemed to identify the main cause before I opened this issue). Why risk that even more keystore files are incorrectly generated ? This should be escalated and handled very responsible and by considering all aspects of the changes (like I already mentioned: what if the keystore file was generated with the mangled password, but in upcomming versions geth uses the "correct" password input? both versions of the password - mangled/raw - must be tested, on each and every platform).

You might have more numbers/stats available than I have, but my guess is that there are a lot of windows geth downloads and also a lot of users that do not use ASCII characters at all (their keyboard is non-english, uses non-latin characters in general !!!) and/or a lot of users try to set a difficult password with special characters/umlauts etc... all these cases lead to a INCORRECT keystore file, you know about the problems details since about 3 months and did nothing against it! This is a horrible fact and I find it very careless of you to not address problems like this with highest priority and uttermost diligence.

I mean ... the exact problem was already carefully analyzed and reported and even the reason/fix was already proposed by me (i.e. update liner + test for mangled password) ... the only thing missing is the code that checks both versions of the password to not risk that all old/affected keystore files are completely useless (and to not risk that not only they don't work with 3th party software, but also do not work anymore with newer versions of geth... if the fix is not handling these special cases/keystore files).

I think somebody needs to escalate this problem if nothing happens within the next few days/weeks. Maybe we should at least let the ethereum twitter and reddit community ask what they think of this bug and how this was (not) addressed within a timely manner.

Finally, I just want to make sure that nobody gets this/me wrong: I'm not trying to blame somebody for this problem or accuse somebody of something... but thinking about hundreds/thousands of keystore files that are completely wrongly generated and the devs ignoring it for months makes me a little bit alerted and frustrated. I see no reason why devs do this to their users (maybe they just forget/ignored the issue, but I think it's horrible enough that it can't be simply ignored !).
cheers

@holiman
Copy link
Contributor

holiman commented May 31, 2018

Gotcha, we're on it.

It's a very peculiar circumstance where someone would get bitten by this, and the wallet could always be recovered again using the same setup as where it was first set, so as the path forward was a bit unclear, it slipped under the pile of things to do.

We could add a 'canonicalizer', if a password fails and the 'canonical' version of the password is different from the password, we can attempt that too.

@holiman
Copy link
Contributor

holiman commented May 31, 2018

Though, to be clear, I don't think all non-ascii characters are affected, but UTF-16 surrogate pairs (which I'll have to look into)

@holiman
Copy link
Contributor

holiman commented Jun 1, 2018

I've asked @kurkomisi try to repro this on a similar system. He will post the specifics, it seems we'll need to investigate further about what exactly is causing this issue, because while UTF-16 surrogate-pairs are dropped, characters such as Ó works fine.

@philsmd
Copy link
Author

philsmd commented Jun 1, 2018

I'm not sure what you mean by it works with Ó . The screencast above clearly shows that there is a problem also with that specific character on the system(s) I tested with.
As I already mentioned here the character is encoded to 0xc393 in utf8: ethereum/mist#3513 (comment) , the same character utf-16 encoded might be different again. I'm pretty sure it uses 2 or more bytes for that character and this is the reason why the bug may trigger. Well, encoding is difficult and therefore it might need some more investigation.
I also want to emphasize that the system(s) I tested with are not using any strange (non default) settings or any strange language or keyboard settings (it's a normal windows system, the only thing I added later on was the additional german and international keyboard layouts because I wanted to see what happens with these settings).

@holiman
Copy link
Contributor

holiman commented Jun 1, 2018

Yeah I didn't mean there was no issue, just saying that we failed to reproduce it on a windows 10 with geth 1.7.2 and US (Eng) keyboard. So we'll likely need to dig further.

@holiman
Copy link
Contributor

holiman commented Jun 4, 2018

So, regarding the case you linked above, where a user thought he had used FußRoDah, but managed to unlock it with FusRoDah. If you believe that that was caused by this bug, then that means that the character ß was replaced by s, which does not appear to be the case in this bug.

As I've understood this bug report (and I'm asking for confirmation here because we haven't been able to reproduce it).

  • Characters above 127 are dropped.

Could you please, on your system, with the old liner, check what happens to the string FußRoDah? If the character is dropped, then that linked case is not caused by this issue. And it also means that the solution for this issue is much simpler, if it involves only dropping all non-ascii characters from the password and attempt again.

@philsmd
Copy link
Author

philsmd commented Jun 4, 2018

Hey @holiman, first of all, thanks for your effort in troubleshooting this.

I think the FusRoDah vs FußRoDah example is a little bit confusing indeed. As far as I understand, the user said that 0xAccountA with password FusRoDah (which is a complete independent account) does work without problems, while the account 0xAccountB with password FußRoDah does not work (but again account 0xAccountA and 0xAccountB are not related at all, you can not make any conclusion if either of them works, because there is no dependency, not even when we look at encoding or character codes etc). I think that the user just mentioned both cases, because as far as I know within German writings you sometimes (need to) replace the Eszett (sharp s) character, because some documents/forms do not allow special characters (in these specific cases I think ß is replaced by two s or something like this), but this is not really something we need to worry about here.
Basically, the example just showed that it works with a charset from [a-zA-Z] but doesn't work with special characters.

... but I also need to admit that now reading the post again carefully, the reporter said that it never worked for them, not even without the sharp s character. I didn't yet test the esszett example myself, I will of course do it as soon as possible because this might be somehow related here (or also completely independent and maybe the other post is not related with this issue at all. we will see).

I agree that the solution might be easy, but you still need to somehow deal with old keystore files.
Since the liner patch (peterh/liner@161ea9c) worked for my system, I guess that we need to cover all cases where basically these bytes from the ALT key event (as far as I understood this event is misused and also triggers when the alt key is NOT pressed) were ignored in previous versions of liner.

I'm not sure why you can't reproduce it. I do not think that it has something to do with 0x7f, but with characters that are at least > 0xff or > 0xffff (utf-16 for instance?, I'm not yet sure what the first character is that is ignored by older versions of liner, but I will test this too).

Stay tuned...


update: there seems to be no problem with the small ß on my system with the original version of geth and old version of liner (it will be copied and displayed correctly, even cracking with hashcat works with the generated UTC files) ... the capital letter ẞ does display some question marks (?) but I think this is because it was introduced in unicode only very recently and I wouldn't worry about that too much here (the unlocking and cracking of the capital letter ẞ works, just the display within the console is screwed up with a question mark) update: after further investigation it turned out that the esszett only works with the old version of geth&liner if the German Keyboard language was selected (for some reason I only tested again with that specific language settings). The characters do not get recognized if English was selected as the keyboard language (again this problem exist with old version of liner and it happens on my system with the US keyboard too)

update 2: I just noticed that there might be other (very little older) liner fixes too that might be important for this issue: see peterh/liner@e1deee0 o.O

btw: I'm still trying to see what ranges of characters work and which do not, but it is kind of non-intuitive to me... it's not really a threshold code value in the extended ASCII/utf-8 character space... it works for some special character, then there are some that do not work and then it start working for some codes again and it fails again etc... I was thinking if there is somehow a different code page or character encoding involved that might explain this behaviour (btw: my cmd says that it is using 437 (OEM - United States), not sure if that matters... but it is still kind of "random" to me which character work and which do not)

update 3: while trying to debug this problem, it turned out that this "barrier" is the main trigger of the issue I described above i.e. my particular test cases: https://github.com/peterh/liner/blob/e1deee0acfb34066ae65e7daa79265f8533c5546/input_windows.go#L197-L199
basically whenever the "KeyDown" field was zero, the characters are completely ignored and it seems that this also depends on the keyboard settings (I found out that most importantly the CF_LOCAL changes from 07 04 (ger) to 09 04 (US), when it comes to the clipboard content). I think that we can assume that whenever there was no keydown event, the passwords were not used correctly.

btw: my debuggings show that the section of code that is run for my specific test cases is always this one: peterh/liner@161ea9c#diff-8ea1b116174305efd1aa198c64cb38cfR210

i.e. it is "just" a normal ke.Char code (for instance for Ó the value is 211), but without keyDown set and with ke.VirtualKeyCode == vk_menu (therefore in this particular case it is no surrogate or anything like that and also the case surrogate > 0 code sections etc are not used with the Ó etc characters).


therefore my conclusion after this quite intensive troubleshooting and testing session is that depending on the keyboard layout there could be some characters (in my tests always with character codes > 0x7f) that are send via the ALT+unicode method instead of the "normal" keydown event. Depending on the keyboard language selected, there might be some codes that are > 0x7f that do not use the ALT method (common characters for that specific language?).
I'm not 100% sure how geth can 100% correctly test this... it's quite difficult without doing weird information gathering about keyboard language and getting the raw input (and input events) to see if the ALT method was used (maybe the only valid solution is to modify the liner code to construct both password with the old and new code of liner, i.e. the continue or with ALT+unicode handling). Another problem also might be that the user changed the keyboard layout afterwards or switch to a different computer but is using the same keystore file... you can't really determine 100% what happened when the user first set the (wrong) password. It's a very dangerous and difficult to fix bug (if my conclusions about the keyboard layout dependency, the misuse of the ALT event etc are correct) :(

@philsmd
Copy link
Author

philsmd commented Feb 7, 2019

This must be a joke.

Are you serious ? Why was this issue closed ?

We've discussed in depth within the above posts that you can't just update the dependencies and do as if the wrong-password bug with special characters was never a thing and that this bug never existed.

This is similar to the bug here #2371 where the fix was also completely wrong, if you look at the user experience... That old fix for instance (also) made it even worse, because now the users can't even test with the old password (with wrongly added newline characters in that specific case).

Why are people even wondering why threads like this with user claiming that geth/mist set the "wrong password" ethereum/mist#3513 are that long and very confusing/weird if you are that reluctant to fix the problem properly and address all points that were raised above ?

This is not how you fix issues, you should know better. You are risking to lock people out of their wallets and making it even more annoying to understand what the real password (versus the incorrectly used password set by geth) is.

cc @holiman , @evertonfraga

@holiman
Copy link
Contributor

holiman commented Feb 8, 2019

We decided that the right approach would be to update the liner to behave correctly going forward. That said, it's true that there might be corner-cases where people are using a wallet with a 'corrupt' password.

However, this bug has nothing whatesoever to do with anyone who has set password via Mist: this concerns only this case:

  • Users on Windows who are copy-pasting passwords directly into the geth console
    • and, it seems, also has some strange locale/ and or/ keybooard settings

So it's probably better to use the correct liner, and maybe provide an additional binary, e.g. mirror.

Suggestion: The mirror binary could use the old liner, and basically do

This the echoing liner-tester. Any input you paste will be echoed back in hexadecimal form 
> 

The user could then copy-paste his password, and see what the corresponding hexdata is. He could then store that (binary) data into a password file, and use for unlock.

Such a binary could also help us discover more information about this corruption, and eventually add the required rules/manglings to adjust for this quirk internally within geth/clef.

How does that approach sound?

@philsmd
Copy link
Author

philsmd commented Feb 8, 2019

I mean that would be a first start, but it doesn't really solve the problem and is also not really user friendly and very practical.
How are users notified that they need to use the "mirror" binary ? Kind of nobody would use it at the end. They will just freak out why their password stopped working and they will try to use some online wallet (maybe even a scam site) instead.

The main problem is that there might be a considerable large amount of windows users affected by this problem that for instance used password managers to generate the password with special characters etc and that won't even think about trying to test their password anytime soon ("hodlers" ?). They might try to unlock their account in a couple of months/years and won't get any hint why the password is not working anymore (yeah, I saw that you are planning to add a line within the changelog about this, but who reads the changelog anyway ?) There must be at least some kind of warning if the password does not work within the console itself and when the raw input/password meets the pre-requisit for this bug: windows user/special ALT characters

@philsmd
Copy link
Author

philsmd commented Feb 14, 2019

Please reopen this github issue. The problem is not fixed. The patch made the problem even worse for the users affected.

denis-papyrus pushed a commit to papyrusglobal/go-ethereum that referenced this issue Mar 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants