This repository has been archived by the owner on Jan 3, 2023. It is now read-only.

Can't type Unicode between U+0080 and U+00FF in F# Interactive on Linux and OS X #654

Closed
rmunn opened this issue Jan 11, 2017 · 17 comments


@rmunn

rmunn commented Jan 11, 2017

Linux Mint 18 (based on Ubuntu 16.04 LTS), F# 4.0. Unicode characters between U+0080 and U+00FF seem to be getting replaced by � (U+FFFD REPLACEMENT CHARACTER) at the F# Interactive prompt. I can include these characters in .fsx scripts (encoded in UTF-8) and run them just fine, but when I try to type (or copy & paste) the same characters at the F# Interactive prompt, I get � instead:

[Screenshot: bad-utf8-in-fsi]

One line is highlighted in that screenshot because that's what I copied and pasted into the F# Interactive window -- but as you can see, the µ was replaced by �. Same thing occurred when I typed µ at the keyboard. However, using #load to load and run the .fsx file printed the correct output, so this problem is limited to the F# Interactive window.

I first encountered this behavior in ionide/ionide-vscode-fsharp#290, but since I'm able to reproduce it entirely in the F# Interactive terminal with no IDE running at all, I believe the root cause of that bug is something in fsi.exe and not in the Ionide plugin.

Note that this bug does NOT appear to happen on Windows 10 or Mac OS X (I don't know which version of OS X, I just know that in the ionide/ionide-vscode-fsharp#290 bug comments, someone reported that he could not reproduce the bug on OS X). Both people who have reproduced this bug seem to be running Linux.

Expected behavior

The µ character would show up correctly when typed into the terminal.

Actual behavior

All Unicode characters between U+0080 (unassigned control character) and U+00FF (LATIN SMALL LETTER Y WITH DIAERESIS) appear to be replaced by � (U+FFFD REPLACEMENT CHARACTER) when typed at the F# Interactive prompt. Other characters work fine:

rmunn@rmunn-vm-mint18 ~ $ echo 'ñòóôõöøùúûüýþÿĀāĂ'
ñòóôõöøùúûüýþÿĀāĂ
rmunn@rmunn-vm-mint18 ~ $ echo 'ñòóôõöøùúûüýþÿĀāĂ' | xxd
00000000: c3b1 c3b2 c3b3 c3b4 c3b5 c3b6 c3b8 c3b9  ................
00000010: c3ba c3bb c3bc c3bd c3be c3bf c480 c481  ................
00000020: c482 0a                                  ...
rmunn@rmunn-vm-mint18 ~ $ fsharpi

F# Interactive for F# 4.0 (Open Source Edition)
Freely distributed under the Apache 2.0 Open Source License

For help type #help;;

> printfn "��������������ĀāĂ" ;;
��������������ĀāĂ
val it : unit = ()
> #quit;;

- Exit...

Again, the text in F# Interactive was a direct copy-and-paste from the text typed into the Bash shell.
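
The byte dump and the number of � characters above line up with the U+0080 to U+00FF range. The following is a minimal F# sketch of that check, not part of the original report. It builds the same 17 characters from their code points, so it does not depend on how the script file is encoded. The names `codePoints`, `sample`, `hex` and `affected` are illustrative only.

```fsharp
open System
open System.Text

// The 17 characters pasted above, built from code points:
// U+00F1..U+00F6, U+00F8..U+00FF (U+00F7 is not in the sample), U+0100..U+0102.
let codePoints = [0xF1 .. 0xF6] @ [0xF8 .. 0xFF] @ [0x100 .. 0x102]
let sample = String(codePoints |> List.map char |> List.toArray)

// UTF-8 bytes as hex, comparable with the xxd dump (minus echo's trailing 0a).
let hex =
    Encoding.UTF8.GetBytes(sample)
    |> Array.map (fun b -> sprintf "%02x" b)
    |> String.concat ""

// Characters in the range that the prompt replaced with U+FFFD.
let affected =
    sample
    |> Seq.filter (fun c -> int c >= 0x80 && int c <= 0xFF)
    |> Seq.length

printfn "%s" hex
// c3b1c3b2c3b3c3b4c3b5c3b6c3b8c3b9c3bac3bbc3bcc3bdc3bec3bfc480c481c482
printfn "%d of %d characters are in U+0080..U+00FF" affected sample.Length
// 14 of 17 characters are in U+0080..U+00FF
```

The 14 characters in range match the 14 � characters in the session, and the 3 characters from U+0100 upward are the ones that survived.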

Known workarounds

No workarounds known at the moment. This is interfering with being able to use the Ionide plugin's Alt+Enter shortcut to send text to the F# Interactive window in VS Code.

Related information

Reproduced on Linux Mint 18 (based on Ubuntu 16.04) and on Ubuntu 14.04. In both cases, the fsharp package was the latest available from the Mono distribution's official Debian/Ubuntu repo. Terminal used was GNOME terminal, and value of $LANG variable was en_US.UTF-8.

@knocte
Contributor

knocte commented Jan 11, 2017

Linux Mint 18 (based on Ubuntu 16.04 LTS)

FYI, the bug also happens in Ubuntu 16.04 LTS; just tested.

@rmunn
Author

rmunn commented Jan 11, 2017

I believe what is happening is that the readKeyFixup function is being called when there's no need for it on Linux. It is designed to work around an older Windows bug where System.Console.ReadKey would return a byte from the system codepage, rather than a Unicode character. On any modern Linux distribution, this is unnecessary, and calling the readKeyFixup function in its current state ends up replacing characters with �. (It only acts on characters <= 255, and ASCII characters come through unchanged, hence the U+0080 to U+00FF pattern I noticed in this bug.)
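
To illustrate the mechanism described above, here is a minimal sketch in F#. It is a hypothetical stand-in, not the fsi.exe source: `fixupSketch`, `latin1` and `show` are names invented for this example, and the sketch assumes the fixup treats a character <= 255 as a single byte in the console input encoding and decodes it.

```fsharp
open System.Text

// Hypothetical stand-in for the fixup described above (not the fsi.exe source):
// treat a char <= 255 as one byte in the given input encoding and decode it.
let fixupSketch (inputEncoding: Encoding) (c: char) : char =
    if int c <= 255 then
        let decoded = inputEncoding.GetChars([| byte c |])
        if decoded.Length = 1 then decoded.[0] else c
    else
        c

let latin1 = Encoding.GetEncoding("iso-8859-1")
let show (c: char) = sprintf "U+%04X" (int c)

printfn "%s" (show (fixupSketch Encoding.UTF8 '\u00B5'))  // U+FFFD: a lone byte 0xB5 is invalid UTF-8
printfn "%s" (show (fixupSketch Encoding.UTF8 'a'))       // U+0061: ASCII decodes to itself
printfn "%s" (show (fixupSketch Encoding.UTF8 '\u0100'))  // U+0100: above 255, left untouched
printfn "%s" (show (fixupSketch latin1 '\u00B5'))         // U+00B5: a single-byte encoding round-trips
```

Under these assumptions, only characters from U+0080 to U+00FF are damaged when the input encoding is UTF-8, which matches the pattern reported in this issue.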

There are compilation flags set to skip building this function on non-Windows systems, but it appears that the Debian package was built from an older version of the code where those compilation flags had not yet been added. So this may in fact be a bug in the Debian (and Ubuntu) packaging, and not a bug in F# Interactive after all. Once I've verified that this bug does not require any changes in F#, I'll open a bug in the appropriate Debian bugtracker and close this GitHub issue.

@rmunn
Author

rmunn commented Jan 11, 2017

Debian bug report opened at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=850911.

@ismail

ismail commented Jan 26, 2017

Why is this bug closed?

@dsyme
Contributor

dsyme commented Jan 26, 2017

@ismail @rmunn I was under the impression it was a Debian bug, and that this issue should have been closed?

So this may in fact be a bug in the Debian (and Ubuntu) packaging, and not a bug in F# Interactive after all. Once I've verified that this bug does not require any changes in F#, I'll open a bug in the appropriate Debian bugtracker and close this GitHub issue.

@ismail

ismail commented Jan 26, 2017

For me I can reproduce this on openSUSE and we do not modify fsharp in any way.

@dsyme dsyme reopened this Jan 27, 2017
@dsyme
Contributor

dsyme commented Jan 27, 2017

@ismail OK, reopened, thanks for checking

@rmunn
Author

rmunn commented Feb 3, 2017

Yes, I should have posted a followup comment: I do think that some changes will be needed in F#'s terminal handling. I didn't post anything because I don't yet know what they are.

Thinking about it a bit more, it might be best to check the input encoding and, if it's set to UTF-8, skip the readKeyFixup function. That would require testing the Windows versions affected by the bug that readKeyFixup was written to work around, and I don't have access to those older versions. (The test would be running chcp 65001 to set the codepage in cmd.exe to 65001, Microsoft's codepage number for UTF-8, and then seeing what happens in the F# Interactive REPL.)
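
The idea in this comment could be sketched roughly as follows. This is only an illustration of the proposal, not the change that was eventually made in F#; `isUtf8` and `readKeyFixupGuarded` are hypothetical names, and the single-byte decoding step is an assumption about what the fixup does.

```fsharp
open System.Text

// 65001 is the code page number that .NET reports for UTF-8.
let isUtf8 (enc: Encoding) = enc.CodePage = 65001

// Hypothetical guard: skip the single-byte fixup entirely when the input
// encoding is UTF-8, and otherwise decode the char as one byte (assumed behaviour).
let readKeyFixupGuarded (inputEncoding: Encoding) (c: char) : char =
    if isUtf8 inputEncoding || int c > 255 then
        c
    else
        let decoded = inputEncoding.GetChars([| byte c |])
        if decoded.Length = 1 then decoded.[0] else c

printfn "U+%04X" (int (readKeyFixupGuarded Encoding.UTF8 '\u00B5'))  // U+00B5
```

In a real REPL the encoding argument would presumably be System.Console.InputEncoding; it is passed explicitly here so the sketch can be run without depending on the terminal's settings.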

@rmunn
Author

rmunn commented Feb 3, 2017

Also note that #594 is a reproduction on OS X, so this affects more than just Linux. I'll edit the issue title accordingly.

@rmunn rmunn changed the title from "Can't type Unicode between U+0080 and U+00FF in F# Interactive on Linux" to "Can't type Unicode between U+0080 and U+00FF in F# Interactive on Linux and OS X" Feb 3, 2017
@haf

haf commented Mar 11, 2017

May be causing this issue of mine: http://forum.pdfsharp.net/viewtopic.php?f=2&t=3553&p=10681#p10681 I must have spent 2 days on this, not realising that it may be F# Interactive that breaks my invoicing.

@dsyme
Contributor

dsyme commented Mar 29, 2017

@haf could you check this is fixed by tag 4.1.5 please? thanks

@dsyme dsyme closed this as completed Mar 29, 2017
@haf

haf commented Mar 30, 2017

How do I test it?

@dsyme
Contributor

dsyme commented Mar 30, 2017

@rmunn could you check this is fixed by tag 4.1.5 please? thanks

@ismail

ismail commented Mar 30, 2017

@dsyme Confirmed fixed on openSUSE, thanks!

@dsyme
Contributor

dsyme commented Mar 30, 2017

@ismail lovely, thank you!

@rmunn
Author

rmunn commented Mar 31, 2017

@dsyme And I can confirm that tag 4.1.5 fixes the bug on Linux Mint 18 (which is based on Ubuntu 16.04). I still can't build the visualfsharp repo for the reasons I mentioned in dotnet/fsharp#2582 (comment), but I've been able to build this repo and the fix is working here.

@rmunn
Author

rmunn commented Apr 9, 2017

Further confirmation: I just installed the latest .deb package built from the https://github.com/mono/linux-packaging-fsharp/ repo, on a machine where I had not yet done anything to fix this bug manually. In other words, the laptop that I'm typing on right now was exhibiting this bug immediately prior to installing version 4.1.6-0xamarin1+debian7b1 of the Mono-built fsharp package, and immediately after installing version 4.1.6-0xamarin1+debian7b1, the bug went away and I could type ä properly in F# Interactive.
