
IO error while decoding <path to worksheet> with UTF-8 Please try specifying another one using the -encoding option #120

Closed
megri opened this Issue Mar 1, 2013 · 17 comments

4 participants

@megri
megri commented Mar 1, 2013

If using Swedish characters like åäö in a worksheet, it won't compile.

I have tried adding both '-encoding UTF-8' and '-Dfile.encoding=UTF-8' to the "Additional command line parameters" box under Preferences>Scala>Compiler, to no avail.

My worksheet is encoded using UTF-8 without BOM.

Scala plugin version: 3.0.0.rc1-2_10-201302280900-dd11367

Scala compiler version: 2.10.1.v20130225-100037-RC2-4d1a1f7ee5
Scala library version: 2.10.1.v20130225-100037-RC2-4d1a1f7ee5
Eclipse version: 4.2.1.v201209141800


@megri
megri commented Mar 1, 2013

Addendum: my original source is located @ ../one-shots/src/testing.sc and is indeed UTF-8. However, the file @ ../one-shots/.worksheets/src/testing.scala is plain ANSI. The encoding seems to be lost during pre-compilation.

@dragos
Eclipse Scala IDE member
dragos commented Mar 1, 2013

Thank you for the precise diagnosis, that is indeed what happened. We'll fix it.

@veeandy
veeandy commented Apr 10, 2013

any update on this?

@dotta
Eclipse Scala IDE member
dotta commented Apr 11, 2013

> any update on this?

@veeandy We are busy working on the Play2 Eclipse plug-in but, as usual, PRs are welcomed ;-)

@dragos
Eclipse Scala IDE member
dragos commented Apr 11, 2013

Sorry for the delay. As @dotta mentioned, we are now focusing on improving the Play2 plugin. However, this is a fairly easy fix, and it would be cool to get some help from the community. The code is here and it even has a FIXME comment. To specify an encoding, it should be enough to change the FileWriter to a Writer that takes an encoding parameter (which should be retrieved from the Eclipse settings; see ScalaProject.getEncoding in the Scala IDE).
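As a rough illustration of the suggestion above (not the actual patch), swapping FileWriter for an OutputStreamWriter with an explicit charset could look like the sketch below. The names EncodedWrite, writeWithEncoding and readWithEncoding are made up for this example, and the encoding string is assumed to come from wherever the project settings are read.

```scala
import java.io.{File, FileOutputStream, OutputStreamWriter, Writer}
import java.nio.charset.Charset
import java.nio.file.Files

object EncodedWrite {
  // Hypothetical helper: write `text` to `file` with an explicit charset.
  // new FileWriter(file) would silently use the platform default instead.
  def writeWithEncoding(file: File, text: String, encoding: String): Unit = {
    val writer: Writer =
      new OutputStreamWriter(new FileOutputStream(file), Charset.forName(encoding))
    try writer.write(text)
    finally writer.close()
  }

  // Read the file back with the same charset to check the round trip.
  def readWithEncoding(file: File, encoding: String): String =
    new String(Files.readAllBytes(file.toPath), Charset.forName(encoding))
}
```

Writing and reading with the same named charset keeps characters such as åäö intact regardless of the platform default.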

@megri
megri commented Apr 12, 2013

I think I may have a fix for this but I can't figure out how to build the project for testing..

@dotta
Eclipse Scala IDE member
dotta commented Apr 12, 2013

> I think I may have a fix for this

Great!

> but I can't figure out how to build the project for testing..

I'll have a look and update the documentation if needed. I'll be back with more info soon.

@dotta
Eclipse Scala IDE member
dotta commented Apr 12, 2013

@megri So, it turns out we need to do some clean-up in the POM and the documentation (expect to have some news about this early next week).

For the moment, run the following command for building the worksheet:

mvn -P 2.9.x -P nightly-scala-ide-scala-2.9 -P indigo clean install

If that compiles fine, feel free to issue a PR.

@dragos
Eclipse Scala IDE member
dragos commented Apr 16, 2013

See PR #123, it should simplify the build.

@megri
megri commented Apr 16, 2013

Cool, I'll have a fix ready soon. It will be a bit dirty due to the state of ResidentCompiler.scala, but it should take the charset into account on a per-file basis.

@megri
megri commented Apr 17, 2013

I've come across a problem that needs discussion before continuing on.

My initial intention was to allow the worksheet to be compiled using whatever encoding the worksheet FILE was using, as opposed to the project's. This makes sense if you're working in a cross-platform/multi-developer environment where file encodings may vary.

The problem is that the scala.tools.nsc.Global compiler locks down the encoding it's going to use after initialization. As the settings object is mutable, this had me confused at first until I looked at line 315 in the source.

I see a couple of ways to proceed:
1. resolve project encoding at startup, ignore file level encodings;
2. replace the compiler with a new instance when the encoding of the source and the compiler mismatch;
3. mutate the compiler instance's internal SourceReader by reflecting the crap out of it; or
4. lazily launch and cache a new compiler instance for each encoding encountered

What do you think?
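To make option 4 concrete, here is a minimal sketch of a lazily filled, per-encoding cache. StubCompiler is a stand-in invented for this example and is not the scala.tools.nsc API; a real instance would be far more expensive to create and keep alive.

```scala
import java.nio.charset.Charset
import scala.collection.mutable

// Stand-in for a real compiler instance (assumption, not the nsc API).
final class StubCompiler(val encoding: Charset)

object CompilerCache {
  private val cache = mutable.Map.empty[Charset, StubCompiler]

  // Counts how many instances were actually created.
  var created = 0

  // Return the cached instance for this encoding, creating it on first use.
  def forEncoding(name: String): StubCompiler = synchronized {
    val charset = Charset.forName(name) // charset names are case-insensitive
    cache.getOrElseUpdate(charset, { created += 1; new StubCompiler(charset) })
  }
}
```

Each distinct encoding costs one instance for the lifetime of the cache, which is the trade-off between start-up time (option 2) and memory.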

@dotta
Eclipse Scala IDE member
dotta commented Apr 18, 2013

> I've come across a problem that needs discussion before continuing on.

> My initial intention was to allow the worksheet to be compiled using whatever encoding the worksheet FILE was using, as opposed to project. This makes sense if you're working in a cross-platform/developer environment where file encodings may vary.

> The problem is that the scala.tools.nsc.Global compiler locks down the encoding it's going to use after initialization. As the settings object is mutable this had me confused at first until I looked at line 315 in the source.

Good catch!

> I see a couple of ways to proceed:
> 1. resolve project encoding at startup, ignore file level encodings;

At the moment, this is the option I would go with; it's pragmatic, and hopefully the simplest one to implement. It would be convenient if an error were reported to the user when the opened worksheet file doesn't use the project's encoding.

> 2. replace the compiler with a new instance when the encoding of the source and the compiler mismatch;

This would work. However, starting up a new compiler takes time. If the user has many worksheet sources that use different encodings, I'm afraid they will get frustrated by the waiting time and end up blaming the tool.

> 3. mutate the compiler instance's internal SourceReader by reflecting the crap out of it; or

That sounds scary :) I'm wondering if it would actually work. But we should probably try to stay away from black-magic wizardry.

> 4. lazily launch and cache a new compiler instance for each encoding encountered

This would also work, but each compiler instance eats up quite some memory (the actual size depends on your classpath). Both Eclipse and Scala IDE are already eating up quite some memory by themselves, so I'd rather not create and cache a new compiler instance per worksheet/encoding :)

> What do you think?

@dragos
Eclipse Scala IDE member
dragos commented Apr 22, 2013

@megri, the problem is much simpler, I think. Here's what I see:

  • the worksheet encoding is UTF-8
  • the presentation compiler correctly uses UTF-8 to read the worksheet source
  • the "resident" compiler (the one that compiles the worksheet) correctly tries to read the instrumented worksheet code using UTF-8, but fails.

It looks like the instrumented source (under .worksheet/src) is saved using the default encoding. So, it seems to me, the only encoding that's missing is for writing the instrumented code to disk. That should be all fixable in runtime.Configuration.scala:74
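The mismatch described here can be reproduced outside Eclipse. The sketch below is an illustration only: ISO-8859-1 stands in for "the platform default", and EncodingMismatch.strictDecode is a name invented for this example. It decodes bytes strictly, so malformed input is reported instead of being replaced silently.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CharacterCodingException, Charset, CodingErrorAction}

object EncodingMismatch {
  // Strictly decode `bytes` with the named charset.
  // Returns None when the bytes are not valid in that charset.
  def strictDecode(bytes: Array[Byte], encoding: String): Option[String] = {
    val decoder = Charset.forName(encoding).newDecoder()
      .onMalformedInput(CodingErrorAction.REPORT)
      .onUnmappableCharacter(CodingErrorAction.REPORT)
    try Some(decoder.decode(ByteBuffer.wrap(bytes)).toString)
    catch { case _: CharacterCodingException => None }
  }
}
```

Text containing åäö that is written as ISO-8859-1 fails to decode as UTF-8, while the same text written as UTF-8 decodes cleanly, which matches the "IO error while decoding ... with UTF-8" symptom in the title.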

Hope this helps.

@megri
megri commented Apr 22, 2013

What if the platform encoding isn't UTF-8? Windows uses Cp1252 by default; creating a new project will make the files use Cp1252, so enforcing UTF-8 won't work. The reason it works as it is right now is that both nsc.Global and Eclipse default to the platform encoding. Running a "Linux" worksheet under Windows won't work, as the file and platform charsets will mismatch.

Please correct me if I'm wrong :)

@dragos
Eclipse Scala IDE member
dragos commented Apr 22, 2013

@megri, you are half-right. :)

The reason it works right now is that the presentation compiler respects the platform encoding. There's no "default", or at least, the default is wrong in 99% of the cases. For instance, on Mac OS the default is MacRoman. The compiler picks it up from Eclipse; see ScalaProject.scala:491.

Right now, there is a mismatch between the encoding used to read the file and the one to write it. That's one bug. The other (could be considered an enhancement), is to allow for per-file encodings. That won't happen very soon because of the Scala compiler, and I think it's probably not very common to have different encodings in the same project. My suggestion is to keep this ticket about the first issue, which seems way more common and annoying.

@megri
megri commented Apr 22, 2013

@dragos there we go then!

@megri megri added a commit to megri/scala-worksheet that referenced this issue Apr 24, 2013
@megri megri Fixes #120 b79a12a
@megri megri added a commit to megri/scala-worksheet that referenced this issue Apr 24, 2013
@megri megri Allow multibyte characters in a worksheet
These changes attempt to fix an issue where a worksheet would be instrumented using the platform's default encoding rather than the project's. The fix collects its encoding by calling getDefaultCharset() on the eclipse project resource.

Be aware that the fix is ignorant to file-level encoding and/or project encoding changes; changing the worksheet's encoding will cause it to fail.

Fix #120
bc7de07
@dotta dotta pushed a commit that closed this issue Apr 24, 2013
@megri megri Allow multibyte characters in a worksheet
bc7de07
@dotta dotta closed this in bc7de07 Apr 24, 2013
@dotta
Eclipse Scala IDE member
dotta commented Apr 26, 2013

Unfortunately, I need to re-open this ticket. See #127 for details.

@dotta dotta reopened this Apr 26, 2013
@dotta dotta added a commit that closed this issue May 1, 2013
@dotta dotta Set file encoding to `UTF-8` for tests execution in Tycho
* Had to change the return type of the Snowman to `AnyRef` for
  the test to pass against both Scala 2.9 and Scala 2.10 (the
  returned name for the `String` type is different in 2.9 and
  2.10)

* Pass `-Dfile.encoding=UTF-8` to Tycho test task.
* Re-enabled encoding test.

Fix #120
e818b54
@dotta dotta closed this in e818b54 May 1, 2013
@nadavwr nadavwr added a commit that referenced this issue May 13, 2013
@dotta dotta Set file encoding to `UTF-8` for tests execution in Tycho
afe5c52