Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Un-special-case unicode escapes #8282

Open
wants to merge 8 commits into
base: 2.13.x
from

Conversation

@martijnhoekstra
Copy link
Contributor

commented Jul 30, 2019

Unicode escapes are handled like any other escape, in any context a normal escape could also work.

That means Unicode escapes get expanded in

  • Char literals
  • String literals
  • s and f interpolations
  • back quoted identifiers

triple quoted strings and raw interpolations don't allow normal escapes, but have allowed unicode escapes, since those were handled earlier, in the scanner. In this PR doing so emits a deprecation warning, unless under -xSource:2.14, where the unicode escapes are simply unprocessed

WIP because

  • Needs clearer tests around deprecation
  • Needs to be sure local partest failures are as local as I expect them to be
  • Needs to consider what to do with -X-no-uescape
  • Needs community build
  • Needs spec updates
  • Fix caret position for warnings to the position of the \
  • Revisit exceptions and wording used
  • Investigate whether the invalid escape errors have regressed in terms of how they're shown (compiler crash-like)

@scala-jenkins scala-jenkins added this to the 2.13.1 milestone Jul 30, 2019

@martijnhoekstra martijnhoekstra force-pushed the martijnhoekstra:lostEscape branch 2 times, most recently from 220fc45 to 8b06af7 Jul 30, 2019

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Aug 2, 2019

Regarding -X-no-uescape, this PR gives it the axe (-X?) in the spirit of scala/scala-dev#430

The setting was passed to JavaByteArrayReader, but not escaping Unicode escapes in java source was never valid in the first place per jls 3.3.

Removing that parameter outright is both source and binary breaking though, and probably not worth the deprecation cycle (though I'm happy to accommodate whatever if greater minds think unlike)

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Aug 2, 2019

Linking scala/bug#3220 which this closes.

@SethTisue could you kick off a community build for this branch?

@martijnhoekstra martijnhoekstra force-pushed the martijnhoekstra:lostEscape branch from 5a76a6f to 0bbdab7 Aug 2, 2019

@SethTisue

This comment has been minimized.

Copy link
Member

commented Aug 3, 2019

@SethTisue

This comment has been minimized.

Copy link
Member

commented Aug 4, 2019

[scala-parser-combinators] [error] scala.StringContext$InvalidEscapeException: invalid escape '\[' not one of [\b, \t, \n, \f, \r, \, \", \', \uxxxx] at index 37 in "([^"\x00-\x1F\x7F\\]|\\[\\'"bfnrt]|\\u[a-fA-F0-9]{4})*". Use \\ for literal \.
[fast-string-interpolator] [info] - should don't compile in case of escaping error *** FAILED *** (11 milliseconds)
[fast-string-interpolator] [info]   "Expected no compiler error, but got the following type error: "invalid escape '\d' not one of [\b, \t, \n, \f, \r, \, \", \', \uxxxx] at index 0 in "\d". Use \\ for literal \.", for code:  fs"\d" " did not contain "invalid escape '\d' not one of [\b, \t, \n, \f, \r, \\, \", \'] at index 0 in "\d". Use \\ for literal \." (FastStringInterpolatorSpec.scala:34)

@martijnhoekstra martijnhoekstra force-pushed the martijnhoekstra:lostEscape branch 4 times, most recently from f5c2800 to dbb8483 Aug 6, 2019

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Aug 6, 2019

@SethTisue could you start another community build? The parser combinators failures are hopefully fixed now (and the other one was spurious)

@SethTisue

This comment has been minimized.

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Aug 7, 2019

I'm seeing a lot of community build stuff going on, but I can't figure out where, if anywhere, this branch is running now. Could you take a look and squeeze one in if you're not too swamped (or the servers) with the build itself?

@SethTisue

This comment has been minimized.

Copy link
Member

commented Aug 7, 2019

oops, I shouldn't have canceled 2439, I got confused as well. (Jenkins doesn't let you attach a description to a run until the run actually starts)

queued: https://scala-ci.typesafe.com/job/scala-2.13.x-integrate-community-build/2449/

@SethTisue

This comment has been minimized.

Copy link
Member

commented Aug 8, 2019

consider that a green run — the handful of failures are unrelated to your changes

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Aug 9, 2019

No longer WIP. Not sure whom to request for review, there is a bit of spec, a bit of parser, a bit of interpolation.

@martijnhoekstra martijnhoekstra changed the title WIP: Un-special-case unicode escapes Un-special-case unicode escapes Aug 9, 2019

@martijnhoekstra martijnhoekstra force-pushed the martijnhoekstra:lostEscape branch 3 times, most recently from 711a638 to cf783ec Aug 9, 2019

@volth

This comment has been minimized.

Copy link

commented Aug 17, 2019

So, s"\u0022$a\u0022" is now valid ?
After deprecation of s"\042$a\042" in 2.13 it is great news.

Then it closes scala/bug#6476 too

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Aug 17, 2019

@martijnhoekstra martijnhoekstra force-pushed the martijnhoekstra:lostEscape branch from ee96400 to a83a517 Aug 20, 2019

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Aug 20, 2019

Rebased for conflicts on mima exceptions.

@volth see a83a517 for a tes of that behaviour -- it works.

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Aug 21, 2019

/rebuild

@martijnhoekstra martijnhoekstra force-pushed the martijnhoekstra:lostEscape branch from a83a517 to 36a5aee Sep 3, 2019

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Sep 3, 2019

rebased again for mima exceptions.

@szeiger

This comment has been minimized.

Copy link
Contributor

commented Sep 3, 2019

What's the state of this in Dotty? I'd almost call this an uncontroversial bug fix that could probably be done without a SIP but we should at least have Dotty on board with this change. Rescheduling for 2.13.2 for now.

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Sep 3, 2019

I didn't port this to dotty, so I would assume there is no state in dotty ATM.

I don't intend on backporting to 2.12 myself, but I suspect it would be welcomed if this is accepted.

@SethTisue

This comment has been minimized.

Copy link
Member

commented Sep 3, 2019

I suggest we leave 2.12.x alone. 2.12 is in maintenance mode, and this is a design change and not an implementation fix

treat unicode escapes as any other escape
except in triple quotes strings and raw interpolations
for those, emit deprecation warnings, but keep the current
behaviour of processing unicode escapes

@martijnhoekstra martijnhoekstra force-pushed the martijnhoekstra:lostEscape branch from 36a5aee to aa42d7d Sep 5, 2019

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Sep 5, 2019

@smarter are you able to answer the question of @szeiger?

@smarter

This comment has been minimized.

Copy link
Contributor

commented Sep 5, 2019

I assume it does whatever scalac was doing so far, I don't personally have the time/energy to port this to Dotty.

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2019

Is porting this to dotty required?

@smarter

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2019

No such requirements have been established for scalac PRs yet, but if it doesn't get ported to Dotty now, there's a very real chance that it will be forgotten about and won't be part of 3.0.

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2019

I don't think I'll be able to port this to Dotty.

@smarter

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2019

FWIW, the Dotty scanner is pretty close to the scalac one, so I don't think a port of the scanner changes would be too hard. The changes to StringContext in the standard library would require a new release of scala-library for Dotty to depend on first, so that's a good argument for getting this PR merged first.

@smarter

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2019

Also, instead of a dotty PR, an issue opened against dotty to make sure we remember to eventually port this change would be fine too.

@SethTisue

This comment has been minimized.

Copy link
Member

commented Sep 8, 2019

@martijnhoekstra perhaps someone reading this like would like to volunteer to port the change to Dotty? seems like a good getting-started project for someone interested in becoming a Dotty contributor

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2019

@smarter

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2019

What crash on incremental compilation ?

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2019

@smarter

This comment has been minimized.

Copy link
Contributor

commented Sep 8, 2019

Ah right, I think the only solution is to switch to WSL, or give up on Windows, or rewrite sbt's classloader handling from scratch.

Perhaps it's easier to wait for 3.0, and then target 3.0.1

I'm not sure this is a change that can go into a minor release.

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2019

@lrytz

This comment has been minimized.

Copy link
Member

commented Sep 10, 2019

@martijnhoekstra I re-opened this PR, I think it should be merged for 2.13.2 (once it's reviewed, and 2.13.1 is released). The question how we manage forward-porting changes from Scala 2 to 3 is orthogonal to the improvements implemented here. The fact that we haven't figured it out yet doesn't block improvements such as this one from happening in Scala 2. We will also not condition contributions to Scala 2 on having a forward-port to Scala 3 available.

@lrytz lrytz reopened this Sep 10, 2019

@martijnhoekstra

This comment has been minimized.

Copy link
Contributor Author

commented Sep 10, 2019

That's a nice surprise.

While trying to port to dotty, I came across some issues that I want to double-check here (does it not accidentally interpret an unicode escape as terminator).

case ((p, o), i) if p != o => i
}.getOrElse(processed.length - 1)

c.warning(lit.pos.withShift(diffindex), "Unicode escapes in raw interpolations are deprecated as of scala 2.13.1, and will be removed in scala 2.14. Use the literal character instead.")

This comment has been minimized.

Copy link
@lrytz

lrytz Sep 10, 2019

Member

You can use currentRun.reporting.deprecationWarning

^
t3220-1.scala:3: error: invalid unicode escape at index 6 of foo \unope that's wrong
val badtriple = """foo \unope that's wrong"""
^

This comment has been minimized.

Copy link
@lrytz

lrytz Sep 10, 2019

Member

caret off?

This comment has been minimized.

Copy link
@martijnhoekstra

martijnhoekstra Sep 10, 2019

Author Contributor

pesky off-by-3 errors

@martijnhoekstra martijnhoekstra force-pushed the martijnhoekstra:lostEscape branch from 9ef34cc to 2fb3f6b Sep 10, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.