Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-printable characters in symbols should be properly escaped #468

Open
armanbilge opened this issue Jun 6, 2022 · 10 comments
Open

Non-printable characters in symbols should be properly escaped #468

armanbilge opened this issue Jun 6, 2022 · 10 comments

Comments

@armanbilge
Copy link
Contributor

Code like this:

    if (buffer.startsWith(Shared.`\r\n`.toArray)) {

https://github.com/http4s/http4s/blob/f9533ec5b499c277f915c7f275be5fdb3af20fac/ember-core/shared/src/main/scala/org/http4s/ember/core/ChunkedEncoding.scala#L123

Generates a scoverage.coverage like this:

30
/workspace/http4s/ember-core/shared/src/main/scala/org/http4s/ember/core/ChunkedEncoding.scala
org.http4s.ember.core
ChunkedEncoding
Object
org.http4s.ember.core.ChunkedEncoding
go
2366
2372
72
org.http4s.ember.core.Shared.

Select
false
0
false
Shared.



Which makes great sadness like this:

[error] java.lang.IllegalArgumentException: For input string: "Select"
[error]         at scala.collection.immutable.StringLike.parseBoolean(StringLike.scala:327)
[error]         at scala.collection.immutable.StringLike.toBoolean(StringLike.scala:289)
[error]         at scala.collection.immutable.StringLike.toBoolean$(StringLike.scala:289)
[error]         at scala.collection.immutable.StringOps.toBoolean(StringOps.scala:33)
[error]         at scoverage.Serializer$.toStatement$1(Serializer.scala:115)
[error]         at scoverage.Serializer$.deserialize(Serializer.scala:144)
[error]         at scoverage.Serializer$.deserialize(Serializer.scala:89)
[error]         at scoverage.ScoverageSbtPlugin$.loadCoverage(ScoverageSbtPlugin.scala:318)
[error]         at scoverage.ScoverageSbtPlugin$.$anonfun$coverageReport0$1(ScoverageSbtPlugin.scala:148)
[error]         at scoverage.ScoverageSbtPlugin$.$anonfun$coverageReport0$1$adapted(ScoverageSbtPlugin.scala:139)
@armanbilge
Copy link
Contributor Author

I can make a PR to fix this. Is this data format specced anywhere?

@armanbilge armanbilge changed the title Newlines in symbols should be properly escaped Backslashes in symbols should be properly escaped Jun 6, 2022
@armanbilge
Copy link
Contributor Author

Also, I can't really grasp whether this is an issue with newlines in particular, or backslashes in general 🤔

@ckipp01
Copy link
Member

ckipp01 commented Jun 6, 2022

I can make a PR to fix this. Is this data format specced anywhere?

It's versioned and sort of defined in here, since it needs to both deserialize 2 stuff that this plugin created and also 3 that was generated by the Scala 3 compiler.

def writeHeader(writer: Writer): Unit = {
writer.write(
s"""# Coverage data, format version: ${Constants.CoverageDataFormatVersion}
|# Statement data:
|# - id
|# - source path
|# - package name
|# - class name
|# - class type (Class, Object or Trait)
|# - full class name
|# - method name
|# - start offset
|# - end offset
|# - line number
|# - symbol name
|# - tree name
|# - is branch
|# - invocations count
|# - is ignored
|# - description (can be multi-line)
|# '\f' sign
|# ------------------------------------------
|""".stripMargin
)
}

@armanbilge
Copy link
Contributor Author

since it needs to both deserialize 2 stuff that this plugin created and also 3 that was generated by the Scala 3 compiler.

Hum, I'm confused about this. It seems to only be capable of deserializing v3, if I understanding correctly.

val headerFirstLine = lines.next()
require(
headerFirstLine == s"# Coverage data, format version: ${Constants.CoverageDataFormatVersion}",
"Wrong file format"
)

@ckipp01
Copy link
Member

ckipp01 commented Jun 6, 2022

Hum, I'm confused about this. It seems to only be capable of deserializing v3, if I understanding correctly.

Sorry, I wasn't clear. Yes, only V3. By 2, I mean Scala 2 coverage and Scala 3 coverage.

@armanbilge armanbilge changed the title Backslashes in symbols should be properly escaped Non-printable characters in symbols should be properly escaped Jun 6, 2022
armanbilge added a commit to http4s/http4s that referenced this issue Jun 6, 2022
@TheElectronWill
Copy link

TheElectronWill commented Jun 7, 2022

Also, I can't really grasp whether this is an issue with newlines in particular, or backslashes in general 🤔

The parser excepts the name to be on one line, because it uses new lines as a value separator, so I'd say that newlines are the biggest problem. Any character should be fine, as long as it doesn't break this expectation. Are there other characters that mess up lines? I know that unicode can be weird sometimes 😄

@armanbilge
Copy link
Contributor Author

Thanks, I think you are right! I was wondering whether if you define a method:

def `\n` = ???

whether the symbol should be a "newline" instead of \\n backslash-n. But I guess the compiler determines that.

Are there other characters that mess up lines? I know that unicode can be weird sometimes

Not sure tbh. I think JSON has a similar restriction and these are the characters they escape.

https://github.com/circe/circe/blob/92ab397a45f08c685bcb41527321b18db4e48c68/modules/core/shared/src/main/scala/io/circe/Printer.scala#L322-L331

@TheElectronWill
Copy link

TheElectronWill commented Jun 7, 2022

Thanks, I think you are right! I was wondering whether if you define a method:

def `\n` = ???

whether the symbol should be a "newline" instead of \\n backslash-n. But I guess the compiler determines that.

A quick test with scalac (dotty version) shows that the compiler keeps \n as a newline (it even shows up as a real newline in the trees with -Vprint:all 😄) and escape it (fortunately!) when producing bytecode. The def gets compiled to:

public Nothing$ $u000A() {
    return Predef$.MODULE$.$qmark$qmark$qmark();
}

@TheElectronWill
Copy link

edit: for scoverage, we should escape it to \\n (i.e. write two chars: \ and n), it's simpler. Of course it means that all \ must now be escaped. All like JSON

@armanbilge
Copy link
Contributor Author

Yes, I think we should follow JSON here 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants