Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

org.apache.avro.SchemaParseException: Illegal character in: return$ #86

Closed
mcenkar opened this issue Feb 15, 2022 · 7 comments
Closed

Comments

@mcenkar
Copy link
Contributor

mcenkar commented Feb 15, 2022

Hi,

Given:

@namespace( "foo.bar" )
protocol AProtocol {
  record A {
    union { null, int } id = null;
    union { null, string } return = null;
  }
}
  • Using: addSbtPlugin("com.julianpeeters" % "sbt-avrohugger" % "2.0.0-RC17")

It gets compiled to (correct and working):

case class A(id: Option[Int] = None, `return`: Option[String] = None)
  • Using: addSbtPlugin("com.julianpeeters" % "sbt-avrohugger" % "2.0.0-RC18")

It gets compiled to:

case class A(id: Option[Int] = None, return$: Option[String] = None)

In my case I eventually get following exception:

vulcan.AvroException$$anon$1: org.apache.avro.SchemaParseException: Illegal character in: return$
  at vulcan.AvroException$.apply(AvroException.scala:18)
  at vulcan.AvroError$$anon$3.throwable(AvroError.scala:282)
  at fs2.kafka.vulcan.AvroDeserializer$.$anonfun$using$extension$3(AvroDeserializer.scala:55)

Caused by: org.apache.avro.SchemaParseException: Illegal character in: return$
  at org.apache.avro.Schema.validateName(Schema.java:1566)
  at org.apache.avro.Schema.access$400(Schema.java:91)
  at org.apache.avro.Schema$Field.<init>(Schema.java:546)
  at org.apache.avro.Schema$Field.<init>(Schema.java:585)

Only workaround I've found is sticking to RC17 version.

@julianpeeters
Copy link
Owner

Thanks for the report @mcenkar. Unfortunately, my time-frame for a fix is very far out. PR's welcome of course, or else, if possible, you might try renaming the field to avoid Java keywords.

I can't recall why avrohugger adopted avro's field mangling strategy (it seems likely that only Specific format would need mangling, but I'm not certain): https://github.com/apache/avro/blob/9a378ffab2b49d95edcd2dba36691096301b64ee/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L97

mangling: https://github.com/julianpeeters/avrohugger/blob/197f8b51b7ab01595952483021199a52ad8b77fa/avrohugger-core/src/main/scala/format/FieldRenamer.scala#L5

related: #85

@mcenkar
Copy link
Contributor Author

mcenkar commented Feb 15, 2022

Hi,

so had a quick look and this file:

@namespace( "foo.bar" )
protocol AProtocol {
  record A {
    union { null, string } return = null;
    union { null, int } abstract = null;
    union { null, int } assert = null;
    union { null, int } break = null;
    union { null, int } byte = null;
    union { null, int } case = null;
    union { null, int } catch = null;
    union { null, int } char = null;
    union { null, int } class = null;
    union { null, int } const = null;
    union { null, int } continue = null;
    union { null, int } default = null;
    union { null, int } do = null;
    union { null, int } else = null;
    union { null, int } extends = null;
    union { null, int } final = null;
    union { null, int } finally = null;
    union { null, int } for = null;
    union { null, int } goto = null;
    union { null, int } if = null;
    union { null, int } implements = null;
    union { null, int } instanceof = null;
    union { null, int } interface = null;
    union { null, int } native = null;
    union { null, int } new = null;
    union { null, int } package = null;
    union { null, int } private = null;
    union { null, int } protected = null;
    union { null, int } public = null;
    union { null, int } short = null;
    union { null, int } static = null;
    union { null, int } strictfp = null;
    union { null, int } super = null;
    union { null, int } switch = null;
    union { null, int } synchronized = null;
    union { null, int } this = null;
    union { null, int } throw = null;
    union { null, int } transient = null;
    union { null, int } try = null;
    union { null, int } volatile = null;
    union { null, int } while = null;
  }
}

Compiles just fine using java -jar avro-tools-1.11.0.jar idl src/main/avro/a.avdl ./lol.avpr to:

{
  "protocol" : "AProtocol",
  "namespace" : "foo.bar",
  "types" : [ {
    "type" : "record",
    "name" : "A",
    "fields" : [ {
      "name" : "return",
      "type" : [ "null", "string" ],
      "default" : null
    }, {
      "name" : "abstract",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "assert",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "break",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "byte",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "case",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "catch",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "char",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "class",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "const",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "continue",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "default",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "do",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "else",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "extends",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "final",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "finally",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "for",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "goto",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "if",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "implements",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "instanceof",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "interface",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "native",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "new",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "package",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "private",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "protected",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "public",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "short",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "static",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "strictfp",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "super",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "switch",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "synchronized",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "this",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "throw",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "transient",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "try",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "volatile",
      "type" : [ "null", "int" ],
      "default" : null
    }, {
      "name" : "while",
      "type" : [ "null", "int" ],
      "default" : null
    } ]
  } ],
  "messages" : { }
}

Those properties are coming from your RESERVED_WORDS. Only some of them don't compile:

boolean
double
enum
false
float
import
int
long
null
throws
true
void

For example boolean:

Exception in thread "main" org.apache.avro.compiler.idl.ParseException: Encountered " "boolean" "boolean "" at line 45, column 25.
Was expecting one of:
    <IDENTIFIER> ...
    "@" ...
    "`" ...
    
        at org.apache.avro.compiler.idl.Idl.generateParseException(Idl.java:1729)

What's your take on:

  • working ones (return, abstract, assert, break, byte) we change to have backtick instead of $ (return$ -> `return`)
  • remaining (boolean, double, enum, false) stay with $?

@mcenkar
Copy link
Contributor Author

mcenkar commented Feb 15, 2022

@mcenkar
Copy link
Contributor Author

mcenkar commented Feb 15, 2022

Summary:

  • names like boolean, or throws you don't really have to handle, it's not valid avro anyway,
  • `return` which I need works better with ticks, where $ breaks it,
  • do you remember any case where $ is needed? In avro it's for generating valid Java:
public class A extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord {
  private java.lang.CharSequence return$;
  private java.lang.Integer abstract$;
  private java.lang.Integer assert$;
  private java.lang.Integer break$;

But in scala we can always do backticks. So probably best way is to do backticks around scala keywords (different set than avro), and just drop $ mangle?

@julianpeeters
Copy link
Owner

names like boolean, or throws you don't really have to handle, it's not valid avro anyway,
return which I need works better with ticks, where $ breaks it,
do you remember any case where $ is needed?

IIRC, the $ was needed for java avro interop. I wonder what happens at runtime? I suspect org.apache.avro.specific might require the $ at runtime? And it also looks like vulcan is using org.apache.avro.generic under the hood.

My other concern is breaking changes.

So I might end up favoring the addition of some configuration, maybe for Standard format at least.

@mcenkar
Copy link
Contributor Author

mcenkar commented Feb 16, 2022

Hi,

I had a quick look and I think we don't need to do any config for this. It would be breaking change, but it would break a bug that's there right now. I added PR here, right now it's failing.

The only places where I found we may need $ is scavro interop, but that's actually using different mangling strategy: list of keywords, and here calling avro-compiler.

So in this PR instead I switched to mangling by backticks, and doing that only on scala keywords, as we don't need to do that on Java keywords.

I didn't test thorougly scavro part but everything still looks OK, and it fixes my problem.

(edit: Also #85 would be fixed)

Happy to hear your feedback.

Cheers

@julianpeeters
Copy link
Owner

Awesome, thanks very much for the eyes and the fix. Released as avrohugger 1.0.0 and sbt-avrohugger 2.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants