Improve Parquet documentation for AvroCompat, @doc annotation #885
site/src/main/paradox/parquet.md
However, the parquet-avro API encodes array types differently: as a nested array inside a required group.
`` ```scala mdoc ``
After running sbt site/mdoc, this evaluates to:
import org.apache.avro.Schema
val avroSchema = new Schema.Parser().parse("{\"type\":\"record\",\"name\":\"MyRecord\",\"fields\":[{\"name\": \"listField\", \"type\": {\"type\": \"array\", \"items\": \"string\"}}]}")
// avroSchema: Schema = {"type":"record","name":"MyRecord","fields":[{"name":"listField","type":{"type":"array","items":"string"}}]}
import org.apache.parquet.avro.AvroSchemaConverter
new AvroSchemaConverter().convert(avroSchema)
// res4: org.apache.parquet.schema.MessageType = message MyRecord {
//   required group listField (LIST) {
//     repeated binary array (STRING);
//   }
// }
//
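The encoding described in the diff can also be checked programmatically. The following is a minimal Scala sketch, assuming it is run with scala-cli and that parquet-avro 1.13.1 and hadoop-client 3.3.6 are on the classpath (these versions are illustrative and not taken from this PR); the object name `AvroListEncoding` is hypothetical. It converts the same Avro schema with parquet-avro's default `AvroSchemaConverter` and inspects the result: `listField` should be a required group annotated as `LIST` whose single child is a repeated field named `array`.

```scala
//> using scala 2.13
//> using dep org.apache.parquet:parquet-avro:1.13.1
//> using dep org.apache.hadoop:hadoop-client:3.3.6

import org.apache.avro.Schema
import org.apache.parquet.avro.AvroSchemaConverter
import org.apache.parquet.schema.{LogicalTypeAnnotation, MessageType, Type}

// Hypothetical object name, used only for illustration.
object AvroListEncoding {
  // Same Avro schema as in the mdoc block: one field holding an array of strings.
  val avroSchema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"MyRecord","fields":[
      |  {"name":"listField","type":{"type":"array","items":"string"}}
      |]}""".stripMargin)

  // parquet-avro's default converter (no Hadoop Configuration passed in).
  val parquetSchema: MessageType = new AvroSchemaConverter().convert(avroSchema)

  // The top-level field is a group wrapping the list...
  val listGroup = parquetSchema.getType("listField").asGroupType()
  // ...and its only child is the repeated element field.
  val element: Type = listGroup.getType(0)

  def main(args: Array[String]): Unit = {
    println(parquetSchema)
    println(s"group repetition: ${listGroup.getRepetition}")
    println(s"group is LIST: ${listGroup.getLogicalTypeAnnotation == LogicalTypeAnnotation.listType()}")
    println(s"element: ${element.getRepetition} ${element.getName}")
  }
}
```

Inspecting the `Type` objects rather than comparing strings keeps the check independent of how `MessageType.toString` formats whitespace.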
site/src/main/paradox/parquet.md
writer.close()
ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration())).getFileMetaData
After running sbt site/mdoc, this block evaluates to:
import magnolify.parquet._
import magnolify.parquet.ParquetArray.AvroCompat._
import magnolify.shared._
@doc("Top level annotation")
case class MyRecord(@doc("field annotation") listField: List[Int])
val writer = ParquetType[MyRecord]
.writeBuilder(HadoopOutputFile.fromPath(path, new Configuration()))
.build()
// writer: org.apache.parquet.hadoop.ParquetWriter[MyRecord] = org.apache.parquet.hadoop.ParquetWriter@432302e5
writer.write(MyRecord(List(1,2,3)))
writer.close()
ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration())).getFileMetaData
// res12: org.apache.parquet.hadoop.metadata.FileMetaData = FileMetaData{schema: message repl.MdocSession.MdocApp9.MyRecord {
//   required group listField (LIST) {
//     repeated int32 array (INTEGER(32,true));
//   }
// }
// , metadata: {writer.model.name=magnolify, parquet.avro.schema={"type":"record","name":"MyRecord","namespace":"repl.MdocSession.MdocApp9","doc":"Top level annotation","fields":[{"name":"listField","type":{"type":"array","items":"int"},"doc":"field annotation"}]}}}
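The `@doc` strings do not appear in the Parquet message type itself; in the output they are carried in the Avro schema JSON stored under the `parquet.avro.schema` key of the file's key-value metadata. As a minimal sketch (assuming scala-cli and Avro 1.11.3; the object `AvroDocMetadata` and its method `docs` are hypothetical), the snippet below parses that JSON, copied verbatim from the output, and recovers the record-level and field-level docs. For a real file, the same string can be read with `getFileMetaData.getKeyValueMetaData.get("parquet.avro.schema")`.

```scala
//> using scala 2.13
//> using dep org.apache.avro:avro:1.11.3

import org.apache.avro.Schema
import scala.jdk.CollectionConverters._

// Hypothetical object name, used only for illustration.
object AvroDocMetadata {
  // Value of the `parquet.avro.schema` metadata key, copied from the FileMetaData output.
  val avroSchemaJson: String =
    """{"type":"record","name":"MyRecord","namespace":"repl.MdocSession.MdocApp9","doc":"Top level annotation","fields":[{"name":"listField","type":{"type":"array","items":"int"},"doc":"field annotation"}]}"""

  // Parse the embedded Avro schema and return (record doc, field name -> field doc).
  def docs(json: String): (String, Map[String, String]) = {
    val schema = new Schema.Parser().parse(json)
    val fieldDocs = schema.getFields.asScala.map(f => f.name -> f.doc).toMap
    (schema.getDoc, fieldDocs)
  }

  def main(args: Array[String]): Unit = {
    val (recordDoc, fieldDocs) = docs(avroSchemaJson)
    println(s"record doc: $recordDoc")
    fieldDocs.foreach { case (name, doc) => println(s"$name: $doc") }
  }
}
```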
5fd732c to 170abc2
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #885   +/-   ##
=======================================
  Coverage   95.17%   95.17%
=======================================
  Files          51       51
  Lines        1825     1825
  Branches      157      157
=======================================
  Hits         1737     1737
  Misses         88       88
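The Codecov numbers are internally consistent: with no partial lines listed, `Lines` equals `Hits + Misses` (1737 + 88 = 1825), and coverage is `Hits / Lines`. The hypothetical Scala check below (object `CoverageCheck` is illustrative) assumes Codecov's default settings of two-decimal precision with rounding down; 1737 / 1825 is about 95.178%, which truncates to the reported 95.17%.

```scala
// Hypothetical object name, used only for illustration.
object CoverageCheck {
  // Values from the Codecov table (identical on main and #885).
  val hits = 1737
  val misses = 88
  val partials = 0 // assumption: the table lists no "Partials" row

  val lines: Int = hits + misses + partials

  // Coverage percentage truncated (not rounded) to two decimals.
  // Assumption: mirrors Codecov's default `precision: 2` and `round: down`.
  def coveragePercent(hits: Int, lines: Int): BigDecimal =
    (BigDecimal(hits) * 100 / BigDecimal(lines))
      .setScale(2, BigDecimal.RoundingMode.DOWN)

  def main(args: Array[String]): Unit = {
    println(s"lines = $lines")
    println(s"coverage = ${coveragePercent(hits, lines)}%")
  }
}
```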
No description provided.