
Improve Parquet documentation for AvroCompat, @doc annotation #885

Merged — 4 commits merged from avro-compat-doc into main on Jan 12, 2024

Conversation

clairemcginty (Contributor)

No description provided.


However, the parquet-avro API encodes array types differently: as a nested array inside a required group.

clairemcginty (Contributor, Author) commented:

After running `sbt site/mdoc`, this evaluates to:

```scala
import org.apache.avro.Schema
val avroSchema = new Schema.Parser().parse("{\"type\":\"record\",\"name\":\"MyRecord\",\"fields\":[{\"name\": \"listField\", \"type\": {\"type\": \"array\", \"items\": \"string\"}}]}")
// avroSchema: Schema = {"type":"record","name":"MyRecord","fields":[{"name":"listField","type":{"type":"array","items":"string"}}]}

import org.apache.parquet.avro.AvroSchemaConverter
new AvroSchemaConverter().convert(avroSchema)
// res4: org.apache.parquet.schema.MessageType = message MyRecord {
//   required group listField (LIST) {
//     repeated binary array (STRING);
//   }
// }
```
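For readability, the Avro schema string parsed above corresponds to this JSON document (reformatted only; content identical):

```json
{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    {
      "name": "listField",
      "type": { "type": "array", "items": "string" }
    }
  ]
}
```

The nested `required group listField (LIST) { repeated ... array }` structure in the converted Parquet message is parquet-avro's two-level encoding of this single `array` field.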

```scala
writer.close()

ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration())).getFileMetaData
```
clairemcginty (Contributor, Author) commented:

After running `sbt site/mdoc`, this block evaluates to:

```scala
import magnolify.parquet._
import magnolify.parquet.ParquetArray.AvroCompat._
import magnolify.shared._

@doc("Top level annotation")
case class MyRecord(@doc("field annotation") listField: List[Int])

val writer = ParquetType[MyRecord]
  .writeBuilder(HadoopOutputFile.fromPath(path, new Configuration()))
  .build()
// writer: org.apache.parquet.hadoop.ParquetWriter[MyRecord] = org.apache.parquet.hadoop.ParquetWriter@432302e5
writer.write(MyRecord(List(1, 2, 3)))
writer.close()

ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration())).getFileMetaData
// res12: org.apache.parquet.hadoop.metadata.FileMetaData = FileMetaData{schema: message repl.MdocSession.MdocApp9.MyRecord {
//   required group listField (LIST) {
//     repeated int32 array (INTEGER(32,true));
//   }
// }
// , metadata: {writer.model.name=magnolify, parquet.avro.schema={"type":"record","name":"MyRecord","namespace":"repl.MdocSession.MdocApp9","doc":"Top level annotation","fields":[{"name":"listField","type":{"type":"array","items":"int"},"doc":"field annotation"}]}}}
```
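Pretty-printed, the `parquet.avro.schema` value embedded in the file metadata above shows both `@doc` annotations carried through as Avro `doc` fields (reformatted only; content identical):

```json
{
  "type": "record",
  "name": "MyRecord",
  "namespace": "repl.MdocSession.MdocApp9",
  "doc": "Top level annotation",
  "fields": [
    {
      "name": "listField",
      "type": { "type": "array", "items": "int" },
      "doc": "field annotation"
    }
  ]
}
```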

codecov bot commented on Jan 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (e57d06d) 95.17% vs. head (ac2f81c) 95.17%.

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main     #885   +/-   ##
=======================================
  Coverage   95.17%   95.17%
=======================================
  Files          51       51
  Lines        1825     1825
  Branches      157      157
=======================================
  Hits         1737     1737
  Misses         88       88
```


clairemcginty merged commit 8153389 into main on Jan 12, 2024 — 13 checks passed.

clairemcginty deleted the avro-compat-doc branch on January 12, 2024.