[question] is there a clever way to register writers for lots of case classes #299

Closed
normana400 opened this issue May 4, 2023 · 1 comment

Comments


normana400 commented May 4, 2023

It seems that the writerOf functionality requires each writer to be explicitly typed to the case class it writes.

Let's say I have 50+ different case classes that I want Parquet writers for. It seems heavy to hand-code a Parquet writer for each case class and then update that logic every time a new case class is added.

Example:

case class Alpha
case class Beta
case class Gamma
case class Delta
... every case class in the alphabet 
case class Omega


def writerOf[T <: Product](data: T): ParquetWriter[T] = {
  data match {
    case cc: Alpha => ParquetWriter.of[Alpha]
    case cc: Beta  => ParquetWriter.of[Beta]
    case cc: Gamma => ParquetWriter.of[Gamma]
    case cc: Delta => ParquetWriter.of[Delta]
    // ... every case class A-O
    case cc: Omega => ParquetWriter.of[Omega]
    case _ => throw new RuntimeException("sorry no writer for you!")
  }
}

Is there a way to obtain a Parquet writer for a case class without manually stitching together code like the above to register each concrete case class that needs a writer?

@mjakubowski84
Owner

Each time you write a Parquet file you need to provide a schema for this file. Now, it really depends on what you want to do. If you want to write each case class to a separate normalized file/directory then, sorry, you have to provide a dedicated schema for each case class. If that is not the case, and you want to dump all data into a single file/directory then you can have a single generic schema.

If maintaining a huge class hierarchy is a problem for you, then you can look into generic records.

Please read the documentation: https://mjakubowski84.github.io/parquet4s/docs/records_and_schema/
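
As a rough illustration of the "dedicated schema for each case class" point, the match statement in the question can usually be avoided by letting the compiler supply the schema and encoder for each type. The sketch below assumes parquet4s 2.x, where `ParquetWriter.of[T]` takes implicit `ParquetRecordEncoder[T]` and `ParquetSchemaResolver[T]` instances and returns a builder; check this against the version you use. The fields of `Alpha` and `Beta`, the `WriterSketch` object, and the `writeAll` helper are hypothetical names invented for this example.

```scala
import com.github.mjakubowski84.parquet4s.{ParquetRecordEncoder, ParquetSchemaResolver, ParquetWriter, Path}

// Hypothetical case classes; the fields are made up for illustration.
case class Alpha(id: Int, name: String)
case class Beta(code: String, value: Double)

object WriterSketch {
  // The context bounds ask the compiler to find (or derive) the encoder and
  // schema resolver for T at each call site, so no per-class match is needed.
  // Assumption: parquet4s 2.x signatures for ParquetWriter.of and writeAndClose.
  def writeAll[T: ParquetRecordEncoder: ParquetSchemaResolver](path: Path, data: Iterable[T]): Unit =
    ParquetWriter.of[T].writeAndClose(path, data)
}
```

Each call site still fixes a concrete type and therefore a concrete schema, for example `WriterSketch.writeAll(Path("/tmp/alpha.parquet"), Seq(Alpha(1, "a")))`. This only works when `T` is known statically. If the type is only known at runtime, as with `data: T` where `T <: Product` in the question, the compiler cannot resolve the implicits, and generic records are the alternative.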
