
[SUPPORT] How to store schema in schema registry for capturing nested arrays and JSONs with column names for Hudi table onboarding from Mongo data source? #253

torvalds-dev-testbot bot opened this issue Aug 22, 2024 · 5 comments

Tips before filing an issue

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

Can someone suggest how to store the schema in a schema registry so that it captures all complex nested arrays and JSON objects with their column names, for Hudi table onboarding when the source is a Mongo data source (e.g., when opting for data with a static schema)?

To Reproduce

Steps to reproduce the behavior:
1.
2.
3.
4.

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version :

  • Spark version :

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS..) :

  • Running on Docker? (yes/no) :

Additional context

Add any other context about the problem here.

Stacktrace

Add the stacktrace of the error.

torvalds-dev-testbot (author) commented:

Hello! 👋 Please wait while I generate the follow-up questions; this usually takes about 30 seconds.

torvalds-dev-testbot (author) commented:

Based on the provided documentation snippets, to store the schema in a schema registry so that it captures all complex nested arrays and JSON objects with column names when onboarding a Hudi table from a Mongo data source with static-schema data, you can follow these steps:

  1. Enable schema reconciliation using the config hoodie.datasource.write.reconcile.schema (used in the Spark write sketch further below).
  2. Configure the schema-evolution-on-write scenarios to support schema changes such as adding nullable fields or promoting field datatypes.
  3. Use the Hudi utilities to ingest data from Kafka with the Schema Registry and Confluent Kafka set up; a sketch of a nested Avro schema you could register follows this list.
  4. Use the AWS Glue Data Catalog with Apache Hudi to register tables and interact with data stored in S3.
  5. Use Spark to write data to a Hudi table with specific options such as table type, record key field, table name, partitioning, and more (see the second sketch below).
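Since the question centers on naming every nested column, here is a minimal sketch in Scala of the kind of Avro schema you could register for a Mongo-sourced topic. The field names (`orderId`, `customer`, `items`, etc.) are hypothetical placeholders for your own collection's fields; nested `record` types give each JSON object's columns explicit names, and an `array` of records covers nested lists:

```scala
import org.apache.avro.Schema

// Hypothetical Avro schema modeling a Mongo document that contains a
// nested JSON object ("customer") and an array of nested records ("items").
// This JSON is what you would register under the topic's subject in the
// schema registry; every nested column gets an explicit name and type.
val avroJson =
  """{
    |  "type": "record",
    |  "name": "order",
    |  "namespace": "com.example",
    |  "fields": [
    |    {"name": "orderId", "type": "string"},
    |    {"name": "customer", "type": {
    |      "type": "record", "name": "customer_rec", "fields": [
    |        {"name": "id", "type": "string"},
    |        {"name": "email", "type": ["null", "string"], "default": null}
    |      ]}},
    |    {"name": "items", "type": {"type": "array", "items": {
    |      "type": "record", "name": "item_rec", "fields": [
    |        {"name": "sku", "type": "string"},
    |        {"name": "qty", "type": "int"}
    |      ]}}}
    |  ]
    |}""".stripMargin

// Parsing locally validates the schema before registering it.
val schema: Schema = new Schema.Parser().parse(avroJson)
```

Declaring optional fields as `["null", ...]` unions with a `null` default keeps later additive changes (step 2 above) backward compatible.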

By following these steps and configurations, you can effectively store the schema in a schema registry to handle complex nested data structures in Hudi tables when onboarding data from a Mongo source.
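To make steps 1 and 5 concrete, here is a minimal Spark (Scala) write sketch, not a definitive setup: the table name, field names, and base path are hypothetical placeholders, and the incoming `DataFrame` is assumed to be already deserialized from the Mongo-sourced stream.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper: write a Mongo-sourced DataFrame to a Hudi table.
def writeToHudi(df: DataFrame): Unit = {
  df.write
    .format("hudi")
    .option("hoodie.table.name", "orders")                           // hypothetical table name
    .option("hoodie.datasource.write.table.type", "COPY_ON_WRITE")   // storage type
    .option("hoodie.datasource.write.recordkey.field", "orderId")    // record key field
    .option("hoodie.datasource.write.precombine.field", "updatedAt") // dedup/ordering field
    .option("hoodie.datasource.write.partitionpath.field", "region") // partitioning
    // Step 1: reconcile incoming schemas with the table schema so newly
    // added nullable fields from Mongo documents are merged in.
    .option("hoodie.datasource.write.reconcile.schema", "true")
    .mode(SaveMode.Append)
    .save("s3://your-bucket/hudi/orders") // hypothetical base path
}
```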


torvalds-dev-testbot (author) commented:

Not satisfied with the answer.
Can you take another look at the answer you generated?

torvalds-dev-testbot (author) commented:

Ticket created successfully. Here is the link to the GitHub issue: #253
