[SUPPORT] Need help in understanding Apache Hudi documentation and CLI usage #219

Open
torvalds-dev-testbot bot opened this issue Jul 11, 2024 · 7 comments

Comments

@torvalds-dev-testbot

Tips before filing an issue

Describe the problem you faced
I need help in understanding the Apache Hudi documentation and CLI usage.

To Reproduce
Steps to reproduce the behavior:
1.
2.
3.
4.

Expected behavior
A clear and concise description of what you expected to happen.

Environment Description

  • Hudi version :
  • Spark version :
  • Hive version :
  • Hadoop version :
  • Storage (HDFS/S3/GCS..) :
  • Running on Docker? (yes/no) :

Additional context
Add any other context about the problem here.

Stacktrace
Add the stacktrace of the error.

Author

Hello! 👋 Please wait while I search the knowledge base for an answer; this usually takes about 30 seconds

Author

To install Hudi, you have a few options based on your environment:

  1. For local setup:

    • Build Hudi from source
    • Launch the Hudi CLI by running ./hudi-cli.sh from the hudi-cli directory
  2. For EMR:

    • Hudi comes pre-installed on EMR versions like emr-5.30.1
    • Pass the necessary Hudi jars and Spark configurations in the spark-submit command
  3. For Docker environment:

    • Use Maven to build Docker images and bring up Hadoop, Hive, and Spark services with Hudi support
  4. For Spark:

    • Use PySpark or Scala Spark Shell with the appropriate Hudi packages to start using Hudi
  5. For AWS Glue:

    • Enable Hudi by setting the --datalake-formats job parameter to hudi and adding the required Spark configurations
    • AWS Glue 4.0 supports Lake Formation permissions for Hudi

Remember to follow the specific setup instructions based on your environment and Hudi version.
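
As a rough illustration of options 4 and 5 above, the sketch below assembles the launch arguments in Python without running anything. The helper names (`hudi_shell_command`, `glue_default_arguments`) are hypothetical, the bundle coordinate pattern and the version values are placeholders to be matched to your own Spark, Scala, and Hudi versions, and the configuration keys reflect my reading of the Hudi quick-start guide and the AWS Glue page listed in the sources, so check them against those pages before relying on them.

```python
import shlex

# Spark settings commonly passed when starting a Spark shell with Hudi
# (assumption: taken from the Hudi quick-start guide; verify for your version).
HUDI_SPARK_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
}


def hudi_shell_command(hudi_version, spark_line="3.4", scala_version="2.12", shell="pyspark"):
    """Build the argv for a Spark shell that pulls in a Hudi bundle.

    Hypothetical helper: the default versions are placeholders, not recommendations.
    """
    bundle = f"org.apache.hudi:hudi-spark{spark_line}-bundle_{scala_version}:{hudi_version}"
    argv = [shell, "--packages", bundle]
    for key, value in HUDI_SPARK_CONF.items():
        argv += ["--conf", f"{key}={value}"]
    return argv


def glue_default_arguments():
    """Job parameters that enable Hudi on AWS Glue.

    Assumption: Glue accepts a single --conf key, so extra settings are
    chained inside its value, as described on the AWS Glue Hudi page.
    """
    return {
        "--datalake-formats": "hudi",
        "--conf": (
            "spark.serializer=org.apache.spark.serializer.KryoSerializer"
            " --conf spark.sql.hive.convertMetastoreParquet=false"
        ),
    }


if __name__ == "__main__":
    # Print a copy-pasteable command line; "0.14.1" is only an example version.
    print(shlex.join(hudi_shell_command("0.14.1")))
    print(glue_default_arguments())
```

Passing `shell="spark-shell"` yields the Scala shell variant of the same command. Building the argument list first and printing it keeps the example safe to run on a machine without Spark installed.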

Sources:

  1. https://hudi.apache.org/docs/cli
  2. https://medium.com/@ritik20023/quickly-start-using-apache-hudi-on-aws-emr-de432c01e488
  3. https://hudi.apache.org/docs/docker_demo
  4. https://datacouch.medium.com/getting-started-with-apache-hudi-711b89c107aa
  5. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html

torvalds.dev is learning and improving. React below to provide feedback!

Author

Ticket created successfully. Here is the link to the GitHub issue: Sheth-G#10

Author

Ticket created successfully. Here is the link to the GitHub issue: #219
