When importing parquet file and CSV, have option to include partitionBy column in frame #8207

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments

Comments

@exalate-issue-sync

Currently, when importing partitioned Parquet or CSV files into H2O, the partitionBy column is not present in the resulting frame.

However, when Spark reads partitioned Parquet, the partition column is included in the resulting Spark DataFrame.

Example:

{code:python}
# Create a Spark DataFrame from partitioned Parquet
df_1 = spark.read.parquet("frame_1.parquet")
df_1.head()
{code}

The Spark DataFrame has 5 columns, including 'RT', the partition column:

{quote}Row(SERIALNO=673102, SPORDER=5, PUMA=100, Row_Number=21342, RT='P'){quote}

{code:python}
# Create an H2O frame from partitioned Parquet
h_frame1 = h2o.import_file("hdfs://mr-0xyz://user/UID/frame_1.parquet")
{code}

The H2O frame has 4 columns (the RT column is missing):

{quote}
SERIALNO  SPORDER  PUMA  Row_Number
84        1        2600  0
154       1        2500  1
{quote}
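
To illustrate where the missing column's values live, here is a minimal sketch that recovers them from file paths. It assumes the data set uses the Hive-style `column=value` directory layout that Spark produces when writing with partitionBy. The helper name `partition_values` and the example file path are hypothetical, and this is not how H2O implements the feature.

```python
from pathlib import PurePosixPath


def partition_values(file_path, partition_by=None):
    """Return {column: value} parsed from Hive-style 'key=value' directories.

    Hypothetical helper for illustration only; values are returned as strings.
    """
    values = {}
    # Only the parent directories are inspected; the file name itself is skipped.
    for segment in PurePosixPath(file_path).parent.parts:
        key, sep, value = segment.partition("=")
        # Keep a segment only if it contains '=' and matches the requested columns.
        if sep and key and (partition_by is None or key in partition_by):
            values[key] = value
    return values


# Assumed layout of one file inside the partitioned data set.
example = "frame_1.parquet/RT=P/part-00000.snappy.parquet"
print(partition_values(example))
```

Because the partition value is stored in the directory name rather than inside the data files, a reader that only parses the files themselves will not see the RT column.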

@exalate-issue-sync
Author

Neema Mashayekhi commented: Example of new feature:

Python:

{code:python}
df = h2o.import_file(
    path=pyunit_utils.locate("smalldata/partitioned/partitioned_arilines/"),
    partition_by=["Year", "IsArrDelayed"],
)
{code}

R:

{code:r}
df <- h2o.importFile(
  path = locate("smalldata/partitioned/partitioned_arilines/"),
  partition_by = c("Year", "IsArrDelayed")
)
{code}

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7430
Assignee: Pavel Pscheidl
Reporter: Neema Mashayekhi
State: Resolved
Fix Version: 3.30.0.7
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4786
