Allow reading multiple files with spark_read_ #2118

Merged
merged 4 commits into from Aug 30, 2019

@jozefhajnala jozefhajnala commented Aug 23, 2019

Some org.apache.spark.sql.DataFrameReader methods accept paths: String* in addition to path: String. We can use this to let the spark_read_ family of functions accept multiple paths rather than a single path, and so read multiple files in one call (inspired in part by an SO question).

This PR proposes to add this support to:

  • spark_read_parquet()
  • spark_read_json()
  • spark_read_text() - with whole=FALSE only; stops with a meaningful error message if multiple paths are provided with whole=TRUE
  • spark_read_orc()

Unit tests are also added for the above reading functions.

Compatibility notes on DataFrameReader supporting paths: String*:

  • .parquet - since 1.4.0
  • .json - since 2.0.0
  • .text - since 1.6.0
  • .orc - since 2.0.0
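
As a rough illustration of the proposed behaviour, the sketch below passes a character vector of paths to spark_read_parquet(). It assumes a local Spark installation and a sparklyr build that includes this change; the table names and temporary directories (cars_a, cars_b, cars_all) are hypothetical and exist only for this example.

```r
library(sparklyr)

# Assumption: Spark is available locally (e.g. installed via spark_install()).
sc <- spark_connect(master = "local")

# Hypothetical output locations for two small Parquet datasets.
dir_a <- file.path(tempdir(), "cars_a")
dir_b <- file.path(tempdir(), "cars_b")

# Split mtcars in two and write each half to its own location.
tbl_a <- sdf_copy_to(sc, mtcars[1:16, ], "cars_a", overwrite = TRUE)
tbl_b <- sdf_copy_to(sc, mtcars[17:32, ], "cars_b", overwrite = TRUE)
spark_write_parquet(tbl_a, dir_a, mode = "overwrite")
spark_write_parquet(tbl_b, dir_b, mode = "overwrite")

# With this PR, `path` may hold several paths that are read in one call.
cars_all <- spark_read_parquet(sc, name = "cars_all", path = c(dir_a, dir_b))

# Row count of the combined table.
n_rows <- sdf_nrow(cars_all)
print(n_rows)

spark_disconnect(sc)
```

The same pattern should apply to spark_read_json(), spark_read_orc(), and spark_read_text() with whole=FALSE, subject to the Spark versions listed above.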

@javierluraschi javierluraschi commented Aug 23, 2019

@jozefhajnala this is pretty great! Thank you! I left a few comments; if you are low on time, let us know and we can iterate on your PR.


@kevinykuo kevinykuo commented Aug 23, 2019

@jozefhajnala do you mind adding tests for reading multiple files with globbing expressions to make sure we don't break those?


@jozefhajnala jozefhajnala commented Aug 25, 2019

> @jozefhajnala do you mind adding tests for reading multiple files with globbing expressions to make sure we don't break those?

Added some in 7b0ccce.


@javierluraschi javierluraschi commented Aug 30, 2019

Thanks @jozefhajnala!

@javierluraschi javierluraschi merged commit 5a419fe into sparklyr:master Aug 30, 2019
1 check passed