Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5820      +/-   ##
==========================================
+ Coverage   61.49%   61.59%   +0.09%
==========================================
  Files         314      315       +1
  Lines       11437    11469      +32
  Branches      830      828       -2
==========================================
+ Hits         7033     7064      +31
- Misses       4404     4405       +1
TODO: add better syntax for this, going from String -> metadata and/or glob -> resourceid -> metadata.
  }
}

class ParquetStringSCollectionSyntax(self: SCollection[String]) {
Strong opinion, loosely held:
IMO, this function is niche enough that, instead of adding an extra SCollection[String] helper, we should just add a function to [FileSCollectionFunctions], like readFiles, that just transforms SCollection[String] -> SCollection[ReadableFile]. Then the user can just do
sc
  .parallelize(paths)
  .readFiles
  .parquetMetadata
(Approving anyway because I don't feel super strongly about this.)
I guess there is already a ReadableFile => A version of readFiles, in which case
sc
  .parallelize(paths)
  .readFilesParquetMetadata() // or readParquetMetadata, or, better than either of these, parquetMetadata
where readFilesParquetMetadata still needs to live in the parquet project, as an implicit/syntax, and we need to separately provide a ReadableFile => ParquetMetadata function.
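To make the shape of that suggestion concrete, here is a minimal sketch of the two pieces: a standalone ReadableFile => ParquetMetadata function, and an implicit syntax that wires it into a readFiles-style method. Everything in it is a hypothetical stand-in (Coll, ReadableFile, ParquetMetadata, readMetadata and ParquetStringCollSyntax are not Scio, Beam, or Parquet classes), and the metadata is a placeholder, not something read from a real footer.

```scala
// Stand-in types only: none of these are Scio, Beam, or Parquet classes.
object ParquetMetadataSyntaxSketch {
  // Hypothetical stand-in for Beam's ReadableFile.
  final case class ReadableFile(path: String)

  // Hypothetical stand-in for the metadata read from a file footer.
  final case class ParquetMetadata(path: String, numRows: Long)

  // Hypothetical stand-in for SCollection[T], backed by an in-memory Seq.
  final case class Coll[T](values: Seq[T]) {
    // Mirrors the shape of a `ReadableFile => A` variant of readFiles.
    def readFiles[A](f: ReadableFile => A)(implicit ev: T <:< String): Coll[A] =
      Coll(values.map(p => f(ReadableFile(ev(p)))))
  }

  // The separately provided ReadableFile => ParquetMetadata function.
  // Placeholder logic: uses the path length as the row count instead of
  // reading an actual footer.
  val readMetadata: ReadableFile => ParquetMetadata =
    file => ParquetMetadata(file.path, file.path.length.toLong)

  // The implicit syntax that would live in the parquet project.
  implicit class ParquetStringCollSyntax(self: Coll[String]) {
    def parquetMetadata: Coll[ParquetMetadata] = self.readFiles(readMetadata)
  }
}
```

With `import ParquetMetadataSyntaxSketch._` in scope, `Coll(Seq("a.parquet", "bb.parquet")).parquetMetadata` yields one ParquetMetadata per path. The point of the split is that the function stays reusable on its own while the syntax remains a thin wrapper.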
Reads parquet metadata, when possible, from the file footer.