Is your feature request related to a problem? Please describe.
All HadoopFileDataObjects work with the standard Hadoop partition layout, e.g. /<col1>=x/<col2>=y, which is also the layout Spark expects.
Sometimes it is necessary to read files from locations with a different partition layout, e.g. extracting //abc/.
Using arbitrary partition layouts is currently possible with SFtpFileRefDataObject, but not for reading files from Hadoop filesystems (local files, S3, ...).
I would like to use FileTransferAction to copy files with an arbitrary partition layout from a Hadoop filesystem into another Hadoop filesystem location, creating a standard Hadoop partition layout in the target. Currently this requires a lot of custom coding.
Describe the solution you'd like
Implement a DataObject that reads files from Hadoop filesystems and reuses the logic of SFtpFileRefDataObject to handle arbitrary partition layouts.
Note that this DataObject could not be used with the Spark execution engine (e.g. CopyAction), only with the file execution engine (e.g. FileTransferAction).
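To illustrate, a minimal sketch of how such a DataObject might look in an SDLB HOCON configuration. The type name FileRefDataObject is hypothetical (it does not exist yet; it is what this issue proposes), the partitionLayout token syntax is assumed to follow the one used by SFtpFileRefDataObject, and all paths and column names are placeholders:

```
dataObjects {
  # hypothetical new DataObject reading a non-standard partition layout
  # from a Hadoop filesystem (local files, S3, ...)
  src-files {
    type = FileRefDataObject          # proposed type, does not exist yet
    path = "hdfs://namenode/data/in"  # placeholder path
    partitions = [col1, col2]
    partitionLayout = "%col1%/abc/%col2%/"  # arbitrary source layout
  }
  # target with standard hadoop partition layout /col1=x/col2=y
  tgt-files {
    type = CsvFileDataObject
    path = "hdfs://namenode/data/out" # placeholder path
    partitions = [col1, col2]
  }
}
actions {
  # file execution engine: copies files, creating the standard layout
  transfer {
    type = FileTransferAction
    inputId = src-files
    outputId = tgt-files
  }
}
```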