I am trying to use the parquet-fs2 function fromParquet to read a file from S3.
I have set the Hadoop configuration below to read from S3 (I am running LocalStack, hence the S3 URL is localhost:4566):
val hadoopConf: Configuration = new Configuration()
hadoopConf.set("fs.s3a.path.style.access", "true")
hadoopConf.set("fs.s3a.endpoint.region", "eu-central-1")
hadoopConf.set("fs.s3a.connection.ssl.enabled", "false")
hadoopConf.set("fs.s3a.endpoint", "http://localhost:4566")
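For reference, LocalStack also expects credentials to be supplied explicitly, even though it accepts dummy values. A minimal sketch extending the configuration above; the property keys are standard hadoop-aws settings, and the "test" values are placeholders:

```scala
// Assumption: LocalStack accepts any non-empty credentials, so dummy
// values are enough. The property keys are standard hadoop-aws ones.
hadoopConf.set("fs.s3a.access.key", "test") // placeholder value
hadoopConf.set("fs.s3a.secret.key", "test") // placeholder value
hadoopConf.set(
  "fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
)
```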
val readerStream =
fromParquet[F]
.as[MyCustomCaseClass]
.options(ParquetReader.Options(hadoopConf = hadoopConf))
.read(Path(directoryName))
where directoryName is the name of the directory in S3 (in my case, directoryName = local_directory).
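One thing worth checking here: Hadoop only selects the S3A filesystem when the path carries the s3a scheme, so a bare directory name is resolved against the default (local) filesystem, which would explain a FileNotFoundException. A sketch of the read with a full URI, where my-bucket is a placeholder bucket name:

```scala
// Assumption: "my-bucket" is a placeholder. The point is that the path
// must be a full s3a:// URI, not a bare directory name, or Hadoop
// resolves it against the default (local) filesystem.
val readerStream =
  fromParquet[F]
    .as[MyCustomCaseClass]
    .options(ParquetReader.Options(hadoopConf = hadoopConf))
    .read(Path("s3a://my-bucket/local_directory"))
```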
However, upon running this, I get a FileNotFound exception on this line.
As a result, the app bombs out with this error:
java.lang.IllegalArgumentException: Inconsistent partitioning.
[error] Parquet files must live in leaf directories.
[error] Every files must contain the same numbers of partitions.
[error] Partition directories at the same level must have the same names.
[error] Check following directories:
[error] local_directory
Can you please guide me on why the FileNotFound exception is happening even though the file is present in the local S3?
Thanks
I am now getting the following exception against an actual AWS account. I'm using an IAM assumed role to connect from Hadoop to AWS.
Failed to initialize fileystem s3a://my-bucket/my-file.parquet: java.nio.file.AccessDeniedException: : org.apache.hadoop.fs.s3a.auth.NoAwsCredentialsException: SimpleAWSCredentialsProvider: No AWS credentials in the Hadoop configuration
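For anyone landing here: a sketch of the kind of configuration the Hadoop S3A documentation describes for assumed roles. The property keys are standard hadoop-aws settings; the role ARN is a placeholder:

```scala
// Assumption: the ARN below is a placeholder; the property keys are the
// documented hadoop-aws assumed-role settings.
hadoopConf.set(
  "fs.s3a.aws.credentials.provider",
  "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider"
)
hadoopConf.set(
  "fs.s3a.assumed.role.arn",
  "arn:aws:iam::123456789012:role/my-role" // placeholder ARN
)
// Provider used to obtain the base credentials that assume the role:
hadoopConf.set(
  "fs.s3a.assumed.role.credentials.provider",
  "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"
)
```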
@jeet23 Your questions relate to Hadoop AWS, not Parquet4S (which only builds on Hadoop indirectly). Please seek support in the Hadoop community or docs.