[PUBDEV-3321] Allow arbitrary S3 connection end point/region. #164
Conversation
The motivation is to connect to the API provided by the Minio S3 implementation.
Should we also expose the region parameter?
Yes, I will do that in a PR update.
The fix:
- Exposes the system property "sys.ai.h2o.persist.s3.endPoint", which can override the default S3 connection end point. For example, `java -Dsys.ai.h2o.persist.s3.endPoint="https://localhost:9000" -jar h2o.jar`
- The property "sys.ai.h2o.persist.s3.region" can specify the S3 region.
- The property "sys.ai.h2o.persist.s3.enable.path.style" can force path-style access.
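The three properties can be combined on a single launch line. A minimal sketch against a local Minio server; the endpoint URL and region values below are illustrative placeholders, not values taken from this PR:

```shell
# Point H2O's S3 persist layer at a local Minio server instead of AWS S3.
# Endpoint and region are placeholder values for illustration.
java \
  -Dsys.ai.h2o.persist.s3.endPoint="http://localhost:9000" \
  -Dsys.ai.h2o.persist.s3.region="us-east-1" \
  -Dsys.ai.h2o.persist.s3.enable.path.style=true \
  -jar h2o.jar
```

Path-style access is typically required for Minio, since it addresses buckets as `http://host:port/bucket/key` rather than the virtual-hosted `http://bucket.host/key` form that AWS defaults to.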
Minio does not fill in the bucket name in the returned object; it needs to be read from the listing of objects.
Can this be merged?
I would like it put into the release; it just needs to be approved.
This fix allows us to specify multiple new configuration parameters, which are needed if we are going to use Minio (an open source AWS S3 compatible object store) as an alternative to AWS S3 or S3 bound through HDFS. The three new config parms that we theoretically need to specify are:
The fixes Michal put in enable an H2O user to specify all three config values via the Java command line using this syntax.
In addition to the three parms above, a user would need to specify the accessKeyId and secretKey for S3 the “normal” way
The AWS accessKeyId and secretKey config parms are not new; they (obviously) pre-existed for AWS users. Only endpoint, enable path style, and region are new.

I (Scott) have done some (a decent amount, but not exhaustive) testing. To support the testing I created a script which will start an H2O cluster (on one node only, standalone; it does not handle clustering or alternative backends, e.g. Hadoop). The cluster will default all the parms to work with the publicly available play.minio.io cluster. There is even a bucket (currently called scott) where I uploaded the Airlines data set (just for some simple testing). You can browse the public Minio environment by just clicking play.minio.io:9000.

NOTES:
- In general the current scheme has limitations: you can specify only ONE S3 endpoint, so a user/customer could either read data from AWS S3 or Minio S3 (they could not read data from both). To address this we need to decide on a syntax and implement additional changes.
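Putting the pieces above together, a full launch line would carry both the new properties and the credentials. This is a sketch only: `aws.accessKeyId` and `aws.secretKey` are the AWS Java SDK's standard credential system properties and are an assumption about what "the normal way" means here (H2O may also pick up credentials from its usual config files), and every value shown is a placeholder:

```shell
# Illustrative only: endpoint, region, and credentials are placeholders.
# aws.accessKeyId / aws.secretKey are the AWS Java SDK's standard system
# properties; H2O's exact credential lookup may differ.
java \
  -Dsys.ai.h2o.persist.s3.endPoint="https://play.minio.io:9000" \
  -Dsys.ai.h2o.persist.s3.region="us-east-1" \
  -Dsys.ai.h2o.persist.s3.enable.path.style=true \
  -Daws.accessKeyId="YOUR_ACCESS_KEY" \
  -Daws.secretKey="YOUR_SECRET_KEY" \
  -jar h2o.jar
```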
* [PUBDEV-3321] Allow arbitrary S3 connection end point. The fix: exposes system property "sys.ai.h2o.persist.s3.endPoint", which can override the default S3 connection end point. For example, `java -Dsys.ai.h2o.persist.s3.endPoint="https://localhost:9000" -jar h2o.jar`
* Allow specifying region and path access style. The property "sys.ai.h2o.persist.s3.region" can specify the S3 region. The property "sys.ai.h2o.persist.s3.enable.path.style" can force path-style access.
* Fix for S3 Minio support. Minio does not fill in the bucket name in the returned object; it needs to be read from the listing of objects.