[PUBDEV-3321] Allow arbitrary S3 connection end point/region. #164
Conversation
The motivation is to connect to the API provided by the Minio S3 implementation.
Should we also expose the region parameter?
Yes, I will do that in a PR update.
The fix:
- Exposes the system property "sys.ai.h2o.persist.s3.endPoint", which can override the default S3 connection end point. For example, `java -Dsys.ai.h2o.persist.s3.endPoint="https://localhost:9000" -jar h2o.jar`
- The property "sys.ai.h2o.persist.s3.region" can specify the S3 region.
- The property "sys.ai.h2o.persist.s3.enable.path.style" can force path-style access.
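The three properties can be combined on a single launch line. A minimal sketch against a local Minio server; the endpoint URL and region values below are illustrative placeholders, not values taken from this PR:

```shell
# Point H2O's S3 persist layer at a local Minio server instead of AWS S3.
# Endpoint and region are placeholder values for illustration.
java \
  -Dsys.ai.h2o.persist.s3.endPoint="http://localhost:9000" \
  -Dsys.ai.h2o.persist.s3.region="us-east-1" \
  -Dsys.ai.h2o.persist.s3.enable.path.style=true \
  -jar h2o.jar
```

Path-style access is typically required for Minio, since it addresses buckets as `http://host:port/bucket/key` rather than the virtual-hosted `http://bucket.host/key` form that AWS defaults to.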
Minio does not fill in the bucket name in the returned object; it needs to be read from the listing of objects.
Can this be merged?
I would like it put into the release; it just needs to be approved.
This fix allows us to specify multiple new configuration parameters, which are needed if we are going to use Minio (an open source AWS S3 compatible object store) as an alternative to AWS S3 or S3 bound through HDFS. The three new config parms that we theoretically need to specify are:
The fixes Michal put in enable an H2O user to specify all three config values via the Java command line using this syntax.
In addition to the three parms above, a user would need to specify the accessKeyId and secretKey for S3 the “normal” way
The AWS accessKeyId and secretKey config parms are not new; they (obviously) pre-existed for AWS users. Only endpoint, enable path style, and region are new.

I (Scott) have done some (a decent amount, but not exhaustive) testing. To support the testing I created a script which will start an H2O cluster (on one node only, standalone; it does not handle clustering or alternative backends, e.g. Hadoop). The cluster will default all the parms to work with the publicly available play.minio.io cluster. There is even a bucket (currently called scott) where I uploaded the Airlines data set (just for some simple testing). You can browse the public Minio environment by just clicking play.minio.io:9000.

NOTES:
- In general the current scheme has limitations: you can specify only ONE S3 endpoint, so a user/customer could either read data from AWS S3 or Minio S3 (they could not read data from both). To address this we need to decide on a syntax and implement additional changes.
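Putting the pieces above together, a full launch line would carry both the new properties and the credentials. This is a sketch only: `aws.accessKeyId` and `aws.secretKey` are the AWS Java SDK's standard credential system properties and are an assumption about what "the normal way" means here (H2O may also pick up credentials from its usual config files), and every value shown is a placeholder:

```shell
# Illustrative only: endpoint, region, and credentials are placeholders.
# aws.accessKeyId / aws.secretKey are the AWS Java SDK's standard system
# properties; H2O's exact credential lookup may differ.
java \
  -Dsys.ai.h2o.persist.s3.endPoint="https://play.minio.io:9000" \
  -Dsys.ai.h2o.persist.s3.region="us-east-1" \
  -Dsys.ai.h2o.persist.s3.enable.path.style=true \
  -Daws.accessKeyId="YOUR_ACCESS_KEY" \
  -Daws.secretKey="YOUR_SECRET_KEY" \
  -jar h2o.jar
```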
* [PUBDEV-3321] Allow arbitrary S3 connection end point. The fix: exposes system property "sys.ai.h2o.persist.s3.endPoint", which can override the default S3 connection end point. For example, `java -Dsys.ai.h2o.persist.s3.endPoint="https://localhost:9000" -jar h2o.jar`
* Allow specifying region and path access style. The property "sys.ai.h2o.persist.s3.region" can specify the S3 region. The property "sys.ai.h2o.persist.s3.enable.path.style" can force path-style access.
* Fix for S3 Minio support. Minio does not fill in the bucket name in the returned object; it needs to be read from the listing of objects.