[PUBDEV-3321] Allow arbitrary S3 connection end point/region. #164

Merged
merged 3 commits into from Oct 10, 2016

Conversation

mmalohlava
Member

@mmalohlava mmalohlava commented Sep 1, 2016

The fix:

  • exposes the system property "sys.ai.h2o.persist.s3.endPoint", which can override the default
    S3 connection end point. For example, java -Dsys.ai.h2o.persist.s3.endPoint="https://localhost:9000" -jar h2o.jar

This change is Reviewable
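
For context, a minimal sketch of what such a property amounts to on the client side, assuming the AWS SDK for Java v1; the class and helper names below are illustrative, not the actual H2O PersistS3 code:

import com.amazonaws.services.s3.AmazonS3Client;

public class S3EndpointConfig {
    // Apply -Dsys.ai.h2o.persist.s3.endPoint=... (if present) to an S3 client.
    static void configureEndpoint(AmazonS3Client s3) {
        String endPoint = System.getProperty("sys.ai.h2o.persist.s3.endPoint");
        if (endPoint != null && !endPoint.isEmpty()) {
            // Override the default AWS S3 endpoint, e.g. with https://localhost:9000
            s3.setEndpoint(endPoint);
        }
    }
}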

@mmalohlava
Member Author

The motivation is to connect to the API provided by the Minio S3 implementation.

@badscooter23

Should we also expose the region parameter?

@mmalohlava
Member Author

Yes, I will do that in a PR update.

The fix:
  - exposes the system property "sys.ai.h2o.persist.s3.endPoint", which can override the default
  S3 connection end point. For example, `java -Dsys.ai.h2o.persist.s3.endPoint="https://localhost:9000" -jar h2o.jar`
@mmalohlava mmalohlava changed the title from "[PUBDEV-3321] Allow arbitrary S3 connection end point." to "[PUBDEV-3321] Allow arbitrary S3 connection end point/region." Sep 23, 2016
The property "sys.ai.h2o.persist.s3.region" can specify S3 region.
The property "sys.ai.h2o.persist.s3.enable.path.style" can force path style acces.
The Minio does not fill bucket name in returned object.
It needs to be read from listing of objects.
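
A sketch of what the two new properties correspond to on an AWS SDK for Java v1 client (the class and method names are illustrative only, not the actual H2O PersistS3 code):

import com.amazonaws.regions.RegionUtils;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.S3ClientOptions;

public class S3RegionAndPathStyle {
    // Apply -Dsys.ai.h2o.persist.s3.region and -Dsys.ai.h2o.persist.s3.enable.path.style to an S3 client.
    static void configure(AmazonS3Client s3) {
        String region = System.getProperty("sys.ai.h2o.persist.s3.region");
        if (region != null) {
            s3.setRegion(RegionUtils.getRegion(region));  // e.g. "us-east-1"
        }
        if (Boolean.getBoolean("sys.ai.h2o.persist.s3.enable.path.style")) {
            // Path-style requests address buckets as host/bucket/key instead of bucket.host/key,
            // which is what Minio recommends.
            s3.setS3ClientOptions(S3ClientOptions.builder().setPathStyleAccess(true).build());
        }
    }
}
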
@arnocandel
Member

Can this be merged?

@mmalohlava
Member Author

I would like it put into the release; it just needs to be approved.

@arnocandel arnocandel merged commit 6903883 into master Oct 10, 2016
@arnocandel arnocandel deleted the MM_pubdev_3321_s3_endpoint branch October 10, 2016 17:58
@badscooter23

This fix allows us to specify several new configuration parameters which are needed if we are going to use Minio (an open-source, AWS S3-compatible object store) as an alternative to AWS S3 or S3 bound through HDFS...

The three new config parameters that we theoretically need to specify are:

  • endpoint (points at a Minio server instance, overriding the default, which today is hardcoded to AWS S3)
  • enable path style (overrides the default S3 behavior of exposing every bucket as a full DNS-enabled path; recommended by Minio, though technically not a required parameter)
  • region (lets the user explicitly name their Minio server instance as a "region". S3 has regions, and Minio servers can be assigned a region name; this appears to be mostly descriptive, and I am not sure what functional value it provides. By default, through the S3 APIs, the region is us-east-1, and Minio observes the same default.)

The fixes Michal put in enable an H2O user to specify all three config values via the java command line using this syntax:

-Dsys.ai.h2o.persist.s3.endPoint="play.minio.io:9000"
-Dsys.ai.h2o.persist.s3.enable.path.style=true
-Dsys.ai.h2o.persist.s3.region="us-east-1"

In addition to the three parameters above, a user would need to specify the accessKeyId and secretKey for S3 the "normal" way:

-Daws.accessKeyId="Q3AM3UQ867SPQQA43P2F"
-Daws.secretKey="zuf+tfteSlswRu7BJ86wekitnifILbZam1KYY3TG"

The AWS accessKeyId and secretKey config parameters are not new; they pre-existed for AWS users.

Only endpoint, enable path style, and region are new.
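
For reference, those two -D flags are the standard system-property credentials of the AWS SDK for Java v1; a minimal sketch of what they mean at the SDK level (H2O may resolve credentials through its own provider chain, so this is only an illustration):

import com.amazonaws.auth.SystemPropertiesCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3Client;

public class S3CredentialsFromProps {
    // SystemPropertiesCredentialsProvider reads exactly aws.accessKeyId and aws.secretKey
    // from the JVM system properties, i.e. the values passed with -D on the command line.
    public static AmazonS3Client create() {
        return new AmazonS3Client(new SystemPropertiesCredentialsProvider());
    }
}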

I (Scott) have done some testing (a decent amount, but not exhaustive).

To support the testing, I created a script which will start an H2O cluster (one node only, standalone; it does not handle multi-node clustering or alternative backends, e.g. Hadoop).

The cluster defaults all the parameters to work with the publicly available play.minio.io cluster.

There is even a bucket (currently called scott) where I uploaded the Airlines data set, just for some simple testing. You can browse the public Minio environment by visiting play.minio.io:9000.

NOTES:

  • region appears to be buggy today (i.e., if you specify -Dsys.ai.h2o.persist.s3.region="us-east-1" you will get an error); the root cause appears to be that specifying the region overrides the endpoint (it appears the order matters); see the sketch after these notes

  • in general, the current scheme has limitations: you can specify only ONE S3 endpoint, so a user/customer could read data either from AWS S3 or from Minio S3 (not from both); to address this we need to decide on a syntax and implement additional changes
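
Regarding the region note above: in the AWS SDK for Java v1, setRegion() recomputes the client endpoint from the AWS region metadata, so the call order matters. A sketch of the interaction (illustrative only, not the actual H2O code; whether this is the exact root cause here is an assumption):

import com.amazonaws.auth.SystemPropertiesCredentialsProvider;
import com.amazonaws.regions.RegionUtils;
import com.amazonaws.services.s3.AmazonS3Client;

public class RegionEndpointOrder {
    public static void main(String[] args) {
        AmazonS3Client s3 = new AmazonS3Client(new SystemPropertiesCredentialsProvider());
        s3.setRegion(RegionUtils.getRegion("us-east-1"));  // set the region first ...
        s3.setEndpoint("https://play.minio.io:9000");       // ... then the custom endpoint stays in effect
        // Reversing the two calls would let setRegion() replace the Minio endpoint with the
        // AWS us-east-1 endpoint, which is consistent with the error described above.
    }
}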

vpatryshev pushed a commit that referenced this pull request Oct 11, 2016
* [PUBDEV-3321] Allow arbitrary S3 connection end point.

The fix:
  - exposes the system property "sys.ai.h2o.persist.s3.endPoint", which can override the default
  S3 connection end point. For example, `java -Dsys.ai.h2o.persist.s3.endPoint="https://localhost:9000" -jar h2o.jar`

* Allow specifying the region and path-style access.

The property "sys.ai.h2o.persist.s3.region" can specify S3 region.
The property "sys.ai.h2o.persist.s3.enable.path.style" can force path style acces.

* Fix for S3 Minio support

Minio does not fill in the bucket name in the returned object;
it needs to be read from the listing of objects.