
AWS IRSA S3 Support #2373

Merged

matty-rose
Contributor

@matty-rose commented Aug 6, 2022

What this PR does / why we need it:
Currently, to use AWS IRSA with the storage initializer, dummy Kubernetes secrets must be created so that KServe correctly parses the S3 annotations and configures the storage-initializer container with the right environment variables (see #2003 (comment)).

This PR allows the S3 storage initializer to be configured in three ways:

  1. Configure S3 options in the global inferenceservice configmap, so that multiple k8s service accounts/IRSA can be configured without repeating annotations
  2. Configure S3 options on individual k8s service accounts/IRSA (these take precedence over the global options where specified)
  3. Configure S3 options on a k8s secret attached to the k8s service account, preserving static credential functionality (AWS access key + secret key)
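A minimal sketch of the precedence between options 1 and 2, assuming a non-empty service account annotation wins over the global configmap value and that an option set in neither place is left unset. The helper name resolveS3Option is hypothetical and is not the KServe implementation.

```go
package main

import "fmt"

// resolveS3Option returns the value for one S3 option, preferring the
// service account annotation over the global configmap value.
// Hypothetical helper for illustration; not KServe's API.
func resolveS3Option(annotations map[string]string, annotationKey, globalValue string) (string, bool) {
	// A non-empty value on the service account overrides the global default.
	if v, ok := annotations[annotationKey]; ok && v != "" {
		return v, true
	}
	// Otherwise fall back to the inferenceservice configmap value, if any.
	if globalValue != "" {
		return globalValue, true
	}
	// Neither source sets the option, so the caller should leave it unset.
	return "", false
}

func main() {
	saAnnotations := map[string]string{
		"serving.kserve.io/s3-region": "us-east-2",
	}
	region, _ := resolveS3Option(saAnnotations, "serving.kserve.io/s3-region", "ap-southeast-2")
	fmt.Println(region) // us-east-2 (service account wins)

	endpoint, ok := resolveS3Option(saAnnotations, "serving.kserve.io/s3-endpoint", "")
	fmt.Println(endpoint == "", ok) // true false (neither source sets it)
}
```

Returning a separate boolean lets the caller distinguish "explicitly configured" from "not configured", so no environment variable needs to be emitted for unset options.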

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2113

Type of changes
Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Feature/Issue validation/testing:

Please describe the tests that you ran to verify your changes and a summary of the relevant results. Provide instructions so they can be reproduced.
Please also list any relevant details for your test configuration.

For both tests I followed the developer guide: I used istioctl 1.14.3 to deploy Istio, deployed Knative Serving 1.6, and then ran the make deploy-dev command on this branch.

  • Test A: Deploy the S3 example using only a service account (no secret), with all possible S3 annotations configured
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/s3access
    serving.kserve.io/s3-endpoint: s3.amazonaws.com # replace with your s3 endpoint, e.g. minio-service.kubeflow:9000
    serving.kserve.io/s3-usehttps: "1" # by default 1, if testing with minio you can set to 0
    serving.kserve.io/s3-region: "us-east-2"
    serving.kserve.io/s3-useanoncredential: "false" # omitting this is the same as false, if true will ignore provided credential and use anonymous credentials
    serving.kserve.io/s3-verifyssl: "0"
    serving.kserve.io/s3-usevirtualbucket: "false"
    serving.kserve.io/s3-cabundle: "test"
---
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "mnist-s3"
spec:
  predictor:
    serviceAccountName: s3
    model:
      modelFormat:
        name: tensorflow
      storageUri: "s3://kserve-examples/mnist"

Describing the pod with kubectl describe pod -n default -l serving.kserve.io/inferenceservice=mnist-s3 shows the following configuration on the storage-initializer container:
[Image: storage-initializer container configuration]
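To make Test A more concrete, here is a sketch of how a few of the annotations above might be translated into container environment variables. The mapping is an assumption for illustration: AWS_REGION and AWS_CA_BUNDLE are standard AWS SDK variables and S3_VERIFY_SSL is the legacy TensorFlow variable discussed later in this thread, but the exact names KServe emits may differ. EnvVar is a local stand-in for the Kubernetes v1.EnvVar type.

```go
package main

import (
	"fmt"
	"sort"
)

// EnvVar is a minimal stand-in for k8s.io/api/core/v1.EnvVar.
type EnvVar struct {
	Name  string
	Value string
}

// annotationToEnv is an assumed annotation -> env var mapping (illustrative only).
// The eks.amazonaws.com/role-arn annotation is deliberately absent: on EKS it is
// read by the pod identity webhook, not converted into S3 settings.
var annotationToEnv = map[string]string{
	"serving.kserve.io/s3-region":    "AWS_REGION",
	"serving.kserve.io/s3-verifyssl": "S3_VERIFY_SSL",
	"serving.kserve.io/s3-cabundle":  "AWS_CA_BUNDLE",
}

// envFromAnnotations emits one EnvVar per recognised annotation,
// sorted by name so the output is deterministic.
func envFromAnnotations(annotations map[string]string) []EnvVar {
	envs := []EnvVar{}
	for key, envName := range annotationToEnv {
		if v, ok := annotations[key]; ok {
			envs = append(envs, EnvVar{Name: envName, Value: v})
		}
	}
	sort.Slice(envs, func(i, j int) bool { return envs[i].Name < envs[j].Name })
	return envs
}

func main() {
	annotations := map[string]string{
		"eks.amazonaws.com/role-arn":     "arn:aws:iam::123456789012:role/s3access",
		"serving.kserve.io/s3-region":    "us-east-2",
		"serving.kserve.io/s3-verifyssl": "0",
	}
	for _, e := range envFromAnnotations(annotations) {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```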

  • Test B: Deploy the S3 example using only a service account (no secret), with all values configured in the inferenceservice configmap
  credentials: |-
    {
       "gcs": {
           "gcsCredentialFileName": "gcloud-application-credentials.json"
       },
       "s3": {
           "s3AccessKeyIDName": "AWS_ACCESS_KEY_ID",
           "s3SecretAccessKeyName": "AWS_SECRET_ACCESS_KEY",
           "s3Endpoint": "s3.ap-southeast-2.amazonaws.com",
           "s3UseHttps": "0",
           "s3Region": "ap-southeast-2",
           "s3VerifySSL": "0",
           "s3UseVirtualBucket": "true",
           "s3UseAnonymousCredential": "true",
           "s3CABundle": "test"
       }
    }
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-configmap
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/s3access
---
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "mnist-s3-configmap"
spec:
  predictor:
    serviceAccountName: s3-configmap
    model:
      modelFormat:
        name: tensorflow
      storageUri: "s3://kserve-examples/mnist"

Describing the pod with kubectl describe pod -n default -l serving.kserve.io/inferenceservice=mnist-s3-configmap shows the following configuration on the storage-initializer container:
[Image: storage-initializer container configuration]
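The s3 block of the credentials entry in Test B can be read into a Go struct with encoding/json. The struct below is a hypothetical sketch whose JSON tags simply mirror the keys shown in the configmap; the real S3Config type in pkg/credentials/s3 may be declared differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// S3Config mirrors the keys of the "s3" object in the credentials entry.
// Hypothetical sketch; field names follow the JSON keys shown above.
type S3Config struct {
	S3AccessKeyIDName        string `json:"s3AccessKeyIDName,omitempty"`
	S3SecretAccessKeyName    string `json:"s3SecretAccessKeyName,omitempty"`
	S3Endpoint               string `json:"s3Endpoint,omitempty"`
	S3UseHttps               string `json:"s3UseHttps,omitempty"`
	S3Region                 string `json:"s3Region,omitempty"`
	S3VerifySSL              string `json:"s3VerifySSL,omitempty"`
	S3UseVirtualBucket       string `json:"s3UseVirtualBucket,omitempty"`
	S3UseAnonymousCredential string `json:"s3UseAnonymousCredential,omitempty"`
	S3CABundle               string `json:"s3CABundle,omitempty"`
}

// CredentialConfig models the top-level credentials JSON; other providers
// such as "gcs" are ignored because they have no matching field.
type CredentialConfig struct {
	S3 S3Config `json:"s3"`
}

// parseCredentials decodes the raw credentials string from the configmap.
func parseCredentials(raw string) (CredentialConfig, error) {
	var cfg CredentialConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	raw := `{
	  "gcs": {"gcsCredentialFileName": "gcloud-application-credentials.json"},
	  "s3": {"s3Endpoint": "s3.ap-southeast-2.amazonaws.com", "s3Region": "ap-southeast-2"}
	}`
	cfg, err := parseCredentials(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.S3.S3Endpoint, cfg.S3.S3Region)
}
```

Fields missing from the JSON decode to empty strings, which a caller can treat as "not configured globally".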

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Checklist:

  • Have you added unit/e2e tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation? I will make a separate PR to the kserve website repo to update the documentation.

Release note:

Adds AWS S3 configuration + IRSA support to inferenceservice configmap and k8s svcaccount

@yuzisun
Member

yuzisun commented Aug 7, 2022

Thanks @matty-rose !! This is great!

/cc @surajkota

@kserve-oss-bot
Collaborator

@yuzisun: GitHub didn't allow me to request PR reviews from the following users: surajkota.

Note that only kserve members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Thanks @matty-rose !! This is great!

/cc @surajkota

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@surajkota

Thanks for the PR, I will help review this.

@matty-rose
Contributor Author

Hi @surajkota @yuzisun I think this is ready for review now

"s3SecretAccessKeyName": "AWS_SECRET_ACCESS_KEY",
"s3Endpoint": "s3.amazonaws.com",
"s3UseHttps": "1",
"s3Region": "",
Contributor Author

I was unsure about having empty values here since I couldn't see any other place in the configmap where this was done, but it makes sense to me to have them here so that users know all the possible configuration options without needing to look through the source code

Member

Makes sense. Maybe add a comment with the AWS/S3 reference you had in the other place.

Contributor Author

have added the quick links above the credentials entry in configmap

func BuildS3EnvVars(annotations map[string]string, s3Config *S3Config) []v1.EnvVar {
envs := []v1.EnvVar{}

if s3Endpoint, ok := annotations[InferenceServiceS3SecretEndpointAnnotation]; ok {
Contributor Author

This s3 endpoint if/else logic block is unchanged; I just moved it from s3_secret.go to here since I use it in both s3_secret.go and s3_service_account.go.
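The endpoint block combines two settings: the endpoint host and the use-HTTPS flag. A minimal sketch of that behaviour, assuming (as the Test A comments indicate) that HTTPS is the default and that a value of "0" switches the AWS_ENDPOINT_URL scheme to http. The function name buildEndpointEnvs and the local EnvVar type are illustrative stand-ins, not the code in utils.go.

```go
package main

import "fmt"

// EnvVar is a minimal stand-in for k8s.io/api/core/v1.EnvVar.
type EnvVar struct {
	Name  string
	Value string
}

// buildEndpointEnvs derives endpoint-related env vars from an endpoint host
// and an optional use-HTTPS flag. Sketch only; names are illustrative.
func buildEndpointEnvs(endpoint, useHttps string) []EnvVar {
	envs := []EnvVar{}
	if endpoint == "" {
		// No endpoint configured: emit nothing and let the SDK resolve it.
		return envs
	}
	// HTTPS unless the flag is explicitly "0".
	scheme := "https://"
	if useHttps == "0" {
		scheme = "http://"
	}
	if useHttps != "" {
		envs = append(envs, EnvVar{Name: "S3_USE_HTTPS", Value: useHttps})
	}
	envs = append(envs,
		EnvVar{Name: "S3_ENDPOINT", Value: endpoint},
		EnvVar{Name: "AWS_ENDPOINT_URL", Value: scheme + endpoint},
	)
	return envs
}

func main() {
	for _, e := range buildEndpointEnvs("minio-service.kubeflow:9000", "0") {
		fmt.Printf("%s=%s\n", e.Name, e.Value)
	}
}
```

Keeping both S3_ENDPOINT and AWS_ENDPOINT_URL mirrors the backward-compatibility reasoning given later in the review: one is the legacy TensorFlow-style variable, the other a full URL for AWS SDKs.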

pkg/credentials/s3/s3_service_account.go (outdated, resolved)
/*
For a quick reference about AWS ENV variables:
AWS Cli: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html
Boto: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables
Member

Thanks for adding the reference!

pkg/credentials/s3/s3_service_account_test.go (outdated, resolved)
pkg/credentials/s3/utils.go (resolved)
pkg/credentials/s3/utils_test.go (resolved)
@yuzisun
Member

yuzisun commented Aug 13, 2022

Thanks @matty-rose for the awesome work with detailed descriptions!! The PR looks great!

I would appreciate it if you could help update the website docs for the IAM role support here.

@matty-rose
Contributor Author

@yuzisun have opened a website PR here: kserve/website#178

@yuzisun
Member

yuzisun commented Aug 19, 2022

@surajkota Do you have comments here?

@surajkota

surajkota commented Aug 19, 2022

Apologies, this slipped my list because of a customer request. Will take a look tomorrow, I hope that is okay

@surajkota left a comment

Thanks for your patience, great work @matty-rose. I have left a few questions on the PR, and I have one more:

What is the difference between the S3 download code in the following locations:

  1. https://github.com/kserve/kserve/blob/master/python/kserve/kserve/storage.py#L147 - this is used by the storage initializer
  2. https://github.com/kserve/kserve/blob/master/pkg/agent/storage/utils.go#L124-L153 - when does this code get executed?

@@ -143,14 +143,24 @@ data:
"cpuLimit": "1",
"storageSpecSecretName": "storage-config"
}
# For a quick reference about AWS ENV variables:
# AWS Cli: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html
# Boto: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables
credentials: |-
{
"gcs": {
"gcsCredentialFileName": "gcloud-application-credentials.json"
},
"s3": {
"s3AccessKeyIDName": "AWS_ACCESS_KEY_ID",


A few questions:

  1. there are 2 variables used to construct the service endpoint URL: s3Endpoint and s3UseHttps. Depending on the value of useHttps the final AWS_ENDPOINT_URL is set. @yuzisun Is there a particular reason to do it this way? Just for my information
  2. how is s3VerifySSL used?
  3. In Test B: Deploy S3 Example using only service account (no secret) configuring all values inside inferenceservice configmap, does the user still need to keep "s3AccessKeyIDName": "AWS_ACCESS_KEY_ID" and "s3SecretAccessKeyName": "AWS_SECRET_ACCESS_KEY", in configmap? It can be a little confusing

Member

@yuzisun Aug 20, 2022

for 1) I think this is mostly for backward compatibility and integration with the Kubeflow environment; for example, tfserving was using S3_ENDPOINT. I am not sure if this is still the case.
for 2) As with the other settings, when s3VerifySSL is set to false the server-side certificates are not verified, which is mostly for testing purposes.
for 3) I do not think these are needed if we are using a service account without a secret. @matty-rose, maybe we can add a comment in the configmap stating which settings are only needed for static credentials?

Contributor Author

yep for 3. those fields are not required in the configmap if not using static credentials, have added a comment to make this clearer
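The distinction in point 3 can be sketched as follows: the access-key settings only matter when a secret with static credentials is attached. With IRSA alone, no key variables are emitted, and on EKS the pod identity webhook injects web-identity credentials (AWS_ROLE_ARN, AWS_WEB_IDENTITY_TOKEN_FILE) instead. This assumes s3AccessKeyIDName and s3SecretAccessKeyName name keys inside the attached secret; the SecretKeyEnv type, the staticCredentialEnvs function and the secret name "mysecret" are hypothetical.

```go
package main

import "fmt"

// SecretKeyEnv describes an env var sourced from a key in a Kubernetes secret,
// a simplified stand-in for v1.EnvVar with ValueFrom.SecretKeyRef.
type SecretKeyEnv struct {
	EnvName    string
	SecretName string
	SecretKey  string
}

// staticCredentialEnvs returns env vars for static AWS credentials when a
// secret is attached, and nothing for IRSA-only service accounts.
// Hypothetical helper; the key-name arguments are assumed to name keys in the secret.
func staticCredentialEnvs(secretName, accessKeyIDName, secretAccessKeyName string) []SecretKeyEnv {
	if secretName == "" {
		// IRSA only: credentials come from the projected web identity token,
		// so the key-name settings are unused.
		return nil
	}
	return []SecretKeyEnv{
		{EnvName: "AWS_ACCESS_KEY_ID", SecretName: secretName, SecretKey: accessKeyIDName},
		{EnvName: "AWS_SECRET_ACCESS_KEY", SecretName: secretName, SecretKey: secretAccessKeyName},
	}
}

func main() {
	fmt.Println(len(staticCredentialEnvs("", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")))         // 0
	fmt.Println(len(staticCredentialEnvs("mysecret", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"))) // 2
}
```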


  1. One change I would recommend, based on #2377 (storage-initializer: Let boto3 decide the endpoint for S3), is to remove the default values for all fields except the access and secret key names, especially s3Endpoint. This PR LGTM once this change is made.
  2. Thanks for adding the comment in this file. I think this should go into the website PR as well, along with information about the default values used by each field, or at least some of them such as https, endpoint and region. I will try to review that one today.
  3. I could not find any reference to the S3_VERIFY_SSL variable in the boto3 or AWS Go SDK docs, so I am not sure how it is being used. I don't want to block the PR on this, though.

Contributor Author

I have removed the endpoint/use-https default values and will try to find some time this week to update the website PR to describe the use of these settings better. Thanks for your review @surajkota

Member

@surajkota S3_VERIFY_SSL/S3_ENDPOINT/S3_USE_HTTPS were originally inherited from tensorflow/tfserving (see https://docs.w3cub.com/tensorflow~guide/deploy/s3). I think we may no longer need these, as we always use the storage initializer to download models to a local volume, and the tfserving runtime in KServe never directly uses these environment variables to fetch models from S3. I created an issue to track cleaning up these variables.


Ah, got it now

@surajkota Aug 24, 2022

Approved the PR. Maybe we should remove s3VerifySSL from this configmap since it will not be used, or at least not mention it on the website for now.

@yuzisun
Member

yuzisun commented Aug 20, 2022

Replying to @surajkota:

What is the difference between the S3 download code in the following locations:

  1. https://github.com/kserve/kserve/blob/master/python/kserve/kserve/storage.py#L147 - this is used by the storage initializer
  2. https://github.com/kserve/kserve/blob/master/pkg/agent/storage/utils.go#L124-L153 - when does this code get executed?

For 1) it is used by the single-model serving storage initializer, where we inject an init container to download models before starting the model server.
For 2) this is the multi-model implementation, where a long-running model agent is injected as a sidecar to dynamically update models without tearing down pods.

For historical reasons they are implemented differently; we are planning to merge these implementations with the ModelMesh model puller.

@matty-rose requested review from surajkota and removed request for adriangonz, yuzisun and Iamlovingit August 22, 2022 06:40

@surajkota left a comment

Thanks for the responses. One change request; after that, this PR LGTM.


Adds S3 configuration options through the following
- `inferenceservice` configmap (global options)
- k8s service account annotations for AWS IRSA
- k8s secret annotations attached to k8s service account for static
  credentials

Signed-off-by: Matthew Rose <matthew.rose@maxkelsen.com>
@surajkota left a comment

Great work and explanation! Once we take it to the website, I think it would make users happy

@matty-rose requested review from yuzisun and removed request for surajkota August 25, 2022 00:02
@yuzisun
Member

yuzisun commented Aug 25, 2022

Thanks @matty-rose for the great work and @surajkota for the detailed review!

/lgtm
/approve

@kserve-oss-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matty-rose, surajkota, yuzisun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kserve-oss-bot merged commit 3994c63 into kserve:master Aug 25, 2022
alexagriffith pushed a commit to alexagriffith/kserve that referenced this pull request Sep 19, 2022
Adds S3 configuration options through the following
- `inferenceservice` configmap (global options)
- k8s service account annotations for AWS IRSA
- k8s secret annotations attached to k8s service account for static
  credentials

Signed-off-by: Matthew Rose <matthew.rose@maxkelsen.com>

Signed-off-by: Matthew Rose <matthew.rose@maxkelsen.com>
Signed-off-by: alexagriffith <agriffith96@gmail.com>
Development

Successfully merging this pull request may close these issues.

IAM Role for Service Account(IRSA) support for models stored in S3
4 participants