feat(backend): add namespace & prefix scoped credentials to kfp-launcher config for object store paths #10625
Because any of the configs might need to be overwritten for a specific bucket+prefix combination (including endpoint, region, and credentials) it might make more sense to use a structure like:

    defaultPipelineRoot: "s3://my-s3-bucket-1/pipelines"
    providers:

      ## ==========================
      ## configs for `s3://` paths
      ## ==========================
      s3:

        ## default configs, if not overridden
        default:
          endpoint: "s3.amazonaws.com"
          region: "us-west-2"
          disableSSL: false
          credentials:
            fromEnv: false
            secretRef:
              secretName: "my-s3-secret-1"
              accessKeyKey: "AWS_ACCESS_KEY_ID"
              secretKeyKey: "AWS_SECRET_ACCESS_KEY"

        ## configs for specific bucket + prefix combinations
        overrides:

          ## configs for `s3://my-s3-bucket-2/SOME/KEY/PREFIX`
          - bucketName: "my-s3-bucket-2"
            keyPrefix: "SOME/KEY/PREFIX"
            #endpoint: "s3.amazonaws.com"
            #region: "us-west-1"
            #disableSSL: false
            credentials:
              #fromEnv: false
              secretRef:
                secretName: "my-s3-secret-2"
                accessKeyKey: "AWS_ACCESS_KEY_ID"
                secretKeyKey: "AWS_SECRET_ACCESS_KEY"

      ## ==========================
      ## configs for `minio://` paths
      ## ==========================
      minio:
        ...
        (same structure as `s3`)
        ...

      ## ==========================
      ## for `gcs://` paths
      ## ==========================
      gcs:

        ## default configs, if not overridden
        default:
          credentials:
            fromEnv: false
            secretRef:
              secretName: "my-gcs-secret-1"
              tokenKey: "service_account.json"

        ## configs for specific bucket + prefix combinations
        overrides:

          ## configs for `gcs://my-gcp-bucket-1/SOME/KEY/PREFIX`
          - bucketName: "my-gcp-bucket-1"
            keyPrefix: "SOME/KEY/PREFIX"
            credentials:
              #fromEnv: false
              secretRef:
                secretName: "my-gcs-secret-2"
                tokenKey: "service_account.json"
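To make the intended lookup concrete, here is a minimal Python sketch of how a `default` + `overrides` structure like the one proposed above could be resolved for a given path. It is not the launcher's Go implementation. The function names (`deep_merge`, `resolve_config`), the first-match-wins rule, the plain `startswith` prefix test, and the field-by-field fallback to `default` are all assumptions made for illustration.

```python
from typing import Optional

# Mirrors the `s3` part of the proposed YAML; commented-out override fields are simply absent.
PROVIDERS = {
    "s3": {
        "default": {
            "endpoint": "s3.amazonaws.com",
            "region": "us-west-2",
            "disableSSL": False,
            "credentials": {
                "fromEnv": False,
                "secretRef": {
                    "secretName": "my-s3-secret-1",
                    "accessKeyKey": "AWS_ACCESS_KEY_ID",
                    "secretKeyKey": "AWS_SECRET_ACCESS_KEY",
                },
            },
        },
        "overrides": [
            {
                "bucketName": "my-s3-bucket-2",
                "keyPrefix": "SOME/KEY/PREFIX",
                "credentials": {
                    "secretRef": {
                        "secretName": "my-s3-secret-2",
                        "accessKeyKey": "AWS_ACCESS_KEY_ID",
                        "secretKeyKey": "AWS_SECRET_ACCESS_KEY",
                    },
                },
            },
        ],
    },
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied, recursing into nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(providers: dict, scheme: str, bucket: str, key: str) -> Optional[dict]:
    """Pick the first override matching bucket + key prefix, else fall back to the default."""
    provider = providers.get(scheme)
    if provider is None:
        return None  # no config for this scheme at all
    default = provider.get("default", {})
    for override in provider.get("overrides", []):
        if override["bucketName"] == bucket and key.startswith(override["keyPrefix"]):
            # Matching fields are not part of the resolved config itself.
            fields = {k: v for k, v in override.items() if k not in ("bucketName", "keyPrefix")}
            return deep_merge(default, fields)
    return deep_merge(default, {})


if __name__ == "__main__":
    print(resolve_config(PROVIDERS, "s3", "my-s3-bucket-2", "SOME/KEY/PREFIX/run-1"))
```

In this sketch, fields left out of an override (such as `region`) inherit from `default`. A real implementation would also need to decide whether `SOME/KEY/PREFIX` should match `SOME/KEY/PREFIXES`, which a bare `startswith` does.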
Also, I haven't thoroughly tested, but my initial thoughts are:
thanks @thesuperzapper for the feedback but to quickly address some of your points:
yeah this is one of the considerations listed above, I support this idea, I'll make the amendments
In both instances being pointed out, if there are no However, it seems like there is definitely no support for IRSA with the new extension to the config. Your suggestion
Similar to the above, I think the default behavior should work. But the config option is clearly not configured with the right fields (I'll have to do my research on gcs here; admittedly I'm not as familiar with it). In the community call @zijianjoy also suggested separating out the configs and catering them specifically to each provider (for example, region for minio doesn't make much sense here). Will update this.
I'll take a look at this as well, thanks!
@HumairAK MinIO actually does need to allow But an even more important reason is that the |
Interesting, did not know this, will include this then as optional for minio
I would have expected the |
/test kubeflow-pipelines-samples-v2 |
/lgtm |
@rimolive this PR is currently missing the UI component, so it's not ready for merge (unless we want to do that in a follow-up PR). Until then I've kept the WIP prefix.
@HumairAK But the PR is a backend feature, right? For the frontend side, you should have a second PR with This looks good for a backend PR, so I don't see a problem with merging it. |
/lgtm
Just some minor comments on the naming convention. Otherwise it looks good to me. From the e2e test, I think it didn't break any of the existing frontend features right?
@HumairAK I just want to confirm whether you are actually ready to have this PR reviewed. Are there any remaining tasks from #10625 (comment) and our other discussions?
@thesuperzapper the backend portion is ready for review. Based on the discussion above, I will follow up with a separate PR to provide the frontend support for this. This PR should cover everything in your comment, save for the UI bit.
@HumairAK I want to make sure we don't break the "query parameter" approach introduced by #10319, which allows bucket roots to look like this:
(See https://gocloud.dev/howto/blob/#s3-compatible for the syntax) I am not sure what the best way to deal with this is. Perhaps we should just ignore pipeline roots that have query parameters set, and revert back to using Also, the query parameter approach right now only works for We might as well do that in this PR also. |
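As a rough illustration of the concern, the sketch below splits a pipeline root into bucket, key prefix, and query parameters, so a caller could detect roots that carry their own query-parameter settings. It is a Python sketch, not the backend's Go code. The helper name `split_pipeline_root` and the example URL are hypothetical; the parameter names in the example (`region`, `endpoint`, `disableSSL`) are assumed to follow the gocloud.dev S3-compatible syntax linked above.

```python
from urllib.parse import parse_qs, urlsplit


def split_pipeline_root(root: str) -> dict:
    """Split a pipeline root URL into scheme, bucket, key prefix, and query parameters."""
    parts = urlsplit(root)
    # parse_qs returns a list per key; keep the last value if a key is repeated.
    params = {name: values[-1] for name, values in parse_qs(parts.query).items()}
    return {
        "scheme": parts.scheme,
        "bucket": parts.netloc,
        "prefix": parts.path.lstrip("/"),
        "params": params,
        "has_query": bool(params),
    }


if __name__ == "__main__":
    # Hypothetical root using the query-parameter approach.
    root = "s3://my-s3-bucket-1/pipelines?region=us-west-2&endpoint=s3.amazonaws.com&disableSSL=false"
    print(split_pipeline_root(root))
```

A caller could use `has_query` to decide whether to skip the `providers` lookup for that root, which is one of the options raised in the comment above.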
@thesuperzapper Tested query parameters with minio and AWS S3 and it worked for me. Also tested with GCS buckets, with fromEnv = true and and You can easily test this PR by building the launcher/driver and setting:
Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Instead of only reading the kfp-launcher when a custom pipeline root is specified, the root dag will now always read the kfp-launcher config to search for a matching bucket if such a configuration is provided in kfp-launcher Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Provides a structured configuration for bucket providers, whereby user can specify credentials for different providers and path prefixes. A new interface for providing sessions is introduced, which should be implemented for any new provider configuration support. Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Utilizes blob provider specific constructors to open s3, minio, gcs accordingly. If a sessioninfo is provided (via kfp-launcher config) then the associated secret is fetched for each case to gain credentials. If fromEnv is provided, then the standard url opener is used. Also separates out config fields and operations to a separate file. Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
retrieves the session info (if provided via kfp-launcher) and utilizes it for opening the provider's associated bucket Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
also added some additional code comments clarifying store cred variable usage Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
as well as update validation logic for provider config, and fix tests accordingly. Signed-off-by: Humair Khan <HumairAK@users.noreply.github.com>
@HumairAK: The following test failed, say
/lgtm cc @chensun |
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: chensun, Tomcli
Hi all, |
@sanchesoon do you mean as a field for the kfp-launcher? can you raise another issue for that? |
No problem! and thanks for the feedback. |
Description of your changes:
resolves most of: #9689 and #10651
the mlpipeline endpoint hardcoding is not addressed here.
This PR extends the kfp-launcher config to support different credentials for s3/minio/gcs for different path prefixes. The config is parsed in the driver and passed to the executor via an MLMD context custom property.
A fleshed out kfp-launcher configmap can look like this:
where the `<provider_config>` can look like:
In this example, a `defaultPipelineRoot: 'gs://my-gcp-bucket-1/SOME/OTHER/KEY/PREFIX'` would select the first `gs` provider with the matching `keyprefix`, in this case that would be:
See the unit tests in env_test.go for the various edge cases; please feel free to identify others that ought to be covered. The `testdata` in the test cases is pulled from here.
This config is parsed in the driver, and when the matching credentials are identified, the information is stored as an MLMD context property (stringified JSON) with name `store_session_info`. An example of what this property looks like:
For GS buckets:
For AWS S3 buckets (minio and other S3-compatible stores are identical):
If `fromEnv` is set to `true`, then the secret information is omitted from `Params`.
Doing this in the driver allows us to fetch the configmap once per pipeline execution, and afterwards retrieve the information from the context custom properties.
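The following Python sketch shows one way the matched credentials could be turned into the stringified JSON stored under `store_session_info`. Only the property name, the `Params` field, and the `fromEnv` rule come from the description above. The `Provider` key, the helper name `build_session_info`, and the exact layout of `Params` are assumptions for illustration, not the format the driver actually emits.

```python
import json


def build_session_info(provider: str, credentials: dict) -> str:
    """Serialize the matched provider credentials as a JSON string.

    Assumed layout: {"Provider": ..., "Params": {...}}. The "Provider" key is hypothetical.
    """
    from_env = bool(credentials.get("fromEnv", False))
    params = {"fromEnv": from_env}
    if not from_env:
        # Only secret *references* (secret name and key names) are stored, never secret values.
        params.update(credentials.get("secretRef", {}))
    return json.dumps({"Provider": provider, "Params": params}, sort_keys=True)


if __name__ == "__main__":
    gcs_credentials = {
        "fromEnv": False,
        "secretRef": {"secretName": "my-gcs-secret-1", "tokenKey": "service_account.json"},
    }
    # The resulting string would be stored as the value of the context custom property.
    print(build_session_info("gs", gcs_credentials))
    print(build_session_info("gs", {"fromEnv": True, "secretRef": gcs_credentials["secretRef"]}))
```

With `fromEnv` set to `true`, the second call produces `Params` containing only the `fromEnv` flag, matching the rule that secret information is omitted in that case.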
Considerations
If `providers` is never specified in `kfp-launcher`, then users should not notice a difference, regardless of whether they are using the default minio, gcs, etc.
Checklist: