
[feature] remove hard-coded configs in KFP V2 launcher #9689

Open
thesuperzapper opened this issue Jun 29, 2023 · 14 comments

@thesuperzapper (Member) commented Jun 29, 2023

Feature Area

/area backend

Related Issues

The Problem

Currently, we hard-code a few important configs in KFP V2's launcher and cache code.
This makes it difficult for distributions to support KFP v2 in a generic way.

Launcher - The minio credentials secret is hard-coded as mlpipeline-minio-artifact:

const minioArtifactSecretName = "mlpipeline-minio-artifact"

Launcher - The minio credentials secret keys are hard-coded as accesskey and secretkey:

accessKey := string(secret.Data["accesskey"])
secretKey := string(secret.Data["secretkey"])

Launcher - The minio endpoint is hard-coded as minio-service.kubeflow:9000:

const defaultMinioEndpointInMultiUserMode = "minio-service.kubeflow:9000"

Cache - The ml-pipeline endpoint is hard-coded as ml-pipeline.kubeflow:8887:

defaultKfpApiEndpoint = "ml-pipeline.kubeflow:8887"

The Solution

We want to allow the above configs to be changed from their defaults, preferably on a per-namespace basis.

I propose we add these configs to the ConfigMap/kfp-launcher, which already exists in each profile namespace to set defaultPipelineRoot.

For example, a new ConfigMap/kfp-launcher in a profile namespace might look like this:

data:
  defaultPipelineRoot: "minio://mlpipeline/v2/artifacts"
 
  ## minio endpoint config
  minioEndpoint: "minio.example.com:9000"
 
  ## minio auth configs
  minioAuthSecret: "mlpipeline-minio-artifact"
  minioAuthSecretAccessKeyKey: "access_key"
  minioAuthSecretSecretKeyKey: "secret_key"

  ## ml pipeline endpoint config
  mlPipelineEndpoint: "ml-pipeline.kubeflow:8887"

The question now becomes how to propagate these configs to the kfp-launcher container in each workflow Pod.
I think there are two options:

1. Have each launcher/executor Pod read ConfigMap/kfp-launcher from its own namespace at runtime (a rough sketch of this option follows below).
2. Have the driver read ConfigMap/kfp-launcher and pass the resolved values down to the launcher, the same way defaultPipelineRoot is passed today (via MLMD).
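
As a very rough illustration of Option 1, the launcher could look up ConfigMap/kfp-launcher in its own namespace at startup and fall back to today's hard-coded values when a key is missing. This is only a sketch: the loadLauncherConfig helper and the key names follow the example ConfigMap above and are not existing KFP code.

```go
// A minimal sketch of Option 1, assuming the launcher reads
// ConfigMap/kfp-launcher from its own namespace at startup. The key names
// follow the example ConfigMap above; none of this is existing KFP code.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

type launcherConfig struct {
	MinioEndpoint   string
	MinioAuthSecret string
	AccessKeyKey    string
	SecretKeyKey    string
}

func loadLauncherConfig(ctx context.Context, namespace string) (*launcherConfig, error) {
	restCfg, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	client, err := kubernetes.NewForConfig(restCfg)
	if err != nil {
		return nil, err
	}
	cm, err := client.CoreV1().ConfigMaps(namespace).Get(ctx, "kfp-launcher", metav1.GetOptions{})
	if err != nil {
		return nil, fmt.Errorf("reading ConfigMap/kfp-launcher: %w", err)
	}

	// Use the current hard-coded values as defaults so existing deployments keep working.
	getOrDefault := func(key, def string) string {
		if v, ok := cm.Data[key]; ok && v != "" {
			return v
		}
		return def
	}
	return &launcherConfig{
		MinioEndpoint:   getOrDefault("minioEndpoint", "minio-service.kubeflow:9000"),
		MinioAuthSecret: getOrDefault("minioAuthSecret", "mlpipeline-minio-artifact"),
		AccessKeyKey:    getOrDefault("minioAuthSecretAccessKeyKey", "accesskey"),
		SecretKeyKey:    getOrDefault("minioAuthSecretSecretKeyKey", "secretkey"),
	}, nil
}
```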

Other Thoughts

We could make this slightly more advanced and allow credentials to be "selected" based on the bucket and key prefix being used, because not every pipeline has to use the default pipeline root and may therefore need different credentials (a rough sketch of how this selection could work follows the example below).

For example, we could extend ConfigMap/kfp-launcher like this:

data:
  defaultPipelineRoot: "minio://my-bucket/v2/artifacts"

  ## minio endpoint config
  minioEndpoint: "minio.example.com:9000"

  ## ml pipeline endpoint config
  mlPipelineEndpoint: "ml-pipeline.kubeflow:8887"

  ## bucket auth configs (as a YAML string)
  bucketAuth: |
    ## list of "auth providers" for minio
    minio:

      ## specifies auth for objects under "minio://my-bucket/v2/artifacts/"
      - bucketName: "my-bucket"
        keyPrefix: "v2/artifacts/"
        authFromSecret:
          secretName: "my-secret-in-profile-namespace"
          accessKeyKey: "access_key"
          secretKeyKey: "secret_key"

    ## similar, but for s3 buckets
    s3: []

    ## similar, but for gcs buckets
    gcs: []
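
To illustrate how the launcher might use such a bucketAuth list, here is a hedged sketch of a "first matching prefix wins" lookup. The types and the selectAuth helper mirror the example YAML above and are purely hypothetical, not part of the current codebase.

```go
// A hypothetical lookup over the bucketAuth list above: pick the first rule
// whose bucket matches and whose keyPrefix is a prefix of the object key.
package main

import "strings"

type secretRef struct {
	SecretName   string `json:"secretName"`
	AccessKeyKey string `json:"accessKeyKey"`
	SecretKeyKey string `json:"secretKeyKey"`
}

type bucketAuthRule struct {
	BucketName     string    `json:"bucketName"`
	KeyPrefix      string    `json:"keyPrefix"`
	AuthFromSecret secretRef `json:"authFromSecret"`
}

// selectAuth returns the first matching rule, or nil when no rule applies.
func selectAuth(rules []bucketAuthRule, bucket, objectKey string) *bucketAuthRule {
	for i := range rules {
		if rules[i].BucketName == bucket && strings.HasPrefix(objectKey, rules[i].KeyPrefix) {
			return &rules[i]
		}
	}
	return nil
}
```

When selectAuth returns nil, the launcher would fall back to the default credentials (today, the mlpipeline-minio-artifact secret).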

Love this idea? Give it a 👍.

@thesuperzapper (Member, Author) commented Jun 29, 2023

@chensun @zijianjoy I think this is very important to address before cutting the 2.0.0 release of the backend.

@zijianjoy (Collaborator)

/assign @zijianjoy

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the lifecycle/stale label on Oct 12, 2023
DnPlas added a commit to canonical/minio-operator that referenced this issue Oct 13, 2023
The upstream pipelines project hardcodes the name of the
object storage service to minio-service. The current implementation
sets the service to just minio, which collides with what pipelines
expect. This PR ensures the Service is created with the expected
name and ports.
Refer to kubeflow/pipelines#9689 for more information
DnPlas added a commit to canonical/minio-operator that referenced this issue Oct 17, 2023
* fix: explicitly apply minio-service with name (same message as the commit above)
@ca-scribner (Contributor)

Did this get resolved before the 1.8 release?

@thesuperzapper (Member, Author)

@chensun can you confirm what the status on this is?

ca-scribner added a commit to canonical/kfp-operators that referenced this issue Nov 2, 2023
This:
* removes deprecated `DBCONFIG_USER`, etc, environment variables (they have been replaced by variables of name `DBCONFIG_[driver]CONFIG_*`)
* adds `OBJECTSTORECONFIG_HOST`, `_PORT`, and `_REGION`, which previously were required.  Although currently they seem to be ignored due to kubeflow/pipelines#9689 - but in theory they'll matter again?  Not sure exactly the scope of that issue.
@github-actions bot removed the lifecycle/stale label on Nov 3, 2023
ca-scribner added a commit to canonical/kfp-operators that referenced this issue Nov 16, 2023
* update kfp-api's apiserver configuration (same message as the commit above)
DnPlas pushed a commit to canonical/kfp-operators that referenced this issue Nov 20, 2023 (same commit)
@TobiasGoerke (Contributor)

Just seeing this issue after I created #10318 and implemented #10319.

If implemented, this would enable users to define arbitrary S3 endpoints for artifact storage.
The appropriate env vars (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION) would still need to be added to the pipeline pods in some other way.

Does this help?

@HumairAK (Contributor)

Thank you for this @TobiasGoerke, I think this is an improvement. However, the problem of the hard-coded configs still remains: adding the endpoint as part of the query string seems more like a workaround for this issue, and as you said, the question of propagating credentials is still open.

There is already a precedent for passing the MLMD endpoint info around via a ConfigMap, so it seems to make sense to do the same for the S3 endpoint, as suggested by @thesuperzapper. In the same config we can specify the S3 credentials secret name/keys as well.

@HumairAK (Contributor) commented Jan 16, 2024

@zijianjoy this is a priority item for our team, so we'll be looking to resolve it in our downstream; if you'd like, I can help with implementing it here as well. If you are already working on this and would like to see it through to completion, we'd be interested in your approach so that our solutions stay aligned while we wait for the upstream change!

Some follow ups:

1. Is there an obvious preference between options 1 and 2?

Option 1 seems easier to implement and easier to extend: one does not need to update the MLMD client code every time new properties are added to the config. However, as previously mentioned, it requires that every executor pod fetches the ConfigMap. Is that a major concern?

Option 2 would follow the same pattern we use for reading the pipeline root, so additional metadata fields for the provider credentials would need to be added. In the driver we would find the matching auth provider config in the kfp-launcher ConfigMap and perhaps store it as a JSON string; the launcher would then query MLMD for these configs (a rough sketch of this hand-off follows below).

I would be interested to hear other options/suggestions under consideration.
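
For what it's worth, here is a minimal sketch of that JSON hand-off, assuming the driver serializes the matched provider config into a string property that the launcher later reads back via MLMD. The struct and function names are illustrative only, not what an eventual PR would necessarily use.

```go
// A minimal sketch of the Option 2 hand-off: the driver encodes the resolved
// provider config to JSON, and the launcher decodes the same string after
// fetching it from MLMD. Names are illustrative only.
package main

import "encoding/json"

type providerSecretRef struct {
	SecretName   string `json:"secretName"`
	AccessKeyKey string `json:"accessKeyKey"`
	SecretKeyKey string `json:"secretKeyKey"`
}

type resolvedProviderConfig struct {
	Endpoint   string            `json:"endpoint"`
	DisableSSL bool              `json:"disableSSL"`
	Region     string            `json:"region,omitempty"`
	SecretRef  providerSecretRef `json:"secretRef"`
}

// Driver side: encode the resolved config so it can travel as a plain string.
func encodeProviderConfig(cfg resolvedProviderConfig) (string, error) {
	b, err := json.Marshal(cfg)
	return string(b), err
}

// Launcher side: decode the string fetched from MLMD back into a struct.
func decodeProviderConfig(raw string) (resolvedProviderConfig, error) {
	var cfg resolvedProviderConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}
```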

2. Configmap structure

I like @thesuperzapper's more flexible ConfigMap suggestion, but I have some further suggestions/amendments (a sketch of Go types that could parse this layout follows the example):

    defaultPipelineRoot: "minio://my-bucket/v2/artifacts"
    providers:
        minio:
          # can be another minio instance
          endpoint: "minio.example.com:9000"
          disableSSL: true
          # by default use this secret for this provider
          defaultProviderSecretRef:
            secretName: "my-secret-in-profile-namespace"
            accessKeyKey: "access_key"
            secretKeyKey: "secret_key"
          # optional, first matching prefix in pipelineRoot is used
          # in this example v2/artifacts would be used if using the 
          # defaultPipelineRoot above
          authConfigs:
            - bucketName: "my-bucket"
              keyPrefix: "v2/artifacts/123"
              secretRef:
                secretName: "my-secret-in-profile-namespace-1"
                accessKeyKey: "access_key"
                secretKeyKey: "secret_key"
            - bucketName: "my-bucket"
              keyPrefix: "v2/artifacts/"
              secretRef:
                secretName: "my-secret-in-profile-namespace-2"
                accessKeyKey: "access_key"
                secretKeyKey: "secret_key"

        # Having the ability to override the regions here allows us to leverage other s3 compatible providers
        # and not just aws s3
        s3:
          # allows for pointing to s3 compatible non aws solutions like ceph
          endpoint: "https://s3.amazonaws.com"
          disableSSL: true
          region: us-east-2
          defaultProviderSecretRef:
            secretName: "my-aws-secret-in-profile-namespace"
            accessKeyKey: "access_key"
            secretKeyKey: "secret_key"
          authConfigs: []
        gs: { }
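
For reference, here is a rough sketch of Go types that could unmarshal a config shaped like this proposal, parsed here with sigs.k8s.io/yaml (which honours the json tags). The type and field names simply mirror the YAML keys above and are not existing KFP APIs.

```go
// Hypothetical Go types mirroring the proposed kfp-launcher config layout.
package main

import "sigs.k8s.io/yaml"

type SecretRef struct {
	SecretName   string `json:"secretName"`
	AccessKeyKey string `json:"accessKeyKey"`
	SecretKeyKey string `json:"secretKeyKey"`
}

type AuthConfig struct {
	BucketName string    `json:"bucketName"`
	KeyPrefix  string    `json:"keyPrefix"`
	SecretRef  SecretRef `json:"secretRef"`
}

type ProviderConfig struct {
	Endpoint                 string       `json:"endpoint"`
	DisableSSL               bool         `json:"disableSSL"`
	Region                   string       `json:"region,omitempty"`
	DefaultProviderSecretRef SecretRef    `json:"defaultProviderSecretRef"`
	AuthConfigs              []AuthConfig `json:"authConfigs"`
}

type LauncherConfig struct {
	DefaultPipelineRoot string                    `json:"defaultPipelineRoot"`
	Providers           map[string]ProviderConfig `json:"providers"`
}

// parseLauncherConfig turns the raw YAML (e.g. a ConfigMap value) into structs.
func parseLauncherConfig(data []byte) (*LauncherConfig, error) {
	cfg := &LauncherConfig{}
	if err := yaml.Unmarshal(data, cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}
```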

3. mlPipelineEndpoint

I don't see the need to add this to the kfp-launcher ConfigMap. To me it makes more sense to specify it on the API server as a config/env var and pass it down as a parameter to the driver/kfp-launcher/etc., just like the MLMD address/port.

@thesuperzapper (Member, Author)

@HumairAK I see that you have at least partially fixed this in your fork of KFP:

Would you be willing to raise a PR upstream, so we can all standardize on a fix?

@HumairAK (Contributor)

@thesuperzapper that's the plan, I'll send one out soon

@HumairAK (Contributor) commented Mar 27, 2024

@thesuperzapper I've provided a WIP that is mostly complete; I just need to do more testing and fix an object-store region-related bug.

I welcome feedback on implementation/approach there:

#10625

@github-actions (bot)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions bot added the lifecycle/stale label on May 27, 2024
@hbelmiro (Contributor)

Don't stale.

@stale bot removed the lifecycle/stale label on May 27, 2024
@rimolive (Member) commented Jun 4, 2024

/lifecycle frozen
