Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk): Enable AWS ALB authentication in KFP SDK Client #4182

Closed
wants to merge 8 commits into from

Conversation

Jeffwan
Copy link
Member

@Jeffwan Jeffwan commented Jul 8, 2020

Description of your changes:

Currently, kfp client can not be used outside AWS EKS cluster. Application load balancer manages outside traffic and require authentication before traffic coming into mesh. This PR automates ALB authentication and get session cookie to authenticate KFP python client to Kubeflow cluster.

This unblocks user to submit pipeline/run outside kubeflow cluster and user can integrate with their CI/CD solutions much easier.

Cognito or OIDC behind ALB both can leverage this solution.

Usage:

# Option 1. Pass username and password

from kfp import Client
HOSTNAME='https://kubeflow.platform.shanjiaxin.com/pipeline'
client = Client.create_cognito_authenticated(HOSTNAME, username='<your_username>', password='<your_passworkd>')
client.list_pipelines()

# Option 2. Interactively type your password without echoing.
client = Client.create_cognito_authenticated(HOSTNAME)


>> from kfp import Client
>>> HOSTNAME='https://kubeflow.platform.shanjiaxin.com/pipeline'
>>> client = Client.create_cognito_authenticated(HOSTNAME)

[WDM] - Current google-chrome version is 83.0.4103
INFO:WDM:Current google-chrome version is 83.0.4103
....
[/Users/shjiaxin/.config/kfp/chromedriver/drivers/chromedriver/mac64/83.0.4103.39]

Username: shjiaxin
Password:
>>> client.list_pipelines()
{'next_page_token': None,
 'pipelines': [{'created_at': datetime.datetime(2020, 7, 8, 7, 0, 13, tzinfo=tzutc()),
                'default_version': {'code_source_url': None,
                                    'created_at': datetime.datetime(2020, 7, 8, 7, 0, 13, tzinfo=tzutc()),
                                    'id': 'cd5251a2-18f6-40e6-ae7d-a7f1b9297928',

Checklist:

  • The title for your pull request (PR) should follow our title convention. Learn more about the pull request title convention used in this repository.

    PR titles examples:

    • fix(frontend): fixes empty page. Fixes #1234
      Use fix to indicate that this PR fixes a bug.
    • feat(backend): configurable service account. Fixes #1234, fixes #1235
      Use feat to indicate that this PR adds a new feature.
    • chore: set up changelog generation tools
      Use chore to indicate that this PR makes some changes that users don't need to know.
    • test: fix CI failure. Part of #1234
      Use part of to indicate that a PR is working on an issue, but shouldn't close the issue when merged.
  • Do you want this pull request (PR) cherry-picked into the current release branch?

    If yes, use one of the following options:

    • (Recommended.) Ask the PR approver to add the cherrypick-approved label to this PR. The release manager adds this PR to the release branch in a batch update.
    • After this PR is merged, create a cherry-pick PR to add these changes to the release branch. (For more information about creating a cherry-pick PR, see the Kubeflow Pipelines release guide.)

Can we pick this PR to 1.0.0 release?

@kubeflow-bot
Copy link

This change is Reviewable

@Jeffwan Jeffwan changed the title Enable AWS ALB authentication in KFP SDK Client feat(sdk): Enable AWS ALB authentication in KFP SDK Client Jul 8, 2020
@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 8, 2020

@Ark-kun Is there a tool to format sdk projects?

@numerology
Copy link

@Ark-kun Is there a tool to format sdk projects?

I think you can try yapf. There is a config under pipeline/.style.yapf you can use.

@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 9, 2020

@numerology Thanks. I find your original PR #2446 talking about coding styles.

I tried to run the command however, its shows some hints on existing codes as well. I am wondering if CI has check on this and block the merge? Is this required step or optional?

@numerology
Copy link

Is this required step or optional?

I think it's optional. The presubmit CI checks do not include this style check particularly.

@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 9, 2020

There's a python compatibility issue in the code. I will fix it today

@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 9, 2020

@numerology @Ark-kun @Bobgy

I see the CI build with different python versions. Do we have to make sure python codes need to be compatible with 3.5 3.6 and 3.7? That means I can not use some library >=3.6?

@Bobgy
Copy link
Contributor

Bobgy commented Jul 9, 2020

@Jeffwan you are right.

/cc @Ark-kun
Can explain more

@Ark-kun Ark-kun self-assigned this Jul 11, 2020
@Ark-kun
Copy link
Contributor

Ark-kun commented Jul 11, 2020

Thank you for working on new auth methods.
I think it's useful to have such functionality in KFP.

Several high-level comments:

  1. We want to try and keep the interface of the kfp.Client platform-agnostic. It supports authentication using OAuth tokens. I wonder whether we can avoid adding AWS-specific and specialized (Cognito) methods to Client? If they're needed, then maybe they should go into kfp/aws.py and then communicate with kfp.Client using token or cookie.

  2. The selenium and chrome dependencies might be pretty heavy. I'm not sure we should add them

  3. I'm not sure using password in code and emulating the browser to authenticate using password conforms to AWS security guidelines. AFAIK, usually the password-based authentication happens outside the programs/libraries and the library only receives the token.

  4. Authentication using OAuth and tokens might be preferred to authentication using passwords and cookies.

/cc @eterna2

@Ark-kun
Copy link
Contributor

Ark-kun commented Jul 11, 2020

I've found this article which describes a way to programmatically authenticate with Cognito and get a token that can be used with the standard Bearer authentication header.

You use ClientId and ClientSecret to send a token request and get back a JWT token that can be used for sending requests.

$ curl -X POST \
  https://lobster1234.auth.us-east-1.amazoncognito.com/oauth2/token \
  -H 'authorization: Basic <base64-encoded (ClientId:ClientSecret)>' \
  -H 'content-type: application/x-www-form-urlencoded' \
  -d 'grant_type=client_credentials&scope=transactions%2Fpost'

@Bobgy
Copy link
Contributor

Bobgy commented Jul 13, 2020

The selenium and chrome dependencies might be pretty heavy. I'm not sure we should add them

+1, this seems a big no to me too.

@eterna2
Copy link
Contributor

eterna2 commented Jul 13, 2020

I agree too. Selenium is too heavy. I can be wrong but I think ALB only support cookies for now, and not bearer authorization.

I would propose using the popular requests lib to do a headless authentication, we still can retrieve the session cookie this way.

https://requests.readthedocs.io/en/latest/user/quickstart/#cookies

I think we can also just create a separate module which returns a cookie or header etc.

import kfp
import kfp.auth

# auth_func is a callable that return either a dataclass or tuple - i.e. auth headers, cookie
client = kfp.Client(auth_func=kfp.auth.aws_alb_cognito(<some-hostname>, user=secret.user(), pwd=secret.pwd()))

@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 17, 2020

I've found this article which describes a way to programmatically authenticate with Cognito and get a token that can be used with the standard Bearer authentication header.

You use ClientId and ClientSecret to send a token request and get back a JWT token that can be used for sending requests.

$ curl -X POST \
  https://lobster1234.auth.us-east-1.amazoncognito.com/oauth2/token \
  -H 'authorization: Basic <base64-encoded (ClientId:ClientSecret)>' \
  -H 'content-type: application/x-www-form-urlencoded' \
  -d 'grant_type=client_credentials&scope=transactions%2Fpost'

@Ark-kun @Bobgy Thanks for the feedback. Using Cognito separately is a different flow. We are using application load balancer with Cognito in this case, I talked with cognito team internally and still need to double check with ALB team on this. Currently, as I know, ALB doesn't accept intial Auth token from Cognito directly.

Totally agree this is kind of heavy but seems there's no other ways to easily authenticate user

@Jeffwan
Copy link
Member Author

Jeffwan commented Jul 17, 2020

@eterna2 do you know any ways to pass user info to ALB if we use requests?

@Jeffwan
Copy link
Member Author

Jeffwan commented Aug 6, 2020

@eterna2

The limitation of Solution 1 is Cognito can connect to any IDP, current PR (dom structure) is based on native Cognito. If user change to Okta, Auth0, the login page will be different. That's my major concerns.

Solution 2. I am not sure if the token from cognito can be used in ALB directly. If that's supported, I think that makes sense to use sdk to talk to cognito to exchange the token.

Based on most user's feedback, I think using client_secret is ok. Most users need this in their CI/CD system, which means this will be used by robot account..

@NikeNano
Copy link
Member

Are there any progress on this @Jeffwan, happy to help out if you need help?

@cloudbow
Copy link

This is a very important pull request for aws users as this is the only way we can use multi user isolation of pipelines programatically.

@davidspek
Copy link
Contributor

Ping @PatrickXYS

@chensun chensun force-pushed the master branch 2 times, most recently from 7542f0e to 44d22a6 Compare February 12, 2021 09:23
@tkislan
Copy link

tkislan commented Feb 12, 2021

Is Authenticating to Kubeflow API on AWS somewhere on the roadmap anytime soon?
Because without this, this whole project is just worthless for production deployment, since you need trigger everything through UI only

@NikeNano
Copy link
Member

@kubeflow/aws is this being worked on? Feels lika an important feature in order to support kubeflow on aws as pointed out #4182 (comment). Would be happy to help out with the efforts as well.

@PatrickXYS
Copy link
Member

I don't think anyone is doing this part, I'll be working on this when available.

@NikeNano
Copy link
Member

NikeNano commented Feb 18, 2021

Awesome @PatrickXYS, I am happy to help out if you need any help and you could give some pointer on where to read up on how to best solve this.

@jaystary
Copy link

Is this still active/worked on?

@Deseram
Copy link

Deseram commented Apr 27, 2021

Hi guys, is anyone looking into this? would be great if we can have this feat soon 🙏

@eterna2
Copy link
Contributor

eterna2 commented Apr 27, 2021

Hi guys, is anyone looking into this? would be great if we can have this feat soon 🙏

I think no one at the moment. Because there was some challenges in programmatically authenticate against alb.

I haven't figure out what sort of checks cognito is doing when I try to mimic a login w/o actually having a headless browser (e.g. puppeteer or selenium).

Not sure if there is some x-csrf token or maybe even source up check.

Tldr when I try to programmatically login, cognito reject the oauth call.

@mtszkw
Copy link

mtszkw commented May 27, 2021

Any progress on this? 😥 Is there any chance this can be done at all? @PatrickXYS

@davidspek
Copy link
Contributor

For those getting hung up on this, ArgoFlow for AWS is a distribution I started that deploys the Kubeflow (1.3) manifests using Argo CD and has integrations for AWS (such as S3 and RDS for pipelines). We re-implemented auth with Istio using OAuth2-Proxy which should make it easier to integrate with different OIDC providers. The current default setup is using an NLB in instance mode, where Istio handles ingress certificates on the gateways. With OAuth2-Proxy it shouldn't be difficult to set it up with Cognito. I'm not sure if that changes the SDK situation much, but given that the authentication is handles by Istio I expect the current auth mechanisms in the SDK would work.

os.chmod(DEFAULT_CHROME_DRIVER_PATH, stat.S_IEXEC)


def mkdir(path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If defining this here it makes sense to use it on line 318 as well.

In both places I'm curious if there is a reason to avoid the following:
os.makedirs(path, exist_ok=True)

@Ark-kun Ark-kun assigned chensun and Bobgy and unassigned Ark-kun Jul 22, 2021
@stale
Copy link

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Mar 2, 2022
@rimolive
Copy link
Member

Closing this PR. No activity for more than a year.

/close

@stale stale bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Mar 24, 2024
Copy link

@rimolive: Closed this PR.

In response to this:

Closing this PR. No activity for more than a year.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@google-oss-prow google-oss-prow bot closed this Mar 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.