Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model Serving not working on AWS Kubeflow with Cognito #1154

Closed
seizadi opened this issue Oct 22, 2020 · 14 comments
Closed

Model Serving not working on AWS Kubeflow with Cognito #1154

seizadi opened this issue Oct 22, 2020 · 14 comments

Comments

@seizadi
Copy link

seizadi commented Oct 22, 2020

/kind bug

I have a Kubeflow on AWS EKS using AWS Cognito with ALB. Kubeflow dashboard, notebook server and pipelines work fine. I have problem with kfserving model API access for model prediction. The model deploys in my namespace:

kubectl get -n seizadi inferenceservices
NAME           URL                                                                       READY   DEFAULT TRAFFIC   CANARY TRAFFIC   AGE
sklearn-iris   http://sklearn-iris.seizadi.platform.example.com/v1/models/sklearn-iris   True    100                                31h

I get the session cookie from my browser and try a regular request to make sure my session cookie is valid:

import requests
# api-endpoint 
url = 'https://kubeflow.platform.example.com/pipeline/apis/v1beta1/pipelines?page_token=&page_size=10&sort_by=created_at%20desc&filter='
cookies = {'AWSELBAuthSessionCookie-0': 'xxxxxx'
           'AWSELBAuthSessionCookie-1': 'xxxxxx'
          }
r = requests.get(url=url, cookies=cookies)
result = r.json()

This request works fine and now I try to create a kfserving model prediction request:

url =https://kubeflow.platform.example.com/v1/models/sklearn-iris:predictheaders={‘Host’: ‘sklearn-iris.seizadi.platform.example.com’}
# data to be sent to api
data = { ‘instances’: [ [6.8,  2.8,  4.8,  1.4], [6.0,  3.4,  4.5,  1.6] ] }
r = requests.post(url=url, cookies=cookies, data=data, headers=headers)
print(r) 

What did you expect to happen:
I would expect the API request to return model prediction result, but
instead I get 404 error in response.

From istio-gateway looks like this request is not routed properly, it is sent to centraldashboard.kubeflow.svc.cluster.local

[2020-10-22T02:11:42.455Z] "POST /v1/models/sklearn-iris:predict HTTP/1.1" 404 - "-" 111 170 4 1 "73.241.144.80,192.168.68.197" "python-requests/2.24.0" "44bfcc13-45d1-9d97-b207-b75504c24fe3" "sklearn-iris.seizadi.platform.example.com" "192.168.11.138:8082" outbound|80||centraldashboard.kubeflow.svc.cluster.local - 192.168.80.224:80 192.168.68.197:6436 -

Environment:

  • Istio Version:1.1.6

  • Knative Version:
    kubectl get namespace knative-serving -o 'go-template={{index .metadata.labels "serving.knative.dev/release"}}'
    v0.11.1%

  • KFServing Version:
    image: gcr.io/kfserving/kfserving-controller:0.2.2

  • Kubeflow version:
    ❯ kfctl version
    kfctl v1.0.2-0-ga476281

Server:
kfctl_aws_cognito.v1.0.2.yaml
https://github.com/kubeflow/manifests/blob/master/kfdef/kfctl_aws_cognito.v1.0.2.yaml

  • Kfdef:[k8s_istio/istio_dex/gcp_basic_auth/gcp_iap/aws/aws_cognito/ibm]
    aws_cognito

  • Minikube version:

  • Kubernetes version: (use kubectl version):

  • OS (e.g. from /etc/os-release):

AWS EKS

❯ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15+", GitVersion:"v1.15.11-eks-065dce", GitCommit:"065dcecfcd2a91bd68a17ee0b5e895088430bd05", GitTreeState:"clean", BuildDate:"2020-07-16T01:44:47Z", GoVersion:"go1.12.17", Compiler:"gc", Platform:"linux/amd64"}
@issue-label-bot
Copy link

Issue-Label Bot is automatically applying the labels:

Label Probability
area/inference 0.96

Please mark this comment with 👍 or 👎 to give our bot feedback!
Links: app homepage, dashboard and code for this bot.

@issue-label-bot
Copy link

Issue Label Bot is not confident enough to auto-label this issue.
See dashboard for more details.

@yuzisun
Copy link
Member

yuzisun commented Oct 23, 2020

@seizadi Please upgrade to kubeflow 1.1 if possible for KFServing, there are known issues in 1.0.2 see #928 .

@seizadi
Copy link
Author

seizadi commented Oct 23, 2020

I can upgrade to Kubeflow 1.1 and report, I did look at open issues with kfserving and noted this open issue for GCP IAP #824.

Will I have similar problem with AWS Cognito trying to get Host routing to work?

@yuzisun
Copy link
Member

yuzisun commented Oct 23, 2020

I can upgrade to Kubeflow 1.1 and report, I did look at open issues with kfserving and noted this open issue for GCP IAP #824.

Will I have similar problem with AWS Cognito trying to get Host routing to work?

I think so, there is walk-around you can do but we'd definitely recommend upgrading to kubeflow 1.1.

@seizadi
Copy link
Author

seizadi commented Oct 23, 2020

Ok, let me rebuild the cluster and we can discuss the work-around.

Similar to that issue with the work-around will AuthZ need to be disabled?

@seizadi
Copy link
Author

seizadi commented Oct 23, 2020

Is there an IDP solution that will work with AuthZ, there is the Dex option, https://www.kubeflow.org/docs/started/k8s/kfctl-istio-dex/ ?

@yuzisun
Copy link
Member

yuzisun commented Oct 23, 2020

Is there an IDP solution that will work with AuthZ, there is the Dex option, https://www.kubeflow.org/docs/started/k8s/kfctl-istio-dex/ ?

@seizadi checkout the dex example, AuthZ is not integrated with KFServing in kubeflow, so you would still need to either manually create the istio auth policy or disable the sidecar like in the example.

@seizadi
Copy link
Author

seizadi commented Nov 5, 2020

Ok, I finally was able to upgrade to 1.1.0, it was not as easy as I expected:
kubeflow/kubeflow#5370 (comment)

So I'm back to where I am in applying the model:

kubectl -n seizadi apply -f https://raw.githubusercontent.com/kubeflow/kfserving/master/docs/samples/sklearn/sklearn.yaml

and get the 404 error, how should I try to patch this to work with Kubeflow 1.1.0?

@seizadi
Copy link
Author

seizadi commented Nov 6, 2020

I followed instructions for
GCloud IAP.

This is the updated python script I used to make request:

url = 'https://kubeflow.platform.example.com/kfserving/seizadi/sklearn-iap:predict'

# data to be sent to api 
data = { 'instances': [ [6.8,  2.8,  4.8,  1.4], [6.0,  3.4,  4.5,  1.6] ] } 

resp = requests.post(url=url, cookies=cookies, data=data)

if resp.status_code == 403:
    print('Service account does not have permission to access the application.')
elif resp.status_code == 404:
    print('Host route not available to access the application.')
elif resp.status_code != 200:
    print('Bad response from application: {!r} / {!r} / {!r}'.format(
        resp.status_code, resp.headers, resp.text))
else:
    print(resp.text)

The request makes it way to the predictor, but I get a 400 error:

❯ k logs sklearn-iap-predictor-default-stx69-deployment-5844ff4b5c-ttbnc kfserving-container
[I 201106 00:52:49 storage:35] Copying contents of /mnt/models to local
....
[W 201106 00:59:39 web:2250] 400 POST /v1/models/sklearn-iap:predict (127.0.0.1) 1.31ms

Here is the error message I get from my client:

Bad response from application: 400 / {'Date': 'Fri, 06 Nov 2020 02:23:52 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Content-Length': '191', 'Connection': 'keep-alive', 'Set-Cookie': 'AWSELBAuthSessionCookie-0=xxxxx; Expires=Fri, 18 Sep 2071 03:28:04 GMT; Path=/; Secure; HttpOnly, AWSELBAuthSessionCookie-1=xxxx; Expires=Fri, 18 Sep 2071 03:28:04 GMT; Path=/; Secure; HttpOnly', 'server': 'istio-envoy', 'x-envoy-upstream-service-time': '5'} / '<html><title>400: Unrecognized request format: Expecting value: line 1 column 1 (char 0)</title><body>400: Unrecognized request format: Expecting value: line 1 column 1 (char 0)</body></html>'

Looks like the data is not getting to the
kfserving json decoder.

I'm not sure how you debug this type of problem, ideally you want to follow the request from ALB -> Ingress Gateway -> Predictor.

@seizadi
Copy link
Author

seizadi commented Nov 6, 2020

I added small fix to the client script to send out request in JSON:

url = 'https://kubeflow.platform.example.com/kfserving/seizadi/sklearn-iap:predict'

# data to be sent to api 
data = { 'instances': [ [6.8,  2.8,  4.8,  1.4], [6.0,  3.4,  4.5,  1.6] ] } 

resp = requests.post(url=url, cookies=cookies, json=data)

if resp.status_code == 403:
    print('Service account does not have permission to access the application.')
elif resp.status_code == 404:
    print('Host route not available to access the application.')
elif resp.status_code != 200:
    print('Bad response from application: {!r} / {!r} / {!r}'.format(
        resp.status_code, resp.headers, resp.text))
else:
    print(resp.text)

It generates the prediction result:

{"predictions": [1, 1]}

@seizadi seizadi closed this as completed Nov 6, 2020
@karlschriek
Copy link
Contributor

karlschriek commented Nov 30, 2020

@seizadi could you maybe provide some more explanation on what you did here? I am similarly not able to get the cookie-based auth to route correctly. I am following the AWS end-to-end instructions. Have posted an issue relating to that here: kubeflow/website#2378

Are you suggesting that one should use the GCloud IAP approach as above instead?

Also, why is your url https://kubeflow.platform.example.com/v1/models/sklearn-iris:predict and not https://sklearn-iris.kubeflow.platform.example.com/v1/models/sklearn-iris:predict as is suggested in the AWS end-to-end guide? Did you set the Knative network config to just {{Namespace}}.{{Domain}} (as opposed to the default {{Name}}.{{Namespace}}.{{Domain}} )?

@seizadi
Copy link
Author

seizadi commented Nov 30, 2020

I don't work on this project so I don't get notification like the one you posted on kubeflow/website#2378. Probably need a guide like GCloud IAP written for AWS Cognito, so you have guide to follow.

I read your issue and looks like you have not upgraded to Kubelow 1.1.x from 1.0.2 so that will be the first thing to do as there are changes you need from that release and you don't want to patch 1.0.2 with those changes.

To answer your last question once you read the GCloud IAP guide, similarly AWS Cognito, exposes what is called path-based routing. In contrast KFserving uses host-based routing. The solution is described here in using Istio Virtual Service. In this model you have path-based routing exposed to the public and you map is to a host-based route that kfserving can consume.

@karlschriek
Copy link
Contributor

@seizadi do the changes relate to Kubeflow 1.1.x or does it have to do with a newer version of KFServing? We are not upgrading to 1.1 (or 1.2 for that matter) yet because the multi-tenancy for Pipelines also faces authentication problems at the moment which we don't have a suitable solution for. We are currently running KFServing 0.3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants