TFJobs UI doesn't work behind IAP; React APP needs support IAP? #574

Closed
jlewi opened this issue Apr 3, 2018 · 43 comments · Fixed by kubeflow/training-operator#688

@jlewi
Contributor

jlewi commented Apr 3, 2018

TFJobs UI is deployed on dev.kubeflow.org.

The UI shows up behind IAP but it doesn't work:

  • No TFJobs are listed
  • Creating a job via the UI doesn't work.

Looking at the developer console we see requests to

https://accounts.google.com/o/oauth2/v2/auth?client_id=235037502967-9cpmvs4ljbiqb3ojtnhnhlkkd8d562rl.apps.googleusercontent.com&response_type=code&scope=openid+email&redirect_uri=https://dev.kubeflow.org/_gcp_gatekeeper/authenticate&state=Ci1odHRwczovL2Rldi5rdWJlZmxvdy5vcmcvdGZqb2JzL2FwaS9uYW1lc3BhY2USMEFOa2F0U2dYVjdSYnlHMzVXeGFwR1gxNURTS29TNjNiYnc6MTUyMjc4MDQxODEwOA

This suggests to me that the request is hitting the load balancer, being redirected to sign in for auth verification, and getting rejected.

So I think one of two things is happening:

  1. The request is coming from the server running in K8s and is incorrectly being redirected to the external load balancer, thus hitting IAP when it shouldn't.
  2. The request is coming from the client, and the client needs to be updated to support IAP.

@wbuchwalter Do you know where the request is coming from?
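
One way to see which request triggered the redirect is to decode the `state` parameter of the OAuth URL above. The sketch below assumes only that `state` is base64url-encoded and carries the original request URL as readable text. The exact layout of the field is not described in this thread, so the helper name `originalUrlFromState` and the pattern match are illustrative.

```javascript
// Hypothetical helper: pull the first URL-looking substring out of the
// base64url-encoded `state` parameter. The binary layout of `state` is an
// assumption; only the printable URL is extracted.
function originalUrlFromState(state) {
  const base64 = state.replace(/-/g, '+').replace(/_/g, '/');
  const decoded = Buffer.from(base64, 'base64').toString('latin1');
  const match = decoded.match(/https?:\/\/[\x21-\x7e]+/);
  return match ? match[0] : null;
}

// `state` value copied from the OAuth URL seen in the developer console.
const state =
  'Ci1odHRwczovL2Rldi5rdWJlZmxvdy5vcmcvdGZqb2JzL2FwaS9uYW1lc3BhY2USMEFOa2F0U2dYVjdSYnlHMzVXeGFwR1gxNURTS29TNjNiYnc6MTUyMjc4MDQxODEwOA';

console.log(originalUrlFromState(state));
```

For the URL above this prints `https://dev.kubeflow.org/tfjobs/api/namespace`, so the redirect was triggered by a call to the TFJobs API rather than by loading the page itself. That alone does not settle whether the browser or the server issued the call.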

You should be able to access it at
https://dev.kubeflow.org/tfjobs/ui/

@jlewi jlewi added priority/p1 area/front-end area/training Issues related to training and building models labels Apr 3, 2018
@wbuchwalter
Contributor

@jlewi @ankushagarwal Could one of you grant me access to dev.kubeflow.org so I can test this?
wbuchwalter at gmail.
Thanks!

@jlewi This is most likely option 2.

@kkasravi
Contributor

kkasravi commented Apr 5, 2018

@jlewi @ankushagarwal Me too. Jeremy, I know you tried adding me using my corp account, but neither address seems to work (kamkasravi@gmail.com and kam.d.kasravi@intel.com). Sorry for being a pest.

I've had perhaps similar issues with the IAP I've set up (uses kubernetes 1.9) with the backend loadbalancers getting in the way. I tried logging into an envoy proxy as suggested and using curl to access noiap/whoami - no issues. It worked at one point a few weeks ago I think.

@jlewi
Contributor Author

jlewi commented Apr 5, 2018

@kkasravi Doesn't look like you were a member so I added @intel.com

@wbuchwalter Can you send me your corporate account? I generally prefer to add corporate ids.

@jlewi
Contributor Author

jlewi commented Apr 16, 2018

I think @wbuchwalter started looking at this last week.

@jlewi
Contributor Author

jlewi commented May 8, 2018

@wbuchwalter any update?

@jlewi
Contributor Author

jlewi commented Jun 5, 2018

Punting to 0.3 because no one is actively working on this, so I don't think it's going to make the cut for 0.2.

@kkasravi
Contributor

kkasravi commented Jun 6, 2018

@jlewi I can look at this. Given my recent wrestling with JupyterHub, I think it should be straightforward.

@jlewi
Contributor Author

jlewi commented Jun 8, 2018

@kkasravi That would be fantastic.

Related: it looks like recent changes to the central UI also created problems for IAP (#957).

I'd like to better understand the proper way to write web apps that are compatible with reverse proxies and IAP.

@jlewi
Contributor Author

jlewi commented Jun 8, 2018

There's some guidance here about handling IAP refresh
https://cloud.google.com/iap/docs/sessions-howto
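
The linked guidance appears to describe a way for AJAX calls to get a 401 from IAP, instead of a redirect to the sign-in page, by sending `X-Requested-With: XMLHttpRequest`. A minimal sketch of a client-side helper built on that assumption follows. `iapRequestOptions` and `needsIapSignIn` are hypothetical names and not part of the TFJobs UI.

```javascript
// Build fetch options for an API call behind IAP.
// Assumption (taken from the IAP session guidance): with this header IAP
// answers an unauthenticated AJAX call with 401 rather than a 302 redirect.
function iapRequestOptions(options = {}) {
  return {
    ...options,
    headers: { ...(options.headers || {}), 'X-Requested-With': 'XMLHttpRequest' },
  };
}

// Decide whether a response status means the user has to sign in again.
function needsIapSignIn(status) {
  return status === 401;
}

const opts = iapRequestOptions({ method: 'GET', headers: { Accept: 'application/json' } });
console.log(opts.headers, needsIapSignIn(401));
```

A caller could check `needsIapSignIn(response.status)` before parsing the body and prompt for a page reload when it returns true.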

@jlewi
Contributor Author

jlewi commented Jun 8, 2018

I just noticed that when I access the TFJobs dashboard the javascript console shows

Failed to load https://accounts.google.com/o/oauth2/v2/auth?client_id=236417448818-9roj383c48ruubrcjr9t6utf6mimlntf.apps.googleusercontent.com&response_type=code&scope=openid+email&redirect_uri=https://jlewi-kubeflow.endpoints.cloud-ml-dev.cloud.goog/_gcp_gatekeeper/authenticate&state=Ck1odHRwczovL2psZXdpLWt1YmVmbG93LmVuZHBvaW50cy5jbG91ZC1tbC1kZXYuY2xvdWQuZ29vZy90ZmpvYnMvYXBpL25hbWVzcGFjZRIwQUFoLVdabVA1N1IxWEFTRUE0RkFFanRUVXBqcU5MZjNUQToxNTI4NDgzNTQxNTk5: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'https://jlewi-kubeflow.endpoints.cloud-ml-dev.cloud.goog' is therefore not allowed access. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

@jlewi
Contributor Author

jlewi commented Jun 8, 2018

Here's a suggestion I got

If it's all on the same domain, you just need withCredentials to be set to true on your XMLHttpRequests, that will cause the IAP cookie to be included when the Javascript makes its request.

In React:

fetch(url, {
  ...options,
  credentials: 'include', // send cookies; the fetch counterpart of withCredentials = true
});

@jlewi
Contributor Author

jlewi commented Jun 11, 2018

@ankushagarwal Can you coordinate with Kam and try out the simple fix listed above?

/assign @ankushagarwal

@kkasravi
Contributor

For JupyterHub, the x-goog-authenticated-user-email header is used as REMOTE_USER (https://github.com/cwaldbieser/jhub_remote_user_authenticator#f1). This is probably not telling @ankushagarwal anything he doesn't already know.

@ankushagarwal
Contributor

/assign @kkasravi

kkasravi pushed a commit to kkasravi/kubeflow that referenced this issue Jun 14, 2018
@kkasravi
Contributor

Working on this. I had a few problems creating an IAP-enabled cluster within our PROJECT,
related to a permission-denied error on patch-iam-policy.

@jlewi
Contributor Author

jlewi commented Jun 15, 2018

@kkasravi Kam you have access to project kubeflow-dev. Feel free to use that.

@kkasravi
Contributor

kkasravi commented Jun 17, 2018

services.js does have the fetch on line 5:

export function getTFJobListService(namespace) {
  return fetch(`${host}/api/tfjob/${namespace}`).then(r => r.json());
}

However I don't think that's the problem.

I put tf-jobs behind Ambassador, and Chrome is telling me that
the Access-Control-Allow-Origin header is missing from the response.

I added CORS to the tf-jobs-dashboard Ambassador annotation by editing it directly:

          "getambassador.io/config":
            std.join("\n", [
              "---",
              "apiVersion: ambassador/v0",
              "kind:  Mapping",
              "name: tfjobs-ui-mapping",
              "prefix: /tfjobs/",
              "rewrite: /tfjobs/",
              "cors:",
              "  origins: https://kamkasravi.com",
              "  credentials: yes",
              "service: tf-job-dashboard." + namespace,

I'm testing this approach. I will likely need to update the bootstrapper image.
Will have more info Sunday evening (6/17).

@kkasravi
Contributor

I think I may need to rebuild the centraldashboard. Pinged @swiftdiaries.

@jlewi
Contributor Author

jlewi commented Jun 18, 2018

@kkasravi What is the connection between centraldashboard and the TFJobs UI?

@kkasravi
Contributor

Answered in Slack, but copying here for tracking purposes.

When Kubeflow is deployed with IAP enabled and you click on the 'tfjob' link in the Kubeflow dashboard, the browser shows the error 'Failed to load https://accounts.google.com/o/oauth2/v2/auth?client_id=336335541993-1m7cegck4jic23263v0gplhc46f4rmmj.apps.googleusercontent.com...: No 'Access-Control-Allow-Origin' header is present on the requested resource'. The browser fails to load the OAuth call to Google because the request didn't set Origin, Access-Control-Request-Method, and Cookie. If these request headers are set on the call to /tfjobs/ui/, then Google will return Access-Control-Allow-Origin, which will allow the browser to make the call to Google for authentication. This can be set in Ambassador, which will then allow the browser to call accounts.google.com and fetch the credentials. I'll provide a writeup on the issue and a reference in the PR.
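
For reference, the server-side half of CORS can be sketched as follows. When credentials (cookies) are involved, the response has to name the exact requesting origin in `Access-Control-Allow-Origin` and set `Access-Control-Allow-Credentials: true`; browsers do not accept a wildcard in that case. The function name and the allow-list are illustrative and do not reflect how Ambassador implements this.

```javascript
// Compute CORS response headers for a credentialed request.
// `allowedOrigins` is a hypothetical allow-list; a proxy would derive it
// from its own configuration.
function corsHeadersFor(requestOrigin, allowedOrigins) {
  if (!requestOrigin || !allowedOrigins.includes(requestOrigin)) {
    return {}; // no CORS headers: the browser blocks the cross-origin read
  }
  return {
    'Access-Control-Allow-Origin': requestOrigin, // explicit origin, not '*'
    'Access-Control-Allow-Credentials': 'true',
    Vary: 'Origin', // the response differs per requesting origin
  };
}

const allowed = ['https://kamkasravi.com'];
console.log(corsHeadersFor('https://kamkasravi.com', allowed));
console.log(corsHeadersFor('https://example.org', allowed));
```

Note that in the error quoted above the missing header is on the response from accounts.google.com, which a proxy in front of the TFJobs UI does not produce.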

@kkasravi
Contributor

update: testing changes in tf-operator frontend services.js

@jlewi
Contributor Author

jlewi commented Jun 28, 2018

@kkasravi I can try this out. I've got a sample app running. Here are the headers I see

{
  "headers": [
    [
      "Cookie", 
      "_xsrf=2|5c162792|f72997fe27b076e7b922d0923bc9297f|1529461576; jupyterhub-session-id=de3c3d7a16f048ae86fb34a546b6bb56; GCP_IAAP_XSRF_NONCE=jIHvjvW8o7MTu-8QbTzm6F6ihxK03ctj2jRLTx8xCzcbyZwKkjDxNBTuG0UQfYaRiI7qtduS_W09tuaEANJfbiChNO4xNuGlhIq1eBWNESP5TqRWNDFHX9p6FiCac2hnANjCGpjA7MJoElAZ44Hb4RhpzuRzILPfTaVZnQPvtsw"
    ], 
    [
      "Content-Length", 
      "0"
    ], 
    [
      "User-Agent", 
      "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36"
    ], 
    [
      "X-Goog-Authenticated-User-Id", 
      "accounts.google.com:112668565718473025939"
    ], 
    [
      "Via", 
      "1.1 google"
    ], 
    [
      "X-Forwarded-Proto", 
      "https"
    ], 
    [
      "X-Goog-Authenticated-User-Email", 
      "accounts.google.com:jlewi@google.com"
    ], 
    [
      "Sec-Istio-Auth-Userinfo", 
      "eyJpc3MiOiJodHRwczovL2Nsb3VkLmdvb2dsZS5jb20vaWFwIiwic3ViIjoiYWNjb3VudHMuZ29vZ2xlLmNvbToxMTI2Njg1NjU3MTg0NzMwMjU5MzkiLCJlbWFpbCI6ImpsZXdpQGdvb2dsZS5jb20iLCJhdWQiOiIvcHJvamVjdHMvMjM2NDE3NDQ4ODE4L2dsb2JhbC9iYWNrZW5kU2VydmljZXMvODc2NDc0OTc0NDY5MjYxODYxMCIsImV4cCI6MTUzMDIwNTA4MiwiaWF0IjoxNTMwMjA0NDgyLCJoZCI6Imdvb2dsZS5jb20ifQ"
    ], 
    [
      "X-Envoy-Original-Path", 
      "/echo/headers"
    ], 
    [
      "Host", 
      "jlewi-kubeflow.endpoints.cloud-ml-dev.cloud.goog"
    ], 
    [
      "Upgrade-Insecure-Requests", 
      "1"
    ], 
    [
      "Cache-Control", 
      "max-age=0"
    ], 
    [
      "Accept", 
      "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8"
    ], 
    [
      "Accept-Language", 
      "en-US,en;q=0.9"
    ], 
    [
      "X-Cloud-Trace-Context", 
      "0ec4d285905293ec89e399fd5d61635e/1146299229727795052"
    ], 
    [
      "X-Envoy-Expected-Rq-Timeout-Ms", 
      "3000"
    ], 
    [
      "X-Forwarded-For", 
      "104.132.7.98, 35.241.56.180"
    ], 
    [
      "X-Request-Id", 
      "b620ccff-ca78-4020-b4c3-fbcccc06d1b1"
    ], 
    [
      "Accept-Encoding", 
      "gzip, deflate, br"
    ]
  ]
}
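
A backend that wants the signed-in user's identity can read it from the `X-Goog-Authenticated-User-Email` header shown in this dump. The sketch below uses the same `[name, value]` pair shape as the echo output; `headerValue` and `authenticatedEmail` are hypothetical helpers, and stripping everything up to the first colon is an assumption based on the `accounts.google.com:` prefix seen here.

```javascript
// Headers in the same [name, value] pair shape as the echo-server dump.
const headers = [
  ['X-Goog-Authenticated-User-Id', 'accounts.google.com:112668565718473025939'],
  ['X-Goog-Authenticated-User-Email', 'accounts.google.com:jlewi@google.com'],
];

// Case-insensitive lookup, since HTTP header names are case-insensitive.
function headerValue(pairs, name) {
  const wanted = name.toLowerCase();
  const hit = pairs.find(([key]) => key.toLowerCase() === wanted);
  return hit ? hit[1] : null;
}

// Strip the "accounts.google.com:" namespace prefix seen in the dump.
function authenticatedEmail(pairs) {
  const raw = headerValue(pairs, 'x-goog-authenticated-user-email');
  if (raw === null) return null;
  const idx = raw.indexOf(':');
  return idx === -1 ? raw : raw.slice(idx + 1);
}

console.log(authenticatedEmail(headers)); // prints "jlewi@google.com"
```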

@jlewi
Contributor Author

jlewi commented Jun 28, 2018

I ran into some really strange behavior with ksonnet and adding Ambassador mappings; see ksonnet/ksonnet#670

jlewi added a commit to jlewi/kubeflow that referenced this issue Jun 28, 2018
* This is intended to support debugging IAP; we want to see what headers
  are on resulting requests.

* See kubeflow#574

* While creating this I ran into an issue with ksonnet not formatting the
  Ambassador mapping correctly unless we import it from a libsonnet file see
  ksonnet/ksonnet#670
@jlewi jlewi reopened this Jun 28, 2018
@jlewi
Contributor Author

jlewi commented Jun 28, 2018

I tried the two fixes Kam mentioned above

#1 Use #688 in TFJobs UI to set request headers.
#2 #1073 to directly add route to tfjobs to envoy to avoid Ambassador.

For #1 I used
gcr.io/kubeflow-images-staging/tf_operator:kubeflow-tf-operator-presubmit-v2-688-1b33543-759-49f7
which is the image I grabbed from Kam's presubmit and then pushed to kubeflow-images-staging.

When I navigate to tfjobs/ui I'm still seeing errors:

Failed to load https://accounts.google.com/o/oauth2/v2/auth?client_id=236417448818-9roj383c48ruubrcjr9t6utf6mimlntf.apps.googleusercontent.com&response_type=code&scope=openid+email&redirect_uri=https://jlewi-kubeflow.endpoints.cloud-ml-dev.cloud.goog/_gcp_gatekeeper/authenticate&state=CkpodHRwczovL2psZXdpLWt1YmVmbG93LmVuZHBvaW50cy5jbG91ZC1tbC1kZXYuY2xvdWQuZ29vZy90ZmpvYnMvYXBpL3Rmam9iLxIwQUFoLVdaa2huTkVXb0xHRnRCek55RVFlZUtnMXJreEs5ZzoxNTMwMjE4NDIzNzk1: No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'https://jlewi-kubeflow.endpoints.cloud-ml-dev.cloud.goog' is therefore not allowed access. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

When I try to create a job in the UI I get

services.js:18 POST https://jlewi-kubeflow.endpoints.cloud-ml-dev.cloud.goog/tfjobs/api/tfjob 401 ()
SyntaxError: Unexpected token T in JSON at position 0
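
The `SyntaxError` is what `r.json()` produces when the response body is not JSON, which is plausible for a 401 whose body is plain text. A sketch of a defensive parse step follows, assuming the service code has the status, content type, and body text in hand. `parseApiBody` is a hypothetical helper, not existing TFJobs UI code.

```javascript
// Parse an API response body only when it is a successful JSON response;
// otherwise raise an error that carries the HTTP status and a body excerpt.
function parseApiBody(status, contentType, bodyText) {
  const isJson = (contentType || '').toLowerCase().includes('application/json');
  if (status < 200 || status >= 300 || !isJson) {
    const err = new Error(`HTTP ${status}: ${bodyText.slice(0, 80)}`);
    err.status = status;
    throw err;
  }
  return JSON.parse(bodyText);
}

// A JSON success body parses normally.
console.log(parseApiBody(200, 'application/json', '{"items": []}'));

// A plain-text 401 (the body text here is made up) becomes a readable error
// instead of an "Unexpected token" SyntaxError.
try {
  parseApiBody(401, 'text/plain', 'Unauthorized request');
} catch (e) {
  console.log(e.status, e.message);
}
```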

@jlewi
Contributor Author

jlewi commented Jun 28, 2018

I ran an experiment where I enabled CORS everywhere using a firefox plugin
https://addons.mozilla.org/en-US/firefox/addon/cors-everywhere/

That fixed the Failed to load error. Now it looks like I'm just getting the JSON error.

When I created a TFJob (via the CLI) it showed up in the dashboard, but when I tried to click on it I got the JSON error above. I have v1alpha2 of TFJob configured. I wonder if it's still using TFJob v1alpha1 and that is the problem.

@kkasravi
Contributor

kkasravi commented Jun 28, 2018

@jlewi are you using 'default' for the namespace? I opened kubeflow/training-operator#701, which happens when a different namespace is used (in the UI, not the CLI). I found that the POST to create the tfjob was reaching the server. Since your Firefox plugin works everywhere, I'm wondering whether my CORS headers are incorrect, or whether CORS needs to be set earlier, before tfjobs is selected, in the central UI splash page. As I mentioned, I get the 'Failed to load ...' error once when going to the tfjob page, and afterwards I don't see it reappear even though /tfjob/ refreshes every 30 seconds or so.

I'm working on a fix for kubeflow/training-operator#701 which adds the KUBEFLOW_NAMESPACE to the tf-operator environment in tf-job-operator.libsonnet.
#1099

@jlewi
Contributor Author

jlewi commented Jun 29, 2018

@kkasravi Kubeflow is deployed in namespace kubeflow. I think I used that namespace for the job as well.

Would it be worthwhile to create a simple test app that we could use to work out the CORS issues without conflating them with other issues? I'm not sure what that would look like exactly.

@kkasravi
Contributor

kkasravi commented Jun 29, 2018

@jlewi Sure. A simple POST to a URL like this one, with a backend Go app that sends back an ack / 200 or something, e.g. what you opened in #1097.

@kkasravi
Contributor

I won't be able to get to this until late afternoon/early evening due to PTO plans for most of the day. Will update.

k8s-ci-robot pushed a commit that referenced this issue Jun 29, 2018
* Create a version of echo-server to echo headers.

* This is intended to support debugging IAP; we want to see what headers
  are on resulting requests.

* See #574

* While creating this I ran into an issue with ksonnet not formatting the
  Ambassador mapping correctly unless we import it from a libsonnet file see
  ksonnet/ksonnet#670

* Address comments.

* Reference the images in kubeflow-images-public.

* Autoformat.
@kkasravi
Contributor

kkasravi commented Jun 30, 2018

I believe what remains on this is PR kubeflow/training-operator#688, which sets up CORS for all tfjobs fetches (GET, POST, DELETE) in the UI (services.js).
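
A sketch of what this could look like for the fetches in services.js, assuming the `${host}/api/tfjob` paths quoted earlier in this thread and the `credentials: 'include'` suggestion. `makeTFJobServices` and the injected `fetchImpl` are hypothetical and exist only so the example runs outside a browser; this is not the code from the PR, and a DELETE call would follow the same pattern.

```javascript
// Factory so the fetch implementation can be swapped out in tests.
function makeTFJobServices(fetchImpl, host) {
  // Shared wrapper: always send cookies so the IAP session cookie is included.
  const request = (url, options = {}) =>
    fetchImpl(url, { ...options, credentials: 'include' }).then(r => r.json());

  return {
    // GET the TFJobs in a namespace.
    list: namespace => request(`${host}/api/tfjob/${namespace}`),
    // POST a new TFJob spec.
    create: spec =>
      request(`${host}/api/tfjob`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(spec),
      }),
  };
}

// Stub fetch that records calls instead of touching the network.
const calls = [];
const stubFetch = (url, options) => {
  calls.push({ url, options });
  return Promise.resolve({ json: () => Promise.resolve({}) });
};

const services = makeTFJobServices(stubFetch, '/tfjobs');
services.list('kubeflow');
services.create({ metadata: { name: 'example' } });
console.log(calls.map(c => `${c.options.method || 'GET'} ${c.url}`));
```

In the real UI `fetchImpl` would simply be the browser's `fetch`.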

@jlewi
Contributor Author

jlewi commented Jul 23, 2018

We need to build a new image for tf-operator to pick up the changes in kubeflow/training-operator#688.

@jlewi jlewi reopened this Jul 23, 2018
@jlewi
Contributor Author

jlewi commented Jul 23, 2018

I reopened this issue to track verification that it is working after updating the images.

@jlewi
Contributor Author

jlewi commented Jul 25, 2018

I created a new image:
gcr.io/kubeflow-images-public/tf_operator:v20180724-13863edf

Looks like IAP is now working. Here's a screenshot showing that the jobs can load.

[Screenshot: tfjobs_ui]

Note though that nothing is showing up in Name, Status, Logs.

Developer console is showing me errors

:8080/tfjobs/api/namespace:1 Failed to load resource: the server responded with a status of 500 (Internal Server Error)
SyntaxError: Unexpected token a in JSON at position 1
    at services.js:81
registerServiceWorker.js:65 Content is cached for offline use.

I see similar errors if I try to create a job via the UI.

This appears to be a different issue, not related to IAP.

@jlewi jlewi closed this as completed Jul 25, 2018
jlewi added a commit to jlewi/kubeflow that referenced this issue Jul 25, 2018
jlewi added a commit to jlewi/kubeflow that referenced this issue Jul 27, 2018
yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Nov 1, 2019
saffaalvi pushed a commit to StatCan/kubeflow that referenced this issue Feb 11, 2021
* fixing kubeflow#574

* adding cors to ambassador for /tfjobs/

* update iap to include tfjobs path

* remove cors from ambassador annotations - not needed

* remove spurious changes
saffaalvi pushed a commit to StatCan/kubeflow that referenced this issue Feb 11, 2021
yanniszark pushed a commit to arrikto/kubeflow that referenced this issue Feb 15, 2021
* Add Validate Algorithm Settings

* Integrate ValidateAlgorithmSettings in ManagerClient

* Run dep ensure
elenzio9 pushed a commit to arrikto/kubeflow that referenced this issue Oct 31, 2022
* add jocstaa to members

* Update org.yaml

5 participants