[Bug]: "SSL connection errors" connecting to a Data Science Pipelines apiserver instance #244
Comments
related: opendatahub-io/notebooks#159 |
[internal Red Hat] slack thread about this: https://redhat-internal.slack.com/archives/C05KDB2HFQQ/p1693240911518529 |
Seeing another case of this from the field; I will post their findings here once I receive them. |
#362 - This issue is also relevant to the topic being discussed here. I am working on a fix which might solve both the referenced issue and this one. I will need to find a way to test both fixes once I am done. |
Investigated this a bit. Note this drawing: this ticket is about the blue circle, whereas #362 is about the red circle. They will have different solutions. |
We should confirm whether this is a problem on the DSP backend side. |
We were seeing this on known good properly secured clusters -- specifically OSD, which plumbs in LetsEncrypt on the apps route. So this is not quite the same problem as the "use my self signed cert everywhere" problem. With LE properly on board the route, we were seeing both Elyra and Dashboard having TLS problems connecting to our route. In the case where LE is on the route, neither Elyra nor Dashboard should have a problem connecting to our route. The fact that they both do made me nervous that something was wrong with the DSP route, because the chances that Elyra and Dashboard both have to enable insecure connection to connect to the DSP route ... something is fishy. The chance that they are both struggling to load the correct OS-level CA bundle from their respective container image bases seems pretty low. |
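One way to check the route itself, independent of Elyra or Dashboard, is to ask `openssl s_client` what it thinks of the chain the route serves. The following is a minimal sketch, not something from this thread: the function name `route_verify_code` is hypothetical, port 443 is assumed, and the optional second argument is a CA file to verify against instead of the OpenSSL defaults.

```shell
# Hypothetical helper: print the "Verify return code" line that
# openssl s_client reports for HOST on port 443 (assumed).
# Usage: route_verify_code HOST [CAFILE]
route_verify_code() {
  host=$1
  cafile=${2:-}
  if [ -n "$cafile" ]; then
    # Verify against the given CA file
    openssl s_client -connect "${host}:443" -servername "$host" \
      -CAfile "$cafile" </dev/null 2>/dev/null
  else
    # Verify against the default trust store of the local OpenSSL
    openssl s_client -connect "${host}:443" -servername "$host" \
      </dev/null 2>/dev/null
  fi | sed -n 's/^ *\(Verify return code.*\)$/\1/p' | head -n 1
}
```

Running this from a laptop and from inside the client pod, with and without a CA file, should show whether the failure follows the route or the trust store of the caller.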
Okay so I tried taking a look at this and my conclusion is that this is not a DSP related issue. I reproduced this error by enabling TLS verification in the Node.js server for ODH Dashboard here, and deploying both DSP and ODH Dashboard on a HyperShift cluster secured via LetsEncrypt, which is recognized by most major operating systems and browsers. Let's understand this toggle:
As we all know, this field is currently disabled by odh-dashboard to bypass the issue above, which is not ideal. Technically we would expect connections from Dashboard -> Pipelines route to always work (even with self-signed certs), because we use Re-encrypt Routes, which means the Dashboard pod's default cert bundles should be able to validate the cert chains received from the DSP route. Indeed, if I curl the DSP route from the Dashboard pod on a LetsEncrypt-secured OCP cluster, I do not get any insecure TLS errors.
Curl successful from ODH Dashboard pod to DSP pod:
sh-4.4$ curl -I --request GET "https://ds-pipeline-pipelines-definition-test.apps.rosa.greg-1102.kv5l.p3.openshiftapps.com/apis/v1beta1/runs" -H "Authorization: Bearer <omitted>"
HTTP/1.1 200 OK
content-length: 2
content-type: application/json
date: Fri, 03 Nov 2023 17:52:58 GMT
gap-auth: hukhan@redhat.com@cluster.local
gap-upstream-address: localhost:8888
grpc-metadata-content-type: application/grpc
set-cookie: 16569b7999b96e75b5553be61c12f275=c764d98e5e4250350bd87f7716e922e4; path=/; HttpOnly; Secure; SameSite=None
cache-control: private
If it was behind an unrecognized cert, hitting the route would yield:
$ curl -I https://ds-pipeline-sample-dspa.apps.hukhan-3.dev.datahub.redhat.com/apis/v1beta1/runs -H "Authorization: Bearer <omitted>"
curl: (60) SSL certificate problem: self-signed certificate in certificate chain
More details here: https://curl.se/docs/sslcerts.html
curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.
And I would need to enable curl's
So we know the odh-dashboard pod can indeed run REST calls against the DSP route with TLS verification enabled. Then why do we see the error above only when running calls via the odh-dashboard proxy? Here I have no clear answers, but my hunch is that the culprit is here. When making proxy calls, this CA seems to be passed in as part of the
I tried to mess around with the CAs being mounted here, though, and had little success. Currently it seems the CA passed in is the same as the one found in |
Before closing this out, I think it would be good to get an ack from the dashboard team that this is on the dashboard/UI side, based on the above. If there are still opinions that this is a DSP issue, we can continue investigating this further. |
It is also worth mentioning currently Dashboard connects to DSP api server via an OCP |
I would encourage the dashboard team to remove the insecure flag, especially if they change to connect to the |
Just wanted to expand on this and provide a simple script anyone can use to validate these findings themselves on self-signed clusters. You can basically retrieve the certs from the dashboard pods and make API requests against the DSP API server without using
# Set some env vars
DS_PROJECT=testing
DS_ROUTE="https://$(oc get routes -n "${DS_PROJECT}" ds-pipeline-pipelines-definition --template='{{.spec.host}}')"
TOKEN=$(oc whoami --show-token)
DASHBOARD_NAMESPACE=opendatahub
# Pick the first dashboard pod in the dashboard namespace
DASHBOARD_POD_NAME=$(oc get pods -n "${DASHBOARD_NAMESPACE}" | grep odh-dashboard | awk 'NR==1 {print $1}')
# Our working directory
cd "$(mktemp -d)"
# Copy the symlinked cert to a known location so we can easily retrieve it from the pod. If connecting via the k8s service and not the route, the path used should be /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
oc exec -n "${DASHBOARD_NAMESPACE}" "${DASHBOARD_POD_NAME}" -- sh -c 'mkdir -p /tmp/testcert && cp /var/run/secrets/kubernetes.io/serviceaccount/ca.crt /tmp/testcert/'
# Get the cert to our local machine from the dashboard pod
oc cp "${DASHBOARD_NAMESPACE}/${DASHBOARD_POD_NAME}:/tmp/testcert/ca.crt" ./ca.crt
# Use this cert to make a recognized connection to the dsp route
curl -I --request GET "${DS_ROUTE}/apis/v1beta1/runs" \
  -H "Authorization: Bearer ${TOKEN}" --cacert "$(pwd)/ca.crt"
Interestingly enough when
|
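To see what the `ca.crt` copied out of the dashboard pod actually contains, it can help to list the certificates in it. This is a small sketch under assumptions: the function names are hypothetical, and the file is assumed to be a PEM bundle such as the `./ca.crt` retrieved by the script above.

```shell
# Hypothetical helper: count the PEM certificates in a bundle file.
count_bundle_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Hypothetical helper: print subject and issuer for every certificate
# in a PEM bundle (openssl x509 alone only reads the first one).
list_bundle_certs() {
  openssl crl2pkcs7 -nocrl -certfile "$1" | openssl pkcs7 -print_certs -noout
}

# Example usage:
# count_bundle_certs ./ca.crt
# list_bundle_certs ./ca.crt
```

Comparing the issuers listed here with the issuer of the certificate served by the route shows whether this bundle can be expected to validate the route at all.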
Okay yeah think I've got it. It is indeed because of this. I believe dashboard can do the following to allow secure connections via proxy calls:
env:
  - name: NODE_EXTRA_CA_CERTS
    value: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
In the dashboard container here, for the self-signed portion. This should allow dashboard to re-enable secure connections with DSP (and other public routes using well-known trusted CAs). Using |
fyi @andrewballantyne ^ |
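The `NODE_EXTRA_CA_CERTS` suggestion above can be tried outside the dashboard with a one-off Node.js request. This is only a sketch: the function name `node_tls_check` is hypothetical, and it assumes `node` is on the PATH and that the URL and CA file are supplied by the caller. Per the Node.js documentation, `NODE_EXTRA_CA_CERTS` extends the well-known root CAs, but neither set is used for a request that passes an explicit `ca` option, so this check only reflects code paths that rely on the default trust store.

```shell
# Hypothetical helper: make one HTTPS GET from Node.js with the given
# CA file added to (not replacing) Node's built-in root CAs.
# Usage: node_tls_check URL EXTRA_CA_FILE
node_tls_check() {
  url=$1
  extra_ca=$2
  NODE_EXTRA_CA_CERTS="$extra_ca" node -e '
    const https = require("https");
    https.get(process.argv[1], (res) => {
      // Any HTTP status here means the TLS handshake was accepted.
      console.log("connected, HTTP status:", res.statusCode);
      res.resume();
    }).on("error", (err) => {
      console.error("request failed:", err.code || err.message);
      process.exit(1);
    });
  ' "$url"
}

# Example usage, reusing the variables from the script earlier in the thread:
# node_tls_check "${DS_ROUTE}/apis/v1beta1/runs" ./ca.crt
```

No bearer token is sent, so an authorization error status is still a successful result for the purpose of this check; a certificate error would surface in the `error` handler instead.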
Based on the findings above, we conclude that the issue is not on the backend (DSPO) side. Closing this issue; see the suggestion above for what Dashboard can do to re-enable secure connections. We suspect Elyra will need to do something similar (i.e. extend the well-known bundle, and not override it). We have forwarded this information to both dev teams. Please feel free to:
|
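As an illustration of "extend the well-known bundle, and not override", a client image could build one combined PEM file from the system bundle plus the cluster CA and point the client at that. The sketch below makes several assumptions that are not from this thread: the helper name `build_ca_bundle` is hypothetical, `/etc/pki/tls/certs/ca-bundle.crt` is the system bundle location on RHEL-family images and may differ elsewhere, and `REQUESTS_CA_BUNDLE` is honored by clients built on the Python `requests` library. Whether Elyra's pipeline submission path honors it has not been confirmed here.

```shell
# Hypothetical helper: write OUT as the concatenation of the given PEM
# files, in order, making sure each file ends with a newline so that
# adjacent certificates do not run together.
# Usage: build_ca_bundle OUT PEM_FILE...
build_ca_bundle() {
  out=$1
  shift
  : > "$out" || return 1
  for pem in "$@"; do
    [ -r "$pem" ] || { echo "cannot read ${pem}" >&2; return 1; }
    # awk 1 prints every line and terminates the last one with a newline
    awk 1 "$pem" >> "$out"
  done
}

# Example usage (paths are assumptions, see above):
# build_ca_bundle /tmp/combined-ca.crt \
#   /etc/pki/tls/certs/ca-bundle.crt \
#   /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# export REQUESTS_CA_BUNDLE=/tmp/combined-ca.crt
```

Because the system bundle stays in the combined file, routes secured by a public CA such as LetsEncrypt keep verifying, while the added cluster CA covers the self-signed case.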
Is there an existing issue for this?
Deploy type
ODH Dashboard UI
Version
varies
Environment
varies
Current Behavior
We've encountered a few instances of people contacting us about components getting "SSL connection errors" when they try to connect to a Data Science Pipelines instance. I'm concerned that something is nebulously "not quite right" with the way the DSP apiserver is using TLS.
One report:
[RHODS-8860] Data Science Pipelines error "unable to get issuer certificate" on OSD cluster
https://issues.redhat.com/browse/RHODS-8860
For this one, QE tests passed on an OpenStack cluster, but the problem was seen on an OSD cluster, which uses a LetsEncrypt certificate installed by the OSD infrastructure.
Another report:
"I've got a partner running into certification issues while submitting pipelines through Elyra. They're using a custom certificate in their OCP cluster, which seems to break the Data Science Pipelines integration. Has anyone encountered similar issues? Elyra requests to Data Science Pipelines fail with this error message:
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129). Running RHODS self-managed 1.29.0 on OCP 4.11.44."
Developers for Elyra and Dashboard have worked around this problem by allowing insecure connections in the caller, but that is a sub-optimal solution -- we need to get to the bottom of this.
Expected Behavior
In clusters with trusted certificates installed (e.g. LetsEncrypt), SSL connections to DSP apiserver from Dashboard, Elyra, or a laptop should all "just work" without needing to allow insecure connections or passing a custom certificate bundle to the client code.
In clusters with self-signed certificates installed, ???
Steps To Reproduce
unknown
Workaround (if any)
No response
Anything else
No response