[Bug]: Pipeline server fails when using object storage "signed by unknown authority" #362

Closed
erwangranger opened this issue Sep 15, 2023 · 11 comments · Fixed by #440
Labels
field-priority · kind/bug · priority/high · rhods-2.5 · triage/accepted

Comments

@erwangranger

ODH Component

Data Science Pipelines

Current Behavior

Reporting this for a customer I am working with.

The pipeline pod fails with this message:

bash-3.2$ oc logs -f ds-pipeline-pipelines-definition-6cd6859c6f-twmxm -n test-model-serving
Defaulted container "ds-pipeline-api-server" out of: ds-pipeline-api-server, oauth-proxy
I0914 20:42:34.297035       1 client_manager.go:160] Initializing client manager
I0914 20:42:34.297126       1 config.go:74] Config DBConfig.ExtraParams not specified, skipping
F0914 20:42:34.449271       1 client_manager.go:411] Failed to check if Minio bucket exists. Error: Get https://my.int.s3.corporate.int:443/zone-1-dev/?location=: x509: certificate signed by unknown authority

Expected Behavior

I should be able to use an object storage endpoint even if it is using a self-signed certificate.

Steps To Reproduce

No response

Workaround (if any)

None found

What browsers are you seeing the problem on? (If applicable)

No response

Open Data Hub Version

RHODS 1.31

Anything else

This is happening in a disconnected environment.

@erwangranger erwangranger added the kind/bug label Sep 15, 2023
@erwangranger
Author

We are looking for how to overcome this issue.

  1. Is there any way to force the pipeline server to simply ignore the fact that the certificate is self-signed?
  2. Can we simply add the CA bundle to the secret containing the object storage credentials? In other words, will adding the CA bundle as described here work for the pipeline server?

@mwaykole

Hi @erwangranger, does the endpoint begin with "http"?

@gmarkley-VI

Have you considered using Let's Encrypt as a workaround instead of relying on a self-signed certificate?

@erwangranger
Author

erwangranger commented Sep 15, 2023

@mwaykole , I anonymized the log message, but I did not change the beginning of the endpoint.

F0914 20:42:34.449271       1 client_manager.go:411] Failed to check if Minio bucket exists. 
Error: Get https://my.int.s3.corporate.int:443/zone-1-dev/?location=: x509: certificate signed by unknown authority

@gmarkley-VI , this is all running inside a disconnected environment with no access to the internet, and these are the corporate signing authorities and CA bundles, as far as I know. So I don't think Let's Encrypt would help here.

@erwangranger
Author

In my environment, I can see that the image running the DS pipeline definition is

        registry.redhat.io/rhods/odh-ml-pipelines-api-server-rhel8@sha256:280ca08c24ccebe69b20ea31df65eb5344698b0ed5e57832205f4a64c2818862

If I were able to rebuild that image and embed my certs into it, how could I change which image is used, so that my image is swapped in instead of the default (assuming the operator is stopped)?
(I am not saying I want to do this; I am hoping it does not come to that.)
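
For reference, a rough sketch of such an image override (assuming the operator is scaled down so it does not reconcile the change back; the deployment and container names are taken from the log above, and the image reference is hypothetical):

# swap the api-server container image on the existing deployment
oc -n test-model-serving set image deployment/ds-pipeline-pipelines-definition \
  ds-pipeline-api-server=registry.example.com/my-org/odh-ml-pipelines-api-server-rhel8:custom-ca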

@andrewballantyne andrewballantyne added the field-priority label Sep 18, 2023
@gregsheremeta
Contributor

We talked about this in DSP scrum today and we're not sure where to move this issue to.

We have a Secure field in the DSPA CR (ref). What's not clear to me is how exactly that field is being used by the dashboard.

I would expect that setting Secure: false and using an endpoint with http works.
I would expect that setting Secure: true and using an endpoint with https works when that endpoint is secured by a publicly trusted certificate like LetsEncrypt.

What I am unsure about is the scenario where the dashboard sets Secure: true and the user provides an S3 endpoint with https, but the endpoint has a self-signed certificate. I would expect that to fail. What I'm unclear about is whether the solution is simply for the dashboard to set Secure: false. Does Secure: false tell the apiserver to trust self-signed certs? If so, I'd say: 1. the fix is that the dashboard should allow the user to set that via a checkbox, and 2. the field name Secure should be renamed to something clearer like AllowInsecureConnections.
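
For orientation, a rough sketch of where such a flag sits in a DSPA manifest; the apiVersion and field names are from memory and illustrative, so check the DSPO CRD for the exact schema. The endpoint and bucket come from the log above; the secret name is hypothetical:

apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
kind: DataSciencePipelinesApplication
metadata:
  name: pipelines-definition
spec:
  objectStorage:
    externalStorage:
      host: my.int.s3.corporate.int
      bucket: zone-1-dev
      scheme: https
      secure: true                              # the field under discussion
      s3CredentialsSecret:
        secretName: object-storage-credentials  # hypothetical secret name
        accessKey: AWS_ACCESS_KEY_ID
        secretKey: AWS_SECRET_ACCESS_KEY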

@shalberd
Contributor

shalberd commented Sep 25, 2023

Hi all: please ask either @harshad16 or @VaishnaviHire for details; there is a very easy way to achieve full trust, both with self-signed certificates and with non-publicly-trusted CAs. For example, Guillaume Moutier once had this issue with PIP_CERT, and I had it when connecting via Python to a server whose certificate was issued by a non-public root and intermediate CA; in his case it was even a self-signed one. The trust also applies to connections that are set in the central cluster proxy object config as NO_PROXY. The context is just as @erwangranger explained, and that context is very common. Case in point: imagine a corporate IBM COS cluster whose certificates are based on private PKI, meaning by definition the certs rely on company-internal trust and are not publicly trusted. This is exactly why I have not yet been able to bring myself to use Pipelines in ODH.

See opendatahub-io/kubeflow#105

and also https://github.com/opendatahub-io/kubeflow/pull/43/files#diff-447c1a1ddad4e46669c4371d0d9714dad4a3368c5ccf2292b356e3b5c7441ce1

The key idea: centrally, at cluster-config level, both the core OS publicly-trusted CAs and any additionally-trusted CAs (e.g. self-signed certificates or private PKI-issued CAs) are defined. A ConfigMap in a namespace is then filled with all of that content, a bundle of PEM-style CAs, by the OpenShift Cluster Network Operator, and that ConfigMap's data section can easily be mounted into a pod. This already happens for Kubeflow Notebooks at /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem.
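
A minimal sketch of that mechanism, assuming an empty ConfigMap labelled for injection by the Cluster Network Operator (the ConfigMap name and the consuming pod spec fragment are illustrative):

apiVersion: v1
kind: ConfigMap
metadata:
  name: trusted-ca-bundle                                  # illustrative name
  labels:
    config.openshift.io/inject-trusted-cabundle: "true"    # CNO fills in data."ca-bundle.crt"

Then, in the consuming pod spec, the injected key is mounted where the OS trust store is read:

  volumes:
    - name: trusted-ca
      configMap:
        name: trusted-ca-bundle
        items:
          - key: ca-bundle.crt
            path: tls-ca-bundle.pem
  containers:
    - name: app
      volumeMounts:
        - name: trusted-ca
          mountPath: /etc/pki/ca-trust/extracted/pem
          readOnly: true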

secure: false is a bad idea for SSL trust in my opinion; we are not talking about a test on someone's own laptop here :-) It is all right for a minio container or similar with just svc-based http communication, but not for any other use case involving S3, https, and pipelines.

@HumairAK
Collaborator

Thanks @shalberd , great post. I think this is something we should explore as a possible solution. I'm going to move this issue to the DSPO repo and take this on in our sprint.

/transfer data-science-pipelines-operator

@openshift-ci openshift-ci bot transferred this issue from opendatahub-io/opendatahub-community Sep 26, 2023
@HumairAK HumairAK added the priority/high label Sep 26, 2023
@shalberd
Contributor

shalberd commented Sep 28, 2023

Minor added note: our solution in notebooks is only for accessing secured Routes based on either self-signed certificates or custom PKI-issued certificates, not for .svc endpoints that are potentially encrypted cluster-internally. That is OK, though; securing services cluster-internally is a whole different ballgame. We currently do not use the data science pipelines operator in our corporate deployment of ODH. However, when it comes time to test any PR docker images, do let me know and I can arrange for that. By then, I'll also have our OpenShift connectivity to our private PKI-based IBM COS / S3 cluster ready :-)

@gregsheremeta
Contributor

Here's a document that describes a workaround for this while we work on an official fix:

https://docs.google.com/document/d/15Fj-l0xDXXJAraZMXkT-SDD12BuYe-KfHITWCmtl-rQ/edit?usp=sharing

@HumairAK
Collaborator

HumairAK commented Nov 9, 2023

Note that the resolution to this has only been implemented on the backend side for rhods-2.5; there is still UI to be added for this fix (tracking issue is here). From 2.5 onward, this can be used to simplify the workaround.

Workaround:

After starting a Pipeline Server against object storage with unrecognized certs, it will fail to start (the error currently only gets surfaced in the DSPO logs).

Upload a ConfigMap with the CA cert bundle in PEM format, for example:

Note: if minio is deployed on the same self-signed OCP cluster, you can use the kube root CA, retrieved via:

$ oc -n $MINIO_NS get configmap kube-root-ca.crt -o yaml | yq '.data."ca.crt"' > root-ca.crt

Create the configmap

kind: ConfigMap
apiVersion: v1
metadata:
  name: custom-ca-bundle
data:
  ca-bundle.crt: |
    # contents of ca-bundle.crt
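
Apply it in the same namespace as the DSPA (the filename here is illustrative):

oc apply -n your-ds-project -f custom-ca-bundle.yaml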

Then patch the underlying DataSciencePipelinesApplication resource like so:

DSPA_NAME=ds-pipeline-pipelines-definition
DSPA_NAMESPACE=your-ds-project

patch='{"spec":{"apiServer":{"cABundle":{"configMapKey":"ca-bundle.crt","configMapName":"custom-ca-bundle"}}}}'
oc patch dspa ${DSPA_NAME} -n ${DSPA_NAMESPACE} --type=merge -p "${patch}"

Pipeline Server should come up successfully.
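
To double-check, a quick verification (the deployment and container names below follow the original log; adjust them to match your DSPA):

oc get pods -n ${DSPA_NAMESPACE} | grep ds-pipeline
oc logs -n ${DSPA_NAMESPACE} deployment/ds-pipeline-pipelines-definition -c ds-pipeline-api-server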
