[v0.11] Changes to minimize exposure to security issues #93

@israel-hdez israel-hdez commented Sep 29, 2023

  • cmd/agent/main.go
    • Static code analysis reports that unsanitized use of the --component-port flag could lead to SSRF. In practice this is very unlikely, because the flag value always comes from an integer (the containerPort of a Pod). Even so, this PR switches the flag to an integer type to be on the safe side (and to silence the warning from the static code analysis tool).
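  • python/kserve
    • Run poetry update.
    • The main reason for the upgrade is a recent fix to the MSAL library (AzureAD/microsoft-authentication-library-for-python@3427c25) to escape an unsafe string.
    • As an aside, this also moves away from CVE-2023-4807, which is only applicable to Windows 64 platforms.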
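
As a rough illustration of why an integer-typed port is safer than a free-form string, the sketch below validates a port value in shell before it would be placed into a URL. This is not the code in cmd/agent/main.go (which is Go and relies on the flag's integer type); `is_valid_port` is a hypothetical helper, and the sketch assumes the port ends up interpolated into a URL, where characters such as `@` or `/` could change the request target.

```shell
# Hypothetical helper (not part of KServe): accept only a decimal
# integer in the TCP port range 1..65535.
is_valid_port() {
  case "$1" in
    # Reject the empty string and anything containing a non-digit,
    # e.g. "8080@example.com" or "8080/path".
    '' | *[!0-9]*) return 1 ;;
  esac
  # Cap the length first so the numeric comparison cannot overflow.
  [ "${#1}" -le 5 ] && [ "$1" -ge 1 ] && [ "$1" -le 65535 ]
}
```

With a check like this (or an integer-typed flag, as in this PR), a value such as `8080@example.com` is rejected before any URL is built from it.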

Additionally, this adds some more resiliency to openshift-ci runs.

Fixes #91

Testing instructions

A simple sanity check should be OK.

Check out this PR:

  1. Clone the repo: git clone https://github.com/opendatahub-io/kserve.git && cd kserve
  2. Fetch the PR: git fetch origin pull/93/merge:pr-93
  3. Check out the code: git checkout pr-93

On a clean OpenShift cluster, and given you have the code of this PR checked out:

  1. Deploy ServiceMesh. Run: test/scripts/openshift-ci/deploy.ossm.sh
  2. Deploy Serverless. Run: test/scripts/openshift-ci/deploy.serverless.sh
  3. Deploy KServe:
oc new-project kserve
oc label namespace kserve testing.kserve.io/add-to-mesh=true
kustomize build config/overlays/odh | \
  sed "s|kserve/storage-initializer:latest|quay.io/edgarhz/kserve-storage-initializer:security-updates-v011|" | \
  sed "s|kserve/agent:latest|quay.io/edgarhz/kserve-agent:security-updates-v011|" | \
  sed "s|kserve/router:latest|quay.io/edgarhz/kserve-router:security-updates-v011|" | \
  sed "s|kserve/kserve-controller:latest|quay.io/edgarhz/kserve-controller:security-updates-v011|" | \
  oc apply -f -
  4. Deploy a test model:
oc new-project kserve-test
oc label namespace kserve-test testing.kserve.io/add-to-mesh=true
sed 's/ClusterServingRuntime/ServingRuntime/' config/runtimes/kserve-mlserver.yaml | \
  sed "s|mlserver:replace|docker.io/seldonio/mlserver:1.3.2|" | \
  oc apply -f -
cat <<EOF | oc apply -f -
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-irisv2"
  annotations:
    serving.knative.openshift.io/enablePassthrough: "true"
    sidecar.istio.io/inject: "true"
    sidecar.istio.io/rewriteAppHTTPProbers: "true"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      runtime: kserve-mlserver
      storageUri: "gs://seldon-models/sklearn/mms/lr_model"
EOF
  5. Check that the model replies correctly:
ENDPOINT=$(oc get ksvc sklearn-irisv2-predictor -o jsonpath='{.status.url}')
curl -kv -H "Content-Type: application/json" -d '{"inputs": [{"name": "input-0","shape": [2, 4],"datatype": "FP32","data": [[6.8, 2.8, 4.8, 1.4],[6.0, 3.4, 4.5, 1.6]]}]}' "$ENDPOINT/v2/models/sklearn-irisv2/infer"

If you get a reply, it should all be running OK.
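
To make this sanity check scriptable rather than reading the verbose curl output, one option is to capture only the HTTP status code and test it. This is a minimal sketch, not part of the PR: `is_http_success` and `payload.json` are hypothetical names, and it assumes that any 2xx status counts as a correct reply.

```shell
# Hypothetical helper: succeed only when the argument is a 2xx HTTP status.
is_http_success() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use against the endpoint from the step above (left commented out
# because it needs a running cluster; payload.json is assumed to hold the
# same JSON body passed to curl -d):
#
#   STATUS=$(curl -ks -o /dev/null -w '%{http_code}' \
#     -H "Content-Type: application/json" -d @payload.json \
#     "$ENDPOINT/v2/models/sklearn-irisv2/infer")
#   is_http_success "$STATUS" && echo "model replied OK" || echo "got HTTP $STATUS"
```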

Notes

If you approve this PR, please also approve #94.

* cmd/agent/main.go
  * Static code analysis reports that unsanitized use of the
    `--component-port` flag could lead to SSRF. In practice this is
    very unlikely, because the flag value always comes from an integer
    (the `containerPort` of a Pod). Even so, this switches the flag to
    an integer type to be on the safe side (and to silence the warning
    from the static code analysis tool).
* python/kserve
  * Run `poetry update`.
  * The main reason for the upgrade is [a recent fix to the MSAL
    library](AzureAD/microsoft-authentication-library-for-python@3427c25)
    to escape an unsafe string.
  * As an aside, this also moves away from
    [CVE-2023-4807](https://www.cve.org/CVERecord?id=CVE-2023-4807), which is
    only applicable to Windows 64 platforms.

Additionally, this adds some more resiliency to openshift-ci runs.

Signed-off-by: Edgar Hernández <23639005+israel-hdez@users.noreply.github.com>

openshift-ci bot commented Sep 29, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: israel-hdez


Jooho commented Oct 2, 2023

Looks like this is also good for upstream. why don't you send this pr to upstream?

# of configuration changes, leading to waitpodready to fail sometimes.
# Let's sleep 2minutes to let the KNative operator to stabilize the installation before
# checking for the readiness of KNative stack.
sleep 120

Why not use a more accurate way to check whether the pod is ready?
Something like oc wait, or this kind of script.
This is an example.

Author


The waitpodready function that is used on the following lines uses oc wait.

The problem is that oc wait fails if the deployment is restarted, or a new version is rolled out, while it is waiting. So there is a race condition: if oc wait is invoked before a new rollout, it watches the older pods and fails when those pods are terminated (oc wait complains that the older pods can no longer be found).

The sleep is there to try to prevent that race condition.

Author


So, as mentioned in the comment, the sleep is there to let the KNative operator stabilize and finish any new rollouts. The goal is not to wait for the pods to become ready.
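
For comparison, a different way to tolerate the race described above is to retry the wait when it fails because the watched pods were replaced. The sketch below is not what this PR does (the PR keeps the fixed sleep 120). `retry` is a hypothetical helper, and the namespace and selector in the usage comment are placeholders.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  _retry_attempts=$1
  _retry_delay=$2
  shift 2
  _retry_n=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$_retry_n" -ge "$_retry_attempts" ]; then
      echo "command failed after $_retry_n attempts: $*" >&2
      return 1
    fi
    _retry_n=$((_retry_n + 1))
    sleep "$_retry_delay"
  done
}

# Example use (commented out because it needs a cluster; <namespace> and
# <selector> are placeholders, not values taken from this PR):
#
#   retry 5 30 oc wait --for=condition=ready pod -l <selector> \
#     -n <namespace> --timeout=60s
```

Each retry issues a fresh oc wait, so it picks up whichever pods exist at that moment instead of failing on pods that a rollout has already terminated. The trade-off against a fixed sleep is that a retry loop can still give up if rollouts keep happening for longer than the total retry window.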

@heyselbi heyselbi linked an issue Oct 3, 2023 that may be closed by this pull request
@israel-hdez
Author

Looks like this is also good for upstream. why don't you send this pr to upstream?

It is there: kserve#3157

@vaibhavjainwiz

/lgtm

@israel-hdez
Author

Ignoring the CI failure for Python 3.11; it is a known issue in our fork.

Merging.

@israel-hdez israel-hdez merged commit 4f93edd into opendatahub-io:release-v0.11.0 Oct 6, 2023
39 of 42 checks passed
@israel-hdez israel-hdez deleted the security-updates-v011 branch October 10, 2023 16:28
Development

Successfully merging this pull request may close these issues.

Follow-up: Address remaining "High" vulnerabilities in KServe repo from SNYK scans