
Service creation - intermittent 400 bad request error #2385

Closed
riqueps opened this issue Jun 16, 2023 · 14 comments · Fixed by #2424 or #2474
Assignees: andreas-kupries
Labels: kind/bug (Something isn't working)
Milestone: v1.9.0

riqueps commented Jun 16, 2023

Describe the bug
After upgrading Epinio to 1.8.0 we started experiencing an intermittent error during the creation of a custom service (epinio service create ...); to get it working again the epinio-server deployment must be restarted.

When this occurs I'm still able to install the same service using helm directly: helm install SERVICE_NAME oci://REPO_URL --version 0.1.0.

I didn't experience this error with HTTPS-based helm repositories.

Error message for CLI and Server:

❌ error creating service: service create failed: Internal Server Error: error deploying service helm chart: pulling from host public.ecr.aws failed with status code [manifests 0.3.0]: 400 Bad Request

No errors are logged by the helm-controller.

Workaround

Downgrade Epinio to 1.7.1, or restart the epinio-server deployment (e.g. kubectl rollout restart deployment epinio-server -n epinio)

Service details:

  • Custom service
  • Helm chart stored on public ECR
  • Protocol: OCI
  • Service manifest:
apiVersion: application.epinio.io/v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: epinio
    meta.helm.sh/release-namespace: epinio
  generation: 2
  name: SERVICE_NAME
  namespace: epinio
spec:
  appVersion: 1.5.1
  chart: oci://REPO_URL
  chartVersion: 0.1.0
  description: |
    This service is going to deploy a simple service.
  name: SERVICE_NAME
  serviceIcon: https://static-00.iconduck.com/assets.00/storage-amazons3-bucket-icon-495x512-rd519ixy.png
  shortDescription: A service that can be used during development
  values: |-
    # This is a YAML-formatted file.
    # Declare variables to be passed into your templates.
   
    region: xxxxx
    image: xxxxx

Env details

  • Epinio server and CLI: 1.8.0 and 1.8.1
  • Kubernetes: AWS EKS 1.22

To Reproduce

  1. Install Epinio 1.8.0 or 1.8.1
  2. Create a custom service and store its helm chart on AWS ECR, GCP Artifact Registry, or another OCI-based repo
  3. Add it to epinio catalog
  4. Create and delete the service a few times
  5. Wait for around 3 hours, then try to create the service again
  6. See the error

Logs with trace-level 255

~$ epinio service create epinio-debug henrique-test4 --trace-level 255
🚢 Creating Service...
Catalog: epinio-debug
Service: henrique-test4
Wait For Completion: false
2023/06/13 14:21:34 EpinioApiClient: "level"=4 "msg"="POST https://epinio.DOMAIN.COM/api/v1/namespaces/workspace/services"
2023/06/13 14:21:35 EpinioApiClient: "level"=5 "msg"="request finished" "method"="POST" "uri"="https://epinio.DOMAIN.COM/api/v1/namespaces/workspace/services" "body"="{"catalog_service":"epinio-debug","name":"henrique-test4"}"

2023/06/13 14:21:35 EpinioApiClient: "level"=5 "msg"="response received" "status"=500 "header"={"Epinio-Version":["v1.8.0"],"X-Content-Type-Options":["nosniff"],"Strict-Transport-Security":["max-age=15724800; includeSubDomains"],"Date":["Tue, 13 Jun 2023 17:21:35 GMT"],"Content-Type":["application/json; charset=utf-8"],"Connection":["keep-alive"]} "TLSServerName"="epinio.DOMAIN.COM" "body"="{"errors":[{"status":500,"title":"error deploying service helm chart: pulling from host public.ecr.aws failed with status code [manifests 0.3.0]: 400 Bad Request","details":"\nServer Backtrace: pulling from host public.ecr.aws failed with status code [manifests 0.3.0]: 400 Bad Request\nerror deploying service helm chart\ngithub.com/epinio/epinio/internal/services.(*ServiceClient).Create\n\t/home/sles/actions-runner/_work/epinio/epinio/internal/services/instances.go:179\ngithub.com/epinio/epinio/internal/api/v1/service.Controller.Create\n\t/home/sles/actions-runner/_work/epinio/epinio/internal/api/v1/service/create.go:65\ngithub.com/epinio/epinio/internal/api/v1.errorHandler.func1\n\t/home/sles/actions-runner/_work/epinio/epinio/internal/api/v1/router.go:52\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/alron/ginlogr.RecoveryWithLogr.func1\n\t/home/sles/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:121\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/alron/ginlogr.Ginlogr.func1\n\t/home/sles/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:34\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/gin-gonic/gin.CustomRecoveryWithWriter.func1\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/recovery.go:101\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/gin-gonic/gin.(
*Engine).handleHTTPRequest\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:616\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:572\nnet/http.serverHandler.ServeHTTP\n\t/home/sles/actions-runner/_work/_tool/go/1.19.8/x64/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/home/sles/actions-runner/_work/_tool/go/1.19.8/x64/src/net/http/server.go:1991\nruntime.goexit\n\t/home/sles/actions-runner/_work/_tool/go/1.19.8/x64/src/runtime/asm_amd64.s:1594"}]}"
2023/06/13 14:21:35 EpinioApiClient: "level"=5 "msg"="response is not StatusOK: Internal Server Error: error deploying service helm chart: pulling from host public.ecr.aws failed with status code [manifests 0.3.0]: 400 Bad Request" "status"=500 "header"={"Connection":["keep-alive"],"Epinio-Version":["v1.8.0"],"X-Content-Type-Options":["nosniff"],"Strict-Transport-Security":["max-age=15724800; includeSubDomains"],"Date":["Tue, 13 Jun 2023 17:21:35 GMT"],"Content-Type":["application/json; charset=utf-8"]} "TLSServerName"="epinio.DOMAIN.COM" "body"="{"errors":[{"status":500,"title":"error deploying service helm chart: pulling from host public.ecr.aws failed with status code [manifests 0.3.0]: 400 Bad Request","details":"\nServer Backtrace: pulling from host public.ecr.aws failed with status code [manifests 0.3.0]: 400 Bad Request\nerror deploying service helm chart\ngithub.com/epinio/epinio/internal/services.(*ServiceClient).Create\n\t/home/sles/actions-runner/_work/epinio/epinio/internal/services/instances.go:179\ngithub.com/epinio/epinio/internal/api/v1/service.Controller.Create\n\t/home/sles/actions-runner/_work/epinio/epinio/internal/api/v1/service/create.go:65\ngithub.com/epinio/epinio/internal/api/v1.errorHandler.func1\n\t/home/sles/actions-runner/_work/epinio/epinio/internal/api/v1/router.go:52\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/alron/ginlogr.RecoveryWithLogr.func1\n\t/home/sles/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:121\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/alron/ginlogr.Ginlogr.func1\n\t/home/sles/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:34\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/gin-gonic/gin.CustomRecoveryWithWriter.func1\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.
2/recovery.go:101\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:616\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:572\nnet/http.serverHandler.ServeHTTP\n\t/home/sles/actions-runner/_work/_tool/go/1.19.8/x64/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/home/sles/actions-runner/_work/_tool/go/1.19.8/x64/src/net/http/server.go:1991\nruntime.goexit\n\t/home/sles/actions-runner/_work/_tool/go/1.19.8/x64/src/runtime/asm_amd64.s:1594"}]}" "body"="{"errors":[{"status":500,"title":"error deploying service helm chart: pulling from host public.ecr.aws failed with status code [manifests 0.3.0]: 400 Bad Request","details":"\nServer Backtrace: pulling from host public.ecr.aws failed with status code [manifests 0.3.0]: 400 Bad Request\nerror deploying service helm 
chart\ngithub.com/epinio/epinio/internal/services.(*ServiceClient).Create\n\t/home/sles/actions-runner/_work/epinio/epinio/internal/services/instances.go:179\ngithub.com/epinio/epinio/internal/api/v1/service.Controller.Create\n\t/home/sles/actions-runner/_work/epinio/epinio/internal/api/v1/service/create.go:65\ngithub.com/epinio/epinio/internal/api/v1.errorHandler.func1\n\t/home/sles/actions-runner/_work/epinio/epinio/internal/api/v1/router.go:52\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/alron/ginlogr.RecoveryWithLogr.func1\n\t/home/sles/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:121\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/alron/ginlogr.Ginlogr.func1\n\t/home/sles/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:34\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/gin-gonic/gin.CustomRecoveryWithWriter.func1\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/recovery.go:101\ngithub.com/gin-gonic/gin.(*Context).Next\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:616\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\n\t/home/sles/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:572\nnet/http.serverHandler.ServeHTTP\n\t/home/sles/actions-runner/_work/_tool/go/1.19.8/x64/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/home/sles/actions-runner/_work/_tool/go/1.19.8/x64/src/net/http/server.go:1991\nruntime.goexit\n\t/home/sles/actions-runner/_work/_tool/go/1.19.8/x64/src/runtime/asm_amd64.s:1594"}]}"
2023/06/13 14:21:35 EpinioClient/ServiceCreate: "level"=4 "msg"="return"

❌ error creating service: service create failed: Internal Server Error: error deploying service helm chart: pulling from host public.ecr.aws failed with status code [manifests 0.3.0]: 400 Bad Request

@riqueps riqueps added the kind/bug Something isn't working label Jun 16, 2023
@enrichman enrichman added this to the v1.9.0 milestone Jun 19, 2023

riqueps commented Jun 20, 2023

As a workaround, we are migrating our custom helm charts to S3, in order to use HTTPS instead of the OCI protocol.
After doing that we no longer experienced the 400 Bad Request error on Epinio 1.8.0.

@andreas-kupries
Contributor

Based on the description and the latest comment I suspect that the OCI protocol is the issue, i.e. that it is not supported.
Unsure whether the missing support is in the helm packages (*), or in the mittwald client wrapped around them.

(*) The mittwald client uses the official helm packages. Currently at helm.sh/helm/v3 v3.12.0.

@andreas-kupries andreas-kupries self-assigned this Jun 30, 2023
@andreas-kupries
Contributor

Hm ... Helm claims to support OCI registries since client 3.8.0, per https://helm.sh/blog/storing-charts-in-oci/
That actually tracks with helm install SERVICE_NAME oci://REPO_URL --version 0.1.0 being said to work in the ticket description.

Assuming that helm package versions track client versions, the helm packages we use should support OCI.

Time to see if I can install a local OCI registry that I can then use in attempts to reproduce the issue.

@andreas-kupries
Contributor

andreas-kupries commented Jun 30, 2023

According to https://helm.sh/docs/topics/registries/ there are a lot of other OCI registries beyond Amazon ECR, which I believe was used here. Docker Hub might be the easiest to trial with (docs: https://docs.docker.com/docker-hub/oci-artifacts/).

@andreas-kupries
Contributor

Running into issues at the intersection of helm, docker IDs, and the pass password store apparently used by helm to store login credentials.

Starting to look at https://github.com/project-zot/zot for a local setup.

@andreas-kupries
Contributor

andreas-kupries commented Jul 3, 2023

Switching back to docker hub after getting some ideas during the weekend ...
... Managed to use it ...

  1. Use gpg2 --quick-add-uid ID DOCKER-ID to add DOCKER-ID to the key ring. Here ID points to my personal gnupg entry (found via substring search; in hindsight the HEXID would likely have worked too).
  2. Use pass init HEXID, where HEXID is the last 8 hex digits of the fingerprint for the ID and DOCKER-ID from above.
  3. Run helm registry login registry-1.docker.io -u DOCKER-ID -p DOCKER-PASS
  4. We now have a working password store, linked to the personal gnupg entry through the name of the docker account, containing the encrypted docker login credentials for helm to use.
  5. helm push postgresql-12.1.6.tgz oci://registry-1.docker.io/DOCKER-ID/ places the postgres 12.1.6 chart into the registry, using the account's namespace.
  6. And looking at https://hub.docker.com/repository/docker/DOCKER-ID/postgresql/general shows the chart information, including tags.
  7. helm fetch oci://registry-1.docker.io/DOCKER-ID/postgresql works too, pulling the tarball, and checksums match.

So, I now have an OCI registry working with the helm client.

Now to see if the issue can be reproduced.

@andreas-kupries
Contributor

andreas-kupries commented Jul 3, 2023

Using the service specification below (added to the Epinio chart so that it is deployed with it, instead of applied via kubectl apply -f later):

> cat helm-charts/chart/epinio/templates/oci-service.yaml
---
apiVersion: application.epinio.io/v1
kind: Service
metadata:
  name: post-oci
  namespace: {{ .Release.Namespace }}
spec:
  name: post-oci
  shortDescription: A PostgreSQL service that can be used during development
  description: |
    Bitnami PostgreSQL using the chart stored in the personal docker OCI registry
  chart: oci://registry-1.docker.io/(DOCKER-ID)/postgresql
  chartVersion: 12.1.6
  serviceIcon: https://bitnami.com/assets/stacks/postgresql/img/postgresql-stack-220x234.png
  appVersion: 15.1.0
  values: |-
    {{- template "epinio.catalog-service-values" . }}

The command epinio service catalog post-oci shows

🚢  Show service details
Service: post-oci

✔️  Epinio Service:
|        KEY        |                   VALUE                    |
|-------------------|--------------------------------------------|
| Name              | post-oci                                   |
| Created           | 2023-07-03 11:46:19 +0200 CEST             |
| Version           | 15.1.0                                     |
| Short Description | A PostgreSQL service that can              |
|                   | be used during development                 |
| Description       | Bitnami PostgreSQL using the               |
|                   | chart stored in the personal               |
|                   | docker OCI registry                        |
| Helm Repository   |                                            |
| Helm Chart        | oci://registry-1.docker.io/a99k/postgresql |

⚠️  No settings

And creating a service instance from that works:

> ep service create post-oci pox --wait 

🚢  Creating Service...
Catalog: post-oci
Service: pox
Wait For Completion: true
>

This is also visible in the server logs (slightly modified to make the embedded json readable)

2023/07/03 10:52:48 EpinioServer: "level"=1 "msg"="service helm setup" "requestId"="325fa1f7-f00d-44f0-b72b-a81ea7a8b98c" "parameters"={
"name":"pox",
"namespace":"workspace",
"createdAt":"0001-01-01 00:00:00 +0000 UTC",
"Cluster":{"Kubectl":{"DiscoveryClient":{"LegacyPrefix":"/api","UseLegacyDiscovery":false}},
"RestConfig":"&rest.Config{Host:\"https://10.43.0.1:443\", APIPath:\"/apis\", ContentConfig:rest.ContentConfig{AcceptContentTypes:\"application/json\", ContentType:\"application/json\", GroupVersion:(*schema.GroupVersion)(0xc0015235a0), NegotiatedSerializer:(*serializer.negotiatedSerializerWrapper)(0xc001fcb730)}, Username:\"\", Password:\"\", BearerToken:\"--- REDACTED ---\", BearerTokenFile:\"/var/run/secrets/kubernetes.io/serviceaccount/token\", Impersonate:rest.ImpersonationConfig{UserName:\"\", UID:\"\", Groups:[]string(nil), Extra:map[string][]string(nil)}, AuthProvider:<nil>, AuthConfigPersister:rest.AuthProviderConfigPersister(nil), ExecProvider:<nil>, TLSClientConfig:rest.sanitizedTLSClientConfig{Insecure:false, ServerName:\"\", CertFile:\"\", KeyFile:\"\", CAFile:\"/var/run/secrets/kubernetes.io/serviceaccount/ca.crt\", CertData:[]uint8(nil), KeyData:[]uint8(nil), CAData:[]uint8(nil), NextProtos:[]string(nil)}, UserAgent:\"\", DisableCompression:false, Transport:http.RoundTripper(nil), WrapTransport:(transport.WrapperFunc)(nil), QPS:50, Burst:100, RateLimiter:flowcontrol.RateLimiter(nil), WarningHandler:rest.WarningHandler(nil), Timeout:0, Dial:(func(context.Context, string, string) (net.Conn, error))(nil), Proxy:(func(*http.Request) (*url.URL, error))(nil)}"},
"Chart":"oci://registry-1.docker.io/a99k/postgresql",
"Version":"12.1.6",
"Repository":"",
"Values":"epinio:\n  catalogServiceName: post-oci\n  serviceName: pox\nextraDeploy:\n- |\n  # Create a service account, role and binding to allow to list, get and\n  # delete PVCs. It should be used by the job below.\n\n  # To ensure the resources are deleted, use this annotation:\n  #\n  # annotations:\n  #  \"helm.sh/hook-delete-policy\": before-hook-creation,hook-succeeded\n\n  # https://helm.sh/docs/topics/charts_hooks/#hook-resources-are-not-managed-with-corresponding-releases\n  # https://helm.sh/docs/topics/charts_hooks/#hook-deletion-policies\n\n  ---\n  apiVersion: v1\n  kind: ServiceAccount\n  metadata:\n    name: \"pvc-deleter-{{ .Release.Name }}\"\n    namespace: {{ .Release.Namespace }}\n    annotations:\n      \"helm.sh/hook-delete-policy\": before-hook-creation,hook-succeeded\n      \"helm.sh/hook\": post-delete\n      \"helm.sh/hook-weight\": \"-6\"\n\n  ---\n  apiVersion: {{ include \"common.capabilities.rbac.apiVersion\" . }}\n  kind: Role\n  metadata:\n    name: \"pvc-deleter-{{ .Release.Name }}\"\n    namespace: {{ .Release.Namespace }}\n    annotations:\n      \"helm.sh/hook-delete-policy\": before-hook-creation,hook-succeeded\n      \"helm.sh/hook\": post-delete\n      \"helm.sh/hook-weight\": \"-6\"\n  rules:\n    - apiGroups:\n        - \"\"\n      resources:\n        - persistentvolumeclaims\n      verbs:\n        - get\n        - delete\n        - list\n\n  ---\n  kind: RoleBinding\n  apiVersion: {{ include \"common.capabilities.rbac.apiVersion\" . 
}}\n  metadata:\n    name: \"pvc-deleter-{{ .Release.Name }}\"\n    namespace: {{ .Release.Namespace }}\n    annotations:\n      \"helm.sh/hook-delete-policy\": before-hook-creation,hook-succeeded\n      \"helm.sh/hook\": post-delete\n      \"helm.sh/hook-weight\": \"-5\"\n  subjects:\n    - kind: ServiceAccount\n      name: \"pvc-deleter-{{ .Release.Name }}\"\n  roleRef:\n    apiGroup: rbac.authorization.k8s.io\n    kind: Role\n    name: \"pvc-deleter-{{ .Release.Name }}\"\n\n  ---\n  apiVersion: batch/v1\n  kind: Job\n  metadata:\n    name: \"pvc-deleter-{{ .Release.Name }}\"\n    labels:\n      app.kubernetes.io/managed-by: {{ .Release.Service | quote }}\n      app.kubernetes.io/instance: {{ .Release.Name | quote }}\n      app.kubernetes.io/version: {{ .Chart.AppVersion }}\n      helm.sh/chart: \"{{ .Chart.Name }}-{{ .Chart.Version }}\"\n    annotations:\n      # This is what defines this resource as a hook. Without this line, the\n      # job is considered part of the release.\n      \"helm.sh/hook\": post-delete\n      \"helm.sh/hook-weight\": \"-4\"\n      \"helm.sh/hook-delete-policy\": hook-succeeded\n  spec:\n    template:\n      metadata:\n        name: \"pvc-deleter-{{ .Release.Name }}\"\n        labels:\n          app.kubernetes.io/managed-by: {{ .Release.Service | quote }}\n          app.kubernetes.io/instance: {{ .Release.Name | quote }}\n          helm.sh/chart: \"{{ .Chart.Name }}-{{ .Chart.Version }}\"\n      spec:\n        restartPolicy: Never\n        serviceAccountName: \"pvc-deleter-{{ .Release.Name }}\"\n        containers:\n        - name: post-install-job\n          image: \"rancher/kubectl:v1.22.6\"\n          command: [\"kubectl\", \"delete\", \"pvc\", \"-n\", \"{{ .Release.Namespace }}\", \"-l\", \"app.kubernetes.io/instance={{ .Release.Name }}\"]\n",
"Wait":true}
2023/07/03 10:52:49 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="creating 4 resource(s)"
2023/07/03 10:52:49 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="beginning wait for 4 resources with timeout of 3m0s"
2023/07/03 10:52:49 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="StatefulSet is not ready: workspace/x21d2445dbeda2447f41ed00f50fd-postgresql. update has not yet been observed"
2023/07/03 10:52:51 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="StatefulSet is not ready: workspace/x21d2445dbeda2447f41ed00f50fd-postgresql. 0 out of 1 expected pods are ready"
2023/07/03 10:52:53 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="StatefulSet is not ready: workspace/x21d2445dbeda2447f41ed00f50fd-postgresql. 0 out of 1 expected pods are ready"
2023/07/03 10:52:55 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="StatefulSet is not ready: workspace/x21d2445dbeda2447f41ed00f50fd-postgresql. 0 out of 1 expected pods are ready"
2023/07/03 10:52:57 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="StatefulSet is not ready: workspace/x21d2445dbeda2447f41ed00f50fd-postgresql. 0 out of 1 expected pods are ready"
2023/07/03 10:52:59 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="StatefulSet is not ready: workspace/x21d2445dbeda2447f41ed00f50fd-postgresql. 0 out of 1 expected pods are ready"
2023/07/03 10:53:01 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="StatefulSet is not ready: workspace/x21d2445dbeda2447f41ed00f50fd-postgresql. 0 out of 1 expected pods are ready"
2023/07/03 10:53:03 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="StatefulSet is not ready: workspace/x21d2445dbeda2447f41ed00f50fd-postgresql. 0 out of 1 expected pods are ready"
2023/07/03 10:53:05 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="StatefulSet is ready: workspace/x21d2445dbeda2447f41ed00f50fd-postgresql. 1 out of 1 expected pods are ready"
2023/07/03 10:53:05 EpinioServer: "level"=1 "msg"="helm" "requestId"="829bf7f6-51da-4250-bcba-c2b3722000a8" "report"="release installed successfully: x21d2445dbeda2447f41ed00f50fd/postgresql-12.1.6"
2023/07/03 10:53:05 EpinioServer: "level"=1 "msg"="OK" "requestId"="325fa1f7-f00d-44f0-b72b-a81ea7a8b98c" "origin"="/api/v1/namespaces/workspace/services" "returning"={"status":"ok"}
2023/07/03 10:53:05 EpinioServer: "level"=1 "msg"="/api/v1/namespaces/workspace/services" "status"=200 "method"="POST" "path"="/api/v1/namespaces/workspace/services" "query"="" "ip"="10.42.0.1" "user-agent"="Go-http-client/1.1" "time"="2023-07-03T10:53:05Z" "latency"="17.301910783s"

@andreas-kupries
Contributor

@riqueps Given that the above basic trial with an OCI registry worked: what are the differences between my setup and yours?

  1. Docker Hub OCI registry here, vs AWS ECR OCI on your side.
  2. A slight variation of the Bitnami definition here, vs a full-blown custom service on your side.
  3. Epinio 1.8.1 is common.

So, questions:

  1. Does your custom service work when using Docker Hub as the OCI registry instead of AWS?
  2. Does the postgres example work when pulling it from the AWS ECR OCI?

Depending on the answers, suspicion will fall on either AWS or the custom service.

Table of data points:

|          | AWS  | Docker |
|----------|------|--------|
| custom   | 💥   | ?(1)   |
| postgres | ?(2) | ✔️     |

Side note: my DOCKER-ID is a99k. That should make it simpler to helm fetch the postgres chart I used, for pushing it to AWS.

@andreas-kupries
Contributor

@enrichman reminded me that the issue is intermittent, i.e. it appears mainly when deploying multiple times ...

Trying here ...

  1. (57 minutes after the initial deployment) Deleted the instances.
  2. Redeployed: fails with an auth error.
  3. Trying again: ok (note: the server was not restarted).
  4. Delete and redeploy: ok again.

(auth error: ❌ error creating service: service create failed: Internal Server Error: error deploying service helm chart: installing or upgrading service SYNC: pull access denied, repository does not exist or may require authorization: server message: invalid_token: authorization failed)

I now suspect that talking to the OCI registry involves a hidden token, created on demand, which expires and causes a subsequent access to fail. However, the stale token also seems to be cleared, so the next attempt gets a new token and works.

Will wait another hour to see if the auth error happens again.

@andreas-kupries
Contributor

Confirmed: after an hour's wait the first attempt fails, and the next attempt works, without restarting the server between the two.

work@tagetarl:~/SUSE/dev/epinio> time ep service create post-oci pop --wait 

🚢  Creating Service...
Catalog: post-oci
Service: pop
Wait For Completion: true

❌  error creating service: service create failed: Internal Server Error: error deploying service helm chart: installing or upgrading service SYNC: pull access denied, repository does not exist or may require authorization: server message: invalid_token: authorization failed

real	0m1.229s
user	0m0.277s
sys	0m0.017s
work@tagetarl:~/SUSE/dev/epinio> time ep service create post-oci pop --wait 

🚢  Creating Service...
Catalog: post-oci
Service: pop
Wait For Completion: true

real	0m17.673s
user	0m0.142s
sys	0m0.022s

@riqueps And for AWS, does the second attempt also fail?

Note: I am aware that AWS apparently reports a Bad Request and not an auth error. I still suspect that AWS may have an auth error underneath as well, just reported differently.

@andreas-kupries
Contributor

My error contains installing or upgrading service SYNC, which precisely locates the problematic command.
The reported error does not contain this string.
Checking the sources, that should not happen: all parts of the function wrap errors into a string signaling the location.
However, the report is for 1.8.[01].
Going through the history, #2364 made changes to the code in that area and added the installing or upgrading ... strings; they were missing before. This is past v1.8.1, and that was the only place where error wrapping did not happen before.

This means the original issue happens for the same command as my issue, just with a different underlying message.

The two call sites are

_, err = client.InstallOrUpgradeChart(ctx, &chartSpec, nil)

and
if _, err = client.InstallOrUpgradeChart(context.Background(), &chartSpec, nil); err != nil {

@thehejik
Contributor

thehejik commented Jul 19, 2023

Reopening, as I could still reproduce this issue on 1.9.0-rc4 (and also on 1.8.1)

Reproducer:

  • deploy E over helm with --version 1.9.0-rc4
  • modify the chart value of the existing postgresql-dev service to use an OCI registry, and delete the .spec.helmRepo key
$ kubectl edit services.application.epinio.io  -n epinio postgresql-dev
...
spec:
  ...
  chart: oci://registry-1.docker.io/bitnamicharts/postgresql
...
# delete spec.helmRepo keys
  • verify it is set correctly by checking the Helm Chart value
epinio service catalog postgresql-dev

🚢  Show service details
Service: postgresql-dev

✔️  Epinio Service:
|        KEY        |                               VALUE                                |
|-------------------|--------------------------------------------------------------------|
| Name              | postgresql-dev                                                     |
| Created           | 2023-07-19 10:37:15 +0000 UTC                                      |
| Version           | 15.1.0                                                             |
| Short Description | A PostgreSQL service that can                                      |
|                   | be used during development                                         |
| Description       | This service is going to deploy a simple default                   |
|                   | Bitnami PostreSQL db instance. You can find more info at           |
|                   | https://github.com/bitnami/charts/tree/master/bitnami/postgresql/. |
|                   | This database is running inside the cluster so it's probably not   |
|                   | a good choice for production environments, at least with this      |
|                   | default configuration.                                             |
| Helm Repository   |                                                                    |
| Helm Chart        | oci://registry-1.docker.io/bitnamicharts/postgresql                |

⚠️  No settings
  • deploy the service:
$ epinio service create postgresql-dev psql-oci1 --wait --trace-level 255 --verbosity 1
> service deployed successfully
  • wait 10 minutes and try to deploy another service using the same catalog:
epinio service create postgresql-dev psql-oci2 --wait --trace-level 255 --verbosity 1
2023/07/19 11:03:29 Settings-0xc0005b80f0: "level"=4 "msg"="Loading" "from"="/home/sles/.config/epinio/settings.yaml"
2023/07/19 11:03:29 Settings-0xc0005b80f0: "level"=4 "msg"="Loaded" "value"="namespace=(workspace), user=(admin), pass=(password), access_token=({   0001-01-01 00:00:00 +0000 UTC}), api=(https://epinio.10.100.103.16.nip.io), wss=(wss://epinio.10.100.103.16.nip.io), color=(true), appchart=(), @(/home/sles/.config/epinio/settings.yaml)"
2023/07/19 11:03:29 EpinioClient/NewEpinioClient: "level"=4 "msg"="Ingress API" "url"="https://epinio.10.100.103.16.nip.io"
2023/07/19 11:03:29 EpinioClient/NewEpinioClient: "level"=4 "msg"="Settings API" "url"="https://epinio.10.100.103.16.nip.io"
2023/07/19 11:03:29 EpinioClient/ServiceCreate: "level"=4 "msg"="start"

🚢  Creating Service...
Catalog: postgresql-dev
Service: psql-oci2
Wait For Completion: true
2023/07/19 11:03:29 EpinioApiClient: "level"=5 "msg"="executing request" "method"="POST" "uri"="" "body"="{\"catalog_service\":\"postgresql-dev\",\"name\":\"psql-oci2\",\"wait\":true}" "header"={"Authorization":["Basic YWRtaW46cGFzc3dvcmQ="]}
2023/07/19 11:03:29 EpinioApiClient: "level"=5 "msg"="response is not StatusOK: error deploying service helm chart: installing or upgrading service SYNC: pull access denied, repository does not exist or may require authorization: server message: invalid_token: authorization failed" "body"="{\"errors\":[{\"status\":500,\"title\":\"error deploying service helm chart: installing or upgrading service SYNC: pull access denied, repository does not exist or may require authorization: server message: invalid_token: authorization failed\",\"details\":\"\\nServer Backtrace: pull access denied, repository does not exist or may require authorization: server message: invalid_token: authorization failed\\ninstalling or upgrading service SYNC\\ngithub.com/epinio/epinio/internal/helm.DeployService\\n\\t/home/runner/work/epinio/epinio/internal/helm/helm.go:178\\ngithub.com/epinio/epinio/internal/services.(*ServiceClient).Create\\n\\t/home/runner/work/epinio/epinio/internal/services/instances.go:200\\ngithub.com/epinio/epinio/internal/api/v1/service.Create\\n\\t/home/runner/work/epinio/epinio/internal/api/v1/service/create.go:82\\ngithub.com/epinio/epinio/internal/api/v1.errorHandler.func1\\n\\t/home/runner/work/epinio/epinio/internal/api/v1/router.go:52\\ngithub.com/gin-gonic/gin.(*Context).Next\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\\ngithub.com/alron/ginlogr.RecoveryWithLogr.func1\\n\\t/home/runner/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:121\\ngithub.com/gin-gonic/gin.(*Context).Next\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\\ngithub.com/alron/ginlogr.Ginlogr.func1\\n\\t/home/runner/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:34\\ngithub.com/gin-gonic/gin.(*Context).Next\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\\ngithub.com/gin-gonic/gin.CustomRecoveryWithWriter.func1\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/recovery.go:101\\ngithub.com/gi
n-gonic/gin.(*Context).Next\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:616\\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:572\\nnet/http.serverHandler.ServeHTTP\\n\\t/opt/hostedtoolcache/go/1.20.4/x64/src/net/http/server.go:2936\\nnet/http.(*conn).serve\\n\\t/opt/hostedtoolcache/go/1.20.4/x64/src/net/http/server.go:1995\\nruntime.goexit\\n\\t/opt/hostedtoolcache/go/1.20.4/x64/src/runtime/asm_amd64.s:1598\\nerror deploying service helm chart\\ngithub.com/epinio/epinio/internal/services.(*ServiceClient).Create\\n\\t/home/runner/work/epinio/epinio/internal/services/instances.go:219\\ngithub.com/epinio/epinio/internal/api/v1/service.Create\\n\\t/home/runner/work/epinio/epinio/internal/api/v1/service/create.go:82\\ngithub.com/epinio/epinio/internal/api/v1.errorHandler.func1\\n\\t/home/runner/work/epinio/epinio/internal/api/v1/router.go:52\\ngithub.com/gin-gonic/gin.(*Context).Next\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\\ngithub.com/alron/ginlogr.RecoveryWithLogr.func1\\n\\t/home/runner/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:121\\ngithub.com/gin-gonic/gin.(*Context).Next\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\\ngithub.com/alron/ginlogr.Ginlogr.func1\\n\\t/home/runner/go/pkg/mod/github.com/alron/ginlogr@v0.0.4/logr.go:34\\ngithub.com/gin-gonic/gin.(*Context).Next\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\\ngithub.com/gin-gonic/gin.CustomRecoveryWithWriter.func1\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/recovery.go:101\\ngithub.com/gin-gonic/gin.(*Context).Next\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/context.go:173\\ngithub.com/gin-gonic/gin.(*Engine).handleHTTPRequest\\n\\t/home/runner/go/pkg/mod
/github.com/gin-gonic/gin@v1.8.2/gin.go:616\\ngithub.com/gin-gonic/gin.(*Engine).ServeHTTP\\n\\t/home/runner/go/pkg/mod/github.com/gin-gonic/gin@v1.8.2/gin.go:572\\nnet/http.serverHandler.ServeHTTP\\n\\t/opt/hostedtoolcache/go/1.20.4/x64/src/net/http/server.go:2936\\nnet/http.(*conn).serve\\n\\t/opt/hostedtoolcache/go/1.20.4/x64/src/net/http/server.go:1995\\nruntime.goexit\\n\\t/opt/hostedtoolcache/go/1.20.4/x64/src/runtime/asm_amd64.s:1598\"}]}"
2023/07/19 11:03:29 EpinioClient/ServiceCreate: "level"=4 "msg"="return"

❌  error creating service: service create failed: error deploying service helm chart: installing or upgrading service SYNC: pull access denied, repository does not exist or may require authorization: server message: invalid_token: authorization failed
  • you may redeploy the same service immediately and it will succeed this time

@enrichman
Member

Thanks @thehejik, I was able to reproduce the issue as well. It seems the problem was a misunderstanding of the Steps value (mentioned in the review: #2424 (review)).

I tried incrementing it and it works now. Basically, Steps counts attempts rather than retries: 1 means try once with no retry, while 2 means try once and, if that fails, retry once more.

#2474

@thehejik
Contributor


Verified as fixed in Epinio v1.9.0-rc5.
