Update pipeline-service #1725

Closed
wants to merge 4 commits into from

Roming22
Member

  • OSP-1.10
  • PaC deployed through OSP-1.10
  • allow for watcher logging level customization via gitops

@gabemontero
Contributor

I would think we would want to remove tekton-chains at https://github.com/openshift-pipelines/pipeline-service/tree/main/operator/gitops/argocd/pipeline-service/tekton-chains as well @Roming22 and validate that removal via pipeline service CI before updating staging

Admittedly though, I'm a little unclear about how we override openshift-pipelines over at https://github.com/openshift-pipelines/pipeline-service/tree/main/operator/gitops/argocd/pipeline-service/openshift-pipelines so as to install or not install tekton-chains

of course, maybe I'm wrong here .... can you elaborate ?

thanks

@Roming22
Member Author

@gabemontero I'm not sure I understand why removing tekton-chains is a prerequisite to upgrading to OSP-1.10.

We certainly want to remove it, sooner rather than later, but it's going to take some time before it's ready, and if possible I'd like to validate that OSP-1.10 is running smoothly as soon as possible to avoid last-minute surprises before the freeze.

@gabemontero
Contributor

> @gabemontero I'm not sure I understand why removing tekton-chains is a prerequisite to upgrading to OSP-1.10.
>
> We certainly want to remove it, sooner rather than later, but it's going to take some time before it's ready, and if possible I'd like to validate that OSP-1.10 is running smoothly as soon as possible to avoid last-minute surprises before the freeze.

I thought PaC and tekton-chains were the same in that we had to install them from upstream to get required levels, but at 1.10 both the PaC and tekton-chains installed by openshift-pipelines were at sufficient levels

What motivated including the removal of PaC via openshift-pipelines/pipeline-service#620, but not the removal of chains? I would think we would want to be consistent.

Please clarify when you get the chance, but I won't block merge, assuming e2e's pass (they did not on the first go)

/lgtm

@gabemontero
Contributor

and the e2e failed with a gitops hiccup wrt pipeline service:

 pipeline-service-in-cluster-local                 Unknown     Healthy
pipeline-service-in-cluster-local failed with:
[{"lastTransitionTime":"2023-04-25T20:26:09Z","message":"rpc error: code = Unknown desc = `kustomize build .components/pipeline-service/development` failed exit status 1: Error: accumulating resources: accumulation err='accumulating resources from 'git::https://github.com/openshift-pipelines/pipeline-service.git/developer/openshift/gitops/argocd/pipeline-service-storage?ref=14431236de0f1e3d3a9e27c145197d1a45942794': evalsymlink failure on '.components/pipeline-service/development/git::https:/github.com/openshift-pipelines/pipeline-service.git/developer/openshift/gitops/argocd/pipeline-service-storage?ref=14431236de0f1e3d3a9e27c145197d1a45942794' : lstat .components/pipeline-service/development/git::https:: no such file or directory': evalsymlink failure on '/tmp/kustomize-2974330201/developer/openshift/gitops/argocd/pipeline-service-storage' : lstat /tmp/kustomize-2974330201/developer/openshift/gitops/argocd/pipeline-service-storage: no such file or directory","type":"ComparisonError"}]

at first quick glance can't say for certain if it is related to the bump ...

/test appstudio-e2e-tests
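
For anyone triaging this, the failing resource string in the error above breaks down into a repository, an in-repo path, and a pinned ref. A minimal Python sketch of that split follows. The function name `split_remote_resource` is hypothetical, and the assumption that the repository URL ends at `.git/` comes from this one error message, not from kustomize's own parsing rules.

```python
from urllib.parse import urlsplit, parse_qs


def split_remote_resource(resource: str) -> dict:
    """Split a remote resource string into repo URL, in-repo path, and ref."""
    # Drop the optional "git::" prefix that appears in the error message.
    if resource.startswith("git::"):
        resource = resource[len("git::"):]
    parts = urlsplit(resource)
    # Assumption: the repository part of the URL path ends at ".git/".
    repo_path, _, subdir = parts.path.partition(".git/")
    # The pinned commit is carried in the "ref" query parameter.
    ref = parse_qs(parts.query).get("ref", [None])[0]
    return {
        "repo": f"{parts.scheme}://{parts.netloc}{repo_path}.git",
        "path": subdir,
        "ref": ref,
    }


if __name__ == "__main__":
    failing = (
        "git::https://github.com/openshift-pipelines/pipeline-service.git"
        "/developer/openshift/gitops/argocd/pipeline-service-storage"
        "?ref=14431236de0f1e3d3a9e27c145197d1a45942794"
    )
    print(split_remote_resource(failing))
```

With the three parts in hand, one could check in a local clone whether the path exists at that ref (for example with `git ls-tree <ref> <path>`). The `lstat ... no such file or directory` part of the message suggests it does not.
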

@Mo3m3n
Contributor

Mo3m3n commented Apr 25, 2023

@Roming22 can you update the commit message to add "using tls for rds connections" as a result of this update?

@flacatus
Contributor

/retest

@gabemontero
Contributor

gabemontero commented Apr 25, 2023

@Roming22 - unfortunately the latest run https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/redhat-appstudio_infra-deployments/1725/pull-ci-redhat-appstudio-infra-deployments-main-appstudio-e2e-tests/1650981219980021760 failed in the same way as what I reported in #1725 (comment)

I think that points to gitops / kustomize not being happy with something in pipeline-service between 062d00d304d6fba316085784f3b948217fc99d12 and 14431236de0f1e3d3a9e27c145197d1a45942794

By rough count I see 21 or 22 commits between those 2 commits, so if I am correct there is a large set to choose from, unless the error message wrt pipeline-service-storage helps narrow it down

My best guess based on that clue is that it is somehow related to openshift-pipelines/pipeline-service@a07a5a0 from @Mo3m3n, based on a comment in that file, but that is certainly a guess that is more wild than educated, especially since that commit merged before 062d00d304d6fba316085784f3b948217fc99d12
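
To replace the rough count with an exact one, and to narrow the set to commits touching the directory named in the error, a `git rev-list` invocation over the two SHAs could be used. The sketch below only assembles and runs that command. The helper names and the `repo_dir` parameter are hypothetical, and it assumes a local clone of pipeline-service with both commits present.

```python
import subprocess
from typing import List, Optional

OLD = "062d00d304d6fba316085784f3b948217fc99d12"
NEW = "14431236de0f1e3d3a9e27c145197d1a45942794"


def rev_list_args(old: str, new: str, path: Optional[str] = None) -> List[str]:
    """Build the arguments for `git rev-list --count old..new [-- path]`."""
    args = ["rev-list", "--count", f"{old}..{new}"]
    if path is not None:
        # Limit the count to commits that touch the given path.
        args += ["--", path]
    return args


def count_commits(repo_dir: str, old: str, new: str,
                  path: Optional[str] = None) -> int:
    """Run git inside repo_dir (a local clone, assumed to exist)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, *rev_list_args(old, new, path)],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())
```

For example, `count_commits("pipeline-service", OLD, NEW, "developer/openshift/gitops/argocd/pipeline-service-storage")` would count only the commits in that range that touched the storage directory, which may shrink the candidate list considerably.
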

@adambkaplan
Contributor

The external secrets for Pipelines as Code probably need to be updated. I think the PR as is will break staging because PaC will be deployed in the openshift-pipelines namespace.

@openshift-ci openshift-ci bot removed the lgtm label Apr 26, 2023
Contributor

@gabemontero gabemontero left a comment

/lgtm

@openshift-ci openshift-ci bot added lgtm and removed lgtm labels Apr 26, 2023
@Roming22
Member Author

@Mo3m3n @adambkaplan Comments processed. SHA updated, and there are no references to the pipelines-as-code namespace anymore.

> kustomize build components/pipeline-service/staging/base | grep -c "namespace: pipelines-as-code"
0
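
The same check can be scripted so it can run over several overlays. This is a minimal sketch, assuming the rendered manifests are already available as a string (for example, captured from `kustomize build`). `count_namespace_refs` is a hypothetical helper and, like `grep -c`, it counts matching lines rather than parsing YAML.

```python
def count_namespace_refs(rendered: str, namespace: str) -> int:
    """Count lines containing 'namespace: <namespace>', mirroring grep -c."""
    needle = f"namespace: {namespace}"
    return sum(1 for line in rendered.splitlines() if needle in line)


if __name__ == "__main__":
    # Illustrative input only; not taken from the real staging overlay.
    sample = "\n".join([
        "kind: Secret",
        "metadata:",
        "  name: example",
        "  namespace: openshift-pipelines",
    ])
    print(count_namespace_refs(sample, "pipelines-as-code"))
```

Because the match is a plain substring test, a namespace whose name merely starts with the searched name would also be counted, which is the same behaviour as the `grep` above.
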

@gabemontero
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 26, 2023
@gabemontero
Contributor

pr needs a rebase @Roming22

do you want to do it, or I can hit the "update with rebase" button if you like

@openshift-ci openshift-ci bot removed the lgtm label Apr 26, 2023
@gabemontero
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 26, 2023
@Mo3m3n
Contributor

Mo3m3n commented Apr 26, 2023

@Roming22 I have just found that you also need to update this line https://github.com/redhat-appstudio/infra-deployments/blob/main/hack/secret-creator/create-plnsvc-secrets.sh#L27 to match this change done here in pipeline-service openshift-pipelines/pipeline-service@b5f1dc2#diff-f394b1c9744cc285b6ac80384e967bd73210dd72a905e1ad7581b8ecab4e3773R201

Basically when switching to the helm based postgres deployment the name of the db service changed too (for dev mode). That's why I suggested in my comment to also add to the commit message that we are enabling tls with this update and using Helm for postgres deployment

@openshift-ci openshift-ci bot removed the lgtm label Apr 26, 2023
@gabemontero
Contributor

@Mo3m3n - we minimally need #1737 to merge in order to pick up @mmorhun 's fix for which namespace PaC is installed in. It has not merged as I type this comment.

For what you noted with https://github.com/redhat-appstudio/infra-deployments/tree/main/components/authentication and https://github.com/redhat-appstudio/infra-deployments/tree/main/components/sprayproxy yes that is part of what @Michkov noted in #1725 (comment)

I would say @Roming22 each of those need to be addressed as part of this PR

Once all that happens, we can see about rebase/retest and try to merge

@openshift-ci openshift-ci bot removed the lgtm label Apr 28, 2023
@gabemontero
Contributor

> @Mo3m3n - we minimally need #1737 to merge in order to pick up @mmorhun 's fix for which namespace PaC is installed in. It has not merged as I type this comment.

@Roming22 just pulled ^^ into this PR :-)

> For what you noted with https://github.com/redhat-appstudio/infra-deployments/tree/main/components/authentication and https://github.com/redhat-appstudio/infra-deployments/tree/main/components/sprayproxy yes that is part of what @Michkov noted in #1725 (comment)
>
> I would say @Roming22 each of those need to be addressed as part of this PR
>
> Once all that happens, we can see about rebase/retest and try to merge

@Roming22
Member Author

@Michkov I believe I've updated all references to pipelines-as-code. I tried to make sure not to impact production by patching where necessary.

@gabemontero
Contributor

the last CI failure needs #1579

we just (possibly re-)triaged that mvp-demo failure in https://redhat-internal.slack.com/archives/C02FANRBZQD/p1682697746956539

@gabemontero
Contributor

hopefully konflux-ci/e2e-tests#488 fixes this

@gabemontero
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Apr 28, 2023
@gabemontero
Contributor

more new unrelated flakes being discussed in slack (I've asked QE to open blocking bugs and put skips in these failing tests until they are sorted out).

/retest

@gabemontero
Contributor

/retest

@gabemontero
Contributor

I see the same "timed out when waiting for init PaC PR to be created" error @Roming22, with minimal activity in the PaC pods

By comparison, #1741 is showing green with the older version of PaC.

Might be time to try this out locally and work with @psturc and @flacatus to understand the details of this test and see where the breakdown is.

@Michkov
Contributor

Michkov commented May 2, 2023

Issue in build-service:

{"level":"error","ts":"2023-04-30T23:29:58.624Z","logger":"ComponentOnboarding.PaC-setup","caller":"controllers/component_build_controller_pac.go:415","msg":"failed to get Component PaC repository object","controller":"component","controllerGroup":"appstudio.redhat.com","controllerKind":"Component","Component":{"name":"test-mvp-component-kpta","namespace":"mvp-demo-dev-namespace-cbpg-tenant"},"namespace":"mvp-demo-dev-namespace-cbpg-tenant","name":"test-mvp-component-kpta","reconcileID":"9a13383c-095c-4cdc-91f8-a7a4898f6f74","action":"VIEW","error":"no matches for kind \"Repository\" in version \"pipelinesascode.tekton.dev/v1alpha1\"","stacktrace":"github.com/redhat-appstudio/build-service/controllers.(*ComponentBuildReconciler).ensurePaCRepository\n\t/opt/app-root/src/controllers/component_build_controller_pac.go:415\ngithub.com/redhat-appstudio/build-service/controllers.(*ComponentBuildReconciler).ProvisionPaCForComponent\n\t/opt/app-root/src/controllers/component_build_controller_pac.go:113\ngithub.com/redhat-appstudio/build-service/controllers.(*ComponentBuildReconciler).Reconcile\n\t/opt/app-root/src/controllers/component_build_controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:122\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:323\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:274\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:235"}

no matches for kind \"Repository\" in version \"pipelinesascode.tekton.dev/v1alpha1\" -> trying to reproduce
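
Since both the build-service and PaC webhook logs in this thread are single-line JSON records, the interesting fields can be pulled out mechanically instead of by eye. The sketch below is a minimal, hypothetical helper (`summarize_log_line` is not part of either project). It assumes only what the two log lines in this thread show: one uses `level`/`msg`, the other `severity`/`message`.

```python
import json


def summarize_log_line(line: str) -> dict:
    """Extract level, logger, message, and error from one JSON log record."""
    record = json.loads(line)
    return {
        # build-service logs use "level"/"msg"; the PaC webhook log uses
        # "severity"/"message". Fall back from one spelling to the other.
        "level": record.get("level") or record.get("severity"),
        "logger": record.get("logger"),
        "message": record.get("msg") or record.get("message"),
        "error": record.get("error"),
    }


if __name__ == "__main__":
    # Shortened copy of the build-service record above (stacktrace omitted).
    line = (
        r'{"level":"error","logger":"ComponentOnboarding.PaC-setup",'
        r'"msg":"failed to get Component PaC repository object",'
        r'"error":"no matches for kind \"Repository\" in version '
        r'\"pipelinesascode.tekton.dev/v1alpha1\""}'
    )
    for key, value in summarize_log_line(line).items():
        print(f"{key}: {value}")
```

Run over a whole pod log, a helper like this would make it easier to see whether the `no matches for kind` error is the only one being reported.
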

@openshift-ci openshift-ci bot removed the lgtm label May 2, 2023
@openshift-ci

openshift-ci bot commented May 2, 2023

New changes are detected. LGTM label has been removed.

* OSP-1.10
* PaC deployed through OSP-1.10
* Allow for watcher logging level customization via gitops
* Using TLS for RDS connections

Signed-off-by: Romain Arnaud <rarnaud@redhat.com>
@Michkov
Contributor

Michkov commented May 2, 2023

{"severity":"error","timestamp":"2023-05-02T16:51:35.531Z","logger":"pipelines-as-code-webhook.ValidationWebhook","caller":"controller/controller.go:566","message":"Reconcile error","knative.dev/traceid":"78b06509-0c86-464b-8b97-034e745398df","knative.dev/key":"validation.pipelinesascode.tekton.dev","duration":0.000023802,"error":"secret \"pipelines-as-code-webhook-certs\" is missing \"ca-cert.pem\" key","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\t/go/src/github.com/openshift-pipelines/pipelines-as-code/vendor/knative.dev/pkg/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\t/go/src/github.com/openshift-pipelines/pipelines-as-code/vendor/knative.dev/pkg/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\t/go/src/github.com/openshift-pipelines/pipelines-as-code/vendor/knative.dev/pkg/controller/controller.go:491"}
-> "error":"secret "pipelines-as-code-webhook-certs" is missing "ca-cert.pem" key"

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/redhat-appstudio_infra-deployments/1725/pull-ci-redhat-appstudio-infra-deployments-main-appstudio-e2e-tests/1653436401850519552/artifacts/appstudio-e2e-tests/redhat-appstudio-hypershift-gather/artifacts/pods/openshift-pipelines_pipelines-as-code-webhook-6b4f79686c-bsvr2_pac-webhook.log

@gabemontero
Contributor

gabemontero commented May 2, 2023

So still the same error in the last e2e run.

@Roming22 - so in my local repro of this, I killed the build-service pod, and when it restarted, it no longer got the no matches for kind \"Repository\" in version \"pipelinesascode.tekton.dev/v1alpha1\" errors, and the Repository object was created in the test namespace (which I had left up) after it was missing during the test failure

I wonder if this is a startup order sort of thing, where the build-service initializes its client before the openshift-pipelines operator has "fully installed things", whereas previously PaC was getting created by pipeline-service directly

Here are the current sync-waves I see for the two:

gmontero ~/go/src/github.com/redhat-appstudio/infra-deployments  (main)$ find . -name "*.yaml" -exec grep -s -H sync-wave {} \;
./components/build-service/base/external-secrets/pipelines-as-code-secret.yaml:    argocd.argoproj.io/sync-wave: "-1"
./components/pipeline-service/base/external-secrets/pipelines-as-code/pipelines-as-code-secret.yaml:    argocd.argoproj.io/sync-wave: "-1"
./components/pipeline-service/base/external-secrets/tekton-results/tekton-results-database.yaml:    argocd.argoproj.io/sync-wave: "-1"
./components/pipeline-service/base/external-secrets/tekton-results/tekton-results-s3.yaml:    argocd.argoproj.io/sync-wave: "-1"
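
If the startup-order hypothesis holds, one possible mitigation (not something this PR implements) would be for a client to wait until the `Repository` kind is served before first using it. The sketch below shows only the generic polling part. `wait_until` and the `check` callable are hypothetical, and how `check` would query the cluster is deliberately left out.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float,
               interval_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    check() is always called at least once.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for "is the Repository kind available yet?": succeeds on the
    # third poll. A real check would ask the API server instead.
    attempts = {"n": 0}

    def fake_check() -> bool:
        attempts["n"] += 1
        return attempts["n"] >= 3

    print(wait_until(fake_check, timeout_s=5.0, interval_s=0.01))
```

The alternative of ordering the Argo CD applications with sync-waves would address the same race at deploy time rather than in the client. The listing above suggests only the external secrets currently carry a sync-wave annotation.
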

@gabemontero
Contributor

{"severity":"error","timestamp":"2023-05-02T16:51:35.531Z","logger":"pipelines-as-code-webhook.ValidationWebhook","caller":"controller/controller.go:566","message":"Reconcile error","knative.dev/traceid":"78b06509-0c86-464b-8b97-034e745398df","knative.dev/key":"validation.pipelinesascode.tekton.dev","duration":0.000023802,"error":"secret \"pipelines-as-code-webhook-certs\" is missing \"ca-cert.pem\" key","stacktrace":"knative.dev/pkg/controller.(*Impl).handleErr\n\t/go/src/github.com/openshift-pipelines/pipelines-as-code/vendor/knative.dev/pkg/controller/controller.go:566\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\t/go/src/github.com/openshift-pipelines/pipelines-as-code/vendor/knative.dev/pkg/controller/controller.go:543\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\t/go/src/github.com/openshift-pipelines/pipelines-as-code/vendor/knative.dev/pkg/controller/controller.go:491"} -> "error":"secret "pipelines-as-code-webhook-certs" is missing "ca-cert.pem" key"

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/redhat-appstudio_infra-deployments/1725/pull-ci-redhat-appstudio-infra-deployments-main-appstudio-e2e-tests/1653436401850519552/artifacts/appstudio-e2e-tests/redhat-appstudio-hypershift-gather/artifacts/pods/openshift-pipelines_pipelines-as-code-webhook-6b4f79686c-bsvr2_pac-webhook.log

looks like we officially have multiple problems

@openshift-ci

openshift-ci bot commented May 2, 2023

@Roming22: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/appstudio-e2e-tests | ed9265b | link | true | /test appstudio-e2e-tests |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Roming22
Member Author

Roming22 commented May 2, 2023

Closed in favor of #1769

We'll be taking baby steps:

  • Deploy OSP 1.10 without any changes to PaC
  • Update PaC and tekton-chains images to use images built by Red Hat
  • Deploy tekton-results through the operator
  • Deploy tekton-chains through the operator

@Roming22 Roming22 closed this May 2, 2023