CI: 2023 06 02
Zack Galbreath edited this page Jun 2, 2023 · 1 revision
- Bill Hoffman
- Dan LaManna
- Jacob Nesbitt
- John Parent
- Mike VanDenburgh
- Ryan Krattiger
- Tamara Grimmett
- Todd Gamblin
- Zack Galbreath
- We are working with Stephen & Max to speed up the bootstrapping portion of the pcluster stacks.
- We are also investigating better ways to support pcluster stacks in light of the constraint that we cannot make some of the resulting binaries publicly available. This article describes an intriguing approach to more fine-grained access permissions within a single S3 bucket.
- We've designed a system to automatically build EKS-optimized AMIs for pcluster whenever we bump our EKS version.
- We've noticed a higher than usual error rate for the `no_binary_for_spec` class of job failures. We are still investigating the cause of this problem. This type of error seems to correlate with higher job volume.
- Earlier this week we noticed multiple pipelines being run for the same commit. This situation should be resolved now, but we will keep an eye on it.
- We hope to open up access to prometheus.spack.io to all members of the Spack organization soon.
- GitLab apparently provides many Prometheus metrics and Grafana dashboards out of the box. We are combing through these to find dashboards that are particularly well suited to our workflows. We will publish these on grafana.spack.io.
- Our DMS-based approach of replicating and modifying GitLab's PostgreSQL database is proving fragile. It works initially, but eventually runs into consistency errors from which it cannot recover unless a fresh copy is created. If we can't figure out how to stabilize this system, we will regather requirements and pursue alternate approaches. The motivation behind this system is to bring diverse streams of data (packages, jobs, system data) into one place, where they can be correlated for easier and more powerful analytics.
- We are working on setting up cloud-based GitLab CI runners for Windows. Our initial efforts on this task are going well so far.
- Upgrade gitlab.spack.io
- Reduce the `no_binary_for_spec` error rate
- Replace any remaining `gp2` volumes with `gp3`
- Continue to assist with the effort to speed up pcluster bootstrapping
- Keep pushing on the current DMS-based approach to populate the analytics database. If it proves too unstable, regather requirements and pursue alternate approaches. Ask Alec for help.
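
For the `gp2` to `gp3` item above, one possible approach is sketched below. It assumes the remaining `gp2` volumes are plain EBS volumes reachable through the EC2 API, which these notes do not confirm. The helper names `find_gp2_volume_ids` and `migrate_to_gp3` are hypothetical, and the `dry_run` flag is a local convenience, not the EC2 `DryRun` parameter. The client is passed in, so the sketch itself does not import `boto3`.

```python
from typing import Any, Iterable, List


def find_gp2_volume_ids(ec2_client: Any) -> List[str]:
    """Return the IDs of all gp2 EBS volumes visible to the given EC2 client."""
    # describe_volumes is paginated, so walk every page of results.
    paginator = ec2_client.get_paginator("describe_volumes")
    pages = paginator.paginate(
        Filters=[{"Name": "volume-type", "Values": ["gp2"]}]
    )
    return [vol["VolumeId"] for page in pages for vol in page["Volumes"]]


def migrate_to_gp3(
    ec2_client: Any, volume_ids: Iterable[str], dry_run: bool = True
) -> List[str]:
    """Request a gp2 -> gp3 conversion for each volume.

    With dry_run=True (the default) nothing is modified; the function only
    returns the IDs that would have been converted.
    """
    selected = []
    for volume_id in volume_ids:
        if not dry_run:
            # modify_volume changes the volume type in place.
            ec2_client.modify_volume(VolumeId=volume_id, VolumeType="gp3")
        selected.append(volume_id)
    return selected
```

With `boto3` installed and credentials configured, this could be driven by `client = boto3.client("ec2")` followed by `migrate_to_gp3(client, find_gp2_volume_ids(client))` to preview the affected volumes before passing `dry_run=False`. Volumes provisioned through other mechanisms (for example Kubernetes storage classes) would need their definitions updated separately.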