Zack Galbreath edited this page Jun 2, 2023 · 1 revision

Attendees

  • Bill Hoffman
  • Dan LaManna
  • Jacob Nesbitt
  • John Parent
  • Mike VanDenburgh
  • Ryan Krattiger
  • Tamara Grimmett
  • Todd Gamblin
  • Zack Galbreath

pcluster

  • We are working with Stephen & Max to speed up the bootstrapping portion of the pcluster stacks.
  • We are also investigating better ways to support pcluster stacks in light of the constraint that we cannot make some of the resulting binaries publicly available. This article describes an intriguing approach to more fine-grained access permissions within a single S3 bucket.
  • We've designed a system to automatically build EKS-optimized AMIs for pcluster whenever we bump our EKS version.

CI status

  • We've noticed a higher-than-usual error rate for the no_binary_for_spec class of job failures. This type of error seems to correlate with higher job volume, but we are still investigating the cause.
  • Earlier this week we noticed multiple pipelines being run for the same commit. This situation should be resolved now, but we will keep an eye on it.

Dashboarding and analytics

  • We hope to open up access to prometheus.spack.io to all members of the Spack organization soon.
  • GitLab apparently provides lots of Prometheus metrics and Grafana dashboards out of the box. We are combing through these to find dashboards that are particularly well suited to our workflows. We will publish these on grafana.spack.io.
  • Our DMS-based approach of replicating and modifying GitLab's Postgres database is proving fragile. It works initially, but eventually runs into consistency errors from which it cannot recover without a fresh copy being created. If we can't figure out how to stabilize this system, we will regather requirements and pursue alternate approaches. The motivation behind this system is to correlate diverse streams of data (packages, jobs, system data) in one place for easier and more powerful analytics.

Windows CI

  • We are working on setting up cloud-based GitLab CI runners for Windows. Our initial efforts on this task are going well.

Priorities

  • Upgrade gitlab.spack.io
  • Reduce the no_binary_for_spec error rate
  • Replace any remaining gp2 volumes with gp3
  • Continue to assist with the effort to speed up pcluster bootstrapping
  • Keep pushing on the current DMS-based approach to populate the analytics database. If it proves too unstable, regather requirements and pursue alternate approaches. Ask Alec for help.