
Attendees

  • Bill Hoffman
  • Dan LaManna
  • Jacob Nesbitt
  • John Parent
  • Massimiliano Culpo
  • Mike VanDenburgh
  • Ryan Krattiger
  • Scott Wittenburg
  • Todd Gamblin
  • Zack Galbreath

gitlab.spack.io

  • This week we upgraded gitlab.spack.io to v16.1.0. The process went smoothly, and we took notes to make future upgrades easier and to document common pitfalls.
    • One issue we ran into was with MinIO: it requires non-trivial downtime to back up, and after the upgrade finishes it is unpredictably slow to remount. We've started investigating what steps would be required to move to S3 instead.
  • We need to upgrade EKS and Karpenter (at the same time) relatively soon.
  • We investigated the unreliability of the auto-cancel redundant pipelines feature. It appears to be caused by the Sidekiq component becoming overloaded.

CI Pipelines

  • PR #38514 fixes the "copy-only" job for the ml-darwin stack. It is currently blocked by an OpenMP build error on develop for that stack.
  • PR #38598 captures more fine-grained timing statistics for packages installed from source or a binary cache.
  • We removed unused compiler bootstrapping logic from the spack ci module in PR #38543.
  • PR #38626 improved the information we display in our pipeline generation jobs. In particular, we now indicate why jobs were pruned (i.e., which mirror the spec was found in). This should make it easier to diagnose no_binary_for_specs errors if they recur.
  • We noticed that many specs are missing from the top-level buildcache. We are working to fix this by running the protected_publish job unconditionally (see the sketch after this list).
    • We are also reviving spack-infra PR #310 to locate these missing specs in the per-stack buildcaches & copy them to the top-level buildcache.
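As a rough illustration of what "unconditionally" means here: a GitLab CI job can be configured to run regardless of whether earlier jobs in the pipeline failed. The job name, stage, and script below are hypothetical placeholders, not the actual Spack pipeline configuration:

```yaml
# Hypothetical sketch: always run the publish job on protected refs,
# even when earlier jobs in the pipeline have failed, so built specs
# still get pushed to the top-level buildcache.
protected_publish:
  stage: publish
  rules:
    - if: '$CI_COMMIT_REF_PROTECTED == "true"'
      when: always                             # run regardless of upstream job status
  script:
    - ./publish-to-top-level-buildcache.sh     # placeholder for the actual publish step
```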

Other

  • We are updating cache.spack.io to include results from our weekly snapshot releases.
  • We continue to make progress towards the goal of cloud-hosted GitLab CI runners for Windows.
  • We discussed what changes would be required to allow Spack developers to retry specific failed GitLab CI jobs more easily, rather than rerunning an entire pipeline.
    • One problem to solve here is that we currently store S3 credentials in GitLab CI variables, which are rotated every 12 hours, so a job retried more than 12 hours later fails with invalid credentials. We are going to look into the feasibility of using IAM roles instead (see the sketch after this list).
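A rough sketch of the IAM-role approach, assuming an AWS OIDC identity provider is configured to trust gitlab.spack.io (the audience, role ARN, and bucket name below are hypothetical): each job exchanges a short-lived GitLab ID token for temporary AWS credentials, so a retried job mints fresh credentials instead of reusing ones that have since been rotated.

```yaml
# Hypothetical sketch: obtain per-job temporary AWS credentials via an IAM role
# instead of storing rotating access keys in GitLab CI variables.
example_buildcache_job:
  id_tokens:
    AWS_ID_TOKEN:
      aud: https://gitlab.spack.io        # audience on the (assumed) AWS OIDC provider
  script:
    # Exchange the job's OIDC token for short-lived credentials scoped to this run.
    - >
      CREDS=$(aws sts assume-role-with-web-identity
      --role-arn "$AWS_ROLE_ARN"
      --role-session-name "gitlab-ci-${CI_JOB_ID}"
      --web-identity-token "$AWS_ID_TOKEN"
      --duration-seconds 3600
      --query Credentials --output json)
    - export AWS_ACCESS_KEY_ID=$(echo "$CREDS" | jq -r .AccessKeyId)
    - export AWS_SECRET_ACCESS_KEY=$(echo "$CREDS" | jq -r .SecretAccessKey)
    - export AWS_SESSION_TOKEN=$(echo "$CREDS" | jq -r .SessionToken)
    - aws s3 ls s3://example-buildcache/   # placeholder S3 access
```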

Priorities

  • Upgrade EKS and Karpenter.
  • Update gitlab.spack.io to use S3 and ElastiCache rather than MinIO and Redis.
  • Finish PR #38598 (more fine-grained timing data).
  • Investigate using IAM roles instead of GitLab CI variables for S3 access.
  • Chat with Harshitha's team about job error classification.
  • Deploy GitLab CI Windows runners and pipeline(s).