This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

2021 Feb Release Plan #5253

Closed
27 of 55 tasks
debuggy opened this issue Jan 15, 2021 · 0 comments

debuggy commented Jan 15, 2021

Release Manager

@hzy46

Endgame

Feature freeze: TBD
Code freeze: 2.24
Scrum demo date: 2.25
Bug Bash date: TBD
Release date & retrospective date: 2.28

Test Plan:

TBD

Work Items

Job submission page new UI

  • P0 side bar refine
  • P0 basic info + task role
  • P0 More info (advanced mode)
  • P1 SKU (hived scheduler logic)
  • P1 secrets function (including image auth)
  • P1 ssh function
  • P2 data (team storage)
  • P2 save as template

Job protocol update

  • P0 support data by extending the prerequisite field in job protocol (#5145)
    • cmd runtime plugin modification
    • let runtime plugin parse prerequisite
    • change validation in webportal and rest-server
      • make uri optional
    • change openpai protocol
  • P1 Host prerequisite in marketplace
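
The "make uri optional" validation change could look roughly like the sketch below. The field names (`name`, `type`, `uri`), the set of allowed types, and the function `validate_prerequisite` are assumptions made for illustration; they are not taken from the actual rest-server or webportal code.

```python
from typing import Any, Dict, List

# Assumed set of prerequisite types, for illustration only.
ALLOWED_TYPES = {"dockerimage", "data", "script", "output"}


def validate_prerequisite(item: Dict[str, Any]) -> List[str]:
    """Return a list of validation errors for one prerequisite entry.

    `name` and `type` are treated as required; `uri` is optional, but when
    present it must be a string or a list of strings.
    """
    errors: List[str] = []

    for key in ("name", "type"):
        value = item.get(key)
        if not isinstance(value, str) or not value:
            errors.append(f"'{key}' is required and must be a non-empty string")

    if isinstance(item.get("type"), str) and item["type"] not in ALLOWED_TYPES:
        errors.append(f"unknown prerequisite type: {item['type']!r}")

    # `uri` may be omitted entirely; only check its shape when it is given.
    if "uri" in item:
        uri = item["uri"]
        is_str_list = isinstance(uri, list) and all(isinstance(u, str) for u in uri)
        if not (isinstance(uri, str) or is_str_list):
            errors.append("'uri' must be a string or a list of strings")

    return errors
```

With this shape, a data prerequisite that carries no `uri` passes validation, while a malformed `uri` is still rejected.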

Marketplace 2021 Feb. Release Plan

Dockerhub pull policy #5219

  • Docker image pull frequency limit in dockerhub
    • P0 use ansible playbook to change docker daemon config (solve job pull and service start problems)
    • P0 start a new cache registry service to cache images from dockerhub
    • P1 change service yaml config when starting service to use mirror registry
    • leave an interface for external registry
  • Test:
    • submit a job using dockerhub image
      • cache registry log shows that it receives a pull image request from the job container
    • batch submit 100+ simple jobs
      • cache registry log records all the pull image requests
      • check the worker node rate limit and it does not reach the limit
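
As a minimal sketch of the docker daemon config change, the snippet below adds a mirror to the `registry-mirrors` list that dockerd reads from its `daemon.json`. The helper name `add_registry_mirror` and the mirror URL are hypothetical; how the real playbook edits the file, and where the cache registry actually listens, are not specified in this plan.

```python
import json
from typing import Any, Dict


def add_registry_mirror(config: Dict[str, Any], mirror_url: str) -> Dict[str, Any]:
    """Return a copy of a daemon.json dict with `mirror_url` registered.

    Existing keys are preserved and the mirror is not added twice.
    """
    updated = dict(config)
    mirrors = list(updated.get("registry-mirrors", []))
    if mirror_url not in mirrors:
        mirrors.append(mirror_url)
    updated["registry-mirrors"] = mirrors
    return updated


if __name__ == "__main__":
    # Hypothetical existing config and cache registry address.
    existing = {"log-driver": "json-file"}
    merged = add_registry_mirror(existing, "http://cache-registry.example:5000")
    print(json.dumps(merged, indent=2))
```

The docker daemon has to be restarted or reloaded before a changed `daemon.json` takes effect, which is why the change is planned as a deployment-time step.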

DB controller

GPU Utilization Statistics

Let runtime plugin access a "job application token" @suiguoxin

Save SSH public keys on user profile page #5274

openpai runtime

log experience

Deployment

  • P2 verbose mode for deployment scripts @Starmys

Agile CI and nightly-build

P1 Agile CI and split heavy tests with nightly-deployment #5173 @yiyione

  • Setup the nightly-deployment and test
  • Build & publish nightly tag image

Fault Document

@mydmdm

Document

Bug Fix

Backlog

HiveD scheduler

  • P2 Cell as SKU in HiveD scheduler @abuccts @hzy46 (backbone config support for "Cell", submission form support for Cell)
    ETA: design: in progress
  • P2 UX for HiveD @yangou1988 @hzhua

Show cluster-level info #5254

dependabot alert

Need triage

Technical Investigations

@scarlett2018 scarlett2018 pinned this issue Jan 16, 2021
@scarlett2018 scarlett2018 mentioned this issue Feb 1, 2021
5 tasks
@hzy46 hzy46 mentioned this issue Feb 25, 2021
29 tasks
@Binyang2014 Binyang2014 unpinned this issue Mar 16, 2021
@Binyang2014 Binyang2014 pinned this issue Mar 25, 2021
@yiyione yiyione unpinned this issue Apr 26, 2021