refactor(controller): refine the abstract layer between job and k8s #2604

Merged (3 commits) on Aug 11, 2023

anda-ren (Member) commented Aug 8, 2023

Description

Gains

  1. make jobs platform-agnostic
  2. make docker (easy to use and deploy) an available platform
  3. make a swcli server start command possible

Implementation Changes

  1. move the reporting package into the schedule package
  2. add the SwSchedulerAbstractFactory and TaskLogCollector interfaces
  3. modify the SwTaskScheduler.schedule interface
  4. add the sw.scheduler property (k8s or docker) to configure which implementation to use
  5. add a docker implementation of:
    • scheduler
    • TaskLogCollector
    • reporting
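
To illustrate how the pieces listed above could fit together, here is a minimal sketch of an abstract factory selected by the sw.scheduler value. Only the names SwSchedulerAbstractFactory, SwTaskScheduler, TaskLogCollector and sw.scheduler come from this PR; every method signature, the Task stand-in, FakePlatformFactory, factoryFor and describe are hypothetical and are not taken from the actual controller code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: signatures below are assumptions, not the controller's real API.
public class SchedulerSketch {

    /** Minimal stand-in for a task; the real type is assumed to carry much more. */
    static final class Task {
        final long id;
        Task(long id) { this.id = id; }
    }

    /** Assumed shape: schedules and stops tasks on some platform. */
    interface SwTaskScheduler {
        void schedule(List<Task> tasks);
        void stop(List<Task> tasks);
    }

    /** Assumed shape: fetches log text for one task. */
    interface TaskLogCollector {
        String collect(Task task);
    }

    /** One factory per platform, so job code never names k8s or docker directly. */
    interface SwSchedulerAbstractFactory {
        SwTaskScheduler buildScheduler();
        TaskLogCollector buildLogCollector();
    }

    /** Hypothetical in-memory platform standing in for both real implementations. */
    static final class FakePlatformFactory implements SwSchedulerAbstractFactory {
        private final String platform;
        private final List<Long> running = new ArrayList<>();

        FakePlatformFactory(String platform) { this.platform = platform; }

        @Override
        public SwTaskScheduler buildScheduler() {
            return new SwTaskScheduler() {
                @Override
                public void schedule(List<Task> tasks) {
                    for (Task t : tasks) { running.add(t.id); }
                }

                @Override
                public void stop(List<Task> tasks) {
                    for (Task t : tasks) { running.remove(Long.valueOf(t.id)); }
                }
            };
        }

        @Override
        public TaskLogCollector buildLogCollector() {
            return task -> platform + ": task " + task.id
                    + (running.contains(task.id) ? " is running" : " is not running");
        }
    }

    /** Picks a factory from the configured sw.scheduler value (k8s or docker). */
    static SwSchedulerAbstractFactory factoryFor(String swScheduler) {
        switch (swScheduler) {
            case "k8s":
            case "docker":
                return new FakePlatformFactory(swScheduler);
            default:
                throw new IllegalArgumentException("unknown sw.scheduler: " + swScheduler);
        }
    }

    /** Platform-agnostic caller: schedules one task, then reads its log line. */
    static String describe(String swScheduler) {
        SwSchedulerAbstractFactory factory = factoryFor(swScheduler);
        Task task = new Task(1L);
        factory.buildScheduler().schedule(Collections.singletonList(task));
        return factory.buildLogCollector().collect(task);
    }

    public static void main(String[] args) {
        System.out.println(describe("docker"));
    }
}
```

In this sketch the caller in describe depends only on the three interfaces, so switching platforms means changing the configured value rather than the job code.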

Limitations && TODOs

  1. dataset build is still coupled to the k8s implementation, i.e. not available in the docker implementation (fixed by refactor(controller): make ds build a job #2626)
  2. model serving is still coupled to the k8s implementation, i.e. not available in the docker implementation (fixed by refactor(controller): make online evaluation be a job #2672)
  3. runtime image build is still coupled to the k8s implementation, i.e. not available in the docker implementation (fixed by refactor(controller): make runtime dockerizing be a job #2651)
  4. in the docker implementation, the resource pool is limited to the node where the SW server is deployed

Modules

  • UI
  • Controller
  • Agent
  • Client
  • Python-SDK
  • Others

Checklist

  • run code format and lint check
  • add unit test
  • add necessary doc

@anda-ren anda-ren marked this pull request as ready for review August 9, 2023 05:48
…k id from container label; deliver cancelling status from docker impl to status machine; clear scheduler.stop semantics;
codecov bot commented Aug 10, 2023

Codecov Report

Merging #2604 (360e002) into main (898e871) will decrease coverage by 8.59%.
Report is 6 commits behind head on main.
The diff coverage is 70.82%.

@@             Coverage Diff              @@
##               main    #2604      +/-   ##
============================================
- Coverage     82.64%   74.05%   -8.59%     
- Complexity     2683     2734      +51     
============================================
  Files           454      364      -90     
  Lines         24882    13214   -11668     
  Branches       1509     1519      +10     
============================================
- Hits          20563     9786   -10777     
+ Misses         3654     2745     -909     
- Partials        665      683      +18     
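
The headline percentages in the table above can be reproduced from its Hits and Lines rows. The sketch below assumes coverage is hits divided by lines, truncated (not rounded) to two decimals; that truncation rule is an assumption chosen because it matches the reported figures, and the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch only: recomputes the reported coverage figures from the table's counts.
public class CoverageCheck {

    /** Coverage in percent, truncated to two decimals (assumed rounding rule). */
    static BigDecimal coveragePercent(long hits, long lines) {
        return BigDecimal.valueOf(hits)
                .multiply(BigDecimal.valueOf(100))
                .divide(BigDecimal.valueOf(lines), 2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        BigDecimal base = coveragePercent(20563, 24882); // main
        BigDecimal head = coveragePercent(9786, 13214);  // #2604
        System.out.println(base + " -> " + head + " (" + head.subtract(base) + ")");
    }
}
```

The counts are also internally consistent: in each column, Hits plus Misses plus Partials equals Lines.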
Flag Coverage Δ
console ?
controller 74.05% <70.82%> (-0.27%) ⬇️
standalone ?
unittests ?

Flags with carried forward coverage won't be shown.

Files Changed Coverage Δ
...i/starwhale/mlops/api/DatasetBuildLogWsServer.java 0.00% <0.00%> (ø)
...starwhale/mlops/domain/dataset/DatasetService.java 82.39% <0.00%> (ø)
...ps/domain/dataset/build/log/BuildLogCollector.java 27.27% <ø> (ø)
...java/ai/starwhale/mlops/domain/job/EnvService.java 100.00% <ø> (ø)
...tarwhale/mlops/domain/job/ModelServingService.java 73.74% <ø> (ø)
.../mlops/domain/runtime/RuntimeRegistryListener.java 66.66% <ø> (ø)
...starwhale/mlops/domain/runtime/RuntimeService.java 76.71% <ø> (ø)
...i/starwhale/mlops/domain/system/SystemService.java 27.27% <0.00%> (ø)
...ps/domain/system/resourcepool/bo/ResourcePool.java 86.53% <ø> (ø)
...va/ai/starwhale/mlops/domain/task/TaskService.java 51.35% <0.00%> (ø)
... and 39 more

... and 107 files with indirect coverage changes

jialeicui (Contributor) left a comment

LGTM

@anda-ren anda-ren merged commit 34eac83 into star-whale:main Aug 11, 2023
21 checks passed
@anda-ren anda-ren deleted the abs_job_merge branch December 21, 2023 03:49
3 participants