feat: adapt for gpu cost #3596

Merged
merged 6 commits into from Jul 31, 2023

bxy4543 commented Jul 27, 2023

🤖 Generated by Copilot at d5f5b71

Summary

🎮🛡️📊

This pull request adds support for Nvidia GPU resources in the sealos controller. It defines GPU-related constants, types, and functions in the gpu package, and uses them to create, price, and monitor GPU resources in the resources.go and monitor_controller.go files. It also updates the RBAC and deployment configurations for the monitor-controller controller to allow it to access node information.

We're the monitor-controller crew, we watch the nodes and pods
We gather GPU data and we store it in our logs
We heave and haul on the MonitorReconciler line
And we sing this shanty chorus on the count of three and nine
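
The node-reading step described in the summary could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `Node`, `GPUInfo`, and `gpuModelsByNode` are hypothetical stand-ins (the real controller would work with Kubernetes node objects), and the label keys are the ones published by NVIDIA GPU Feature Discovery, which the gpu package is only assumed to read.

```go
package main

import "fmt"

// Label keys published by NVIDIA GPU Feature Discovery. That the gpu package
// reads exactly these labels is an assumption; the PR text does not list them.
const (
	labelGPUProduct = "nvidia.com/gpu.product"
	labelGPUMemory  = "nvidia.com/gpu.memory"
)

// GPUInfo is a hypothetical stand-in for the type defined in nvidia.go.
type GPUInfo struct {
	Product string
	Memory  string
}

// Node is a simplified stand-in for corev1.Node: only a name and labels.
type Node struct {
	Name   string
	Labels map[string]string
}

// Reading nodes needs get/list/watch permission. In standard kubebuilder
// marker syntax that would presumably look like the line below (the exact
// marker used in the PR is not shown in this page):
//
//	//+kubebuilder:rbac:groups="",resources=nodes,verbs=get;list;watch

// gpuModelsByNode returns GPU info keyed by node name and skips nodes
// that carry no GPU product label.
func gpuModelsByNode(nodes []Node) map[string]GPUInfo {
	models := make(map[string]GPUInfo)
	for _, n := range nodes {
		product, ok := n.Labels[labelGPUProduct]
		if !ok || product == "" {
			continue
		}
		models[n.Name] = GPUInfo{Product: product, Memory: n.Labels[labelGPUMemory]}
	}
	return models
}

func main() {
	nodes := []Node{
		{Name: "gpu-node", Labels: map[string]string{labelGPUProduct: "Tesla-T4", labelGPUMemory: "15360"}},
		{Name: "cpu-node", Labels: map[string]string{}},
	}
	for name, info := range gpuModelsByNode(nodes) {
		fmt.Printf("%s: product=%s memory-label=%s\n", name, info.Product, info.Memory)
	}
}
```

Keying the result by node name lets a later step look up the GPU model of whichever node a pod is scheduled on.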

Walkthrough

  • Add support for collecting and storing Nvidia GPU resource usage data from pods and nodes
  • Define constants, types, and functions related to Nvidia GPU labels and information in nvidia.go
  • Import the gpu package and add a new constant ResourceGPU and a new function NewGpuResource to resources.go
  • Add the ResourceGPU constant to the PricesUnit map in resources.go to define the unit price for GPU resources
  • Add a new rule to role.yaml and deploy.yaml to allow the monitor role and the monitor-controller controller to get, list, and watch nodes
  • Import the gpu package and add a new field NvidiaGpu to the MonitorReconciler struct in monitor_controller.go
  • Add a kubebuilder annotation to monitor_controller.go to generate RBAC rules for accessing nodes
  • Modify the NewMonitorReconciler function in monitor_controller.go to call the GetNodeGpuModel function from the gpu package and assign the result to the NvidiaGpu field
  • Modify the podResourceUsage function in monitor_controller.go to handle the GPU resource usage from pods by checking the GPU limit, getting the GPU model, and adding the GPU request to the rs map
  • Modify the getResourceValue function in monitor_controller.go to convert the GPU resource usage to integer values based on the unit price
  • Modify the initResources function in monitor_controller.go to initialize the rs map with zero values for the ResourceGPU constant
  • Add error handling and logging for the case when a user-owned namespace does not have a resource quota for storage
    • Add a new variable hasStorageQuota to indicate whether the namespace has a resource quota for storage in the podResourceUsage function in monitor_controller.go
    • Add a condition to check if the namespace has the UserAnnotationOwnerKey annotation and log an error if the resource quota is empty in the podResourceUsage function in monitor_controller.go
    • Add a block of code to list all the PVCs in the namespace and add their storage requests to the rs map if the namespace does not have a storage quota in the podResourceUsage function in monitor_controller.go
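
The accounting flow in the walkthrough above (check each container's GPU limit, look up the GPU model of the pod's node, add to the rs map, and fall back to PVC requests when there is no storage quota) could look roughly like the sketch below. All types and names here (`Pod`, `Namespace`, `usage`, the `"gpu-"+model` and `"storage"` keys) are hypothetical simplifications, not the PR's actual code; only `nvidia.com/gpu` is a real name, the extended resource exposed by the NVIDIA device plugin. Taking the storage value from the quota when one exists is likewise an assumption.

```go
package main

import "fmt"

// nvidiaGPUResource is the extended resource name exposed by the NVIDIA
// device plugin. Every other name in this file is a hypothetical stand-in.
const nvidiaGPUResource = "nvidia.com/gpu"

// Pod is a simplified stand-in for corev1.Pod.
type Pod struct {
	NodeName string
	Limits   []map[string]int64 // one resource-limit map per container
}

// Namespace bundles what the sketch needs to know about one namespace.
type Namespace struct {
	UserOwned       bool    // true when the owner annotation is present
	HasStorageQuota bool    // true when a resource quota covers storage
	QuotaStorage    int64   // storage accounted by the quota (assumed source)
	PVCRequests     []int64 // storage requested by each PVC
	Pods            []Pod
}

// usage returns per-resource totals plus the messages that would be logged.
func usage(ns Namespace, gpuModelByNode map[string]string) (map[string]int64, []string) {
	rs := map[string]int64{"storage": 0}
	var warnings []string

	for _, pod := range ns.Pods {
		model, nodeHasGPU := gpuModelByNode[pod.NodeName]
		for _, limits := range pod.Limits {
			count, ok := limits[nvidiaGPUResource]
			if !ok || !nodeHasGPU {
				continue // no GPU limit, or the node has no known GPU model
			}
			rs["gpu-"+model] += count // assumed key format: one entry per GPU model
		}
	}

	if ns.HasStorageQuota {
		rs["storage"] = ns.QuotaStorage
		return rs, warnings
	}
	if ns.UserOwned {
		warnings = append(warnings, "user-owned namespace has no storage quota")
	}
	// Without a quota, sum the storage requests of the namespace's PVCs.
	for _, req := range ns.PVCRequests {
		rs["storage"] += req
	}
	return rs, warnings
}

func main() {
	ns := Namespace{
		UserOwned:   true,
		PVCRequests: []int64{10, 20},
		Pods: []Pod{
			{NodeName: "node-a", Limits: []map[string]int64{{nvidiaGPUResource: 2}}},
		},
	}
	rs, warnings := usage(ns, map[string]string{"node-a": "Tesla-T4"})
	fmt.Println(rs, warnings)
}
```

Returning the warnings instead of logging them directly keeps the sketch free of logger dependencies and easy to test; a controller would more likely log at the point of detection.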

sealos-ci-robot commented Jul 27, 2023

🤖 Generated by lychee action

Summary

Status          Count
🔍 Total        914
✅ Successful   347
⏳ Timeouts     0
🔀 Redirected   0
👻 Excluded     566
❓ Unknown      0
🚫 Errors       0


codecov bot commented Jul 27, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (0adc21d) 67.92% compared to head (d286b41) 67.92%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3596   +/-   ##
=======================================
  Coverage   67.92%   67.92%           
=======================================
  Files           8        8           
  Lines         664      664           
=======================================
  Hits          451      451           
  Misses        171      171           
  Partials       42       42           

☔ View full report in Codecov by Sentry.

bxy4543 added this to the v5.0 milestone Jul 27, 2023
lingdie changed the title from "Feat/gpu cost" to "feat: adapt for gpu cost" Jul 29, 2023
bxy4543 merged commit 9a013ce into labring:main Jul 31, 2023
39 checks passed
bxy4543 deleted the feat/gpu_cost branch July 31, 2023 02:36