New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create configuration level device plugin #565
Create configuration level device plugin #565
Conversation
Is there any chance to have it as a part of incoming release? |
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
…ge change Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
5e3ed62
to
bcc848e
Compare
Signed-off-by: Johnson Shih <jshih@microsoft.com>
@johnsonshih I haven't given this a thorough look through yet, but i spec-ed out this work last April to a working POC, so i wanted to share the diff with you: main...kate-goldenring:akri:config-level-resources-poc. Rather than creating a separate
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for diving into this @johnsonshih! I think we can simplify this and still use one DevicePluginService
per #565 (comment). You are also welcome to pull in anything from the POC I put together last year: main...kate-goldenring:akri:config-level-resources-poc. Sorry for not sharing it earlier 😵💫
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Thanks for the information. I on purposely separate out the instance plugin service and configuration plugin service into two different struct as most of the fields used in DevicePluginService is specific to instance (e.g., instance_name), and the behavior of instance device plugin and configuration device plugin are different, too. Put both behaviors in the same DevicePluginService struct make it difficult to maintain. I do have some follow up refactoring for CL resource after this PR committed. 'll definitely check the poc branch to see anything that we can leverage to improve the design and eliminate code duplication. Given the schedule, is it ok we focus on committing this PR? As long as the behavior is correct, we can improve our code base with follow up changes. Thanks! |
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating on this @johnsonshih. I still do not see the need for the ConfigurationDevicePluginService
and the duplication of code. Can we aim to reduce the size of changes in this PR and maintain one DevicePluginService
with conditional extra functionality if it is a Configuration level DP? I am concerned about maintainability and debugability and reviewability. I wonder if one approach would be to start with one DevicePluginService
to reduce the size of this PR and then in a subsequent one if we feel like we want different services to add that. It is hard here to see what is changed verses relocated. Or maybe we can move the shared code of list_and_watch into shared methods or it could be argued that that consolidation could come in a separate PR, so I am talking in circles.
In this PR, each Configuration virtual devices are considered different, that's why you see the same instance is allocated to a pod multiple times. It's based on the proposal https://github.com/project-akri/akri-docs/blob/main/proposals/configuration-level-resources.md I'm going to create a switch in Configuration to support different behaviors for the Configuration virtual devices, i.e., introduce a switch in Configuration (switch name "configurationResourceFrom", possible values are "instance" and "deviceUsage". Default value of configurationResourceFrom : "instance") |
@johnsonshih I like the idea of making this configurable but i'd rather not expose users to the nuance of "deviceUsage" and IMO only contributors are familiar with that term. Maybe the switch can be |
@johnsonshih would you be able to demo this at the community meeting tomorrow? |
I'll change to use uniqueDevices. |
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
…ub.com/johnsonshih/akri into user/jshih/configuration-device-plugin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for continuing to iterate on this @johnsonshih. I have concerns about potential race cases and how it looks like the Configuration Device Plugin is reporting devices.
#[derive(Clone, Debug)] | ||
pub struct InstanceConfig { | ||
pub usage_update_message_sender: Option<broadcast::Sender<ListAndWatchMessageKind>>, | ||
pub instances: HashMap<String, InstanceInfo>, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add documentation around all public structs and fields (see InstanceInfo
and InstanceConnectivityStatus
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add document comments
agent/src/util/discovery_operator.rs
Outdated
// Create a device plugin for the Configuration | ||
let config_dp_name = get_device_configuration_name(&config_name); | ||
trace!( | ||
"start_discovery - create configuration device plugin {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"start_discovery - create configuration device plugin {}", | |
"internal_start_discovery - create configuration device plugin {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update message
agent/src/util/discovery_operator.rs
Outdated
} | ||
Err(e) => { | ||
error!( | ||
"start_discovery - error {} building configuration device plugin", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"start_discovery - error {} building configuration device plugin", | |
"internal_start_discovery - error {} building configuration device plugin", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update message
impl InstanceConfig { | ||
pub fn new() -> Self { | ||
InstanceConfig { | ||
usage_update_message_sender: None, | ||
instances: HashMap::new(), | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the equivalent of what the InstanceConfig
's fields default values would be. If you tag the struct with Default
then you don't need this and can change all instantiations to InstanceConfig::default()
impl InstanceConfig { | |
pub fn new() -> Self { | |
InstanceConfig { | |
usage_update_message_sender: None, | |
instances: HashMap::new(), | |
} | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Default trait
pub configuration_usage_slots: HashSet<String>, | ||
} | ||
|
||
#[derive(Clone, Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tag with default instead of having constructor behave as default would (per comment below)
#[derive(Clone, Debug)] | |
#[derive(Clone, Debug, Default)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add Default trait
.unwrap(); | ||
discovered_devices.insert(instance_name, virtual_devices); | ||
} | ||
// construct virtual device info list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: consistency with comments (Capitalize the first letter of the comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comments
); | ||
Err(Status::new( | ||
fn build_virtual_devices_list_for_instance( | ||
device_map: HashMap<String, Vec<v1beta1::Device>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be able to take in a reference to avoid unnecessary cloning
device_map: HashMap<String, Vec<v1beta1::Device>>, | |
device_map: &HashMap<String, Vec<v1beta1::Device>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass reference instead of value
}, | ||
) | ||
.await | ||
.unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please propegate errors instead of unwrapping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove unwrap()
device_usage_id | ||
); | ||
Err(Status::new( | ||
fn build_virtual_devices_list_for_instance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function could use a different name. it isn't building virtual devices rather if i am reading it correctly, checking if there is a healthy slot. How about instances_with_healthy_devices
and maybe a comment explaining the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change function name to build_virtual_device_health_state_for_instance and add comments.
Err(Status::new( | ||
fn build_virtual_devices_list_for_instance( | ||
device_map: HashMap<String, Vec<v1beta1::Device>>, | ||
) -> HashMap<String, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is behaving differently than i would expect. If I have a configuration with a capacity of 4 and 3 instances are discovered, i would expect 12 virtual devices to be reported. Why only 3 here? Kubelet keeps track of its allocate calls. It will not call allocate again on a device it has already allocated, therefor the other 3 capacity slots will never be used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a capacity of 4 and 3 instances are discovered, the Instance.deviceUsage is 12. These deviceUsage slots are shared by all IL and CL virtual devices from all nodes.
For CL resources on a node, every instance can only be allocated as CL virtual device once, hence CL DP reports 3, Instance-0, Instance-1, Instance-2. the CL virtual device-x is Healthy as long as the Instance-x has a free slot available or allocated as a CL virtual device.
If CL DP reports 12 virtual devices, when allocate one CL virtual device, CL DP need to refresh list_and_watch to flip the other virtual devices from the instance being allocated to Unhealthy, otherwise Kubernetes will call allocate (max) 12 times since Kubernetes think all virtual devices are healthy and can be allocated.
IMO, expose CL virtual devices from usage slot level is more difficult to track and easier to get out of sync since the list_and_watch is running in a different thread. For user's perspective, it's more intuitive to see the IL/CL resource availability if CL virtual devices are reported based on Instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you successfully allocated N+1 CL devices when N instances are discovered with a capacity of 2? I do not believe it is possible to do this and for kubelet to be healthy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran this branch locally to test this out. You cannot use the full capacity of devices with the current implementation. For example, when deploying this helm chart (with the agent and controller running locally):
helm install akri akri-helm-charts/akri $AKRI_HELM_CRICTL_CONFIGURATION --set agent.enabled=false --set controller.enabled=false --set debugEcho.configuration.enabled=true --set debugEcho.configuration.shared=false --set debugEcho.configuration.capcity=2
These devices are discovered
kagold@kagold-ThinkPad-X1-Carbon-6th:~$ kubectl get akrii
NAME CONFIG SHARED NODES AGE
akri-debug-echo-a19705 akri-debug-echo true ["myNode"] 10m
akri-debug-echo-8120fe akri-debug-echo true ["myNode"] 10m
The node is labeled with the following resources. Note how only 2 akri.sh/akri-debug-echo
devices are "allocatable" despite each instance having a capacity of 2, so 4 should be available to pods:
Capacity:
akri.sh/akri-debug-echo: 2
akri.sh/akri-debug-echo-8120fe: 2
akri.sh/akri-debug-echo-a19705: 2
...
Allocatable:
akri.sh/akri-debug-echo: 2
akri.sh/akri-debug-echo-8120fe: 2
akri.sh/akri-debug-echo-a19705: 2
To illustrate the point, deploy a replicaset with 3 replicas:
apiVersion: apps/v1
kind: Deployment
metadata:
name: debug-echo-deployment
labels:
app: debug-echo-broker
spec:
replicas: 3
selector:
matchLabels:
app: debug-echo-broker
template:
metadata:
labels:
app: debug-echo-broker
spec:
containers:
- name: debug-echo-broker
image: nginx
resources:
limits:
akri.sh/akri-debug-echo: "1"
requests:
akri.sh/akri-debug-echo: "1"
Only two pods run with the third left pending due to no resource being available:
NAME READY STATUS RESTARTS AGE
debug-echo-deployment-57ccdc95d9-24726 0/1 Pending 0 3m59s
debug-echo-deployment-57ccdc95d9-2wl6b 1/1 Running 0 3m59s
debug-echo-deployment-57ccdc95d9-5zkvb 1/1 Running 0 3m59s
$ kubectl describe pod debug-echo-deployment-57ccdc95d9-24726
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m18s default-scheduler 0/1 nodes are available: 1 Insufficient akri.sh/akri-debug-echo. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a key reason why i did a direct mapping of CL slots to IL slots in the original implementation main...kate-goldenring:akri:config-level-resources-poc
Signed-off-by: Johnson Shih <jshih@microsoft.com>
Signed-off-by: Johnson Shih <jshih@microsoft.com>
What this PR does / why we need it:
This PR is to implement the proposal of exposing resource at Configuration level. https://github.com/project-akri/akri-docs/blob/main/proposals/configuration-level-resources.md#configuration-level-resources
With Configuration level resource, users can select resource to use at configuration level without knowing the instance id beforehand.
Special notes for your reviewer:
Summary of changes in this PR
- CL resource and IL resource share the capacity pool. i.e. (# of allocated CL virtual devices + # of allocated IL virtual devices) <= capacity.
- The name of CL device plugin is the Akri Configuration name and follows the same convention of IL device plugin, i.e., replace ['.', '/'] with "-".
- The CL device plugin uses the same name schema as IL device plugin for virtual devices. The virtual device id reported by CL device plugin looks like _.
- ConfigurationDevicePlugin represent the behavior of CL device plugin.
- DevicePluginService contains a list_and_watch_message_sender to notify refreshing list_and_watch, used by the DevicePluginService internally, a copy of list_and_watch_message_sender is stored in the associated InstanceInfo, used by external entity to refresh the DPS's list_and_watch.
- Configuration DevicePluginService contains a list_and_watch_message_sender to notify refreshing list_and_watch, used by the Configuration DevicePluginService internally, a copy of list_and_watch_message_sender is store in the InstanceConfig, used by external entity to refresh the Configuration DPS's list_and_watch
- When IL DPS allocate a virtual device, it notify CL DPS to refresh list_and_watch ,and vice versa, CL DPS notify IL DPS to refresh list_and_watch when it allocates a virtual device.
If applicable:
cargo fmt
)cargo build
)cargo clippy
)cargo test
)cargo doc
)