-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add proposal and implementation for Definition of Cloud Robot CRD and Operator #147
Conversation
Welcome @PyPotato! It looks like this is your first PR to kubeedge/community 🎉 |
Signed-off-by: zsj <sjzhou2000@163.com>
To achieve our goal, we need three kinds of CRDs: **robot CRD**, **task CRD** and **robotSync CRD**. The robot CRD and task CRD describe the robot status and task information respectively, and the robotSync CRD is used to record the number of robot CR and their configuration informations. When the key information of the robot changes, the status information must be reported immediately, otherwise the information is reported in the form of a heartbeat frame at a period set by the user. Task resources are created by users and delivered by the Controller. | ||
|
||
- The fields contained in the **robot** resource can be described as the table below: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested Addition: Enhancing Robot CRD with Sensor-Related Fields
To ensure accurate assignment of sensor-related tasks, it is highly recommended to enrich the Robot CRD with sensor-related fields. These additional fields will enable more precise task allocation, taking into account the capabilities and availability of the robot's sensors. It can be an array including the sensor list that you think is optional for the agv/amr robot.
- CRDs to define robot properties and states | ||
- CRDs to extend teh robot API, through which we can control the robot | ||
- A series of controllers to realize the automatic control of the robot | ||
- This project is to design the robot CRD based on the interface definition of industrial mobile robots proposed by the Mobile Robot Industry Alliance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested Addition: Adding Reference Link for "the interface definition of industrial mobile robots proposed by the Mobile Robot Industry Alliance"
To provide more context and allow interested readers to explore further details, it is recommended to include a reference link to "the interface definition of industrial mobile robots proposed by the Mobile Robot Industry Alliance." This will enable readers to access the original source and gain a deeper understanding of the specific interface definitions used in the proposed CRD design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
- Developers:Building a complete solution for cloud robot management based on KubeEdge | ||
- End users:Publishing tasks according to requirements and realizing automatic robot control | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further Clarification of Target Users and Use Cases: Although you have mentioned the target users as developers and end users but lacks specific details. It is essential to elaborate on who these users are and provide specific industry use cases where this system will be applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 👍 on this. since we can expect that broad use cases almost anything requires physical access, that could be the use cases. probably we would propose some specific use cases. that could be also better to bring up the concrete and feasible design details.
| battery_status | struct | Including information such as remaining battery power, charging status, etc. | | ||
| abnormal_events | struct[] | Record the exception information of the node | | ||
| robot_status | boolean | indicates whether robot is ready for scheduling | | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested Addition: Resource Definition - CPU, GPU, Memory, etc.
To better support resource management and scheduling for cloud robots, it is recommended to add resource-related definitions, including CPU, GPU, and memory, to the CRD. These resource definitions will assist the system in efficiently allocating and managing computing resources for cloud robots, thereby enhancing system performance and reliability.
|
||
| Field | Type | Description | | ||
| :---------------------: | :------: | :-----------------------------: | | ||
| registed_robots | string[] | Keep a list of registered nodes, including nodes name and a list of node devices | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RobotID definition for the robot above is uint
, why string[]
here? Why not just use node id
as robot id~
|
||
For this project, the architecture adopted basically follows the overall architecture of KubeEdge. The structure diagram of the project is shown in the figure below | ||
|
||
<img src="./images/architecture.png" alt="architecture" style="zoom:90%;" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should modify this architecture, if the node refers to the robot, what the device mean? If there are any new agent in the KubeEdge cluster, it should be clearly indicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoeyHwong-gk: changing LGTM is restricted to collaborators In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
### Robot Registration | ||
|
||
<img src="./images/registration.png" alt="registration" style="zoom:90%;" /> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggested additions: (no now)
|
3. task assignment: | ||
- The taskController monitors the creation of the task CR and obtains the task content then selects a suitable robot from the registered robot list in robotSync to assign this task. It also modifies the task_info field in the corresponding robot CR | ||
4. task execution: | ||
- The robotController listens to changes in the fields of the robot CR and obtains task-related information then it sends action messages to the corresponding nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the robot network is unstable, how can we ensure that the action messages sent by robotController to robots are not lost?
|
||
For the *robotControlelr* | ||
|
||
1. The robotController monitors the changes of the robot resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This place isn't very clear. For an edge node, we always install edgecore by keadm join xxx
. What is the process of registering a robot and installing edgecore?
Signed-off-by: zsj <sjzhou2000@163.com>
Signed-off-by: zsj <sjzhou2000@163.com>
/PTAL @fujitatomoya |
@JoeyHwong-gk thanks for pinging me on this, i will take a look and leave some comments later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PyPotato thanks for the effort, although this is draft, really interesting proposal. i have several questions and nitpicks which possibly include beginner's level stuff because of my experience for KubeEdge. cheers,
|
||
## Background | ||
|
||
The emergence of cloud-native architecture has further facilitated the development of cloud-native robotics technology. Cloud-native architecture encompasses characteristics such as containerization, microservices, and elastic scalability. These features enable robotic systems to adapt more effectively to complex work environments and diverse task requirements. Leveraging containerization technology for deployment and management, the maintainability, scalability, and flexibility of robot systems are significantly improved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off topic from this proposal, but i would say one of the hard barrier is device abstraction. this is more like the problem to address by container device interfaces or k8s device-plugin, but actually robots and robotics can be really complicated and platform dependent, and how we can manage them to the containers is important.
Again, this is off topic from this enhancement proposal, but the following information to be shrared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sharing, I will pay attention to this.
|
||
Operator is designed to achieve | ||
|
||
- automatizing robotic operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
- automatizing robotic operations | |
- automatizing robotic operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminder, I will fix them and take care of these details.
|
||
The scope of this project includes | ||
|
||
- Providing a set of CRDs and an Operators for managing cloud robotics based on KubeEdge architecture |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Providing a set of CRDs and an Operators for managing cloud robotics based on KubeEdge architecture | |
- Providing a set of CRDs and an Operators for managing cloud robotics based on KubeEdge architecture |
|
||
- Providing a set of CRDs and an Operators for managing cloud robotics based on KubeEdge architecture | ||
- CRDs to define robot properties and states | ||
- CRDs to extend teh robot API, through which we can control the robot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- CRDs to extend teh robot API, through which we can control the robot | |
- CRDs to extend the robot API, through which we can control the robot |
|
||
- Developers:Building a complete solution for cloud robot management based on KubeEdge | ||
- End users:Publishing tasks according to requirements and realizing automatic robot control | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 👍 on this. since we can expect that broad use cases almost anything requires physical access, that could be the use cases. probably we would propose some specific use cases. that could be also better to bring up the concrete and feasible design details.
|
||
The current predominant approach for managing cloud robots is to treat the robot as a unified entity, considering it as a Device and accessing the KubeEdge platform for management and scheduling purposes. | ||
|
||
However, given that a robot is inherently composed of diverse sensors and controllers, it is more appropriate to view it as a combination of heterogeneous components rather than an indivisible arrangement. Consequently, in this solution, a robot is treated as an edge node, with its sensors serving as the device access platform for that node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think because of my lack of experience for k8s / kubeedge, having the hard time to understand this description.
my understanding is,
- operator manages the desired state and sync the state with CRD objects. that is said, we expect the application uses those APIs to CRD to operate and orchestrate the robots running in the edge environment from the cloud infrastructure.
- so the robot is the the node where edgecore runs to assign the spec/implementation to the resource object.
- it is likely that robots own the many sensors and devices connected directly such as lidar, radar, camera and audio. these devices are also managed as resources? there would be non-robotics application deployed to edgenode with robot CRD, it needs to be able to manage the resource, instead there would be access conflict between apps?
- How can we bind those sensors to the application container running in edge node? (Device-Plugin? and then attach those resource to robot CRD object?)
i might be mistaken, just checking if my understanding is okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your first two points are exactly right.
For the third point, Kubeedge manages the accessed devices through two CRDs, Device Model
and Device Instance
, whereas in my CRD definition, the fields related to sensors, are only there to determine that a certain robot is able to perform a specific task, but are not responsible for the management of these sensors.
And the last, when a robot node registers, it reports some necessary information, which includes the sensors it has, and that information will be updated into robot CRD. I hope this might answer your question.
| Fields | Type | Description | | ||
| :-----------------: | :------: | :------------------------------------: | | ||
| robotID | uint | Unique ID of a robot | | ||
| position | struct | Indicates the location of the robot, including coordinates, the last point passed, etc. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once it comes to robots and robotics, this could bring the many questions. positions for what? geometry, robot arm? relative? in-house mapping position? it is nice to be flexible but if goes too far, it would hard to use for the user application? if this proposal specifies the position with clear statement, you can ignore my comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, to increase generalizability, many of the fields in the CRD follow the Specification for data interface between Industrial mobile robot and Robot control system, which defines criteria such as coordinates and position. In this case, position
indicates the position of the robot itself in the coordinate system defined by this standard.
|
||
- The fields contained in the **task** resource can be described as the table below: | ||
|
||
| Field | Type | Description | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about priority
, this sound required for me to do the traffic management and task management with using mobile robots. probably there would be some use case that task-A should prevail the task-B in urgent case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed a requirement and I will take this into account in subsequent development.
<img src="./images/registration.png" alt="registration" style="zoom:70%;" /> | ||
|
||
Robot registration process: | ||
1. Once the robot node join in the cluster, it will send heartbeat message to the cloud |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the rational to have the robot node
but general node
for this? if this limits the operation on node
, that would be less flexibility for users? since they expect that robot node
can be running as normal edge node
? so that some other non-robotics services and application can be running?
2. If the Registration Controller finds that the heartbeat packet has not been received for a certain period of time, it will determine that the robot is offline. | ||
3. Remove the corresponding robot ID from the registered list. | ||
|
||
### Use case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the question for the architecture. do the robots on site can be expected to communicate each other? or everything needs to be managed by cloud as resource object? i think it is likely there would be problem about latency for collision detect and realtime application if it requires cloud communication always.
Signed-off-by: zsj <sjzhou2000@163.com>
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster | ||
// Important: Run "make" to regenerate code after modifying this file | ||
|
||
RobotID uint `json:"rbotID,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
@@ -0,0 +1,29 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recommended to remove
# Image URL to use all building/pushing image targets | ||
IMG ?= controller:latest | ||
# ENVTEST_K8S_VERSION refers to the version of kubebuilder assets to be downloaded by envtest binary. | ||
ENVTEST_K8S_VERSION = 1.27.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check if this version will be supported by kubeedge
UnderTask bool `json:"underTask,omitempty"` | ||
TaskInfo TaskInfo `json:"taskInfo,omitempty"` | ||
AbnormalEvents []AbnormalEvents `json:"abnormalEvents,omitempty"` | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding comments would improve the readability and maintainability of the code.
p.mutex.Lock() | ||
defer p.mutex.Unlock() | ||
|
||
p.ids = append(p.ids, id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IDPool
struct and its methods are used to manage a pool of IDs for robots. However, the Release
method does not check if the ID being released is already in the pool, which could lead to duplicate IDs. This could cause issues when allocating IDs. Consider adding a check in the Release
method to prevent duplicates.
func (p *IDPool) Release(id uint) {
p.mutex.Lock()
defer p.mutex.Unlock()
+ if !ContainsElement(p.ids, id) {
+ p.ids = append(p.ids, id)
+ }
}
// force close | ||
if err != nil { | ||
glog.Println("Force close websocket connection", err) | ||
conn.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The closeWebSocketConnection
function attempts to close a WebSocket connection by sending a close message up to three times. However, it does not return any error if it fails to send the close message after three attempts and instead forcefully closes the connection. Consider returning an error in this case to allow the caller to handle this situation.
// force close
if err != nil {
glog.Println("Force close websocket connection", err)
conn.Close()
+ return err
}
return nil
}
} | ||
|
||
func ContainsElement(arr []uint, target uint) bool { | ||
for _, item := range arr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ContainsElement
function checks if a slice contains a specific element. This function has a time complexity of O(n), which could be inefficient for large slices. Consider using a data structure with faster lookup times, like a map, if this function is called frequently.
} | ||
|
||
// Done | ||
func (r *RobotSyncReconciler) handleNodeDelete(ctx context.Context, node *corev1.Node) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handleNodeDelete
function is responsible for handling the deletion of a Node
resource. It correctly fetches the Robot
resource associated with the node and removes it from the registered list. However, there is no error handling for the Delete
call. Consider adding error handling to ensure that any issues with deleting the Robot
resource are properly handled.
- r.Delete(ctx, robot)
+ if err := r.Delete(ctx, robot); err != nil {
+ glog.Println("Failed to delete robot", err)
+ return err
+ }
|
||
CRDs are defined to achieve | ||
|
||
- Difining robot resources |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo. The term "Difining" should be corrected to "Defining".
Signed-off-by: zsj <sjzhou2000@163.com>
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: fisherxu The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
/hold cancel |
Signed-off-by: zsj sjzhou2000@163.com
Add proposal for Definition of Cloud Robot CRD and Operator