Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add proposal and implementation for Definition of Cloud Robot CRD and Operator #147

Merged
merged 8 commits into from
Oct 31, 2023

Conversation

PyPotato
Copy link
Contributor

Signed-off-by: zsj sjzhou2000@163.com
Add proposal for Definition of Cloud Robot CRD and Operator

@kubeedge-bot
Copy link
Collaborator

Welcome @PyPotato! It looks like this is your first PR to kubeedge/community 🎉

@kubeedge-bot kubeedge-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 24, 2023
Signed-off-by: zsj <sjzhou2000@163.com>
To achieve our goal, we need three kinds of CRDs: **robot CRD**, **task CRD** and **robotSync CRD**. The robot CRD and task CRD describe the robot status and task information respectively, and the robotSync CRD is used to record the number of robot CR and their configuration informations. When the key information of the robot changes, the status information must be reported immediately, otherwise the information is reported in the form of a heartbeat frame at a period set by the user. Task resources are created by users and delivered by the Controller.

- The fields contained in the **robot** resource can be described as the table below:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested Addition: Enhancing Robot CRD with Sensor-Related Fields

To ensure accurate assignment of sensor-related tasks, it is highly recommended to enrich the Robot CRD with sensor-related fields. These additional fields will enable more precise task allocation, taking into account the capabilities and availability of the robot's sensors. It can be an array including the sensor list that you think is optional for the agv/amr robot.

robot Architecture

- CRDs to define robot properties and states
- CRDs to extend teh robot API, through which we can control the robot
- A series of controllers to realize the automatic control of the robot
- This project is to design the robot CRD based on the interface definition of industrial mobile robots proposed by the Mobile Robot Industry Alliance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested Addition: Adding Reference Link for "the interface definition of industrial mobile robots proposed by the Mobile Robot Industry Alliance"

To provide more context and allow interested readers to explore further details, it is recommended to include a reference link to "the interface definition of industrial mobile robots proposed by the Mobile Robot Industry Alliance." This will enable readers to access the original source and gain a deeper understanding of the specific interface definitions used in the proposed CRD design.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


- Developers:Building a complete solution for cloud robot management based on KubeEdge
- End users:Publishing tasks according to requirements and realizing automatic robot control

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further Clarification of Target Users and Use Cases: Although you have mentioned the target users as developers and end users but lacks specific details. It is essential to elaborate on who these users are and provide specific industry use cases where this system will be applied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 👍 on this. since we can expect that broad use cases almost anything requires physical access, that could be the use cases. probably we would propose some specific use cases. that could be also better to bring up the concrete and feasible design details.

| battery_status | struct | Including information such as remaining battery power, charging status, etc. |
| abnormal_events | struct[] | Record the exception information of the node |
| robot_status | boolean | indicates whether robot is ready for scheduling |

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested Addition: Resource Definition - CPU, GPU, Memory, etc.

To better support resource management and scheduling for cloud robots, it is recommended to add resource-related definitions, including CPU, GPU, and memory, to the CRD. These resource definitions will assist the system in efficiently allocating and managing computing resources for cloud robots, thereby enhancing system performance and reliability.


| Field | Type | Description |
| :---------------------: | :------: | :-----------------------------: |
| registed_robots | string[] | Keep a list of registered nodes, including nodes name and a list of node devices |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RobotID definition for the robot above is uint, why string[] here? Why not just use node id as robot id~


For this project, the architecture adopted basically follows the overall architecture of KubeEdge. The structure diagram of the project is shown in the figure below

<img src="./images/architecture.png" alt="architecture" style="zoom:90%;" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should modify this architecture, if the node refers to the robot, what the device mean? If there are any new agent in the KubeEdge cluster, it should be clearly indicated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example:
image

@kubeedge-bot
Copy link
Collaborator

@JoeyHwong-gk: changing LGTM is restricted to collaborators

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

### Robot Registration

<img src="./images/registration.png" alt="registration" style="zoom:90%;" />

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for reference:

image

@JoeyHwong-gk
Copy link
Contributor

Suggested additions: (no now)

  1. Comparison with Existing definition.
  2. API Reference
  3. installation
  4. Concepts (Task、Sensor、Devices、Robots、Fleetment ...)
  5. Examples
    ...

@PyPotato PyPotato marked this pull request as draft July 25, 2023 06:50
@kubeedge-bot kubeedge-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 25, 2023
3. task assignment:
- The taskController monitors the creation of the task CR and obtains the task content then selects a suitable robot from the registered robot list in robotSync to assign this task. It also modifies the task_info field in the corresponding robot CR
4. task execution:
- The robotController listens to changes in the fields of the robot CR and obtains task-related information then it sends action messages to the corresponding nodes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the robot network is unstable, how can we ensure that the action messages sent by robotController to robots are not lost?


For the *robotControlelr*

1. The robotController monitors the changes of the robot resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This place isn't very clear. For an edge node, we always install edgecore by keadm join xxx. What is the process of registering a robot and installing edgecore?

@JoeyHwong-gk
Copy link
Contributor

JoeyHwong-gk commented Aug 9, 2023

/PTAL @fujitatomoya
Do you have any ideas or suggestions to share regarding this proposal ?

@fujitatomoya
Copy link
Contributor

@JoeyHwong-gk thanks for pinging me on this, i will take a look and leave some comments later.

Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PyPotato thanks for the effort, although this is draft, really interesting proposal. i have several questions and nitpicks which possibly include beginner's level stuff because of my experience for KubeEdge. cheers,


## Background

The emergence of cloud-native architecture has further facilitated the development of cloud-native robotics technology. Cloud-native architecture encompasses characteristics such as containerization, microservices, and elastic scalability. These features enable robotic systems to adapt more effectively to complex work environments and diverse task requirements. Leveraging containerization technology for deployment and management, the maintainability, scalability, and flexibility of robot systems are significantly improved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off topic from this proposal, but i would say one of the hard barrier is device abstraction. this is more like the problem to address by container device interfaces or k8s device-plugin, but actually robots and robotics can be really complicated and platform dependent, and how we can manage them to the containers is important.
Again, this is off topic from this enhancement proposal, but the following information to be shrared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing, I will pay attention to this.


Operator is designed to achieve

- automatizing robotic operations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
- automatizing robotic operations
- automatizing robotic operations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reminder, I will fix them and take care of these details.


The scope of this project includes

- Providing a set of CRDs and an Operators for managing cloud robotics based on KubeEdge architecture
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Providing a set of CRDs and an Operators for managing cloud robotics based on KubeEdge architecture
- Providing a set of CRDs and an Operators for managing cloud robotics based on KubeEdge architecture


- Providing a set of CRDs and an Operators for managing cloud robotics based on KubeEdge architecture
- CRDs to define robot properties and states
- CRDs to extend teh robot API, through which we can control the robot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- CRDs to extend teh robot API, through which we can control the robot
- CRDs to extend the robot API, through which we can control the robot


- Developers:Building a complete solution for cloud robot management based on KubeEdge
- End users:Publishing tasks according to requirements and realizing automatic robot control

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 👍 on this. since we can expect that broad use cases almost anything requires physical access, that could be the use cases. probably we would propose some specific use cases. that could be also better to bring up the concrete and feasible design details.


The current predominant approach for managing cloud robots is to treat the robot as a unified entity, considering it as a Device and accessing the KubeEdge platform for management and scheduling purposes.

However, given that a robot is inherently composed of diverse sensors and controllers, it is more appropriate to view it as a combination of heterogeneous components rather than an indivisible arrangement. Consequently, in this solution, a robot is treated as an edge node, with its sensors serving as the device access platform for that node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think because of my lack of experience for k8s / kubeedge, having the hard time to understand this description.
my understanding is,

  • operator manages the desired state and sync the state with CRD objects. that is said, we expect the application uses those APIs to CRD to operate and orchestrate the robots running in the edge environment from the cloud infrastructure.
  • so the robot is the the node where edgecore runs to assign the spec/implementation to the resource object.
  • it is likely that robots own the many sensors and devices connected directly such as lidar, radar, camera and audio. these devices are also managed as resources? there would be non-robotics application deployed to edgenode with robot CRD, it needs to be able to manage the resource, instead there would be access conflict between apps?
  • How can we bind those sensors to the application container running in edge node? (Device-Plugin? and then attach those resource to robot CRD object?)

i might be mistaken, just checking if my understanding is okay.

Copy link
Contributor Author

@PyPotato PyPotato Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your first two points are exactly right.

For the third point, Kubeedge manages the accessed devices through two CRDs, Device Model and Device Instance, whereas in my CRD definition, the fields related to sensors, are only there to determine that a certain robot is able to perform a specific task, but are not responsible for the management of these sensors.

And the last, when a robot node registers, it reports some necessary information, which includes the sensors it has, and that information will be updated into robot CRD. I hope this might answer your question.

| Fields | Type | Description |
| :-----------------: | :------: | :------------------------------------: |
| robotID | uint | Unique ID of a robot |
| position | struct | Indicates the location of the robot, including coordinates, the last point passed, etc. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once it comes to robots and robotics, this could bring the many questions. positions for what? geometry, robot arm? relative? in-house mapping position? it is nice to be flexible but if goes too far, it would hard to use for the user application? if this proposal specifies the position with clear statement, you can ignore my comment here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, to increase generalizability, many of the fields in the CRD follow the Specification for data interface between Industrial mobile robot and Robot control system, which defines criteria such as coordinates and position. In this case, position indicates the position of the robot itself in the coordinate system defined by this standard.


- The fields contained in the **task** resource can be described as the table below:

| Field | Type | Description |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about priority, this sound required for me to do the traffic management and task management with using mobile robots. probably there would be some use case that task-A should prevail the task-B in urgent case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is indeed a requirement and I will take this into account in subsequent development.

<img src="./images/registration.png" alt="registration" style="zoom:70%;" />

Robot registration process:
1. Once the robot node join in the cluster, it will send heartbeat message to the cloud
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the rational to have the robot node but general node for this? if this limits the operation on node, that would be less flexibility for users? since they expect that robot node can be running as normal edge node? so that some other non-robotics services and application can be running?

2. If the Registration Controller finds that the heartbeat packet has not been received for a certain period of time, it will determine that the robot is offline.
3. Remove the corresponding robot ID from the registered list.

### Use case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the question for the architecture. do the robots on site can be expected to communicate each other? or everything needs to be managed by cloud as resource object? i think it is likely there would be problem about latency for collision detect and realtime application if it requires cloud communication always.

@kubeedge-bot kubeedge-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 18, 2023
// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
// Important: Run "make" to regenerate code after modifying this file

RobotID uint `json:"rbotID,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@@ -0,0 +1,29 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommended to remove

# Image URL to use all building/pushing image targets
IMG ?= controller:latest
# ENVTEST_K8S_VERSION refers to the version of kubebuilder assets to be downloaded by envtest binary.
ENVTEST_K8S_VERSION = 1.27.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check if this version will be supported by kubeedge

UnderTask bool `json:"underTask,omitempty"`
TaskInfo TaskInfo `json:"taskInfo,omitempty"`
AbnormalEvents []AbnormalEvents `json:"abnormalEvents,omitempty"`
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding comments would improve the readability and maintainability of the code.

p.mutex.Lock()
defer p.mutex.Unlock()

p.ids = append(p.ids, id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IDPool struct and its methods are used to manage a pool of IDs for robots. However, the Release method does not check if the ID being released is already in the pool, which could lead to duplicate IDs. This could cause issues when allocating IDs. Consider adding a check in the Release method to prevent duplicates.

func (p *IDPool) Release(id uint) {
	p.mutex.Lock()
	defer p.mutex.Unlock()

+	if !ContainsElement(p.ids, id) {
+		p.ids = append(p.ids, id)
+	}
}

// force close
if err != nil {
glog.Println("Force close websocket connection", err)
conn.Close()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closeWebSocketConnection function attempts to close a WebSocket connection by sending a close message up to three times. However, it does not return any error if it fails to send the close message after three attempts and instead forcefully closes the connection. Consider returning an error in this case to allow the caller to handle this situation.

	// force close
	if err != nil {
		glog.Println("Force close websocket connection", err)
		conn.Close()
+		return err
	}
	return nil
}

}

func ContainsElement(arr []uint, target uint) bool {
for _, item := range arr {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ContainsElement function checks if a slice contains a specific element. This function has a time complexity of O(n), which could be inefficient for large slices. Consider using a data structure with faster lookup times, like a map, if this function is called frequently.

}

// Done
func (r *RobotSyncReconciler) handleNodeDelete(ctx context.Context, node *corev1.Node) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handleNodeDelete function is responsible for handling the deletion of a Node resource. It correctly fetches the Robot resource associated with the node and removes it from the registered list. However, there is no error handling for the Delete call. Consider adding error handling to ensure that any issues with deleting the Robot resource are properly handled.

-	r.Delete(ctx, robot)
+	if err := r.Delete(ctx, robot); err != nil {
+		glog.Println("Failed to delete robot", err)
+		return err
+	}


CRDs are defined to achieve

- Difining robot resources
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo. The term "Difining" should be corrected to "Defining".

Signed-off-by: zsj <sjzhou2000@163.com>
@PyPotato PyPotato marked this pull request as ready for review October 19, 2023 16:14
@kubeedge-bot kubeedge-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 19, 2023
@JoeyHwong-gk
Copy link
Contributor

/lgtm
I have completed the code tests. PTAL @Poorunga @fisherxu , thx.

@kubeedge-bot kubeedge-bot added the lgtm Indicates that a PR is ready to be merged. label Oct 23, 2023
@PyPotato PyPotato changed the title add proposal for Definition of Cloud Robot CRD and Operator add proposal and implementation for Definition of Cloud Robot CRD and Operator Oct 26, 2023
@fisherxu
Copy link
Member

/approve
/hold
Leave for @Poorunga

@kubeedge-bot kubeedge-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 30, 2023
@kubeedge-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fisherxu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot kubeedge-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 30, 2023
@Poorunga
Copy link
Member

/lgtm

@fisherxu
Copy link
Member

/hold cancel

@kubeedge-bot kubeedge-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 31, 2023
@kubeedge-bot kubeedge-bot merged commit 3b4413d into kubeedge:master Oct 31, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants