Skip to content

Comments

[core] Check cgroupv2 mount status#51141

Merged
edoakes merged 14 commits intoray-project:masterfrom
dentiny:hjiang/check-cgroupv2-mount-status
Mar 20, 2025
Merged

[core] Check cgroupv2 mount status#51141
edoakes merged 14 commits intoray-project:masterfrom
dentiny:hjiang/check-cgroupv2-mount-status

Conversation

@dentiny
Copy link
Contributor

@dentiny dentiny commented Mar 6, 2025

This PR does two things:

  • Update the existing mount status check from linux command to C++ syscalls;
  • Add check it's cgroupv2 but not cgroupv1 gets mounted.

Additional check:

  • I checked with cgroupv2 mounted with read-only mode, result ok
[/workspaces/pg_duckdb] (hjiang/fix-httpfs-semantics) 
postgres@5959e7653aeb$ mount | grep cgroup
cgroup on /sys/fs/cgroup type cgroup2 (ro,nosuid,nodev,noexec,relatime)

Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny added the go add ONLY when ready to merge, run all tests label Mar 6, 2025
@dentiny dentiny requested review from edoakes, israbbani and jjyao March 6, 2025 22:23
Copy link
Contributor

@israbbani israbbani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I left a few small comments

Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny force-pushed the hjiang/check-cgroupv2-mount-status branch from ae73a1a to b7df013 Compare March 7, 2025 01:54
Comment on lines 28 to 31
TEST(CgroupV2UtilsTest, CheckCgroupV2Mount) {
EXPECT_TRUE(IsCgroupV2MountedAsRw("/sys/fs/cgroup"));
EXPECT_FALSE(IsCgroupV2MountedAsRw("/tmp/non_existent_folder"));
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are more cases we should be testing here:

  • cgroup v2 is mounted but not readable
  • cgroup v2 is mounted but read only
  • cgroup v2 is mounted and rw but in a different path
  • cgroup v1 is mounted at the path
  • cgroup v1 and v2 are both mounted

per the suggestion above, I would also change the function to return a detail message about the reason the function returns false and validate those details in this test

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's also ensure that the test is running in an automated fashion. shouldn't require manual setup commands

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to more tests and automated testing.

We can't expect everyone's local box to have the correct environment setup for this. Is there an example we can follow for:

  • Tests that are only enabled in certain CI environments
  • Setting up new CI environments for this project. (Docker, K8S, VM)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are more cases we should be testing here:

Is it something we should do in one unit test? That means you need to have root permission to mount and unmount cgroup folder.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you propose a holistic testing plan that will cover these cases in a way you think makes sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing plan here: https://docs.google.com/document/d/10qM5J5rv3anKXr1S0OWDZIOk75e6auU5YF90u5EtRjo/edit?tab=t.0#heading=h.o0t94extwg3

Also discussed with Ibrahim offline on the testing cases and return values.

@israbbani
Copy link
Contributor

For testing purposes it's probably better to return true/false from this function and add the check/fatal error at the callsite.

@edoakes, I think you get to write tests against the same binary behavior (true/false or failure/non-failure). For fatal errors, I prefer throwing the error as close to the problem as possible so you get more context for debugging.

@edoakes
Copy link
Collaborator

edoakes commented Mar 7, 2025

For testing purposes it's probably better to return true/false from this function and add the check/fatal error at the callsite.

@edoakes, I think you get to write tests against the same binary behavior (true/false or failure/non-failure). For fatal errors, I prefer throwing the error as close to the problem as possible so you get more context for debugging.

Fine w/ me

@dentiny dentiny force-pushed the hjiang/check-cgroupv2-mount-status branch from 73b2585 to 6df11a1 Compare March 19, 2025 00:23
dentiny added 5 commits March 19, 2025 00:26
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny requested a review from edoakes March 19, 2025 01:05
Copy link
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically all style comments/nits

}

TEST(CgroupV2UtilsTest, CgroupV2DirectoryNotExist) {
EXPECT_FALSE(IsCgroupV2Prepared("/tmp/non_existent_folder").ok());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have a RAY_ASSERT_NOT_OK to match RAY_ASSERT_OK? not sure actually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why I didn't wrote the NON_OK at first place is, for error status, we better check status code, instead of merely error or not. Updated to check error status in the latest commit.

@edoakes
Copy link
Collaborator

edoakes commented Mar 19, 2025

the change to return a more detailed error message in the status looks great btw

Copy link
Contributor

@israbbani israbbani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm. A few small suggests around typos

Signed-off-by: dentiny <dentinyhao@gmail.com>
dentiny added 2 commits March 19, 2025 20:45
Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: dentiny <dentinyhao@gmail.com>
@dentiny dentiny requested review from edoakes and israbbani March 19, 2025 21:01
Copy link
Contributor

@israbbani israbbani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

Signed-off-by: dentiny <dentinyhao@gmail.com>
@edoakes edoakes merged commit f7dfe9e into ray-project:master Mar 20, 2025
5 checks passed
dhakshin32 pushed a commit to dhakshin32/ray that referenced this pull request Mar 27, 2025
This PR does two things:
- Update the existing mount status check from linux command to C++
syscalls;
- Add check it's cgroupv2 but not cgroupv1 gets mounted.

Additional check:
- I checked with cgroupv2 mounted with read-only mode, result ok
```
[/workspaces/pg_duckdb] (hjiang/fix-httpfs-semantics)
postgres@5959e7653aeb$ mount | grep cgroup
cgroup on /sys/fs/cgroup type cgroup2 (ro,nosuid,nodev,noexec,relatime)
```

---------

Signed-off-by: dentiny <dentinyhao@gmail.com>
Signed-off-by: Dhakshin Suriakannu <d_suriakannu@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-backlog go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants