Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bugfix and refactor device increate count #8782

Merged
merged 5 commits into from Apr 5, 2024

Conversation

Apokleos
Copy link
Contributor

@Apokleos Apokleos commented Jan 8, 2024

runtime-rs: bugfix incorrect use of refcount before vfio attach

When there's a pod with multiple containers, there may be case that
attach point more than 2, we should not return Err in that case when
we are doing attach ops, but just return Ok.

runtime-rs: introduce dedicated function do_increase_count/do_decrease_count

Since there are many implementations of reference counting in the
drivers, all of which have the same implementation, we should try
to reduce such duplicated code as much as possible. Therefore, a
new function is introduced to solve the problem of duplicated code.

runtime-rs: refactor increase_attach_count with do_increase_count/do_decrease_count

Try to reduce duplicated code in increase_attach_count with public
new function do_increase_count/do_decrease_count.

Fixes: #8738

@Apokleos Apokleos changed the title Device increate count bugfix and refactor device increate count Jan 8, 2024
@katacontainersbot katacontainersbot added the size/large Task of significant size label Jan 8, 2024
@Apokleos Apokleos self-assigned this Jan 8, 2024
Copy link
Member

@justxuewei justxuewei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A few comments here.

return Err(anyhow!("attach count increased failed as some reason."));
warn!(
sl!(),
"It's not allowed that device {:?} is attached many times,", self.device_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is more concise.

Suggested change
"It's not allowed that device {:?} is attached many times,", self.device_id
"The device {:?} is not allowed to be attached multiple times.", self.device_id

Comment on lines 75 to 91
0 => {
// When ref_count is 0, it indicates that the device is new here, In this case, we know that the device
// has not been attached into the Guest.
*ref_count += 1;
Ok(false)
}
// While in practice, the number of times a device is attempted to be inserted into the Guest cannot reach
// this maximum value, the device is theoretically allowed to do so.
// However, we will not allow the device to actually be inserted into the Guest more than once.
std::u64::MAX => Err(anyhow!("device was attached too many times")),
_ => {
// When ref_count is greater than 0, it indicates that how many times the device has been attempted to
// be inserted into the Guest.
// In this case, we know the device has been attached into the Guest.
*ref_count += 1;
Ok(true)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @justxuewei Although I know that the checked_add() method is a good choice in some cases, I don't think it's appropriate here. And I don't want to introduce too much judgment logic because of this method, so I choose to stick to the original approach.

Another reason is, the code are just moved here from its original places, and the key point is, simple and it works well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justxuewei Maybe saturating_add(1)/saturating_sub(1) is a good choice !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, It's indeed better to refactor it with check_add as @justxuewei said:

checked_add(1).ok_or(anyhow!("error msg")?;
return Ok(*ref_count != 1))

}

// do_decrease_count
pub fn do_decrease_count(ref_count: &mut u64) -> Result<bool> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@@ -69,9 +69,55 @@ pub(crate) fn get_virt_drive_name(mut index: i32) -> Result<String> {
Ok(String::from(PREFIX) + std::str::from_utf8(&disk_letters)?)
}

// do_increase_count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove it or expand on this, as this comment doesn't provide any valuable information.

}
}

// do_decrease_count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@Apokleos Apokleos force-pushed the device-increate-count branch 2 times, most recently from dc81a77 to 7b7153a Compare April 1, 2024 03:23
When there's a pod with multiple containers, there may be case that
attach point more than 2, we should not return Err in that case when
we are doing attach ops, but just return Ok.

Fixes: kata-containers#8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
@Apokleos Apokleos force-pushed the device-increate-count branch 2 times, most recently from 37b4796 to 25ed1c2 Compare April 1, 2024 08:06
@Apokleos Apokleos requested a review from Tim-Zhang April 1, 2024 09:40
@Apokleos
Copy link
Contributor Author

Apokleos commented Apr 2, 2024

/test

Copy link
Member

@justxuewei justxuewei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, thanks! @Apokleos

Copy link
Member

@studychao studychao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, a few comments.

Since there are many implementations of reference counting in the
drivers, all of which have the same implementation, we should try
to reduce such duplicated code as much as possible. Therefore, a
new function is introduced to solve the problem of duplicated code.

Fixes: kata-containers#8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Introduce a dedicated public function do_decrease_count to
reduce duplicated code in drivers' decrease_attach_count.

Fixes: kata-containers#8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Try to reduce duplicated code in increase_attach_count with public
new function do_increase_count.

Fixes: kata-containers#8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Try to reduce duplicated code in decrease_attach_count with public
new function do_decrease_count.

Fixes: kata-containers#8738

Signed-off-by: Alex Lyn <alex.lyn@antgroup.com>
Copy link
Member

@studychao studychao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Apokleos
Copy link
Contributor Author

Apokleos commented Apr 3, 2024

/test

@Apokleos Apokleos merged commit 0e0a361 into kata-containers:main Apr 5, 2024
289 of 299 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test size/large Task of significant size
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants