Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ToUserDevice+FromUserDevice unsafe across /click/config changes #123

Open
pallas opened this issue Apr 30, 2014 · 6 comments
Open

ToUserDevice+FromUserDevice unsafe across /click/config changes #123

pallas opened this issue Apr 30, 2014 · 6 comments

Comments

@pallas
Copy link
Contributor

pallas commented Apr 30, 2014

With the following program, slow_read,

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int, char*[]) {
  char c;
  while (1 == read(STDIN_FILENO, &c, sizeof c)) {
    if (write(STDOUT_FILENO, &c, sizeof c) < 0)
      return EXIT_FAILURE;
    sleep(1);
  }
  return EXIT_SUCCESS;
}

install the /click/config

InfiniteSource(LIMIT -1, LENGTH 1, ACTIVE true)
-> ToUserDevice(0, ENCAP none);

and then

mknod /dev/click0 c 241 0
slow_read < /dev/click0 &
sleep 5
echo "" > /click/config

and watch the kernel crashes.

The issue is that the methods of ToUserDevice are not synchronized with the click config in any way, so while they are using elem it might disappear out from under them. It seems like clickfs might need to export the read versions of lock_config and unlock_config in order to prevent this from happening while elem is live.

I can implement that solution @kohler, but I wanted to get your take first. It also is not clear to me whether the router should be blocked + unblocked as well.

@kohler
Copy link
Owner

kohler commented Apr 30, 2014

I don't completely agree. ToUserDevice::cleanup() is trying to spin until no one is waiting for it (see the _exit boolean and the loop). That might be insufficient/incorrect, but I think something simpler is missing. cleanup() should walk over elem_map[] and set all entries that equal this to null instead. Then when a user program calls back into the device they'll exit immediately.

@pallas
Copy link
Contributor Author

pallas commented Apr 30, 2014

What if we're in the middle of ToUserDevice::dev_read when /dev/config is written?

I guess it doesn't matter why the Element is being destroyed (so /dev/config is incidental to the problem) but _lock is not held around all places where elem should be valid and since it is a member we get into trouble if elem is destroyed in the meantime.

Since _proc_queue and _lock are members, using them is problematic to begin with. Perhaps all that is needed here is a global lock that controls access to elem_map and an atomic ref count on TUDs that indicates whether cleanup can go forward.

SMP version of the issue:

CPU0
ToUserDevice::dev_read
  ToUserDevice *elem = GETELEM(filp); // valid here
  // _lock is not held and we are not in _proc_queue

CPU1
ToUserDevice::cleanup
  elem_map[_dev_minor] = 0;
  ...
~ToUserDevice

CPU0
  elem->_read_count++; // crash!

@kohler
Copy link
Owner

kohler commented Apr 30, 2014

If we;'re in the middle of dev_read then we should have the lock. That's
why the destructor for ToUD should acquire the lock...

On Wed, Apr 30, 2014 at 12:19 PM, pallas notifications@github.com wrote:

What if we're in the middle of ToUserDevice::dev_read when /dev/config is
written?

I guess it doesn't matter why the Element is being destroyed (so
/dev/config is incidental to the problem) but _lock is not held around
all places where elem should be valid and since it is a member we get
into trouble if elem is destroyed in the meantime.

Since _proc_queue and _lock are members, using them is problematic to
begin with. Perhaps all that is needed here is a global lock that controls
access to elem_map and an atomic ref count on TUDs that indicates whether
cleanup can go forward.

SMP version of the issue:

CPU0
ToUserDevice::dev_read
ToUserDevice *elem = GETELEM(filp); // valid here
// _lock is not held and we are not in _proc_queue

CPU1
ToUserDevice::cleanup
elem_map[_dev_minor] = 0;
...
~ToUserDevice

CPU0
elem->_read_count++; // crash!


Reply to this email directly or view it on GitHubhttps://github.com//issues/123#issuecomment-41816870
.

@kohler
Copy link
Owner

kohler commented Apr 30, 2014

Sorry, I ahve a deadline so take my effluvia as weak. But there is
definitely a way to solve this problem with locks, perhaps involving
rw-locks around the device map (GETELEM).

On Wed, Apr 30, 2014 at 12:20 PM, Eddie Kohler ekohler@gmail.com wrote:

If we;'re in the middle of dev_read then we should have the lock. That's
why the destructor for ToUD should acquire the lock...

On Wed, Apr 30, 2014 at 12:19 PM, pallas notifications@github.com wrote:

What if we're in the middle of ToUserDevice::dev_read when /dev/config is
written?

I guess it doesn't matter why the Element is being destroyed (so
/dev/config is incidental to the problem) but _lock is not held around
all places where elem should be valid and since it is a member we get
into trouble if elem is destroyed in the meantime.

Since _proc_queue and _lock are members, using them is problematic to
begin with. Perhaps all that is needed here is a global lock that controls
access to elem_map and an atomic ref count on TUDs that indicates
whether cleanup can go forward.

SMP version of the issue:

CPU0
ToUserDevice::dev_read
ToUserDevice *elem = GETELEM(filp); // valid here
// _lock is not held and we are not in _proc_queue

CPU1
ToUserDevice::cleanup
elem_map[_dev_minor] = 0;
...
~ToUserDevice

CPU0
elem->_read_count++; // crash!


Reply to this email directly or view it on GitHubhttps://github.com//issues/123#issuecomment-41816870
.

@pallas
Copy link
Contributor Author

pallas commented May 2, 2014

So I have good newses and bad news. The bad news is that I verified that this is actually a problem. The good news is that it is very rare and I have a patch, which I'll prepare for pull soon. The other good news is that the more frequent problem was a local change to replace register_chrdev & friends with cdev_init & friends, since the former is deprecated, and I can turn that into a mainline pull as well if you're interested.

@kohler
Copy link
Owner

kohler commented Jul 28, 2014

I would love to see cdev_init as a patch, fwiw.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants