Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue removing click kernel module. #407

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

neoben
Copy link

@neoben neoben commented Jul 23, 2018

The click router threads refuse to die because no one is setting the _stop_flag.
There is an infinite loop in RouterThread::driver() function.
Call request_stop killing router in master.cc to fix the problem.

The click router threads refuse to die because no one is setting the _stop_flag.
There is an infinite loop in RouterThread::driver() function.
Call request_stop killing router in master.cc to fix the problem.
@tbarbette
Copy link
Collaborator

tbarbette commented Jul 23, 2018

Shouldn't it be already dying at that point? What sequence of actions lead your router to stay alive?

@neoben
Copy link
Author

neoben commented Jul 23, 2018

The loop inside RouterThread::driver() function breaks if _stop_flag is set to true:

#if !BSD_NETISRSCHED
        // check to see if driver is stopped
        if (_stop_flag && _master->verify_stop(this))
            break;
#endif

If I'm not wrong, there is just a function setting this flag which is request_stop() in master.hh:

inline void
Master::request_stop()
{
    for (RouterThread **t = _threads; t != _threads + _nthreads; ++t)
        (*t)->_stop_flag = true;
    // ensure that at least one thread is awake to handle the stop event
    wake_somebody();
}

Without calling request_stop from Master::kill_router(Router *router) , when I remove the click kernel module (using rmmod) I receive this kind of error:

[  386.600000] click: current router threads refuse to die!
[  386.610000] click: Following threads still active, expect a crash:
[  386.620000] click:   router thread pid 1916

So if _stop_flag is false, RouterThread::driver() is an infinite loop and I suppose this is why kill_router_threads() from linuxmodule/sched.cc fails after 5 seconds and several attempts:

static int
kill_router_threads()
{
    delete placeholder_router;
    if (click_router)
	click_router->set_runcount(Router::STOP_RUNCOUNT);

    // wait up to 5 seconds for routers to exit
    unsigned long out_jiffies = jiffies + 5 * HZ;
    int num_threads;
    do {
	MDEBUG("click_sched: waiting for threads to die");
	SOFT_SPIN_LOCK(&click_thread_lock);
	num_threads = click_thread_tasks->size();
	SPIN_UNLOCK(&click_thread_lock);
	if (num_threads > 0)
	    schedule();
    } while (num_threads > 0 && jiffies < out_jiffies);

    if (num_threads > 0) {
	printk(KERN_ALERT "click: current router threads refuse to die!\n");
	return -1;
    } else
	return 0;
}

I think this issue is related just to kernel space and it works fine user space.
Let me know if this helps or if I'm missing something.
Thanks!

@tbarbette
Copy link
Collaborator

I'm less familiar with kernel click. However request_stop should be called from adjust_runcount/set_runcount.
When you unload the module, click_cleanup_sched is called, which calls kill_router_threads.
kill_router_threads set the runcount to STOP_RUNCOUNT using set_runcount, that in turn should set the _stop_flag to true, no?

@neoben
Copy link
Author

neoben commented Jul 23, 2018

Yes, but set_runcount with STOP_RUNCOUNT is called just if click_router is defined and it can be undefined (actually this is my case) so there is no function calling set_runcount:

    if (click_router)
	click_router->set_runcount(Router::STOP_RUNCOUNT);

@tbarbette
Copy link
Collaborator

I acknowledge the problem, I had it myself. However setting always stop_flag to true seems dangerous. If run_count is still not 0 because some references are still kept, the router should not finish to die.

Maybe you can detail your sequence of event? How did click_router got undefined without run_count being 0? That's the real issue, I think.

@neoben
Copy link
Author

neoben commented Jul 23, 2018

I will investigate more about click_router getting undefined.

@neoben
Copy link
Author

neoben commented Jul 23, 2018

Anyway I discovered that the change I made was part of the trunk code before and it was removed at some point:
85f1315#diff-fd6f8a65a460329f21fa8327a96ce179

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants