Scheduler is not working #1
Hi Mohammad, are you trying to schedule the web server (with the SCHED_FIFO policy) using one of the ghOSt schedulers (say, fifo_scheduler)? If so, you'll need to move the application threads into the ghost scheduling class: all of the various ghOSt schedulers only schedule tasks with policy == SCHED_GHOST. There are a couple of ways to move application threads into ghost (for your use case I'd recommend #3, the pushtosched tool).
Thanks @neelnatu, yep, that's exactly what I'm trying to do. I had misunderstood that part. I'll try the tool and update the issue with the result. Thanks
I tried the tool and it seems to be working fine; I pass the PID of the running process to it. I tried again using the Shinjuku scheduler. The agent starts up:

```
Core map
( 0 ) ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) ( 6 ) ( 7 )
Initializing...
Initialization complete, ghOSt active.
```
Once I migrate my web server's threads to ghOSt, the agent crashes:

```
Core map
( 0 ) ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) ( 6 ) ( 7 )
Initializing...
Initialization complete, ghOSt active.
PID 6149 Fatal segfault at addr 0x10:
[0] 0x7fe13de823c0 : __restore_rt
[1] 0x55babc33b2cb : ghost::BasicDispatchScheduler<>::DispatchMessage()
[2] 0x55babc339f19 : ghost::ShinjukuAgent::AgentThread()
[3] 0x55babc33e3c4 : ghost::Agent::ThreadBody()
[4] 0x7fe13e0d2de4 : (unknown)
```
For example, if I run a Flask dev web server, I use the following command to pipe the PID of the running Python process to the tool: `pidof python | sudo ./pushtosched 18`. Where do you think the problem might be coming from? Thanks
Hi Mohammad, Neel is out of office today, so let me help you until he is back.

Regarding Shinjuku: I assume you compiled with optimizations turned on, which strips many of the debug symbols from the crash trace. What I assume is happening, though, is that Shinjuku requires the client app being scheduled to set up a shared memory region for communication (check out the RocksDB experiment), and it is crashing in your case because this region is not set up. I would not use the Shinjuku scheduler for the web server at the moment, since it requires more extensive setup.

As for the FIFO scheduler, you do not need any setup beyond running pushtosched.c to move the threads into the ghOSt sched class, so let me get some more information from you. I want to make sure the policy is up and running correctly on your machine. If it is, then we can delve into the specifics of the server. For example, the FIFO policy is non-preemptive, so perhaps certain threads are taking a while to run, which is why you do not see the server responding.
Hi Jack, thanks for your hints; the scheduler is working now. Regarding the Shinjuku scheduler, I haven't taken a look at it yet, but since this is for research purposes I'm going to need to run it too. I'll investigate the RocksDB experiment as you suggested to see how it works, and will keep this issue updated with the result. Many thanks to you guys for your kind help. 🌹
Hi Mohammad, glad that it is working now. We're happy to open source the cgroup code, but there is a complication: we use cgroups v1 internally at Google, whereas others generally use v2 nowadays, so our code may not be useful to others. And since cgroups v1 does not support inotify (to detect when a new thread is added to a cgroup), we rely on a periodic polling mechanism to detect new threads, which may be too slow for a webserver that spawns a pthread for every new request. If you want to implement this functionality for cgroups v2, we would be thrilled to merge it in. Alternatively, is it possible for you to modify the pthread spawn code in the server to move each new thread into ghOSt itself?
For my current test, yes, I can modify the source of my webserver. But in the end, I believe I should run my experiments on some latency-sensitive apps like Memcached (I'm not sure whether Memcached spawns new threads at runtime; I assume it doesn't). Modifying the source is probably not easy in that case.
Hmm, once you move all threads of the webserver into ghost, any new threads created by that application should automatically start out in the ghost sched_class. You can verify this by periodically doing a `grep policy /proc/<tid>/sched` for all tasks you get via `ls /proc/$(pidof webserver_app)/task`. Is that not happening in your case?
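The check above can be scripted; here is a rough sketch (my own helper, not part of ghOSt; it assumes `SCHED_GHOST` shows up as policy 18, matching the `pushtosched 18` invocation elsewhere in this thread, and the `policy` field is only present when the kernel exposes `/proc/<tid>/sched`):

```shell
# List every thread of a process and report its scheduling policy.
# Uses the current shell ($$) for illustration; substitute
# $(pidof webserver_app) for a real server.
for task in /proc/$$/task/*; do
  tid=$(basename "$task")
  # /proc/<tid>/sched exists only on kernels built with sched debugging.
  policy=$(grep -m1 '^policy' "$task/sched" 2>/dev/null \
             || echo "policy : unavailable")
  echo "tid $tid -> $policy"
done
```

Run periodically while the server handles requests, any thread whose policy is not 18 has fallen out of (or never entered) the ghost sched class.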
Hi Neel, you're right. I double-checked and it's working as you said. However, the throughput drops significantly with FIFO, probably because of its non-preemptive implementation. Anyway, I'm trying to run Shinjuku to compare the performance, but I still have some problems running it. I'll ask for your help if I can't fix it today. Thanks
On Sun, Oct 31, 2021 at 4:03 PM Mohammad Siavashi ***@***.***> wrote:

> Hi Neel, you're right. I double-checked and it's working as you said. However, the throughput drops significantly with FIFO, probably because of its non-preemptive implementation.

Yes, that's not surprising. In addition to being non-preemptive, the fifo_scheduler also does not do any load balancing between CPUs at runtime (it statically assigns a task to a CPU, and the task stays there until it dies or departs). Its goal is to serve as a demonstration of a per-CPU scheduling model in ghOSt.

> Anyway, I'm trying to run Shinjuku to compare the performance, but I still have some problems running it. I'll ask for your help if I can't fix it today.

Yes, sounds good.

best
Neel
For Shinjuku, here are the main gotchas:
And just to be clear, Shinjuku is a centralized scheduler that has preemption, so you should get better QPS for your web server, though the spinning Shinjuku global agent will take up a logical core.
Thanks. I've just started trying to get Shinjuku running; for now, it seems the agent fails to start. I'm looking into it.
That's likely because the Shinjuku experiments require a machine with at least 8 logical cores. You can lower this requirement by updating the relevant constants in the experiments code.
Also take a look at the RocksDB experiment for how the client sets up the PrioTable.
Shinjuku's experiment worked successfully, and thanks to your help I now have some clues for getting my webserver working with the Shinjuku scheduler. I'll integrate my webserver with the PrioTable and hope it goes without a hitch. Thanks again
Yes, let us know. A key point is that you need to add a pthread to the PrioTable as soon as it is spawned, so make sure to allocate a large enough PrioTable when the server starts.

Depending on the scheduling policy and the scheduling hints from the server that your research requires, it may make sense to ditch the PrioTable altogether, since it becomes a bottleneck at high thread counts (the agent needs to scan the table). You could use Shinjuku or FIFO as a starting point, take out the PrioTable, and slim down other irrelevant parts of the policies. Another option is to use the SOL scheduler as a starting point: it is a centralized FIFO scheduler with no PrioTable. Adding preemption support to it (which should be relatively easy) may be what you're looking for.
Dear @jackhumphries, I used the orchestrators as samples and implemented a very simple demonstration to get familiar with the code. Here is my code (standard includes added for completeness; the ghOSt experiments headers that provide `ghost_test::Ghost` and `ghost_test::ExperimentThreadPool` are omitted since their paths depend on the build):

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

using namespace std;

void printHelloWorld(uint32_t num) {
  cout << "Hello World" << endl;
}

int main(int argc, char const *argv[]) {
  ghost_test::Ghost ghost_(1, 1);
  ghost_test::ExperimentThreadPool thread_pool_(1);

  vector<ghost::GhostThread::KernelScheduler> kernelSchedulers;
  vector<function<void(uint32_t)>> threadWork;
  kernelSchedulers.push_back(ghost::GhostThread::KernelScheduler::kGhost);
  threadWork.push_back(&printHelloWorld);
  thread_pool_.Init(kernelSchedulers, threadWork);

  // Mark the single sched item runnable so the agent will schedule it.
  ghost::sched_item si;
  ghost_.GetSchedItem(0, si);
  si.sid = 0;
  si.gpid = thread_pool_.GetGtids()[0].id();
  si.flags |= SCHED_ITEM_RUNNABLE;
  ghost_.SetSchedItem(0, si);
  return 0;
}
```
As you can see, I create only one ghOSt thread, assign my `printHelloWorld` function to it as its thread body, and mark its sched item runnable. However, "Hello World" gets printed over and over rather than just once.

Any idea why this repetition happens? Am I missing something? My second question is: how is it possible to ditch the PrioTable? Is that easily achievable with the current implementation, or does it require a new implementation of Shinjuku? I assume it would not be easy with the current implementation of its agent. Am I correct? Thanks
Hi Mohammad, in the experiments directory we have a helper thread pool that runs a thread body over and over until the thread is marked as ready to exit, which is why you see the repetition. Below is some updated code that prints "Hello World" only once, as you would expect; just update the include paths to work with your build.

Basically, I use notifications to wait for the ghOSt thread to print once, then I mark the ghOSt thread as ready to exit, which causes the thread pool to let it exit. Then I call `Join()` on the thread pool to wait for the thread. You can modify the thread pool to avoid the repeating behavior, or just create the ghOSt threads directly yourself.

If you want to ditch the PrioTable altogether (which I would recommend for your case), then I would take a look at the SOL scheduler. It is a centralized FIFO scheduler without a PrioTable. The only downside is that it is not preemptive, but you can implement that very easily yourself: on each iteration of the global scheduling loop, see how long each thread has been running so far, and schedule something else on any CPU whose currently running thread has exceeded the time slice.
Hi Mohammad, Please let me know if you have any additional questions. I will close this thread for now. |
… ghost during agent restart. Prior to this change, task discovery would abort with the following error:

```
ERROR: Got repeated ESTALEs from a quiescent reassociation for gtid 4494803534350351, flags 7!
```

When a task departs and then moves back into ghost, there can be more than one status_word that refers to the same task (all except the most recently allocated one are associated with earlier incarnations of the task in ghost). By definition, using the 'barrier' from an earlier incarnation will result in queue association failing with ESTALE. We now detect whether we are dealing with an orphaned status_word by checking GHOST_SW_F_CANFREE, and free the status_word in that case.

Note that we could still get a false positive on queue association if the barrier in the _orphaned_ status_word happens to match the barrier of the _current_ status_word. This will be fixed, but this is how it "works" currently:
- the first false-positive association injects a synthetic TASK_NEW.
- any subsequent false-positive associations are no-ops because they return GHOST_ASSOC_SF_ALREADY (thanks to the first false-positive).
- all status words associated with earlier incarnations are leaked.
- the agent restarts successfully and the task is scheduled.

This false positive can be engineered by killing the agent and moving the task into and out of ghost multiple times (each of the orphaned status words will have a barrier of 2: TASK_NEW + TASK_DEPARTED). After the final move into ghost, we can use 'taskset -p <cpumask> <pid>' and bump the barrier of the current status_word to 2: TASK_NEW + AFFINITY_CHANGED.
TESTED:

```shell
# create enclave
mount -t ghost ghost /sys/fs/ghost
echo "create 1" > /sys/fs/ghost/ctl
echo "0-5" > /sys/fs/ghost/enclave_1/cpulist

# start agent
fifo_agent --enclave=/sys/fs/ghost/enclave_1 &

# start test program
cat > /tmp/t.sh << EOF
#!/bin/bash
while [ 1 ]; do
  sleep 1
  iter=$((iter+1))
  echo $iter
done
EOF
/tmp/t.sh &

# move /tmp/t.sh into ghost
echo $T_SH_PID | pushtosched 18

# kill the agent
kill -INT $FIFO_AGENT_PID

# move /tmp/t.sh out of ghost and then back into ghost.
echo $T_SH_PID | pushtosched 0
echo $T_SH_PID | pushtosched 18

# We now have two status words with gtid matching /tmp/t.sh:
# - the live status word associated with the current incarnation.
# - the old status word associated with the previous incarnation.

# restart the agent
fifo_agent --enclave=/sys/fs/ghost/enclave_1 &
```

Prior to this change, DiscoverTasks would error out trying to associate the old status_word, since it would always return ESTALE (see below).

```
root@(none):/usr/local/google/home/neelnatu/linux/9xx# Initializing...
sw(0,6): gtid 0xff8000000080f(2044), flags 0x7
Associating with status_word in region 0, index 6
Trying to associate with gtid 4494803534350351(2044), flags 0x7, runtime 23719147, barrier 38
AssociateTask failed: errno 116
Trying to associate with gtid 4494803534350351(2044), flags 0x7, runtime 23719147, barrier 38
AssociateTask failed: errno 116
./third_party/ghost/lib/scheduler.h:231(2088) ERROR: Got repeated ESTALEs from a quiescent reassociation for gtid 4494803534350351, flags 7!
```
PID 2087 Backtrace:

```
[0] 0x56180ef1782d : ghost::Exit()
[1] 0x56180eed9eb3 : ghost::BasicDispatchScheduler<>::DiscoverTasks()::{lambda()#1}::operator()()
[2] 0x56180eed9dae : std::__u::__function::__policy_invoker<>::__call_impl<>()
[3] 0x56180eef1faf : ghost::StatusWordTable::ForEachTaskStatusWord()
[4] 0x56180eee861d : ghost::LocalEnclave::ForEachTaskStatusWord()
[5] 0x56180eed9262 : ghost::BasicDispatchScheduler<>::DiscoverTasks()
[6] 0x56180eee762c : ghost::Enclave::Ready()
[7] 0x56180eed882e : ghost::FullFifoAgent<>::FullFifoAgent()
[8] 0x56180eed85db : std::__u::make_unique<>()
[9] 0x56180eed7d4e : ghost::AgentProcess<>::AgentProcess()
[10] 0x56180eed7091 : main
[11] 0x7f26c3bb6bbd : __libc_start_main
[12] 0x56180eed60e9 : _start
```
Specifically, the bug is that we are depending on the initialization of GHOST_TID_SEQNUM_BITS (in base.cc) during the initialization of another global, Agent::kVersionCheck (in agent.cc). Since there is no initialization ordering across compilation units, this can result in accessing GHOST_TID_SEQNUM_BITS before it has been initialized.

```
READ of size 4 at 0x7ff981751180 thread T0
#0 0x7ff98134368f in ghost::gtid(long) third_party/ghost/lib/base.cc:122:19
#1 0x7ff98134196d in GetGtid third_party/ghost/lib/base.cc:212:28
#2 0x7ff98134196d in Current third_party/ghost/lib/base.h:164:40
#3 0x7ff98134196d in ghost::Exit(int) third_party/ghost/lib/base.cc:285:28
#4 0x7ff981b93db4 in ghost::Ghost::MountGhostfs() third_party/ghost/lib/ghost.cc:147:5
#5 0x7ff981b93fe5 in ghost::Ghost::GetSupportedVersions(std::__u::vector<unsigned int, std::__u::allocator<unsigned int> >&) third_party/ghost/lib/ghost.cc:155:5
#6 0x7ff982102744 in ghost::Ghost::CheckVersion() third_party/ghost/lib/ghost.h:274:5
#7 0x7ff98217c564 in __cxx_global_var_init third_party/ghost/lib/agent.cc:102:35
#8 0x7ff98217c564 in _GLOBAL__sub_I_agent.cc third_party/ghost/lib/agent.cc
#9 0x7ff9cc37b4cc in call_init (/usr/grte/v5/lib64/ld-linux-x86-64.so.2+0x1c4cc)
#10 0x7ff9cc37b329 in _dl_init (/usr/grte/v5/lib64/ld-linux-x86-64.so.2+0x1c329)
```

Fix this by caching the result in a function-local static variable in get_tid_seqnum_bits(), which is guaranteed to be initialized the first time it is accessed, and this initialization will happen before any other access.

TESTED=all unit tests pass in virtme
Hi,

My question might be naive, but I'm trying to run some experiments with the provided schedulers (e.g., the FIFO scheduler and the Shinjuku scheduler). I've noticed that the agent executes and initializes the cores successfully; however, it appears that it does not schedule any process. I placed a few `printf`s in the `Enqueue` method as well as in the constructor. The `printf` in the constructor prints once the scheduler is initialized, but the prints inside the `Enqueue` method never appear. No significant difference in performance is observed either.

I have a web server running with the `SCHED_FIFO` policy at priority `99`. The provided tests also pass, but the scheduler does not log anything except for the core-mapping and initialization prints.

I also debugged the kernel to see whether the kernel's `SCHED_GHOST` policy works. It seems the kernel schedules the agent perfectly; however, the agent does not schedule my real-time (or any other) processes.

So my question is simply: how can I make sure the agent's scheduler is working correctly when the `Enqueue` method seems never to be called?

I appreciate your help. Thanks