Careful servers

harisankarh edited this page Aug 27, 2012 · 2 revisions

Careful servers: servers which power off safely during cooling system power outages

Careful servers is a daemon that runs on each server and powers it off safely during cooling system power outages.

The problem

We had two racks of about 30 servers in our university for student use. They were placed in a room cooled by two air conditioners (A/Cs). Because UPS power was scarce, only the servers were powered through the UPS, while the A/Cs were running on raw power. This led to issues during power outages: the A/Cs would be off while the servers could keep running on UPS power for hours. The servers would then heat up, which could potentially lead to fire accidents.

The solution

Ideally, the A/Cs and the servers should have the same power source. But due to infrastructure limitations, we didn't have that option (nor did we have temperature sensors or configurable thermal shutdown in the servers). Hence, the servers had to be shut down soon after the raw power went off. The raw power usually goes down for short intervals and comes back. If it comes back within a short time, say 5 minutes, there is no need to shut down the servers and disrupt all the running tasks. Hence, we had to shut down the servers only if the raw power stayed down for more than 5 minutes.

So we first had to detect when the raw power goes off, then wait for 5 minutes and, if the raw power is still off, shut down the machines. The steps to realize each of these are described below.

Detecting raw power outages

The raw power outages had to be detected by all the servers. We implemented a heartbeat mechanism in which every server periodically pings another machine (called a power node) that is directly connected to raw power, just like the A/Cs. In our case, we used a small defunct wireless router as the power node. Its wireless functionality was not working, but it was configured to have a static IP when connected to the wired LAN.
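A single heartbeat check could be sketched in C as below. This is only an illustration, not the code from careful.c: the helper names `build_ping_command` and `power_node_is_up` are hypothetical, and it assumes a Linux `ping` from iputils (where `-c 1` sends one echo request and `-W 2` waits at most 2 seconds for the reply).

```c
#include <stdio.h>
#include <stdlib.h>

/* Build the shell command that sends one echo request to ip.
 * Returns 0 on success, -1 if buf is too small.
 * Assumes iputils ping: -c 1 = one packet, -W 2 = 2 second reply timeout. */
int build_ping_command(char *buf, size_t size, const char *ip)
{
    int n = snprintf(buf, size, "ping -c 1 -W 2 %s > /dev/null 2>&1", ip);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Returns 1 if the power node answered the echo request, 0 otherwise.
 * ping exits with status 0 only when a reply was received. */
int power_node_is_up(const char *ip)
{
    char cmd[128];

    if (build_ping_command(cmd, sizeof cmd, ip) != 0)
        return 0;
    return system(cmd) == 0;
}
```

A server would call `power_node_is_up` with the static IP of the router and treat a return value of 0 as "raw power is off".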

Ignoring short-term raw power outages

Configuring each server to ping the power node every 5 minutes and shut down on a failed ping is not enough to ignore short-term raw power outages. A server whose ping happens to fall 1 minute into a 2-minute outage would detect the outage and shut down, even though the power comes back a minute later. To avoid this, we implemented a small long-term failure detector based on a finite state machine (FSM), as shown in the figure.

Figure: FSM of long-term failure detector

The initial state is NORMAL. Steps to be performed at each state are shown below.

NORMAL: wait for 1 minute and ping power node

PRE-FAIL: wait for 5 minutes and ping power node

SHUTDOWN: shutdown the server

Thus, the failure detector normally pings frequently (every 1 minute) and, on detecting a failure, waits for 5 minutes before checking again. If the power node is still not up after those 5 minutes, the server shuts itself down.
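The state machine above can be sketched in C roughly as follows. This is a minimal sketch, not the actual contents of careful.c. The function names and the constants are hypothetical, and it assumes that a successful ping in PRE-FAIL returns the detector to NORMAL, which the text implies but does not state.

```c
/* States of the long-term failure detector. */
enum state { NORMAL, PRE_FAIL, SHUTDOWN };

/* Default waits in seconds: 1 minute in NORMAL, 5 minutes in PRE-FAIL. */
#define NORMAL_WAIT_SEC   60u
#define PREFAIL_WAIT_SEC  300u

/* Seconds to wait before the next ping in state s. */
unsigned wait_for(enum state s)
{
    return s == PRE_FAIL ? PREFAIL_WAIT_SEC : NORMAL_WAIT_SEC;
}

/* Next state, given the current state and the latest ping result.
 * Assumption: a successful ping in PRE_FAIL goes back to NORMAL. */
enum state next_state(enum state s, int ping_ok)
{
    switch (s) {
    case NORMAL:   return ping_ok ? NORMAL : PRE_FAIL;
    case PRE_FAIL: return ping_ok ? NORMAL : SHUTDOWN;
    default:       return SHUTDOWN;
    }
}

/* Runs the detector until it reaches SHUTDOWN. The ping and pause
 * functions are passed in so the loop can be tried without a network
 * or real sleeping. */
void run_detector(int (*ping)(void), void (*pause_fn)(unsigned))
{
    enum state s = NORMAL;

    while (s != SHUTDOWN) {
        pause_fn(wait_for(s));
        s = next_state(s, ping());
    }
    /* A real daemon would power the machine off at this point. */
}
```

With these defaults, a failure is first noticed at most 1 minute after the outage begins and is confirmed 5 minutes later, so a server shuts down between 5 and 6 minutes into an outage, and an outage that ends before the second ping is ignored.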

Making sure the daemon is always up

The state machine has to be started as a daemon at boot. But an issue with simply putting it in rc.local is that if the daemon accidentally gets killed (for example, due to insufficient system memory), safety is compromised. So we wrote a wrapper script that checks whether the daemon is up and starts it if it is not. This script is run as root every 5 minutes through an entry in /etc/crontab.

Steps for installation

  1. Copy the files careful.c (FSM implementation) and the wrapper script for careful to all the servers.

  2. Set the IP of the power node in (Note that the wait times are also configurable(by default set to 1 and 5 minutes))

  3. Compile careful.c on the server.

    cc careful.c -o careful

  4. Copy careful and to /usr/bin

    sudo mv careful /usr/bin/careful

    sudo mv /usr/bin/

  5. Add to /etc/crontab and restart the servers.

    sudo nano /etc/crontab

The /etc/crontab should have an entry like:

#executes every 5 minutes
*/5 *	* * *	root	/usr/bin/

Future work

I guess the next step should be to randomize the waiting times a bit, so that the servers do not all ping the power node at the same moment and load it in sync. Pull requests welcome!
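One possible shape for that randomization is sketched below. It is not part of careful.c. The names `seed_jitter` and `jittered_wait` are hypothetical, and the sketch only ever adds a random offset to the base wait, so the 5-minute wait in PRE-FAIL is never shortened.

```c
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Seed the generator once at daemon start. Mixing in the process ID
 * makes it less likely that servers booted at the same second end up
 * with the same sequence. */
void seed_jitter(void)
{
    srand((unsigned)time(NULL) ^ (unsigned)getpid());
}

/* Returns base_sec plus a random offset in [0, max_jitter_sec]. */
unsigned jittered_wait(unsigned base_sec, unsigned max_jitter_sec)
{
    return base_sec + (unsigned)rand() % (max_jitter_sec + 1u);
}
```

For example, `jittered_wait(60, 10)` would spread the NORMAL pings of different servers over a 10-second window instead of having them all fire on the same tick.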