# Lab 1: High-Performance Computing Environment
___

In this lab, we will set up a PC-based parallel computing environment in our computer laboratory and try some MPI-based parallel computing. In the future, we might extend it to a Hadoop+Spark computational platform.
![](images/clusters.png)

### Prerequisites

#### System
- CentOS 6.x, CentOS 7.x
- RHEL 6.x, 7.x
- Scientific Linux 6.x, 7.x
- SUSE Linux Enterprise Server 11, 12
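
On the systemd-based releases (CentOS, RHEL, and Scientific Linux 7.x; SLES 12), two things change relative to the older releases. The commands in this lab use the older style.
- Running daemons: `service` -> `systemctl`
- Daemon scripts: `/etc/init.d/` -> `/usr/lib/systemd/system/`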
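
Because the lab mixes 6.x-style and 7.x-style systems, it can help to work out which restart command applies on a given host before running it. The following is a minimal sketch; `service_cmd` is a hypothetical helper (not part of any distribution) that only prints the command instead of executing it.

```bash
# service_cmd NAME [ACTION]: print the command that would perform ACTION
# (default: restart) on service NAME for this host's init system.
# Hypothetical helper for illustration; it does not run anything itself.
service_cmd() {
    local name="$1" action="${2:-restart}"
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl ${action} ${name}"
    else
        echo "service ${name} ${action}"
    fi
}

service_cmd network restart
```

Printing rather than executing keeps the sketch safe to try as a normal user; run the printed command as root once it looks right.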

#### Software
- `yum install libtool libxml2-devel openssl-devel boost-devel`
- [Torque](http://www.adaptivecomputing.com)
- [MPICH](http://www.mpich.org/)
- [Environment Modules](http://modules.sourceforge.net/)

## 1. Preparation: Architecture settings

1. Each PC cluster comprises 4 computers: one **master node** (for resource management) and three **slave nodes** (for executing computational tasks).

2. The master node communicates with the slave nodes over **SSH**, without being prompted for a password.

3. All the nodes share part of the file system using **NFS**.

## 2. Network configuration

On each node, add the names and corresponding IP addresses of all the nodes in the cluster to the `/etc/hosts` file, as in the following example:
```
127.0.0.1    localhost
192.168.5.1   node01   # master
192.168.5.2   node02   # slave
192.168.5.3   node03   # slave
192.168.5.4   node04   # slave
```
and then restart the network:
```bash
/etc/init.d/network restart
```
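
For a larger cluster, typing the entries by hand gets tedious. As a rough sketch, the lines could be generated with a small loop; `gen_hosts` and the `node%02d` naming scheme are assumptions made for illustration, matching the example addresses above.

```bash
# gen_hosts PREFIX COUNT: print one /etc/hosts line per node.
# The function name and the node%02d naming scheme are illustrative only.
gen_hosts() {
    local prefix="$1" count="$2" i
    for i in $(seq 1 "$count"); do
        printf '%s.%d   node%02d\n' "$prefix" "$i" "$i"
    done
}

# Review the output first, then append it to /etc/hosts as root on every node.
gen_hosts 192.168.5 4
```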

## 3. SSH setup

On `node01` (the master node), run as a non-root user:
```bash
ssh-keygen -t rsa -C "your_email@example.com"
```
Do NOT enter a passphrase; leave it empty when prompted. This generates two files under your `~/.ssh` directory:
- **id_rsa.pub**: public key
- **id_rsa**: private key

Append the public key to the list of authorized keys:
```bash
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

Repeat the above process on every node. Then merge all the `~/.ssh/authorized_keys` files into a single file and copy it to `~/.ssh/` on every node.

Now you can log in from any node to any other without entering a password:

```bash
# on node01
ssh node02
```
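
The merge step can be sketched as follows, assuming each node's public key has already been collected into one directory on node01. The helper name `merge_keys` and the `nodeNN.pub` file names are hypothetical.

```bash
# merge_keys FILE...: combine public-key files and drop duplicate lines.
# Hypothetical helper; prints the merged list on standard output.
merge_keys() {
    sort -u "$@"
}

# Example, run in the directory holding the collected keys:
#   merge_keys node01.pub node02.pub node03.pub node04.pub > authorized_keys
#   chmod 600 authorized_keys
```

If password-less login still fails afterwards, check permissions: `sshd` typically refuses keys when `~/.ssh` or `authorized_keys` is writable by other users.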

## 4. Set up NFS for the shared file system

On node01, add something like the following to the file `/etc/exports` so that the two file systems `/usr/local` and `/home` are shared with the other nodes in the cluster:
```
/home    192.168.5.0/24(rw,no_root_squash)
/usr/local    192.168.5.0/24(rw,no_root_squash)
```

and then restart the NFS service:
```bash
chkconfig --add nfs
/etc/init.d/nfs restart
```

On each slave node, add two lines to the file `/etc/fstab`:
```
node01:/home         /home         nfs    defaults    0 0
node01:/usr/local    /usr/local    nfs    defaults    0 0
```

and then run
```bash
mount -a
```
to mount the file systems.
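
To confirm on a slave node that both directories really come from NFS, the kernel's mount table can be inspected. This is a minimal sketch; `is_nfs_mount` is a hypothetical helper, and it assumes the usual `/proc/mounts` layout (device, mount point, file system type, ...).

```bash
# is_nfs_mount MOUNTPOINT [MOUNTS_FILE]: succeed if MOUNTPOINT is listed in
# MOUNTS_FILE (default /proc/mounts) with a file system type starting "nfs".
is_nfs_mount() {
    local mp="$1" table="${2:-/proc/mounts}"
    awk -v mp="$mp" '$2 == mp && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$table"
}

for fs in /home /usr/local; do
    if is_nfs_mount "$fs"; then
        echo "$fs: mounted over NFS"
    else
        echo "$fs: not an NFS mount"
    fi
done
```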

## 5. Install Torque for Job management

### 5.1 Configuration on the server (node01)

On the master node (**node01**), run the following commands to build and install Torque, a widely used open-source job manager:
```bash
yum install git
git clone https://github.com/adaptivecomputing/torque.git -b 6.0.2 6.0.2
cd 6.0.2
./autogen.sh
./configure --disable-gui
make && make install
```

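Then configure the server:
```bash
# tell Torque which host runs pbs_server
echo $(hostname) > /var/spool/torque/server_name
# install the authorization daemon's init script
cp contrib/init.d/trqauthd /etc/init.d/
chkconfig --add trqauthd
# make the Torque libraries in /usr/local/lib visible to the dynamic linker
echo /usr/local/lib > /etc/ld.so.conf.d/torque.conf
ldconfig
/etc/init.d/trqauthd start
# stop pbs_server if one is already running, then initialize the server
qterm
./torque.setup root
```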

Then list every node and its number of processors (`np`) in the nodes file:
```bash
cat > /var/spool/torque/server_priv/nodes <<EOF
node01  np=2
node02  np=2
node03  np=2
node04  np=2
EOF
```

Copy the startup scripts to `/etc/init.d/` and start the services:
```bash
cp contrib/init.d/pbs_server /etc/init.d/
cp contrib/init.d/pbs_sched /etc/init.d/
chkconfig --add pbs_server
chkconfig --add pbs_sched
/etc/init.d/pbs_server restart
/etc/init.d/pbs_sched restart
```

### 5.2 Configuration on the mom nodes (node02, node03, node04)

Build the packages for the slave nodes:
```bash
## run on node01 
make packages
for i in 2 3 4; do
    scp contrib/init.d/pbs_mom node0"${i}":/etc/init.d/
    scp torque-package-mom-linux-x86_64.sh node0"${i}":/root/
    scp torque-package-clients-linux-x86_64.sh node0"${i}":/root
    ssh node0${i} ./torque-package-mom-linux-x86_64.sh --install
    ssh node0${i} ./torque-package-clients-linux-x86_64.sh --install
    ssh node0${i} chkconfig --add pbs_mom
    ssh node0${i} /etc/init.d/pbs_mom restart
done
```

Be sure to check that the `pbs_mom` service has actually started on each slave node.
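
One way to check from node01 is to ask the server for the node states. The sketch below assumes that `pbsnodes -a` prints an indented `state = ...` line for each node; `count_free_nodes` is a hypothetical helper that counts the nodes reported as `free`.

```bash
# count_free_nodes: read `pbsnodes -a` style output on standard input and
# print how many nodes have a state line that is exactly "state = free".
count_free_nodes() {
    awk '$1 == "state" && $2 == "=" && $3 == "free" { n++ } END { print n + 0 }'
}

# On node01 (needs a running pbs_server); skipped if pbsnodes is not installed.
if command -v pbsnodes >/dev/null 2>&1; then
    pbsnodes -a | count_free_nodes
fi
```

A count lower than the number of nodes in `server_priv/nodes` suggests that a mom is down or cannot reach the server.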

Then write the following lines to the file `/var/spool/torque/mom_priv/config` on each slave node:
```bash
# run on node01; the quoted 'EOF' keeps $pbsserver and $logevent literal
for node in 2 3 4; do
    ssh node0${node} "cat > /var/spool/torque/mom_priv/config" <<'EOF'
$pbsserver    node01    # hostname running pbs server
$logevent     225       # bitmap of which events to log
EOF
done
```

<font color="red">Note</font>: If you want to run both the server and a mom on a single PC, give the server and the mom different host names on that PC.

### 5.3 Some important configuration files
- `/var/spool/torque/server_name`
- `/var/spool/torque/server_priv/nodes`
- `/var/spool/torque/mom_priv/config`

## 6. Environment Modules – A Great Tool for Clusters

The Environment Modules package provides for the dynamic modification of a user's environment via **modulefiles**.

You can use Environment Modules to change environment variables such as `$PATH`, `$MANPATH`, `$LD_LIBRARY_PATH`, and others.

### 6.1 Installation and Configuration

Download the Environment Modules source from [sourceforge.net](http://modules.sourceforge.net), then build and install it:
```bash
# assumes modules-3.2.6.tar.gz has been downloaded to the current directory
mkdir -p /usr/local/Modules/src
cp modules-3.2.6.tar.gz /usr/local/Modules/src
cd /usr/local/Modules/src
tar xzvf modules-3.2.6.tar.gz
cd modules-3.2.6
./configure
make && make install

# point "default" at the installed version
cd /usr/local/Modules
ln -s 3.2.6 default

# initialize Modules for login shells
cp /usr/local/Modules/default/init/sh /etc/profile.d/modules.sh
chmod 755 /etc/profile.d/modules.sh
```

Now users can use Environment Modules by just putting the following in their *.bashrc* or *.profile*:

```bash
. /etc/profile.d/modules.sh
```

#### Some configuration files
- `/usr/share/Modules/init/`
    * `/usr/share/Modules/.modulepath`
    * `/usr/share/Modules/init/sh`
- `/usr/share/Modules/modulefiles/` directory
- `/etc/modulefiles/` directory
- `/etc/profile.d/modules.sh`

### 6.2 Using Environment Modules

To begin, I’ll assume that Environment Modules is installed and functioning correctly, so you can now test a few of the options typically used. 

- **List the available modules**:
```bash
module avail
```

- **Load the required software**:
```bash
module load bwa
```

- **Remove the modules from the current environment**:
```bash
module unload bwa
```

- **List the loaded modules**:
```bash
module list
```


### 6.3 Writing module files

Here is an example modulefile, `compilers/gcc-4.6.2`, for a newer version of GCC:
```
#%Module1.0#####################################################################
##
## modules compilers/gcc-4.6.2
##
## modulefiles/compilers/gcc-4.6.2
##
proc ModulesHelp { } {
        global version modroot

        puts stderr "compilers/gcc-4.6.2 - sets the environment for GCC 4.6.2 installed in /usr/local/gcc-4.6.2"
}

module-whatis   "Sets the environment for using gcc-4.6.2 compilers (C, Fortran)"

# for Tcl script use only
set     topdir          /usr/local/gcc-4.6.2
set     version         4.6.2
set     sys             linux86

setenv          CC              $topdir/bin/gcc
setenv          GCC             $topdir/bin/gcc
setenv          FC              $topdir/bin/gfortran
setenv          F77             $topdir/bin/gfortran
setenv          F90             $topdir/bin/gfortran
prepend-path    PATH            $topdir/include
prepend-path    PATH            $topdir/bin
prepend-path    MANPATH         $topdir/man
prepend-path    LD_LIBRARY_PATH $topdir/lib

```
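
To make the effect of `prepend-path` concrete, here is a plain-shell approximation of what loading the modulefile above does to a colon-separated variable. `prepend_path` is a hypothetical helper written for this illustration, not how Environment Modules is implemented. Unlike a real module, it cannot be undone with `module unload`.

```bash
# prepend_path VAR DIR: put DIR at the front of the colon-separated list
# stored in the environment variable named VAR, and export the result.
prepend_path() {
    local var="$1" dir="$2" current
    eval "current=\${$var:-}"
    if [ -n "$current" ]; then
        eval "export $var=\"\$dir:\$current\""
    else
        eval "export $var=\"\$dir\""
    fi
}

topdir=/usr/local/gcc-4.6.2          # same prefix as in the modulefile
prepend_path PATH "$topdir/bin"
prepend_path LD_LIBRARY_PATH "$topdir/lib"
echo "$PATH"
```

Because the new directory goes to the front, the shell finds `$topdir/bin/gcc` before the system compiler, which is the point of loading the module.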


### 6.4 Module example together with Torque

A Torque job script can combine the two by loading a module before running the program:
```bash
#PBS -l nodes=2:ppn=8,pmem=1000mb,walltime=8:00:00
#PBS -m abe
#PBS -M sample_email@sjtu.edu.cn

module load bwa
bwa index input.fasta
bwa aln -t $PBS_NP input.fasta input.fq > output.sai
```
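
Inside a running job, Torque lists the allocated hosts in the file named by `$PBS_NODEFILE`, one line per allocated processor. As a small sketch, a job script could summarize its allocation like this; `slots_per_node` is a hypothetical helper.

```bash
# slots_per_node FILE: print "count hostname" for each host in a node file.
slots_per_node() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# Torque sets PBS_NODEFILE inside a job; outside a job this does nothing.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    slots_per_node "$PBS_NODEFILE"
fi
```

Logging this at the top of a job makes it easy to see afterwards which nodes a job actually ran on.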

## 7. MPI Parallel Programming Development

MPI stands for Message Passing Interface, a standard API for writing parallel programs that communicate by exchanging messages.

### 7.1 Install MPICH2

Installing MPICH2 is now easy: run `yum install mpich2` as a privileged user.
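
A quick way to verify the installation is to compile and run a tiny MPI program. The sketch below writes one to a temporary directory and runs it with 4 processes if `mpicc` is on `PATH`. `write_hello_mpi` is a hypothetical helper; on some systems the MPI compiler wrappers only appear on `PATH` after the right module is loaded.

```bash
# write_hello_mpi DIR: write a minimal MPI "hello" program to DIR/hello.c.
# Hypothetical helper used only for this installation check.
write_hello_mpi() {
    cat > "$1/hello.c" <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
}

workdir=$(mktemp -d)
write_hello_mpi "$workdir"

# Compile and run only if the MPI compiler wrapper is available.
if command -v mpicc >/dev/null 2>&1; then
    mpicc -o "$workdir/hello" "$workdir/hello.c" &&
        mpiexec -n 4 "$workdir/hello" ||
        echo "compile or run failed" >&2
else
    echo "mpicc not found; install MPICH or add it to PATH first" >&2
fi
```

Each process prints its own rank, so the lines may appear in any order.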

## 8. Practical MPI programming

The details will be illustrated in the [lab notebook](lab1.pdf).