## PRP gigiot-fiona GridFTP and system tuning notebook

<code>This notebook is running a Bash kernel and can execute any simple executable or shell command

In [1]:
uname -r

3.10.0-693.17.1.el7.x86_64


<br>
<br>
<code> Install the Globus yum repository

In [6]:
cat << 'EOF' > /etc/yum.repos.d/globus.repo
[globus]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://137.164.48.196/centos-7-x86_64/globus
failovermethod=priority
enabled=1
EOF
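
`$basearch` is a yum variable that yum itself substitutes when it reads the repo file. With a heredoc, whether the shell touches it first depends on how the delimiter is quoted. A minimal sketch of the difference, writing only to temporary files (the `Example` repo name is made up for illustration):

```bash
# Sketch: compare an unquoted and a quoted heredoc delimiter.
# Only temp files from mktemp are written; nothing under /etc is touched.
unquoted=$(mktemp)
quoted=$(mktemp)

# Simulate the usual case where the shell has no value for basearch.
basearch=""

cat << EOF > "$unquoted"
name=Example - $basearch
EOF

cat << 'EOF' > "$quoted"
name=Example - $basearch
EOF

unquoted_line=$(cat "$unquoted")   # the shell replaced $basearch with nothing
quoted_line=$(cat "$quoted")       # the literal $basearch is left for yum
rm -f "$unquoted" "$quoted"

echo "unquoted: $unquoted_line"
echo "quoted:   $quoted_line"
```

Only the quoted form leaves the variable in the file for yum to resolve.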

In [7]:
rpm --import http://fiona-10-01:8000/RPM-GPG-KEY-Globus

In [8]:
cat << 'EOF' > /etc/yum.repos.d/perfsonar.repo
[perfsonar]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://137.164.48.196/centos-7-x86_64/perfsonar
failovermethod=priority
enabled=1
gpgcheck=1
EOF

<br>
<br>
<code> Disable SELINUX to allow code installation

In [9]:
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
setenforce 0

<br>
<br>
<code> Enable NTP for accurate clock sync

In [10]:
yum -y install ntp
systemctl enable ntpd
systemctl start ntpd

Loaded plugins: fastestmirror
globus                                                   | 2.9 kB     00:00     
perfsonar                                                | 2.9 kB     00:00     
globus/primary_db                                          | 636 kB   00:00     
Loading mirror speeds from cached hostfile
Package ntp-4.2.6p5-25.el7.centos.2.x86_64 already installed and latest version
Nothing to do


<br>
<br>
<code> Check for clock sync

In [11]:
ntpq -p

     remote           refid      st t when poll reach   delay   offset  jitter
*fiona-10-01     69.89.207.99     2 u  133  256  377    0.260    2.066   0.478
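
In `ntpq -p` output, the peer marked with `*` is the one ntpd is currently synchronized to, and the `offset` column is in milliseconds. A small sketch that pulls out that offset so it can be checked in a script; the function name is an illustrative choice, and the sample text is copied from the output above:

```bash
# Print the offset (ms) of the peer ntpd has selected (line starting with '*').
# Reads `ntpq -p` output on stdin; returns 1 if no peer is selected.
selected_peer_offset() {
    awk '/^\*/ { print $9; found = 1 } END { exit (found ? 0 : 1) }'
}

# Sample input copied from the output above.
sample='     remote           refid      st t when poll reach   delay   offset  jitter
*fiona-10-01     69.89.207.99     2 u  133  256  377    0.260    2.066   0.478'

offset=$(printf '%s\n' "$sample" | selected_peer_offset)
echo "selected peer offset: ${offset} ms"
```

On a live host the same function could be fed directly: `ntpq -p | selected_peer_offset`.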


<br>
<br>
<code> Check the hostname <br>
Must respond with the FQDN (fully qualified domain name)

In [12]:
hostname -f

gigiot-22.conf.cenic.org


<br>
<br>
<code> Detect the interface used as the default route

In [13]:
default_iface=$(awk '$2 == 00000000 { print $1 }' /proc/net/route)
echo $default_iface

enp3s0
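
In `/proc/net/route` the destination is a hexadecimal string, and the default route is the entry whose destination is all zeros. A hedged sketch of the same lookup that compares the field as a string, skips the header, and removes duplicates if more than one default route exists. The sample routing table is hypothetical:

```bash
# Print interface names whose destination is the all-zero default route.
# Input: text in /proc/net/route format on stdin.
default_ifaces() {
    awk 'NR > 1 && $2 == "00000000" { print $1 }' | sort -u
}

# Hypothetical sample table: header, a default route, and a local subnet route.
sample=$(printf '%s\n' \
  'Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT' \
  'enp3s0 00000000 0130A489 0003 0 0 0 00000000 0 0 0' \
  'enp3s0 0030A489 00000000 0001 0 0 0 00FFFFFF 0 0 0')

printf '%s\n' "$sample" | default_ifaces
```

On a live host: `default_ifaces < /proc/net/route`.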


<br>
<br>
<code> Use ethtool to disable adaptive-rx on the interface,<br>ifconfig to set the txqueuelen of the default interface,<br>and tc to enable fair queueing and pace traffic to 1 Gbps

In [14]:
if [ ! -f /opt/prp/netconf.sh ]; then
mkdir -p /opt/prp
cat>/opt/prp/netconf.sh <<EOL
#!/bin/bash
#/usr/sbin/ethtool -C $default_iface adaptive-rx off # not supported on EL-20-3700
/sbin/ifconfig $default_iface txqueuelen 10000
/usr/sbin/tc qdisc add dev $default_iface root fq maxrate 1gbit
EOL
chmod +x /opt/prp/netconf.sh
fi

<br>
<br>
<code>Install the netconf tuning script as a systemd service

In [15]:
if [ ! -f /etc/systemd/system/prp-net.service ]; then
cat>/etc/systemd/system/prp-net.service <<EOL
[Unit]
After=network.target

[Service]
ExecStart=/opt/prp/netconf.sh

[Install]
WantedBy=default.target
EOL
systemctl daemon-reload
systemctl enable prp-net.service
fi

Created symlink from /etc/systemd/system/default.target.wants/prp-net.service to /etc/systemd/system/prp-net.service.


<br>
<br>
<code> These settings tune the operating system's TCP stack, raising the default buffer limits so that large flows to and from remote systems with a high RTT are not limited by small buffers. BBR congestion control was developed by Google and is now available in most Linux distributions (it requires kernel 4.9 or later).

In [16]:
if [ ! -f /etc/sysctl.d/prp.conf ]; then
cat>/etc/sysctl.d/prp.conf <<EOL
net.core.rmem_max = 67108864 
net.core.wmem_max = 67108864 
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.ipv4.tcp_congestion_control=bbr
net.ipv4.tcp_mtu_probing=1
net.core.default_qdisc = fq
EOL
fi
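
The buffer maxima make more sense next to the bandwidth-delay product (BDP), the amount of data that must be in flight to keep a path full: BDP = bandwidth × RTT. A rough sketch of the arithmetic, assuming a 1 Gbit/s path and an example RTT of 100 ms (the RTT is an assumption, not a measured value from this network):

```bash
# Bandwidth-delay product in bytes:
#   BDP = bandwidth (bit/s) * RTT (s) / 8
# With Mbit/s and ms as inputs: mbps * 1e6 * (rtt_ms / 1e3) / 8 = mbps * rtt_ms * 125
bdp_bytes() {
    local mbps=$1 rtt_ms=$2
    echo $(( mbps * rtt_ms * 125 ))
}

# Assumed example path: 1 Gbit/s with a 100 ms round-trip time.
bdp=$(bdp_bytes 1000 100)
echo "BDP: $bdp bytes"

# Compare against the tcp_rmem/tcp_wmem maximum configured above.
tcp_buf_max=33554432
if [ "$bdp" -le "$tcp_buf_max" ]; then
    echo "a single flow fits within the configured maximum"
fi
```

Under these assumptions the BDP is about 12.5 MB, comfortably below the 32 MiB per-socket maximum.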

<br>
<br>
<code> Add a series of kernel options to GRUB_CMDLINE_LINUX in /etc/default/grub, then run grub2-mkconfig to regenerate grub.cfg so the new kernel parameters take effect at the next boot

In [17]:
if ! grep --quiet default_hugepagesz /etc/default/grub; then
	sed -i 's/^GRUB_CMDLINE_LINUX=\"[^\"]*/& default_hugepagesz=1GB hugepagesz=1GB hugepages=8 elevator=noop processor.max_cstate=1 intel_iommu=off idle=poll intel_idle.max_cstate=0 nosoftlockup mce=ignore_mce/' /etc/default/grub
	grub2-mkconfig -o /boot/grub2/grub.cfg
fi

Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-693.17.1.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.17.1.el7.x86_64.img
Found linux image: /boot/vmlinuz-3.10.0-693.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-693.el7.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-a909d43b6cb248ee8a139582f5a92e90
Found initrd image: /boot/initramfs-0-rescue-a909d43b6cb248ee8a139582f5a92e90.img
done
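
Once the host has been rebooted, the running kernel's parameters can be read from `/proc/cmdline`. A small sketch for checking that individual options made it through; the helper name and the sample command line are illustrative assumptions:

```bash
# Return 0 if the given parameter (exact token, e.g. "idle=poll")
# is present in the command line string.
has_kernel_param() {
    local cmdline=$1 param=$2 token
    for token in $cmdline; do
        [ "$token" = "$param" ] && return 0
    done
    return 1
}

# Hypothetical command line for illustration.
# On a live host use: cmdline=$(cat /proc/cmdline)
cmdline='BOOT_IMAGE=/vmlinuz ro default_hugepagesz=1GB hugepagesz=1GB hugepages=8 elevator=noop idle=poll'

for p in default_hugepagesz=1GB elevator=noop idle=poll; do
    if has_kernel_param "$cmdline" "$p"; then
        echo "present: $p"
    else
        echo "missing: $p"
    fi
done
```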


<br>
<br>
<code>Look at the interface queueing setting to the left of "state"

In [18]:
ip a | grep UP | grep -v lo

2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP qlen 1000


# SAVE your work NOW

In [None]:
reboot

<br>
<br>
<code> Now look again at the queueing setting to the left of "state"<br>
<code> You should see "fq" <br>
<code> This shows "Fair Queueing" is enabled.

In [1]:
ip a | grep UP | grep -v lo

2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP qlen 1000
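
For a scripted check, the qdisc name can be extracted rather than read by eye. A minimal sketch that takes the word following `qdisc` on each interface line; the function name is an illustrative choice and the sample line is the output shown above:

```bash
# Print "<interface> <qdisc>" for each interface line read on stdin
# (works on `ip a` or `ip -o link show` style output).
iface_qdisc() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "qdisc") { name = $2; sub(/:$/, "", name); print name, $(i + 1) }
    }'
}

sample='2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq state UP qlen 1000'
printf '%s\n' "$sample" | iface_qdisc
```

On a live host: `ip -o link show | iface_qdisc`.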


<br>
<br>
<code> Disable unwanted services

In [2]:
systemctl stop wpa_supplicant
systemctl disable wpa_supplicant
systemctl stop NetworkManager 
systemctl disable NetworkManager
systemctl stop irqbalance.service
systemctl disable irqbalance.service

Removed symlink /etc/systemd/system/multi-user.target.wants/irqbalance.service.


<br>
<br>
<code> Enable tuned for managing server performance profiles

In [3]:
systemctl start tuned
systemctl enable tuned

<br>
<br>
<code> Check the CPU speed

In [4]:
cat /proc/cpuinfo | grep MH

cpu MHz		: 480.000
cpu MHz		: 480.000
cpu MHz		: 640.000
cpu MHz		: 720.000


<br>
<br>
<code> Check the current performance tuning setting

In [5]:
tuned-adm active

Current active profile: balanced


<br>
<br>
<code> Set tuned profile to network-throughput

In [6]:
tuned-adm profile network-throughput

<br>
<br>
<code> Check CPU speed again.<br>
You should see the cores running at max.

In [7]:
cat /proc/cpuinfo | grep MH

cpu MHz		: 1601.000
cpu MHz		: 1601.000
cpu MHz		: 1601.000
cpu MHz		: 1601.000
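
To turn the before/after comparison into a single number, the slowest core can be reported. A hedged sketch that reads `/proc/cpuinfo`-format text; the function name is an assumption, and the sample values are the ones from the first check, before the profile change:

```bash
# Print the lowest "cpu MHz" value found in /proc/cpuinfo-format text on stdin.
min_cpu_mhz() {
    awk -F: '/^cpu MHz/ {
        v = $2 + 0
        if (!seen || v < min) { min = v; seen = 1 }
    } END { if (seen) print min }'
}

# Sample values from the first check (before switching the tuned profile).
sample=$(printf 'cpu MHz\t\t: %s\n' 480.000 480.000 640.000 720.000)
printf '%s\n' "$sample" | min_cpu_mhz
```

On a live host: `min_cpu_mhz < /proc/cpuinfo`.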


<br>
<br>
<code> Add port restrictions to the root user's bash environment

In [8]:
cat << EOF >> /root/.bashrc
export GLOBUS_TCP_PORT_RANGE=50000,51000
export GLOBUS_TCP_SOURCE_RANGE=50000,51000
EOF

<br>
<br>
<code> Import the Globus repository signing key, update the system, and install the globus-connect-server package with yum

In [9]:
rpm --import http://fiona-10-01:8000/RPM-GPG-KEY-Globus
yum -y update
yum -y install globus-connect-server 

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Resolving Dependencies
--> Running transaction check
---> Package libevent.x86_64 0:2.0.21-4.el7 will be updated
---> Package libevent.x86_64 0:2.0.22-1.rhel7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

 Package          Arch           Version                Repository         Size
Updating:
 libevent         x86_64         2.0.22-1.rhel7         perfsonar         212 k

Transaction Summary
Upgrade  1 Package

Total download size: 212 k
Downloading packages:
Delta RPMs disabled because /usr/bin/applydeltarpm not installed.
Public key for libevent-2.0.22-1.rhel7.x86_64.rpm is not installed
libevent-2.0.22-1.rhel7.x86_64.rpm                         | 212 kB   00:00     


Public key for libevent-2.0.22-1.rhel7.x86_64.rpm is not installed
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Resolving Dependencies
--> Running transaction check
---> Package globus-co

<br>
<br>
<code> Add the gridftp user and enable the service using chkconfig

In [10]:
adduser gridftp
chkconfig globus-gridftp-server on
chkconfig globus-gridftp-sshftp on

<br>
<br>
<code> Replace the default config file with this new config that <br>
allows anonymous access to a restricted read only volume.

In [11]:
cat << EOF >/etc/gridftp.conf
\$GLOBUS_ERROR_VERBOSE 1
\$GLOBUS_TCP_PORT_RANGE 50000,51000
\$GLOBUS_TCP_SOURCE_RANGE 50000,51000
port 2811
log_level ALL
log_single /var/log/gridftp.log
log_transfer /var/log/gridftp-transfer.log
allow_anonymous 1
anonymous_user gridftp
disable_command_list "APPE,RNTO,RNFR,ESTO,RMD,CHMOD,SITE,RDEL"
restrict_paths /data,/dev/zero,/dev/null
dc_whitelist udt,gsi,tcp
EOF
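
The data-channel port range in this config has to match what the firewall allows. One way to avoid typing the range twice is to derive the firewalld port spec from the config text. A minimal sketch, with an illustrative function name and sample lines matching the config written above:

```bash
# Read gridftp.conf-style text on stdin and print the $GLOBUS_TCP_PORT_RANGE
# value in firewalld's "low-high/tcp" form.
port_range_to_firewalld() {
    awk '$1 == "$GLOBUS_TCP_PORT_RANGE" {
        split($2, r, ",")
        print r[1] "-" r[2] "/tcp"
    }'
}

# Sample lines matching the config written above.
sample='$GLOBUS_TCP_PORT_RANGE 50000,51000
port 2811'

printf '%s\n' "$sample" | port_range_to_firewalld
```

On a live host: `port_range_to_firewalld < /etc/gridftp.conf`.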

<br>
<br>
<code> Reloading the Globus GridFTP server applies the changed config file.

In [12]:
/etc/init.d/globus-gridftp-server force-reload

Stopping globus-gridftp-server: OK
Starting globus-gridftp-server: OK


<br>
<br>
<code> Install the python gridftp log parser

In [13]:
pip install --upgrade esmond-client

Collecting esmond-client
  Downloading esmond_client-2.0.tar.gz (63kB)
    100% |████████████████████████████████| 71kB 1.8MB/s 
Collecting requests (from esmond-client)
  Downloading requests-2.18.4-py2.py3-none-any.whl (88kB)
    100% |████████████████████████████████| 92kB 2.5MB/s 
Requirement already up-to-date: python-dateutil in /usr/lib/python2.7/site-packages (from esmond-client)
Requirement already up-to-date: certifi>=2017.4.17 in /usr/lib/python2.7/site-packages (from requests->esmond-client)
Collecting chardet<3.1.0,>=3.0.2 (from requests->esmond-client)
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133kB)
    100% |████████████████████████████████| 143kB 2.3MB/s 
Collecting idna<2.7,>=2.5 (from requests->esmond-client)
  Downloading idna-2.6-py2.py3-none-any.whl (56kB)
    100% |████████████████████████████████| 61kB 3.0MB/s 
Requirement already up-to-date: urllib3<1.23,>=1.21.1 in /usr/lib/python2.7/site-packages (from requests->esm

<br>
<br>
<code> Grab a helper script that gracefully times out globus-url-copy if it runs longer than a given number of seconds.

In [14]:
mkdir -p /opt/esmond-gridftp/
wget http://fiona-10-01:8000/timeout.sh -O /usr/local/bin/timeout.sh
chmod 755 /usr/local/bin/timeout.sh

--2018-03-03 14:21:02--  http://fiona-10-01:8000/timeout.sh
Resolving fiona-10-01 (fiona-10-01)... 137.164.48.196
Connecting to fiona-10-01 (fiona-10-01)|137.164.48.196|:8000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2658 (2.6K) [application/x-sh]
Saving to: ‘/usr/local/bin/timeout.sh’


2018-03-03 14:22:47 (176 MB/s) - ‘/usr/local/bin/timeout.sh’ saved [2658/2658]



<br>
<br>
<code> Wget the cron-load-gridftp.sh script and make it executable

In [15]:
wget http://fiona-10-01:8000/cron-load-gridftp.sh -O /usr/local/bin/cron-load-gridftp.sh
chmod 755 /usr/local/bin/cron-load-gridftp.sh

--2018-03-03 14:22:47--  http://fiona-10-01:8000/cron-load-gridftp.sh
Resolving fiona-10-01 (fiona-10-01)... 137.164.48.196
Connecting to fiona-10-01 (fiona-10-01)|137.164.48.196|:8000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 781 [application/x-sh]
Saving to: ‘/usr/local/bin/cron-load-gridftp.sh’


2018-03-03 14:23:17 (86.6 MB/s) - ‘/usr/local/bin/cron-load-gridftp.sh’ saved [781/781]



<br>
<br>
<code> Inspect the contents of the script.

In [16]:
cat /usr/local/bin/cron-load-gridftp.sh

#!/bin/bash
TMPFILE=`mktemp`
python /usr/bin/esmond-ps-load-gridftp -p /opt/esmond-gridftp/prp-maddash.pickle -U http://fiona-10-02/esmond -u gridftp -k 252d94ad58d2ad08811bba190e441f9e37b65a50 -f /var/log/gridftp-transfer.log

wget http://fiona-10-01:8000/cron-gridftp-transfer-mesh.sh -O /usr/local/bin/cron-gridftp-transfer-mesh.sh
chmod 755 /usr/local/bin/cron-gridftp-transfer-mesh.sh
sed -i '/'"$HOSTNAME"'/s/^/#/' /usr/local/bin/cron-gridftp-transfer-mesh.sh
curl -s  http://fiona-10-01:8000/cron-mesh-timing.sh -o $TMPFILE
MY_HOSTNAME=`hostname -f`

grep -iq $MY_HOSTNAME $TMPFILE
RC=$?

if [[ $RC -eq 0 ]];then
      grep -i $MY_HOSTNAME $TMPFILE > /etc/cron.d/cron-gridftp-transfer-mesh
else
      logger -t "$0" 'Error while downloading cronjob files'
fi

rm -f $TMPFILE
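
The script selects its cron lines by grepping for the host name. Because dots in a host name are wildcards in a regular expression, a fixed-string match is slightly stricter. A hedged sketch of that variation (not the script's actual code); the function name and the second, near-miss host name are made up for illustration:

```bash
# Print only the cron lines that mention the given host, matching the name
# as a fixed string (-F) so dots are not regex wildcards, case-insensitively (-i).
cron_lines_for_host() {
    local host=$1
    grep -iF -- "$host"
}

# Hypothetical schedule with two hosts; the second name differs only where
# a regex dot would also match.
sample='31 * * * * root /usr/local/bin/cron-load-gridftp.sh # gigiot-22.conf.cenic.org
31 * * * * root /usr/local/bin/cron-load-gridftp.sh # gigiot-22xconf.cenic.org'

printf '%s\n' "$sample" | cron_lines_for_host gigiot-22.conf.cenic.org
```

Only the first sample line is printed; a plain `grep -i` with the same host name would print both.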

<br>
<br>
<code> This script scrapes the gridftp log files for transfers that are not yet recorded in its pickle file.<br>
The results are uploaded to the central measurement archive running on fiona-10-02

In [17]:
/usr/local/bin/cron-load-gridftp.sh 

ts=2018-03-03 14:23:33,378 event=main.start id=1520115813 no last record found
ts=2018-03-03 14:23:33,379 event=main.process id=1520115813 No last record, processing /var/log/gridftp-transfer.log
ts=2018-03-03 14:23:33,379 event=scan_and_load_netlogger.end id=1520115813 Loaded 0 records
--2018-03-03 14:23:33--  http://fiona-10-01:8000/cron-gridftp-transfer-mesh.sh
Resolving fiona-10-01 (fiona-10-01)... 137.164.48.196
Connecting to fiona-10-01 (fiona-10-01)|137.164.48.196|:8000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3423 (3.3K) [application/x-sh]
Saving to: ‘/usr/local/bin/cron-gridftp-transfer-mesh.sh’


2018-03-03 14:23:33 (116 MB/s) - ‘/usr/local/bin/cron-gridftp-transfer-mesh.sh’ saved [3423/3423]



<br>
<br>
<code> Running this script for the first time pulls in the cron-gridftp-transfer-mesh cron file for the host.

In [18]:
chmod 644 /etc/cron.d/cron-gridftp-transfer-mesh

<br>
<br>
<code> Inspect the cron job

In [19]:
cat /etc/cron.d/cron-gridftp-transfer-mesh

15 5,11,17,23 * * * root /usr/local/bin/cron-gridftp-transfer-mesh.sh &> /var/log/cron-gridftp-transfer-mesh.log # gigiot-22.conf.cenic.org
31 * * * * root /usr/local/bin/cron-load-gridftp.sh &> /var/log/cron-gridftp-load.log	# gigiot-22.conf.cenic.org


<br>
<br>
<code> Install ZFS <br>
https://github.com/zfsonlinux/zfs/wiki/RHEL-and-CentOS<br>
This takes too long so we installed it during PXE boot

<br>
<br>
<code> Once you see that the zfs install was successful, you can load the kernel module

<br>
<br>
<code> Now create a zfs pool<br>
This will be changed to /dev/sda /dev/sdb<br>
For now we are using the second 32GB SD card, and it's slower than the thumb drives

In [20]:
zpool create -f -o ashift=12 -m /data data /dev/sda /dev/sdb
zfs set recordsize=1024K data
zfs set checksum=off data
zfs set atime=off data

In [21]:
zpool export data
zpool import -d /dev/disk/by-id/ data

<br>
<br>
<code> Enable and start the systemd service

In [22]:
systemctl enable zfs-import-cache
systemctl enable zfs-mount.service 
systemctl start zfs-import-cache
systemctl start zfs-mount.service 
systemctl enable zfs.target
systemctl enable zfs-share.service 
systemctl start zfs.target
systemctl start zfs-share.service 

<br>
<br>
<code> Add firewalld rules for globus-connect-server

In [23]:
firewall-cmd --permanent --zone=public --add-port=2811/tcp
firewall-cmd --permanent --zone=public --add-port=7512/tcp
firewall-cmd --permanent --zone=public --add-port=50000-51000/tcp
firewall-cmd --permanent --zone=public --add-port=50000-51000/udp
firewall-cmd --permanent --zone=public --add-port=2223/tcp
firewall-cmd --reload

success
success
success
success
success
success


<br>
<br>
<code> Wget the 1G.dat test file

In [24]:
wget http://fiona-10-01:8000/1G.dat -O /data/1G.dat

--2018-03-03 14:26:16--  http://fiona-10-01:8000/1G.dat
Resolving fiona-10-01 (fiona-10-01)... 137.164.48.196
Connecting to fiona-10-01 (fiona-10-01)|137.164.48.196|:8000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1073741824 (1.0G) [application/octet-stream]
Saving to: ‘/data/1G.dat’


2018-03-03 14:26:39 (44.2 MB/s) - ‘/data/1G.dat’ saved [1073741824/1073741824]



<br>
<br>
<code> Inspect the transfer mesh script. When run, it pulls the 1G.dat file from every other host in the mesh.

In [25]:
cat /usr/local/bin/cron-gridftp-transfer-mesh.sh

#!/bin/bash
export GLOBUS_TCP_PORT_RANGE=50000,51000
export GLOBUS_TCP_SOURCE_RANGE=50000,51000
/usr/local/bin/timeout.sh -t 2000 globus-url-copy -vb -fast ftp://gigiot-01.conf.cenic.org:2811/data/1G.dat file:///data/1G.out
/usr/local/bin/timeout.sh -t 2000 globus-url-copy -vb -fast ftp://gigiot-02.conf.cenic.org:2811/data/1G.dat file:///data/1G.out
/usr/local/bin/timeout.sh -t 2000 globus-url-copy -vb -fast ftp://gigiot-03.conf.cenic.org:2811/data/1G.dat file:///data/1G.out
/usr/local/bin/timeout.sh -t 2000 globus-url-copy -vb -fast ftp://gigiot-04.conf.cenic.org:2811/data/1G.dat file:///data/1G.out
/usr/local/bin/timeout.sh -t 2000 globus-url-copy -vb -fast ftp://gigiot-05.conf.cenic.org:2811/data/1G.dat file:///data/1G.out
/usr/local/bin/timeout.sh -t 2000 globus-url-copy -vb -fast ftp://gigiot-06.conf.cenic.org:2811/data/1G.dat file:///data/1G.out
/usr/local/bin/timeout.sh -t 2000 globus-url-copy -vb -fast ftp://gigiot-07.conf.cenic.org:2811/data/1G.dat file:///data/1G.out
/usr/loc
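
Every line in the mesh script follows the same pattern, differing only in the remote host name. The generator used on fiona-10-01 is not shown in this notebook, so the following is only a sketch of how such lines could be produced, skipping the local host instead of commenting it out; the function name and the three-host list are illustrative:

```bash
# Print one globus-url-copy line per remote host, skipping the local host.
# Usage: mesh_lines <local_fqdn> <host>...
mesh_lines() {
    local self=$1 host
    shift
    for host in "$@"; do
        [ "$host" = "$self" ] && continue
        echo "/usr/local/bin/timeout.sh -t 2000 globus-url-copy -vb -fast ftp://${host}:2811/data/1G.dat file:///data/1G.out"
    done
}

# Example: the local host is gigiot-02, so only lines for 01 and 03 are printed.
mesh_lines gigiot-02.conf.cenic.org \
    gigiot-01.conf.cenic.org gigiot-02.conf.cenic.org gigiot-03.conf.cenic.org
```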

<br>
<br>
<code> Load the results from other hosts pulling data from your host <br>
This is run by cron but you can run by hand any time to push results to the central measurement archive

In [26]:
/usr/local/bin/cron-load-gridftp.sh

ts=2018-03-03 14:27:24,391 event=main.start id=1520116044 no last record found
ts=2018-03-03 14:27:24,391 event=main.process id=1520116044 No last record, processing /var/log/gridftp-transfer.log
ts=2018-03-03 14:27:24,391 event=scan_and_load_netlogger.end id=1520116044 Loaded 0 records
--2018-03-03 14:27:24--  http://fiona-10-01:8000/cron-gridftp-transfer-mesh.sh
Resolving fiona-10-01 (fiona-10-01)... 137.164.48.196
Connecting to fiona-10-01 (fiona-10-01)|137.164.48.196|:8000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3423 (3.3K) [application/x-sh]
Saving to: ‘/usr/local/bin/cron-gridftp-transfer-mesh.sh’


2018-03-03 14:27:24 (186 MB/s) - ‘/usr/local/bin/cron-gridftp-transfer-mesh.sh’ saved [3423/3423]



<br>
<br>
<code> Install basic network debugging tools (iperf3, bwctl, nuttcp)

In [27]:
yum clean all
yum -y install iperf3 bwctl nuttcp

Loaded plugins: fastestmirror
Cleaning repos: base epel extras globus perfsonar updates zfs
Cleaning up everything
Maybe you want: rm -rf /var/cache/yum, to also free up space taken by orphaned data from disabled or removed repos
Cleaning up list of fastest mirrors
Loaded plugins: fastestmirror
base                                                     | 3.6 kB     00:00     
epel                                                     | 4.7 kB     00:00     
extras                                                   | 3.4 kB     00:00     
globus                                                   | 2.9 kB     00:00     
perfsonar                                                | 2.9 kB     00:00     
updates                                                  | 3.4 kB     00:00     
zfs                                                      | 2.9 kB     00:00     
(1/10): base/group_gz                                      | 156 kB   00:00     
(2/10): epel/group_gz                                   

<br>
<br>
<code> Enable and start the bwctl-server systemd service

In [28]:
systemctl enable bwctl-server
systemctl start bwctl-server

<br>
<br>
<code> Add firewall rules for bwctld ports

In [29]:
firewall-cmd --permanent --zone=public --add-port=4823/tcp
firewall-cmd --permanent --zone=public --add-port=5001-5900/tcp
firewall-cmd --permanent --zone=public --add-port=5001-5900/udp
firewall-cmd --permanent --zone=public --add-port=6001-6200/tcp
firewall-cmd --permanent --zone=public --add-port=6001-6200/udp
firewall-cmd --reload

success
success
success
success
success
success


<br>
<br>
<code> Show current firewalld config

In [30]:
firewall-cmd --list-all

public (active)
  target: default
  icmp-block-inversion: no
  interfaces: enp3s0
  sources: 
  services: ssh dhcpv6-client ntp http https
  ports: 5001-5300/udp 5001-5300/tcp 5301-5600/udp 5301-5600/tcp 5601-5900/udp 5601-5900/tcp 6001-6200/udp 6001-6200/tcp 8760-9960/udp 8760-9960/tcp 8888/tcp 2811/tcp 7512/tcp 50000-51000/tcp 50000-51000/udp 2223/tcp 4823/tcp 5001-5900/tcp 5001-5900/udp
  protocols: 
  masquerade: no
  forward-ports: 
  source-ports: 
  icmp-blocks: 
  rich rules: 
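
The `ports:` line is long enough that a missing entry is easy to overlook. A small sketch that checks for exact port specs in `firewall-cmd --list-all` output; the function name is an assumption, and the sample is an abbreviated copy of the line shown above:

```bash
# Return 0 if the exact port spec (e.g. "2811/tcp") appears in the "ports:"
# line of `firewall-cmd --list-all` output read on stdin.
port_is_open() {
    local spec=$1
    awk -v spec="$spec" '
        $1 == "ports:" { for (i = 2; i <= NF; i++) if ($i == spec) found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Abbreviated sample of the "ports:" line shown above.
sample='  ports: 8888/tcp 2811/tcp 7512/tcp 50000-51000/tcp 50000-51000/udp 2223/tcp 4823/tcp'

for spec in 2811/tcp 50000-51000/tcp 4823/tcp; do
    if printf '%s\n' "$sample" | port_is_open "$spec"; then
        echo "open:    $spec"
    else
        echo "missing: $spec"
    fi
done
```

On a live host: `firewall-cmd --list-all | port_is_open 2811/tcp`. The match is on the exact spec, so a single port that falls inside a listed range is not detected.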
	


<br>
<br>
Download the FIONA-Tuning notebook for a later lab.

In [31]:
wget http://fiona-10-01:8000/FIONA-Tuning.ipynb

--2018-03-03 14:28:30--  http://fiona-10-01:8000/FIONA-Tuning.ipynb
Resolving fiona-10-01 (fiona-10-01)... 137.164.48.196
Connecting to fiona-10-01 (fiona-10-01)|137.164.48.196|:8000... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1429 (1.4K) [application/octet-stream]
Saving to: ‘FIONA-Tuning.ipynb’


2018-03-03 14:28:30 (117 MB/s) - ‘FIONA-Tuning.ipynb’ saved [1429/1429]



<br>
<br>
Install the perfSONAR local repo.

In [32]:
cat << 'EOF' > /etc/yum.repos.d/perfsonar.repo
[perfsonar]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://137.164.48.196/centos-7-x86_64/perfsonar
failovermethod=priority
enabled=1
EOF