Merge pull request #18 from hyoklee/main
Use CTest for multi-node testing. (#17)
Showing 34 changed files with 606 additions and 29 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
set(CTEST_PROJECT_NAME "fabtsuite") | ||
set(CTEST_NIGHTLY_START_TIME "00:00:00 CST") | ||
set(SLURM FALSE) | ||
set(PBS FALSE) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
# Developer's Guide | ||
|
||
## Naming Conventions | ||
|
||
There are 6 abbreviations (a.) for testing features: | ||
|
||
| Feature | a. | | ||
|----------------|----| | ||
| FI_WAIT_FD | w | | ||
| fi_cancel() | c | | ||
| cross-job-comm | x | | ||
| multi-thread | t | | ||
| vectored-IO | v | | ||
| MPI Interop. | m | | ||
|
||
All multi-node scripts start with `fabt` and have a file extension such as `.sh`. | ||
|
||
## Debugging with hlog | ||
|
||
|
||
## Single-Node Test | ||
|
||
[test/test.sh](../test/test.sh) checks whether the programs run correctly | ||
on the local host. | ||
|
||
## Multi-Node Test | ||
|
||
The programs require shell scripting because they do not generate time. | ||
`nohup` is necessary. | ||
|
||
## Adding a New CTest | ||
|
||
### Local | ||
1. Write a script that runs `fabtget` and `fabtput`. | ||
2. Add the script to `transfer/CMakeTests.cmake`. | ||
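
A minimal sketch of what such a local script could look like is shown here. The function name `run_pair`, the output file names, and the `STARTUP_DELAY` variable are hypothetical, and the arguments that `fabtget` and `fabtput` need are not covered by this guide, so both commands are passed in by the caller.

```shell
#!/bin/bash
# Hypothetical local test wrapper: start the receiver in the background,
# run the sender in the foreground, then check both exit statuses.
run_pair() {
    local get_cmd="$1" put_cmd="$2"

    # The commands are expanded unquoted on purpose so that a string such as
    # "fabtget <args>" is split into a program name and its arguments.
    $get_cmd > get.out 2> get.err < /dev/null &
    local get_pid=$!

    # Give the receiver a moment to start listening (assumed delay).
    sleep "${STARTUP_DELAY:-1}"

    $put_cmd > put.out 2> put.err
    local put_status=$?

    wait "$get_pid"
    local get_status=$?

    # Succeed only if both programs exited with status 0.
    [ "$put_status" -eq 0 ] && [ "$get_status" -eq 0 ]
}

# Example (fill in the arguments your build requires):
# run_pair "fabtget" "fabtput"
```

The script's exit status is what CTest sees, so returning non-zero when either program fails keeps the test result meaningful.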
|
||
### Multi-node | ||
1. Write a job script that runs `fabtget` and `fabtput` on different nodes. | ||
2. Add the script to either `transfer/CMakeTests_s.cmake` or | ||
`transfer/CMakeTests_p.cmake`, depending on whether it is a SLURM or PBS job. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# FAQ | ||
|
||
* GitHub Action fails with `Error: Process completed with exit code 145.` Why? | ||
|
||
We don't know the reason yet. However, you can try to run the failed job | ||
again and it will pass eventually. | ||
|
||
* I installed fabtsuite using Spack but I get the `available libfabric version | ||
< 1.13` error when I run programs. | ||
|
||
Please try updating `LD_LIBRARY_PATH` and `PATH` as follows. | ||
``` | ||
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH | ||
export PATH=$PREFIX/bin:$PATH | ||
``` | ||
`PREFIX` is the directory where Spack installed the libfabric and fabtsuite packages. | ||
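
As a hedged sketch, the two exports can be wrapped in a small helper. The function name `use_prefix` is hypothetical, and the commented `spack location -i libfabric` call assumes that libfabric and fabtsuite share a single prefix, which may not hold for your installation.

```shell
#!/bin/bash
# Hypothetical helper: prepend an install prefix to the library and
# executable search paths.
use_prefix() {
    local prefix="$1"
    # ${VAR:+:$VAR} avoids a trailing ':' when LD_LIBRARY_PATH is unset or empty.
    export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH="$prefix/bin:$PATH"
}

# Example (assumes the `spack` command is on PATH):
# use_prefix "$(spack location -i libfabric)"
```

If the two packages live in separate prefixes, calling the helper once per prefix should have the same effect.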
|
||
* What is the default timeout value for CTest? | ||
|
||
It is 1500 seconds (= 25 minutes). | ||
If a test fails due to a timeout, you'll get output like the following: | ||
|
||
``` | ||
4/8 Test #4: fi_cancel ........................ Passed 554.06 sec | ||
Start 5: cross-job-comm | ||
5/8 Test #5: cross-job-comm ...................***Timeout 1500.12 sec | ||
Start 6: multi-thread | ||
6/8 Test #6: multi-thread .....................***Timeout 1500.10 sec | ||
Start 7: vectored-IO | ||
``` | ||
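
To list only the tests that timed out, the console output can be filtered. This is a minimal sketch that assumes the line layout shown in the output above, where the test name is the fourth whitespace-separated field; the function name `timed_out_tests` is hypothetical.

```shell
#!/bin/bash
# Print the name of every test that CTest reported as ***Timeout.
# Reads CTest console output on standard input.
timed_out_tests() {
    awk '/\*\*\*Timeout/ { print $4 }'
}

# Example:
# ctest 2>&1 | timed_out_tests
```

If the tests legitimately need more time, `ctest --timeout <seconds>` raises the default limit for tests that do not set their own `TIMEOUT` property.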
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
This directory has files for CTest. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
#!/bin/bash | ||
## | ||
## Usage: qsub cancel.qsub | ||
## Author: Hyokyung Lee (hyoklee@hdfgroup.org) | ||
## Last Update: 2022-09-14 | ||
## | ||
#PBS -l select=2:system=polaris | ||
#PBS -l place=scatter | ||
#PBS -l walltime=10:00 | ||
#PBS -q debug | ||
#PBS -A CSC250STDM12 | ||
|
||
# Set the libfabric library location. | ||
PREFIX=/lus/grand/projects/radix-io | ||
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH | ||
|
||
# Set the current working directory. | ||
WORKDIR=$PBS_O_WORKDIR | ||
|
||
# Get all node names first. | ||
mpiexec -n 1 -ppn 1 cat $PBS_NODEFILE >& $WORKDIR/nodes.txt | ||
|
||
# Run 1 server and (select - 1) client(s). | ||
# The debug queue has only 2 nodes. | ||
# Therefore, this script will run only 1 client. | ||
# The first node in nodes.txt will be the server. | ||
# The rest will be clients. | ||
j=0 | ||
for i in `cat $WORKDIR/nodes.txt`; do | ||
if [[ "$j" -gt 0 ]]; then | ||
mpiexec -host $i -n 1 -ppn 1 $WORKDIR/tput.sh -c | ||
else | ||
mpiexec -host $i -n 1 -ppn 1 nohup $WORKDIR/tget.sh -c > fabtget.out 2> fabtget.err < /dev/null & | ||
fi | ||
((j++)) | ||
done | ||
echo $? |
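
The server/client split that the loop performs with a counter can also be sketched with two small helpers. The function names `server_node` and `client_nodes` are hypothetical; the only assumption carried over from the script is that the node file lists one host name per line, with the server first.

```shell
#!/bin/bash
# Hypothetical helpers: the first line of the node file is the server,
# every remaining line is a client.
server_node() {
    head -n 1 "$1"
}

client_nodes() {
    tail -n +2 "$1"
}

# Example:
# server=$(server_node "$WORKDIR/nodes.txt")
# client_nodes "$WORKDIR/nodes.txt" | while read -r host; do
#     echo "client on $host, server on $server"
# done
```

Reading the node file line by line avoids relying on word splitting of a backquoted `cat`.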
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
#!/bin/bash | ||
## | ||
## Usage: sbatch cancel.slurm | ||
## Author: Hyokyung Lee (hyoklee@hdfgroup.org) | ||
## Last Update: 2022-09-14 | ||
## | ||
#SBATCH -A CSC332_crusher | ||
#SBATCH -J cancel | ||
#SBATCH -o %x-%j.out | ||
#SBATCH -t 00:00:20 | ||
#SBATCH -N 2 | ||
srun -N1 -n1 ./tget.sh -c & | ||
srun -N1 -n1 ./tput.sh -c & | ||
sleep 20 | ||
|
||
a=$(grep Result cancel-*.out | wc -l) | ||
if [ "$a" -eq "0" ]; then | ||
exit 1 | ||
fi | ||
|
||
b=$(grep error cancel-*.out | wc -l) | ||
exit $b | ||
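
Because a shell exit status is taken modulo 256, a raw line count used as the exit status can wrap around to 0. The sketch below is a hypothetical variant of the check that reduces the outcome to pass or fail; the function name `check_output` is not part of this commit.

```shell
#!/bin/bash
# Hypothetical variant of the result check: succeed only if the given
# output files contain at least one "Result" line and no "error" line.
check_output() {
    local results errors
    # Concatenate the files first so that grep -c prints a single total.
    results=$(cat -- "$@" 2>/dev/null | grep -c Result)
    errors=$(cat -- "$@" 2>/dev/null | grep -c error)
    [ "$results" -gt 0 ] && [ "$errors" -eq 0 ]
}

# Example:
# check_output cancel-*.out || exit 1
```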
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
#!/bin/bash | ||
## | ||
## Usage: qsub cross.qsub | ||
## Author: Hyokyung Lee (hyoklee@hdfgroup.org) | ||
## Last Update: 2022-09-19 | ||
## | ||
#PBS -l select=3:system=polaris | ||
#PBS -l place=scatter | ||
#PBS -l walltime=10:00 | ||
#PBS -q debug-scaling | ||
#PBS -A CSC250STDM12 | ||
|
||
# Set the libfabric library location. | ||
PREFIX=/lus/grand/projects/radix-io | ||
export LD_LIBRARY_PATH=$PREFIX/lib:$LD_LIBRARY_PATH | ||
|
||
# Set the current working directory. | ||
WORKDIR=$PBS_O_WORKDIR | ||
|
||
# Get all node names first. | ||
mpiexec -n 1 -ppn 1 cat $PBS_NODEFILE >& $WORKDIR/nodes.txt | ||
|
||
# Run 1 server and (select - 1) client(s). | ||
# This job selects 3 nodes from the debug-scaling queue. | ||
# Therefore, this script will run 2 clients. | ||
# The first node in nodes.txt will be the server. | ||
# The rest will be clients. | ||
j=0 | ||
for i in `cat $WORKDIR/nodes.txt`; do | ||
if [[ "$j" -gt 0 ]]; then | ||
mpiexec -host $i -n 1 -ppn 1 $WORKDIR/tput.sh -n 4 -k 2 > $WORKDIR/cross_p_$j.out 2> $WORKDIR/cross_p_$j.err | ||
else | ||
mpiexec -host $i -n 1 -ppn 1 nohup $WORKDIR/tget.sh -n 4 > $WORKDIR/cross_g.out 2> $WORKDIR/cross_g.err < /dev/null & | ||
fi | ||
((j++)) | ||
done | ||
echo $? |