# FFT with CuPy on GPU: Sequana X and B715

- https://github.com/cupy

# CuPy

## On Sdumont18 (Sequana X)

    ssh sdumont18    
    # 2x Xeon Gold 6152 (22 cores), total 44 cores (88 vcores), 754 G RAM, 4x Tesla V100
    module load sequana/current
    module load sdbase
    conda activate dir/env2
    conda activate --stack dir/env3
    # module load anaconda3/2018.12
    # conda activate /scratch${PWD#"/prj"}/env2
    export NUMBAPRO_NVVM=/usr/local/cuda-10.1/nvvm/lib64/libnvvm.so
    export NUMBAPRO_LIBDEVICE=/usr/local/cuda-10.1/nvvm/libdevice/

# Sequana X

In [27]:
import numpy as np, cupy as cp, time as tm
def f():
    t0 = -tm.time()    # <--- time measurement (start)
    L = M = N = 576
    # Build the 576^3 complex array on the host (NumPy)
    a = np.fromfunction( lambda i, j, k:
            np.sin ( i + j + k + 3 ), (N, M, L), dtype=np.complex128 )
    d = cp.asarray(a)           # copy the host array to the GPU
    fft = cp.fft.fftn(d)        # 3-D FFT on the GPU
    s = complex(cp.sum(fft))    # checksum; conversion to a Python complex waits for the GPU
    t0 += tm.time()    # <--- time measurement (stop)
    print(f"S:{s*1e-5:.0f}", end='')
    print(f" | T:{t0:.4f}")

In [28]:
f()

S:270-0j | T:19.7136


In [29]:
f()

S:270-0j | T:19.4425


In [30]:
f()

S:270-0j | T:19.7182
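
The single timer in `f()` spans three phases: building the array on the host with NumPy, copying it to the GPU, and the FFT itself. The figures of roughly 19.6 s are therefore not a pure GPU FFT time. The following is a minimal sketch of how the phases could be timed separately. The helper name `timed_phases` and the small default size are assumptions for illustration, and the code falls back to NumPy when no usable CuPy/CUDA device is found, so it can be tried on a machine without a GPU.

```python
import time

import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()  # raises if no usable CUDA device
    xp = cp
except Exception:  # assumption: fall back to NumPy so the sketch still runs
    cp, xp = None, np


def _sync():
    """Wait for pending GPU work so wall-clock timings are meaningful."""
    if cp is not None:
        cp.cuda.Device().synchronize()


def timed_phases(n=64):
    """Time host array creation, host-to-device copy, and the FFT separately."""
    times = {}

    t = time.perf_counter()
    a = np.fromfunction(
        lambda i, j, k: np.sin(i + j + k + 3), (n, n, n), dtype=np.complex128
    )
    times["host_build"] = time.perf_counter() - t

    t = time.perf_counter()
    d = xp.asarray(a)  # host-to-device copy when CuPy is in use
    _sync()
    times["to_device"] = time.perf_counter() - t

    t = time.perf_counter()
    fft = xp.fft.fftn(d)
    _sync()
    times["fftn"] = time.perf_counter() - t

    s = complex(xp.sum(fft))  # same checksum as in the notebook
    return s, times


s, times = timed_phases()
print(f"S:{s:.3f}")
for name, sec in times.items():
    print(f"{name:>10}: {sec:.4f} s")
```

The checksum can be verified by hand: the sum of all DFT coefficients equals the number of elements times the first input element, here N³·sin(3). For N = 576 that is about 2.697e7, which agrees with the `S:270` printed above after the 1e-5 scaling.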


# B715

Queue | Maximum wall-clock time (hours) | Minimum number of nodes (cores + devices) | Maximum number of nodes (cores + devices) | Maximum running jobs per user | Maximum queued jobs per user
- | - | - | - | - | -
nvidia_small | 1 | 1 (24+2) | 20 (480+40) | 4 | 24
nvidia_dev | 0:20 | 1 (24+2) | 4 (96+8) | 1 | 1
sequana_gpu | 96 | 1 (48+4) | 21 (1008+84) | 4 | 24

In [40]:
%%writefile gnb715.py
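import numpy as np, cupy as cp, time as tm

t0 = -tm.time()    # <--- time measurement (start)
L = M = N = 576
# Build the 576^3 complex array on the host (NumPy)
a = np.fromfunction( lambda i, j, k:
    np.sin ( i + j + k + 3 ), (N, M, L), dtype=np.complex128 )
f = cp.asarray(a)           # copy the host array to the GPU
fft = cp.fft.fftn(f)        # 3-D FFT on the GPU
s = complex(cp.sum(fft))    # checksum; conversion to a Python complex waits for the GPU
t0 += tm.time()    # <--- time measurement (stop)
print(f"S:{s*1e-5:.0f}", end='')
print(f" | T:{t0:.4f}")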

Overwriting gnb715.py


In [41]:
! python gnb715.py

S:270-0j | T:22.6863


In [42]:
! cp gnb715.py /scratch${PWD#/prj}
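
The shell expansion `${PWD#/prj}` removes a leading `/prj` from the current directory, so `/scratch${PWD#/prj}` maps a project path to its counterpart under `/scratch`. Below is a sketch of the same mapping in Python. The function name `scratch_path` and the example paths are hypothetical and serve only to illustrate the expansion.

```python
def scratch_path(cwd, prefix="/prj", scratch="/scratch"):
    """Mimic the shell expression /scratch${PWD#/prj}."""
    # ${PWD#/prj} strips the prefix only when the path starts with it;
    # otherwise the path is left unchanged.
    rest = cwd[len(prefix):] if cwd.startswith(prefix) else cwd
    return scratch + rest


# Hypothetical paths, for illustration only
print(scratch_path("/prj/myproject/user/fft"))
print(scratch_path("/home/user"))
```

The second call shows that a directory outside `/prj` is still prefixed with `/scratch`, so the copy command is only meaningful when it is run from inside the project tree.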

In [34]:
%%writefile gnb715.srm
#!/bin/bash
# 1,0 UA partitions:
#   cpu,       96 h,    21-50 nodes, 4/24  tasks
#   cpu_dev,   20 min., 1-4   nodes, 1/1   tasks
#   cpu_small, 72 h,    1-20  nodes, 16/96 tasks
#   nvidia_dev: 0:20, 1 (24+2), 4 (96+8), 1, 1
#SBATCH --partition nvidia_dev # Select partition
#SBATCH --ntasks=1             # Total tasks
#SBATCH --job-name gnb715      # Job name
#SBATCH --time=00:05:00        # Limit execution time
#SBATCH --exclusive            # Exclusive access to nodes

echo '========================================'
echo '- Job ID:' $SLURM_JOB_ID
echo '- Tasks per node:' $SLURM_NTASKS_PER_NODE
echo '- # of nodes in the job:' $SLURM_JOB_NUM_NODES
echo '- # of tasks:' $SLURM_NTASKS
echo '- Dir from which sbatch was invoked:' ${SLURM_SUBMIT_DIR##*/}
cd $SLURM_SUBMIT_DIR
echo -n '- List of nodes allocated to the job: '
nodeset -e $SLURM_JOB_NODELIST
                                              
# Environment
#cd
#cd /scratch${PWD#"/prj"}/
# module load anaconda3/2020.11
#source /scratch/app/anaconda3/2020.11/etc/profile.d/conda.sh
#conda activate ./env2
cd
dir=/scratch${PWD#"/prj"}
cd $dir
source $dir/env2/etc/profile.d/conda.sh
conda activate $dir/env2
conda activate --stack $dir/env3
cd $dir/fft
#module load sequana/current
#module load sdbase
export NUMBAPRO_NVVM=/usr/local/cuda-10.1/nvvm/lib64/libnvvm.so
export NUMBAPRO_LIBDEVICE=/usr/local/cuda-10.1/nvvm/libdevice/

# Executable
EXEC="python gnb715.py"

# Start
echo '$ srun  --mpi=pmi2  -n' $SLURM_NTASKS  ${EXEC##*/}
echo '-- output -----------------------------'
srun  --mpi=pmi2  -n $SLURM_NTASKS  $EXEC
echo '~~ end ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'

Overwriting gnb715.srm


In [43]:
! sbatch gnb715.srm

Submitted batch job 1493003


In [44]:
! squeue -n gnb715 -o "%.18i  %.9P  %.2t %.5M %.5D %.4C"

             JOBID  PARTITION  ST  TIME NODES CPUS
           1493003  nvidia_de   R  0:00     1   24


In [1]:
! squeue -n gnb715 -o "%.18i  %.9P  %.2t %.5M %.5D %.4C"

             JOBID  PARTITION  ST  TIME NODES CPUS


In [2]:
! cat /scratch${PWD#/prj}/slurm-1493003.out

- Job ID: 1493003
- Tasks per node:
- # of nodes in the job: 1
- # of tasks: 1
- Dir from which sbatch was invoked: fft
- List of nodes allocated to the job: sdumont3173
$ srun  --mpi=pmi2  -n 1 python gnb715.py
-- output -----------------------------
S:270-0j | T:39.5394
~~ end ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


In [3]:
! sbatch gnb715.srm

Submitted batch job 1493644


In [4]:
! squeue -n gnb715 -o "%.18i  %.9P  %.2t %.5M %.5D %.4C"

             JOBID  PARTITION  ST  TIME NODES CPUS
           1493644  nvidia_de   R  0:01     1   24


In [6]:
! squeue -n gnb715 -o "%.18i  %.9P  %.2t %.5M %.5D %.4C"

             JOBID  PARTITION  ST  TIME NODES CPUS


In [7]:
! cat /scratch${PWD#/prj}/slurm-1493644.out

- Job ID: 1493644
- Tasks per node:
- # of nodes in the job: 1
- # of tasks: 1
- Dir from which sbatch was invoked: fft
- List of nodes allocated to the job: sdumont3131
$ srun  --mpi=pmi2  -n 1 python gnb715.py
-- output -----------------------------
S:270-0j | T:40.6159
~~ end ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


In [8]:
! sbatch gnb715.srm

Submitted batch job 1493703


In [9]:
! squeue -n gnb715 -o "%.18i  %.9P  %.2t %.5M %.5D %.4C"

             JOBID  PARTITION  ST  TIME NODES CPUS


In [10]:
! cat /scratch${PWD#/prj}/slurm-1493703.out

- Job ID: 1493703
- Tasks per node:
- # of nodes in the job: 1
- # of tasks: 1
- Dir from which sbatch was invoked: fft
- List of nodes allocated to the job: sdumont3131
$ srun  --mpi=pmi2  -n 1 python gnb715.py
-- output -----------------------------
S:270-0j | T:34.3320
~~ end ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
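
The wall-clock times reported in this notebook can be summarized with a short script. The numbers are copied from the outputs above. The grouping labels are assumptions for illustration, and the ratios say nothing about why the batch runs are slower.

```python
from statistics import mean

# Wall-clock times (s) copied from the outputs in this notebook
timings = {
    "Sequana X, f() in notebook": [19.7136, 19.4425, 19.7182],
    "B715, direct python run": [22.6863],
    "B715, sbatch nvidia_dev": [39.5394, 40.6159, 34.3320],
}

# Use the Sequana X mean as the reference for the ratios
baseline = mean(timings["Sequana X, f() in notebook"])
for label, values in timings.items():
    m = mean(values)
    print(f"{label:<28} mean {m:7.4f} s  ({m / baseline:.2f}x)")
```

Because every measurement includes the host-side array construction and the one-time CUDA start-up cost, these means compare whole-script run times, not FFT throughput.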
