# Stencil Cython Serial

Checking directives and parameters:

### Compiler directives
- **boundscheck (True / False)**: If set to False, Cython is free to assume that indexing operations ([]-operator) in the code will not cause any IndexErrors to be raised. Lists, tuples, and strings are affected only if the index can be determined to be non-negative (or if wraparound is False). Conditions which would normally trigger an IndexError may instead cause segfaults or data corruption if this is set to False. Default is True.
- **wraparound (True / False)**: In Python, arrays and sequences can be indexed relative to the end. For example, A[-1] indexes the last value of a list. In C, negative indexing is not supported. If set to False, Cython is allowed to neither check for nor correctly handle negative indices, possibly causing segfaults or data corruption. If bounds checks are enabled (the default, see boundscheck above), negative indexing will usually raise an IndexError for indices that Cython evaluates itself. However, these cases can be difficult to recognise in user code and to distinguish from indexing or slicing that is evaluated by the underlying Python array or sequence object and thus continues to support wrap-around indices. It is therefore safest to apply this option only to code that does not process negative indices at all. Default is True.
- **cdivision (True / False)**: If set to False, Cython will adjust the remainder and quotient operators on C types to match those of Python ints (which differ when the operands have opposite signs) and raise a ZeroDivisionError when the right operand is 0. This has up to a 35% speed penalty. If set to True, no checks are performed. See CEP 516. Default is False.
- **initializedcheck (True / False)**: If set to True, Cython checks that a memoryview is initialized whenever its elements are accessed or assigned to. Setting this to False disables these checks. Default is True.
- **language_level (2/3/3str)**: Globally set the Python language level to be used for module compilation. Default is compatibility with Python 2. To enable Python 3 source code semantics, set this to 3 (or 3str) at the start of a module or pass the "-3" or "--3str" command line options to the compiler. The 3str option enables Python 3 semantics but does not change the str type and unprefixed string literals to unicode when the compiled code runs in Python 2.x. Note that cimported files inherit this setting from the module being compiled, unless they explicitly set their own language level. Included source files always inherit this setting.
- **infer_types (True / False)**: Infer types of untyped variables in function bodies. Default is None, indicating that only safe (semantically-unchanging) inferences are allowed. In particular, inferring integral types for variables used in arithmetic expressions is considered unsafe (due to possible overflow) and must be explicitly requested.

Source: https://cython.readthedocs.io/en/latest/src/userguide/source_files_and_compilation.html
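The `cdivision` entry above notes that the C and Python operators differ when the operands have opposite signs. A minimal pure-Python sketch of that difference follows; `c_divmod` is a hypothetical helper written for this illustration (it is not part of Cython) and emulates truncating C integer division for nonzero divisors.

```python
def c_divmod(a, b):
    """Emulate C-style truncating integer division (hypothetical helper).

    Assumes b != 0; with cdivision=True no zero check is performed in C either.
    """
    q = abs(a) // abs(b)          # magnitude of the quotient
    if (a < 0) != (b < 0):        # opposite signs: quotient is negative
        q = -q
    r = a - q * b                 # remainder takes the sign of the dividend
    return q, r


# Python semantics: quotient is floored, remainder takes the sign of the divisor.
print(divmod(-7, 2))    # (-4, 1)
# C semantics (what cdivision=True leaves in place for C integer types).
print(c_divmod(-7, 2))  # (-3, -1)
```

For non-negative operands, such as the grid indices in this notebook, both conventions give the same result.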

In [2]:
%reload_ext Cython

In [5]:
%%cython?

Docstring:
::

  %cython [-a] [-+] [-3] [-2] [-f] [-c COMPILE_ARGS]
              [--link-args LINK_ARGS] [-l LIB] [-n NAME] [-L dir] [-I INCLUDE]
              [-S SRC] [--pgo] [--verbose]

Compile and import everything from a Cython code cell.

The contents of the cell are written to a `.pyx` file in the
directory `IPYTHONDIR/cython` using a filename with the hash of the
code. This file is then cythonized and compiled. The resulting module
is imported and all of its symbols are injected into the user's
namespace. The usage is similar to that of `%%cython_pyximport` but
you don't have to pass a module name::

    %%cython
    def f(x):
        return 2.0*x

To compile OpenMP codes, pass the required  `--compile-args`
and `--link-args`.  For example with gcc::

    %%cython --compile-args=-fopenmp --link-args=-fopenmp
    ...

To enable profile guided optimisation, pass the ``--pgo`` option.
Note that the cell itself needs to take care of establishing a suitable
profile when

Write source file to disk:

In [6]:
%%writefile scs.pyx
#cython: boundscheck=False, wraparound=False, cdivision=True
#cython: initializedcheck=False, language_level=3, infer_types=True

cpdef st(int n, double energy, int niters):
    from time import time
    import numpy as np

    # variable definitions
    cdef double      heat      = 0.0
    cdef double      t         = 0.0
    cdef Py_ssize_t  size      = n + 2
    cdef Py_ssize_t  sizeStart = 1
    cdef Py_ssize_t  sizeEnd   = n + 1
    cdef Py_ssize_t  iters, i, j

    t = time()
    
    # create the zero-initialized matrices and wrap them in memoryviews
    cdef double[:,::1] mvaold = np.zeros((size, size), np.double)
    cdef double[:,::1] mvanew = np.zeros((size, size), np.double)
    cdef Py_ssize_t    nsources  = 3      # number of sources
    cdef    int[:,::1] mvsources = np.empty( (nsources,2), np.intc)

    # initialize the 3 heat sources
    mvsources[0,0] = mvsources[0,1] = n/2
    mvsources[1,0] = mvsources[1,1] = n/3
    mvsources[2,0] = n*4/5
    mvsources[2,1] = n*8/9

    niters = (niters + 1) // 2
    for iters in range(niters) :
        # odd iteration
        for i in range(sizeStart, sizeEnd) :
            for j in range(sizeStart, sizeEnd) :
                mvanew[i,j] = ( mvaold[i,j] / 2.0 +
                              ( mvaold[i-1,j] + mvaold[i+1,j] +
                                mvaold[i,j-1] + mvaold[i,j+1] ) / 8.0 )
        for i in range(nsources) :
            mvanew[mvsources[i,0], mvsources[i,1]] += energy
        # even iteration
        for i in range(sizeStart, sizeEnd) :
            for j in range(sizeStart, sizeEnd) :
                mvaold[i,j] = ( mvanew[i,j] / 2.0 +
                              ( mvanew[i-1,j] + mvanew[i+1,j] +
                                mvanew[i,j-1] + mvanew[i,j+1] ) / 8.0 )
        for i in range(nsources) :
            mvaold[mvsources[i,0], mvsources[i,1]] += energy
    # compute the total energy
    for i in range(sizeStart, sizeEnd) :
        for j in range(sizeStart, sizeEnd) :
            heat += mvaold[i,j]
    t = time() - t
#    print("Heat = %0.4f | Tempo = %0.4f" %(heat, t))
    return heat, t

Overwriting scs.pyx
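To sanity-check the compiled kernel on a small grid, a plain-Python reference can help. The sketch below mirrors the update rule and source placement of `st` using nested lists; `stencil_reference` is a hypothetical name introduced here, and the integer source positions are assumed to be the truncated values of `n/2`, `n/3`, `n*4/5` and `n*8/9`. It is meant only for small `n`, since it is far slower than the Cython version.

```python
def stencil_reference(n, energy, niters):
    """Pure-Python reference for the 5-point stencil (hypothetical helper)."""
    size = n + 2                                    # interior plus a boundary ring
    old = [[0.0] * size for _ in range(size)]
    new = [[0.0] * size for _ in range(size)]
    # Assumption: source indices are the truncated integer quotients.
    sources = [(n // 2, n // 2), (n // 3, n // 3), (n * 4 // 5, n * 8 // 9)]

    def step(src, dst):
        # Update every interior cell from src, then inject energy at the sources.
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                dst[i][j] = (src[i][j] / 2.0 +
                             (src[i - 1][j] + src[i + 1][j] +
                              src[i][j - 1] + src[i][j + 1]) / 8.0)
        for i, j in sources:
            dst[i][j] += energy

    # Same loop structure as st(): each pass performs an odd and an even step.
    for _ in range((niters + 1) // 2):
        step(old, new)
        step(new, old)

    # Total heat over the interior cells.
    return sum(old[i][j] for i in range(1, n + 1) for j in range(1, n + 1))


heat = stencil_reference(100, 1.0, 10)
print("Heat = %0.4f" % heat)
```

With `n = 100`, `energy = 1.0` and `niters = 10`, no heat reaches the boundary within ten steps, so the total should equal the injected energy (3 sources × 10 steps), matching the `100,1,10→30` note in the driver script.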


In [7]:
%%writefile setup.py
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize("scs.pyx", force=True)
)

Overwriting setup.py


Python driver script that calls the Cython module:

In [9]:
%%writefile st-cy-seq.py
from time import time
tp = time()
import scs

n            = 4800    # n x n grid; expected heat for (n, energy, niters): (4800, 1, 500) → 1500; (100, 1, 10) → 30 [4800]
energy       = 1.0     # energy to be injected per iteration [1.0]
niters       = 500     # number of iterations [500]

heat, t = scs.st(n, energy, niters)
tp = time() - tp
print("Heat = %0.4f | Tempo = %0.4f | TempoPyt = %0.4f" %(heat, t, tp))

Overwriting st-cy-seq.py


Build (GCC):

In [11]:
%%bash
rm scs.*.so  # clean
python setup.py build_ext --inplace

[1/1] Cythonizing scs.pyx
running build_ext
building 'scs' extension
gcc -pthread -B /scratch/app/anaconda3/2018.12/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/scratch/app/anaconda3/2018.12/include/python3.7m -c scs.c -o build/temp.linux-x86_64-3.7/scs.o
gcc -pthread -shared -B /scratch/app/anaconda3/2018.12/compiler_compat -L/scratch/app/anaconda3/2018.12/lib -Wl,-rpath=/scratch/app/anaconda3/2018.12/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/scs.o -o build/lib.linux-x86_64-3.7/scs.cpython-37m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.7/scs.cpython-37m-x86_64-linux-gnu.so -> 


In [15]:
%%writefile scriptshell.sh
#!/bin/sh
time python st-cy-seq.py

Overwriting scriptshell.sh


Copy the files to be executed to the scratch area:

In [17]:
%%bash
s='/prj/ampemi/xxxx.xxxx/stnc/Cython'
d='/scratch/ampemi/xxxx.xxxx/stnc/Cython'
rm $d/scs.*.so
cp  $s/scs.*.so  $s/st-cy-seq.py  $s/scriptshell.sh  $d

In [19]:
%%writefile st-cy-seq.srm
#!/bin/bash
#SBATCH -p cpu_small           # Select partition
#SBATCH --ntasks=1             # Total tasks(CPUs)
#SBATCH --nodes=1              # Number of nodes
#SBATCH --ntasks-per-node=1    # Number of tasks per node
#SBATCH -J stcyseq             # Job name
#SBATCH --time=00:02:00        # Limit execution time

echo '========================================'
echo '- Stencil Cython Serial'
echo '- Job ID:' $SLURM_JOB_ID
echo '- Tasks per node:' $SLURM_NTASKS_PER_NODE
echo '- Number of nodes:' $SLURM_JOB_NUM_NODES
echo '- Total tasks:' $SLURM_NTASKS
echo '- Nodes alocated:' $SLURM_JOB_NODELIST
echo '- Directory where sbatch was called ($SLURM_SUBMIT_DIR):'
echo $SLURM_SUBMIT_DIR
cd $SLURM_SUBMIT_DIR

# Working dir
cd /scratch/ampemi/xxxx.xxxx/stnc/Cython

# Executable
EXEC='sh scriptshell.sh'

# Run
echo '-- srun -------------------------------'
echo '$ time srun -n ' $SLURM_NTASKS $EXEC
srun -n $SLURM_NTASKS $EXEC
echo '-- END --------------------------------'

Overwriting st-cy-seq.srm


<hr style="height:10px;border-width:0;background-color:green">

In [20]:
%%bash
sbatch st-cy-seq.srm
sbatch st-cy-seq.srm
sbatch st-cy-seq.srm
squeue -n stcyseq

Submitted batch job 772612
Submitted batch job 772613
Submitted batch job 772614
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            772612 cpu_small  stcyseq xxxx. PD       0:00      1 (Priority)
            772613 cpu_small  stcyseq xxxx. PD       0:00      1 (Priority)
            772614 cpu_small  stcyseq xxxx. PD       0:00      1 (Priority)


In [1]:
%%bash
b='/stnc/Cython'
d='/scratch/ampemi/xxxx.xxxx'$b
cat $d/slurm-772612.out
cat $d/slurm-772613.out
cat $d/slurm-772614.out

- Stencil Cython Serial
- Job ID: 772612
- Tasks per node: 1
- Number of nodes: 1
- Total tasks: 1
- Nodes alocated: sdumont1407
- Directory where sbatch was called ($SLURM_SUBMIT_DIR):
/prj/ampemi/xxxx.xxxx/stnc/Cython
-- srun -------------------------------
$ time srun -n  1 sh scriptshell.sh
Heat = 1500.0000 | Tempo = 23.9847 | TempoPyt = 29.4011

real	0m29.715s
user	0m24.100s
sys	0m0.335s
-- END --------------------------------
- Stencil Cython Serial
- Job ID: 772613
- Tasks per node: 1
- Number of nodes: 1
- Total tasks: 1
- Nodes alocated: sdumont1407
- Directory where sbatch was called ($SLURM_SUBMIT_DIR):
/prj/ampemi/xxxx.xxxx/stnc/Cython
-- srun -------------------------------
$ time srun -n  1 sh scriptshell.sh
Heat = 1500.0000 | Tempo = 23.9444 | TempoPyt = 24.6716

real	0m24.811s
user	0m24.077s
sys	0m0.255s
-- END --------------------------------
- Stencil Cython Serial
- Job ID: 772614
- Tasks per node: 1
- Number of nodes: 1
- Total tasks: 1
- Nodes alocated: sdumont1407
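
When several jobs are submitted, the result lines can be collected and averaged instead of being read by eye. The sketch below is one possible way to do that; the helper names (`parse_results`, `mean`) are hypothetical, and the two input lines are copied from the complete job logs above (the third log is truncated, so it is left out).

```python
import re

# Result lines copied from the two complete job logs (772612 and 772613).
LOG_LINES = [
    "Heat = 1500.0000 | Tempo = 23.9847 | TempoPyt = 29.4011",
    "Heat = 1500.0000 | Tempo = 23.9444 | TempoPyt = 24.6716",
]

# Matches the format printed by st-cy-seq.py.
PATTERN = re.compile(
    r"Heat = ([\d.]+) \| Tempo = ([\d.]+) \| TempoPyt = ([\d.]+)"
)


def parse_results(lines):
    """Return a list of (heat, kernel_time, total_time) tuples."""
    rows = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            rows.append(tuple(float(g) for g in match.groups()))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rows = parse_results(LOG_LINES)
kernel_mean = mean(r[1] for r in rows)   # time measured inside st()
total_mean = mean(r[2] for r in rows)    # time including the module import
print("runs = %d | mean Tempo = %.3f | mean TempoPyt = %.3f"
      % (len(rows), kernel_mean, total_mean))
```

In practice the lines would be read from the `slurm-*.out` files rather than hard-coded.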

<hr style="height:10px;border-width:0;background-color:red">

In [2]:
! cython --version

Cython version 0.29.20


In [2]:
! ifort --version

ifort (IFORT) 19.0.3.199 20190206
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.



In [3]:
! icc --version

icc (ICC) 19.0.3.199 20190206
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

