# PARF - SDumont

PARF is a parallel, MPI-enabled implementation of the Random Forest (RF) algorithm, written in Fortran. It runs from the command line (CLI) and also provides linkage with gnuplot.

Source: https://www.irb.hr/eng/Scientific-Support-Centres/Centre-for-Informatics-and-Computing/Projects2/IT-projects/PARF

Last revision: 2021-05-11

Compiling and running on SDumont using Intel Fortran, with the `intel_psxe/2020` module already loaded (`module load intel_psxe/2020`).

In [6]:
%%bash
ifort --version
mpiifort --version
mpirun --version
icc --version

ifort (IFORT) 19.1.2.254 20200623
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.

ifort (IFORT) 19.1.2.254 20200623
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.

Intel(R) MPI Library for Linux* OS, Version 2019 Update 8 Build 20200624 (id: 4f16ad915)
Copyright 2003-2020, Intel Corporation.
icc (ICC) 19.1.2.254 20200623
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.



## Get PARF

In [1]:
! wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/parf/parf_2008-09-30.tgz

--2021-05-10 21:40:33--  https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.com/parf/parf_2008-09-30.tgz
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.30.16, 142.250.79.48, 142.250.79.240, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.30.16|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 49983 (49K) [application/octet-stream]
Saving to: ‘parf_2008-09-30.tgz’


2021-05-10 21:40:33 (3.17 MB/s) - ‘parf_2008-09-30.tgz’ saved [49983/49983]



## Unpacking

In [2]:
! tar zxvf parf_2008-09-30.tgz

parf/Makefile
parf/bitvectors.f90
parf/bootstraps.f90
parf/forests.f90
parf/graphics.f90
parf/importances.f90
parf/instancesets.f90
parf/main.f90
parf/options.f90
parf/prototypes.f90
parf/trees.f90
parf/utilities.f90
parf/support.c
parf/farg/
parf/farg/farg.f
parf/parallel/
parf/parallel/none.f90
parf/parallel/mpi.f90
parf/merge.pl
parf/splitrows.pl
parf/splitset.pl
parf/LICENSE


## Configure Makefile

In [3]:
%%writefile parf/Makefile
##### Configuration section

### Choose a Fortran 90 compiler and options
# FC = /opt/intel_fc_80/bin/ifort
# FFLAGS = -g -pg -CB -traceback --static
FC = ifort
FFLAGS = -O3

### Choose a C compiler and options
# CC = cc
# CFLAGS = -Wall -g -pg --static
CC = icc
CFLAGS = -O3

### Choose parallelisation library, comment for no parallelisation
PAR = mpi

### For MPI: the MPI Fortran compilation command
# MPIFC = mpif90
MPIFC = mpiifort

##### End of configuration section
# 
# No changes should be necessary below this line
# ---------------------------------------------------------------------

PAR ?= none
ifeq (${PAR},mpi)
	FC = ${MPIFC}
endif
MODSOURCES=trees.f90 bitvectors.f90 instancesets.f90 options.f90 \
	utilities.f90 bootstraps.f90 forests.f90 importances.f90 \
	prototypes.f90 graphics.f90
CSOURCES=support.c
COBJECTS=${CSOURCES:.c=.o}
MODOBJECTS=${MODSOURCES:.f90=.o}
ADDOBJECTS=${ADDSOURCES:.f=.o}
PROJECT=parf
DIR=$(notdir ${PWD})

${PROJECT}: main.f90 parallel.o ${MODOBJECTS} ${ADDOBJECTS} ${COBJECTS}
	${FC} ${FFLAGS} -o ${PROJECT} $+

parallel.o: parallel/${PAR}.f90
	${FC} ${FFLAGS} -c -o parallel.o $<

%.o: %.f90
	${FC} ${FFLAGS} -c $<

%.o: %.c
	${CC} ${CFLAGS} -c $<

main.o: Makefile options.o instancesets.o utilities.o forests.o \
	importances.o prototypes.o parallel.o
forests.o: Makefile trees.o instancesets.o bootstraps.o bitvectors.o \
	importances.o prototypes.o
trees.o: Makefile bitvectors.o instancesets.o bootstraps.o utilities.o
instancesets.o: Makefile utilities.o bitvectors.o \
	options.o parallel.o support.o
importances.o: Makefile instancesets.o graphics.o
bitvectors.o: Makefile utilities.o
utilities.o: Makefile support.o
options.o: Makefile support.o utilities.o parallel.o
#compatibility.o: Makefile
parallel.o: Makefile
bootstraps.o: Makefile instancesets.o utilities.o
prototypes.o: Makefile instancesets.o utilities.o options.o
graphics.o: Makefile utilities.o options.o
support.o: Makefile

clean:
	rm -f *.mod *.o ${PROJECT} gmon.out

#dist:
#	rm -f ${PROJECT}.tgz
#	cd .. && \
#		tar zcf ${DIR}/${PROJECT}.tgz ${DIR}/Makefile \
#		${DIR}/*.f90 ${DIR}/*.c ${DIR}/farg ${DIR}/parallel \
#		${DIR}/*.pl ${DIR}/LICENSE

.PHONY: clean dist

Overwriting parf/Makefile


## Compiling

In [8]:
%%bash
cd parf
make

mpiifort -O3 -c -o parallel.o parallel/mpi.f90
icc -O3 -c support.c
mpiifort -O3 -c utilities.f90
mpiifort -O3 -c bitvectors.f90
mpiifort -O3 -c options.f90
mpiifort -O3 -c instancesets.f90
mpiifort -O3 -c bootstraps.f90
mpiifort -O3 -c trees.f90
mpiifort -O3 -c graphics.f90
mpiifort -O3 -c importances.f90
mpiifort -O3 -c prototypes.f90
mpiifort -O3 -c forests.f90
mpiifort -O3 -o parf main.f90 parallel.o trees.o bitvectors.o instancesets.o options.o utilities.o bootstraps.o forests.o importances.o prototypes.o graphics.o support.o


forests.f90(994): remark #8291: Recommended relationship between field width 'W' and the number of fractional digits 'D' in this edit descriptor is 'W>=D+7'.
      WRITE(handle, '(E10.4, 1X, E10.4, 1X, A30)'), med, dev, &
-----------------------^
forests.f90(994): remark #8291: Recommended relationship between field width 'W' and the number of fractional digits 'D' in this edit descriptor is 'W>=D+7'.
      WRITE(handle, '(E10.4, 1X, E10.4, 1X, A30)'), med, dev, &
----------------------------------^


In [9]:
! ldd parf/parf

	linux-vdso.so.1 =>  (0x00007ffc5f9a9000)
	libmpifort.so.12 => /opt/intel/parallel_studio_xe_2020/compilers_and_libraries_2020.2.254/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007fec3704b000)
	libmpi.so.12 => /opt/intel/parallel_studio_xe_2020/compilers_and_libraries_2020.2.254/linux/mpi/intel64/lib/release/libmpi.so.12 (0x00007fec35e2f000)
	libdl.so.2 => /usr/lib64/libdl.so.2 (0x00007fec35c2b000)
	librt.so.1 => /usr/lib64/librt.so.1 (0x00007fec35a23000)
	libpthread.so.0 => /usr/lib64/libpthread.so.0 (0x00007fec35807000)
	libm.so.6 => /usr/lib64/libm.so.6 (0x00007fec35505000)
	libc.so.6 => /usr/lib64/libc.so.6 (0x00007fec35138000)
	libgcc_s.so.1 => /usr/lib64/libgcc_s.so.1 (0x00007fec34f22000)
	libfabric.so.1 => /opt/intel/parallel_studio_xe_2020/compilers_and_libraries_2020.2.254/linux/mpi/intel64/libfabric/lib/libfabric.so.1 (0x00007fec34ce0000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fec3740a000)
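
The `ldd` listing above shows the binary dynamically linked against Intel MPI (`libmpifort.so.12`, `libmpi.so.12`). The same check can be scripted, for example to confirm that a rebuilt binary picked up the MPI libraries. This is a minimal sketch: `parse_ldd` is a hypothetical helper name, and the sample text is a few lines copied from the listing above rather than a live call to `ldd`.

```python
def parse_ldd(output):
    """Map each library name in `ldd` output to its resolved path (or None)."""
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            # Drop the trailing "(0x...)" load address, keep the path if any.
            path = rest.strip().rsplit("(", 1)[0].strip()
            libs[name.strip()] = path or None
        else:
            # Lines such as the dynamic loader have no "=>" part.
            libs[line.rsplit("(", 1)[0].strip()] = None
    return libs


sample = """
    linux-vdso.so.1 =>  (0x00007ffc5f9a9000)
    libmpifort.so.12 => /opt/intel/parallel_studio_xe_2020/compilers_and_libraries_2020.2.254/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007fec3704b000)
    libmpi.so.12 => /opt/intel/parallel_studio_xe_2020/compilers_and_libraries_2020.2.254/linux/mpi/intel64/lib/release/libmpi.so.12 (0x00007fec35e2f000)
    libm.so.6 => /usr/lib64/libm.so.6 (0x00007fec35505000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fec3740a000)
"""

libs = parse_ldd(sample)
mpi_libs = [name for name in libs if name.startswith("libmpi")]
print(mpi_libs)
```

On the build node, the sample string could be replaced by the text returned from `subprocess.run(["ldd", "parf/parf"], capture_output=True, text=True).stdout`.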


## Check run

In [11]:
! parf/parf

PARF (C) 2005 Rudjer Boskovic Institute
Goran Topic, Tomislav Smuc; algorithm by Leo Breiman and Adele Cutler
Licensed under GNU GPL 2.0
 
Usage: rf [OPTION...]
-h | --help   show this message
-t file       file to use as training set
-a file       file to analyse and classify
-tv [file]    training set votes output file
-tc [file]    training set confusion matrix output file
-av [file]    test set votes output file
-ac [file]    test set confusion matrix output file
-ar [file]    test set classification results output file
-aa [file]    test set ARFF output file
-ta [file]    train + test set ARFF output file
-c class      the class attribute, or NEW, or LAST (default)
-cq [n[%]]    quantity of generated class instances (only with -c NEW)
-cp category  positive category
-n trees      the number of trees to grow
-f n          the fill method: 0=none, 1=rough, 2+=# of passes
-v n          redo the forest with n most important variables
-vs n         redo the forest with variables more s

## Get example dataset

In [14]:
%%bash
mkdir -p datasets
cd datasets
wget https://raw.githubusercontent.com/efurlanm/ml/master/datasets/glass.arff

--2021-05-10 21:56:23--  https://raw.githubusercontent.com/efurlanm/ml/master/datasets/glass.arff
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17850 (17K) [text/plain]
Saving to: ‘glass.arff’

     0K .......... .......                                    100% 66.5M=0s

2021-05-10 21:56:23 (66.5 MB/s) - ‘glass.arff’ saved [17850/17850]



In [17]:
! ls datasets/glass.arff

datasets/glass.arff
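
Before training, the dataset can be inspected independently of PARF by counting the attribute declarations and data rows in the ARFF file. This is a minimal sketch, assuming the standard ARFF layout (`%` comments, `@attribute` declarations, then an `@data` section with one instance per line). The function name `arff_counts` and the tiny inline sample are hypothetical and only serve to make the example self-contained.

```python
def arff_counts(text):
    """Return (number of attributes, number of instances) for ARFF text."""
    n_attributes = 0
    n_instances = 0
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and ARFF comments.
        if not line or line.startswith("%"):
            continue
        if in_data:
            n_instances += 1
        elif line.lower().startswith("@attribute"):
            n_attributes += 1
        elif line.lower().startswith("@data"):
            in_data = True
    return n_attributes, n_instances


# Hypothetical three-attribute, two-instance sample (not the real glass data).
sample = """% tiny sample
@relation demo
@attribute RI numeric
@attribute Na numeric
@attribute Type {a, b}
@data
1.52,13.6,a
1.51,13.9,b
"""

print(arff_counts(sample))
```

For the real file, `arff_counts(open("datasets/glass.arff").read())` is expected to agree with the 10 attributes and 214 training cases that PARF prints in verbose mode, provided the file follows this layout.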


To train a forest on the example dataset (`glass.arff`), pass it with `-t file`, which selects the file to use as the training set:

In [18]:
%%bash
parf/parf --verbose -t datasets/glass.arff > output.txt
head -n 15 output.txt

Seed:   1753208397
Loading training set
Number of training cases:    214
Number of attributes:         10
Counting classes
Number of used attributes:     9
Attributes to split on:        3
Sorting and ranking
Growing forest
        Tree #     1
        Tree #     2
        Tree #     3
        Tree #     4
        Tree #     5
        Tree #     6
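
The log reports 9 used attributes (presumably the 10 declared attributes minus the class attribute) and 3 attributes to split on. These numbers are consistent with the square-root rule of thumb that is a common default for classification random forests. Whether PARF derives its default exactly this way is an assumption here; the output alone does not confirm it. A minimal sketch, with `default_mtry` as a hypothetical helper name:

```python
import math


def default_mtry(n_used_attributes):
    """Square-root heuristic: candidate attributes tried at each split."""
    # Assumption: integer square root, with at least one attribute.
    return max(1, math.isqrt(n_used_attributes))


print(default_mtry(9))
```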


## Check MPI

Environment settings applied before calling `mpirun` (reference: https://www.osc.edu/supercomputing/batch-processing-at-osc/slurm_migration/slurm_migration_issues):
* unset I_MPI_PMI_LIBRARY
* export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0
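
In a notebook, each `!` line runs in its own shell, so `unset` and `export` do not carry over to later cells; they have to be chained on the same line as `mpirun`. An alternative is to build the environment once in Python and pass it to the launcher. This is a minimal sketch: the helper names `intel_mpi_env` and `parf_command` and the placeholder library path are hypothetical, while the two variables and the command line are the ones used in this notebook.

```python
import os


def intel_mpi_env(base=None):
    """Copy an environment and apply the two Intel MPI settings listed above."""
    env = dict(os.environ if base is None else base)
    env.pop("I_MPI_PMI_LIBRARY", None)                # same effect as `unset`
    env["I_MPI_JOB_RESPECT_PROCESS_PLACEMENT"] = "0"  # same effect as `export`
    return env


def parf_command(nprocs, training_file):
    """Build the mpirun argument list for a verbose PARF training run."""
    return ["mpirun", "-np", str(nprocs), "parf/parf",
            "--verbose", "-t", training_file]


# Demonstration on a hypothetical starting environment.
env = intel_mpi_env({"I_MPI_PMI_LIBRARY": "/hypothetical/libpmi.so",
                     "PATH": "/usr/bin"})
cmd = parf_command(6, "datasets/glass.arff")
print(cmd)
print(sorted(env))
```

On a node where `mpirun` is available, the run itself would be `subprocess.run(cmd, env=intel_mpi_env(), check=True)`.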

In [31]:
%%timeit -n 1 -r 1
! unset I_MPI_PMI_LIBRARY ; export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0 ; mpirun -np 1 parf/parf --verbose -t datasets/glass.arff

Seed:  -2014858288
Loading training set
Number of training cases:    214
Number of attributes:         10
Counting classes
Number of used attributes:     9
Attributes to split on:        3
Sorting and ranking
Growing forest
        Tree #     1
        Tree #     2
        Tree #     3
        Tree #     4
        Tree #     5
        Tree #     6
        Tree #     7
        Tree #     8
        Tree #     9
        Tree #    10
        Tree #    11
        Tree #    12
        Tree #    13
        Tree #    14
        Tree #    15
        Tree #    16
        Tree #    17
        Tree #    18
        Tree #    19
        Tree #    20
        Tree #    21
        Tree #    22
        Tree #    23
        Tree #    24
        Tree #    25
        Tree #    26
        Tree #    27
        Tree #    28
        Tree #    29
        Tree #    30
        Tree #    31
        Tree #    32
        Tree #    33
        Tree #    34
        Tree #    35
        Tree #    36
        Tree #    37

In [29]:
%%timeit -n 1 -r 1
! unset I_MPI_PMI_LIBRARY ; export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=0 ; mpirun -np 6 parf/parf --verbose -t datasets/glass.arff

Seed:  -2058490808
Loading and distributing training set
        Tree #    17 on     1
        Tree #    33 on     2
        Tree #    50 on     3
        Tree #    67 on     4
        Tree #    84 on     5
Number of training cases:    214
Number of attributes:         10
Counting classes
Number of used attributes:     9
Attributes to split on:        3
Sorting and ranking
        Tree #    18 on     1
        Tree #    68 on     4
        Tree #    34 on     2
        Tree #    85 on     5
        Tree #    51 on     3
Growing forest
        Tree #     1 on     0
        Tree #    19 on     1
        Tree #    35 on     2
        Tree #    52 on     3
        Tree #    69 on     4
        Tree #    86 on     5
        Tree #     2 on     0
        Tree #    20 on     1
        Tree #    36 on     2
        Tree #    87 on     5
        Tree #    53 on     3
        Tree #    70 on     4
        Tree #     3 on     0
        Tree #    21 on     1
        Tree #    37 on     2
        T