## Reduction: the sum of the elements of an array

In [None]:
import sys

if len(sys.argv) > 1:
    value = int(sys.argv[1])
else:
    print("You must provide an integer value")

In [6]:
import numpy as np

def reduc_operation(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in range(A.size):
        s += A[i]
    return s

# Sequential

#value = 5*10**7
X = np.random.rand(value)

# To print the first values of the array

#print(X[0:12])

# Using IPython magic commands
print("----------------- ORIGINAL CODE -----------------")
tiempo = %timeit -r 2 -o -q reduc_operation(X)

print("Time taken by reduction operation using a function:", tiempo)

print(f"And the result of the sum of numbers in the range [0, value) is: {reduc_operation(X)}\n")


# Using numpy.sum()

tiempo = %timeit -r 2 -o -q np.sum(X)

print("Time taken by reduction operation using numpy.sum():", tiempo)

print("Now, the result using numpy.sum():", np.sum(X),"\n ")


Time taken by reduction operation using a function: 4.74 s ± 45.2 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 24999875.049707804

Time taken by reduction operation using numpy.sum(): 18.6 ms ± 2.75 μs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.sum(): 24999875.049701925 
 


## Part a)

In [6]:
import numpy as np
from multiprocessing import Pool

def reduc_operation(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in range(A.size):
        s += A[i]
    return s


# Note: when this runs as a script, the code that calls parallel_reduction must sit
# under `if __name__ == "__main__":` so that worker processes do not spawn new pools.
def parallel_reduction(A, n_cores):
    """Sum A by splitting it into n_cores chunks, one per worker process."""
    # Chunk size per process
    seccion = A.size // n_cores
    secciones = []  # List of array chunks

    # Build the chunks; the last one takes any remaining elements
    for i in range(n_cores):
        inicio = i * seccion
        if i == n_cores - 1:
            fin = A.size
        else:
            fin = (i + 1) * seccion
        secciones.append(A[inicio:fin])

    # Create the pool and sum each chunk in its own process
    with Pool(n_cores) as p:
        sumas_parciales = p.map(reduc_operation, secciones)

    # Reduce the partial results
    return sum(sumas_parciales)



# -----------------------------
# DATA AND RANDOM INITIALIZATION
#value = 5*10**7
#n_cores = 2
X = np.random.rand(value)

# To print the first values of the array
#print(X[0:12])


# Using IPython magic commands
print("----------------- MULTIPROCESSING WITH POOL -----------------")
tiempo = %timeit -r 2 -o -q reduc_operation(X)

print("Time taken by reduction operation using a function:", tiempo)

print(f"And the result of the sum of numbers in the range [0, value) is: {reduc_operation(X)}\n")


# Using numpy.sum()
tiempo = %timeit -r 2 -o -q np.sum(X)

print("Time taken by reduction operation using numpy.sum():", tiempo)

print("Now, the result using numpy.sum():", np.sum(X),"\n ")

Time taken by reduction operation using a function: 4.84 s ± 75.8 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 24999177.02043322

Time taken by reduction operation using numpy.sum(): 18.6 ms ± 190 ns per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.sum(): 24999177.020432606 
 


## Part b)

### Using @njit(parallel=False)

In [10]:
import numpy as np
from numba import njit

@njit 
def reduc_operation(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in range(A.size):
        s += A[i]
    return s

# Sequential

#value = 5*10**7

X = np.random.rand(value)

# To print the first values of the array

#print(X[0:12])

# Using IPython magic commands
print("----------------- @NJIT (IGNORE) -----------------")
tiempo = %timeit -r 2 -o -q reduc_operation(X)

print("Time taken by reduction operation using a function:", tiempo)

print(f"And the result of the sum of numbers in the range [0, value) is: {reduc_operation(X)}\n")


# Using numpy.sum()

tiempo = %timeit -r 2 -o -q np.sum(X)

print("Time taken by reduction operation using numpy.sum():", tiempo)

print("Now, the result using numpy.sum():", np.sum(X),"\n ")
print ("---------------------------------------------------")

Time taken by reduction operation using a function: 49.4 ms ± 29.9 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)
And the result of the sum of numbers in the range [0, value) is: 25000072.745078065

Time taken by reduction operation using numpy.sum(): 18.8 ms ± 41.4 μs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.sum(): 25000072.745078616 
 


### Using @njit(parallel=True)

In [11]:
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def reduc_operation_parallel(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in prange(A.size):
        s += A[i]
    return s



# -----------------------------
# DATA AND RANDOM INITIALIZATION
#value = 5*10**7
#n_cores = 2
X = np.random.rand(value)

# To print the first values of the array
#print(X[0:12])


# Warm-up call so Numba compiles before timing
reduc_operation_parallel(X[:10])


# TIMING
# Using IPython magic commands
print("----------------- NUMBA WITH PRANGE -----------------")
tiempo = %timeit -r 2 -o -q reduc_operation_parallel(X)

print("Time taken by reduction operation using a function:", tiempo)

print(f"And the result of the sum of numbers in the range [0, value) is: {reduc_operation_parallel(X)}\n")


# Using numpy.sum()
tiempo = %timeit -r 2 -o -q np.sum(X)

print("Time taken by reduction operation using numpy.sum():", tiempo)

print("Now, the result using numpy.sum():", np.sum(X),"\n ")

Time taken by reduction operation using a function: 11.4 ms ± 21.1 μs per loop (mean ± std. dev. of 2 runs, 100 loops each)
And the result of the sum of numbers in the range [0, value) is: 25000512.118170958

Time taken by reduction operation using numpy.sum(): 18.6 ms ± 2.6 μs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.sum(): 25000512.118171137 
 


# RESULTS

## CPUS = 1
### Running with value: 100000000
#### -------------- ORIGINAL CODE --------------
Time taken by reduction operation using a function: 17.2 s ± 210 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 49996982.96635376

Time taken by reduction operation using numpy.sum(): 74.3 ms ± 3.14 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 49996982.966367364

#### -------------- Multiprocessing with Pool --------------
Time taken by reduction operation using a function: 16.9 s ± 47.9 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 50002300.765394

Time taken by reduction operation using numpy.sum(): 65.3 ms ± 590 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 50002300.765411116

#### -------------- Numba with prange --------------
Time taken by reduction operation using a function: 119 ms ± 309 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)

And the result of the sum of numbers in the range [0, value) is: 49994308.84608965

Time taken by reduction operation using numpy.sum(): 64.3 ms ± 152 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 49994308.84609552



### Running with value: 1000000000
#### -------------- ORIGINAL CODE --------------
Time taken by reduction operation using a function: 2min 54s ± 289 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 500003733.2576759

Time taken by reduction operation using numpy.sum(): 655 ms ± 419 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 500003733.25784916

#### -------------- Multiprocessing with Pool --------------
Time taken by reduction operation using a function: 2min 52s ± 606 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 499998549.6664798

Time taken by reduction operation using numpy.sum(): 682 ms ± 9.98 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 499998549.6660171

#### -------------- Numba with prange --------------
Time taken by reduction operation using a function: 1.16 s ± 286 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 500011382.6093039

Time taken by reduction operation using numpy.sum(): 644 ms ± 317 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 500011382.60930663


## CPUS = 2
### Running with value: 100000000
#### -------------- ORIGINAL CODE --------------
Time taken by reduction operation using a function: 17 s ± 98 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 50002223.80835671

Time taken by reduction operation using numpy.sum(): 73.3 ms ± 3.81 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 50002223.808360256

#### -------------- Multiprocessing with Pool --------------
Time taken by reduction operation using a function: 16.9 s ± 33.4 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 50003677.81613994

Time taken by reduction operation using numpy.sum(): 67.6 ms ± 1.8 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 50003677.81611796

#### -------------- Numba with prange --------------
Time taken by reduction operation using a function: 66.3 ms ± 6.3 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)

And the result of the sum of numbers in the range [0, value) is: 50002727.69792643

Time taken by reduction operation using numpy.sum(): 71.8 ms ± 2.18 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 50002727.69791478



### Running with value: 1000000000
#### -------------- ORIGINAL CODE --------------
Time taken by reduction operation using a function: 2min 52s ± 2.53 s per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 500006452.62328076

Time taken by reduction operation using numpy.sum(): 641 ms ± 307 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 500006452.62379396

#### -------------- Multiprocessing with Pool --------------
Time taken by reduction operation using a function: 2min 48s ± 4.28 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 500001393.41184926

Time taken by reduction operation using numpy.sum(): 641 ms ± 313 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 500001393.4113889

#### -------------- Numba with prange --------------
Time taken by reduction operation using a function: 616 ms ± 13.6 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 500011382.36753476

Time taken by reduction operation using numpy.sum(): 688 ms ± 7.95 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 500011382.3678243


## CPUS = 4
### Running with value: 100000000
#### -------------- ORIGINAL CODE --------------
Time taken by reduction operation using a function: 17.4 s ± 74.5 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 50003088.73101376

Time taken by reduction operation using numpy.sum(): 68.6 ms ± 2.23 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 50003088.73100576

#### -------------- Multiprocessing with Pool --------------
Time taken by reduction operation using a function: 17.4 s ± 11.3 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 49998156.896956

Time taken by reduction operation using numpy.sum(): 64.2 ms ± 46.5 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 49998156.89694881



#### -------------- Numba with prange --------------
Time taken by reduction operation using a function: 41.2 ms ± 1.65 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)

And the result of the sum of numbers in the range [0, value) is: 49995269.4675535

Time taken by reduction operation using numpy.sum(): 75.5 ms ± 73.2 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 49995269.46755491



### Running with value: 1000000000
#### -------------- ORIGINAL CODE --------------
Time taken by reduction operation using a function: 2min 54s ± 1.43 s per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 499994543.34291667

Time taken by reduction operation using numpy.sum(): 654 ms ± 217 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 499994543.3425887

#### -------------- Multiprocessing with Pool --------------
Time taken by reduction operation using a function: 2min 51s ± 370 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 500020466.8083857

Time taken by reduction operation using numpy.sum(): 702 ms ± 301 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 500020466.8076685

#### -------------- Numba with prange --------------
Time taken by reduction operation using a function: 319 ms ± 3.22 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 500001525.94071114

Time taken by reduction operation using numpy.sum(): 675 ms ± 174 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 500001525.94088966


## CPUS = 8
### Running with value: 100000000
#### -------------- ORIGINAL CODE --------------
Time taken by reduction operation using a function: 17.5 s ± 72.2 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 49999440.14455078

Time taken by reduction operation using numpy.sum(): 64 ms ± 3.18 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 49999440.14455183

#### -------------- Multiprocessing with Pool --------------
Time taken by reduction operation using a function: 17.2 s ± 241 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 49997490.398592226

Time taken by reduction operation using numpy.sum(): 64.1 ms ± 18.3 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 49997490.39860623


#### -------------- Numba with prange --------------
Time taken by reduction operation using a function: 24.9 ms ± 2.17 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)

And the result of the sum of numbers in the range [0, value) is: 50001246.382065885

Time taken by reduction operation using numpy.sum(): 64 ms ± 1.88 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)

Now, the result using numpy.sum(): 50001246.382064395



### Running with value: 1000000000
#### -------------- ORIGINAL CODE --------------
Time taken by reduction operation using a function: 2min 50s ± 1.09 s per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 500005945.22715646

Time taken by reduction operation using numpy.sum(): 642 ms ± 4.43 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 500005945.2265876

#### -------------- Multiprocessing with Pool --------------
Time taken by reduction operation using a function: 2min 48s ± 21.2 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 500005587.53612226

Time taken by reduction operation using numpy.sum(): 643 ms ± 125 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 500005587.53613186

#### -------------- Numba with prange --------------
Time taken by reduction operation using a function: 251 ms ± 21.2 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)

And the result of the sum of numbers in the range [0, value) is: 499992525.3752769

Time taken by reduction operation using numpy.sum(): 677 ms ± 5.41 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)

Now, the result using numpy.sum(): 499992525.3752307


# Interpretation of results

When we analyze the **original code**, timed with IPython's `%timeit` magic, we get an average of about 17 s for 10^8 elements and almost 3 minutes for 10^9. A tenfold increase in the input size therefore translates into a roughly tenfold increase in run time, and the time does not change as CPUs are added, since the code is not parallelized. This is far from adequate performance.

With **numpy.sum()** the times stay in the millisecond range even for 10^9 elements (about 650 ms), which stands out given that the original code needs almost 3 minutes for the same calculation. This is because NumPy runs the loop in compiled code and avoids the per-element overhead of the Python interpreter. Here too, adding CPUs does not reduce the time.

With **Multiprocessing with Pool** the times are similar to those of the original code (about 17 s for 10^8; about 2 min 50 s for 10^9). Even though a parallelization method is used, the time does not decrease. As in section 3.2, we attribute this to passing enormous arrays between processes: the cost of serializing and copying memory is high enough that the run ends up taking the same time. The reduction itself is so simple that parallelizing it with this method is not worthwhile. Note, however, that the timing cell shown for this part calls `reduc_operation(X)` rather than `parallel_reduction`, so these figures may reflect the sequential function instead. Using numpy.sum() does reduce the time.
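
To get a rough feel for the transfer cost that `Pool.map` incurs, the sketch below pickles and unpickles one array chunk, which is approximately what happens to each task argument sent to a worker. The helper name `pickle_round_trip` and the chunk size are assumptions made for illustration; the real overhead also includes pipe transfer and process start-up, which are not measured here.

```python
import pickle
import time

import numpy as np


def pickle_round_trip(chunk):
    """Serialize and deserialize a chunk, roughly as Pool.map does per task.

    Returns (payload_bytes, seconds, restored_chunk).
    """
    start = time.perf_counter()
    payload = pickle.dumps(chunk, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(payload)
    elapsed = time.perf_counter() - start
    return len(payload), elapsed, restored


# Hypothetical chunk size, much smaller than the arrays used in the exercise.
chunk = np.random.rand(1_000_000)
payload_bytes, seconds, restored = pickle_round_trip(chunk)

print(f"Raw data: {chunk.nbytes} bytes, pickled payload: {payload_bytes} bytes")
print(f"Round trip took {seconds * 1e3:.2f} ms")
```

The payload is at least as large as the raw array, so every element is copied at least once on the way to the worker before any addition takes place.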

Finally, with **Numba with prange** we do see a clear reduction in processing time, again measured with the IPython magic. It reaches 119 ms for 10^8 elements and 1.16 s for 10^9 elements (both with one CPU). Likewise, as the number of CPUs involved increases, the time drops further, reaching about 25 ms and 251 ms for 10^8 and 10^9 elements, respectively, with 8 CPUs.
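
The scaling can be made explicit by turning the reported timings into speedup and parallel efficiency. The following is a small sketch using the Numba figures from the results section; the function name `speedup_table` is introduced here only for illustration.

```python
# Timings copied from the results section (Numba with prange), in milliseconds.
timings_ms = {
    10**8: {1: 119.0, 2: 66.3, 4: 41.2, 8: 24.9},
    10**9: {1: 1160.0, 2: 616.0, 4: 319.0, 8: 251.0},
}


def speedup_table(times_by_cpus):
    """Return {cpus: (speedup, efficiency)} relative to the 1-CPU timing."""
    baseline = times_by_cpus[1]
    return {
        cpus: (baseline / t, baseline / t / cpus)
        for cpus, t in times_by_cpus.items()
    }


for size, times in timings_ms.items():
    print(f"n = {size:.0e}")
    for cpus, (speedup, efficiency) in speedup_table(times).items():
        print(f"  {cpus} CPUs: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these numbers the speedup at 8 CPUs is roughly 4.8x for 10^8 elements and 4.6x for 10^9, so efficiency falls as CPUs are added. One possible reason is that a plain sum is limited by memory bandwidth rather than arithmetic, although the measurements alone do not confirm this.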

As explained in the previous exercise, Numba cuts the time considerably because no memory is copied and all threads share the same array. Small differences do appear in the sums (in the last digits), which can be explained by the different order in which the additions are performed and the resulting floating-point rounding; nevertheless, the results are clear.
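
The effect of summation order can be reproduced on a small scale. The sketch below first shows that floating-point addition is not associative, then compares a chunked sum (partial sums combined at the end, loosely mimicking a parallel reduction) against `math.fsum`. The function `chunked_sum`, the seed and the array size are illustrative assumptions and do not reproduce Numba's actual scheduling.

```python
import math

import numpy as np


def chunked_sum(values, n_chunks):
    """Sum each chunk left to right, then add the partial sums together."""
    partials = []
    for chunk in np.array_split(values, n_chunks):
        s = 0.0
        for x in chunk:
            s += x
        partials.append(s)
    return sum(partials)


# Three-term demonstration that float addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right, left, right)

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
values = rng.random(100_000)
reference = math.fsum(values)  # correctly rounded sum
for n_chunks in (1, 2, 8):
    print(n_chunks, chunked_sum(values, n_chunks) - reference)
```

The deviations from the reference are tiny relative to the total, which is consistent with the sums in the results agreeing in all but their last digits.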