# Launching a local cluster for IPython parallel

On the virtual machine:

- use the `Clusters` tab of the file browser

On guane:

- open a terminal and run the following, replacing 4 with the number of engines you want:

        /usr/local/anaconda/bin/ipcluster start -n 4 


- **REMEMBER TO SHUT DOWN THE CLUSTER WHEN YOU NO LONGER NEED IT** (CONTROL-C or `/usr/local/anaconda/bin/ipcluster stop`)



### Create a client object to start interacting with the cluster

In [5]:
import os,sys,time
import numpy as np

from IPython import parallel
rc = parallel.Client()
print "available engines", rc.ids

available engines [0, 1, 2, 3]
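
From IPython 4 onward the parallel machinery lives in the separate `ipyparallel` package, so `from IPython import parallel` only works on older installations such as the one used here. The following is a minimal sketch of an import that copes with both layouts. The helper name `load_parallel` is hypothetical and belongs to neither library.

```python
import importlib


def load_parallel():
    """Return the first importable parallel module, or None if none is installed.

    `load_parallel` is a hypothetical helper written for this example.
    """
    for name in ("ipyparallel", "IPython.parallel"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


parallel = load_parallel()
if parallel is None:
    print("no parallel package found")
else:
    print("using " + parallel.__name__)
    # With a running cluster you would then create the client:
    # rc = parallel.Client()
```

Trying `ipyparallel` first means the same notebook keeps working after an upgrade, while the old import path remains as a fallback.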


In [6]:
dv = rc.direct_view()

## 4. Run commands on the cluster

We send a function to every engine for execution and inspect how it ran

In [7]:
dv = rc.direct_view()
dr = dv.apply(lambda: "hello")

In [8]:
dr.get()

['hello', 'hello', 'hello', 'hello']

In [9]:
dr.metadata

[{'after': [],
  'completed': datetime.datetime(2016, 8, 13, 18, 20, 45, 757821),
  'data': {},
  'engine_id': 0,
  'engine_uuid': '801d9e0a-a44e-49c1-9cd2-af6987c166be',
  'error': None,
  'execute_input': None,
  'execute_result': None,
  'follow': [],
  'msg_id': '2c45aa24-e71b-448e-b950-0b6611920096',
  'outputs': [],
  'received': datetime.datetime(2016, 8, 13, 18, 20, 45, 759445),
  'started': datetime.datetime(2016, 8, 13, 18, 20, 45, 560688),
  'status': 'ok',
  'stderr': '',
  'stdout': '',
  'submitted': datetime.datetime(2016, 8, 13, 18, 20, 45, 556561)},
 {'after': [],
  'completed': datetime.datetime(2016, 8, 13, 18, 20, 45, 755587),
  'data': {},
  'engine_id': 1,
  'engine_uuid': 'cf3ffbea-debf-43d0-b9f9-d6e50c241ec2',
  'error': None,
  'execute_input': None,
  'execute_result': None,
  'follow': [],
  'msg_id': '4f052d82-0ba3-4ca1-a6f6-29cc3109dc08',
  'outputs': [],
  'received': datetime.datetime(2016, 8, 13, 18, 20, 45, 757392),
  'started': datetime.datetime(2016, 
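
The timestamps in each metadata entry can be subtracted to see where the time went. The sketch below uses the four timestamps of the engine 0 entry shown above and assumes the keys mean what their names suggest: `submitted` and `received` are recorded by the client, `started` and `completed` by the engine. The differences are only meaningful when both clocks agree, as they do on a single-machine cluster.

```python
from datetime import datetime

# Timestamps copied from the engine 0 entry of `dr.metadata` above.
meta = {
    "submitted": datetime(2016, 8, 13, 18, 20, 45, 556561),
    "started": datetime(2016, 8, 13, 18, 20, 45, 560688),
    "completed": datetime(2016, 8, 13, 18, 20, 45, 757821),
    "received": datetime(2016, 8, 13, 18, 20, 45, 759445),
}


def seconds(later, earlier):
    """Elapsed time between two datetimes, in seconds."""
    return (later - earlier).total_seconds()


wait_time = seconds(meta["started"], meta["submitted"])    # before the engine starts
run_time = seconds(meta["completed"], meta["started"])     # on the engine
return_time = seconds(meta["received"], meta["completed"])  # result back to the client

print("wait   {:.6f} s".format(wait_time))
print("run    {:.6f} s".format(run_time))
print("return {:.6f} s".format(return_time))
```

With a live cluster the same arithmetic can be applied to every entry of `dr.metadata` to compare engines.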

With `map` we apply an operation to every element of a list in parallel (`map_sync` blocks until all the results are back)

In [10]:
dr = dv.map_sync(lambda x: x**2, range(10))

In [11]:
dr

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

We access and define variables on each engine

In [12]:
dv.clear()
dv["a"]

CompositeError: one or more exceptions from call to method: _pull
[Engine Exception]NameError: name 'a' is not defined
[Engine Exception]NameError: name 'a' is not defined
[Engine Exception]NameError: name 'a' is not defined
[Engine Exception]NameError: name 'a' is not defined

In [13]:
a = np.array([1,2,3,4])
dv.push({"a": a+1})

<AsyncResult: _push>

In [14]:
dv["a"]

[array([2, 3, 4, 5]),
 array([2, 3, 4, 5]),
 array([2, 3, 4, 5]),
 array([2, 3, 4, 5])]

We scatter and gather data and explore the namespace of each engine

In [15]:
import numpy as np
data = np.random.randint(10, size=16)
print data
dv.scatter('a',data);

[2 9 9 0 4 3 1 9 4 1 6 6 7 8 1 4]


In [16]:
dv['a']

[array([2, 9, 9, 0]),
 array([4, 3, 1, 9]),
 array([4, 1, 6, 6]),
 array([7, 8, 1, 4])]

In [17]:
dv.apply(lambda: a+1).get()

[array([ 3, 10, 10,  1]),
 array([ 5,  4,  2, 10]),
 array([5, 2, 7, 7]),
 array([8, 9, 2, 5])]

In [18]:
dv.execute("import numpy as np")
def create_b():
    global b
    b = np.copy(a)+1

In [19]:
dv.apply(create_b)

<AsyncResult: create_b>

In [20]:
dv['b']

[array([ 3, 10, 10,  1]),
 array([ 5,  4,  2, 10]),
 array([5, 2, 7, 7]),
 array([8, 9, 2, 5])]

In [21]:
dv.gather("b").get()

array([ 3, 10, 10,  1,  5,  4,  2, 10,  5,  2,  7,  7,  8,  9,  2,  5])
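
The scatter, compute and gather cycle above can be imitated locally without a cluster. This is a sketch under the assumption that `scatter` hands each engine one contiguous chunk, which is what the outputs above show. It uses `np.array_split` as a stand-in and is not the library's actual implementation.

```python
import numpy as np

n_engines = 4  # same number of engines as in the session above
data = np.array([2, 9, 9, 0, 4, 3, 1, 9, 4, 1, 6, 6, 7, 8, 1, 4])

# "scatter": one contiguous chunk of `data` per engine
chunks = np.array_split(data, n_engines)

# each engine computes b = a + 1 on its own chunk, like create_b()
b_chunks = [np.copy(a) + 1 for a in chunks]

# "gather": concatenate the per-engine pieces in engine order
b = np.concatenate(b_chunks)

print(chunks)
print(b)
```

The chunks and the gathered array reproduce the values of `dv['a']` and `dv.gather("b").get()` shown above.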

Note how a two-dimensional array is scattered by rows

In [22]:
data = np.random.randint(10, size=(10,5))
print data
dv.scatter('a',data);
print dv['a']

[[9 1 5 4 8]
 [8 9 5 2 9]
 [9 6 3 1 0]
 [4 8 5 5 1]
 [1 9 5 1 5]
 [5 7 8 5 5]
 [1 0 2 9 7]
 [7 4 2 6 0]
 [4 4 2 1 5]
 [1 7 5 1 6]]
[array([[9, 1, 5, 4, 8],
       [8, 9, 5, 2, 9],
       [9, 6, 3, 1, 0]]), array([[4, 8, 5, 5, 1],
       [1, 9, 5, 1, 5],
       [5, 7, 8, 5, 5]]), array([[1, 0, 2, 9, 7],
       [7, 4, 2, 6, 0]]), array([[4, 4, 2, 1, 5],
       [1, 7, 5, 1, 6]])]


To scatter by columns we use the transpose

In [23]:
dv.scatter('a',data.T);
print dv['a']

[array([[9, 8, 9, 4, 1, 5, 1, 7, 4, 1],
       [1, 9, 6, 8, 9, 7, 0, 4, 4, 7]]), array([[5, 5, 3, 5, 5, 8, 2, 2, 2, 5]]), array([[4, 2, 1, 5, 1, 5, 9, 6, 1, 1]]), array([[8, 9, 0, 1, 5, 5, 7, 0, 5, 6]])]
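
The partition sizes seen in the two outputs above (3, 3, 2 and 2 rows, then 2, 1, 1 and 1 rows of the transpose) are the same ones `np.array_split` produces when the rows do not divide evenly. The sketch below checks this locally. It uses a deterministic 10x5 array as a stand-in for the random data and is an illustration, not the library's own partitioning code.

```python
import numpy as np

n_engines = 4
data = np.arange(50).reshape(10, 5)  # stand-in for the random 10x5 array

# Scattering `data` splits along axis 0, i.e. by rows.
row_chunks = np.array_split(data, n_engines)

# Scattering `data.T` also splits by rows, but rows of data.T are columns of data.
col_chunks = np.array_split(data.T, n_engines)

print([c.shape for c in row_chunks])
print([c.shape for c in col_chunks])

# Transposing a chunk back recovers a block of columns of the original array.
first_cols = col_chunks[0].T
```

Each engine receives its columns laid out as rows, so code running on the engine has to transpose its chunk if it expects the original orientation.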


Note that the arrays distributed to the engines are read-only: modifying them in place raises an error

In [24]:
def incr_a():
    global a
    a += 1

In [25]:
dv.apply(incr_a).get()

CompositeError: one or more exceptions from call to method: incr_a
[Engine Exception]ValueError: output array is read-only
[Engine Exception]ValueError: output array is read-only
[Engine Exception]ValueError: output array is read-only
[Engine Exception]ValueError: output array is read-only
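
The same error can be reproduced locally. This sketch assumes that the arrays held by the engines are ordinary NumPy arrays with the writeable flag cleared, which is consistent with the error message above. It then shows that working on a copy avoids the problem, as `create_b` did earlier with `np.copy(a)+1`.

```python
import numpy as np

a = np.array([2, 9, 9, 0])
a.setflags(write=False)  # mimic the read-only array an engine holds

try:
    a += 1  # in-place update, same operation as incr_a()
    in_place_failed = False
except ValueError as err:
    in_place_failed = True
    print("in-place update failed: {}".format(err))

b = a + 1     # builds a new, writable array
c = a.copy()  # an explicit copy is writable too
c += 1

print(b)
print(c)
```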

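Measure scalability by comparing local execution with distributed execution. Note that `dv.apply` runs the full loop on every engine, so the distributed call does four times the total work.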

In [26]:
def gen_data(its):
    r = 0
    for i in xrange(int(its)):
        r += 1
    return r
        

In [27]:
%time gen_data(1e7)

CPU times: user 436 ms, sys: 0 ns, total: 436 ms
Wall time: 423 ms


10000000

In [28]:
%time dv.apply(gen_data, 1e7).get()

CPU times: user 32 ms, sys: 0 ns, total: 32 ms
Wall time: 596 ms


[10000000, 10000000, 10000000, 10000000]
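
The two wall times are not directly comparable, because the cluster ran the loop once per engine. A rough way to read them is to compare iterations per second. The sketch below does this arithmetic on the wall times recorded above. It is a single measurement, so treat the resulting figures as indicative only.

```python
local_wall = 0.423    # seconds, wall time of %time gen_data(1e7)
cluster_wall = 0.596  # seconds, wall time of %time dv.apply(gen_data, 1e7).get()
n_engines = 4
its = 1e7             # iterations per call of gen_data

local_rate = its / local_wall                 # iterations per second, one process
cluster_rate = n_engines * its / cluster_wall  # iterations per second, all engines

speedup = cluster_rate / local_rate
efficiency = speedup / n_engines

print("speedup    {:.2f}".format(speedup))
print("efficiency {:.2f}".format(efficiency))
```

On these numbers the cluster processes about 2.84 times as many iterations per second as the local run, roughly 71% of the ideal factor of 4.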

### Parallel magics

In [29]:
%px print "hola"

[stdout:0] hola
[stdout:1] hola
[stdout:2] hola
[stdout:3] hola


In [30]:
%px import numpy as np

In [31]:
%px r = np.random.randint(10)

In [32]:
dv.gather("r").get()

[5, 9, 5, 3]

Run a block of code on each engine. Note how it collects both stdout and the final output. **THIS IS USEFUL FOR DEBUGGING**

In [33]:
%%px
def get_rnd_vector(l):
    print "calling with arg",l
    return np.random.randint(10, size=l)
get_rnd_vector(10)

[stdout:0] calling with arg 10
[stdout:1] calling with arg 10
[stdout:2] calling with arg 10
[stdout:3] calling with arg 10


Out[0:4]: array([3, 9, 5, 3, 3, 9, 3, 9, 8, 0])

Out[1:4]: array([2, 8, 8, 2, 7, 2, 0, 1, 1, 2])

Out[2:4]: array([5, 9, 1, 4, 6, 7, 4, 5, 6, 9])

Out[3:4]: array([8, 1, 2, 5, 5, 0, 1, 3, 1, 0])

In [34]:
dv.map(lambda x: get_rnd_vector(x), [2,3,4,5]).get()

[array([8, 9]), array([8, 4, 8]), array([3, 9, 3, 3]), array([8, 2, 1, 9, 8])]

Note that the function does not exist on the client

In [35]:
get_rnd_vector(5)

NameError: name 'get_rnd_vector' is not defined

### Measuring execution time in IPython

In [36]:
def long_loop(N):
    for i in xrange(int(N)):
        a = 1

In [37]:
%timeit long_loop(1e3)

10000 loops, best of 3: 25.7 µs per loop


In [38]:
%timeit -n 100 -r 4 long_loop(1e3)

100 loops, best of 4: 47.3 µs per loop


In [39]:
t = %timeit -n 2000 -r 4 -o long_loop(1e3)

2000 loops, best of 4: 25.6 µs per loop


In [40]:
import numpy as np
print "loops       ", t.loops
print "repeats     ", t.repeat
print "compile time", t.compile_time
print "best        ", t.best
print "all         ", t.all_runs
print "all/nloops  ", np.array(t.all_runs)/t.loops

loops        2000
repeats      4
compile time 0.0
best         2.56325006485e-05
all          [0.08162617683410645, 0.07692599296569824, 0.05469989776611328, 0.05126500129699707]
all/nloops   [  4.08130884e-05   3.84629965e-05   2.73499489e-05   2.56325006e-05]


Note that each run executes the code 2000 times (_loops_). `t.all_runs` reports the total time of each run. Dividing `t.all_runs` by the number of _loops_ gives the mean time per execution within each run, and `t.best` is the smallest of these values.
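
The same breakdown can be obtained outside IPython with the standard `timeit` module. This is a minimal sketch, written with `range` so that it also runs on Python 3 (the session above uses Python 2 and `xrange`). The last lines repeat the arithmetic on the values recorded above.

```python
import timeit


def long_loop(n):
    for i in range(int(n)):
        a = 1


loops, repeats = 2000, 4

# One total time per run; each run calls long_loop(1e3) `loops` times.
all_runs = timeit.repeat(lambda: long_loop(1e3), number=loops, repeat=repeats)

per_loop = [total / loops for total in all_runs]  # mean time per call in each run
best = min(per_loop)                              # what %timeit reports as "best"

print("all/nloops {}".format(per_loop))
print("best       {}".format(best))

# Same arithmetic on the t.all_runs values recorded in the session above.
recorded_runs = [0.08162617683410645, 0.07692599296569824,
                 0.05469989776611328, 0.05126500129699707]
recorded_best = min(recorded_runs) / 2000
```

Dividing the fastest recorded run by 2000 gives the `t.best` value printed above.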