
# Numpy Tutorial


"Numpy" is short for "Numerical Python".

Numpy provides highly efficient operations on multi-dimensional arrays.

Numpy lets us work with vectors and matrices.

The key features of Numpy are:
- ndarrays: n-dimensional arrays whose elements all share the same data type, optimized for fast and efficient computation.
- Broadcasting: a mechanism that makes it easy to operate on arrays of different shapes.
- Vectorization: arithmetic operations applied to whole ndarrays at once, without explicit Python loops.
- Input/Output: simplifies reading and writing data to files.
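To illustrate vectorization, here is a minimal sketch (the variable names are illustrative) that computes the same squares with a Python loop and with a single ndarray expression:

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: one multiplication per loop iteration
squares_loop = [v * v for v in values]

# NumPy: one vectorized expression over the whole array
arr = np.array(values)
squares_vec = arr * arr

print(squares_loop)  # [1, 4, 9, 16]
print(squares_vec)   # [ 1  4  9 16]
```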

Other resources on Numpy:
- Reference manual: https://docs.scipy.org/doc/numpy-1.13.0/reference/
- Python for Data Analysis, Wes McKinney
- Python Data Science Handbook, Jake VanderPlas


# Introduction

**ndarrays** are multi-dimensional arrays optimized for fast and efficient computation.


## Creating a Vector (1-dimensional Array)

In [1]:
import numpy as np
np.__version__

'1.15.4'

In [2]:
# Create a 1-dimensional array
an_array = np.array([3, 33, 333])

print(type(an_array))

<class 'numpy.ndarray'>


In [3]:
# Check the shape of the array
print(an_array.shape)

(3,)


In [4]:
# Access the array with a single index, since it has 1 dimension
print(an_array[0], an_array[1], an_array[2]) 

3 33 333


In [5]:
# ndarrays are mutable: we can modify the array's elements
an_array[0] = 888

print(an_array)

[888  33 333]


## Creating a Matrix (2-dimensional Array)

A 2-dimensional array is a matrix, and NumPy provides the usual matrix operations for it (transpose, matrix product, inverse and so on).
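A minimal sketch of some of those operations, assuming an illustrative 2x2 matrix `m` chosen so its determinant is easy to check by hand:

```python
import numpy as np

m = np.array([[2.0, 1.0],
              [1.0, 3.0]])

print(m.T)                 # transpose
print(m @ m)               # matrix product (not element-wise)
det = np.linalg.det(m)     # determinant, mathematically 2*3 - 1*1 = 5
inv = np.linalg.inv(m)     # inverse matrix
print(det)
print(inv @ m)             # approximately the identity matrix
```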

In [6]:
# Create a matrix
another = np.array([[11,12,13],[21,22,23]])

print(another)

print("Shape of the matrix (rows, columns):", another.shape)

print("Accessing elements [0,0], [0,1] and [1,0]: ", another[0,0], ", ", another[0,1], ", ", another[1,0])

[[11 12 13]
 [21 22 23]]
Shape of the matrix (rows, columns): (2, 3)
Accessing elements [0,0], [0,1] and [1,0]:  11 ,  12 ,  21



## Creating Arrays

NumPy provides several functions for creating arrays.
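Besides the functions used in the cells below, `np.arange`, `np.linspace` and `np.zeros_like` are also common. A short sketch with illustrative values:

```python
import numpy as np

a = np.arange(0, 10, 2)      # start, stop (excluded), step
b = np.linspace(0, 1, 5)     # 5 evenly spaced values, both ends included
c = np.zeros_like(a)         # zeros with the same shape and dtype as a

print(a)  # [0 2 4 6 8]
print(b)  # [0.   0.25 0.5  0.75 1.  ]
print(c)  # [0 0 0 0 0]
```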

In [7]:
import numpy as np

# Create a 2x2 array of zeros
ex1 = np.zeros((2,2))      
print(ex1)                              

[[0. 0.]
 [0. 0.]]


In [8]:
# Create a 2x2 array filled with 9.0
ex2 = np.full((2,2), 9.0)  
print(ex2)   

[[9. 9.]
 [9. 9.]]


In [9]:
# Create an identity matrix (ones on the diagonal, zeros elsewhere)
ex3 = np.eye(2,2)
print(ex3)  

[[1. 0.]
 [0. 1.]]


In [10]:
# Create an array of ones
ex4 = np.ones((1,2))
print(ex4)    

[[1. 1.]]


In [11]:
# The array ex4 we created has shape 1x2
print(ex4.shape)

# So we must access its elements with 2 indices
print()
print(ex4[0,1])

(1, 2)

1.0


In [12]:
# Create an array of random numbers in the interval [0, 1)
ex5 = np.random.random((2,3))
print(ex5)    

[[0.17428694 0.58665458 0.90320157]
 [0.30896931 0.25503498 0.64074487]]


## Array Indexing

Indexing lets us extract sub-arrays from an array.

In [13]:
import numpy as np

# Create a 3 x 4 matrix
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [14]:
# Extract a 2 x 2 subarray: rows 0-1, columns 1-2
a_slice = an_array[:2, 1:3]
print(a_slice)

[[12 13]
 [22 23]]


In [15]:
# Modifying a subarray (a slice) also modifies the original array
print("Before:", an_array[0, 1]) 
a_slice[0, 0] = 1000
print("After:", an_array[0, 1])
print(an_array)

Before: 12
After: 1000
[[  11 1000   13   14]
 [  21   22   23   24]
 [  31   32   33   34]]


In [16]:
# Reset the whole first row to 12
an_array[0:1] = 12
# np.array() copies the slice into a new, independent array
a_slice = np.array(an_array[:2, 1:3])
print(a_slice)

[[12 12]
 [22 23]]


In [17]:
# Modifying a copy does not modify the original array
print("Before:", an_array[0, 1]) 
a_slice[0, 0] = 1000
print("After:", an_array[0, 1])
print(an_array)

Before: 12
After: 12
[[12 12 12 12]
 [21 22 23 24]
 [31 32 33 34]]
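Wrapping a slice in `np.array(...)`, as above, produces a copy; the more explicit idiom is the `.copy()` method, and `np.shares_memory` reports whether two arrays overlap in memory. A small sketch with an illustrative array `base`:

```python
import numpy as np

base = np.arange(12).reshape(3, 4)

view = base[:2, 1:3]              # basic slicing returns a view
sub_copy = base[:2, 1:3].copy()   # explicit, independent copy

print(np.shares_memory(base, view))      # True
print(np.shares_memory(base, sub_copy))  # False

sub_copy[0, 0] = -1               # does not touch base
print(base[0, 1])                 # 1
```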


Indexing also lets us extract rows or columns from an array.

In [18]:
# Create a 3 x 4 matrix
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [19]:
# Get a row of the array as a 1-D vector of length 4
row_rank1 = an_array[1, :]

print(row_rank1, row_rank1.shape)
print(row_rank1[2])

[21 22 23 24] (4,)
23


In [20]:
# Get a row of the array as a 1 x 4 matrix
row_rank2 = an_array[1:2, :]

print(row_rank2, row_rank2.shape)
print(row_rank2[0,2])

[[21 22 23 24]] (1, 4)
23


In [21]:
# Get a column of the array, as a 1-D vector and as a 3 x 1 matrix
print()
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]

print(col_rank1, col_rank1.shape)
print()
print(col_rank2, col_rank2.shape)


[12 22 32] (3,)

[[12]
 [22]
 [32]] (3, 1)


## Advanced Indexing with Index Arrays

An array of indices lets us access or modify arbitrary elements of an array.
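One point worth noting: unlike slicing, indexing with an integer array returns a copy, so modifying the result does not change the original. An in-place assignment applied directly to the indexed original, like the `+= 100000` used below, does write back. A sketch with an illustrative array:

```python
import numpy as np

a = np.array([[11, 12], [21, 22], [31, 32]])

rows = a[[0, 2]]        # integer-array indexing: rows 0 and 2, as a copy
rows[0, 0] = 999        # modifies only the copy

print(rows)             # [[999  12]
                        #  [ 31  32]]
print(a[0, 0])          # 11 (the original is unchanged)
```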

In [22]:
# Create a new array
an_array = np.array([[11,12,13], [21,22,23], [31,32,33], [41,42,43]])

print('Array:')
print(an_array)

Array:
[[11 12 13]
 [21 22 23]
 [31 32 33]
 [41 42 43]]


In [23]:
# Create the index arrays
col_indices = np.array([0, 1, 2, 2])
print('\nColumn indices: ', col_indices)

row_indices = np.arange(4)
print('\nRow indices: ', row_indices)


Column indices:  [0 1 2 2]

Row indices:  [0 1 2 3]


In [24]:
# Print the (row, column) pairs
for row,col in zip(row_indices,col_indices):
    print(row, ", ",col)

0 ,  0
1 ,  1
2 ,  2
3 ,  2


In [25]:
# Read the selected values
print('Selected values: ', an_array[row_indices, col_indices])

Selected values:  [11 22 33 43]


In [26]:
# Modify the selected values
an_array[row_indices, col_indices] += 100000

print('\nArray:')
print(an_array)


Array:
[[100011     12     13]
 [    21 100022     23]
 [    31     32 100033]
 [    41     42 100043]]


## Advanced Indexing with Boolean Arrays

A boolean array (mask) of the same shape lets us access or modify the elements where the mask is True.
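Conditions are combined element-wise with `&` (and), `|` (or) and `~` (not), each wrapped in parentheses; Python's `and`/`or` keywords raise a `ValueError` on arrays with more than one element. A sketch using an illustrative array:

```python
import numpy as np

a = np.array([[11, 12], [21, 22], [31, 32]])

low_or_high = (a < 15) | (a > 30)   # element-wise OR
not_even = ~(a % 2 == 0)            # element-wise NOT

print(a[low_or_high])   # [11 12 31 32]
print(a[not_even])      # [11 21 31]

try:
    (a < 15) and (a > 30)           # truth value of a whole array is ambiguous
except ValueError as err:
    print("ValueError:", err)
```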

In [27]:
# Create a 3x2 array
an_array = np.array([[11,12], [21, 22], [31, 32]])
print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [28]:
# Create a filter (a boolean array)
filter = (an_array < 15)
filter

array([[ True,  True],
       [False, False],
       [False, False]])

In [29]:
# Select the values where the filter is True
print(an_array[filter])

[11 12]


In [30]:
# Create a filter combining two conditions
filter = ((an_array > 20) & (an_array < 30))
filter

array([[False, False],
       [ True,  True],
       [False, False]])

In [31]:
# Select the values where the filter is True
print(an_array[filter])

[21 22]


In [32]:
# Select the even values of the array
an_array[(an_array % 2 == 0)]

array([12, 22, 32])

In [33]:
# Modify the even values of the array
an_array[an_array % 2 == 0] +=100
print(an_array)

[[ 11 112]
 [ 21 122]
 [ 31 132]]


# Data Types and Array Operations

## Data Types

An **ndarray** has a single data type (dtype) shared by all of its elements.
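When the input mixes types, NumPy upcasts all elements to a common dtype, and `astype` converts an existing array to another dtype. A sketch with illustrative values:

```python
import numpy as np

mixed = np.array([1, 2.5, 3])       # ints and a float are upcast to float64
print(mixed.dtype)                  # float64

as_int = mixed.astype(np.int64)     # explicit conversion, truncates toward zero
print(as_int)                       # [1 2 3]
print(as_int.dtype)                 # int64
```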

In [34]:
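# The default integer dtype depends on the platform (int32 in this run; int64 on most Linux/macOS systems)
ex1 = np.array([11, 12])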
print(ex1.dtype)

int32


In [35]:
ex2 = np.array([11.0, 12.0])
print(ex2.dtype)

float64


In [36]:
# Explicitly set the data type
ex3 = np.array([11, 21], dtype=np.int64)
print(ex3.dtype)

int64


In [37]:
# Explicitly set an integer type: the floats are truncated to integers
ex4 = np.array([11.1,12.7], dtype=np.int64)
print(ex4.dtype)
print()
print(ex4)

int64

[11 12]


In [38]:
# Explicitly set a float type: the integers are converted to floats
ex5 = np.array([11, 21], dtype=np.float64)
print(ex5.dtype)
print()
print(ex5)

float64

[11. 21.]


# Arithmetic Operations with Arrays

In [39]:
# Create 2 arrays (np.int was removed in NumPy 1.24; the built-in int works instead)
x = np.array([[111,112],[121,122]], dtype=int)
y = np.array([[211.1,212.1],[221.1,222.1]], dtype=np.float64)

print(x)
print()
print(y)

[[111 112]
 [121 122]]

[[211.1 212.1]
 [221.1 222.1]]


In [40]:
# Addition
print(x + y)
print()
print(np.add(x, y))

[[322.1 324.1]
 [342.1 344.1]]

[[322.1 324.1]
 [342.1 344.1]]


In [41]:
# Subtraction
print(x - y)
print()
print(np.subtract(x, y))

[[-100.1 -100.1]
 [-100.1 -100.1]]

[[-100.1 -100.1]
 [-100.1 -100.1]]


In [42]:
# Multiplication (element-wise)
print(x * y)
print()
print(np.multiply(x, y))

[[23432.1 23755.2]
 [26753.1 27096.2]]

[[23432.1 23755.2]
 [26753.1 27096.2]]


In [43]:
# Division (element-wise)
print(x / y)
print()
print(np.divide(x, y))

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]
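With integer arrays, `/` always performs true division and returns floats, whereas `//` performs floor division and keeps an integer dtype. A sketch with illustrative values:

```python
import numpy as np

a = np.array([7, 8, 9])
b = np.array([2, 2, 2])

print(a / b)    # [3.5 4.  4.5]
print(a // b)   # [3 4 4]
print(a % b)    # [1 0 1]
```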


In [44]:
# Square root
print(np.sqrt(x))

[[10.53565375 10.58300524]
 [11.         11.04536102]]


In [45]:
# Exponential (e ** x)
print(np.exp(x))

[[1.60948707e+48 4.37503945e+48]
 [3.54513118e+52 9.63666567e+52]]


# Statistical Operations with Arrays

In [46]:
# Create a random 2 x 5 matrix (standard normal values scaled by 10)
arr = 10 * np.random.randn(2,5)
print(arr)

[[ -0.43835135  10.63793962  12.10990923  -2.60476339  -4.15761084]
 [-10.45503317   1.70937009  -8.23120121  20.82428815 -13.01083837]]


In [47]:
# Compute the mean
print(arr.mean())

0.638370874564124


In [48]:
# Compute the mean of each row
print(arr.mean(axis = 1))

[ 3.10942465 -1.8326829 ]


In [49]:
# Compute the mean of each column
print(arr.mean(axis = 0))

[-5.44669226  6.17365485  1.93935401  9.10976238 -8.58422461]


In [50]:
# Compute the sum
print(arr.sum())

6.38370874564124


In [51]:
# Compute the median of each row
print(np.median(arr, axis = 1))

[-0.43835135 -8.23120121]
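Other common reductions accept the same `axis` argument. A sketch on a small fixed matrix (illustrative values) so the results can be checked by hand:

```python
import numpy as np

m = np.array([[4, 9, 2],
              [3, 5, 7]])

print(m.min(), m.max())     # 2 9
print(m.max(axis=0))        # column-wise maximum: [4 9 7]
print(m.argmax(axis=1))     # index of the maximum in each row: [1 2]
print(m.std())              # population standard deviation (ddof=0)
```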


# Sorting Arrays

In [52]:
# Create an array of 10 random elements
unsorted = np.random.randn(10)

print(unsorted)

[-0.59487471  1.32891422 -0.06836227  2.00925945 -3.02010785 -0.17603435
 -1.3013887   0.57805793 -0.52748747 -0.01819598]


In [53]:
# Make a copy and sort it in place (named sorted_array to avoid shadowing the built-in sorted)
sorted_array = np.array(unsorted)
sorted_array.sort()

print(sorted_array)
print()
print(unsorted)

[-3.02010785 -1.3013887  -0.59487471 -0.52748747 -0.17603435 -0.06836227
 -0.01819598  0.57805793  1.32891422  2.00925945]

[-0.59487471  1.32891422 -0.06836227  2.00925945 -3.02010785 -0.17603435
 -1.3013887   0.57805793 -0.52748747 -0.01819598]


In [54]:
# Sort in place (modifies unsorted itself)
unsorted.sort() 

print(unsorted)

[-3.02010785 -1.3013887  -0.59487471 -0.52748747 -0.17603435 -0.06836227
 -0.01819598  0.57805793  1.32891422  2.00925945]
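Instead of copying and then calling `.sort()`, `np.sort` returns a sorted copy directly, and `np.argsort` returns the indices that would sort the array. A sketch with illustrative values:

```python
import numpy as np

values = np.array([3.0, -1.0, 2.0])

print(np.sort(values))         # [-1.  2.  3.]  (values itself is unchanged)
print(np.argsort(values))      # [1 2 0]
print(np.sort(values)[::-1])   # descending order: [ 3.  2. -1.]
```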


# Finding Unique Values in an Array

In [55]:
array = np.array([1,2,1,4,2,1,4,2])

print(np.unique(array))

[1 2 4]
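`np.unique` can also report how many times each value appears through its `return_counts` parameter. A sketch on the same values:

```python
import numpy as np

data = np.array([1, 2, 1, 4, 2, 1, 4, 2])
values, counts = np.unique(data, return_counts=True)

print(values)   # [1 2 4]
print(counts)   # [3 3 2]
```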


# Set Operations with Arrays

In [56]:
s1 = np.array(['desk','chair','bulb'])
s2 = np.array(['lamp','bulb','chair'])
print(s1, s2)

['desk' 'chair' 'bulb'] ['lamp' 'bulb' 'chair']


In [57]:
print( np.intersect1d(s1, s2) ) 

['bulb' 'chair']


In [58]:
print( np.union1d(s1, s2) )

['bulb' 'chair' 'desk' 'lamp']


In [59]:
# Elements of s2 that are not in s1
print( np.setdiff1d(s2, s1) )

['lamp']


In [60]:
# Which elements of s1 are in s2? (np.isin supersedes the deprecated np.in1d)
print( np.isin(s1, s2) )

[False  True  True]
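Another set operation, `np.setxor1d`, returns the sorted elements that appear in exactly one of the two arrays. A sketch with the same `s1` and `s2`:

```python
import numpy as np

s1 = np.array(['desk', 'chair', 'bulb'])
s2 = np.array(['lamp', 'bulb', 'chair'])

print(np.setxor1d(s1, s2))   # ['desk' 'lamp']
```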


# *Broadcasting*: expanding arrays to compatible shapes

Reference manual: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html

In [61]:
import numpy as np

start = np.zeros((4,3))
print(start)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [62]:
# Create a row with 3 values
add_rows = np.array([1, 0, 2])
print(add_rows)

[1 0 2]


In [63]:
# Add the row to every row of start
y = start + add_rows
print(y)

[[1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]]


In [64]:
# Create a column with 4 values
add_cols = np.array([[0,1,2,3]])
print(add_cols)
# Transpose the 1x4 matrix into a 4x1 column
add_cols = add_cols.T
print(add_cols)

[[0 1 2 3]]
[[0]
 [1]
 [2]
 [3]]


In [65]:
# Add the column to every column of start
y = start + add_cols 
print(y)

[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]]


In [66]:
# Add a scalar (a 1-element array broadcasts like a scalar)
add_scalar = np.array([1])  
print(start+add_scalar)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
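The examples above follow NumPy's broadcasting rule: shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. The sketch below reuses the `(4, 3)` shape; `np.newaxis` is shown as an alternative to the transpose for turning a 1-D array into a column, and the last case is an incompatible pair of shapes:

```python
import numpy as np

start = np.zeros((4, 3))

# A (4,) vector used as a column: a new axis gives it shape (4, 1)
col = np.array([0, 1, 2, 3])[:, np.newaxis]
print((start + col).shape)          # (4, 3)

# (4, 3) and (4,) are incompatible: the trailing dimensions 3 and 4 differ
try:
    start + np.array([0, 1, 2, 3])
except ValueError as err:
    print("ValueError:", err)
```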


# Speed Test: ndarrays vs lists

In [67]:
# Import Modules
from numpy import arange

# Define parameters
size    = 1000000

In [68]:
# Create an array with the values 0,1,2,...,size-1
nd_array = arange(size)
print( type(nd_array) )

<class 'numpy.ndarray'>


In [69]:
# Sum the elements of the array
%timeit nd_array.sum()

506 µs ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [70]:
# Create a list with the values 0,1,2,...,size-1
a_list = list(range(size))
print(type(a_list))

<class 'list'>


In [71]:
# Sum the elements of the list
%timeit sum(a_list)

29.2 ms ± 3.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
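`%timeit` is an IPython magic and only works inside IPython/Jupyter. In a plain Python script, the standard-library `timeit` module gives a comparable measurement; a minimal sketch (timings vary by machine, so none are shown):

```python
import timeit
import numpy as np

size = 1_000_000
nd_array = np.arange(size)
a_list = list(range(size))

# Total seconds for 10 repetitions of each sum
t_numpy = timeit.timeit(nd_array.sum, number=10)
t_list = timeit.timeit(lambda: sum(a_list), number=10)

print(f"ndarray.sum: {t_numpy:.4f} s, sum(list): {t_list:.4f} s")
```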


# Reading/Writing Data to Files

## Binary Format

In [72]:
x = np.array([ 23.23, 24.24] )

In [73]:
np.save('an_array', x)

In [74]:
np.load('an_array.npy')

array([23.23, 24.24])

## Text Format

In [75]:
np.savetxt('array.txt', X=x, delimiter=',')

In [76]:
# Print the file contents (a portable alternative to the shell command `more`, which failed here on Windows)
with open('array.txt') as f:
    print(f.read())


In [77]:
np.loadtxt('array.txt', delimiter=',')

array([23.23, 24.24])
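To store several arrays in one file, `np.savez` writes an `.npz` archive whose arrays are retrieved by keyword. The sketch writes to a temporary directory so it leaves no files behind; the keys `first` and `second` are illustrative:

```python
import os
import tempfile
import numpy as np

x = np.array([23.23, 24.24])
y = np.arange(3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")
    np.savez(path, first=x, second=y)     # one file, several named arrays
    with np.load(path) as data:           # the returned NpzFile is a context manager
        loaded_x = data["first"]
        loaded_y = data["second"]

print(loaded_x, loaded_y)   # [23.23 24.24] [0 1 2]
```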

# Other Array Operations

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Dot Product on Matrices and Inner Product on Vectors:

</p>

In [78]:
# determine the dot product of two matrices
x2d = np.array([[1,1],[1,1]])
y2d = np.array([[2,2],[2,2]])

print(x2d.dot(y2d))
print()
print(np.dot(x2d, y2d))

[[4 4]
 [4 4]]

[[4 4]
 [4 4]]


In [79]:
# determine the inner product of two vectors
a1d = np.array([9 , 9 ])
b1d = np.array([10, 10])

print(a1d.dot(b1d))
print()
print(np.dot(a1d, b1d))

180

180


In [80]:
# dot product of a matrix and a vector
print(x2d.dot(a1d))
print()
print(np.dot(x2d, a1d))

[18 18]

[18 18]
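For 2-D and 1-D arrays like these, the `@` operator (Python 3.5+, equivalent to `np.matmul` here) gives the same results as `dot`. A sketch recreating the same arrays:

```python
import numpy as np

x2d = np.array([[1, 1], [1, 1]])
y2d = np.array([[2, 2], [2, 2]])
a1d = np.array([9, 9])
b1d = np.array([10, 10])

print(x2d @ y2d)    # [[4 4]
                    #  [4 4]]
print(a1d @ b1d)    # 180
print(x2d @ a1d)    # [18 18]
```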


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Sum:
</p>

In [81]:
# sum elements in the array
ex1 = np.array([[11,12],[21,22]])

print(np.sum(ex1))          # add all members

66


In [82]:
print(np.sum(ex1, axis=0))  # columnwise sum

[32 34]


In [83]:
print(np.sum(ex1, axis=1))  # rowwise sum

[23 43]


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Element-wise Functions: </p>

For example, let's compare two arrays element by element to get the larger value at each position.

In [84]:
# random array
x = np.random.randn(8)
x

array([ 0.09564294,  1.46713357, -0.26315612, -1.15802877, -0.57740535,
       -0.14820351,  0.29848076,  0.25095652])

In [85]:
# another random array
y = np.random.randn(8)
y

array([-1.4596354 ,  0.76100875, -0.05287518,  1.08407807, -0.4441135 ,
        0.93863326,  1.32119617,  0.69704281])

In [86]:
# returns element wise maximum between two arrays

np.maximum(x, y)

array([ 0.09564294,  1.46713357, -0.05287518,  1.08407807, -0.4441135 ,
        0.93863326,  1.32119617,  0.69704281])

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Reshaping array:
</p>

In [87]:
# grab values from 0 through 19 in an array
arr = np.arange(20)
print(arr)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [88]:
# reshape to be a 4 x 5 matrix
arr.reshape(4,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
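`reshape` does not change `arr` itself: it returns a new array (a view when possible), so the result must be stored in a variable. One dimension may be given as `-1`, and NumPy infers it from the total size. A sketch:

```python
import numpy as np

arr = np.arange(20)
matrix = arr.reshape(4, -1)   # -1 is inferred as 20 / 4 = 5
print(matrix.shape)           # (4, 5)
print(arr.shape)              # (20,), the original is unchanged
```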

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Transpose:

</p>

In [89]:
# transpose
ex1 = np.array([[11,12],[21,22]])

ex1.T

array([[11, 21],
       [12, 22]])

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Indexing using where():</p>

In [90]:
x_1 = np.array([1,2,3,4,5])

y_1 = np.array([11,22,33,44,55])

filter = np.array([True, False, True, False, True])

In [91]:
out = np.where(filter, x_1, y_1)
print(out)

[ 1 22  3 44  5]


In [92]:
mat = np.random.rand(5,5)
mat

array([[0.85690904, 0.33159364, 0.03051342, 0.41956247, 0.65452244],
       [0.9390225 , 0.36825302, 0.97942977, 0.57279563, 0.14822629],
       [0.62932527, 0.01381795, 0.70992861, 0.98146379, 0.00764663],
       [0.5015502 , 0.78129893, 0.55716885, 0.42771044, 0.71763319],
       [0.71836609, 0.22656518, 0.5318114 , 0.04364471, 0.67703102]])

In [93]:
np.where( mat > 0.5, 1000, -1)

array([[1000,   -1,   -1,   -1, 1000],
       [1000,   -1, 1000, 1000,   -1],
       [1000,   -1, 1000, 1000,   -1],
       [1000, 1000, 1000,   -1, 1000],
       [1000,   -1, 1000,   -1, 1000]])
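Called with only the condition, `np.where` returns the indices where the condition is true, as a tuple with one index array per dimension. A sketch with an illustrative 1-D array:

```python
import numpy as np

v = np.array([0.2, 0.7, 0.1, 0.9])
idx = np.where(v > 0.5)     # tuple with one index array per dimension
print(idx[0])               # [1 3]
print(v[idx])               # [0.7 0.9]
```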

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

"any" or "all" conditionals:</p>

In [94]:
arr_bools = np.array([False , False, False, True, False ])

In [95]:
print(arr_bools.any())

True


In [96]:
arr_bools.all()

False
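`any` and `all` also accept an `axis` argument to test each column or row separately. A sketch with an illustrative boolean matrix:

```python
import numpy as np

m = np.array([[True, False],
              [True, True]])

print(m.all(axis=0))   # per column, are all values True? [ True False]
print(m.any(axis=1))   # per row, is any value True?      [ True  True]
```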

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Random Number Generation:
</p>

In [97]:
Y = np.random.normal(size = (3,5))
print(Y)

[[ 0.06709616  0.11019443 -0.28133371 -0.13706717 -1.24934365]
 [ 0.58675697  2.94742275 -0.09274158 -0.79129189 -1.0930597 ]
 [-0.04229955 -0.4073905  -0.19934833  0.29167681 -0.80552946]]


In [98]:
Z = np.random.randint(low=2,high=50,size=12)
print(Z)

[48 36 26 30 33  5 25 28 25  8 20  7]


In [99]:
np.random.permutation(Z) # returns a shuffled copy of Z; Z itself is unchanged

array([ 8,  5, 26, 25, 36, 25, 48,  7, 20, 33, 28, 30])

In [100]:
np.random.uniform(size=4) #uniform distribution

array([0.45767137, 0.6033951 , 0.50772943, 0.22577735])

In [101]:
np.random.normal(size=4) #normal distribution

array([-0.9100793 , -0.48630962, -0.11909927, -0.85066862])
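Since NumPy 1.17, the recommended interface is a `Generator` created with `np.random.default_rng`; passing a seed makes the results reproducible. A minimal sketch (the seed 42 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)               # seeded generator

normal = rng.normal(size=(3, 5))              # like np.random.normal
ints = rng.integers(low=2, high=50, size=12)  # like np.random.randint (high excluded)
uniform = rng.random(4)                       # uniform values in [0, 1)

print(normal.shape, ints, uniform)
```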

## numpy.random vs random

According to Python for Data Analysis, the numpy.random module supplements Python's built-in random module with functions for efficiently generating whole arrays of sample values from many kinds of probability distributions.

By contrast, Python's built-in random module samples only one value at a time, whereas numpy.random can generate very large samples much faster. Using the IPython magic function %timeit, we can compare the two modules:

In [102]:
from random import normalvariate
N = 1000000

print("Time using Python random:")
%timeit samples = [normalvariate(0, 1) for i in range(N)]

print("Time using Numpy random:")
%timeit np.random.normal(size=N)


Time using Python random:
1.16 s ± 262 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Time using Numpy random:
66.6 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Merging data sets:
</p>

In [103]:
K = np.random.randint(low=2,high=50,size=(2,2))
print(K)

print()
M = np.random.randint(low=2,high=50,size=(2,2))
print(M)

[[44 22]
 [ 6  8]]

[[39 12]
 [48 33]]


In [104]:
np.vstack((K,M))

array([[44, 22],
       [ 6,  8],
       [39, 12],
       [48, 33]])

In [105]:
np.hstack((K,M))

array([[44, 22, 39, 12],
       [ 6,  8, 48, 33]])

In [106]:
np.concatenate([K, M], axis = 0)

array([[44, 22],
       [ 6,  8],
       [39, 12],
       [48, 33]])

In [107]:
np.concatenate([K, M], axis = 1)

array([[44, 22, 39, 12],
       [ 6,  8, 48, 33]])
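`vstack`, `hstack` and `concatenate` join arrays along an existing axis. To join them along a new axis instead, `np.stack` can be used; a sketch with fixed illustrative matrices:

```python
import numpy as np

K = np.array([[44, 22], [6, 8]])
M = np.array([[39, 12], [48, 33]])

stacked = np.stack([K, M])   # new leading axis: shape (2, 2, 2)
print(stacked.shape)         # (2, 2, 2)
print(stacked[1])            # [[39 12]
                             #  [48 33]]
```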