# Numpy (numerical arrays for numeric computation)

NumPy is the fundamental *Python* package for scientific computing.
Its most used object is the multidimensional array.
These arrays can have any number of dimensions and are stored efficiently in the computer's RAM, which makes the data easy to handle and pass to other libraries. Furthermore, most of NumPy is implemented in C, which makes it fast.


## Multidimensional arrays

This is how `numpy` is usually imported and used to create a `numpy` array:

In [1]:
import numpy as np

In [2]:
data = [1, 10 , 2, 3, 8.0] # data is a list
a = np.array(data) # a is now a numpy array

In [11]:
type(data[0])

int

In [4]:
a

array([ 1., 10.,  2.,  3.,  8.])

This gives the shape of the array

In [5]:
a.shape

(5,)

the number of dimensions

In [6]:
a.ndim

1

the number of elements

In [7]:
a.size

5

the number of bytes

In [8]:
a.nbytes

40

The attribute `dtype` describes the element data type

In [9]:
a.dtype

dtype('float64')
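The attributes above are related to each other: the total number of bytes is the number of elements times the size of one element. The short sketch below checks this with the `itemsize` attribute, which is not used elsewhere in this section but is a standard array attribute.

```python
import numpy as np

a = np.array([1, 10, 2, 3, 8.0])  # mixed ints and floats become float64

# itemsize is the size in bytes of a single element (8 for float64)
bytes_per_element = a.itemsize

# nbytes should equal the number of elements times the element size
total_bytes = a.size * a.itemsize

print(a.dtype, bytes_per_element, total_bytes, a.nbytes)
```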

## Creating new arrays

Arrays can be created with nested lists

In [12]:
data = [[0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0]]
b = np.array(data)

In [13]:
b

array([[0., 2., 4., 6.],
       [1., 3., 5., 7.]])

In [14]:
b.shape, b.ndim, b.size, b.nbytes

((2, 4), 2, 8, 64)

The function `arange` is similar to `range` but it creates an array and not a list

In [15]:
c = np.arange(10) 
c

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The function `linspace` creates equally spaced points over an interval, with both endpoints included

In [22]:
e = np.linspace(0.0, 10, 21) # 21 points, both endpoints included
e

array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,
        5.5,  6. ,  6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5, 10. ])
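The difference between `arange` and `linspace` can be easy to miss: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the endpoint by default. A minimal comparison, with variable names chosen only for illustration:

```python
import numpy as np

# arange: start, stop (excluded), step
step_based = np.arange(0.0, 10.0, 0.5)

# linspace: start, stop (included), number of points
count_based = np.linspace(0.0, 10.0, 21)

print(len(step_based), step_based[-1])    # last value is below the stop
print(len(count_based), count_based[-1])  # last value is the stop itself
```

For floating point steps `linspace` is usually the safer choice, because the number of points is stated explicitly instead of being derived from a rounded division.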

Similar to MATLAB, there are also functions like `empty`, `zeros` and `ones`.

In [27]:
np.empty?
np.empty((4,4), dtype = object, order= 'F')

array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)

In [28]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [29]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
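These constructors all take the shape as a tuple and accept an optional `dtype`. The sketch below shows that, plus `np.full`, which is not mentioned above but follows the same pattern for an arbitrary fill value.

```python
import numpy as np

# zeros with an explicit integer dtype instead of the default float64
counts = np.zeros((2, 3), dtype=np.int64)

# ones can be scaled to fill an array with any constant
sevens = 7.0 * np.ones((2, 3))

# np.full does the same in a single call
also_sevens = np.full((2, 3), 7.0)

print(counts)
print(sevens)
print(np.array_equal(sevens, also_sevens))
```

Note that `empty` only reserves memory; for numeric dtypes its contents are whatever happened to be there, so it should be filled before being read.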

## dtype

`dtype` (short for data type) is the attribute that holds the data type shared by all elements of the array. 
This type is usually inferred from the input, but it can be set explicitly when the array is created.

For instance, this array is implicitly given an integer `dtype`:

In [30]:
a = np.array([0, 1, 2, 3])

In [31]:
a, a.dtype

(array([0, 1, 2, 3]), dtype('int64'))

But you could force the creation of a complex array

In [32]:
b = np.zeros((2,2), dtype=np.complex64)
b

array([[0.+0.j, 0.+0.j],
       [0.+0.j, 0.+0.j]], dtype=complex64)

or a float array

In [33]:
c = np.arange(0, 10, 2, dtype=np.float64)  # np.float was removed in NumPy 1.24
c

array([0., 2., 4., 6., 8.])
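Besides setting the `dtype` at creation time, an existing array can be converted with the `astype` method. This is a small sketch under the assumption that a plain copy with a new type is what is wanted; note that converting floats to integers drops the fractional part.

```python
import numpy as np

ints = np.array([0, 1, 2, 3])                       # integer dtype is inferred
floats = np.array([0, 1, 2, 3], dtype=np.float64)   # same values, forced to float

# astype returns a new array with the requested dtype
converted = ints.astype(np.float64)

# float -> int conversion truncates toward zero
truncated = np.array([0.5, 1.7, 2.9]).astype(np.int64)

print(floats.dtype, converted.dtype)
print(truncated)
```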

## Operations over arrays

Mathematical operations can be performed over the whole array without running a `for` loop.

For instance

In [34]:
a = np.linspace(0.0, 10.0, 5)
print('a =', a)

b = np.ones(5)
print('b =',b)

a = [ 0.   2.5  5.   7.5 10. ]
b = [1. 1. 1. 1. 1.]


In [35]:
a * 2 # every element in the array is multiplied by 2

array([ 0.,  5., 10., 15., 20.])

In [36]:
a + b   # addition works element by element; the same goes for the other arithmetic operators

array([ 1. ,  3.5,  6. ,  8.5, 11. ])
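To make the "no `for` loop" point concrete, the sketch below computes the same expression twice, once with an explicit loop and once with whole-array operations, and checks that the results agree. The variable names are illustrative only.

```python
import numpy as np

a = np.linspace(0.0, 10.0, 5)
b = np.ones(5)

# explicit loop: one element at a time
looped = np.empty_like(a)
for i in range(a.size):
    looped[i] = a[i] * 2 + b[i]

# whole-array expression: same arithmetic, no Python loop
vectorized = a * 2 + b

print(looped)
print(vectorized)
print(np.array_equal(looped, vectorized))
```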

## Slicing

Slicing also works on arrays, and it can be applied along every dimension

In [37]:
a = np.random.rand(5, 5)  # this creates a two-dimensional array of random numbers

In [38]:
print(a)

[[0.47465917 0.89313399 0.53382838 0.36268273 0.15642838]
 [0.01453137 0.31812961 0.90007059 0.76576755 0.32663578]
 [0.76700724 0.17550305 0.15686667 0.49043175 0.17920973]
 [0.13711315 0.09515501 0.64640762 0.77995367 0.27745355]
 [0.63212735 0.74732798 0.35758856 0.31968123 0.71327964]]


Each dimension has its own index

In [39]:
print(a[0,0], a[0,1]) # the first index corresponds to rows, the second to columns

0.4746591729515499 0.8931339885355809


To extract the values of a whole column, the following syntax can be used

In [40]:
a[:,0] # this is the first column

array([0.47465917, 0.01453137, 0.76700724, 0.13711315, 0.63212735])

The last row could be extracted as follows

In [41]:
a[-1,:] #this is the last row

array([0.63212735, 0.74732798, 0.35758856, 0.31968123, 0.71327964])

Slicing also works with ranges

In [42]:
a[0:2,0:3]

array([[0.47465917, 0.89313399, 0.53382838],
       [0.01453137, 0.31812961, 0.90007059]])

Assignment also works with slicing

In [43]:
a[0:2,0:3] = -4.0

In [44]:
a

array([[-4.        , -4.        , -4.        ,  0.36268273,  0.15642838],
       [-4.        , -4.        , -4.        ,  0.76576755,  0.32663578],
       [ 0.76700724,  0.17550305,  0.15686667,  0.49043175,  0.17920973],
       [ 0.13711315,  0.09515501,  0.64640762,  0.77995367,  0.27745355],
       [ 0.63212735,  0.74732798,  0.35758856,  0.31968123,  0.71327964]])
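Two details of slicing are worth checking on a small array with known values: which index selects rows and which selects columns, and the fact that a basic slice is a view, so assigning through it changes the original array. The sketch below uses `np.arange(12).reshape(3, 4)` (the `reshape` and `shares_memory` calls are not introduced in this section) so that the values are predictable.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

# copies are taken so these values survive the assignment below
second_row = grid[1, :].copy()      # first index fixed -> a row
second_column = grid[:, 1].copy()   # second index fixed -> a column

# a basic slice is a view: writing to it modifies grid itself
block = grid[0:2, 0:3]
block[:] = -4

print(second_row)
print(second_column)
print(grid)
```

When an independent array is needed, calling `.copy()` on the slice, as done for `second_row` above, avoids modifying the original by accident.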

### Exercise 1.1

Create a two-dimensional array of random numbers with shape (4,8).

First, set the last column to `-1` and then set the second column to `2`.

In [83]:
b = np.random.rand(4,8)
b[:,-1] = -1
b[:,1] = 2
print(b)

[[ 0.74204569  2.          0.95519555  0.07416258  0.33215016  0.02811318
   0.35971768 -1.        ]
 [ 0.0805176   2.          0.18266815  0.36758774  0.90029855  0.82092258
   0.99295061 -1.        ]
 [ 0.62711034  2.          0.4271149   0.51389525  0.84137909  0.87841082
   0.02486162 -1.        ]
 [ 0.31772356  2.          0.71819452  0.89477172  0.38786584  0.43105308
   0.49735127 -1.        ]]


## Boolean indexing

Arrays can be indexed with boolean arrays.

For instance, consider these two arrays with the age and gender of a set of 10 people:

In [45]:
age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender= np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o' ,'m', 'o'])

Suppose that we want to select only the people whose gender is marked as `'o'` (other).

The following statement creates a boolean array in which each element tells whether the condition is True or False for the corresponding person.

In [46]:
ii = (gender == 'o')
print(ii)

[False  True False False False False False  True False  True]


Now, to get the ages of the people with gender `'o'`, all we have to do is:

In [47]:
age[ii]

array([56, 12, 72])

This logic can be extended to combined conditions. For instance, let's select the items with age greater than 10 and less than 50

In [48]:
ii = (age > 10) & (age < 50) # & is the element-wise AND; the parentheses are required
print(age[ii])
print(gender[ii])

[23 23 27 12]
['m' 'm' 'm' 'o']


The following is also valid syntax

In [49]:
age[age>30]

array([56, 67, 89, 56, 72])
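Boolean masks can also be combined with `|` (element-wise OR) and negated with `~`, and the number of selected elements can be counted directly. A short sketch reusing the `age` and `gender` arrays from above; `np.count_nonzero` is not used elsewhere in this section.

```python
import numpy as np

age = np.array([23, 56, 67, 89, 23, 56, 27, 12, 2, 72])
gender = np.array(['m', 'o', 'f', 'f', 'm', 'f', 'm', 'o', 'm', 'o'])

# | is the element-wise OR: younger than 18 or older than 65
young_or_senior = (age < 18) | (age > 65)

# ~ negates a mask: everyone whose gender is not 'o'
not_other = ~(gender == 'o')

print(age[young_or_senior])
print(age[not_other])

# True counts as 1, so counting the True entries gives the number of matches
print(np.count_nonzero(young_or_senior))
```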

### Exercise 1.2

Using `a=np.random.normal(size=1000)`, generate an array of 1000 random numbers drawn from a normal (i.e. Gaussian) distribution with mean zero and standard deviation one.

Print the number of elements with values larger than `2.0`. Is this number close to what you expected from the properties of a Gaussian distribution?

In [75]:
a = np.random.normal(size=100)
print( a[a>2.0])

[2.61061766 2.75945925]
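The exercise asks for the number of elements above `2.0` and for a comparison with the Gaussian expectation. One possible way to do both is sketched below. It assumes a seeded generator from `np.random.default_rng` (instead of `np.random.normal`) so that the run is reproducible, and uses `math.erfc` to evaluate the tail probability of a standard normal; the seed value is arbitrary.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility
a = rng.normal(size=1000)            # mean 0, standard deviation 1

# number of samples above 2.0
count = np.count_nonzero(a > 2.0)

# P(X > 2) for a standard normal, via the complementary error function
expected_fraction = 0.5 * math.erfc(2.0 / math.sqrt(2.0))
expected_count = expected_fraction * a.size

print('observed:', count)
print('expected: {:.1f}'.format(expected_count))
```

The expected fraction is about 2.3%, so roughly 23 out of 1000 samples, with fluctuations of a few units from one run to the next.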


## Universal functions

Universal functions (or `ufuncs`) are functions that operate on arrays element by element and return arrays (or scalars when given scalars). They are fast (implemented in C) and allow writing simpler Python code without `for` loops. 
Here is a list of [all universal functions in numpy](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs)

For instance one could generate an array of values

In [76]:
t = np.linspace(0.0, np.pi, 10)
print(t)

[0.         0.34906585 0.6981317  1.04719755 1.3962634  1.74532925
 2.0943951  2.44346095 2.7925268  3.14159265]


and then compute the values of the `sin` function

In [77]:
print(np.sin(t))

[0.00000000e+00 3.42020143e-01 6.42787610e-01 8.66025404e-01
 9.84807753e-01 9.84807753e-01 8.66025404e-01 6.42787610e-01
 3.42020143e-01 1.22464680e-16]
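For comparison, the sketch below computes the same values with a Python loop over `math.sin` and with the `np.sin` ufunc, and then combines two ufuncs in a single expression. It is meant only as an illustration that ufuncs replace the loop and compose naturally.

```python
import math
import numpy as np

t = np.linspace(0.0, np.pi, 10)

# element-by-element with the standard library
with_loop = np.array([math.sin(x) for x in t])

# the ufunc applies sin to the whole array at once
with_ufunc = np.sin(t)

# ufuncs compose with arithmetic: sin^2 + cos^2 should be 1 everywhere
identity = np.sin(t) ** 2 + np.cos(t) ** 2

print(np.allclose(with_loop, with_ufunc))
print(identity)
```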


### Exercise 1.3

Using `a=np.random.normal(size=1000)`, generate an array of 1000 random numbers drawn from a normal (i.e. Gaussian) distribution with mean zero and standard deviation one.

Then using only `ufuncs` on `a` generate a new array `b` that is `-1` wherever `a` is negative and `1` wherever `a` is positive.

In [98]:
a=np.random.normal(size=1000)
b=np.sign(a)
print('Changed from {} to {}'.format(a,b))

Changed from [ 3.64629050e-01 -2.12231304e+00 -7.34246470e-01  5.46332270e-01
  1.23337376e-01  8.56868343e-01 -1.08149287e+00 -6.41159762e-02
  1.65693591e+00 -4.43457915e-01 -6.36872667e-02  6.03781638e-01
 -1.07545112e+00  5.67614549e-01  2.60911535e+00  4.66441012e-01
 -2.53510453e-01  1.33776324e-01 -4.60147460e-02  5.69635076e-01
  1.34490294e+00  8.10678410e-01 -4.19511008e-01  3.30404762e-01
  6.32746439e-01  8.34830780e-01  2.93556433e-01  3.74932004e-03
  6.17024602e-01 -3.51485925e-01 -2.45170765e+00 -1.06998856e+00
  7.54559726e-01  1.05697493e+00 -9.40610288e-01  5.87090833e-01
  5.73695142e-02 -1.63992526e+00 -5.57170463e-01 -1.32983873e+00
 -1.78809061e+00  9.94393788e-01 -7.71323231e-01 -1.45266109e-01
  1.01208516e+00 -6.79516565e-01 -2.81320976e-01  1.20550308e+00
  3.22807686e-02 -1.86525633e-01 -5.84437441e-02 -1.76817611e-01
  4.02689786e-01 -7.27898077e-01  1.55718405e+00 -6.76905546e-01
  3.02416715e-01  1.28222190e+00 -2.62833188e-02 -1.16350349e+00
 -1.74610167