# Chapter 4 - NumPy Basics

NumPy stands for Numerical Python; the library was built specifically for working with numeric data.

Because many of its algorithms are implemented in C, many computations can run up to 100 times faster with NumPy than with the classic built-in Python sequences.

NumPy is widely used for data analysis, statistics, etc. However, more and more data scientists are choosing pandas, which is itself built on top of NumPy, as their main tool.

One of its major advantages is its **n-dimensional array object**, which can save you, for example, many for loops.

**In my view, the biggest advantage of NumPy is being able to run numerical operations on large amounts of data quickly and simply, avoiding the use of for loops.**
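To make the point about avoiding for loops concrete, here is a minimal sketch that doubles a sequence of numbers both ways. The array size and the variable names are arbitrary choices for illustration, not something from the notes above.

```python
import numpy as np

# Illustrative data: 100,000 integers (size chosen arbitrarily)
values = np.arange(100_000)

# Pure-Python approach: an explicit loop over a list
doubled_list = [v * 2 for v in values.tolist()]

# NumPy approach: one vectorized expression, no loop in the Python code
doubled_arr = values * 2

print(doubled_arr[:5])                        # first five values: 0, 2, 4, 6, 8
print(doubled_arr.tolist() == doubled_list)   # True, both give the same result
```

Both versions compute the same values; the NumPy one simply hands the loop over to compiled code.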

## The NumPy array object

A NumPy array can be seen as a linear algebra matrix. It is fast, flexible and mutable:

In [2]:
import numpy as np 
data=np.array([[3,4,8,11],[4,5,6,1],[9,6,4,1]])

All of its elements must be the same type.

You can add arrays together or multiply them by a scalar, as we do with matrices.

Some of its most important attributes are:

In [7]:
data.shape 

(3, 4)

In [8]:
data.ndim

2

In [9]:
data.dtype

dtype('int64')

**Creating ndarrays**

You can use the following np functions to create arrays:

*np.arange(), np.ones(), np.zeros(), np.empty(), np.eye()*
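As a quick sketch of what each of these functions returns (the shapes and variable names below are arbitrary, chosen only for illustration):

```python
import numpy as np

counting = np.arange(5)       # integers from 0 up to (not including) 5
ones = np.ones((2, 3))        # 2x3 array filled with 1.0
zeros = np.zeros(4)           # length-4 array filled with 0.0
identity = np.eye(3)          # 3x3 identity matrix

# np.empty only allocates memory; its contents are arbitrary
# until you assign values to it
scratch = np.empty((2, 2))

print(counting)               # values 0, 1, 2, 3, 4
print(ones.shape, zeros.shape, identity.shape, scratch.shape)
```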

**dtype**

As said before, all of the elements of a Numpy array must be the same type. The array's type is saved as the attribute *dtype* and you can explicitly define it like this:

In [5]:
x=np.array([45,68,98],dtype="int64")
print(x)

[45 68 98]


Arrays also have a method called *astype*, which returns a new array converted to the given type (the original array is left unchanged):

In [6]:
x.astype(np.float64)

array([45., 68., 98.])
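Going in the other direction, from float to integer, is worth a small check. The sketch below uses made-up values to show that the decimal part is dropped rather than rounded, and that the source array keeps its type.

```python
import numpy as np

prices = np.array([3.7, -1.2, 9.9])   # made-up float values
whole = prices.astype(np.int64)       # decimal part is truncated, not rounded

print(whole)          # values 3, -1, 9
print(prices.dtype)   # float64, the original array is unchanged
```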

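**Array arithmetic**

Arithmetic between arrays of the same shape is applied element by element, and an operation with a scalar is applied to every element. Addition and scalar multiplication therefore behave as in linear algebra, but *\** multiplies element-wise; the matrix product uses the *@* operator or *np.dot()*.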
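A minimal sketch of array arithmetic on two small matrices (the values are made up for illustration). It contrasts the element-wise product with the linear algebra matrix product.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

total = a + b        # element-wise sum
scaled = a * 3       # every element multiplied by the scalar
elementwise = a * b  # element-wise product, NOT the matrix product
matrix = a @ b       # matrix product, as in linear algebra

print(elementwise)   # rows [10 40] and [90 160]
print(matrix)        # rows [70 100] and [150 220]
```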

**Array indexing and slicing**

Indexing and slicing work much as they do for lists and tuples. The main distinction is that a slice is a view on the original data, not a copy, so any change made through it also affects the original array. Assigning to an index modifies the array in place:

In [7]:
arr=np.array([65,98,12,9])
arr[1]=0
print(arr)

[65  0 12  9]
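The no-copy behaviour is easiest to see with a slice. In this sketch (variable names are illustrative), writing through the slice changes the original array, while an explicit *.copy()* does not.

```python
import numpy as np

arr = np.array([65, 98, 12, 9])

view = arr[1:3]            # a view: shares memory with arr
view[:] = 0                # writes through to the original
print(arr)                 # values 65, 0, 0, 9

copied = arr[1:3].copy()   # an explicit, independent copy
copied[:] = 7
print(arr)                 # still 65, 0, 0, 9
```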


**Boolean indexing**

Boolean indexing works like a query: a boolean array is used as an index to select the values where it is True:

In [8]:
names=np.array(["Carolina","Samantha","Catalina","Clara","Carolina"])
consumo=np.array([43,56,98,54,67])
print(consumo[names=="Carolina"])

[43 67]
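Conditions can also be combined or negated. A short sketch reusing the same two arrays, with *|* for "or" and *~* for "not" (the Python keywords *and* / *or* do not work on boolean arrays):

```python
import numpy as np

names = np.array(["Carolina", "Samantha", "Catalina", "Clara", "Carolina"])
consumo = np.array([43, 56, 98, 54, 67])

either = consumo[(names == "Carolina") | (names == "Clara")]
others = consumo[~(names == "Carolina")]
large = consumo[consumo > 55]

print(either)   # values 43, 54, 67
print(others)   # values 56, 98, 54
print(large)    # values 56, 98, 67
```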


**Fancy indexing**

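You index with a list (or array) of integers, which lets you pick exactly the rows or elements you want, in any order. Unlike slicing, fancy indexing copies the data into a new array.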


In [11]:
m=np.array([[34,76,903],[45,61,90],[28,41,9]])
print(m)

[[ 34  76 903]
 [ 45  61  90]
 [ 28  41   9]]


In [12]:
m[[0,2]]

array([[ 34,  76, 903],
       [ 28,  41,   9]])

In [13]:
m[[1,1,2],[0,1,2]]

array([45, 61,  9])
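When two index lists are passed, as in the last cell, they are paired up position by position rather than combined. The sketch below spells that out and, as an assumption about what you might want next, uses *np.ix_* to select a rectangular block instead.

```python
import numpy as np

m = np.array([[34, 76, 903], [45, 61, 90], [28, 41, 9]])

# Two index lists are paired up: positions (1, 0), (1, 1) and (2, 2)
pairs = m[[1, 1, 2], [0, 1, 2]]
print(pairs)          # values 45, 61, 9

# np.ix_ takes every combination instead: rows 0 and 2, columns 0 and 1
block = m[np.ix_([0, 2], [0, 1])]
print(block)          # rows [34 76] and [28 41]
```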

**Transposing**

We use the attribute *T*:

In [14]:
m.T

array([[ 34,  45,  28],
       [ 76,  61,  41],
       [903,  90,   9]])

**Pseudorandom number generation with NumPy**

The numpy.random module supplements the built-in Python random module with functions that generate whole arrays of samples at once, which is much faster than drawing the values one by one.

In [2]:
import numpy as np
arr=np.random.standard_normal((4,4))
print(arr)

[[ 0.57527852  0.40404982  1.92311247 -0.15691741]
 [-1.01561983 -0.02586233 -0.02514502 -0.97243448]
 [ 0.67225984  1.35728598  0.25697882 -1.28131362]
 [ 0.1250978   0.74039677 -0.94063157  0.20081559]]
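The values above change on every run. If you need reproducible numbers, one option is a seeded generator created with *np.random.default_rng*. This is a sketch, and the seed value is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=12345)   # seed value chosen arbitrarily
sample = rng.standard_normal((4, 4))

# A second generator with the same seed reproduces the same numbers
rng_again = np.random.default_rng(seed=12345)
sample_again = rng_again.standard_normal((4, 4))

print(sample.shape)                           # (4, 4)
print(np.array_equal(sample, sample_again))   # True
```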


**Universal Functions: Element-wise array functions**

Universal functions (ufuncs) apply an operation to every element of an array, which helps you avoid lists and explicit loops.
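A small sketch with two common ufuncs, one that takes a single array (*np.sqrt*) and one that takes two (*np.maximum*). The input values are made up.

```python
import numpy as np

values = np.array([1.0, 4.0, 9.0, 16.0])
other = np.array([2.0, 3.0, 10.0, 15.0])

roots = np.sqrt(values)              # square root of each element
largest = np.maximum(values, other)  # element-wise maximum of the two arrays

print(roots)     # values 1, 2, 3, 4
print(largest)   # values 2, 4, 10, 16
```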

**Function where**

*np.where(condition, x, y)* expresses conditional logic quickly and concisely: it takes x where the condition is True and y elsewhere:

In [6]:
arr2=np.where(arr>=0,1,-1)
print(arr2)

[[ 1  1  1 -1]
 [-1 -1 -1  1]
 [ 1 -1 -1 -1]
 [-1 -1  1 -1]]


**Math and statistical methods**

These are available both as array methods and as high-level NumPy functions.

In [8]:
print(arr.sum())
print(np.sum(arr))

-3.685916831477228
-3.685916831477228


Add the *axis* argument to apply them along rows or columns. The result is an array with one fewer dimension.

In [11]:
arr.sum(axis=0)

array([ 1.05604162, -2.34307127, -1.10960832, -1.28927886])

The previous code computes the sum across the rows (down each column), so the result has one value per column.
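Since the random values make the axis hard to follow, here is the same idea on a small fixed array (values chosen for illustration) where the sums can be checked by hand.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

column_sums = grid.sum(axis=0)   # collapses the rows: one value per column
row_sums = grid.sum(axis=1)      # collapses the columns: one value per row
row_means = grid.mean(axis=1)

print(column_sums)   # values 5, 7, 9
print(row_sums)      # values 6, 15
print(row_means)     # values 2.0, 5.0
```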

**Methods for Boolean Arrays**

*.sum()* to calculate the number of True values.

*.any()* to check if there is at least one True value.

*.all()* to check if all values are True.
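A minimal sketch of the three methods on a boolean array built from a comparison (the numbers are made up):

```python
import numpy as np

values = np.array([3, -1, 7, 0, -5])
positive = values > 0          # boolean array: True, False, True, False, False

count = positive.sum()         # True counts as 1, False as 0
has_any = positive.any()
all_true = positive.all()

print(count, has_any, all_true)   # 2 True False
```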

**Sorting arrays**

The *.sort()* method sorts an array in place and accepts an *axis* argument to sort along that axis.
The high-level NumPy function *numpy.sort()* returns a sorted copy instead (like the Python built-in function *sorted()*).
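The difference between the two is easy to check with a short sketch (array values are illustrative):

```python
import numpy as np

arr = np.array([5, 2, 9, 1])

sorted_copy = np.sort(arr)        # returns a new, sorted array
unchanged = arr.tolist()          # arr itself is still 5, 2, 9, 1

arr.sort()                        # sorts arr in place and returns None

table = np.array([[3, 1, 2],
                  [9, 7, 8]])
table.sort(axis=1)                # sorts each row independently

print(sorted_copy)   # values 1, 2, 5, 9
print(table)         # rows [1 2 3] and [7 8 9]
```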

**Set logic for arrays**

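NumPy has some basic set operations for one-dimensional arrays. They are functions in the np namespace rather than array methods.

*np.unique()* returns the sorted unique values, dropping duplicates.

*np.in1d()* checks whether each element of one array is present in another array (recent NumPy versions recommend *np.isin()* for this).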
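A short sketch of both operations on the names array used earlier. It calls *np.isin* for the membership test, on the assumption that you are running a reasonably recent NumPy; the list of names to look for is made up.

```python
import numpy as np

names = np.array(["Carolina", "Samantha", "Catalina", "Clara", "Carolina"])

unique_names = np.unique(names)                   # sorted, duplicates removed
is_member = np.isin(names, ["Clara", "Samantha"])  # one boolean per element

print(unique_names)   # Carolina, Catalina, Clara, Samantha
print(is_member)      # False, True, False, True, False
```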

**Saving and loading arrays**

Arrays can be saved to disk with *np.save()* and read back with *np.load()* (by default in the binary .npy format).
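A minimal round trip, written as a sketch: the file name is hypothetical and a temporary directory is used only so that the example leaves nothing behind.

```python
import os
import tempfile

import numpy as np

arr = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.npy")   # hypothetical file name
    np.save(path, arr)       # writes the array in binary .npy format
    loaded = np.load(path)   # reads it back as an ndarray

print(np.array_equal(arr, loaded))   # True
```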

**In this context, vectorized computation means expressing calculations as operations on whole arrays instead of explicit loops.**

**Distributions**

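Uniform (even) distribution: every value in the range is equally likely, so its shape is flat (a rectangle).

Normal distribution: values cluster around the mean, so its shape is the Gaussian bell curve.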
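The two shapes can be checked without plotting by counting how many samples fall into equal-width bins. This is a rough sketch; the seed, sample size and bin ranges are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seed chosen arbitrarily

uniform = rng.uniform(0.0, 1.0, size=10_000)
normal = rng.standard_normal(10_000)

# Count samples per bin: five equal-width bins for each distribution
uniform_counts, _ = np.histogram(uniform, bins=5, range=(0.0, 1.0))
normal_counts, _ = np.histogram(normal, bins=5, range=(-3.0, 3.0))

print(uniform_counts)   # roughly equal counts: the flat shape
print(normal_counts)    # small at the edges, large in the middle: the bell
```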