## NumPy

NumPy is a numerical library that makes it easy to work with large arrays and matrices.

It can be used to perform fast arithmetic operations on matrices. Pandas and NumPy are usually used together, as Pandas builds on NumPy functionality to work with DataFrames.

Since Pandas is designed to work with NumPy, almost any NumPy function will work with Pandas Series and DataFrames. Let's run some examples.

In [1]:
import numpy as np
import pandas as pd

In [8]:
# Create a pseudo-random number generator (unseeded, so the values change on every run)
rng = np.random.RandomState()
# Create a Pandas Series from our random generator
series = pd.Series(rng.randint(0, 10, 4))
series

0    7
1    6
2    3
3    8
dtype: int32

In [9]:
# Create a Pandas DataFrame with our NumPy random generator
# Integers from 0 to 9 (the upper bound 10 is exclusive), 3 rows and 4 columns
df = pd.DataFrame(rng.randint(0, 10, (3, 4)),
                 columns=['A', 'B', 'C', 'D'])
df

   A  B  C  D
0  7  5  0  2
1  1  2  7  1
2  9  4  1  3


We can apply a NumPy function to these Pandas objects, and the result keeps the original index:

In [10]:
# Calculate e^x, where x is every element of our array
np.exp(series)

0    1096.633158
1     403.428793
2      20.085537
3    2980.957987
dtype: float64

In [16]:
# Calculate sin() of every value in the DataFrame multiplied by pi and divided by 4
np.sin(df * np.pi / 4)

          A             B         C         D
0 -0.707107 -7.071068e-01  0.000000  1.000000
1  0.707107  1.000000e+00 -0.707107  0.707107
2  0.707107  1.224647e-16  0.707107  0.707107
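
To make the index-preservation point concrete, here is a minimal sketch using a small hand-built Series. The string labels and the values are illustrative assumptions, not data from the cells above.

```python
import numpy as np
import pandas as pd

# Illustrative data: a Series with a custom (non-default) index
labeled = pd.Series([0, 1, 2], index=["a", "b", "c"])

# Applying a NumPy function returns a new Series with the same index
result = np.exp(labeled)

print(type(result).__name__)  # Series
print(list(result.index))     # ['a', 'b', 'c']
```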


### The Numpy Array
NumPy provides efficient multidimensional arrays designed for scientific computing.

An array is similar to a list in Python and can be created from a list.
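
As a small sketch of creating arrays from lists (the variable names `a`, `b` and `c` are only for illustration):

```python
import numpy as np

# Build a one-dimensional array from a flat list
a = np.array([1, 2, 3, 4])

# Nested lists of equal length become a two-dimensional array
b = np.array([[1, 2], [3, 4]])

print(a.shape)  # (4,)
print(b.shape)  # (2, 2)

# Unlike a list, an array has a single element type;
# mixing ints and floats upcasts everything to float
c = np.array([1, 2.5, 3])
print(c.dtype)  # float64
```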

Arrays have useful attributes. Let's start by defining three random arrays: a one-dimensional, a two-dimensional and a three-dimensional array. We will use NumPy's random number generator.

In [1]:
# Import our Numpy package
import numpy as np
np.random.seed(0)  # seed the generator so the same random arrays are produced every time

x1 = np.random.randint(10, size=6)  # one dimension
x2 = np.random.randint(10, size=(3, 4))  # two dimensions
x3 = np.random.randint(10, size=(3, 4, 5))  # three dimensions

All arrays have the attributes `ndim` (the number of dimensions), `shape` (the size of each dimension), and `size` (the total number of elements):

In [3]:
print("x3 ndim:", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size:", x3.size)

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60


Other array attributes are `itemsize`, which gives the size in bytes of each element, and `nbytes`, which gives the total size in bytes of the array:

In [4]:
print("x3 itemsize:", x3.itemsize, "bytes")
print("x3 nbytes", x3.nbytes, "bytes")

x3 itemsize: 4 bytes
x3 nbytes 240 bytes
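
The relationship between these attributes can be checked directly: `nbytes` equals `size * itemsize`. The sketch below fixes the dtype to `np.int32` as an assumption. The default integer type depends on the platform, so `itemsize` for `x3` may be 8 rather than 4 on your machine.

```python
import numpy as np

# Fix the dtype explicitly so the byte counts do not depend on the platform
arr = np.zeros((3, 4, 5), dtype=np.int32)

print(arr.itemsize)  # 4
print(arr.size)      # 60
print(arr.nbytes)    # 240

# The total byte size is the element count times the element size
print(arr.nbytes == arr.size * arr.itemsize)  # True
```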


#### Array indexing
As with Python lists, we can access individual elements of an array. For one-dimensional arrays we use the square-bracket indexing syntax `[]`; negative indexes count from the end of the array.

In [5]:
x1

array([5, 0, 3, 3, 7, 9])

In [6]:
x1[0]

5

In [7]:
x1[4]

7

In [8]:
x1[-6]

5

The same logic applies to multidimensional arrays, using a comma-separated index for each dimension:

In [9]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [10]:
x2[0, 3] # Access row 0, column index 3

4

In [11]:
x2[2, -1] # Access row index 2, column index -1

7

We can use the same logic to change values using array indexing

In [12]:
x2[2, -1] = 2
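
One thing to keep in mind when assigning values: an array has a fixed element type, so a value of another type is converted to fit. The short sketch below (with an illustrative array named `ints`) shows a float being silently truncated when stored in an integer array.

```python
import numpy as np

ints = np.array([5, 0, 3])  # integer dtype inferred from the values
ints[0] = 3.14159           # the fractional part is dropped, not rounded
print(ints)                 # [3 0 3]
```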

#### Sub arrays (slicing)
We can also use a syntax similar to Python list slicing to access only parts of an array. The syntax is as follows:

`x[start:stop:step]`

The defaults are `start=0`, `stop=` the size of the dimension, and `step=1`. The `stop` index is exclusive, and `step` sets the spacing between the selected elements.

In [15]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [16]:
x[:5] # first five elements

array([0, 1, 2, 3, 4])

In [17]:
x[5:] # elements from index 5 onwards

array([5, 6, 7, 8, 9])

In [18]:
x[4:7] # elements from index 4 up to, but not including, index 7

array([4, 5, 6])

In [19]:
x[::2] # every second element, starting from index 0

array([0, 2, 4, 6, 8])

In [20]:
x[1::2] # every second element, starting from index 1

array([1, 3, 5, 7, 9])

We can also use a negative step. In this case the defaults for start and stop are swapped, which makes it a convenient way to reverse an array.

In [21]:
x[::-1] # reverse the array

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [22]:
x[5::-2] # every second element in reverse order, starting from index 5

array([5, 3, 1])
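
A brief sketch of how the swapped defaults behave with a negative step. The explicit `stop` value of 1 in the last line is only an illustrative choice.

```python
import numpy as np

x = np.arange(10)

# With a negative step the omitted start defaults to the last index
# and the omitted stop runs past the first element
print(np.array_equal(x[::-1], x[len(x) - 1::-1]))  # True

# An explicit stop is still exclusive: index 1 is not included here
print(x[5:1:-2])  # [5 3]
```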

We can also select multidimensional subarrays. The syntax is similar, with one slice per dimension, separated by commas:

In [23]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 2]])

In [24]:
x2[:2, :3] # rows with index 0 and 1, and columns with index 0, 1 and 2

array([[3, 5, 2],
       [7, 6, 8]])

In [25]:
x2[:3, ::2] # all rows, every second column

array([[3, 2],
       [7, 8],
       [1, 7]])
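
Combining a plain index with a slice is a common way to pull out a single row or column. The sketch below rebuilds the same values in a separate array named `m` so that it runs on its own.

```python
import numpy as np

m = np.array([[3, 5, 2, 4],
              [7, 6, 8, 8],
              [1, 6, 7, 2]])

first_col = m[:, 0]  # every row, column index 0
first_row = m[0, :]  # row index 0, every column (same as m[0])

print(first_col)  # [3 7 1]
print(first_row)  # [3 5 2 4]
```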

#### Reshaping arrays
Another useful operation is reshaping, which we can do with the `reshape()` method. For example, to arrange the numbers 1 through 9 in a 3 x 3 grid we can use the following syntax:

In [26]:
grid = np.arange(1, 10).reshape((3, 3))
grid

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

For this to work, the total number of elements in the initial array must equal that of the reshaped array.
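
The size requirement can be seen in a short sketch: a mismatched shape raises a `ValueError`, while passing `-1` for one dimension asks NumPy to infer it from the total size. The array name `nine` is illustrative.

```python
import numpy as np

nine = np.arange(1, 10)  # 9 elements

# 9 elements cannot fill a 2 x 5 grid, so NumPy raises ValueError
try:
    nine.reshape((2, 5))
except ValueError as err:
    print("reshape failed:", err)

# Passing -1 lets NumPy infer that dimension from the total size
inferred = nine.reshape((3, -1))
print(inferred.shape)  # (3, 3)
```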

Another common form of reshaping is converting a one-dimensional array into a two-dimensional row or column matrix. We can either use `reshape` or the `np.newaxis` keyword inside a slicing operation:

In [27]:
x = np.array([1, 2, 3])

# row vector via reshape
x.reshape(1, 3)

array([[1, 2, 3]])

In [28]:
# row vector via newaxis
x[np.newaxis, :]

array([[1, 2, 3]])

In [29]:
# column vector via reshape
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [30]:
# column vector via newaxis
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

#### Array concatenation and division
We can also combine multiple arrays into one, and split one array into several.

Concatenation can be achieved using `np.concatenate`, `np.vstack` and `np.hstack`. `np.concatenate` takes a list (or tuple) of arrays as its first argument:

In [31]:
x = np.array([1, 2, 3]) # create an array from a list
y = np.array([3, 2, 1]) # create a second array from a list
np.concatenate([x, y]) # array list to concatenate

array([1, 2, 3, 3, 2, 1])

In [32]:
# we can concatenate more than two arrays at a time (plain lists are converted to arrays)
z = [99, 88, 77]
print(np.concatenate([x, y, z]))

[ 1  2  3  3  2  1 99 88 77]


We can use the same logic for multidimensional arrays

In [33]:
grid = np.array([[1, 2, 3],
                [4, 5, 6]])
np.concatenate([grid, grid]) # concatenate along the first axis (rows), the default

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

In [34]:
grid

array([[1, 2, 3],
       [4, 5, 6]])

In [35]:
# Concatenate on the second axis (zero indexed)
np.concatenate([grid, grid], axis=1) # concatenate on columns

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

To work with arrays of mixed dimensions it may be clearer to use `np.vstack` (vertical stack) and `np.hstack` (horizontal stack):

In [36]:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                [6, 5, 4]])
# stack vertically
np.vstack([x, grid])

array([[1, 2, 3],
       [9, 8, 7],
       [6, 5, 4]])

In [37]:
# stack horizontally
y = np.array([[99],
              [99]])
np.hstack([grid, y])

array([[ 9,  8,  7, 99],
       [ 6,  5,  4, 99]])
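
To illustrate why the stacking helpers are convenient here, the sketch below shows `np.concatenate` rejecting a one-dimensional and a two-dimensional array together, and then reproduces the `np.vstack` result by promoting `x` to a row first.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# np.concatenate requires the same number of dimensions, so this fails
try:
    np.concatenate([x, grid])
except ValueError as err:
    print("concatenate failed:", err)

# Promoting x to shape (1, 3) first gives the same result as np.vstack
manual = np.concatenate([x[np.newaxis, :], grid])
print(np.array_equal(manual, np.vstack([x, grid])))  # True
```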

#### Split arrays
Finally, we can split arrays using `np.split`, `np.hsplit` and `np.vsplit`. To each of these we can pass a list of indexes giving the split points:

In [38]:
x = [1, 2, 3, 99, 88, 77, 4, 5, 6]
x1, x2, x3 = np.split(x, [3, 5]) # split before index 3 and before index 5
print(x1, x2, x3)

[1 2 3] [99 88] [77  4  5  6]


Observe that N split points lead to N + 1 subarrays. `np.hsplit` and `np.vsplit` work similarly:

In [39]:
grid = np.arange(16).reshape((4, 4))

In [40]:
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [41]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [42]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
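
Besides a list of split points, `np.split` also accepts a single integer, which asks for that many equal-sized pieces. A minimal sketch, using an illustrative array named `values`:

```python
import numpy as np

values = np.arange(9)

# An integer asks for that many equal-sized pieces
a, b, c = np.split(values, 3)
print(a, b, c)  # [0 1 2] [3 4 5] [6 7 8]

# N split points give N + 1 pieces
pieces = np.split(values, [2, 4, 6])
print(len(pieces))  # 4
```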
