# Introduction to Numpy - multidimensional data arrays

#### Numpy is a linear algebra library for Python. Almost all of the data science libraries in the PyData ecosystem rely on NumPy as one of their main building blocks: 
- Numpy introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects 
- It provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance 
- In the `Numpy` package, the terminology used for vectors, matrices, and higher-dimensional data sets is **array**
- A Numpy array is an alternative to a Python list that makes calculations over entire arrays **easy and fast**
- Many other Python libraries are built on Numpy: http://www.numpy.org/


#### NumPy arrays are the main objects we will use. They come in two flavors:

- vectors: 1-dimensional

- matrices: 2-dimensional

##  Core Python vs. Numpy:
- In _Core_ Python (Python without external libraries), working with collections of numbers requires the use of loops, list comprehensions, or map/filter/reduce functions. 
- In Numpy, collections of numbers are the default and are easy to work with, because the package provides high-performance vector, matrix, and higher-dimensional data structures for Python. 
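
As a rough illustration of the difference, the sketch below squares a small collection of numbers both ways. The variable names (`numbers`, `squares_list`, `squares_array`) are made up for this example.

```python
import numpy as np

numbers = [1, 2, 3, 4]

# Core Python: a list comprehension visits each element explicitly
squares_list = [n ** 2 for n in numbers]

# Numpy: the operator is applied to the whole array at once
squares_array = np.array(numbers) ** 2

print(squares_list)    # [1, 4, 9, 16]
print(squares_array)   # [ 1  4  9 16]
```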

### Installation Instructions
It is highly recommended that you install Python using the Anaconda distribution, so that all underlying dependencies (such as linear algebra libraries) stay in sync when you use conda install. If you have Anaconda, install Numpy by going to your terminal or command prompt and typing the first command below; the second command is the pip equivalent for environments without conda:
```bash
conda install numpy

pip install numpy
```
If you do not have Anaconda and cannot install it, please refer to Numpy's official documentation for other installation instructions.
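
One simple way to check that the installation worked is to import the package and print its version. This is only a sanity check; the version string you see depends on your environment.

```python
import numpy as np

# Prints the installed version string, which differs between environments
print(np.__version__)
```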

### Using `Numpy`
Once you've installed Numpy you can import it as a library:

In [1]:
from numpy import *  # Less common way to import

In [2]:
import numpy as np    # <= "np" is the standard abbreviation for numpy - this is the common way to import

In [3]:
height = [1.81, 1.79, 1.90]
weight = [65.4, 34,   63.6]

In [4]:
Ratio = weight / height ** 2   # Does not work: lists do not support element-wise arithmetic

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'

In [100]:
# Create numpy array based on list
np_height = np.array(height)
np_height

array([1.81, 1.79, 1.9 ])

In [101]:
np_weight = np.array(weight)
np_weight

array([65.4, 34. , 63.6])

In [102]:
Ratio = np_weight / np_height ** 2   # Element-wise calculations work in Numpy
Ratio

array([19.9627606 , 10.61140414, 17.61772853])

In [103]:
# Numpy Subsetting
Ratio > 11

array([ True, False,  True])

In [104]:
Ratio[Ratio > 11]

array([19.9627606 , 17.61772853])

In [105]:
Ratio[1]

10.611404138447615

### `Numpy` is great for doing vector arithmetic. Compared with regular Python lists, however, some things behave differently.

- 1) First of all, **`Numpy` arrays cannot contain elements with different types.** If you try to build such an array, some of the elements' types are changed to end up with a homogeneous array. This is known as **type coercion**.

- 2) Second, the typical arithmetic operators, such as `+`, `-`, `*` and `/`, have **a different meaning for regular `Python lists` and `Numpy arrays`.**

In [106]:
# NumPy arrays: contain only one type
np.array([1.0, "is", True])

array(['1.0', 'is', 'True'], dtype='<U32')

In [107]:
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

In [108]:
python_list + python_list

[1, 2, 3, 1, 2, 3]

In [109]:
# Different types: different behavior
numpy_array + numpy_array

array([2, 4, 6])

In [110]:
numpy_array2 = np.array([1, 2, 3,4])

In [111]:
numpy_array + numpy_array2

ValueError: operands could not be broadcast together with shapes (3,) (4,) 

In [112]:
# Mixed types
python_list + numpy_array

array([2, 4, 6])
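
The `*` operator shows the same split in behavior as `+`. The short sketch below reuses the two objects from the cells above; the names `repeated` and `doubled` are introduced here only for illustration.

```python
import numpy as np

python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

repeated = python_list * 2   # list repetition
doubled = numpy_array * 2    # element-wise multiplication

print(repeated)   # [1, 2, 3, 1, 2, 3]
print(doubled)    # [2 4 6]
```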

### Numerical Programming in Python

In the early 1990s, some programmers wanted to use Python for their scientific work, but couldn't do so for several reasons:
1. Pure Python is slow for numerical loops compared to compiled languages. Numpy mitigates this problem by implementing the performance-critical code in C while exposing a Python API.
2. Numeric data, which often takes the form of multi-dimensional matrices, had no counterpart in Python. As we have seen above, Numpy can handle an array of numbers just fine (later we will see examples with multiple dimensions).
3. Python, at the time, had no collection of high-quality functions to operate on arrays or matrices of numbers. Numpy provides that collection.

#### Quick performance comparison between core Python and Numpy

In [113]:
core_python = list(range(0,10000))
numpy_python = np.arange(0,10000)

In [114]:
%timeit sum(core_python)
%timeit np.sum(numpy_python)

447 µs ± 7.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
6.09 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


#### Note: In this run, Numpy is roughly 70 times faster than core Python, which is more than an order of magnitude!
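
The `%timeit` magic only exists inside IPython and Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard-library `timeit` module. The repeat count of 1000 is an arbitrary choice, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

core_python = list(range(0, 10000))
numpy_python = np.arange(0, 10000)

# Run each statement 1000 times and report the total elapsed seconds
t_list = timeit.timeit(lambda: sum(core_python), number=1000)
t_numpy = timeit.timeit(lambda: np.sum(numpy_python), number=1000)

print(f"sum(list):     {t_list:.4f} s")
print(f"np.sum(array): {t_numpy:.4f} s")

# Both approaches compute the same total
print(sum(core_python) == int(np.sum(numpy_python)))   # True
```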

In [115]:
# a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])
v

array([1, 2, 3, 4])

In [21]:
# a matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2], [3, 4]])
M.shape

(2, 2)

The v and M objects are both of the type ndarray that the numpy module provides.

In [22]:
type(v), type(M)

(numpy.ndarray, numpy.ndarray)

The difference between the v and M arrays is only their shapes. We can get information about the shape of an array by using the ndarray.shape property.

In [23]:
v.shape, M.shape

((4,), (2, 2))

In [24]:
np.shape(v), np.shape(M)

((4,), (2, 2))

The number of elements in the array is available through the **ndarray.size** property:

In [25]:
v.size, M.size

(4, 4)

Equivalently, we could use the functions `numpy.shape()` and `numpy.size()`:

In [26]:
np.shape(M)

(2, 2)

In [27]:
np.size(M)

4

### Summary:

There are several reasons to use `Numpy arrays` instead of `Python lists`:

- Python lists are very general. They can contain any kind of object. They are `dynamically typed`. They do not support mathematical operations such as matrix multiplication and dot products. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.

- Numpy arrays are `statically typed and homogeneous`. The type of the elements is determined when the array is created.

- Numpy arrays are `memory efficient`.

- Because of the static typing, mathematical functions such as multiplication and addition of Numpy arrays can be implemented efficiently in a compiled language (C and Fortran are used).
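
The memory-efficiency point can be made concrete with a small sketch. The names `values` and `arr` are chosen for this example, and the `sys.getsizeof` figure is specific to CPython and varies by version.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.arange(1000, dtype=np.int64)

# Each array element occupies exactly itemsize bytes in one contiguous buffer
print(arr.itemsize)   # 8
print(arr.nbytes)     # 8000

# A list stores pointers to separate int objects; getsizeof counts only
# the list's own pointer table, not the int objects it refers to
print(sys.getsizeof(values))
```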

Using the dtype (data type) property of an ndarray, we can see what type the data of an array has:

In [28]:
M.dtype

dtype('int32')

We get an error if we try to assign a value of the wrong type to an element in a Numpy array:

In [29]:
M

array([[1, 2],
       [3, 4]])

In [30]:
M[0,0] = "hello"

ValueError: invalid literal for int() with base 10: 'hello'

In [31]:
M

array([[1, 2],
       [3, 4]])

In [32]:
M[0,0] = 0
M

array([[0, 2],
       [3, 4]])
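
Not every mismatched assignment raises an error. When the value can be converted to the array's type, Numpy converts it silently, which for a float assigned into an integer array means the fractional part is dropped. A small sketch (the array name `ints` is used only for illustration):

```python
import numpy as np

ints = np.array([1, 2, 3])   # integer dtype inferred from the values

ints[0] = 3.9                # the float is converted to the array's integer dtype
print(ints)                  # [3 2 3]
print(ints.dtype.kind)       # i (signed integer; the exact width is platform-dependent)
```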

If we want, we can explicitly define the type of the array data when we create it, using the dtype keyword argument: 

In [33]:
M = np.array([[1, 2], [3, 4]], dtype=complex)
M

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

In [34]:
M = np.array([[0, 2], [3, 4]], dtype=bool)
M

array([[False,  True],
       [ True,  True]])

Common data types that can be used with dtype are: **int, float, complex, bool, object,** etc.
We can also explicitly define the bit size of the data types, for example: `int64`, `int16`, `float128` (not available on every platform), `complex128`.
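
As a brief sketch of what an explicit bit size means in practice, the example below creates a 16-bit integer array and converts it with `astype`. The names `small` and `as_float` are illustrative.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int16)
print(small.dtype, small.itemsize)         # int16 2

# astype returns a converted copy; the original array is unchanged
as_float = small.astype(np.float64)
print(as_float.dtype, as_float.itemsize)   # float64 8
print(small.dtype)                         # int16
```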

Numpy has many built-in functions and capabilities. We will focus on some of the most important aspects of Numpy: `vectors`, `arrays`, `matrices`, and random number generation. Let's start by discussing arrays.

## Numpy arrays
Numpy arrays are the main way we will use Numpy throughout the course. **Numpy arrays essentially come in two types**: `vectors` and `matrices`. Vectors are 1-dimensional arrays and matrices are 2-dimensional (arrays with more dimensions are also possible).

### Creating numpy arrays
From a Python list, we can create an array by directly converting a list or list of lists:

1) Cast a list to np.array as vector

In [35]:
my_list = [10,20,30]
my_list

[10, 20, 30]

In [36]:
np.array(my_list)

array([10, 20, 30])

2) Cast a list of lists to np.array as matrix

In [37]:
my_matrix = [[10,20,30],[40,50,60],[70,80,90]]
my_matrix

[[10, 20, 30], [40, 50, 60], [70, 80, 90]]

In [38]:
np.array(my_matrix)

array([[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]])

## Using array-generating functions

For larger arrays it is impractical to initialize the data manually using explicit Python lists. Instead, we can use one of the many built-in functions in `Numpy` that generate arrays of different forms. Some of the more common are:

### `arange()` function

Return evenly spaced values within a given interval. **The start value is included; the stop value is not.**

In [39]:
np.arange(0,6)

array([0, 1, 2, 3, 4, 5])

In [40]:
# create a range
x = np.arange(0, 10, 3) # arguments: start, stop, step
x

array([0, 3, 6, 9])

In [41]:
x = np.arange(-1, 1, 0.1)
x

array([-1.00000000e+00, -9.00000000e-01, -8.00000000e-01, -7.00000000e-01,
       -6.00000000e-01, -5.00000000e-01, -4.00000000e-01, -3.00000000e-01,
       -2.00000000e-01, -1.00000000e-01, -2.22044605e-16,  1.00000000e-01,
        2.00000000e-01,  3.00000000e-01,  4.00000000e-01,  5.00000000e-01,
        6.00000000e-01,  7.00000000e-01,  8.00000000e-01,  9.00000000e-01])
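
The `-2.22044605e-16` entry where you might expect `0` is a floating-point rounding effect of the non-integer step. When the number of points matters more than the exact step, `linspace` with `endpoint=False` is one possible alternative. This is a sketch, not the only option:

```python
import numpy as np

# 20 points from -1 up to (but not including) 1, i.e. a spacing of 0.1
x = np.linspace(-1, 1, 20, endpoint=False)

print(len(x))                                   # 20
print(np.allclose(x, np.arange(-1, 1, 0.1)))    # True: same values up to rounding
```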

In [42]:
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [43]:
np.zeros((4,4))  # pass a tuple for two dimensional array

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [44]:
# Reshape
m = np.array([[1,2,3],[4,5,6]])
m

array([[1, 2, 3],
       [4, 5, 6]])

In [45]:
m.shape

(2, 3)

In [46]:
n = np.arange(0,30,3)
n

array([ 0,  3,  6,  9, 12, 15, 18, 21, 24, 27])

In [47]:
n=n.reshape(2,5)
n

array([[ 0,  3,  6,  9, 12],
       [15, 18, 21, 24, 27]])

In [48]:
o=np.linspace(0,4,9)
o

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. ])

In [49]:
o.resize(3,3)
o

array([[0. , 0.5, 1. ],
       [1.5, 2. , 2.5],
       [3. , 3.5, 4. ]])

In [50]:
np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [51]:
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

### `linspace()` function
Return evenly spaced numbers over a specified interval - **both end points ARE included**

In [52]:
np.linspace(0, 10, 10)

array([ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444,
        5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ])

In [53]:
# Exercise:
np.linspace(-5,10,5)

array([-5.  , -1.25,  2.5 ,  6.25, 10.  ])
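
The spacing follows from the arguments: with `num` points and both endpoints included, the step is `(stop - start) / (num - 1)`. One way to check this is the `retstep=True` option, which returns the step alongside the array:

```python
import numpy as np

values, step = np.linspace(0, 10, 10, retstep=True)

print(step)                         # approximately 1.111, i.e. (10 - 0) / (10 - 1)
print(np.isclose(step, 10 / 9))     # True
print(values[0], values[-1])        # 0.0 10.0, both endpoints are present
```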

### `diag()` function

In [54]:
# a diagonal matrix
np.diag([1,2,3])

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [55]:
# k sets the offset from the main diagonal; k=0 (the default) is the main diagonal itself
np.diag([1,2,3], k=0) 

array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [56]:
np.diag([1,2,3], k=-1)   # k=-1 places the values one diagonal below the main diagonal

array([[0, 0, 0, 0],
       [1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0]])
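
`diag()` also works in the other direction: given a 2-dimensional array, it extracts a diagonal instead of building a matrix. A minimal sketch, with `square` as an illustrative name:

```python
import numpy as np

square = np.array([[1, 2], [3, 4]])

print(np.diag(square))         # [1 4]  main diagonal
print(np.diag(square, k=1))    # [2]    first diagonal above the main one
print(np.diag(square, k=-1))   # [3]    first diagonal below the main one
```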

### `zeros()` and `ones()` functions
Generate arrays of zeros or ones

In [57]:
np.zeros(2)

array([0., 0.])

In [58]:
np.zeros((3,4))

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [59]:
np.ones(4)

array([1., 1., 1., 1.])

In [60]:
np.ones((4,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

### `eye()` function
Creates an identity matrix

In [61]:
np.eye(8)

array([[1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1.]])

In [62]:
# Repeat
np.array([1,2,3]*3)

array([1, 2, 3, 1, 2, 3, 1, 2, 3])

In [63]:
np.repeat([1,2,3],3)

array([1, 1, 1, 2, 2, 2, 3, 3, 3])

In [64]:
p=np.ones([2,3],int)
p

array([[1, 1, 1],
       [1, 1, 1]])

In [65]:
np.vstack([p,2*p])

array([[1, 1, 1],
       [1, 1, 1],
       [2, 2, 2],
       [2, 2, 2]])

In [66]:
np.hstack([p,2*p])

array([[1, 1, 1, 2, 2, 2],
       [1, 1, 1, 2, 2, 2]])

In [67]:
p.T

array([[1, 1],
       [1, 1],
       [1, 1]])

In [68]:
p.T.shape

(3, 2)

In [69]:
p.dtype

dtype('int32')

In [70]:
p.astype('f')

array([[1., 1., 1.],
       [1., 1., 1.]], dtype=float32)

In [71]:
a=np.array([1,2,3,4,5,6])

In [72]:
a.max()

6

In [73]:
a.min()

1

In [74]:
a.sum()

21

In [75]:
a.mean()

3.5

In [76]:
a.std()

1.707825127659933

In [77]:
# return index location of min or max
a.argmax()

5

In [78]:
a.argmin()

0
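
The methods above reduce the whole array to a single number. On a 2-dimensional array they also accept an `axis` argument that reduces along one dimension only. A short sketch, using a small example matrix `m`:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())         # 21, all elements
print(m.sum(axis=0))   # [5 7 9], one sum per column
print(m.sum(axis=1))   # [ 6 15], one sum per row
print(m.argmax())      # 5, index into the flattened array
```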

## Random 

Numpy also has lots of ways to create random number arrays:

### `rand()` function
Create an array of the given shape and populate it with
random samples from a uniform distribution
over ``[0, 1)``.

In [79]:
from numpy import random
np.random.seed(123)
# uniform random numbers in [0, 1)
np.random.rand(6,5)  # two dimensional array

array([[0.69646919, 0.28613933, 0.22685145, 0.55131477, 0.71946897],
       [0.42310646, 0.9807642 , 0.68482974, 0.4809319 , 0.39211752],
       [0.34317802, 0.72904971, 0.43857224, 0.0596779 , 0.39804426],
       [0.73799541, 0.18249173, 0.17545176, 0.53155137, 0.53182759],
       [0.63440096, 0.84943179, 0.72445532, 0.61102351, 0.72244338],
       [0.32295891, 0.36178866, 0.22826323, 0.29371405, 0.63097612]])

In [80]:
np.random.rand(4)   # one dimensional array from uniform distribution

array([0.09210494, 0.43370117, 0.43086276, 0.4936851 ])

In [81]:
np.random.rand(6,5)

array([[0.42583029, 0.31226122, 0.42635131, 0.89338916, 0.94416002],
       [0.50183668, 0.62395295, 0.1156184 , 0.31728548, 0.41482621],
       [0.86630916, 0.25045537, 0.48303426, 0.98555979, 0.51948512],
       [0.61289453, 0.12062867, 0.8263408 , 0.60306013, 0.54506801],
       [0.34276383, 0.30412079, 0.41702221, 0.68130077, 0.87545684],
       [0.51042234, 0.66931378, 0.58593655, 0.6249035 , 0.67468905]])
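
The call to `np.random.seed` is what makes these numbers reproducible. The sketch below shows that resetting the seed restarts the same sequence; the names `first` and `second` are illustrative.

```python
import numpy as np

np.random.seed(123)
first = np.random.rand(3)

np.random.seed(123)        # resetting the seed restarts the sequence
second = np.random.rand(3)

print(np.array_equal(first, second))        # True

# Every sample lies in the half-open interval [0, 1)
print(((first >= 0) & (first < 1)).all())   # True
```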

### `randn()` function
Return a sample (or samples) from the "standard normal" distribution, unlike `rand`, which samples from a uniform distribution:

In [82]:
# standard normal distributed random numbers
np.random.randn(3)

array([-0.77270871,  0.79486267,  0.31427199])

In [83]:
np.random.randn(5,5)

array([[-1.32626546,  1.41729905,  0.80723653,  0.04549008, -0.23309206],
       [-1.19830114,  0.19952407,  0.46843912, -0.83115498,  1.16220405],
       [-1.09720305, -2.12310035,  1.03972709, -0.40336604, -0.12602959],
       [-0.83751672, -1.60596276,  1.25523737, -0.68886898,  1.66095249],
       [ 0.80730819, -0.31475815, -1.0859024 , -0.73246199, -1.21252313]])
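
To draw from a normal distribution with a different mean and standard deviation, a common approach is to scale and shift standard normal samples. The values of `mu` and `sigma` below are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)
mu, sigma = 10.0, 2.0

# sigma * z + mu turns standard normal samples z into samples
# with mean mu and standard deviation sigma
samples = sigma * np.random.randn(100000) + mu

print(samples.mean())   # close to 10
print(samples.std())    # close to 2
```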

### `randint()` function
Return random integers from `low` (inclusive) to `high` (exclusive).

In [116]:
from numpy.random import randint
np.random.seed(111)
randint(1,100)  # Return random integers from `low` (inclusive) to `high` (exclusive).

85

In [117]:
np.random.seed(112)
randint(1,100,100)

array([44, 91, 66, 70, 45, 40, 41, 21, 42, 43, 49, 30, 69, 64,  7, 30, 49,
       52,  6, 29,  9,  4, 27, 75, 75, 38, 19, 53, 79, 49, 62, 43, 49, 97,
       41, 99, 60, 78, 15, 79, 25, 67,  2, 96,  4, 31, 53, 35, 36, 88, 33,
       78, 40, 35, 56, 74, 75,  8,  7, 44, 26, 75, 22, 27, 99, 87, 14, 18,
       90, 35, 53, 54, 53,  9, 40, 53, 40, 97, 44, 46,  6, 75, 64, 37, 76,
       28, 93, 45, 89, 35, 15, 72, 77, 98, 98, 80, 84, 80, 27, 10])
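
Because `high` is exclusive, simulating a six-sided die needs `high=7`. The sketch below checks the bounds on a batch of rolls; the seed and the sample size are arbitrary.

```python
import numpy as np

np.random.seed(0)

# 1000 rolls of a six-sided die: 7 is excluded, so the values are 1 to 6
rolls = np.random.randint(1, 7, size=1000)

# The smallest value is never below 1 and the largest never above 6
print(rolls.min() >= 1, rolls.max() <= 6)   # True True
```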

### Iterating Over Array

In [86]:
x=np.random.randint(0,10,(3,4))
x

array([[2, 5, 5, 3],
       [9, 6, 1, 1],
       [6, 3, 5, 5]])

In [87]:
for row in x:
    print(row)

[2 5 5 3]
[9 6 1 1]
[6 3 5 5]


In [88]:
for i in range(len(x)):
    print(x[i])

[2 5 5 3]
[9 6 1 1]
[6 3 5 5]


In [89]:
# enumerate yields each row together with its index
for i, row in enumerate(x):
    print('row', i, 'is', row )

row 0 is [2 5 5 3]
row 1 is [9 6 1 1]
row 2 is [6 3 5 5]


In [90]:
x2=x**2
x2

array([[ 4, 25, 25,  9],
       [81, 36,  1,  1],
       [36,  9, 25, 25]], dtype=int32)

In [91]:
# zip lets us iterate over both arrays in parallel
for i, j in zip(x,x2):
    print(i, '+', j, '=', i+j)

[2 5 5 3] + [ 4 25 25  9] = [ 6 30 30 12]
[9 6 1 1] + [81 36  1  1] = [90 42  2  2]
[6 3 5 5] + [36  9 25 25] = [42 12 30 30]
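
Row-by-row loops are useful for inspection, but the same sums can be computed without a loop, since arithmetic on arrays is element-wise. A minimal sketch using the values of `x` shown above (`total` is an illustrative name):

```python
import numpy as np

x = np.array([[2, 5, 5, 3],
              [9, 6, 1, 1],
              [6, 3, 5, 5]])
x2 = x ** 2

# One expression replaces the zip loop
total = x + x2
print(total)
```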


### More properties of the numpy arrays

In [92]:
x=np.arange(0,36)
x.reshape(6,6)   # returns a new array; x itself stays 1-dimensional
x[::7]           # every 7th element of the 1-dimensional array

array([ 0,  7, 14, 21, 28, 35])
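
Note that `reshape` returns a new array object and leaves the original's shape untouched, so the result has to be assigned to a name to be used. The sketch below illustrates this, with `grid` as a hypothetical name for the reshaped result.

```python
import numpy as np

x = np.arange(0, 36)
grid = x.reshape(6, 6)    # new 6x6 array; x keeps its original shape

print(x.shape)            # (36,)
print(grid.shape)         # (6, 6)

# Every 7th element of the flat array is the main diagonal of the 6x6 grid
print(np.array_equal(x[::7], np.diag(grid)))   # True

# -1 asks Numpy to infer that dimension from the total size
print(x.reshape(4, -1).shape)   # (4, 9)
```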

In [93]:
twentyfive_vals=np.random.randn(5,5)
twentyfive_vals

array([[-0.02856808,  0.83223241, -0.33678178, -0.24208949, -1.91390247],
       [-0.01010871,  1.50859366,  0.71465326, -0.7022137 , -0.07004497],
       [-0.38296827,  0.34599716, -0.20110946, -0.35874516, -0.19939992],
       [-0.27687587,  0.61349559, -1.04863587, -0.08338839, -1.26965402],
       [-1.05250444,  0.66382866, -1.28564276, -0.45574653, -0.58132363]])

In [94]:
twentyfive_vals > 0.1

array([[False,  True, False, False, False],
       [False,  True,  True, False, False],
       [False,  True, False, False, False],
       [False,  True, False, False, False],
       [False,  True, False, False, False]])

In [95]:
cond1=twentyfive_vals[twentyfive_vals > 0.1]
cond1

array([0.83223241, 1.50859366, 0.71465326, 0.34599716, 0.61349559,
       0.66382866])

In [96]:
cond2=cond1[cond1<0.5]
cond2

array([0.34599716])

In [97]:
M.nbytes # number of bytes (M is the 2x2 bool array here, 1 byte per element)

4

In [98]:
M.ndim # number of dimensions

2

## Further reading

- http://numpy.scipy.org
- http://scipy.org/Tentative_NumPy_Tutorial
- http://scipy.org/NumPy_for_Matlab_Users - A Numpy guide for MATLAB users.

### Exercise

Generate a random array of size 25

In [71]:
twentyfive_vals = np.random.random((25,))
twentyfive_vals

array([0.1571934 , 0.12173568, 0.40006612, 0.45715492, 0.14028059,
       0.44422354, 0.70865683, 0.33593724, 0.40838247, 0.41093462,
       0.3678663 , 0.71647983, 0.0821403 , 0.72357782, 0.44074518,
       0.43527578, 0.11612512, 0.07675086, 0.68004116, 0.41726872,
       0.88568784, 0.85259186, 0.72833918, 0.94214444, 0.13864917])

### Exercise

From the above array, find all values less than 0.5 _and_ greater than 0.1

In [72]:
twentyfive_vals[(twentyfive_vals > 0.1) & (twentyfive_vals < 0.5)]

array([0.1571934 , 0.12173568, 0.40006612, 0.45715492, 0.14028059,
       0.44422354, 0.33593724, 0.40838247, 0.41093462, 0.3678663 ,
       0.44074518, 0.43527578, 0.11612512, 0.41726872, 0.13864917])

In [73]:
# | is element-wise OR: every value is > 0.1 or < 0.5, so all 25 values are returned
twentyfive_vals[(twentyfive_vals > 0.1) | (twentyfive_vals < 0.5)]

array([0.1571934 , 0.12173568, 0.40006612, 0.45715492, 0.14028059,
       0.44422354, 0.70865683, 0.33593724, 0.40838247, 0.41093462,
       0.3678663 , 0.71647983, 0.0821403 , 0.72357782, 0.44074518,
       0.43527578, 0.11612512, 0.07675086, 0.68004116, 0.41726872,
       0.88568784, 0.85259186, 0.72833918, 0.94214444, 0.13864917])
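
Two details in these expressions are easy to miss: each comparison needs its own parentheses, and the element-wise operators `&` and `|` cannot be swapped for the keywords `and` and `or`. The sketch below uses a small made-up array `vals` to show both points.

```python
import numpy as np

vals = np.array([0.05, 0.3, 0.6, 0.9])

# & combines the two boolean masks element by element;
# the parentheses are needed because & binds tighter than > and <
mask = (vals > 0.1) & (vals < 0.5)
print(vals[mask])   # [0.3]

# The keyword `and` asks for a single truth value of a whole array,
# which Numpy refuses for arrays with more than one element
try:
    (vals > 0.1) and (vals < 0.5)
except ValueError as err:
    print("ValueError:", err)
```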

In [74]:
type(twentyfive_vals)

numpy.ndarray

In [75]:
type(0.1)

float

From the above array, find all values greater than 0.5 and less than 0.7

In [76]:
twentyfive_vals[(twentyfive_vals > 0.5) & (twentyfive_vals < 0.7)]

array([0.68004116])

In [77]:
# the opposite selection: values below 0.5 or above 0.7
twentyfive_vals[(twentyfive_vals < 0.5) | (twentyfive_vals > 0.7)]

array([0.1571934 , 0.12173568, 0.40006612, 0.45715492, 0.14028059,
       0.44422354, 0.70865683, 0.33593724, 0.40838247, 0.41093462,
       0.3678663 , 0.71647983, 0.0821403 , 0.72357782, 0.44074518,
       0.43527578, 0.11612512, 0.07675086, 0.41726872, 0.88568784,
       0.85259186, 0.72833918, 0.94214444, 0.13864917])

#### Note: The course materials are developed mainly based on personal experience and contributions from the Python learning community
Reference book: Learning Python, 5th Edition, by Mark Lutz