# NumPy

## So what is NumPy?

**[NumPy](https://numpy.org/)** is a Python library built for numeric computing, linear algebra in particular. It's simple to use, and it is the de facto standard for array computing in Python.

Now in many projects, you won't be using every nitty-gritty function of NumPy, which is why we'll mainly be using this as an opportunity to introduce some basic concepts and give you some time to get familiar with notebooks.

You're probably already very familiar with the list data structure in Python. For reasons including better versatility (it has plenty more methods defined on it, as we will shortly see) and, more importantly, performance (it is built as a wrapper around C code), we will now switch to using NumPy **n-dimensional arrays**, or _**'ndarrays'**_, in nearly all cases.

For a more detailed comparison of ndarrays and lists, you can check out this link: [Why numpy over lists?](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists)
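
To get a rough feel for the performance point, here is a minimal sketch that doubles every element of a list and of an ndarray and times both. The size `n`, the repeat count and the variable names are arbitrary choices made for illustration, and the actual timings will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

# Double every element: a Python-level loop vs. one vectorized NumPy operation
list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=20)
array_time = timeit.timeit(lambda: np_arr * 2, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"ndarray multiply:   {array_time:.4f} s")

# Both approaches produce the same values
same_values = [x * 2 for x in py_list] == (np_arr * 2).tolist()
print(same_values)
```

On most setups the ndarray version should come out noticeably faster, but treat the numbers as indicative only.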

Another important reason for getting the hang of NumPy is that most other libraries in the data science/ML domain are built on top of it (i.e., they depend on it). So knowing how it works will help a lot when working with them in the future!

**Note:** For convenience, wherever we use the term **arrays**, we are actually referring to *ndarrays*.

In [1]:
# imports

import numpy as np

## Basics of NumPy

### Creating arrays

#### Method 1: from python lists

In [2]:
list_1d = [0, 1, 2, 3, 4]
list_2d = [[0,2], [4,6], [8,10]]

# this works exactly the same if we use tuples instead of lists

In [3]:
list_1d

[0, 1, 2, 3, 4]

In [4]:
list_2d

[[0, 2], [4, 6], [8, 10]]

In [5]:
vector = np.array(list_1d)
matrix = np.array(list_2d)

In [6]:
vector

array([0, 1, 2, 3, 4])

In [7]:
matrix

array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])

In [8]:
# comparing the types
type(list_2d), type(matrix)

(list, numpy.ndarray)
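
As the comment in cell 2 mentions, tuples can be used in place of lists. A quick check of that claim (the names `from_list` and `from_tuple` are just for this illustration):

```python
import numpy as np

from_list = np.array([[0, 2], [4, 6], [8, 10]])
from_tuple = np.array(((0, 2), (4, 6), (8, 10)))

# Same shape and same elements, regardless of the input container
print(np.array_equal(from_list, from_tuple))  # True
print(type(from_tuple), from_tuple.shape)
```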

#### Method(s) 2: built-in functions/shortcuts

>```np.arange()```: returns evenly spaced values within a given interval.
        
arguments: **(start**(included), **end**(excluded) **\[,step]**)

In [9]:
np.arange(0,5)

array([0, 1, 2, 3, 4])

In [10]:
np.arange(0,5,0.5)

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])


>```np.zeros()``` and ```np.ones()``` : return an array of the specified shape filled with 0s or 1s respectively.

arguments: **shape** (tuple with the dimensions of required array)

In [11]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [12]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [13]:
np.ones((5,3)) # a 5x3 matrix/array with 1s

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

>```np.eye()```: returns an identity matrix of specified dimension

In [14]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

>```np.diag()```: returns a diagonal matrix with the given diagonal

In [15]:
np.diag([2, 3, 5])

array([[2, 0, 0],
       [0, 3, 0],
       [0, 0, 5]])

>```np.linspace()```: returns an array with 'n' linearly spaced elements within given limits (both inclusive)

arguments: (**start, stop, n**)

In [16]:
np.linspace(0,10,3)

array([ 0.,  5., 10.])

In [17]:
np.linspace(0,10,7)

array([ 0.        ,  1.66666667,  3.33333333,  5.        ,  6.66666667,
        8.33333333, 10.        ])

**Note: With scalar limits like these, linspace returns a one-dimensional array, but we can reshape it with functions we will learn about below!**
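
As a small preview of that idea, the sketch below builds six evenly spaced points and rearranges them into a 2x3 array using the `reshape()` method covered later in this notebook. The name `grid` is just for illustration.

```python
import numpy as np

# 6 points between 0 and 10 (both included), arranged as 2 rows x 3 columns
grid = np.linspace(0, 10, 6).reshape(2, 3)

print(grid)
print(grid.shape)  # (2, 3)
```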

#### Random arrays: functions within np.random

Probably the most useful section so far, as this gives us easy access to random samples/starting points in various applications/problems.
Ex: weights in neural networks.



>```np.random.rand()```: creates and returns an array of the given shape and populates it with random samples from a uniform distribution over [0, 1)

arguments: **d0, d1, ...** (the dimensions of the required array, passed as separate integers rather than as a tuple)

In [18]:
np.random.rand(2)

array([0.31663615, 0.32014245])

In [19]:
np.random.rand(5,4)

array([[0.85377102, 0.8786977 , 0.82931334, 0.16552887],
       [0.18340069, 0.62124266, 0.60406425, 0.41845591],
       [0.34934176, 0.58030205, 0.13673916, 0.77996222],
       [0.40774046, 0.49830935, 0.78100554, 0.17674754],
       [0.03770598, 0.42525314, 0.3777771 , 0.68570912]])


>Variation: ```np.random.uniform()```: creates and returns an array of the given shape and populates it with random samples from a uniform distribution over the given range

arguments: **low**, **high** (the range), **size** (tuple with the dimensions of the required array)

In [20]:
np.random.uniform(-1, 1, (2,3))

array([[ 0.57933653, -0.41785422,  0.74257985],
       [ 0.89272914,  0.56052423, -0.15375608]])


>```np.random.randn()```: creates and returns an array of the given shape and populates it with random samples from the standard normal distribution (mean 0, standard deviation 1)

arguments: **d0, d1, ...** (the dimensions of the required array, passed as separate integers rather than as a tuple)

In [21]:
np.random.randn(2)

array([-0.42646238, -0.43527716])

In [22]:
np.random.randn(5,4)

array([[-0.38613724, -0.15546191, -0.9121641 ,  0.98603834],
       [-1.29835943,  0.27586757, -1.39430098,  1.74227616],
       [-0.77932724,  0.92922864, -0.84706807,  0.29045682],
       [ 0.09381858,  1.29486593,  0.59146952,  0.03577325],
       [-0.47773024, -0.25641838, -0.02684277, -0.41553366]])


>```np.random.randint()```: creates and returns an array of random integers in the given range

arguments: **start** (included), **stop** (excluded), **shape**

In [23]:
np.random.randint(1, 100)

63

In [24]:
np.random.randint(1, 100, 10)

array([15, 85, 87, 65, 98, 50,  2, 16, 57, 91])

In [25]:
np.random.randint(1, 100, (5,4))

array([[15, 98, 98, 23],
       [73, 69, 82, 86],
       [63, 54, 94, 65],
       [28, 41, 23, 19],
       [66, 80, 66, 60]])

> **Note:** You would have noticed that in some cases we pass the shape as a tuple (like in **randint** just above), and in others just as numbers separated by commas (like **rand**). The difference lies in what the arguments are. The only arguments **rand** takes are the array's dimension(s), while functions like **randint** take more than just the dims/shape of the array to be created, so the shape has to be grouped using parentheses (i.e., a tuple).
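
Side by side, the two calling conventions look like this. It is only a sketch: the variable names are illustrative, and the values are random so only the shapes are shown.

```python
import numpy as np

a = np.random.rand(2, 3)               # dimensions as separate arguments
b = np.random.randint(1, 100, (2, 3))  # shape as one tuple, after the range
c = np.random.uniform(-1, 1, (2, 3))   # shape as one tuple, after the range

print(a.shape, b.shape, c.shape)  # all three are (2, 3)
```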

### Dimensions/Axes

For a NumPy array, the number of dimensions refers to the number of axes needed to index it, not the dimensionality of any geometrical space.

For example,

```python
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8],
       ])
```

has 2 dimensions, meaning we can access any element with two indices, even though each row, read as a vector, describes a point in 3D space because it has 3 elements. In other words, it has 2 axes, and the length along each axis is 3.

Let's look at some more examples using code:


In [26]:
arr = np.array(
    [[ 1., 2., 17.3], 
    [ 0., 12., 22.23]]
)

print(f"Type: {type(arr)}")
print(f"Dimensions: {arr.ndim}")
print(f"Shape: {arr.shape}") # Returns tuple (r, c), for matrix with r rows and c columns
print(f"Type: {arr.dtype}") # Returns data type of elements in array
print(f"Size: {arr.size}") # Returns total number of elements in array

Type: <class 'numpy.ndarray'>
Dimensions: 2
Shape: (2, 3)
Type: float64
Size: 6


#### Some useful methods to call on an array

In [27]:
# maximum element in arr
arr.max()

22.23

In [28]:
# position of maximum element in arr (index into the flattened array)
arr.argmax()

5

In [29]:
# minimum element in arr
arr.min()

0.0

In [30]:
# position of minimum element in arr (index into the flattened array)
arr.argmin()

3
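
Note that `argmax()` and `argmin()` return a single position counted over the flattened array, which is why we get 5 and 3 above rather than a (row, column) pair. If you want the 2D position instead, one option is `np.unravel_index`, sketched here on a separate array `m` holding the same values:

```python
import numpy as np

m = np.array([[1., 2., 17.3],
              [0., 12., 22.23]])

flat_max = m.argmax()  # 5: index into the flattened array
row, col = np.unravel_index(flat_max, m.shape)

print(row, col)     # 1 2
print(m[row, col])  # 22.23
```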

In [31]:
arr.ravel() # returning a single-dimension version of the array
# here it flattens the 2x3 matrix into a 1-D array of 6 elements

array([ 1.  ,  2.  , 17.3 ,  0.  , 12.  , 22.23])

In [32]:
# transpose of an array
arr.T

array([[ 1.  ,  0.  ],
       [ 2.  , 12.  ],
       [17.3 , 22.23]])

#### Reshaping an array

In [33]:
arr = np.arange(24)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In [34]:
arr.shape

(24,)

> The ```reshape()``` method takes the new shape of the array (provided it is valid for the given array, i.e., it holds the same total number of elements), ordered from outer to inner dimension (i.e., indexing order, for example, ```planes->rows->columns```).

In [35]:
arr = arr.reshape(4,6) 
arr

array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

In [36]:
arr.shape

(4, 6)

In [37]:
arr = arr.reshape(6,4)

In [38]:
arr 

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

> Note: the argument value *-1* basically tells the function to choose that dimension's value for us. Here we're working with an array of 24 elements, so if we choose 8 and let the other argument be -1, the function chooses 24/8, which is 3, for us. Since only one unknown can be worked out from the total size, we can use -1 only once in a function call.

In [39]:
# a -1 argument is used to tell the function to choose this dimension for us
arr.reshape(8,-1)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])
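
The same rules carry over to more than two dimensions, and a shape that does not fit the number of elements is rejected. A short sketch, using a separate array `nums` so the notebook's `arr` is left alone:

```python
import numpy as np

nums = np.arange(24)

print(nums.reshape(2, 3, 4).shape)   # (2, 3, 4): 2 * 3 * 4 == 24
print(nums.reshape(2, -1, 4).shape)  # (2, 3, 4): -1 is inferred as 3

try:
    nums.reshape(5, -1)  # 24 is not divisible by 5, so no valid size exists
except ValueError as err:
    print(f"ValueError: {err}")
```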

### Indexing, Slicing and Iterating

These operations work very much like those on lists

In [40]:
a = np.array([1, 4, 9, 16, 25, 36, 49, 64, 81, 100])

print(a, end='\n\n')
print(a[2], end='\n\n') # Index based Selection
print(a[3:8], end='\n\n') # Range based Selection

a[:6:2] = -10 # What does this do?

# uncomment to check if your guess matches what actually happens
# print(a, end='\n\n')

# print(a[::-1], end='\n\n') # Reversed List

[  1   4   9  16  25  36  49  64  81 100]

9

[16 25 36 49 64]



Similarly with 2-dimensional arrays:

In [41]:
arr = np.arange(24).reshape(3, 8)

print(arr, end='\n\n')

print(arr[2, 2], end='\n\n')

print(arr[2], end='\n\n') # is the same as arr[2, :], and arr[2, 0:8]

print(arr[1:3, -1]) # Last column of rows indexed 1, 2

# and so on with n dimensional arrays

[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]]

18

[16 17 18 19 20 21 22 23]

[15 23]


### Copies and Views

We know that Python passes mutable objects by reference, so, much like with lists, we now need to understand the cases where two ndarrays are independent copies and the cases where they are just references to the same object. Knowing when these cases arise and how to handle them will save us from a whole lot of bugs and annoying results.

#### Case 1: No copy made

> - Simple assignments do not create copies of the ndarray object

In [42]:
a = np.arange(5)
print(a)

b = a 
print(b is a)

[0 1 2 3 4]
True


```is``` checks the objects' ```id``` (address), and since it shows that they're the same, we can confirm that they are not 2 different copies.

> - Similarly, functions which deal with mutable objects work with references by default

In [43]:
def f(x):
    return x

b = f(a)
print(f"a: {id(a)}")
print(f"b: {id(b)}")

b is a

a: 1500990645040
b: 1500990645040


True

#### Case 2: Shallow copy (or view)

A shallow copy (or view) is a new array object that references the same data, i.e., the object itself is stored somewhere else, but it looks at the same underlying elements as the array it references. As a result, changes made to the data through the original or through any view affect all of them (much like pointers to a string in C, if you remember :p).

We use the ```view``` method to create a shallow copy/view

In [44]:
c = a.view()
print(a)
print(c)

[0 1 2 3 4]
[0 1 2 3 4]


In [45]:
c is a
# they are different objects as their id's are different

False

In [46]:
c[2] = 99
print(c)
print(a)

# notice that both change, though we modified only c

[ 0  1 99  3  4]
[ 0  1 99  3  4]


> **Note:** Slicing an array (e.g. ```a[1:4]```) creates a view of it, not a copy
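
A quick way to convince yourself of this, sketched with fresh arrays (`base_arr`, `window` and `independent` are names made up for this example) so that the notebook's `a` is not disturbed:

```python
import numpy as np

base_arr = np.arange(5)
window = base_arr[1:4]  # a slice: a view onto base_arr's data

window[0] = -1          # writing through the view...
print(base_arr)         # ...changes base_arr too: [ 0 -1  2  3  4]
print(np.shares_memory(base_arr, window))  # True

independent = base_arr[1:4].copy()  # an explicit copy breaks the link
independent[0] = 50
print(base_arr[1])      # still -1
```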

#### Case 3: Deep Copy (or true copy)

Like the (alternative) name suggests, this creates an actual copy of the object, where the copies and original are independent from one another.

We use the ```copy``` method on an array to achieve this

In [47]:
d = a.copy()

print(d is a, end="\n\n")

d[0] = 911
print(a, end="\n\n")
print(d)

False

[ 0  1 99  3  4]

[911   1  99   3   4]


### Mathematics with ndarrays

#### Vectorization

The first and most important thing you should know about operations on ndarrays is that they're *vectorized*, i.e., an operation on the array is applied to each and every element in it.

To highlight this difference, let us contrast it with the same operation on a list.

In [48]:
a_list = [1, 2, 3, 4, 5]
a = np.array(a_list)

In [49]:
a_list*4

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

The list has been repeated 4 times.

In [50]:
a*4

array([ 4,  8, 12, 16, 20])

Each element in the array has been multiplied by 4.
>**Note:** This does not modify the original array; it returns a new array with these elements, so we need a variable on the left-hand side to catch it, as follows:

```python
arr = a*4
```
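
If you do want to change the array itself, the augmented assignment operators (such as `*=`) work in place. A small sketch of the difference, using a separate array `v` so the notebook's `a` stays as it is:

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5])

scaled = v * 4  # new array; v is untouched
print(v)        # [1 2 3 4 5]
print(scaled)   # [ 4  8 12 16 20]

v *= 4          # in place: v itself now holds the scaled values
print(v)        # [ 4  8 12 16 20]
```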


In [51]:
# similar operations like addition, power, division etc. work this way too

a**4.3

array([1.00000000e+00, 1.96983106e+01, 1.12621523e+02, 3.88023441e+02,
       1.01291037e+03])

#### Products

>```np.dot()```

- If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).

- If both a and b are 2-D arrays, it is matrix multiplication, but using ```matmul``` or ```a @ b``` is preferred.

- If either a or b is 0-D (scalar), it is equivalent to multiply and using ```numpy.multiply(a, b)``` or ```a * b``` is preferred.

- If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.

- If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b

In [52]:
a = np.arange(1,11)
b = np.arange(1,20,2)

# since both are 1-D, it implements scalar product
np.dot(a, b)

715

In [53]:
c = np.arange(24).reshape(3,8)
d = np.arange(40).reshape(8,5)

# matrix multiplication
np.dot(c, d)

array([[ 700,  728,  756,  784,  812],
       [1820, 1912, 2004, 2096, 2188],
       [2940, 3096, 3252, 3408, 3564]])

In [54]:
# using @

print(a@b, end="\n\n")
print(c@d)

715

[[ 700  728  756  784  812]
 [1820 1912 2004 2096 2188]
 [2940 3096 3252 3408 3564]]


Try the other variations on your own, but feel free to ask questions whenever you're stuck.

Also, how do you think the cross product works here? (Try implementing it on your own before looking up the built-in function in the [numpy docs](https://numpy.org/doc/stable/index.html).)

#### Sums

In [55]:
arr = np.arange(24).reshape(4, -1)
print(arr)

# sum of all elements in the array
arr.sum()

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]


276

In [56]:
# taking sums wrt a particular axis

print(arr.sum(axis=0), end="\n\n") # Sum along axis 0 (down the rows): one total per column
print(arr.sum(axis=1)) # Sum along axis 1 (across the columns): one total per row

[36 40 44 48 52 56]

[ 15  51  87 123]
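
One way to remember which axis is which: the axis you pass is the one that gets collapsed, so it disappears from the shape of the result. The sketch below checks this on the same 4x6 layout (named `table` here for illustration) and also shows the `keepdims` option, which keeps the collapsed axis with length 1.

```python
import numpy as np

table = np.arange(24).reshape(4, 6)

col_totals = table.sum(axis=0)  # collapses axis 0 (the 4 rows)    -> shape (6,)
row_totals = table.sum(axis=1)  # collapses axis 1 (the 6 columns) -> shape (4,)
print(col_totals.shape, row_totals.shape)  # (6,) (4,)

# keepdims=True keeps the collapsed axis around with length 1
print(table.sum(axis=1, keepdims=True).shape)  # (4, 1)
```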
