# Libraries

A library is a collection of files (called modules) that contain functions for use by other programs. Libraries extend Python's functionality. They may also contain data values (e.g., numerical constants), entire sample data sets, and other things. A library's contents are supposed to be related, although there is no way to enforce that.

# Numpy 
In this notebook, we'll encounter an important package for scientific computing in Python: NumPy.

**Learning Objectives**
* Install and import packages for Python
* Create NumPy arrays
* Perform indexing and slicing on NumPy arrays
* Execute methods & access attributes of arrays
* Understand the rules of broadcasting in NumPy: how NumPy treats arrays with different shapes during arithmetic operations

## Importing packages

To use a library in a particular Jupyter notebook or other Python program, we must import it using the import statement, like this:
```python
import numpy 
```
We can also give a library a nickname (an alias) when we import it.

The convention is to import `numpy` as `np`.

Then, when we want to use modules or functions in this library, we preface them with `np.`, as in `np.array`.

In [1]:
!pip install numpy



In [3]:
import numpy as np

Once a library is imported, we can use functions and methods from it. For functions, we have to tell Python which imported library the function comes from. For example, numpy has a function called `array` for creating arrays. To call it, we prefix it with `np.` and pass the data we want in the array:
```python
np.array([1, 2, 3])
```

## Numpy

**Numpy** is the fundamental package for scientific computing with Python. It'll allow us to work with bigger datasets more efficiently.
- NumPy provides scientific computing tools in Python
- The core objects of NumPy are homogeneous multi-dimensional arrays
- All elements within a single NumPy array have the same data type

## Creating arrays with numpy

### Creating `numpy` arrays

A numpy **array** is a grid of values that are all of the same type (they’re homogeneous).

We can create a numpy array in a few different ways:

* from a Python list or tuples
* by using functions that are dedicated to generating numpy arrays, such as `arange`, `linspace`, `empty`, `zeros`, etc.
* reading data from files

#### Step 1: Using lists

* From a Python list, the array can be created simply by passing the list to the `np.array()` function:
```python
a = np.array([1, 2, 3])
```

This creates the array we can see on the right here:

![](http://jalammar.github.io/images/numpy/create-numpy-array-1.png)

In [16]:
print(type(a), a.dtype, a.shape, a[0], a[1], a[2])
a[0] = 5                 # Change an element of the array
print(a)       

<class 'numpy.ndarray'> int64 (3,) 1 2 3
[5 2 3]


* You can explicitly set the data type: 
```python 
a = np.array([1, 2, 3], dtype='float32') # Force a particular datatype
```

You can read all about numpy datatypes in the [documentation](http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html).

* You can create a multidimensional array by passing nested lists, e.g.  
```python 
b = np.array([[1,2],[3,4]])                   # 2-D array with shape (2, 2)
b = np.array([[[1,2],[3,4]], [[5,6],[7,8]]])  # 3-D array with shape (2, 2, 2); this reassigns b
```

![](http://jalammar.github.io/images/numpy/numpy-array-create-2d.png)

![](http://jalammar.github.io/images/numpy/numpy-3d-array.png)

In [23]:
#Check 
print('Data type for array B',b.dtype)
print('Shape of array B',b.shape)

Data type for array B int64
Shape of array B (2, 2, 2)


#### Step 2: Using functions dedicated to generating numpy arrays

numpy provides functions like `ones()`, `zeros()`, and `random.random()` for creating arrays filled with initial values. We just pass them the number of elements we want them to generate:

![](http://jalammar.github.io/images/numpy/create-numpy-array-ones-zeros-random.png)

We can also use these functions to produce multi-dimensional arrays, as long as we pass them a tuple describing the dimensions of the array we want to create:

![](http://jalammar.github.io/images/numpy/numpy-matrix-ones-zeros-random.png)

![](http://jalammar.github.io/images/numpy/numpy-3d-array-creation.png)

Sometimes, we need an array of a specific shape with “placeholder” values that we plan to fill in with the result of a computation. The `zeros` or `ones` functions are handy for this.
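
As a minimal sketch of the creation functions described above (the variable names are made up for illustration), the cell below builds 1-D, 2-D, and 3-D arrays and then fills a `zeros` placeholder with computed values:

```python
import numpy as np

ones_1d = np.ones(3)                    # 1-D array of three 1.0 values
zeros_2d = np.zeros((3, 2))             # 2-D array: 3 rows, 2 columns of 0.0
rand_3d = np.random.random((4, 3, 2))   # 3-D array of uniform values in [0, 1)

print(ones_1d.shape, zeros_2d.shape, rand_3d.shape)  # (3,) (3, 2) (4, 3, 2)

# Use a zeros array as a placeholder and fill it in afterwards
squares = np.zeros(5)
for i in range(5):
    squares[i] = i ** 2
print(squares)  # [ 0.  1.  4.  9. 16.]
```

Note that `ones` and `zeros` produce floating-point arrays by default, which is why the squares print with a trailing decimal point.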



Numpy also has two useful functions for creating sequences of numbers: arange and linspace.

The arange function accepts three arguments, which define the start value, stop value of a half-open interval, and step size. (The default step size, if not explicitly specified, is 1; the default start value, if not explicitly specified, is 0.)

The linspace function is similar, but we can specify the number of values instead of the step size, and it will create a sequence of evenly spaced values.
```python
f = np.arange(10,50,5)   # Create an array of values starting at 10 in increments of 5, stopping before 50
g = np.linspace(0., 1., num=5) # Create an array of 5 numbers linearly spaced between 0 and 1 (both endpoints included)
```
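
To make the half-open interval of `arange` and the inclusive endpoints of `linspace` concrete, here is a small sketch that prints both sequences. The variable `h` is added only for illustration:

```python
import numpy as np

f = np.arange(10, 50, 5)         # stop value 50 is excluded
print(f)                         # [10 15 20 25 30 35 40 45]

g = np.linspace(0., 1., num=5)   # both 0 and 1 are included by default
print(g)                         # [0.   0.25 0.5  0.75 1.  ]

h = np.arange(5)                 # default start is 0, default step is 1
print(h)                         # [0 1 2 3 4]
```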


### Difference between lists and numpy arrays

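Python lists do not support elementwise mathematical operations, but NumPy arrays do. For example, `+` concatenates two lists but adds two arrays element by element:
```python 
a = [1,2,3] # list
b = [4,5,6] # list
print(a + b)  # Operation on lists: concatenation, prints [1, 2, 3, 4, 5, 6]
print(np.array(a) + np.array(b)) # Operation on NumPy arrays: elementwise sum, prints [5 7 9]
```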

### Indexing and slicing 

Numpy offers several ways to index into arrays.

#### Array indexing

We can index and slice numpy arrays in all the ways we can slice Python lists:

![](http://jalammar.github.io/images/numpy/numpy-array-slice.png)
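
A short sketch of list-style indexing and slicing on a 1-D array, using a made-up array named `data`:

```python
import numpy as np

data = np.array([1, 2, 3])

print(data[0])      # 1        first element
print(data[1])      # 2        second element
print(data[0:2])    # [1 2]    elements at index 0 and 1 (stop is excluded)
print(data[1:])     # [2 3]    from index 1 to the end
print(data[-2:])    # [2 3]    the last two elements
print(data[::-1])   # [3 2 1]  every element, in reverse order
```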

And you can index and slice numpy arrays in multiple dimensions. If slicing an array with more than one dimension, you should specify a slice for each dimension:

![](http://jalammar.github.io/images/numpy/numpy-matrix-indexing.png)

```python
a = np.array([[1,2,3], [4,5,6], [7,8,9]])

# Indexing
print('Indexed element 1: ',a[1,0]) # the element in the second row and first column (remember, indexing starts at zero)
print('Indexed element 2: ',a[1,-1]) # negative indices count from the end: -1 means the last column, -2 the second-to-last, and so on


# Slicing into the array
# Use slicing to pull out the subarray consisting of the first two rows and all columns after the first column
b=a[:2,1:] 
print('Slice',b)
```

You can also mix integer indexing with slice indexing. However, doing so will yield an array of lower rank than the original array. 

In [25]:
# Create the following rank 2 array with shape (3, 4)
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [27]:
row_r1 = a[1, :]    # Rank 1 view of the second row of a  
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, row_r1.shape)
print(row_r2, row_r2.shape)

[5 6 7 8] (4,)
[[5 6 7 8]] (1, 4)


#### Boolean indexing

Boolean array indexing lets you pick out arbitrary elements of an array. Frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:


In [29]:
#Boolean indexing
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
bool_ix = a > 4 # Find the elements of a that are bigger than 4;
          # this returns a numpy array of Booleans of the same
          # shape as a, where each slot of bool_ix tells
          # whether that element of a is > 4.
print('Boolean indices', bool_ix)
d = a[bool_ix]
print('Boolean indexed array',d)

Boolean indices [[False False False]
 [False  True  True]
 [ True  True  True]]
Boolean indexed array [5 6 7 8 9]


* Boolean indexing can be very useful in assignment:
```python 
a[bool_ix] = 0
print('Result after selective assignment', a)
```

#### Advanced integer indexing

Advanced integer indexing allows you to construct arbitrary arrays using the data from another array.

In [28]:

a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.
# The returned array will have shape (3,): The first array provides the row index and the second array 
#specifies the element to choose for the corresponding row
print(a[[0, 1, 2], [0, 1, 0]]) # equivalent to np.array([a[0,0], a[1,1], a[2,0]])

# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  

# When using integer array indexing, you can reuse the same element from input array
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))

[1 4 5]
[1 4 5]
[2 2]
[2 2]


### Views and copies
When manipulating arrays, data is sometimes copied into a new array and sometimes not. There are three cases:


#### Case 1: Simple assignment - No new object is created 
```python 
a = np.arange(12)
b = a
b[1]=13
b.shape=3,4
print('Simple assignment: A data', a) #Data of a is changed by changing b's data
print('Simple assignment: A shape', a.shape) #Shape of a is changed by changing b's shape

```

#### Case 2: View of an array - data is shared 
```python 
a = np.arange(12)
b = a.view()
b[1]=13
b.shape=3,4
print('View: A data', a) # Data is changed
print('View: A shape', a.shape) #Shape is not changed
```

#### Case 3: Copy of an array - a completely new array object with new data is created
```python 
a=np.arange(12)
b=a.copy()
b[1]=13
b.shape=3,4
print('Copy: A data', a) # Data is not changed
print('Copy: A shape', a.shape) #Shape is not changed
```

_Remember slicing gives a view of the same array--so changing the slice modifies the data of original array (Case 2 above)_

In [8]:
a = np.array([[1,2,3], [4,5,6], [7,8,9]])
print('Input array', a)
b=a[:2,1:] 
b[0,:]=[10,11]

print('Modified array',a)

Input array [[1 2 3]
 [4 5 6]
 [7 8 9]]
Modified array [[ 1 10 11]
 [ 4  5  6]
 [ 7  8  9]]
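
One way to check whether an indexing operation produced a view or a copy is `np.shares_memory`. The sketch below (variable names are illustrative) contrasts basic slicing, which returns a view, with Boolean and integer array indexing, which return copies:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

sliced = a[:2, 1:]          # basic slicing: a view of a
masked = a[a > 4]           # Boolean indexing: a new array (copy)
picked = a[[0, 1], [0, 1]]  # integer array indexing: a new array (copy)

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, masked))  # False
print(np.shares_memory(a, picked))  # False

masked[0] = 100             # modifies only the copy
print(a[1, 1])              # 5, the original is unchanged
```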


### Reshaping NumPy arrays

Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object.

![](http://jalammar.github.io/images/numpy/numpy-transpose.png)

```python 
x = np.array([[1, 2], [3, 4], [5, 6]])

print(x)
print("transpose\n", x.T)
```

In more advanced use cases, you may need to change the dimensions of a matrix. This is often the case in machine learning applications, where a model expects inputs with a shape that differs from that of your dataset. numpy's `reshape()` method is useful in these cases.

![](http://jalammar.github.io/images/numpy/numpy-reshape.png)

In [10]:
a = np.array([[1,2,3], [4,5,6], [7,8,9],[10,11,12]])
b = a.reshape(6,2) # The product of the new dimensions (6 x 2) must equal the product of the input dimensions (4 x 3)
print('Reshaped array', b)



#Automatic reshaping
c = a.reshape(6,-1) # -1 means "infer this dimension from the array's size"

print('Automatically reshaped array', c)

#Flatten arrays
d = a.flatten() # returns a copy; a.ravel() also flattens but returns a view when possible
print('Flattened array', d)

Reshaped array [[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]]
Automatically reshaped array [[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]]
Flattened array [ 1  2  3  4  5  6  7  8  9 10 11 12]
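
`flatten()` and `ravel()` give the same values but differ in whether the data is shared with the original array. A minimal sketch, assuming a contiguous input array (for which `ravel()` returns a view):

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = a.flatten()   # always a copy
flat_view = a.ravel()     # a view here, because a is contiguous in memory

flat_copy[0] = 100        # does not affect a
flat_view[1] = 200        # changes a[0, 1]

print(a[0, 0])  # 1
print(a[0, 1])  # 200
```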


### Stacking

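Sometimes, we may want to construct an array from existing arrays by “stacking” the existing arrays, either vertically or horizontally. We can use `vstack()` and `hstack()`, respectively. (For 2-D arrays, `column_stack()` gives the same result as `hstack()`; `row_stack()` is an alias of `vstack()` that is deprecated in recent NumPy releases.)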

In [67]:
a = np.array([[1,2,3], [4,5,6]])
b = np.array([[7,8,9],[10,11,12]])

#Stack arrays in sequence vertically, i.e., row-wise
print('Vertical stack', np.vstack([a,b]))   #or equivalently, np.concatenate((a,b),axis=0)

#Stack arrays in sequence horizontally, i.e., column-wise
print('Horizontal stack', np.hstack([a,b])) #or equivalently, np.concatenate((a,b),axis=1)

#Tiling 
c = np.tile(a,(4,1)) #Stack 4 copies of a vertically on top of each other
print(c)

Vertical stack [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Horizontal stack [[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
[[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]
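
The stacking functions behave differently when the inputs are 1-D rather than 2-D. The sketch below, using two made-up vectors `u` and `v`, compares the resulting shapes:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

print(np.vstack([u, v]).shape)        # (2, 3): each vector becomes a row
print(np.hstack([u, v]).shape)        # (6,):   the vectors are joined end to end
print(np.column_stack([u, v]).shape)  # (3, 2): each vector becomes a column
```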


### Example math operations on NumPy multi-dimensional arrays

What makes working with `numpy` so powerful and convenient is that it comes with many *vectorized* math functions for computation over elements of an array. These functions are highly optimized and are *very* fast - much, much faster than using an explicit `for` loop.

For example, let’s create a large array of random values and then sum it both ways. We’ll use a `%%time` *cell magic* to time them.

In [37]:
a = np.random.random(100000000)

In [38]:
%%time
x = np.sum(a)

CPU times: user 59.8 ms, sys: 23.9 ms, total: 83.7 ms
Wall time: 79.6 ms


In [39]:
%%time
x = 0 
for element in a:
  x = x + element

CPU times: user 10.8 s, sys: 6.22 ms, total: 10.8 s
Wall time: 10.8 s


Look at the “Wall time” in the output and note how much faster the vectorized version of the operation is! This type of fast computation is a major enabler of machine learning, which requires a *lot* of computation.

Whenever possible, we will try to use these vectorized operations.

For example, you can perform an elementwise sum on two arrays using either the + operator or the `add()` function.

![](http://jalammar.github.io/images/numpy/numpy-arrays-adding-1.png)

![](http://jalammar.github.io/images/numpy/numpy-matrix-arithmetic.png)

In [42]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the same array
print(x + y)
print(np.add(x, y))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]


In [43]:
a = np.array( [[1,1], [0,1]] )
b = np.array( [[2,5], [3,4]] )

print('Input matrix A', a)
print('Input matrix B', b)

Input matrix A [[1 1]
 [0 1]]
Input matrix B [[2 5]
 [3 4]]


In [44]:
# Elementwise sum: equivalent to np.add(a,b)
R = a + b 
print('Elementwise sum', R)

Elementwise sum [[3 6]
 [3 5]]


In [45]:
# Elementwise subtraction: equivalent to np.subtract(a,b)
R = a - b 
print('Elementwise subtraction', R)

Elementwise subtraction [[-1 -4]
 [-3 -3]]


In [46]:
# Elementwise product: equivalent to np.multiply(a,b)
R = a * b 
print('Elementwise product', R)

Elementwise product [[2 5]
 [0 4]]


In [47]:
# Elementwise division: equivalent to np.divide(a,b)
R = a / b 
print('Elementwise division', R)

Elementwise division [[0.5  0.2 ]
 [0.   0.25]]


Note that `*` is elementwise multiplication, not matrix multiplication. We instead use the `dot()` function, or the equivalent `@` operator for the 1-D and 2-D cases here, to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. `dot()` is available both as a function in the numpy module and as an instance method of array objects:

![](http://jalammar.github.io/images/numpy/numpy-matrix-dot-product-1.png)

In [48]:
#Matrix products
R = a @ b #Matrix product: equivalently use R = a.dot(b)
print('Print Matrix Product', R)


Print Matrix Product [[5 9]
 [3 4]]
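
The matrix product above covers the matrix-times-matrix case. As a sketch of the other two cases mentioned (inner product of vectors and matrix times vector), with made-up values:

```python
import numpy as np

v = np.array([9, 10])
w = np.array([11, 12])
x = np.array([[1, 2], [3, 4]])

# Inner product of two vectors: 9*11 + 10*12
print(v.dot(w))      # 219
print(np.dot(v, w))  # 219

# Matrix times vector: each row of x dotted with v
print(x.dot(v))      # [29 67]
print(x @ v)         # [29 67]
```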


In [49]:
#Simple unary operations on NumPy arrays 
print('Max entry', np.max(R))
print('Min entry', np.min(R))
print('Sum of entries', np.sum(R))


##Useful Matrix operations
#Matrix Transpose
RT = R.T
print('Transpose of R', RT)

#Matrix Trace
print('Trace of R', np.trace(R))

#Matrix inverse
print('Inverse of R', np.linalg.inv(R))

Max entry 9
Min entry 3
Sum of entries 21
Transpose of R [[5 3]
 [9 4]]
Trace of R 9
Inverse of R [[-0.57142857  1.28571429]
 [ 0.42857143 -0.71428571]]


Not only can we aggregate all the values in a matrix using these functions, but we can also aggregate across the rows or columns by using the `axis` parameter:

![](http://jalammar.github.io/images/numpy/numpy-matrix-aggregation-4.png)

In [51]:
print(np.max(R, axis=0))  # Compute max of each column
print(np.max(R, axis=1))  # Compute max of each row

[5 9]
[9 4]


When working with numpy arrays, it’s often helpful to get the *indices* (not only the values) of array elements that meet certain conditions. There are a few numpy functions that you’ll definitely want to remember:

-   [`argmax`](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) (get index of maximum element in array)
-   [`argmin`](https://numpy.org/doc/stable/reference/generated/numpy.argmin.html) (get index of minimum element in array)
-   [`argsort`](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) (get the array of indices that would sort the array in ascending order)
-   [`where`](https://numpy.org/doc/stable/reference/generated/numpy.where.html) (get indices of elements that meet some condition)
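
A brief sketch of the four index-returning functions listed above, applied to a made-up 1-D array named `scores`:

```python
import numpy as np

scores = np.array([30, 10, 50, 20])

print(np.argmax(scores))   # 2, the position of 50
print(np.argmin(scores))   # 1, the position of 10
print(np.argsort(scores))  # [1 3 0 2], the indices that put scores in ascending order

# With only a condition, where() returns a tuple with one index array per dimension
idx = np.where(scores > 15)
print(idx[0])              # [0 2 3]
print(scores[idx])         # [30 50 20]
```

The result of `argsort` can itself be used as an index: `scores[np.argsort(scores)]` gives the sorted values.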

### Broadcasting: dealing with inputs of different shapes

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations.

For example, in basic linear algebra, we can only add (and perform similar element-wise operations on) two matrices that have the *same* dimensions. In numpy, if we want to add two arrays that have different but compatible dimensions, numpy will implicitly “extend” the dimensions of one array to match the other so that we can perform the operation.

So these operations will work, instead of returning an error:

![](https://sebastianraschka.com/images/blog/2020/numpy-intro/broadcasting-1.png)

![](https://sebastianraschka.com/images/blog/2020/numpy-intro/broadcasting-2.png)

Broadcasting two arrays together follows these rules:

**Rule 1**: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

For example, in the following cell, `b` (shape (3,)) will be implicitly extended to shape (1,3) and then stretched across the four rows of `a`:

In [35]:
a = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
b = np.array([0, 1, 2])
print(a.shape, b.shape)
c = a + b # Add b to each row of a using NumPy broadcasting
print(c) 


(4, 3) (3,)
[[ 1  3  5]
 [ 4  6  8]
 [ 7  9 11]
 [10 12 14]]




**Rule 2**: If the shapes of the two arrays differ in some dimension, the array whose size is 1 in that dimension is stretched to match the other shape.

For example, in the following cell `a` will be implicitly extended to shape (3,2):


In [30]:
a = np.array([[1],[2],[3]])         # has shape (3,1)
b = np.array([[4,5], [6,7], [8,9]]) # has shape (3,2)
c = a + b                           # will have shape (3,2) 

**Rule 3**: If in any dimension the sizes disagree and neither is equal to 1, an error is raised:

In [32]:
a = np.array([[1],[2],[3],[4]])      # has shape (4,1)
b = np.array([[4,5], [6,7], [8,9]])  # has shape (3,2)
c = a + b                            # ValueError: operands could not be broadcast
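
The three rules can be checked without building large arrays. The sketch below assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the variable names `shape_rule1` and `compatible` are illustrative only:

```python
import numpy as np

# Rule 1: (3,) is padded to (1, 3), then stretched to (4, 3)
shape_rule1 = np.broadcast_shapes((4, 3), (3,))
print(shape_rule1)   # (4, 3)

# Rule 2: the size-1 dimension of a is stretched to size 2
a = np.array([[1], [2], [3]])          # shape (3, 1)
b = np.array([[4, 5], [6, 7], [8, 9]]) # shape (3, 2)
c = a + b
print(c.shape)       # (3, 2)
print(c.tolist())    # [[5, 6], [8, 9], [11, 12]]

# Rule 3: sizes 4 and 3 disagree and neither is 1, so broadcasting fails
try:
    np.broadcast_shapes((4, 1), (3, 2))
    compatible = True
except ValueError:
    compatible = False
print(compatible)    # False
```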

Functions that support broadcasting are known as universal functions. You can find the list of all universal functions in the [documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).

Here are a few visual examples involving broadcasting.

![](http://jalammar.github.io/images/numpy/numpy-array-broadcast.png)

Note that these arrays are compatible in each dimension if they have either the same size in that dimension, or if one array has size 1 in that dimension.

![](http://jalammar.github.io/images/numpy/numpy-matrix-broadcast.png)

Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.

**Attribution**:

-   Parts of this notebook are adapted from a [tutorial from CS231N at Stanford University](https://cs231n.github.io/python-numpy-tutorial/), which is shared under the [MIT license](https://opensource.org/licenses/MIT).
-   Parts of this notebook are adapted from Jake VanderPlas’s [Whirlwind Tour of Python](https://colab.research.google.com/github/jakevdp/WhirlwindTourOfPython/blob/master/Index.ipynb), which is shared under the [Creative Commons CC0 Public Domain Dedication license](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/LICENSE).
-   The visualizations in this notebook are from [A Visual Intro to NumPy](http://jalammar.github.io/visual-numpy/) by Jay Alammar, which is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
-   Parts of this notebook (and some images) about `numpy` broadcasting are adapted from Sebastian Raschka’s [STATS451](https://github.com/rasbt/stat451-machine-learning-fs20) materials.