<p style="font-family: Arial; font-size:3.75em;color:#2462C0; font-style:bold"><br>
Introduction to numpy:
</p><br>

<p style="font-family: Arial; font-size:1.25em;color:#2462C0; font-style:bold"><br>
Package for scientific computing with Python
</p><br>

Numerical Python, or "Numpy" for short, is a foundational package on which many of the most common data science packages are built.  Numpy provides us with high performance multi-dimensional arrays which we can use as vectors or matrices.  

The key features of numpy are:

- ndarrays: n-dimensional arrays of the same data type which are fast and space-efficient.  There are a number of built-in methods for ndarrays which allow for rapid processing of data without using loops (e.g., compute the mean).
- Broadcasting: a useful tool which defines implicit behavior between multi-dimensional arrays of different sizes.
- Vectorization: applies numeric operations to whole ndarrays at once, without explicit Python loops.
- Input/Output: simplifies reading and writing of data from/to file.
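
To make the features listed above a little more concrete, here is a minimal sketch comparing an explicit Python loop with the equivalent vectorized expression. The array `prices` and the other variable names are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

# Illustrative data; the names in this sketch are assumptions.
prices = np.array([10.0, 20.0, 30.0])

# Loop version: handle one element at a time in Python.
doubled_loop = np.array([p * 2 for p in prices])

# Vectorized version: one expression applied to the whole ndarray.
doubled_vec = prices * 2

# Built-in method: compute the mean without writing a loop.
mean_price = prices.mean()

print(doubled_vec, mean_price)
```

Both versions produce the same values; the vectorized form is shorter and typically much faster on large arrays.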

<b>Additional Recommended Resources:</b>
https://docs.scipy.org/doc/numpy/reference/


<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Getting started with ndarray<br><br></p>

**ndarrays** are time and space-efficient multidimensional arrays at the core of numpy.  As with the data structures in Week 2, let's get started by creating ndarrays using the numpy package.

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Create Rank 1 numpy arrays:
</p>

In [None]:
import numpy as np

array_1d = np.array([31, 5, 45])

The type of an ndarray is: "<class 'numpy.ndarray'>"

In [None]:
print(type(array_1d))              

The shape of the array should have just one dimension (rank 1).

In [None]:
print(array_1d.shape)

This is a rank 1 array, so only one index is needed to access each element.

In [None]:
print(array_1d[0], array_1d[1], array_1d[2]) 

ndarrays are mutable i.e. elements of the array can be changed

In [None]:
array_1d[0] =888            

print(array_1d)

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Create a Rank 2 numpy array:</p>

A rank 2 **ndarray** is one with two dimensions.  

Notice the format below of [ [row] , [row] ].  2 dimensional arrays may be used for representing matrices/datasets.

In [None]:
array_2d = np.array([[11,12,13],[21,22,23]])

print(array_2d)  # print the array

In [None]:
print("The shape of the array is 2r*3c: ", array_2d.shape)

In [None]:
print("Accessing elements at position [0,0], [0,1], [1,0] and [1,2] of the ndarray: ", 
      array_2d[0, 0], ", ",
      array_2d[0, 1],", ", 
      array_2d[1, 0],", ",
      array_2d[1, 2] )

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Creating ndarrays with numpy methods:
</p>

Generally we will not create arrays using the methods above. numpy has a number of built-in functions which can quickly create multidimensional arrays.

### Create a 2x2 array of zeros

In [None]:
nd_0 = np.zeros((2,2))      
print(nd_0)                              

### Create a 2x2 array filled with the value 5

In [None]:
nd_5 = np.full((2,2), 5.0)  
print(nd_5)   

### Create a 2x2 identity matrix (1s on the diagonal, 0s elsewhere)

In [None]:
nd_1d = np.eye(2,2)
print(nd_1d)  

### Create an array of ones

In [None]:
nd_1 = np.ones((1,2))
print(nd_1)    

The ndarray above is a rank 2 array, even though it has only one row.

In [None]:
print(nd_1.shape)

In [None]:
print(nd_1[0,1])

### Create an array of random floats between 0 and 1

In [None]:
nd_rand = np.random.random((3,3))
print(nd_rand)    

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Array Indexing
<br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C1; font-style:bold"><br>
Slice indexing:
</p>

Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays. First, create a rank 2 array of shape (3, 4).

In [None]:
nd_rand = np.random.random((3,4))
print(nd_rand)

Use array slicing to get a subarray consisting of the first 3 rows and the first 3 columns.

In [None]:
slice_nd = nd_rand[0:3, 0:3]
print(slice_nd)

A slice is a view onto the original data, so modifying a slice also modifies the underlying array.

In [None]:
print("Before:", nd_rand[0, 1])
slice_nd[0, 1] = 1000
print("After:", nd_rand[0, 1])    
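
When an independent subarray is needed, one option is to call `.copy()` on the slice. The sketch below uses a small illustrative array (`base` and the other names are assumptions for this example) and checks which results share memory with the original.

```python
import numpy as np

base = np.arange(12).reshape(3, 4)      # illustrative array with values 0..11

view = base[0:2, 0:2]                   # slicing returns a view onto the same data
independent = base[0:2, 0:2].copy()     # .copy() allocates separate data

independent[0, 0] = -1                  # does not affect base
view[0, 1] = 99                         # writes through to base

print(base[0, 0], base[0, 1])
print(np.shares_memory(base, view), np.shares_memory(base, independent))
```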

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Using both integer indexing & slice indexing
</p>

Combinations of integer indexing and slice indexing can be used to create different shaped matrices. 
Create a Rank 2 array of shape (3, 4)

In [None]:
an_array = np.array([[10,14,16,18], [24,26,28,29], [30,33,36,38]])
print(an_array)

Using both integer indexing & slicing generates an array of lower rank

In [None]:
row_rank1 = an_array[1,:]

print("\nRank 1 view. \n\n", row_rank1, "\n\nNotice only a single []")
print("The dimension is", row_rank1.shape) 

Slicing alone will generate an array of the same rank as an_array

In [None]:
row_rank2 = an_array[1:2, :]

print("\nRank 2 view. \n\n", row_rank2, "\n\nNotice double [ [] ]")
print("The dimension is", row_rank2.shape)  

#### Exercise: Create rank 1 and rank 2 column arrays using slicing

In [None]:
print()
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]

print(col_rank1, col_rank1.shape)
print()
print(col_rank2, col_rank2.shape)

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Array Indexing for changing elements:
</p>

At times it is useful to use an array of indices to access or change elements.

In [None]:
an_array = np.array([[10,14,17], [25,27,29], [33,36,39], [40,45,47]])

print('Original Array:')
print(an_array)

Creating an array of indices

In [None]:
col_indices = np.array([0, 1, 2, 2])
print('\nCol indices picked : ', col_indices)

row_indices = np.arange(4)
print('\nRows indices picked : ', row_indices)

The pairings of row_indices and col_indices are shown below.  We will change the elements at these indices.

In [None]:
for row,col in zip(row_indices,col_indices):
    print(row, ", ",col)

Selecting one element from each row


In [None]:
print('Values in the array at those indices: ',an_array[row_indices, col_indices])

Change one element from each row using the indices selected

In [None]:
an_array[row_indices, col_indices] += 10000

print('\nChanged Array:')
print(an_array)
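
Unlike slicing, selecting with index arrays returns a copy, while assigning through index arrays writes to the original. The following is a minimal sketch of that difference; `grid`, `rows`, and `cols` are illustrative names chosen for this example.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])
rows = np.array([0, 1])
cols = np.array([2, 0])

picked = grid[rows, cols]       # elements at (0, 2) and (1, 0), returned as a copy
picked[0] = -1                  # changes only the copy

grid_unchanged = np.array_equal(grid, [[1, 2, 3], [4, 5, 6]])
print(picked, grid_unchanged)

grid[rows, cols] = 0            # assignment through index arrays modifies grid
print(grid)
```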

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Boolean Indexing
<br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Boolean indexing for selecting and changing elements:
</p>

In [None]:
an_array = np.array([[16,18], [26, 29], [35, 37]])
print(an_array)

Create a filter to find elements greater than 15. The filter holds a boolean value for each element, indicating whether it meets the condition.

In [None]:
filter = (an_array > 15)
filter

The `filter` is an ndarray of the same shape as `an_array`, filled with True where the corresponding element of `an_array` is greater than 15 and False where it is not. We can now select just those elements which meet the condition.

In [None]:
print(an_array[filter])

We can also apply a condition directly inside the brackets:

In [None]:
an_array[(an_array % 2 == 0)]

We can change the elements in the array by applying filter logic.

In [None]:
an_array[an_array % 2 == 0] +=100
print(an_array)
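
Conditions can also be combined. As a sketch, assuming a fresh illustrative array named `values`, the element-wise operators `&` (and) and `|` (or) join boolean arrays; each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import numpy as np

values = np.array([[16, 18], [26, 29], [35, 37]])

# Elements that are greater than 20 AND odd.
mask = (values > 20) & (values % 2 == 1)
print(values[mask])

# Elements that are less than 18 OR greater than 35.
either = values[(values < 18) | (values > 35)]
print(either)
```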

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Datatypes and Array Operations
<br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Datatypes:
</p>

NumPy infers the data type from the values when none is given.

In [None]:
ex1 = np.array([11, 12]) 
print(ex1.dtype)

In [None]:
ex2 = np.array([11.0, 12.0]) 
print(ex2.dtype)

We can explicitly specify the data type as well.

In [None]:
ex3 = np.array([11, 21], dtype=np.int64) 
print(ex3.dtype)

We can also force a data type, for example converting floats into integers (the fractional part is truncated).

In [None]:
ex4 = np.array([11.1,12.7], dtype=np.int64)
print(ex4.dtype, ex4)

In [None]:
ex5 = np.array([11, 21], dtype=np.float64)
print(ex5.dtype)

print(ex5)
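
An existing array can also be converted with `astype`. The sketch below, using an illustrative array named `floats`, contrasts plain truncation with rounding first via `np.rint`.

```python
import numpy as np

floats = np.array([11.1, 12.7, -3.9])

# astype to an integer type drops the fractional part (truncates toward zero).
truncated = floats.astype(np.int64)

# Round to the nearest integer first if rounding is what you want.
rounded = np.rint(floats).astype(np.int64)

print(truncated, rounded)
```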

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Arithmetic Array Operations:
</p>

In [None]:
x = np.array([[111,112],[121,122]], dtype=np.int64)
y = np.array([[211.1,212.1],[221.1,222.1]], dtype=np.float64)

print(x)
print()
print(y)

<p style="font-family: Arial; font-size:1.5em;color:#2462C0; font-style:bold"><br>
Adding numbers
</p>

In [None]:
print(x + y)        

In [None]:
print(np.add(x, y))

<p style="font-family: Arial; font-size:1.5em;color:#2462C0; font-style:bold"><br>
Subtracting numbers
</p>

In [None]:
print(x - y)

In [None]:

print(np.subtract(x, y))

<p style="font-family: Arial; font-size:1.5em;color:#2462C0; font-style:bold"><br>
Multiplying numbers
</p>

In [None]:
print(x * y)

In [None]:
print(np.multiply(x, y))

<p style="font-family: Arial; font-size:1.5em;color:#2462C0; font-style:bold"><br>
Dividing numbers
</p>

In [None]:
print(x / y)

In [None]:
print(np.divide(x, y))

<p style="font-family: Arial; font-size:1.5em;color:#2462C0; font-style:bold"><br>
Square root
</p>

In [None]:
print(np.sqrt(x))

<p style="font-family: Arial; font-size:1.5em;color:#2462C0; font-style:bold"><br>
Exponential (e raised to the power of each element)
</p>

In [None]:
print(np.exp(x))
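
Note that `np.exp` computes e raised to each element, which is different from raising the elements themselves to a power. A minimal sketch of both, with illustrative variable names:

```python
import numpy as np

base = np.array([1.0, 2.0, 3.0])

squares = base ** 2             # element-wise power using the ** operator
squares_fn = np.power(base, 2)  # the same operation as a function call

exp_vals = np.exp(base)         # e ** x for each element x

print(squares, exp_vals)
```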

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Statistical Methods, Sorting, and Set Operations:
<br><br>
</p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Basic Statistical Operations:
</p>

In [None]:
arr = 10 * np.random.randn(3,4)
print(arr)

The mean for all elements is

In [None]:
print(arr.mean())

Row wise mean



In [None]:
print(arr.mean(axis = 1))

Column wise mean

In [None]:
print(arr.mean(axis = 0))

Sum of elements

In [None]:
print(arr.sum())

Row wise Median

In [None]:
print(np.median(arr, axis = 1))
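
Because the array above is random, the effect of `axis` can be hard to verify by eye. Here is a small sketch with a fixed illustrative array (`table` is an assumed name): `axis=0` collapses the rows and gives one value per column, while `axis=1` collapses the columns and gives one value per row.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

col_means = table.mean(axis=0)   # one mean per column
row_means = table.mean(axis=1)   # one mean per row

print(col_means)
print(row_means)
```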

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Sorting:
</p>


In [None]:
unsorted = np.random.randn(15)

print(unsorted)

In [None]:
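# Use a name other than `sorted` to avoid shadowing the Python built-in.
sorted_arr = np.array(unsorted)  # np.array() copies, so unsorted is untouched
sorted_arr.sort()

print(sorted_arr)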

Sorting in place

In [None]:
unsorted.sort() 

print(unsorted)
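
As an alternative to copying and then sorting in place, `np.sort` returns a sorted copy directly, and `np.argsort` returns the indices that would sort the array. A short sketch with illustrative names:

```python
import numpy as np

data = np.array([3, 1, 2])

ordered = np.sort(data)       # sorted copy; data itself is unchanged
order = np.argsort(data)      # indices that would sort data
descending = ordered[::-1]    # reverse the sorted copy for descending order

print(data, ordered, order, descending)
```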

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Finding Unique elements:
</p>

In [None]:
array = np.array([1,2,1,4,2,1,4,2])

print(np.unique(array))

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Set Operations with np.array data type:
</p>

In [None]:
s1 = np.array(['data','science','AI'])
s2 = np.array(['ML','AI','data'])
print(s1, s2)

In [None]:
print( np.intersect1d(s1, s2) ) 

In [None]:
print( np.union1d(s1, s2) )

Elements of s1 which are not in s2

In [None]:
print( np.setdiff1d(s1, s2) )

For each element of s1, whether it is also in s2 (np.isin is the recommended replacement for the older np.in1d)

In [None]:
print( np.isin(s1, s2) )

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Read or Write to Disk:
<br><br>
</p>

<p style="font-family: Arial; font-size:1.3em;color:#2462C0; font-style:bold"><br>
Binary Format:</p>

In [None]:
x = np.array([ 23.23, 24.24] )

In [None]:
np.save('an_array', x)

In [None]:
np.load('an_array.npy')

<p style="font-family: Arial; font-size:1.3em;color:#2462C0; font-style:bold"><br>
Text Format:</p>

In [None]:
np.savetxt('array.txt', X=x, delimiter=',')

In [None]:
np.loadtxt('array.txt', delimiter=',')
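
The two formats can be checked with a round trip. The sketch below writes to a temporary directory so that no files are left behind; the file names `example.npy` and `example.txt` are arbitrary choices for this example.

```python
import os
import tempfile

import numpy as np

original = np.array([23.23, 24.24])

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "example.npy")
    txt_path = os.path.join(tmp_dir, "example.txt")

    # Binary format: preserves dtype and shape exactly.
    np.save(npy_path, original)
    from_binary = np.load(npy_path)

    # Text format: human-readable, one value per line for a 1-D array.
    np.savetxt(txt_path, original, delimiter=",")
    from_text = np.loadtxt(txt_path, delimiter=",")

print(np.array_equal(original, from_binary))
print(np.allclose(original, from_text))
```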

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Additional Common ndarray Operations
<br><br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Dot Product on Matrices and Inner Product on Vectors:
</p>

Dot product of two matrices

In [None]:
x2d = np.array([[1,1],[1,1]])
y2d = np.array([[2,2],[2,2]])

In [None]:
print(x2d.dot(y2d))

In [None]:
print(np.dot(x2d, y2d))

Inner product of two vectors

In [None]:
a1d = np.array([9 , 9 ])
b1d = np.array([10, 10])

In [None]:
print(a1d.dot(b1d))

In [None]:
print(np.dot(a1d, b1d))

Dot product of a matrix and a vector

In [None]:
print(x2d.dot(a1d))

In [None]:
print(np.dot(x2d, a1d))
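
For 1-D and 2-D arrays, the `@` operator gives the same result as `dot`, and it is easy to confuse with `*`, which multiplies element by element. A minimal sketch with illustrative arrays `a`, `b`, and `v`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
v = np.array([1, 1])

matrix_product = a @ b      # matrix multiplication, same as a.dot(b) here
elementwise = a * b         # element-wise multiplication, not a dot product
matrix_vector = a @ v       # matrix times vector

print(matrix_product)
print(elementwise)
print(matrix_vector)
```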

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Sum:
</p>

Sum of all elements in the array



In [None]:
ex1 = np.array([[10,12],[20,22]])

print(ex1)

In [None]:
print(np.sum(ex1))

Column wise sum

In [None]:
print(np.sum(ex1, axis=0))

Row wise sum

In [None]:
print(np.sum(ex1, axis=1))

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Element-wise Functions: </p>

Compare two arrays element by element and keep the larger value at each position.

In [None]:
x1 = np.random.randn(10)
x1

In [None]:
y1 = np.random.randn(10)
y1

Element wise maximum between two arrays is

In [None]:

np.maximum(x1, y1)

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Reshaping array:
</p>

Values from 0 through 24 in an array

In [None]:
arr = np.arange(25)
print(arr)

Reshape to a 5x5 matrix (reshape returns a new array object; `arr` itself keeps its shape)

In [None]:
arr.reshape(5,5)
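
A few related details, sketched with an illustrative array named `flat`: the result of `reshape` has to be assigned to be kept, one dimension can be given as `-1` to let numpy infer it, and `ravel` flattens the result again.

```python
import numpy as np

flat = np.arange(25)

square = flat.reshape(5, 5)     # keep the result by assigning it
auto = flat.reshape(5, -1)      # -1 asks numpy to infer the missing dimension
back = square.ravel()           # flatten back to rank 1

print(flat.shape, square.shape, auto.shape, back.shape)
```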

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Transpose:

</p>

In [None]:
ex1 = np.array([[10,12],[20,22]])
print(ex1)
ex1.T

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Indexing using where():</p>

In [None]:
x1 = np.array([1,2,3,4,5])

y1 = np.array([10,21,35,44,50])

filter = np.array([True, False, True, False, True])

In [None]:
out = np.where(filter, x1, y1)
print(out)

In [None]:
matrix1 = np.random.rand(5,5)
matrix1

In [None]:
np.where( matrix1 > 0.5, 1000, -1)
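
When `np.where` is called with only a condition, it returns the indices where the condition holds instead of choosing between two values. A short sketch of both forms, using an illustrative array named `scores`:

```python
import numpy as np

scores = np.array([0.2, 0.9, 0.4, 0.7])

# One-argument form: a tuple with one index array per dimension.
(idx,) = np.where(scores > 0.5)

# Three-argument form: pick a value for each element based on the condition.
labels = np.where(scores > 0.5, "high", "low")

print(idx)
print(labels)
```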

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
"any" or "all" conditionals:</p>

In [None]:
arr_bools = np.array([ True, False, True, True, False ])

In [None]:
arr_bools.any()

In [None]:
arr_bools.all()

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Random Number Generation:
</p>

Normal distribution

In [None]:
Y = np.random.normal(size = (2,5)) #2D array
print(Y)

In [None]:
np.random.normal(size=4) #1D array

Random integers

In [None]:
Z = np.random.randint(low=2,high=50,size=4)
print(Z)

A random permutation of the elements in Z (Z itself is unchanged)

In [None]:
np.random.permutation(Z)

Uniform distribution

In [None]:
np.random.uniform(size=4)
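
For reproducible results, newer numpy versions also offer the `Generator` interface created by `np.random.default_rng`. This is offered as an alternative sketch, not as the approach this notebook uses; the seed value 42 is an arbitrary choice.

```python
import numpy as np

# Two generators with the same seed produce the same sequence.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draws_a = rng_a.normal(size=3)
draws_b = rng_b.normal(size=3)
print(np.array_equal(draws_a, draws_b))

# Integers from low (inclusive) to high (exclusive).
ints = rng_a.integers(low=2, high=50, size=4)
print(ints)
```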

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Merging data sets:
</p>

In [None]:
K = np.random.randint(low=2,high=50,size=(2,2))
print(K)

M = np.random.randint(low=2,high=50,size=(2,2))
print("\n",M)

In [None]:
np.vstack((K,M))

In [None]:
np.hstack((K,M))

In [None]:
np.concatenate([K, M], axis = 0)

In [None]:
np.concatenate([K, M.T], axis = 1)
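
Since K and M are random, here is a small sketch with fixed illustrative arrays (`top` and `bottom` are assumed names) that makes the resulting shapes easy to check: `vstack` matches `concatenate` along axis 0, and `hstack` matches `concatenate` along axis 1 for rank 2 arrays.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6], [7, 8]])

stacked_rows = np.vstack((top, bottom))     # shape (4, 2)
stacked_cols = np.hstack((top, bottom))     # shape (2, 4)

print(stacked_rows.shape, stacked_cols.shape)
print(np.array_equal(stacked_rows, np.concatenate([top, bottom], axis=0)))
print(np.array_equal(stacked_cols, np.concatenate([top, bottom], axis=1)))
```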

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Speedtest: ndarrays vs lists
<br><br>
</p>

First set up the parameters for the speed test. We'll be testing the time taken to sum the elements of an ndarray versus a list.

In [None]:
from numpy import arange
from timeit import Timer

size    = 1000000
timeits = 1000

Create the ndarray with values 0,1,2...,size-1

In [None]:
nd_array = arange(size)
print( type(nd_array) )

Create the list with values 0,1,2...,size-1

In [None]:
a_list = list(range(size))
print (type(a_list) )

Timer expects the statement to time as a string, followed by a setup string. For example, sum the numbers using nd_array.sum()

In [None]:
timer_numpy = Timer("nd_array.sum()", "from __main__ import nd_array")

print("Time taken by numpy ndarray: %f seconds" % 
      (timer_numpy.timeit(timeits)/timeits))

In [None]:
timer_list = Timer("sum(a_list)", "from __main__ import a_list")

print("Time taken by list:  %f seconds" % 
      (timer_list.timeit(timeits)/timeits))

<p style="font-family: Arial; font-size:2.75em;color:#2462C0; font-style:bold"><br>
Broadcasting: Introduction to broadcasting
<br>
</p>


For more details:

https://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html

https://www.tutorialspoint.com/numpy/numpy_broadcasting.htm

In [None]:
import numpy as np

start = np.zeros((4,3))
print(start)

In [None]:
# create a rank 1 ndarray with 3 values
add_rows = np.array([1, 0, 2])
print(add_rows)

In [None]:
y = start + add_rows  # add to each row of 'start' using broadcasting
print(y)

In [None]:
# create an ndarray which is 4 x 1 to broadcast across columns
add_cols = np.array([[0,1,2,3]])
add_cols = add_cols.T

print(add_cols)

In [None]:
# add to each column of 'start' using broadcasting
y = start + add_cols 
print(y)

In [None]:
# this will just broadcast in both dimensions
add_scalar = np.array([1])  
print(start+add_scalar)
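
Broadcasting compares shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The sketch below uses illustrative names (`grid`, `row`, `col`) to show compatible shapes, plus one mismatch that raises a `ValueError`.

```python
import numpy as np

grid = np.zeros((4, 3))
row = np.array([1, 0, 2])            # shape (3,)
col = np.arange(4).reshape(4, 1)     # shape (4, 1)

print((grid + row).shape)            # (3,) stretches across the 4 rows
print((grid + col).shape)            # (4, 1) stretches across the 3 columns
print((row + col).shape)             # (3,) and (4, 1) broadcast to (4, 3)

# Shape (4,) does not match the trailing dimension 3, so this fails.
try:
    grid + np.arange(4)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```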
