<img src='../images/wcd_logo.png' width='50%'>
-------------

<center> <h1> Python for Data Science </h1> </center>
<br>
<center><h2> Lab 1: Numpy for Numeric Computation </h2> </center>
<center><img src='../images/numpy_logo.png' width='20%'> </center>

<br>
<center align="left"> Developed by: </center>
<center align="left"> WeCloudData Academy </center>



----------


Numpy is the core library for scientific computing in Python. It provides a high-performance **multidimensional array** object, and tools for working with these arrays. If you are already familiar with MATLAB, you might find this lab useful to get started with Numpy.

### Reference
 * Numpy Reference http://docs.scipy.org/doc/numpy-dev/reference/
 
> While the Python language is an excellent tool for general-purpose programming, with a highly readable syntax, rich and powerful data types (**`strings, lists, sets, dictionaries, arbitrary length integers, etc`**) and a very comprehensive standard library, it was not designed specifically for mathematical and scientific computing.  ***Neither the language nor its standard library have facilities for the efficient representation of multidimensional datasets, tools for linear algebra and general matrix manipulations*** (an essential building block of virtually all technical computing), nor any data visualization facilities.

> In particular, Python lists are very flexible containers that can be nested arbitrarily deep and which can hold any Python object in them, but they are poorly suited to represent efficiently common mathematical constructs like vectors and matrices.  In contrast, much of our modern heritage of scientific computing has been built on top of libraries written in the Fortran language, which has native support for vectors and matrices as well as a library of mathematical functions that can efficiently operate on entire arrays at once.


### Table of Contents
* 1 - Why Numpy? 
* 2 - Numpy Array Objects
  * 2.1 Array Objects
  * 2.2 Array Creation (Instantiation)
  * 2.3 Array Attributes
  * 2.4 Manipulating Arrays - Slicing, Indexing, Iteration
  * 2.5 Array Operations - Universal Function "ufunc"
  * 2.6 Basic Array Methods
* 3 - Numpy Routines
  * 3.1 Array Manipulation
  * 3.2 Linear Algebra
  * 3.3 Math
  * 3.4 Statistics

# $\Delta$ 1. Why Numpy? 

### Numpy Array vs. Python List
> - (**memory efficiency**) Numpy arrays are more compact than lists
- (**convenience**) Arrays let you express vector and matrix operations directly, without writing explicit loops
- (**speed**) Operations on Numpy arrays are much faster than the equivalent operations on lists
- (**functionality**) Numpy arrays come with many built-in functions for fast searching, basic statistics, linear algebra, histograms, etc.
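
The memory point can be checked with a rough sketch like the one below. It assumes CPython, where a list holds pointers to separate int objects; the size `n = 1000` is an arbitrary choice for illustration, and the exact list figure varies by Python version.

```python
import sys
import numpy as np

n = 1000  # arbitrary size chosen for this illustration
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate Python int objects, so count both
# the list container and the objects it refers to.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An ndarray stores the raw values in one contiguous buffer.
array_bytes = arr.nbytes  # 1000 elements * 8 bytes each

print("list  :", list_bytes, "bytes")
print("array :", array_bytes, "bytes")
```

The array needs exactly 8 bytes per element here, while the list pays for a pointer plus a full Python object per element.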

---

# $\Delta$ 2. Numpy Arrays and Attributes

### N-dimensional Array Objects
>* The primary building block of the numpy module is the class "ndarray" - a powerful array 
* An ndarray object represents a multidimensional, **homogeneous** array of **fixed-size** items. An associated data-type object describes the format of each element in the array. 
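
A small sketch of what "homogeneous" and "fixed-size" mean in practice. The variable names `mixed` and `ints` are made up for this example.

```python
import numpy as np

# Mixed ints and floats in the input are converted to one common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)        # float64

# Once the dtype is fixed, assigned values are cast to it.
ints = np.array([1, 2, 3], dtype=np.int32)
ints[0] = 7.9             # stored as 7: the fractional part is dropped
print(ints)
print(ints.itemsize)      # 4 bytes per element for int32
```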

## 2.1 Creating a numpy array (ndarray)

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of nonnegative integers. The number of dimensions is the `rank` of the array; the `shape` of an array is a tuple of integers giving the size of the array along each dimension.

<img src='../images/arrays.png' width='50%'>

### Rank-1 Array (1-D Array)

In [3]:
import numpy as np

a = np.array([1, 2, 3])  # Create a rank 1 array
print (type(a))           # Prints "<class 'numpy.ndarray'>"
print (a.shape)            # Prints "(3,)"
print (a[0], a[1], a[2])   # Prints "1 2 3"

a[0] = 5                 # Change an element of the array
print (a)                  # Prints "[5 2 3]"

<class 'numpy.ndarray'>
(3,)
1 2 3
[5 2 3]


### Rank-2 Array (2-D Array)
<img src='../images/2darrayaxes.png' width='20%' align='left'>


In [4]:
b = np.array([[1,2,3],[4,5,6]])     # Create a rank 2 array
print (b)
print (b.shape)                     # Prints "(2, 3)"
print (b[0, 0], b[0, 1], b[1, 0] )  # Prints "1 2 4"

[[1 2 3]
 [4 5 6]]
(2, 3)
1 2 4


### Rank-3 Array (3-D Array)
<img src='../images/3darrayaxes.png' width='30%' align='left'>


In [8]:
c=np.array([[[ 0,  1,  2,  3], [ 4,  5,  6,  7], [ 8,  9, 10, 11]], 
          [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]])

In [10]:
print(c.shape)

(2, 3, 4)
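
As a sketch of how the three axes are addressed, the array below holds the same values as `c` above (built with `np.arange` and `reshape` for brevity). The first index selects one of the 2 blocks, the second one of the 3 rows, and the third one of the 4 columns.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)   # same values as the array c above

print(c.ndim)        # 3
print(c[1, 2, 3])    # 23, the last element of the last row of the last block
print(c[0].shape)    # (3, 4), the first block is a 2-D array
print(c[1, 0])       # [12 13 14 15], first row of the second block
```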


### Other Functions to Create Arrays

In [11]:
# Create an array of ones
np.ones((3,4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [12]:
# Create an array of zeros
np.zeros((2,3,4),dtype=np.int16)

array([[[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]], dtype=int16)

> Each type of integer has a different range of storage capacity
  * Int16 -- (-32,768 to +32,767)
  * Int32 -- (-2,147,483,648 to +2,147,483,647)
  * Int64 -- (-9,223,372,036,854,775,808 to +9,223,372,036,854,775,807)
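
These limits do not need to be memorized; `np.iinfo` reports them. The sketch below also shows one consequence of a fixed range: array arithmetic that exceeds it wraps around without raising an error. The names `small`, `one` and `wrapped` are illustrative only.

```python
import numpy as np

# Query the representable range of each integer type.
for dtype in (np.int16, np.int32, np.int64):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)

# Adding 1 to the largest int16 value wraps around to the smallest one.
small = np.array([32767], dtype=np.int16)
one = np.array([1], dtype=np.int16)
wrapped = small + one
print(wrapped)
```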

In [15]:
# Create random matrices/arrays
# 1. Return random floats in the half-open interval [0.0, 1.0) as a 2 by 2 array
print(np.random.random((2,2))) 
print("-"*5)

# 2. Return a sample (or samples) from the “standard normal” distribution.
# The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1.
print(np.random.randn(5))
print("-"*5)

# 3. Return `size` random integers from low (inclusive) to high (exclusive).
print(np.random.randint(low=0, high=10, size=5))

[[0.71188768 0.77488977]
 [0.62029615 0.02222538]]
-----
[-1.48596596  1.6948051  -0.84034971 -0.32871307  0.37377763]
-----
[3 7 8 3 2]
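
The numbers above change on every run. If reproducible results are needed, one option is a seeded generator from `np.random.default_rng` (available in NumPy 1.17 and later). This is a sketch of an alternative to the functions used above, not a replacement the lab requires; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value 42 is arbitrary

floats = rng.random((2, 2))                   # uniform floats in [0.0, 1.0)
normals = rng.standard_normal(5)              # mean 0, standard deviation 1
ints = rng.integers(low=0, high=10, size=5)   # integers in [0, 10)

# Re-creating the generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(seed=42)
same_floats = rng_again.random((2, 2))
print(np.array_equal(floats, same_floats))    # True
```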


In [16]:
# Create an uninitialized array: the values are whatever was already in memory (not guaranteed to be zeros)
np.empty((3,2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

In [17]:
# Create an array filled with a constant value (here 7)
np.full((2,2),7)

array([[7, 7],
       [7, 7]])

In [22]:
# Create an array of evenly-spaced values with step size
np.arange(10,25,5)

array([10, 15, 20])

In [24]:
# Create an array of evenly-spaced values with number of samples
np.linspace(10,25,5)

array([10.  , 13.75, 17.5 , 21.25, 25.  ])
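
The two functions treat the stop value differently, which is a common source of off-by-one surprises. A short sketch of the difference, using the same arguments as above:

```python
import numpy as np

by_step = np.arange(10, 25, 5)        # stop value 25 is excluded
by_count = np.linspace(10, 25, 5)     # stop value 25 is included by default

# endpoint=False makes linspace exclude the stop value as well.
half_open = np.linspace(10, 25, 5, endpoint=False)
print(half_open)                      # [10. 13. 16. 19. 22.]

# retstep=True also returns the spacing that linspace computed.
values, step = np.linspace(10, 25, 5, retstep=True)
print(step)                           # 3.75
```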

In [25]:
# Create an identity matrix
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

## 2.2 Inspecting your arrays

The following provide basic information about the size, shape and data in the array:

In [28]:
## make a 1-D array, then reshape the same values into 2 rows by 4 columns
arr1 = np.arange(8)
arr2 = np.arange(8).reshape(2,4)
print(arr1)
print(" "*5)
print(arr2)

[0 1 2 3 4 5 6 7]
     
[[0 1 2 3]
 [4 5 6 7]]


In [30]:
## display basic info of an array
print ('Data type                :', arr2.dtype)
print ('Total number of elements :', arr2.size)
print ('Number of dimensions     :', arr2.ndim)
print ('Shape (dimensionality)   :', arr2.shape)
print ('Memory used (in bytes)   :', arr2.nbytes)

Data type                : int64
Total number of elements : 8
Number of dimensions     : 2
Shape (dimensionality)   : (2, 4)
Memory used (in bytes)   : 64
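
The attributes above are related: the memory used is the number of elements times the size of one element, which `itemsize` reports. The sketch below checks this and also shows `-1` in `reshape`, which asks numpy to infer that dimension. The name `arr3` is introduced only for this example.

```python
import numpy as np

arr2 = np.arange(8).reshape(2, 4)

# Bytes per element; the value depends on the default integer type of the platform.
print(arr2.itemsize)
print(arr2.nbytes == arr2.size * arr2.itemsize)   # True

# -1 lets numpy work out the remaining dimension from the total size.
arr3 = arr2.reshape(4, -1)
print(arr3.shape)                                 # (4, 2)
```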


----
## Exercise:

#### Problem 1: Create an array of 20 samples from the standard normal distribution and sort it in descending order.

In [63]:
######please insert your code here######







[3.216633564560564, 1.5210978940016695, 1.281388067465985, 1.1472953650403217, 1.0137382204785645, 0.8285305395134703, 0.7079616758452163, 0.3692023263560513, 0.2789525017751559, -0.033506178009087687, -0.08974011892354057, -0.0900526700144249, -0.25985808217328826, -0.4969947585882599, -0.7011280133868008, -0.8528049881945741, -1.6171944137193124, -1.918431433837055, -2.0358770637012693, -2.0652645788975628]


#### Problem 2: Create a 5x5 array with 6 on the border and 0 inside.
> Hint: the output is expected to be:  
  
>  [[ 6.  6.  6.  6.  6.]  
>   [ 6.  0.  0.  0.  6.]  
>   [ 6.  0.  0.  0.  6.]  
>   [ 6.  0.  0.  0.  6.]  
>   [ 6.  6.  6.  6.  6.]]  

In [64]:
######please insert your code here######







[[6. 6. 6. 6. 6.]
 [6. 0. 0. 0. 6.]
 [6. 0. 0. 0. 6.]
 [6. 0. 0. 0. 6.]
 [6. 6. 6. 6. 6.]]


-----
## 2.3 Array Data Types


> Arrays can hold (almost) any type of data, as long as **every element has the same type** (i.e., requires the same amount of memory). The element format of an ndarray is described by its "dtype" attribute, which can be set when the array is created. Individual fields may be "named" in a structured array.



> For more on numpy data types, refer to [https://docs.scipy.org/doc/numpy/user/basics.types.html](https://docs.scipy.org/doc/numpy/user/basics.types.html)


### Common Data Types
#### Integers:
int8, int16, int32, int64	 

#### Unsigned integers:
uint8, uint16,  uint32, uint64	 

#### Floating-point numbers:
float16, float32, float64, float96, float128 (float96 and float128 are available only on some platforms)

#### Objects
object_ (type code 'O'): any Python object



In [31]:
# type of an array element
zero = np.zeros((1,3), dtype='int32')

print ("zero: ", zero)
print ("object type is: ", type(zero))
print ("Array zero's data type is: ", zero.dtype)

zero:  [[0 0 0]]
object type is:  <class 'numpy.ndarray'>
Array zero's data type is:  int32


In [36]:
# short dtype notations
dt1 = np.dtype("int32")
dt2 = np.dtype("i")

dt3 = np.dtype("float32")
dt4 = np.dtype("f")

dt5 = np.dtype("object")
dt6 = np.dtype("O")

In [40]:
dt5

dtype('O')
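
An existing array can be converted to another data type with `astype`, which returns a new array. A brief sketch follows; the variable names are illustrative. Note that converting floats to integers drops the fractional part rather than rounding.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int32)
floats = ints.astype(np.float64)       # new array with 8 bytes per element
print(floats, floats.dtype)

decimals = np.array([1.7, -2.3, 3.9])
truncated = decimals.astype(np.int32)  # fractional part is dropped, not rounded
print(truncated)                       # [ 1 -2  3]

# The short code "f" and the full name "float32" describe the same dtype.
print(np.dtype("f") == np.dtype("float32"))   # True
```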

---
# $\Delta$ 3. Array Manipulation: Indexing, Slicing, Iteration

> * ndarray objects can be indexed, sliced, and iterated over much like **lists**

## 3.1 Array Indexing & Slicing
Assigning to and accessing the elements of an array works much as it does for other sequence types in Python, such as lists and tuples. Numpy also offers several additional indexing options (slices, integer arrays, and boolean arrays), which make indexing very powerful while staying close to core Python syntax.

### Create a 3x4 numpy array

<img src='../images/array_3x4.png' width='20%' align='left'>

In [49]:
import numpy as np

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

print(a)
print()
print(a.shape)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

(3, 4)


### Slicing the array

<img src='../images/array_3x4_1.png' width='20%' align='left'>

In [None]:
# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
b = a[:2, 1:3]
b

### Modifying an array
> Note: A slice of an array is a view into the same data, so modifying it will modify the original array.

<img src='../images/array_3x4_2.png' width='20%' align='left'>

In [50]:
print (a)  
a[0, 1] = 77    # a[0, 1] gets updated from 2 to 77; the slice b = a[:2, 1:3] is a view, so b[0, 0] changes too
print (" "*5)
print (a) 

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
     
[[ 1 77  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
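
The view behaviour described in the note can be sketched as follows. The names `view` and `independent` are chosen for this example; `.copy()` is the usual way to get a slice that does not share data with the original.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                 # a slice shares memory with a
independent = a[:2, 1:3].copy()   # an explicit copy does not

view[0, 0] = 77          # also changes a[0, 1]
independent[0, 0] = -1   # a is not affected

print(a[0, 1])                           # 77
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False
```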


### Integer Array Indexing

<img src='../images/array_3x4_3.png' width='20%' align='left'>

In [53]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
print(a)
print(" ")

# An example of integer array indexing.
# The returned array will have shape (3,): one element per (row, column) index pair
print (a[[0, 1, 2], [0, 2, 0]])  # Prints "[1 7 9]"
print(" ")

# The above example of integer array indexing is equivalent to this:
print (np.array([a[0, 0], a[1, 2], a[2, 0]]))  # Prints "[1 7 9]"

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
 
[1 7 9]
 
[1 7 9]
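
One common use of integer array indexing is to select or modify one element from each row. A sketch under the same setup as above; the names `rows`, `cols` and `picked` are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

rows = np.arange(a.shape[0])   # [0 1 2], one entry per row
cols = np.array([0, 2, 0])     # the column to pick in each row

picked = a[rows, cols]         # integer indexing returns a copy, not a view
print(picked)                  # [1 7 9]

a[rows, cols] += 10            # modify the selected element of each row in place
print(a)
```

After the in-place update the selected entries become 11, 17 and 19, while `picked` keeps the old values because it is a copy.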


### Mixing integer indexing with slice indexing
You can also mix integer indexing with slice indexing.
> Mixing integer indexing with slices yields an array of lower rank, while using only slices yields an array of the same rank as the original array.

#### Row slicing

In [54]:
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

row_r1 = a[1, :]    # Rank 1 view of the second row of a  
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print ("Output array: {0}, Output shape: {1}".format(row_r1, row_r1.shape))  # Prints "Output array: [5 6 7 8], Output shape: (4,)"
print ("Output array: {0}, Output shape: {1}".format(row_r2, row_r2.shape))  # Prints "Output array: [[5 6 7 8]], Output shape: (1, 4)"


Output array: [5 6 7 8], Output shape: (4,)
Output array: [[5 6 7 8]], Output shape: (1, 4)
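
The same distinction applies to columns. A short sketch, with `col_r1` and `col_r2` named by analogy with the row example:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

col_r1 = a[:, 1]     # integer index on the column axis: rank 1 result
col_r2 = a[:, 1:2]   # slice on the column axis: rank 2 result

print(col_r1, col_r1.shape)   # [ 2  6 10] (3,)
print(col_r2.shape)           # (3, 1)
```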


### Boolean Array Indexing

<img src='../images/array_3x4_4.png' width='20%' align='left'>

In [58]:
import numpy as np

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

bool_idx = (a > 6)  # Find the elements of a that are bigger than 6;
                    # this returns a numpy array of Booleans of the same
                    # shape as a, where each slot of bool_idx tells
                    # whether that element of a is > 6.
            
print (bool_idx)     

# We use boolean array indexing to construct a rank 1 array
# consisting of the elements of a corresponding to the True values
# of bool_idx
print (a[bool_idx]) 


[[False False False False]
 [False False  True  True]
 [ True  True  True  True]]
[ 7  8  9 10 11 12]


> Even better, we can do all of the above in a single concise statement:

In [61]:
print (a[a > 6])

[ 7  8  9 10 11 12]
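
Boolean masks can also be combined and used for assignment. The sketch below assumes the same array `a`; note that conditions are combined with `&` and `|` (each wrapped in parentheses), not with Python's `and` and `or`.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Combine conditions element by element: bigger than 6 AND even.
big_and_even = a[(a > 6) & (a % 2 == 0)]
print(big_and_even)            # the values 8, 10 and 12

# np.where keeps the original shape: values failing the test become 0.
masked = np.where(a > 6, a, 0)
print(masked)

# A mask on the left-hand side assigns to the selected elements in place.
capped = a.copy()
capped[capped > 6] = 6
print(capped)
```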


## Exercise:
#### Problem 1: Generate a random vector of 4 elements, then build a new vector with 2 consecutive zeros interleaved between each pair of adjacent values. 

> Hint: for the input [1,3,5,7], the expected answer is [1,0,0,3,0,0,5,0,0,7]

In [1]:
######please insert your code here#######





#### Problem 2: Generate a random 3 by 3 array and swap the first 2 rows.


In [73]:
######please insert your code here#######







[[0 1 2]
 [3 4 5]
 [6 7 8]]

[[3 4 5]
 [0 1 2]
 [6 7 8]]


----

# $\Delta$ 4. Array Operations - Universal Function "ufunc"

> A universal function (or **"ufunc"** for short) is a function that operates on ndarrays in an **_element-by-element_** fashion, supporting array broadcasting, type casting, and several other standard features. That is, a ufunc is a **“vectorized” wrapper** for a function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs. 

> > **_Universal functions run much faster than for loops, which should be avoided whenever possible._**

* Examples include:
  * add
  * subtract
  * multiply
  * exp
  * log
  * power


## 4.1 Basic Operations

In [74]:
import numpy as np

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
print (x + y)
print (np.add(x, y))
print("")

# Elementwise difference; both produce the array
print (x - y)
print (np.subtract(x, y))
print("")

# Elementwise product; both produce the array
print (x * y)
print (np.multiply(x, y))
print("")

# Elementwise division; both produce the array
print (x / y)
print (np.divide(x, y))
print("")

# Elementwise square root; produces the array
print (np.sqrt(x))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]

[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]

[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]

[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]

[[1.         1.41421356]
 [1.73205081 2.        ]]


> The comparison below times an element-wise product computed with a ufunc against the same product computed with nested Python for loops.

#### Element-wise Multiply: Ufunc vs. Python For Loop

In [84]:
# element-wise multiplication of two 100x100 arrays
a = np.random.random((100,100))
b = np.random.random((100,100))

# Method 1: element-wise multiplication using a universal function
def mult1(a,b):
    return a * b

# Method 2: element-wise multiplication using explicit loops (much slower; avoid when a ufunc exists)
def mult1oop(a,b):
    c = np.empty(a.shape)
    
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i,j] = a[i,j] * b[i,j]
            
    return c

In [85]:
import numpy as np
import timeit

%timeit mult1(a,b)  # the element-wise ufunc version is the faster of the two

7.72 µs ± 770 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [86]:
%timeit mult1oop(a,b)

4.94 ms ± 85.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


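> One millisecond is 10^(-3) s and one microsecond is 10^(-6) s, so in this run the ufunc version (7.72 µs) is roughly 640 times faster than the loop version (4.94 ms).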
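
`%timeit` is an IPython magic and only works inside a notebook or IPython shell. In a plain Python script, a comparable measurement can be sketched with the standard-library `timeit` module. The function names `mult_ufunc` and `mult_loop` and the repeat count of 20 are choices made for this example, and the timings will differ from machine to machine.

```python
import timeit
import numpy as np

a = np.random.random((100, 100))
b = np.random.random((100, 100))

def mult_ufunc(a, b):
    # element-wise product computed by numpy in compiled code
    return a * b

def mult_loop(a, b):
    # the same product computed one element at a time in Python
    c = np.empty(a.shape)
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            c[i, j] = a[i, j] * b[i, j]
    return c

repeats = 20  # arbitrary number of repetitions to average over
t_ufunc = timeit.timeit(lambda: mult_ufunc(a, b), number=repeats) / repeats
t_loop = timeit.timeit(lambda: mult_loop(a, b), number=repeats) / repeats

print("ufunc: {:.2e} s per call".format(t_ufunc))
print("loop : {:.2e} s per call".format(t_loop))
print("same result:", np.allclose(mult_ufunc(a, b), mult_loop(a, b)))
```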

-----
# $\Delta$ 5. Array Methods

#### *List of available numpy array methods and attributes*
    arr.T             arr.copy          arr.getfield      arr.put           arr.squeeze
    arr.all           arr.ctypes        arr.imag          arr.ravel         arr.std
    arr.any           arr.cumprod       arr.item          arr.real          arr.strides
    arr.argmax        arr.cumsum        arr.itemset       arr.repeat        arr.sum
    arr.argmin        arr.data          arr.itemsize      arr.reshape       arr.swapaxes
    arr.argsort       arr.diagonal      arr.max           arr.resize        arr.take
    arr.astype        arr.dot           arr.mean          arr.round         arr.tofile
    arr.base          arr.dtype         arr.min           arr.searchsorted  arr.tolist
    arr.byteswap      arr.dump          arr.nbytes        arr.setasflat     arr.tostring
    arr.choose        arr.dumps         arr.ndim          arr.setfield      arr.trace
    arr.clip          arr.fill          arr.newbyteorder  arr.setflags      arr.transpose
    arr.compress      arr.flags         arr.nonzero       arr.shape         arr.var
    arr.conj          arr.flat          arr.prod          arr.size          arr.view
    arr.conjugate     arr.flatten       arr.ptp           arr.sort   

## 5.1 Array Statistics

### Simple aggregation

In [89]:
import numpy as np

x = np.array([[1,2],[3,4]])

print (np.sum(x))  # Compute sum of all elements; prints "10"
print (np.sum(x, axis=0))  # Compute sum of each column; prints "[4 6]"
print (np.sum(x, axis=1))  # Compute sum of each row; prints "[3 7]"

10
[4 6]
[3 7]


### Calculate min(), max(), standard deviation

In [93]:
arr = np.arange(8).reshape(2,4)
print(arr)
print("")

print ('Minimum and maximum             : {}, {}'.format(arr.min(), arr.max()))
print ('Sum and product of all elements : {}, {}'.format(arr.sum(), arr.prod()))
print ('Mean and standard deviation     : {}, {}'.format(arr.mean(), arr.std()))

[[0 1 2 3]
 [4 5 6 7]]

Minimum and maximum             : 0, 7
Sum and product of all elements : 28, 0
Mean and standard deviation     : 3.5, 2.29128784747792


### Cumulative sum

In [94]:
arr.cumsum()

array([ 0,  1,  3,  6, 10, 15, 21, 28])
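
Without an `axis` argument, `cumsum` flattens the array first, as in the output above. With an axis it accumulates along that dimension and keeps the shape. A brief sketch on the same `arr`:

```python
import numpy as np

arr = np.arange(8).reshape(2, 4)

down_columns = arr.cumsum(axis=0)   # running total down each column
along_rows = arr.cumsum(axis=1)     # running total along each row

print(down_columns)   # [[ 0  1  2  3] [ 4  6  8 10]]
print(along_rows)     # [[ 0  1  3  6] [ 4  9 15 22]]
```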

### Array operations along an axis

In [95]:
print ('For the following array:\n', arr)
print ('The sum of elements along the rows is    :', arr.sum(axis=1))
print ('The sum of elements along the columns is :', arr.sum(axis=0))

For the following array:
 [[0 1 2 3]
 [4 5 6 7]]
The sum of elements along the rows is    : [ 6 22]
The sum of elements along the columns is : [ 4  6  8 10]
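
A way to remember the `axis` argument: the axis you name is the one that disappears from the shape. The sketch below shows this on a 3-D array and then uses `keepdims=True`, which keeps the reduced axis with length 1 so the result can be broadcast back against the original. The names `c`, `row_totals` and `shares` are illustrative.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)
print(c.sum(axis=0).shape)   # (3, 4)
print(c.sum(axis=1).shape)   # (2, 4)
print(c.sum(axis=2).shape)   # (2, 3)

arr = np.arange(8).reshape(2, 4)
row_totals = arr.sum(axis=1, keepdims=True)   # shape (2, 1) instead of (2,)
shares = arr / row_totals                     # each row now sums to 1
print(shares.sum(axis=1))
```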


### Get mean value


In [96]:
a = np.array([[1, 2], [3, 4]])
print (np.mean(a))
print (np.mean(a, axis=0))  ## mean of each column (averages down the rows)
print (np.mean(a, axis=1))  ## mean of each row (averages across the columns)
print (a.mean())

2.5
[2. 3.]
[1.5 3.5]
2.5
