# Essentials of NumPy

- NumPy is a third-party library that facilitates numerical computing in Python by providing a versatile N-dimensional array object for storing data, along with powerful mathematical functions for operating on those arrays of numbers.
- NumPy implements its features in highly optimized ways, via a process known as vectorization.

In [3]:
import numpy as np
import pandas as pd
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'
# options: 'all', 'last', 'last_expr' (default), 'none', 'last_expr_or_assign'

## numpy.array

In [4]:
# one-dimensional
l1=[1,2,3]
arr1=np.array(l1)
arr1

array([1, 2, 3])

In [5]:
print(type(arr1))
arr1.shape

<class 'numpy.ndarray'>


(3,)


In [6]:
# two-dimensional
l2=[[1,'a','2021-01-09'],[3,'B','2022-03-14']] # nested list with mixed types; np.array stores every element as a string
arr2=np.array(l2)
arr2

array([['1', 'a', '2021-01-09'],
       ['3', 'B', '2022-03-14']], dtype='<U21')

In [7]:
arr2.shape

(2, 3)
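The `dtype='<U21'` in the output above follows from an array holding a single element type: when a list mixes integers and strings, NumPy converts every element to a string. The following is a minimal sketch of that behaviour. The names `mixed`, `as_str`, and `as_obj` are illustrative and not part of the original notebook, and `dtype=object` is shown only as one possible way to keep the original Python types.

```python
import numpy as np

mixed = [[1, 'a', '2021-01-09'], [3, 'B', '2022-03-14']]

# Default behaviour: a common type is chosen, here a Unicode string type
as_str = np.array(mixed)
print(as_str.dtype.kind)    # 'U' means Unicode string
print(repr(as_str[0, 0]))   # the integer 1 is now stored as a string

# Assumption for illustration: dtype=object keeps each Python object as-is,
# at the cost of losing NumPy's fast numeric operations
as_obj = np.array(mixed, dtype=object)
print(type(as_obj[0, 0]))
```

For columns of different types, a pandas DataFrame is usually the more natural container, since each column can have its own dtype.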

In [9]:
l3=[[1,'a','2021-01-09'],[3,'B','2022-03-14'],[4,'X','2022-03-04']] #nested list
arr3=np.array(l3)
arr3
arr3.shape

array([['1', 'a', '2021-01-09'],
       ['3', 'B', '2022-03-14'],
       ['4', 'X', '2022-03-04']], dtype='<U21')

(3, 3)

In [10]:
# multi-dimensional
l3=[[[1,2,3,4],[5,6,7,8]],
    [[9,10,11,12],[13,14,15,16]],
    [[17,18,19,20],[21,22,23,24]]]
arr3=np.array(l3)
arr3.shape

(3, 2, 4)

In [11]:
arr3

array([[[ 1,  2,  3,  4],
        [ 5,  6,  7,  8]],

       [[ 9, 10, 11, 12],
        [13, 14, 15, 16]],

       [[17, 18, 19, 20],
        [21, 22, 23, 24]]])

In [22]:
df = pd.DataFrame(l2)
df.columns = ['num','char','date']
# df.char.lower() raises AttributeError: a Series has no .lower(); use the .str accessor instead
df.char.str.lower()
pd.to_datetime(df.date).dt.strftime('%Y')

0    a
1    b
Name: char, dtype: object

0    2021
1    2022
Name: date, dtype: object

In [None]:
s='SavvyPro'
s.lower()

In [19]:
# a two-dimensional array / nested list can also be converted into a DataFrame
df = pd.DataFrame(l2)
df.columns = ['num','char','date']
# question: how do you change column 1 ('char') to all lower case?
df.char.str.lower()

# change column 2 ('date') to datetime values
df.date=pd.to_datetime(df.date)
df.dtypes

# take only the year from column 2
df.date.dt.strftime('%Y')

# Why accessors? Many methods/functions in Python can only be applied to a single value,
# and some self-defined functions are designed for a single value as well.
# Applying them to a whole column would normally require a loop.
# NumPy (and the pandas accessors built on it) lets us apply such operations to a series of values without a loop.

0    2021
1    2022
Name: date, dtype: object

## Vectorization
Vectorization speeds up Python code by replacing explicit loops with operations that are applied to a whole array at once. Using vectorized functions can substantially reduce the running time.

<p><strong>Problem</strong>: apply the sigmoid function to a list / vector of values</p>
<p>$sigmoid(x) = \frac{1}{1+e^{-x}}$ is sometimes also known as the logistic function. It is a non-linear function used not only in Machine Learning (Logistic Regression), but also in Deep Learning.</p>

In [23]:
import math
math.exp(1)

2.718281828459045

In [24]:
def sigmoid(x):
    return 1/(math.exp(-x)+1)

In [25]:
sigmoid(0)

0.5

<p>Calculate $\sum_{i=1}^{10000000} sigmoid(i)$.</p>

In [28]:
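# np.arange works like range(1, 10000001), but returns a 1-d array
x=np.arange(1,10000001)
x
# math.exp accepts only scalars, so passing the whole array raises a TypeError
sigmoid(x)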

array([       1,        2,        3, ...,  9999998,  9999999, 10000000])

TypeError: only size-1 arrays can be converted to Python scalars

In [29]:
temp=0
for i in x:
    temp=temp+sigmoid(i)
temp

9999999.535836484

In [30]:
# np.vectorize wraps a scalar function so that it accepts arrays
# (a convenience wrapper: internally it still loops, so do not expect a large speed-up)
vec_sigmoid = np.vectorize(sigmoid)
vec_sigmoid(x).sum()

9999999.535836484
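The result above can be sanity-checked without looping over all ten million values. Since $sigmoid(i) = 1 - \frac{1}{1+e^{i}}$, the sum equals $N$ minus a tail that converges very quickly (to roughly 0.4642). The sketch below is an illustrative check, not part of the original notebook; the names `N`, `tail`, and `estimate` are assumptions introduced here.

```python
import math

N = 10_000_000  # same upper bound as x = np.arange(1, 10000001)

# sigmoid(i) = 1 - 1/(1 + e^i), so sum(sigmoid(i)) = N - sum(1/(1 + e^i)).
# The terms shrink like e^(-i); beyond i = 60 they are smaller than 1e-25
# and no longer change the floating-point result.
tail = sum(1 / (1 + math.exp(i)) for i in range(1, 61))
estimate = N - tail

print(tail)
print(estimate)
```

If the loop and the vectorized version are correct, `estimate` should agree with their result up to floating-point rounding.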

### For faster calculation, use NumPy's built-in vectorized functions
<p>In fact, if $x = (x_1, x_2, ..., x_n)$ is a row vector, then $np.exp(x)$ applies the exponential function to every element of $x$. The output is thus: $np.exp(x) = (e^{x_1}, e^{x_2}, ..., e^{x_n})$</p>

In [31]:
def sigmoid2(x):
    return 1/(np.exp(-x)+1)

In [32]:
import time
start=time.time()

S=sigmoid2(x).sum()

end=time.time()
print('Calculation Result:',S,'\nTime spent: ',(end-start)*1000,'ms')

Calculation Result: 9999999.535836484 
Time spent:  45.75014114379883 ms


In [33]:
import time
start=time.time()

temp=0
for i in x:
    temp=temp+sigmoid(i)
temp

end=time.time()
print('Calculation Result:',temp,'\nTime spent: ',(end-start)*1000,'ms')

9999999.535836484

Calculation Result: 9999999.535836484 
Time spent:  3019.1988945007324 ms


In [34]:
import time
start=time.time()

temp=0
for i in x:
    temp=temp+sigmoid(i)

end=time.time()
print('Calculation Result:',temp,'\nTime spent: ',(end-start)*1000,'ms')

Calculation Result: 9999999.535836484 
Time spent:  2981.4319610595703 ms
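The timings above compare the plain Python loop with NumPy's native `np.exp`. It may also be worth timing `np.vectorize`, which the NumPy documentation describes as a convenience wrapper that is essentially a for loop. The sketch below is one way to do that, under stated assumptions: it uses a smaller array (`x_small`) so that it runs quickly, `sigmoid_np` mirrors `sigmoid2` from the notebook, and the variable names are illustrative. Actual timings depend on the machine, so none are shown.

```python
import math
import timeit

import numpy as np


def sigmoid(x):
    # scalar-only version, as in the notebook
    return 1 / (math.exp(-x) + 1)


def sigmoid_np(x):
    # native NumPy version, works on whole arrays
    return 1 / (np.exp(-x) + 1)


x_small = np.arange(1, 100001)        # smaller than the notebook's array
vec_sigmoid = np.vectorize(sigmoid)   # wraps the scalar function

# Time three runs of each approach
t_vectorize = timeit.timeit(lambda: vec_sigmoid(x_small).sum(), number=3)
t_native = timeit.timeit(lambda: sigmoid_np(x_small).sum(), number=3)

print('np.vectorize:', t_vectorize * 1000, 'ms')
print('native np.exp:', t_native * 1000, 'ms')
```

Both approaches return the same values; the difference lies only in where the loop runs (in Python for `np.vectorize`, in compiled code for `np.exp`).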


## Basic Mathematical Operations Using Arrays

In [34]:
X=np.array([[1,2],[3,4]])
X

array([[1, 2],
       [3, 4]])

In [41]:
# Y=np.array([[5,6,7],[7,8,9]]) has shape (2, 3); X+Y would then fail with
# ValueError: operands could not be broadcast together with shapes (2,2) (2,3)
Y=np.array([[5,6],[7,8]])
Y

array([[5, 6],
       [7, 8]])

<strong>Addition</strong>

In [42]:
X+Y

array([[ 6,  8],
       [10, 12]])

<strong>Scalar multiplication</strong>

In [37]:
3*X

array([[ 3,  6],
       [ 9, 12]])

<strong>Element-wise multiplication</strong>

In [38]:
X*Y

array([[ 5, 12],
       [21, 32]])

<strong>Matrix multiplication</strong>

https://en.wikipedia.org/wiki/Matrix_(mathematics)#Addition,_scalar_multiplication,_and_transposition

In [39]:
np.dot(X,Y)

array([[19, 22],
       [43, 50]])

In [40]:
X@Y

array([[19, 22],
       [43, 50]])
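To make the difference between `*` and `@` concrete, the sketch below recomputes the matrix product entry by entry and shows how the two operators treat shapes differently. The names `manual`, `W`, and `elementwise_failed` are hypothetical and introduced only for this illustration.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])
Y = np.array([[5, 6], [7, 8]])

# Element-wise: entry (i, j) is X[i, j] * Y[i, j]
elementwise = X * Y

# Matrix product: entry (i, j) is the sum over k of X[i, k] * Y[k, j]
manual = np.array([[sum(X[i, k] * Y[k, j] for k in range(2))
                    for j in range(2)]
                   for i in range(2)])
print(np.array_equal(manual, X @ Y))

# Shape rules differ: (2, 2) @ (2, 3) is valid, (2, 2) * (2, 3) is not
W = np.array([[5, 6, 7], [7, 8, 9]])
print((X @ W).shape)
try:
    X * W
    elementwise_failed = False
except ValueError:
    elementwise_failed = True
print(elementwise_failed)
```

For two-dimensional arrays, `np.dot(X, Y)` and `X @ Y` give the same result.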

## Array Broadcasting

<p>Array broadcasting is the mechanism NumPy uses to permit vectorized mathematical operations between arrays of unequal, but compatible shapes. Specifically, an array is treated as if its contents had been replicated along the appropriate dimensions, such that the shape of this new, higher-dimensional array suits the mathematical operation being performed.</p>
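Whether two shapes are compatible can be checked without creating any data. The sketch below uses `np.broadcast_shapes` (available in NumPy 1.20 and later, which is an assumption about the installed version) and `np.broadcast_to` to show the replicated view explicitly. The names `compatible`, `row`, and `stretched` are illustrative.

```python
import numpy as np

# Shapes are compared from the rightmost dimension; each pair of sizes
# must be equal, or one of them must be 1 (or missing).
print(np.broadcast_shapes((3, 4), (4,)))    # (4,) is treated as (1, 4)
print(np.broadcast_shapes((3, 4), (3, 1)))

# Incompatible shapes raise a ValueError
try:
    np.broadcast_shapes((2, 2), (2, 3))
    compatible = True
except ValueError:
    compatible = False
print(compatible)

# np.broadcast_to returns a read-only view of the "replicated" array
row = np.array([1, 2, 3, 4])
stretched = np.broadcast_to(row, (3, 4))
print(stretched)
```

No data is actually copied during broadcasting; the replication is only conceptual.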

Example 1: stretch the leftmost dimension

$\begin{pmatrix}
0&-0.1&-0.2&-0.3\\
-0.4&-0.5&-0.6&-0.7\\
-0.8&-0.9&-1.0&-1.1
\end{pmatrix} \times \begin{pmatrix}1&2&3&4\end{pmatrix} \rightarrow 
\begin{pmatrix}
0&-0.1&-0.2&-0.3\\
-0.4&-0.5&-0.6&-0.7\\
-0.8&-0.9&-1.0&-1.1
\end{pmatrix} \times  \begin{pmatrix}
1&2&3&4\\
1&2&3&4\\
1&2&3&4
\end{pmatrix} $  

$X: 3 \times 4$  
$Y: 1 \times 4 \rightarrow 3 \times 4$

In [43]:
# a shape-(3, 4) array
X = [[0 , -0.1, -0.2, -0.3],
     [-0.4, -0.5, -0.6, -0.7],
     [-0.8, -0.9, -1. , -1.1]]
X = np.array(X)
print(X.shape)
X

(3, 4)


array([[ 0. , -0.1, -0.2, -0.3],
       [-0.4, -0.5, -0.6, -0.7],
       [-0.8, -0.9, -1. , -1.1]])

In [44]:
Y=[1,2,3,4]
Y=np.array(Y)
Y.shape

(4,)

In [43]:
X*Y

array([[ 0. , -0.2, -0.6, -1.2],
       [-0.4, -1. , -1.8, -2.8],
       [-0.8, -1.8, -3. , -4.4]])

Example 2: stretch the rightmost dimension

$\begin{pmatrix}
0&-0.1&-0.2&-0.3\\
-0.4&-0.5&-0.6&-0.7\\
-0.8&-0.9&-1.0&-1.1
\end{pmatrix} \times \begin{pmatrix}1\\2\\3\end{pmatrix} \rightarrow 
\begin{pmatrix}
0&-0.1&-0.2&-0.3\\
-0.4&-0.5&-0.6&-0.7\\
-0.8&-0.9&-1.0&-1.1
\end{pmatrix} \times  \begin{pmatrix}
1&1&1&1\\
2&2&2&2\\
3&3&3&3
\end{pmatrix} $  

$X: 3 \times 4$  
$Y: 3 \times 1 \rightarrow 3 \times 4$

In [45]:
Y=[1,2,3]
Y=np.array(Y)
print(Y)
Y=Y.reshape(3,1)
print(Y.shape)
Y

[1 2 3]
(3, 1)


array([[1],
       [2],
       [3]])

In [45]:
X*Y

array([[ 0. , -0.1, -0.2, -0.3],
       [-0.8, -1. , -1.2, -1.4],
       [-2.4, -2.7, -3. , -3.3]])
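The reshape in Example 2 is necessary because a one-dimensional array of length 3 is aligned with the last axis of `X` (length 4), which does not match. The sketch below shows the failure and an equivalent alternative to `reshape(3, 1)` using `np.newaxis`. The names `y`, `flat_works`, `col_reshape`, and `col_newaxis` are illustrative and not part of the original notebook.

```python
import numpy as np

X = np.array([[0, -0.1, -0.2, -0.3],
              [-0.4, -0.5, -0.6, -0.7],
              [-0.8, -0.9, -1.0, -1.1]])
y = np.array([1, 2, 3])  # shape (3,)

# Shape (3,) is compared with the last axis of X (length 4): incompatible
try:
    X * y
    flat_works = True
except ValueError:
    flat_works = False
print(flat_works)

# Two equivalent ways to turn y into a column of shape (3, 1)
col_reshape = y.reshape(3, 1)
col_newaxis = y[:, np.newaxis]
print(col_newaxis.shape)
print(np.array_equal(X * col_reshape, X * col_newaxis))
```

Writing `y.reshape(-1, 1)` is another common form; it lets NumPy infer the number of rows.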

# Summary of NumPy

- defines a new object: numpy.ndarray
    - np.array converts a list / nested list into an array
    - a two-dimensional array is similar to a pandas.DataFrame
    - array.shape shows the dimensions
- Vectorization
    - Vectorization allows you to apply a simple function to a vector / list / series of values without a loop
    - numpy.vectorize() converts a scalar function into a version that accepts arrays (for convenience rather than speed)
    - NumPy has many built-in functions that are already vectorized and optimized, e.g. np.exp
            https://numpy.org/doc/stable/reference/routines.math.html#
    - in pandas, a DataFrame column offers accessors, for example:
        - df.char.str.lower(), df.date.dt.strftime('%Y-%m')
- Array Operations
    - same shape: + - * / (element-wise)
    - matrix multiplication: np.dot, @
- Array broadcasting
    - stretch along the leftmost dimension (this is equivalent to operating on each "column")
        - this is the default for a one-dimensional array
    - stretch along the rightmost dimension (this is equivalent to operating on each "row")
        - you have to reshape the one-dimensional array into a column, e.g. shape (3, 1)