### What is NumPy

NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. Most computational packages providing
scientific functionality use NumPy’s array objects as the lingua franca for data
exchange.

#### What NumPy provides:

* ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations
* Mathematical functions for fast operations on entire arrays of data without having to write loops. 
* Linear algebra, random number generation 
* A C API for connecting NumPy with libraries written in C, C++ 

<img src="files/otherlibs.png"/>

#### Why use it instead of Python's list


* It provides matrix and vector operations.
* Its core is written in optimized C, so it is much faster and more memory efficient than plain lists (up to hundreds of times faster and tens of times more memory efficient, depending on the workload).
* It provides an easy-to-use C API, so it is straightforward to pass data to external libraries written in a low-level language, and for those libraries to return data to Python as NumPy arrays.

### Difference between numpy array and python's list 
<img src="files/array_vs_list.png"/>


### How numpy works 

* NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.
* NumPy operations perform complex computations on entire arrays without the
    need for Python for loops. 
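
As a rough illustration of the two points above, the sketch below compares how much memory a Python list and a NumPy array need for the same integers, and computes the same total with an explicit loop and with one array expression. The size `n` and the variable names are arbitrary choices for this example, and the exact byte count of the list depends on the Python build.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The array stores n raw 8-byte integers in one contiguous buffer.
array_bytes = np_arr.nbytes

# The list stores n pointers, and every element is a separate Python int object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# Same computation: explicit Python loop vs. one vectorized expression.
loop_total = 0
for v in py_list:
    loop_total += v * 2
vector_total = int((np_arr * 2).sum())

print(array_bytes, list_bytes)
print(loop_total == vector_total)
```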
    


### The NumPy ndarray: A Multidimensional Array Object 

* One of the key features of numpy is its N-dimensional array object, or *ndarray*, which is a fast, flexible container for large datasets in Python. 
* Arrays enable you to perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements. 



Import the numpy module under the conventional alias *np*:

In [None]:
import numpy as np

### Creating ndarrays 
* Easiest way: using the *array* function

First make a Python list:

In [None]:
data1 = [4, 5.3, 8, 9, 12]

In [None]:
arr1 = np.array(data1)

Nested sequence :

In [None]:
nested_seq = [[1, 2, 3], [4, 5, 6]]

In [None]:
nested_arr = np.array(nested_seq)

### Some properties of NumPy arrays (ndarray) 
* An ***ndarray*** is a generic multidimensional container for homogeneous data : all of the elements must be the same type
* Every array has a ***shape*** : a tuple indicating the size of each dimension 
* Every array has a ***dtype*** : an object describing the data type of the array 

In [None]:
nested_arr.ndim

In [None]:
nested_arr.shape

In [None]:
nested_arr.dtype

### Some properties of NumPy arrays (ndarray) 
* Unless explicitly specified, ***np.array*** tries to infer a good data type for the array that it creates. The data type is stored in the array's ***dtype*** attribute
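
The inference rule can be seen on a few small inputs. This is only a sketch with made-up values; the width of the default integer type depends on the platform, so the example checks the kind of dtype rather than a specific width.

```python
import numpy as np

ints = np.array([1, 2, 3])       # all integers -> an integer dtype
mixed = np.array([1, 2.5, 3])    # one float -> every element becomes float64
texts = np.array(["a", "bc"])    # strings -> fixed-width Unicode dtype

print(ints.dtype, mixed.dtype, texts.dtype)
```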


<h4> There are a number of other functions for creating new
arrays: </h4>
<ul>
    <li> <b>zeros</b> : create arrays of 0s </li>
    <li> <b>ones</b> : create arrays of 1s </li>
    <li> <b>empty</b> : creates an array without initializing its values to any particular value.</li>
    <li> <b>arange</b> : Like the built-in <b>range</b> but returns an ndarray instead of a range object</li>
    <li> <b> ones_like </b> : produces a ones array of the same shape and dtype </li>
    <li> <b> zeros_like </b> : Like <b>ones_like</b> but for zeros </li>
    <li> <b> full </b> : Produce an array of the given shape and dtype with all values set to the indicated “fill value”</li>
    <li> <b> eye</b>, <b>identity </b> : Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere) </li>
</ul>
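
A short sketch of a few of the functions listed above, using a small made-up template array. The names `template`, `zeros`, `sevens`, and `identity` are chosen only for this example.

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])

zeros = np.zeros_like(template)   # same shape (2, 3) and same integer dtype
sevens = np.full((2, 2), 7.0)     # every element set to the fill value
identity = np.eye(3)              # 3 x 3 identity matrix
same = np.array_equal(identity, np.identity(3))

print(zeros.shape, zeros.dtype == template.dtype, same)
```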

In [None]:
np.zeros((4,3))

In [None]:
np.ones((4,3))

In [None]:
np.empty((4,6))

<ul>
    <li><b>empty</b>, unlike <b>zeros</b>, does not set the array values to zero, and may therefore be marginally faster. <br></li>
    <li><b>empty</b> has nothing to do with creating an array that is "empty" in the sense of having no elements. It just means the array doesn't have its values initialized (i.e., they are unpredictable and depend on whatever happens to be in the memory allocated for the array).</li>
</ul>

In [None]:
np.full((3,4), 4)

In [None]:
np.ones_like(nested_arr)

In [None]:
arr = np.array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [None]:
np.identity(3)

In [None]:
np.arange(10)

In [None]:
np.arange(2, 12)

In [None]:
np.arange(2, 12, .5)

#### The data type or ***dtype*** is a special object containing the information the ndarray needs to interpret a chunk of memory as a particular type of data

<h3> Numpy data types </h3>
<ul>
    <li> <b> int8, uint8 </b> : Signed and unsigned 8-bit (1 byte) integer types </li>
    <li> <b> int16, uint16 </b> : Signed and unsigned 16-bit integer types </li>
    <li> <b> int32, uint32 </b> : Signed and unsigned 32-bit integer types </li>
    <li> <b> int64, uint64 </b> : Signed and unsigned 64-bit integer types </li>
    <li> <b> float16 </b> : Half-precision floating point </li>
    <li> <b> float32 </b> : Standard single-precision floating point; compatible with C float </li>
    <li> <b> float64 </b> : Standard double-precision floating point; compatible with C double and
    Python float object </li>
    <li> <b> float128 </b> : Extended-precision floating point </li>
    <li> <b> complex64, complex128 </b> : Complex numbers represented by two 32-bit or two 64-bit floats, respectively </li>
    <li> <b> bool </b> : Boolean type storing True and False values </li>
    <li> <b> object </b> : Python object type; a value can be any Python object </li>
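    <li> <b> string_ </b> : Fixed-length ASCII string type (1 byte per character); for example, to create a
    string dtype with length 10, use 'S10'. In NumPy 2.0 and later this type is named <b>bytes_</b> and the <b>string_</b> alias is no longer available </li>
    <li> <b> unicode_ </b> : Fixed-length Unicode type (number of bytes platform specific); same
specification semantics as string_ (e.g., 'U10' ). In NumPy 2.0 and later this type is named <b>str_</b> </li>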
</ul><br><br>
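
To make the sizes in the list above concrete, the following sketch asks NumPy for the number of bytes per element of a few dtypes and for the value range of the 8-bit integer types. The selection of dtypes is arbitrary.

```python
import numpy as np

for name in ("int8", "uint8", "int32", "float32", "float64"):
    dt = np.dtype(name)
    print(name, "uses", dt.itemsize, "byte(s) per element")

# np.iinfo reports the representable range of an integer dtype.
int8_range = (np.iinfo(np.int8).min, np.iinfo(np.int8).max)
uint8_range = (np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)
print(int8_range, uint8_range)
```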




In [None]:
arr1 = np.array([1, 2, 3, 4], dtype=np.float32)
arr1

In [None]:
arr1.dtype

In [None]:
arr2 = np.array([1.2, 3, -0.3], dtype=np.int32)
arr2

#### Convert array's dtype with ***astype*** method 

In [None]:
arr = np.array([1, 2, 3, 4, 5], dtype=np.int64)

In [None]:
float_arr = arr.astype(np.float32)
float_arr

In [None]:
arr = np.array([1.3, -0.6, 4, 5.8, -1.9], dtype= np.float64)
arr

In [None]:
int_arr = arr.astype(np.int64)
int_arr

In [None]:
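string_arr = np.array(['ab', '1' , 'fj'], dtype=np.bytes_)  # np.bytes_ is the current name of np.string_

In [None]:
string_arr.astype(np.int64)  # raises ValueError: 'ab' and 'fj' cannot be parsed as integers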

### Arithmetic with NumPy Arrays 
* Arrays are important because they enable you to express batch operations on data without writing any for loops. NumPy users call this vectorization. Any arithmetic operation between equal-size arrays is applied element-wise

In [None]:
arr = np.array([[1, 2 , 3], [4, 5, 6]], dtype=np.float64)
arr

In [None]:
arr * arr

In [None]:
arr - arr

In [None]:
arr ** 5

In [None]:
1 / arr

Comparisons between arrays of the same size yield boolean arrays

In [None]:
arr2 = np.array([[0, 5, 1], 
                 [6, 5 , 10]], dtype=np.float64)

In [None]:
print('arr: ', arr, '\narr2: ',arr2)

In [None]:
arr2 > arr

In [None]:
arr == arr2

### Broadcasting 
* Performing operations between arrays of different shapes is called ***broadcasting*** 
* Broadcasting is the process of making arrays with different shapes have compatible shapes for arithmetic operations
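
A minimal sketch of the compatibility rule, with made-up arrays: shapes are compared from the last dimension backwards, and each pair of sizes must either be equal or contain a 1. The last case shows a pair of shapes that cannot be broadcast.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (2, 3)
row = np.array([10.0, 20.0, 30.0])   # shape (3,), treated as (1, 3)
col = np.array([[100.0], [200.0]])   # shape (2, 1)

plus_row = a + row   # row is reused for each of the 2 rows
plus_col = a + col   # col is reused across the 3 columns

try:
    a + np.array([1.0, 2.0])         # (2, 3) vs (2,): 3 != 2 and neither is 1
    compatible = True
except ValueError:
    compatible = False

print(plus_row)
print(plus_col)
print(compatible)
```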

In [None]:
arr * 4

In [None]:
arr + 4

In [None]:
broad_array = np.array([[2], [4]])
print(broad_array)
print(broad_array.shape)

In [None]:
arr * broad_array

<h3> Indexing and Slicing</h3>
<ul>
    <li> <b>arr[start:stop:step]</b> for 1d arrays </li>
    <li> <b>arr[start:stop:step, start:stop:step, ....]</b> for more than  1d arrays </li> 
    <li> <b>arr[start:stop:step]</b> means: <b>arr[slice(start, stop, step)]</b> </li>
    <li> <b> arr[index] </b> means: <b>arr.__getitem__(index)</b> </li>
    <li> <b>arr[index] = value </b> means: <b>arr.__setitem__(index, value)</b> </li>
</ul>
<p> NumPy builds on these Python mechanisms to provide its richer indexing syntax. </p><br>
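
The equivalences listed above can be checked directly. In this sketch the arrays `arr` and `grid` are made-up examples; a multi-dimensional index reaches `__getitem__` as a tuple of slices.

```python
import numpy as np

arr = np.arange(10)

with_syntax = arr[2:8:2]
with_slice = arr[slice(2, 8, 2)]
with_dunder = arr.__getitem__(slice(2, 8, 2))

grid = np.arange(12).reshape(3, 4)
block = grid[0:2, 1:3]
block_explicit = grid[(slice(0, 2), slice(1, 3))]

print(with_syntax, with_slice, with_dunder)
print(block)
```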

In [None]:
arr = np.arange(20)
arr

In [None]:
arr[4]

In [None]:
arr[-1]

In [None]:
arr[-3]

#### In NumPy arrays and Python lists, the slice [a:b] covers the half-open interval [a, b), i.e. the indices a through b-1 


In [None]:
arr[4:7]

In [None]:
arr[:6]

In [None]:
arr[-4:]

In [None]:
arr[2:-4:2]

In [None]:
arr[2] = 2020
arr

In [None]:
arr[3:6] = 3088
arr

#### A slice is a view: changing values in a sliced array also changes the original array arr 




In [None]:
new_arr = arr[:6]
new_arr

In [None]:
new_arr[0] = 999
new_arr

In [None]:
arr
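
When the original array must stay unchanged, an explicit copy avoids the view behaviour. The sketch below uses hypothetical names (`base`, `view`, `copy`) and `np.shares_memory` to check which arrays share a buffer.

```python
import numpy as np

base = np.arange(10)
view = base[:6]           # a slice is a view on the same memory
copy = base[:6].copy()    # an explicit, independent copy

view[0] = 999             # also changes base[0]
copy[1] = -1              # leaves base untouched

shares = np.shares_memory(base, view)
print(base, shares)
```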

#### Slicing in higher dimension arrays

#### Indexing elements in a NumPy array 
<img src="https://raw.githubusercontent.com/h8hawk/Datacamp-Scientific-Python/master/files/numpy_2darray.jpg"/>

In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d

In [None]:
arr2d[1]

In [None]:
arr2d[2][0]

In [None]:
arr2d[2,0]

In [None]:
arr2d[:2, 1:] # Rows: start until 2  , columns: 1 until end

In [None]:
arr2d[:2, 2] 

In [None]:
arr2d[:, :1]

In [None]:
arr2d[:, 1]

###  Boolean Indexing
* A boolean array with the same shape as the data can be passed as an index; it selects the elements at the positions where it is True

In [None]:
data = np.random.randn(4, 4)
data

In [None]:
boolean_index = (data < -.5) | (data > .5)
boolean_index

In [None]:
data[boolean_index]
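
Because the data above is random, here is the same idea on fixed, made-up values so the result can be followed by hand. The second part shows that a boolean mask can also be used on the left side of an assignment.

```python
import numpy as np

values = np.array([[ 1.5, -0.2,  0.7],
                   [-1.1,  0.3, -0.9]])

mask = (values < -0.5) | (values > 0.5)   # element-wise OR of two conditions
selected = values[mask]                   # 1-D array of the matching elements

clipped = values.copy()
clipped[clipped < 0] = 0.0                # assign through a boolean mask

print(selected)
print(clipped)
```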

#### Fancy Indexing

In [None]:
arr = np.random.rand(8, 4)
arr

In [None]:
arr[[4, 3, 0, 6]]

In [None]:
arr[[1, 6], [3, 1]]
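
The two forms above behave differently: a single list selects whole rows, while two lists are paired up into (row, column) coordinates. A sketch on a made-up table whose values make the positions easy to read:

```python
import numpy as np

table = np.arange(32).reshape(8, 4)   # row i holds 4*i .. 4*i + 3

rows = table[[4, 3, 0, 6]]            # whole rows, in the requested order
pairs = table[[1, 6], [3, 1]]         # elements (1, 3) and (6, 1)

# Fancy indexing returns a copy, unlike slicing.
rows[0, 0] = -1

print(pairs)
print(table[4, 0])
```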

### Reshaping and Transposing Arrays and Swapping Axes 

In [None]:
arr = np.arange(15)
arr

In [None]:
arr.reshape((5, 3))

In [None]:
arr.reshape((4, 3))  # raises ValueError: 15 elements cannot fill a 4 x 3 array

In [None]:
arr

In [None]:
arr.T

In [None]:
x = np.array([[1,2,3]])
print(x)
x.shape

In [None]:
x.swapaxes(0,1)

In [None]:
x = np.arange(24).reshape((2, 3, 4))
print(x)
x.shape

In [None]:
x.swapaxes(0, 2).shape
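
A compact sketch of how these operations move elements, using made-up arrays. Passing `-1` as one dimension to `reshape` asks NumPy to infer it from the array size.

```python
import numpy as np

flat = np.arange(12)

grid = flat.reshape((3, 4))
auto = flat.reshape((2, -1))     # -1 lets NumPy infer the missing dimension
transposed = grid.T              # shape (4, 3)

cube = np.arange(24).reshape((2, 3, 4))
swapped = cube.swapaxes(0, 2)    # shape (4, 3, 2)

print(auto.shape, transposed.shape, swapped.shape)
```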

### Universal Functions: Fast Element-Wise Array Functions 

* A universal function, or ***ufunc***, is a function that performs element-wise operations on data in ndarrays

In [None]:
arr = np.arange(10)
arr

In [None]:
np.sqrt(arr)

In [None]:
np.exp(arr)

In [None]:
np.mean(arr)

In [None]:
np.min(arr)

In [None]:
np.max(arr)

In [None]:
np.absolute(arr)

Functions such as sqrt, exp, and absolute are referred to as ***unary*** ufuncs (mean, min, and max above are reductions rather than ufuncs). Others, such as add or maximum, take two arrays
(thus, ***binary*** ufuncs) and return a single array as the result

In [None]:
x = np.random.randn(6)
x

In [None]:
y = np.random.randn(6)
y

In [None]:
np.maximum(x, y)

<h3> Unary universal functions:</h3>
<ul>
    <li><b>abs, fabs</b> : Compute the absolute value element-wise for integer, floating-point, or complex values</li>
    <li><b>sqrt</b> : Compute the square root of each element (equivalent to arr ** 0.5 )</li>
    <li><b>square</b> : Compute the square of each element (equivalent to arr ** 2 )</li>
    <li><b>exp</b> : Compute the exponential e<sup>x</sup> of each element</li>
        <li><b>log, log10, log2, log1p</b> : Natural logarithm (base e), log base 10, log base 2, and log(1 + x),  respectively</li>
    <li><b>sign</b> : Compute the sign of each element: 1 (positive), 0 (zero), or –1 (negative)</li>
    <li><b>ceil</b> : Compute the ceiling of each element (i.e., the smallest integer greater than or equal to that
number) </li>
    <li><b>floor</b> : Compute the floor of each element (i.e., the largest integer less than or equal to each element) </li>
    <li><b>round</b> : Round elements to the nearest integer </li>
    <li><b>modf</b> : Return the fractional and integral parts of the array as two separate arrays</li>
    <li><b>isnan</b> : Return boolean array indicating whether each value is NaN (Not a Number) </li>
    <li><b>isfinite, isinf </b> : Return boolean array indicating whether each element is finite (non- inf , non- NaN ) or infinite,
respectively </li>
    <li><b>cos, cosh, sin, sinh, tan, tanh</b> : Regular and hyperbolic trigonometric functions</li>
    <li><b>arccos, arccosh, arcsin, arcsinh, arctan, arctanh</b> : Inverse trigonometric functions </li>
    <li><b>logical_not</b> : Compute truth value of not x element-wise (equivalent to ~arr for boolean arrays). </li>
</ul>
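
A few of the functions in the list, applied to fixed made-up values so the results can be checked by hand:

```python
import numpy as np

vals = np.array([1.5, -2.25, 0.0])

frac, whole = np.modf(vals)   # fractional and integral parts, each keeping the sign
signs = np.sign(vals)
up = np.ceil(vals)
down = np.floor(vals)

print(frac, whole)
print(signs, up, down)
```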

In [None]:
np.modf(x)

#### NaN
* NaN, short for "not a number", is a special floating-point value representing an undefined or unrepresentable result

In [None]:
nan = float('nan')

In [None]:
nan

In [None]:
nan == nan

In [None]:
from numpy import nan

In [None]:
type(nan)

In [None]:
arr = np.array([3, 5, nan])
arr

In [None]:
np.isnan(arr)
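
Since NaN propagates through ordinary reductions, it usually has to be filtered or skipped explicitly. The sketch below uses made-up readings and shows two common options: a boolean mask built from `np.isnan`, and the NaN-aware reduction `np.nanmean`.

```python
import numpy as np

readings = np.array([3.0, 5.0, np.nan])

plain_mean = np.mean(readings)             # NaN propagates into the result
nan_count = int(np.isnan(readings).sum())
clean_mean = np.nanmean(readings)          # skips NaN entries
filtered = readings[~np.isnan(readings)]   # keep only the valid values

print(plain_mean, nan_count, clean_mean, filtered)
```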

In [None]:
np.sqrt(np.array([1, -1]))

In [None]:
from numpy import inf

In [None]:
93883928484 < inf

In [None]:
-3883892838 > -inf

In [None]:
compar_y = y > .5
compar_y

In [None]:
~ compar_y 

In [None]:
np.logical_not(compar_y)

In [None]:
np.round(x)

In [None]:
y[-1] = inf
y

In [None]:
np.isfinite(y)

In [None]:
y

<h3> Binary universal functions: </h3>
<ul>
    <li><b>add</b> : Add corresponding elements in arrays</li>
    <li><b>subtract</b> : Subtract elements in second array from first array</li>
    <li><b>multiply</b> : Multiply array elements</li>
    <li><b>divide, floor_divide</b> : Divide or floor divide (truncating the remainder)</li>
    <li><b>power</b> : Raise elements in first array to powers indicated in second array</li>
    <li><b>maximum, fmax</b> : Element-wise maximum; fmax ignores NaN</li>
    <li><b>minimum, fmin</b> : Element-wise minimum; fmin ignores NaN</li>
    <li><b>mod</b> : Element-wise modulus (remainder of division)</li>
    <li><b>copysign</b> : Copy sign of values in second argument to values in first argument</li>
    <li><b>greater, greater_equal, less, less_equal, equal, not_equal</b> : Perform element-wise comparison, yielding boolean array (equivalent to infix
operators >, >=, <, <=, ==, !=)</li>
    <li><b>logical_and, logical_or, logical_xor</b> : Compute element-wise truth value of logical operation (for boolean arrays, equivalent to infix operators
&, |, ^ )</li>

</ul>
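
Two distinctions in the list are easy to miss, so here is a small sketch with made-up values: the logical functions differ from the bitwise operators on non-boolean input, and `maximum` and `fmax` differ in how they treat NaN.

```python
import numpy as np

p = np.array([1, 2, 0])
q = np.array([2, 1, 0])

logical = np.logical_and(p, q)   # any non-zero value counts as True
bitwise = p & q                  # bit-by-bit AND of the integers

with_nan = np.array([1.0, np.nan])
other = np.array([0.5, 2.0])
propagates = np.maximum(with_nan, other)   # NaN is passed on to the result
ignores = np.fmax(with_nan, other)         # NaN is skipped when the other value is a number

print(logical, bitwise)
print(propagates, ignores)
```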

In [None]:
x + y # np.add(x, y)

In [None]:
x - y #np.subtract(x, y)

In [None]:
x * y #np.multiply(x, y)

In [None]:
x / y #np.divide(x, y)

In [None]:
x // y #np.floor_divide(x, y)

In [None]:
x ** y # np.power(x, y)

In [None]:
x

In [None]:
x[0] = nan

In [None]:
np.maximum(x, y)

In [None]:
np.fmin(x, y)

In [None]:
arr1 = np.array([2, 3, 4])
arr2 = np.array([25, 32, 98])

In [None]:
arr1 % arr2 # np.mod(arr1, arr2)

In [None]:
np.copysign(x, y)

In [None]:
x[0] = 2

In [None]:
x >= y

In [None]:
x < y

In [None]:
x != y

In [None]:
a = np.array([True, False , True])
a.dtype

In [None]:
b = np.array([True, False, False])

In [None]:
a | b # np.logical_or

In [None]:
a & b # np.logical_and

In [None]:
a ^ b # np.logical_xor

### Array Oriented Programming
* Array-oriented programming expresses many kinds of data processing tasks as concise array expressions
* Replacing explicit loops with array expressions is commonly referred to as ***vectorization***

In [None]:
X = np.random.randn(10)

In [None]:
Y = np.random.randn(10)

Euclidean distance $= \sqrt{(X_1-Y_1)^2+(X_2-Y_2)^2+ \cdots +(X_n-Y_n)^2}$

In [None]:
euc_distance = np.sqrt(np.sum((X-Y)**2))

Manhattan distance $= |X_1-Y_1| + |X_2-Y_2| + \cdots + |X_n-Y_n|$

In [None]:
manhattan_distance = np.sum(np.abs(X-Y))
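
Both formulas can be checked on fixed vectors. The values below are made up so the distances come out as whole numbers; `np.linalg.norm` is shown as an alternative way to get the same results, with `ord=1` selecting the sum of absolute values.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0])
Y = np.array([4.0, 6.0, 3.0])

euclidean = np.sqrt(np.sum((X - Y) ** 2))   # sqrt(9 + 16 + 0)
manhattan = np.sum(np.abs(X - Y))           # 3 + 4 + 0

euclidean_norm = np.linalg.norm(X - Y)
manhattan_norm = np.linalg.norm(X - Y, ord=1)

print(euclidean, manhattan)
```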

<h3> Expressing Conditional Logic as Array Operations </h3>
<ul>
    <li> <b>np.where(condition, X, Y)</b> : vectorized version of the ternary expression x if condition else y </li>
    <li> <b>np.where(condition)</b> : returns a tuple of index arrays (one per dimension) giving the positions where the condition is true </li>
</ul>
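
Both forms of `np.where` on a small fixed array (made-up values), so that the returned indices and the selected values are easy to verify:

```python
import numpy as np

grid = np.array([[ 1, -2],
                 [-3,  4]])

rows, cols = np.where(grid > 0)             # one index array per dimension
labels = np.where(grid > 0, "pos", "neg")   # pick between two scalars
clipped = np.where(grid > 0, grid, 0)       # keep positives, replace the rest with 0

print(rows, cols)
print(labels)
print(clipped)
```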

In [None]:
arr = np.random.randn(4, 4)
arr

In [None]:
np.where(arr > 0)

In [None]:
pos_numbers = np.ones((4, 4))
neg_numbers = np.full((4, 4), -1)

In [None]:
np.where(arr>0 , pos_numbers, neg_numbers)

In [None]:
np.where(arr>0 , 1, -1)

### Any and All

In [None]:
b = np.array([[True, False], [True, True]])
t = np.array([[True, True], [True, True]])

In [None]:
b.any() 

In [None]:
b.all()

In [None]:
t.all()

In [None]:
np.all([1, 2, -1])

In [None]:
np.all([1, 2, 0])
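
`any` and `all` also accept an `axis` argument, which is useful for testing a condition per row or per column. A sketch with a made-up boolean array:

```python
import numpy as np

flags = np.array([[True, False],
                  [True, True]])

col_all = flags.all(axis=0)   # is every value in each column True?
row_all = flags.all(axis=1)   # is every value in each row True?
row_any = flags.any(axis=1)   # is at least one value in each row True?

# A common pattern: check that a whole array satisfies a condition.
in_range = bool((np.array([0.2, 0.9, 0.5]) < 1.0).all())

print(col_all, row_all, row_any, in_range)
```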

<h3> Mathematical and Statistical Methods </h3>
<ul>
    <li> <b>sum</b> : Sum of all the elements in the array or along an axis; zero-length arrays have sum 0</li>
    <li> <b>mean</b> : Arithmetic mean; zero-length arrays have NaN mean</li>
    <li> <b>std, var</b> : Standard deviation and variance </li>
    <li> <b>argmin, argmax</b> : Indices of minimum and maximum elements, respectively</li>
    <li> <b>cumsum</b> : Cumulative sum of elements starting from 0</li>
    <li> <b>cumprod</b> : Cumulative product of elements starting from 1</li>
</ul><br/>

In [None]:
x = np.random.randn(10)
x

In [None]:
x.std()

In [None]:
x.var()

In [None]:
x.argmin()

In [None]:
x.argmax()

In [None]:
a = np.arange(10)
a

In [None]:
a.cumsum()

In [None]:
np.cumprod(a[1:])

<h3> How <b>axis</b> works in NumPy </h3>
<img src="files/numpy-arrays-have-axes.png"/>
<br>

* Axis 0 acts on all the ROWS in each COLUMN: reducing along axis 0 gives one result per column
* Axis 1 acts on all the COLUMNS in each ROW: reducing along axis 1 gives one result per row
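
A small sketch of the two rules above on a 2 x 3 array. The `keepdims=True` call is an extra shown only to illustrate that the reduced axis can be kept with length 1.

```python
import numpy as np

m = np.arange(6).reshape((2, 3))      # [[0, 1, 2], [3, 4, 5]]

down_rows = m.sum(axis=0)             # collapses axis 0: one value per column
across_cols = m.sum(axis=1)           # collapses axis 1: one value per row
kept = m.sum(axis=1, keepdims=True)   # same values, shape (2, 1)

print(down_rows, across_cols, kept.shape)
```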


In [None]:
arr = np.arange(20).reshape((4, 5))
arr

In [None]:
arr.sum()

In [None]:
arr.sum(axis=0)


In [None]:
arr.sum(axis=0).shape

In [None]:
arr.std(axis=1)

In [None]:
arr = np.arange(24).reshape((2, 3, 4))
arr.shape

In [None]:
arr.sum(axis=0).shape

In [None]:
arr.sum(axis=1).shape

<h3>Concatenating and Splitting Arrays</h3>
<ul>
    <li>numpy.concatenate takes a sequence (tuple, list, etc.) of arrays and joins them together in order along the input axis </li>
</ul>

In [None]:
arr1 = np.array([[1,2,3],[4,5,6]])
arr1

In [None]:
arr2 = np.array([[7,8,9],[10,11,12]])
arr2

In [None]:
np.concatenate([arr1, arr2] , axis=0)

In [None]:
np.concatenate([arr1, arr2])

In [None]:
np.concatenate([arr1, arr2], axis=1)

<p><b>vstack</b> and <b>hstack</b></p>

In [None]:
np.vstack((arr1, arr2))

In [None]:
np.hstack((arr1, arr2))

In [None]:
arr = np.arange(20)
arr

In [None]:
one, two, three = np.split(arr, [4,6])

In [None]:
one, two , three
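
The list passed to `np.split` holds the indices at which the array is cut, so `[4, 6]` produces the pieces `[:4]`, `[4:6]`, and `[6:]`. A sketch on a shorter made-up array, which also shows that an integer argument asks for that many equal parts:

```python
import numpy as np

seq = np.arange(10)

head, middle, tail = np.split(seq, [4, 6])   # seq[:4], seq[4:6], seq[6:]
halves = np.split(seq, 2)                    # two equal parts
rejoined = np.concatenate([head, middle, tail])

print(head, middle, tail)
```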

<h3> Linear Algebra </h3>
<ul>
    <li> Multiplication, decompositions, determinants, and other square matrix math</li>
    <li> <b>numpy.dot</b> : if both a and b are 2-D arrays, it is matrix multiplication, but using <b>np.matmul</b> or a @ b is preferred. </li>
</ul>

<h3> np.dot </h3>
<ul>
    <li>If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation)</li>
    <li>If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.</li>
    <li>If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.</li>
    <li>If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.</li>
    <li>If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b : <br>dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])</li>
</ul>
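
The first four cases in the list can be sketched with small made-up operands whose results are easy to compute by hand:

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 4.0])
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

inner = np.dot(v, w)      # 1-D with 1-D: inner product, 1*3 + 2*4
product = np.dot(A, B)    # 2-D with 2-D: matrix multiplication
scaled = np.dot(2.0, A)   # scalar operand: same as 2.0 * A
mat_vec = np.dot(A, v)    # N-D with 1-D: sum product over the last axis of A

print(inner)
print(product)
print(mat_vec)
```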

In [None]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])

In [None]:
y = np.array([[6., 23.], [-1, 7], [8, 9]])

In [None]:
x.dot(y)

In [None]:
x @ y

In [None]:
np.dot([1,2], [1, 2])

### Pseudorandom Number Generation

* These numbers are called *pseudorandom* because they are generated by an algorithm with deterministic behavior based on the seed of the random number generator

In [None]:
import random

In [None]:
random.random() 

From the documentation of Python's random module:

"Warning: The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the secrets module."

In [None]:
np.random.seed(2019)

In [None]:
np.random.rand()

In [None]:
np.random.rand()

In [None]:
np.random.seed(2019)

In [None]:
np.random.rand()

In [None]:
np.random.rand()
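
The cells above use the global seed. NumPy also offers generator objects created with `np.random.default_rng`, which carry their own state; this sketch assumes NumPy 1.17 or later and shows that two generators built from the same seed produce the same sequence.

```python
import numpy as np

rng_a = np.random.default_rng(2019)
rng_b = np.random.default_rng(2019)

first = rng_a.random(3)    # three floats in [0, 1)
second = rng_b.random(3)   # same seed, same three floats

normals = rng_a.standard_normal((2, 2))   # samples from a standard normal distribution

print(first)
print(np.array_equal(first, second))
```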

### Performance
* *numba* : a fast, NumPy-aware *JIT* compiler that supports a subset of Python
* *cython* : an optimising static compiler for both the Python programming language and the extended Cython programming language
* *c-extension* : writing C code and binding it as a Python module through the C API


#### Installation
pip install numba <br>
pip install cython

In [None]:
from numba import jit

In [None]:
def our_sum(arr:np.ndarray)->float:
    su = 0
    for i in arr:
        su += i
    return su

In [None]:
@jit
def jited_sum(arr:np.ndarray)->float:
    su = 0
    for i in arr:
        su += i
    return su

In [None]:
a = np.random.randn(10**8)

In [None]:
%time our_sum(a)

In [None]:
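# The first call to a jitted function includes compilation time; run this cell again to time the compiled code
%time jited_sum(a)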
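
For comparison without any extra dependency, the same sum can be timed against NumPy's built-in reduction. This is only a sketch: it uses `time.perf_counter` instead of the `%time` magic so that it runs outside IPython, a smaller made-up array, and hypothetical names; the measured times vary from machine to machine.

```python
import time
import numpy as np

data = np.random.randn(10**5)

def loop_sum(values):
    # Plain Python loop, one element at a time.
    total = 0.0
    for v in values:
        total += v
    return total

start = time.perf_counter()
slow = loop_sum(data)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = data.sum()          # vectorized reduction implemented in C
numpy_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.6f} s, numpy: {numpy_seconds:.6f} s")
```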