<h3> What is NumPy? </h3>

NumPy, short for Numerical Python, is one of the most important foundational packages for numerical computing in Python. Most computational packages providing
scientific functionality use NumPy’s array objects as the lingua franca for data
exchange.

<h4> What NumPy provides: </h4>

<ul>
    <li> ndarray, an efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities. </li>
    <li> Mathematical functions for fast operations on entire arrays of data without having to write loops. </li>
    <li> Linear algebra and random number generation. </li>
    <li> A C API for connecting NumPy with libraries written in C or C++. </li>
</ul>

<h4> Why use it instead of Python's list? </h4>

<ul>
    <li> It provides matrix and vector operations. </li>
    <li> Its core is written in optimized C code, so it is much faster and more memory efficient than a list (it can be up to hundreds of times faster and up to tens of times more memory efficient, depending on the workload). </li>
    <li> It provides an easy-to-use C API, so it is straightforward to pass data to external libraries written in a low-level language, and for those libraries to return data to Python as NumPy arrays. </li>
</ul>
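
<p> As a minimal sketch of the first point, the snippet below contrasts how the same operators behave on a list and on an array. The variable names are only illustrative. </p>

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array(py_list)

# The * operator repeats a list, but multiplies an array element-wise.
list_times_two = py_list * 2   # [1, 2, 3, 1, 2, 3]
arr_times_two = np_arr * 2     # array([2, 4, 6])

# Element-wise addition of lists needs an explicit loop or comprehension.
list_sum = [a + b for a, b in zip(py_list, py_list)]  # [2, 4, 6]
arr_sum = np_arr + np_arr                             # array([2, 4, 6])
```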

<h3> Difference between a NumPy array and a Python list </h3><br><br><br>
<img src="https://raw.githubusercontent.com/h8hawk/Datacamp-Scientific-Python/master/files/array_vs_list.png"/>

<h3> What is covered in this course? </h3>

<ul>
    <li> Fast vectorized array operations for data munging and cleaning, subsetting and
        filtering, transformation, and any other kinds of computations </li>
    <li> Common array algorithms like sorting, unique, and set operations </li>
    <li> Efficient descriptive statistics and aggregating/summarizing data </li>
    <li> Data alignment and relational data manipulations for merging and joining
        together heterogeneous datasets </li>
    <li> Expressing conditional logic as array expressions instead of loops with if-elif-else branches </li>
    <li> Group-wise data manipulations (aggregation, transformation, function application)</li>
</ul>


<h3> How numpy works </h3>

<ul>
    <li> NumPy internally stores data in a contiguous block of memory, independent of
other built-in Python objects. NumPy’s library of algorithms written in the C language can operate on this memory without any type checking or other overhead. NumPy arrays also use much less memory than built-in Python sequences.</li>
    <li> NumPy operations perform complex computations on entire arrays without the
    need for Python for loops. </li>
    
</ul>
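
<p> The storage layout described above can be inspected directly. This is a small sketch using an arbitrary 1000-element array; the loop is there only to contrast with the single vectorized call. </p>

```python
import numpy as np

arr = np.arange(1000, dtype=np.int64)

print(arr.itemsize)               # bytes per element: 8 for int64
print(arr.nbytes)                 # bytes of array data: 1000 * 8 = 8000
print(arr.flags["C_CONTIGUOUS"])  # True: one contiguous, row-major block

# A whole-array operation replaces an explicit Python for loop.
total_loop = 0
for value in arr:
    total_loop += int(value)
total_vectorized = int(arr.sum())
```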


<h3> The NumPy ndarray: A Multidimensional Array Object </h3>

<ul> 
    <li> One of the key features of numpy is its N-dimensional array object, or <b>ndarray</b>, which is a fast, flexible container for large datasets in Python. </li>
    <li>Arrays enable you to perform mathematical operations on whole blocks of data using syntax similar to the equivalent operations between scalar elements. </li>
</ul>




<h4> First look at numpy array: </h4>

Import the numpy module under the alias <b>np</b>:

In [1]:
import numpy as np

Generate some random data.  
<a href="https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.RandomState.html#numpy.random.RandomState">How does NumPy generate random numbers? </a>

In [2]:
data = np.random.randn(2,3)

In [3]:
data

array([[ 0.45993638, -1.16380581,  0.40610151],
       [ 2.75292933, -0.72998003,  0.23324967]])

<p> Mathematical operation with <b>data</b> </p>

In [4]:
data * 10

array([[  4.59936376, -11.63805806,   4.06101507],
       [ 27.52929328,  -7.29980028,   2.33249675]])

<p> All of the elements of <b>data</b> have been multiplied by 10. </p>

<h3> Some properties of NumPy arrays (ndarray) </h3>
<ul>
    <li> An ndarray is a generic multidimensional container for homogeneous data : all
        of the elements must be the same type. </li>
    <li> Every array has a <b>shape</b> : a tuple indicating the
        size of each dimension </li>
    <li> Every array has a <b>dtype</b> : an object describing the data type of the array </li>
</ul>

For 'data':

In [5]:
data.shape

(2, 3)

In [6]:
data.dtype

dtype('float64')

<h3> Creating ndarrays </h3>
<ul>
    <li> Easiest way: use the <b>array</b> function </li>
</ul>

First make a Python list:

In [7]:
data1 = [4, 5.3, 8, 9, 12]

In [8]:
arr1 = np.array(data1)

In [9]:
print(arr1)

[ 4.   5.3  8.   9.  12. ]


Nested sequence :

In [10]:
nested_seq = [[1, 2, 3], [4, 5, 6]]

In [11]:
nested_arr = np.array(nested_seq)

In [12]:
print(nested_arr)

[[1 2 3]
 [4 5 6]]


In [13]:
nested_arr.ndim

2

In [14]:
nested_arr.shape

(2, 3)

In [15]:
nested_arr.dtype

dtype('int64')

<h3> Some properties of NumPy arrays (ndarray) </h3>
<ul>
    <li> Unless explicitly specified, <b>np.array</b> tries to infer a good data
        type for the array that it creates. The data type is stored in the array's <b>dtype</b> attribute. </li>
</ul>
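
<p> A short sketch of that inference, using illustrative inputs: an all-integer list yields an integer dtype, one float is enough to upcast the whole array, and an explicit <b>dtype</b> overrides the inference. </p>

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])
explicit = np.array([1, 2, 3], dtype=np.float32)

print(ints.dtype.kind)  # 'i': a signed integer type (the width is platform dependent)
print(mixed.dtype)      # float64: the integers are upcast to match 2.5
print(explicit.dtype)   # float32: the explicit dtype wins
```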
<br>

<h4> There are a number of other functions for creating new
arrays: </h4>
<ul>
    <li> <b>zeros</b> : create arrays of 0s </li>
    <li> <b>ones</b> : create arrays of 1s </li>
    <li> <b>empty</b> : creates an array without initializing its values to any particular value.</li>
    <li> <b>arange</b> : Like the built-in <b>range</b> but returns an ndarray instead of a list</li>
    <li> <b> ones_like </b> : produces an array of ones with the same shape and dtype as a given array </li>
    <li> <b> zeros_like </b> : Like <b>ones_like</b> but for zeros </li>
    <li> <b> full </b> : Produce an array of the given shape and dtype with all values set to the indicated “fill value”</li>
    <li> <b> eye</b>, <b>identity </b> : Create a square N × N identity matrix (1s on the diagonal and 0s elsewhere) </li>
</ul>

In [16]:
np.zeros((4,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [17]:
np.ones((4,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [18]:
np.empty((4,6))

array([[1.64596982e-316, 7.58445798e-302, 2.01160983e-309,
        9.77801185e+199, 2.44046782e-152, 1.27587196e-152],
       [1.13224202e+277, 1.33360307e+241, 1.87160029e-311,
        1.70182262e-123, 1.00678753e-312, 6.52094399e-308],
       [3.32653140e-111, 3.33481217e+079, 6.88488166e-313,
        4.85213860e-308, 2.27610565e-159, 2.64171898e-313],
       [1.27631071e-303, 5.12965257e+064, 5.58599989e+093,
        9.08367206e+223, 1.13224202e+277, 1.43485896e+161]])

<ul>
    <li><b>empty</b>, unlike <b>zeros</b>, does not set the array values to zero, and may therefore be marginally faster. <br></li>
    <li><b>empty</b> has nothing to do with creating an array that is "empty" in the sense of having no elements. It just means the array doesn't have its values initialized (i.e., they are unpredictable and depend on whatever happens to be in the memory allocated for the array).</li>
</ul>

In [19]:
np.full((3,4), 4)

array([[4, 4, 4, 4],
       [4, 4, 4, 4],
       [4, 4, 4, 4]])

In [20]:
np.ones_like(nested_arr)

array([[1, 1, 1],
       [1, 1, 1]])

In [21]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [22]:
np.arange(1,10,.5)

array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. , 5.5, 6. , 6.5, 7. ,
       7.5, 8. , 8.5, 9. , 9.5])
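
<p> The creation functions from the list that are not demonstrated in the cells above (<b>zeros_like</b>, <b>eye</b>, <b>identity</b>) can be sketched the same way. The <b>template</b> array is an illustrative stand-in for any existing array. </p>

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])

zeros = np.zeros_like(template)  # shape (2, 3), same integer dtype, all 0
eye3 = np.eye(3)                 # 3 x 3 identity matrix (float64 by default)
ident3 = np.identity(3)          # same values as np.eye(3)
halves = np.full((2, 2), 0.5)    # 2 x 2 array filled with 0.5
```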

<h3>Data Types for ndarrays:</h3>
<ul>
    <li> The data type or <b>dtype</b> is a special object containing the information the ndarray needs to interpret a chunk of memory as a particular
type of data. </li>
</ul>

In [23]:
arr1 = np.array([1, 2, 3, 4], dtype=np.float32)

In [24]:
arr1

array([1., 2., 3., 4.], dtype=float32)

In [25]:
arr2 = np.array([1.2, 3, -0.3], dtype=np.int32)

In [26]:
arr2

array([1, 3, 0], dtype=int32)

<h6> Casting floating-point to integer: the decimal part is truncated. </h6>

<h3> NumPy data types </h3>
<ul>
    <li> <b> int8, uint8 </b> : Signed and unsigned 8-bit (1 byte) integer types </li>
    <li> <b> int16, uint16 </b> : Signed and unsigned 16-bit integer types </li>
    <li> <b> int32, uint32 </b> : Signed and unsigned 32-bit integer types </li>
    <li> <b> int64, uint64 </b> : Signed and unsigned 64-bit integer types </li>
    <li> <b> float16 </b> : Half-precision floating point </li>
    <li> <b> float32 </b> : Standard single-precision floating point; compatible with C float </li>
    <li> <b> float64 </b> : Standard double-precision floating point; compatible with C double and
    Python float object </li>
    <li> <b> float128 </b> : Extended-precision floating point (availability is platform dependent) </li>
    <li> <b> complex64, complex128 </b> : Complex numbers represented by two 32-bit or 64-bit floats, respectively </li>
    <li> <b> bool </b> : Boolean type storing True and False values </li>
    <li> <b> object </b> : Python object type; a value can be any Python object </li>
    <li> <b> string_ </b> : Fixed-length ASCII string type (1 byte per character); for example, to create a
    string dtype with length 10, use 'S10'. This name is an alias of <b>bytes_</b> and was removed in NumPy 2.0. </li>
    <li> <b> unicode_ </b> : Fixed-length Unicode type (number of bytes platform specific); same
specification semantics as string_ (e.g., 'U10' ). This name is an alias of <b>str_</b> and was removed in NumPy 2.0. </li>
</ul><br><br>
<h5> Converting an array's dtype: the <b>astype</b> method </h5>



In [27]:
arr = np.array([1, 2, 3, 4, 5], dtype=np.int64)

In [28]:
float_arr = arr.astype(np.float32)

In [29]:
float_arr.dtype

dtype('float32')
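
<p> <b>astype</b> also converts in the other direction. As a small sketch with made-up values, casting floats to integers drops the fractional part (truncation toward zero, not rounding) and always returns a new array: </p>

```python
import numpy as np

floats = np.array([1.9, -1.9, 3.2, -0.7])
ints = floats.astype(np.int32)

# Truncation toward zero: the values become 1, -1, 3, 0.
print(ints)
# astype returns a new array; the original keeps its dtype.
print(floats.dtype)  # float64
```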

<h6> Casting strings to integers: astype raises a ValueError if any element cannot be parsed as a number: </h6>

In [30]:
string_arr = np.array(['ab', '1' , 'fj'], dtype=np.bytes_)  # np.bytes_ is the current name of np.string_

In [31]:
string_arr.dtype

dtype('S2')

In [32]:
string_arr.astype(np.int64)

ValueError: invalid literal for int() with base 10: 'ab'

<h3> Arithmetic with NumPy Arrays </h3>
<ul>
<li>Arrays are important because they enable you to express batch operations on data
without writing any for loops. NumPy users call this vectorization. Any arithmetic
operation between equal-size arrays is applied element-wise: </li>
</ul>

In [None]:
arr = np.array([[1, 2 , 3], [4, 5, 6]], dtype=np.float64)

In [None]:
arr * arr

In [None]:
arr - arr

<p>Arithmetic operations with scalars propagate the scalar argument to each element in
the array:</p>

In [None]:
1/arr

In [None]:
arr ** 5

<p>Comparisons between arrays of the same size yield boolean arrays:</p>

In [None]:
arr2 = np.array([[0, 5, 1], 
                 [6, 5 , 10]], dtype=np.float64)

In [None]:
arr2 > arr

In [None]:
arr2 == arr
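
<p> Because the cells above are shown without output, here is the same arithmetic as a self-contained sketch with the results written out in comments: </p>

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float64)
arr2 = np.array([[0, 5, 1], [6, 5, 10]], dtype=np.float64)

squared = arr * arr     # [[1, 4, 9], [16, 25, 36]]
difference = arr - arr  # all zeros
greater = arr2 > arr    # [[False, True, False], [True, False, True]]
equal = arr2 == arr     # [[False, False, False], [False, True, False]]
```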

<h3>Broadcasting </h3>

<ul>
    <li> Operations between differently sized arrays are called <b>broadcasting</b> </li>
    <li> Broadcasting is the process of making arrays with different shapes have compatible shapes for arithmetic operations. </li>
</ul>

<p> Here we say that the scalar value 4 has been broadcast to all of the other elements in
the multiplication operation: </p>

In [None]:
arr * 4

In [None]:
arr + 4
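
<p> Broadcasting is not limited to scalars. The sketch below, with illustrative arrays <b>row</b> and <b>col</b>, shows a 1-D row and a column being stretched across a 2-D array, and an incompatible shape being rejected: </p>

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,): added to every row
col = np.array([[100], [200]])          # shape (2, 1): added to every column

plus_row = arr + row  # [[11, 22, 33], [14, 25, 36]]
plus_col = arr + col  # [[101, 102, 103], [204, 205, 206]]

# Shapes (2, 3) and (2,) do not line up on the last axis, so this fails.
try:
    arr + np.array([1, 2])
    shapes_compatible = True
except ValueError:
    shapes_compatible = False
```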

<h3> Indexing and Slicing</h3>
<ul>
    <li> <b>arr[start:stop:step]</b> for 1-D arrays </li>
    <li> <b>arr[start:stop:step, start:stop:step, ...]</b> for arrays with more than one dimension </li>
</ul>

In [None]:
arr = np.arange(20)

In [None]:
arr

In [None]:
arr[4]

In [None]:
arr[4:7]

In [None]:
arr[:6]

In [None]:
arr[2:10:2]

<p><b>Tip: </b> For both NumPy arrays and Python lists, the slice [a:b] selects the half-open interval [a, b) in mathematical notation, that is, indices a through b-1.</p>

In [None]:
arr[3:5] = 256
# arr.__setitem__(slice(3,5), 256)

In [None]:
arr

<h4>Advanced tip:</h4>
<ul>
    <li> <b>arr[start:stop:step]</b> means: <b>arr[slice(start, stop, step)]</b> </li>
    <li> <b>arr[index]</b> means: <b>arr.__getitem__(index)</b> </li>
    <li> <b>arr[index] = value</b> means: <b>arr.__setitem__(index, value)</b> </li>
</ul>
<p> NumPy implements these special methods to provide its indexing and slicing syntax. </p><br>
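
<p> These equivalences can be checked directly. In this sketch the three spellings return the same elements, and a multi-dimensional index turns out to be a tuple of slices: </p>

```python
import numpy as np

arr = np.arange(20)

a = arr[2:10:2]
b = arr[slice(2, 10, 2)]
c = arr.__getitem__(slice(2, 10, 2))  # all three select 2, 4, 6, 8

target = arr.copy()
target.__setitem__(slice(3, 5), 256)  # same as target[3:5] = 256

# For 2-D arrays the index is a tuple of slices.
arr2d = np.arange(9).reshape(3, 3)
d = arr2d[:2, 1:]
e = arr2d[(slice(None, 2), slice(1, None))]
```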

In [None]:
arr_slice = arr[1:4]

In [None]:
arr_slice

In [None]:
arr_slice[0]=9999

In [None]:
arr

<p> Array slices are views on the original data: when you change values in arr_slice, the mutations are reflected in the original array arr. </p><br>
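
<p> When an independent array is wanted instead, the slice can be copied explicitly. A minimal sketch, using <b>np.shares_memory</b> to show which arrays share a buffer: </p>

```python
import numpy as np

arr = np.arange(10)

view = arr[1:4]   # a view: shares memory with arr
view[0] = 9999    # also changes arr[1]

independent = arr[1:4].copy()  # a copy: owns its own data
independent[1] = -1            # arr[2] is left untouched

print(np.shares_memory(arr, view))         # True
print(np.shares_memory(arr, independent))  # False
```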

<h4> Slicing in higher dimension arrays: </h4><br>

In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

In [None]:
arr2d

In [None]:
arr2d[1]

In [None]:
arr2d[2][0]

In [None]:
arr2d[2, 1]

<h4>Indexing elements in a NumPy array</h4><br>
<img src="https://raw.githubusercontent.com/h8hawk/Datacamp-Scientific-Python/master/files/numpy_2darray.jpg"/>

In [None]:
arr2d[:2, 1:] # Rows: start until 2  , columns: 1 until end

In [None]:
arr2d[:2, 2] 

In [None]:
arr2d[:, :1]

In [None]:
arr2d[:, 1]
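
<p> The results of the four slices above, written out as a sketch. Note that slicing an axis with a range keeps that axis, while indexing it with a single integer drops it: </p>

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

top_right = arr2d[:2, 1:]      # [[2, 3], [5, 6]], shape (2, 2)
last_col_top = arr2d[:2, 2]    # [3, 6], shape (2,)
first_col_2d = arr2d[:, :1]    # [[1], [4], [7]], shape (3, 1)
second_col_1d = arr2d[:, 1]    # [2, 5, 8], shape (3,)
```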

<h4> Boolean Indexing </h4>
<p> A boolean array with the same shape as the data can be passed as an index; it selects the elements where the boolean array is True. </p><br>

In [None]:
data = np.random.randn(4, 4)

In [None]:
boolean_index = (data < -.5) | (data > .5)

In [None]:
boolean_index

In [None]:
data[boolean_index]
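
<p> Since the random data above differs on every run, here is the same idea as a sketch with fixed, made-up values so the result is predictable. A boolean mask can also be used on the left-hand side of an assignment: </p>

```python
import numpy as np

data = np.array([[-1.0, 0.2], [0.7, -0.3]])

mask = (data < -0.5) | (data > 0.5)  # [[True, False], [True, False]]
selected = data[mask]                # 1-D array of the matching values: -1.0, 0.7

clipped = data.copy()
clipped[mask] = 0.0                  # overwrite only the selected elements
```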

<h4> Fancy Indexing :</h4><br>

In [None]:
arr = np.random.randn(8, 4)

In [None]:
arr

In [None]:
arr[[4, 3, 0, 6]]

In [None]:
arr[[1, 6], [3, 1]]
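
<p> Fancy indexing means indexing with lists or arrays of integers. The sketch below repeats the two cells above on a predictable array (an assumption made for readability, in place of the random data): a single list picks whole rows in the given order, while two lists are paired up into (row, column) coordinates. </p>

```python
import numpy as np

arr = np.arange(32).reshape(8, 4)  # row i holds 4*i .. 4*i + 3

rows = arr[[4, 3, 0, 6]]       # rows 4, 3, 0, 6 in that order, shape (4, 4)
pairs = arr[[1, 6], [3, 1]]    # elements (1, 3) and (6, 1): 7 and 25
```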

<h3> Reshaping and Transposing Arrays and Swapping Axes </h3><br>

In [None]:
arr = np.arange(15)

In [None]:
arr

In [None]:
arr.reshape((3,5))

In [None]:
arr.reshape((4,8))  # raises ValueError: cannot reshape array of size 15 into shape (4,8)

In [None]:
arr = arr.reshape((5,3))

In [None]:
arr

In [None]:
arr.T
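
<p> A compact sketch of the operations in this section. Passing -1 for one dimension lets NumPy infer it, and <b>swapaxes</b> generalizes the transpose to any pair of axes: </p>

```python
import numpy as np

flat = np.arange(15)

grid = flat.reshape((3, 5))      # 3 rows, 5 columns
inferred = flat.reshape(5, -1)   # -1 is inferred as 3, giving shape (5, 3)
transposed = grid.T              # shape (5, 3)
swapped = grid.swapaxes(0, 1)    # same result as grid.T for a 2-D array

# The new shape must hold exactly the same number of elements.
try:
    flat.reshape((4, 8))
    reshape_ok = True
except ValueError:
    reshape_ok = False
```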

<h3> Universal Functions: Fast Element-Wise Array Functions </h3>
<ul>
    <li>A universal function, or <b>ufunc</b>, is a function that performs element-wise operations
        on data in ndarrays.</li>
</ul><br>

In [None]:
arr = np.arange(10)

In [None]:
arr

In [None]:
np.sqrt(arr)

In [None]:
np.exp(arr)

In [None]:
np.max(arr)

In [None]:
np.min(arr)

In [None]:
np.mean(arr)

In [None]:
np.absolute(arr)

<p> Functions such as sqrt and exp are referred to as unary ufuncs (max, min, and mean above are array reductions rather than ufuncs). Others, such as add or maximum, take two arrays
(thus, binary ufuncs) and return a single array as the result: </p>

In [None]:
x = np.random.randn(6)

In [None]:
y = np.random.randn(6)

In [None]:
x, y

In [None]:
np.maximum(x, y)

<h3> Unary universal functions:</h3>
<ul>
    <li><b>abs, fabs</b> : Compute the absolute value element-wise for integer, floating-point, or complex values (fabs does not accept complex values)</li>
    <li><b>sqrt</b> : Compute the square root of each element (equivalent to arr ** 0.5 )</li>
    <li><b>square</b> : Compute the square of each element (equivalent to arr ** 2 )</li>
    <li><b>exp</b> : Compute the exponential e<sup>x</sup> of each element</li>
        <li><b>log, log10, log2, log1p</b> : Natural logarithm (base e), log base 10, log base 2, and log(1 + x),  respectively</li>
    <li><b>sign</b> : Compute the sign of each element: 1 (positive), 0 (zero), or –1 (negative)</li>
    <li><b>ceil</b> : Compute the ceiling of each element (i.e., the smallest integer greater than or equal to that
number) </li>
    <li><b>floor</b> : Compute the floor of each element (i.e., the largest integer less than or equal to each element) </li>
    <li><b>rint</b> : Round elements to the nearest integer, preserving the dtype </li>
    <li><b>modf</b> : Return the fractional and integral parts of an array as two separate arrays</li>
    <li><b>isnan</b> : Return boolean array indicating whether each value is NaN (Not a Number) </li>
    <li><b>isfinite, isinf </b> : Return boolean array indicating whether each element is finite (non- inf , non- NaN ) or infinite,
respectively </li>
    <li><b>cos, cosh, sin, sinh, tan, tanh</b> : Regular and hyperbolic trigonometric functions</li>
    <li><b>arccos, arccosh, arcsin, arcsinh, arctan, arctanh</b> : Inverse trigonometric functions </li>
    <li><b>logical_not</b> : Compute truth value of not x element-wise (equivalent to ~arr for boolean arrays). </li>
</ul>
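
<p> A few of the rounding-related functions from this list are easy to confuse, so here is a sketch on illustrative values. Note that <b>rint</b> rounds halves to the nearest even integer and that <b>modf</b> returns two arrays: </p>

```python
import numpy as np

values = np.array([-2.5, -0.4, 0.0, 1.5, 2.5])

signs = np.sign(values)      # -1, -1, 0, 1, 1
ceils = np.ceil(values)      # -2, -0, 0, 2, 3
floors = np.floor(values)    # -3, -1, 0, 1, 2
rounded = np.rint(values)    # -2, -0, 0, 2, 2 (halves go to the even neighbour)

# modf splits each value into a fractional and an integral part, both signed.
frac_part, int_part = np.modf(values)
```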

    

<h3> Binary universal functions: </h3>
<ul>
    <li><b>add</b> : Add corresponding elements in arrays</li>
    <li><b>subtract</b> : Subtract elements in second array from first array</li>
    <li><b>multiply</b> : Multiply array elements</li>
    <li><b>divide, floor_divide</b> : Divide or floor divide (truncating the remainder)</li>
    <li><b>power</b> : Raise elements in first array to powers indicated in second array</li>
    <li><b>maximum, fmax</b> : Element-wise maximum; fmax ignores NaN</li>
    <li><b>minimum, fmin</b> : Element-wise minimum; fmin ignores NaN</li>
    <li><b>mod</b> : Element-wise modulus (remainder of division)</li>
    <li><b>copysign</b> : Copy sign of values in second argument to values in first argument</li>
    <li><b>greater, greater_equal, less, less_equal, equal, not_equal</b> : Perform element-wise comparison, yielding boolean array (equivalent to infix
operators >, >=, <, <=, ==, !=)</li>
    <li><b>logical_and, logical_or, logical_xor</b> : Compute element-wise truth value of logical operation (equivalent to infix operators
&, |, ^ on boolean arrays)</li>

</ul>
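
<p> Two entries in this list have behavior worth seeing on concrete, made-up numbers: how <b>maximum</b> and <b>fmax</b> differ when NaN is present, and how <b>floor_divide</b> and <b>mod</b> treat negative operands. </p>

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 0.0, np.nan])

propagating = np.maximum(a, b)  # 2, nan, nan: NaN propagates
ignoring = np.fmax(a, b)        # 2, 0, 3: NaN is ignored where possible

n = np.array([7, -7])
quotient = np.floor_divide(n, 2)  # 3, -4: rounds toward negative infinity
remainder = np.mod(n, 2)          # 1, 1: result takes the sign of the divisor
```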

<h3> Array Oriented Programming </h3>
<ul>
    <li>express many kinds of data processing tasks as concise array expressions</li>
    <li>replacing explicit loops with array expressions is commonly referred to as <b>vectorization</b></li>
</ul>
<br>

In [49]:
X = np.random.randn(1000000)

In [50]:
Y = np.random.randn(1000000)

Euclidean distance $= \sqrt{(X_1-Y_1)^2 + (X_2-Y_2)^2 + \cdots + (X_n-Y_n)^2}$

In [51]:
%time euc_distance = np.sqrt(np.sum((X-Y)**2))

CPU times: user 828 µs, sys: 6.91 ms, total: 7.74 ms
Wall time: 6.52 ms


In [52]:
%time d = np.linalg.norm(X-Y)

CPU times: user 6.7 ms, sys: 5.01 ms, total: 11.7 ms
Wall time: 6.17 ms


Manhattan distance $= |X_1-Y_1| + |X_2-Y_2| + \cdots + |X_n-Y_n|$

In [53]:
manhattan_distance = np.sum(np.abs(X-Y))
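
<p> To make the contrast with explicit loops concrete, the sketch below computes both distances twice on a tiny hand-checkable example: once with a Python loop and once with array expressions. The small vectors are illustrative stand-ins for the million-element arrays above. </p>

```python
import math

import numpy as np

X_small = np.array([1.0, 2.0, 3.0])
Y_small = np.array([4.0, 6.0, 3.0])

# Loop version: accumulate term by term.
squared_sum = 0.0
abs_sum = 0.0
for x_i, y_i in zip(X_small, Y_small):
    squared_sum += (x_i - y_i) ** 2
    abs_sum += abs(x_i - y_i)
euclidean_loop = math.sqrt(squared_sum)  # sqrt(9 + 16 + 0) = 5.0
manhattan_loop = abs_sum                 # 3 + 4 + 0 = 7.0

# Vectorized version: one array expression each.
euclidean_vec = np.sqrt(np.sum((X_small - Y_small) ** 2))
manhattan_vec = np.sum(np.abs(X_small - Y_small))
```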

<h3> Expressing Conditional Logic as Array Operations </h3>
<ul>
    <li> <b>np.where(condition, x, y)</b> : vectorized version of the ternary expression x if condition else y </li>
    <li> <b>np.where(condition)</b> : returns a tuple of index arrays, one per dimension, giving the positions where the condition is true </li>
</ul>

In [65]:
arr = np.random.randn(4, 4)

In [66]:
np.where(arr > 0)

(array([0, 0, 0, 1, 2, 2, 3, 3, 3]), array([0, 1, 2, 1, 0, 2, 0, 2, 3]))

In [67]:
np.where(arr>0 , 2 , -2)

array([[ 2,  2,  2, -2],
       [-2,  2, -2, -2],
       [ 2, -2,  2, -2],
       [ 2, -2,  2,  2]])
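
<p> The same calls on a small fixed array (chosen here for illustration) make the two forms easier to follow. The second and third arguments can be scalars or arrays, so <b>np.where</b> can also replace only the values that fail the condition: </p>

```python
import numpy as np

arr = np.array([[1, -2], [-3, 4]])

labels = np.where(arr > 0, 2, -2)       # [[2, -2], [-2, 2]]
non_negative = np.where(arr > 0, arr, 0)  # [[1, 0], [0, 4]]

# With only a condition, the result is one index array per dimension.
row_idx, col_idx = np.where(arr > 0)    # rows [0, 1], columns [0, 1]
```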

<h3> Mathematical and Statistical Methods </h3>
<ul>
    <li> <b>sum</b> : Sum of all the elements in the array or along an axis; zero-length arrays have sum 0</li>
    <li> <b>mean</b> : Arithmetic mean; zero-length arrays have NaN mean</li>
    <li> <b>std, var</b> : Standard deviation and variance </li>
    <li> <b>argmin, argmax</b> : Indices of minimum and maximum elements, respectively</li>
    <li> <b>cumsum</b> : Cumulative sum of elements starting from 0</li>
    <li> <b>cumprod</b> : Cumulative product of elements starting from 1</li>
</ul><br/>

In [68]:
arr = np.random.randn(5, 4)

In [69]:
arr.mean()

0.22587984819025655

In [70]:
arr.sum()

4.517596963805131

In [74]:
arr.mean(axis=0)

array([ 0.43569009,  0.69224516,  0.03797257, -0.26238843])
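
<p> The <b>axis</b> argument is easier to read on a small predictable array. In this sketch, axis=0 aggregates down the rows (one result per column) and axis=1 aggregates across the columns (one result per row): </p>

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

total = arr.sum()                 # 15
column_means = arr.mean(axis=0)   # [1.5, 2.5, 3.5]
row_sums = arr.sum(axis=1)        # [3, 12]
running_total = arr.cumsum()      # flattened: [0, 1, 3, 6, 10, 15]
largest_at = arr.argmax()         # 5: index into the flattened array
```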

<h3>Unique and Other Set Logic</h3><br>
<ul>
    <li><b>unique(x)</b> : Compute the sorted, unique elements in x</li>
    <li><b>intersect1d(x, y)</b> : Compute the sorted, common elements in x and y</li>
    <li><b>union1d(x, y)</b> : Compute the sorted union of elements</li>
    <li><b>in1d(x, y)</b> : Compute a boolean array indicating whether each element of x is contained in y</li>
    <li><b>setdiff1d(x, y)</b> : Set difference, elements in x that are not in y</li>
    <li><b>setxor1d(x, y)</b> : Set symmetric differences; elements that are in either of the arrays, but not both </li>
</ul><br>

In [79]:
ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])

In [81]:
np.unique(ints, return_index=True, return_counts=True)

(array([1, 2, 3, 4]), array([5, 3, 0, 7]), array([2, 2, 3, 2]))
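
<p> The remaining set functions from the table, sketched on two small illustrative arrays. For the membership test this example uses <b>np.isin</b>, the newer spelling of <b>in1d</b>, which is an assumption about the NumPy version in use (it needs a reasonably recent release). </p>

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([3, 4, 5])

common = np.intersect1d(x, y)    # [3, 4]
combined = np.union1d(x, y)      # [1, 2, 3, 4, 5]
only_in_x = np.setdiff1d(x, y)   # [1, 2]
exclusive = np.setxor1d(x, y)    # [1, 2, 5]
membership = np.isin(x, y)       # [False, False, True, True]
```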

<h3> Linear Algebra </h3>
<ul>
    <li> Multiplication, decompositions, determinants, and other square matrix math</li>
    <li> <b>numpy.dot</b> : if both a and b are 2-D arrays, it performs matrix multiplication, but using <b>np.matmul</b> or a @ b is preferred. </li>
</ul>

<h3> np.dot </h3>
<ul>
    <li>If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation)</li>
    <li>If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.</li>
    <li>If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.</li>
    <li>If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.</li>
    <li>If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b : <br>dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])</li>
</ul>
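
<p> A sketch of the first four rules on small hand-checkable inputs (the names <b>v</b>, <b>w</b>, and <b>m</b> are illustrative): </p>

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
m = np.array([[1, 2, 3], [4, 5, 6]])

inner = np.dot(v, w)         # 1-D with 1-D: 1*4 + 2*5 + 3*6 = 32
scaled = np.dot(2, v)        # scalar with array: same as 2 * v
mat_vec = np.dot(m, v)       # 2-D with 1-D: sum over the last axis of m
mat_mat = np.dot(m, m.T)     # 2-D with 2-D: matrix product, same as m @ m.T
```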

In [82]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])

In [83]:
y = np.array([[6., 23.], [-1, 7], [8, 9]])

In [84]:
x.dot(y)

array([[ 28.,  64.],
       [ 67., 181.]])

In [86]:
np.dot(x, y)

array([[ 28.,  64.],
       [ 67., 181.]])

In [87]:
x @ y

array([[ 28.,  64.],
       [ 67., 181.]])

<h3> numpy.linalg </h3>
<ul>
    <li> numpy.linalg has a standard set of matrix decompositions and operations such as inverse and determinant. </li>
    <li> Under the hood it uses libraries such as BLAS and LAPACK. </li>
</ul><br>
<h3>Commonly used linear algebra functions (numpy and numpy.linalg) </h3>
<ul>
    <li><b>diag</b> :Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal </li>
    <li><b>dot</b> : Matrix multiplication</li>
    <li><b>trace</b> : Compute the sum of the diagonal elements</li>
    <li><b>det</b> : Compute the matrix determinant</li>
    <li><b>eig</b> : Compute the eigenvalues and eigenvectors of a square matrix</li>
    <li><b>inv</b> : Compute the inverse of a square matrix</li>
    <li> <a href="https://docs.scipy.org/doc/numpy/reference/routines.linalg.html"><b>More</b></a> </li>
</ul><br>
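
<p> A minimal sketch of these functions on a 2 × 2 matrix chosen so the results can be checked by hand. Here <b>diag</b> and <b>trace</b> are called from the top-level numpy namespace, while <b>det</b>, <b>inv</b>, and <b>eig</b> come from numpy.linalg. Floating-point results are compared with a tolerance rather than exactly. </p>

```python
import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])

diagonal = np.diag(A)           # [4, 6]
as_matrix = np.diag([1, 2])     # [[1, 0], [0, 2]]
trace = np.trace(A)             # 4 + 6 = 10
determinant = np.linalg.det(A)  # 4*6 - 7*2 = 10 (up to rounding)
inverse = np.linalg.inv(A)      # [[0.6, -0.7], [-0.2, 0.4]] (up to rounding)

# For a diagonal matrix the eigenvalues are the diagonal entries.
eigenvalues, eigenvectors = np.linalg.eig(np.array([[2.0, 0.0], [0.0, 3.0]]))
```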

<h3>Pseudorandom Number Generation</h3>