
> **Introduction to Numpy library:**
 
 <hr/>
 NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
 



* NumPy fully supports an object-oriented approach, starting with `ndarray`. For example, `ndarray` is a class with numerous methods and attributes. Many of its methods are mirrored by functions in the outermost NumPy namespace, so programmers can code in whichever paradigm they prefer. This flexibility has allowed the NumPy array dialect and the `ndarray` class to become the de facto language of multi-dimensional data interchange in Python.
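
As a small illustration of this dual interface (a sketch, not an exhaustive list; the variable names are illustrative), an `ndarray` method and its top-level `np` counterpart return the same result:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Method call on the ndarray object vs. the equivalent top-level function
method_sum = arr.sum()
function_sum = np.sum(arr)

method_reshaped = arr.reshape(3, 2)
function_reshaped = np.reshape(arr, (3, 2))

print(method_sum, function_sum)                              # 21 21
print(np.array_equal(method_reshaped, function_reshaped))    # True
```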



In [3]:
import numpy as np

In [3]:
a = np.array([1,3,5,6,7], dtype = 'i')

In [4]:
b = np.array((2,5,6,7), dtype = 'f')

In [5]:
type(a)

numpy.ndarray

In [6]:
type(b)

numpy.ndarray

In [7]:
a.dtype

dtype('int32')

In [8]:
b.dtype

dtype('float32')

##### Numpy dimension:
An array is usually a fixed-size container of items of the same type and size. The number of dimensions and items in an array is defined by its shape. The shape of an array is a tuple of non-negative integers that specify the sizes of each dimension.

In NumPy, dimensions are called *axes*. For example, this 2-D array:

    [[0., 0., 0.],
     [1., 1., 1.]]

has 2 axes. The first axis has a length of 2 and the second axis has a length of 3.
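
To make the axis terminology concrete, here is a minimal sketch using that same 2 x 3 array: `.shape` reports the length of each axis, and reductions such as `sum` take an `axis` argument that selects which axis is collapsed.

```python
import numpy as np

arr = np.array([[0., 0., 0.],
                [1., 1., 1.]])

print(arr.ndim)     # 2 axes
print(arr.shape)    # (2, 3): axis 0 has length 2, axis 1 has length 3

# Summing along axis 0 collapses the rows, leaving one value per column
col_sums = arr.sum(axis=0)    # array([1., 1., 1.])
# Summing along axis 1 collapses the columns, leaving one value per row
row_sums = arr.sum(axis=1)    # array([0., 3.])
```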

In [9]:
twoD = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(twoD.ndim)
# ndim -- the number of dimensions (axes) of the array

2


In [10]:
twoD[0, 1]
# row 0, column 1 (indices start at 0, like Python lists)

2

In [11]:
threeD = np.array([[[1, 2, 3], [4, 5, 6]],
                   [[-4, -5, -6], [-1, -2, -3]]])
print(threeD.ndim)
type(threeD)


3


numpy.ndarray

In [12]:
threeD[1,1,2]

-3

In [13]:
threeD.shape

(2, 2, 3)

In [14]:
twoD.shape

(2, 3)

## NumPy array functions:

    1. np.arange
    2. np.random.permutation
    3. np.reshape
    4. np.zeros
    5. np.ones
    6. np.linspace
    7. np.random.randint
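
`np.random.permutation` is listed above but not demonstrated in the cells that follow. A minimal sketch: it returns a shuffled copy of an array, or a shuffled `np.arange(n)` when given an integer. This sketch assumes the `np.random.default_rng` Generator interface (NumPy 1.17+) with an arbitrary seed for reproducibility; the legacy `np.random.permutation` call behaves the same way.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seed chosen arbitrarily for reproducibility

# Given an integer n, permutation shuffles np.arange(n)
perm = rng.permutation(10)

# Given an array, it returns a shuffled copy and leaves the input unchanged
original = np.array([10, 20, 30, 40, 50])
shuffled = rng.permutation(original)

print(perm)
print(original, shuffled)
```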





In [7]:
#Return evenly spaced numbers over a specified interval
np.linspace(3,30, num = 10)

array([ 3.,  6.,  9., 12., 15., 18., 21., 24., 27., 30.])

In [15]:
odd = int(input("Enter the start number: "))
last = int(input("Enter the last odd number: "))
quick = np.arange(odd, last + 1, 2)
print(quick)

Enter the start number: 1
Enter the last odd number: 10
[1 3 5 7 9]
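
`np.arange(start, stop, step)` follows Python's `range` convention: `stop` is excluded, which is why the cell above adds 1 to `last`. A small sketch comparing it with `np.linspace`, whose endpoint is included by default:

```python
import numpy as np

# stop is exclusive: 9 is the last value produced for both stop=10 and stop=11
odds_a = np.arange(1, 10, 2)    # [1 3 5 7 9]
odds_b = np.arange(1, 11, 2)    # [1 3 5 7 9]

# linspace includes the endpoint by default (endpoint=True)
evenly = np.linspace(1, 9, num=5)    # [1. 3. 5. 7. 9.]
```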


In [16]:
# Random integer from 1 to 9 (the upper bound 10 is exclusive)
np.random.randint(1, 10)

1

In [17]:
first = np.array([1,2,3,4,5,6,7,8,9,10])
B= first.reshape(2,5)
print(B)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
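
When one dimension can be inferred from the array's size, `reshape` accepts `-1` for it. The total number of elements must stay the same, otherwise NumPy raises a `ValueError`. A short sketch:

```python
import numpy as np

first = np.arange(1, 11)      # 10 elements: 1..10

B = first.reshape(2, -1)      # -1 is inferred as 5
print(B.shape)                # (2, 5)

C = first.reshape(-1, 2)      # -1 is inferred as 5
print(C.shape)                # (5, 2)

try:
    first.reshape(3, 4)       # 12 slots for 10 elements: not allowed
except ValueError as err:
    print("reshape failed:", err)
```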


In [18]:
print(np.zeros((2,3), dtype = int))

[[0 0 0]
 [0 0 0]]


In [19]:
print(np.ones((2,3), dtype = float))

[[1. 1. 1.]
 [1. 1. 1.]]


### NUMPY -- Slicing
 #####   Syntax: [ Start : End : Step ]
![NumPy indexing](numpy_indexing.png)

In [20]:
A = np.arange(100)

In [21]:
b = A[3:10]
print(b)

[3 4 5 6 7 8 9]


In [22]:
b[0]= -1200

In [23]:
print(A)
# Note the change in A: b is a view of A, so b[0] = -1200 also changed A[3]

[    0     1     2 -1200     4     5     6     7     8     9    10    11
    12    13    14    15    16    17    18    19    20    21    22    23
    24    25    26    27    28    29    30    31    32    33    34    35
    36    37    38    39    40    41    42    43    44    45    46    47
    48    49    50    51    52    53    54    55    56    57    58    59
    60    61    62    63    64    65    66    67    68    69    70    71
    72    73    74    75    76    77    78    79    80    81    82    83
    84    85    86    87    88    89    90    91    92    93    94    95
    96    97    98    99]


In [24]:
# To get an independent copy instead of a view, call .copy()
b = A[3:10].copy()

In [25]:
print(b)

[-1200     4     5     6     7     8     9]


In [26]:
b[0]=3

In [27]:
print(A)

[    0     1     2 -1200     4     5     6     7     8     9    10    11
    12    13    14    15    16    17    18    19    20    21    22    23
    24    25    26    27    28    29    30    31    32    33    34    35
    36    37    38    39    40    41    42    43    44    45    46    47
    48    49    50    51    52    53    54    55    56    57    58    59
    60    61    62    63    64    65    66    67    68    69    70    71
    72    73    74    75    76    77    78    79    80    81    82    83
    84    85    86    87    88    89    90    91    92    93    94    95
    96    97    98    99]


In [28]:
print(b)

[3 4 5 6 7 8 9]
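
`np.shares_memory` (or the `.base` attribute) is one way to check whether a slice is a view or an independent copy. A sketch reproducing both cases above on a fresh array:

```python
import numpy as np

A = np.arange(100)

view = A[3:10]            # basic slicing returns a view into A
copy = A[3:10].copy()     # .copy() allocates new memory

print(np.shares_memory(A, view))    # True
print(np.shares_memory(A, copy))    # False

view[0] = -1200           # writes through to A[3]
copy[1] = -999            # does not affect A
print(A[3], A[4])         # -1200 4
```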


In [29]:
A[::5]

array([ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80,
       85, 90, 95])

In [30]:
A[::-5]

array([99, 94, 89, 84, 79, 74, 69, 64, 59, 54, 49, 44, 39, 34, 29, 24, 19,
       14,  9,  4])

In [31]:
A[::-1]

array([   99,    98,    97,    96,    95,    94,    93,    92,    91,
          90,    89,    88,    87,    86,    85,    84,    83,    82,
          81,    80,    79,    78,    77,    76,    75,    74,    73,
          72,    71,    70,    69,    68,    67,    66,    65,    64,
          63,    62,    61,    60,    59,    58,    57,    56,    55,
          54,    53,    52,    51,    50,    49,    48,    47,    46,
          45,    44,    43,    42,    41,    40,    39,    38,    37,
          36,    35,    34,    33,    32,    31,    30,    29,    28,
          27,    26,    25,    24,    23,    22,    21,    20,    19,
          18,    17,    16,    15,    14,    13,    12,    11,    10,
           9,     8,     7,     6,     5,     4, -1200,     2,     1,
           0])

In [32]:
ind = np.argwhere(A==-1200)

In [33]:
ind

array([[3]], dtype=int64)

In [34]:
# Restore the original value: A[3] was -1200, and its index is 3
A[ind] = ind

In [35]:
A

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [36]:
# Generate random floats in [0, 1) with shape 5 rows x 3 columns,
# scale them to [0, 10) and round them to the nearest whole number

twoArray = np.round(10 * np.random.rand(5, 3))
twoArray

array([[6., 9., 6.],
       [1., 1., 0.],
       [2., 5., 3.],
       [4., 9., 4.],
       [4., 6., 5.]])

In [37]:
# Slicing the whole second row (index 1)

twoArray[1, :]

array([1., 1., 0.])

In [38]:
# Accessing the whole second column (index 1)

twoArray[:,1]

array([9., 1., 5., 9., 6.])

In [39]:
# [row range, column range]: rows 0-2 and columns 0-1 (stop indices are exclusive)

twoArray[0:3, 0:2]

array([[6., 9.],
       [1., 1.],
       [2., 5.]])

In [40]:
#transpose the matrix
print(twoArray)
twoArray.T 

[[6. 9. 6.]
 [1. 1. 0.]
 [2. 5. 3.]
 [4. 9. 4.]
 [4. 6. 5.]]


array([[6., 1., 2., 4., 4.],
       [9., 1., 5., 9., 6.],
       [6., 0., 3., 4., 5.]])

In [41]:
import numpy.linalg as la

In [42]:
la.inv(np.random.randint(1,51,(5,5)))


array([[-0.03688938, -0.02910539,  0.04482888, -0.01407099,  0.04404065],
       [ 0.03455721,  0.01846988, -0.00652373, -0.01414559, -0.03177932],
       [ 0.00626019,  0.062126  , -0.04378281,  0.02593211, -0.04564355],
       [-0.00050325, -0.01311613, -0.00128234,  0.0289208 ,  0.00200929],
       [ 0.01240829, -0.02210573,  0.00617534, -0.03645874,  0.04042883]])
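
Because the matrix is random, a useful sanity check is that the product of a matrix and its computed inverse is numerically the identity. A sketch, assuming an arbitrary seed and the `rng.integers` Generator API (NumPy 1.17+); `np.linalg.inv` raises `np.linalg.LinAlgError` for a singular matrix, so that case is caught explicitly.

```python
import numpy as np
import numpy.linalg as la

rng = np.random.default_rng(seed=0)       # arbitrary seed for reproducibility
M = rng.integers(1, 51, size=(5, 5))      # random integers in [1, 50]

try:
    M_inv = la.inv(M)
    # Floating-point results are not exact, so compare with a tolerance
    print(np.allclose(M @ M_inv, np.eye(5)))
except la.LinAlgError:
    print("M is singular and has no inverse")
```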

In [43]:
# Sorting along axis 0: each column is sorted independently (in place)

twoArray.sort(axis = 0)
twoArray

array([[1., 1., 0.],
       [2., 5., 3.],
       [4., 6., 4.],
       [4., 9., 5.],
       [6., 9., 6.]])

In [44]:
# Sorting along axis 1: each row is sorted independently (in place)
twoArray.sort(axis = 1)
twoArray

array([[0., 1., 1.],
       [2., 3., 5.],
       [4., 4., 6.],
       [4., 5., 9.],
       [6., 6., 9.]])
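
Note that `twoArray.sort(...)` sorts in place and returns `None`. A sketch of the alternatives on small hypothetical sample data: `np.sort` returns a sorted copy and leaves its input untouched, and `axis=None` sorts the flattened array.

```python
import numpy as np

# Hypothetical sample data
data = np.array([[6, 9, 6],
                 [1, 1, 0],
                 [2, 5, 3]])

by_column = np.sort(data, axis=0)    # each column sorted independently
by_row = np.sort(data, axis=1)       # each row sorted independently
flat = np.sort(data, axis=None)      # flattened, then sorted

result = data.sort(axis=1)           # in place: modifies data, returns None
```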

In [45]:
# Element-wise multiplication with numpy's * operator (NOT matrix multiplication)
ag = np.array([[1,2,3],[4,5,6],[7,8,9]])
bg = np.array([[1,2,3],[4,5,6],[7,8,9]])
cg = ag*bg
cg

array([[ 1,  4,  9],
       [16, 25, 36],
       [49, 64, 81]])
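
For the actual matrix product (row-by-column dot products), use the `@` operator or `np.matmul`. Comparing the two operators on the same matrices:

```python
import numpy as np

ag = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
bg = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

elementwise = ag * bg         # multiplies matching entries
matrix_product = ag @ bg      # same as np.matmul(ag, bg)

print(matrix_product)
# [[ 30  36  42]
#  [ 66  81  96]
#  [102 126 150]]
```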

In [46]:
# Slicing with a step, then boolean-mask indexing combined with '&'

test1 = np.arange(100)

In [47]:
test2 = test1[20:60:2]

In [48]:
test2

array([20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52,
       54, 56, 58])

In [49]:
test2 = test1[(test1 > 20) & (test1 < 40)]
test2

array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
       38, 39])

In [50]:
# Boolean-mask indexing returns a copy, so this does not modify test1
test2[0] = 20

In [51]:
test1

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [52]:
test2

array([20, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37,
       38, 39])
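
Why `test1` is unchanged: indexing with a boolean mask returns a copy, not a view. Assigning *through* the mask, however, writes into the original array. The parentheses around each comparison are required because `&` binds more tightly than `>` and `<` in Python. A sketch:

```python
import numpy as np

test1 = np.arange(100)
mask = (test1 > 20) & (test1 < 40)    # boolean array with the same shape as test1

print(np.count_nonzero(mask))         # 19 elements satisfy the condition

selected = test1[mask]                # a copy of the selected elements
selected[0] = 20                      # test1 is not affected

test1[mask] = 0                       # assignment through the mask modifies test1
print(test1[20:41])
```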

## Broadcasting

> "The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation."

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimension and works its way left. Two dimensions are compatible when

- they are equal, or
- one of them is 1.

A set of arrays is called “broadcastable” to the same shape if the above rules produce a valid result.

For example, if `a.shape` is `(5,1)`, `b.shape` is `(1,6)`, `c.shape` is `(6,)` and `d.shape` is `()` so that `d` is a scalar, then `a`, `b`, `c`, and `d` are all broadcastable to dimension `(5,6)`; and

- `a` acts like a `(5,6)` array where `a[:,0]` is broadcast to the other columns,
- `b` acts like a `(5,6)` array where `b[0,:]` is broadcast to the other rows,
- `c` acts like a `(1,6)` array and therefore like a `(5,6)` array where `c[:]` is broadcast to every row, and finally,
- `d` acts like a `(5,6)` array where the single value is repeated.
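
These rules can be checked directly by building arrays with exactly those shapes and adding them together (NumPy 1.20+ also offers `np.broadcast_shapes`, which computes the result shape without allocating arrays). A sketch, where the array contents are arbitrary illustrative values:

```python
import numpy as np

a = np.arange(5).reshape(5, 1)    # shape (5, 1)
b = np.arange(6).reshape(1, 6)    # shape (1, 6)
c = np.arange(6)                  # shape (6,)
d = np.float64(2.0)               # shape (), a scalar

result = a + b + c + d
print(result.shape)               # (5, 6)

# Entry [i, j] is a[i, 0] + b[0, j] + c[j] + d = i + 2*j + 2
print(result[0, :3])
```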

Here are some more examples:
---
    A      (2d array):  5 x 4
    B      (1d array):      1
        Result (2d array):  5 x 4

    A      (2d array):  5 x 4
    B      (1d array):      4
        Result (2d array):  5 x 4

    A      (3d array):  15 x 3 x 5
    B      (3d array):  15 x 1 x 5
        Result (3d array):  15 x 3 x 5

    A      (3d array):  15 x 3 x 5
    B      (2d array):       3 x 5
        Result (3d array):  15 x 3 x 5

    A      (3d array):  15 x 3 x 5
    B      (2d array):       3 x 1
        Result (3d array):  15 x 3 x 5
        
Here are examples of shapes that do not broadcast:
---
    A      (1d array):  3
    B      (1d array):  4 # trailing dimensions do not match

    A      (2d array):      2 x 1
    B      (3d array):  8 x 4 x 3 # second from last dimensions mismatched
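
When the rules fail, NumPy raises a `ValueError` rather than guessing. A sketch with the first pair above (shapes `(3,)` and `(4,)`), plus one way to make them compatible by adding an axis of length 1 with `np.newaxis`:

```python
import numpy as np

x = np.arange(3)    # shape (3,)
y = np.arange(4)    # shape (4,)

broadcast_failed = False
try:
    x + y           # trailing dimensions 3 and 4 differ and neither is 1
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)

# Giving x a trailing axis of length 1 (shape (3, 1)) makes it compatible with (4,)
outer_sum = x[:, np.newaxis] + y
print(outer_sum.shape)    # (3, 4)
```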


Array stacking and sorting functions:
--
    np.hstack
    np.vstack
    np.sort

In [78]:
broad = np.arange(1,21).reshape(5,2,2)
broad

array([[[ 1,  2],
        [ 3,  4]],

       [[ 5,  6],
        [ 7,  8]],

       [[ 9, 10],
        [11, 12]],

       [[13, 14],
        [15, 16]],

       [[17, 18],
        [19, 20]]])

In [79]:
broad = broad + 5
broad

array([[[ 6,  7],
        [ 8,  9]],

       [[10, 11],
        [12, 13]],

       [[14, 15],
        [16, 17]],

       [[18, 19],
        [20, 21]],

       [[22, 23],
        [24, 25]]])

## Using the hstack() and vstack() functions

In [80]:
con1 = np.arange(1, 11).reshape(2, 5)
print(con1)
con2 = np.arange(11, 21).reshape(2, 5)
con2

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]


array([[11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

In [81]:
np.hstack((con1,con2))

array([[ 1,  2,  3,  4,  5, 11, 12, 13, 14, 15],
       [ 6,  7,  8,  9, 10, 16, 17, 18, 19, 20]])

In [82]:
conv1 = np.arange(1,11).reshape(5,2)
print(conv1)
conv2 = np.arange(11,21).reshape(5,2)
conv2

[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]]


array([[11, 12],
       [13, 14],
       [15, 16],
       [17, 18],
       [19, 20]])

In [83]:
np.vstack((conv1,conv2))

array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12],
       [13, 14],
       [15, 16],
       [17, 18],
       [19, 20]])
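
For 2-D arrays, `np.hstack` and `np.vstack` give the same result as `np.concatenate` along axis 1 and axis 0 respectively. A quick check using the same arrays as above:

```python
import numpy as np

con1 = np.arange(1, 11).reshape(2, 5)
con2 = np.arange(11, 21).reshape(2, 5)
conv1 = np.arange(1, 11).reshape(5, 2)
conv2 = np.arange(11, 21).reshape(5, 2)

# hstack joins side by side (along axis 1, the columns)
h = np.hstack((con1, con2))
print(h.shape, np.array_equal(h, np.concatenate((con1, con2), axis=1)))     # (2, 10) True

# vstack joins top to bottom (along axis 0, the rows)
v = np.vstack((conv1, conv2))
print(v.shape, np.array_equal(v, np.concatenate((conv1, conv2), axis=0)))   # (10, 2) True
```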

#### NumPy functions vs built-in Python functions vs user-defined functions:


In [59]:
testArray = np.random.rand(1000000)
%timeit sum(testArray)
%timeit np.sum(testArray)

90.7 ms ± 8.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
1.15 ms ± 75.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [60]:
def sumArray(G):
    """
    Our own sum function: it takes the array as the argument
    and returns the sum of its elements using a plain Python loop.
    """
    total = 0  # named 'total' to avoid shadowing the built-in sum()
    for i in G:
        total += i
    return total
%timeit sumArray(testArray)

134 ms ± 9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [61]:
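a = 2.1234
print("%.2f" % a)            # %-formatting with 2 decimal places
print("%.2f", a)             # wrong: prints the format string and the value separately
print("{:.2f}".format(a))    # str.format with 2 decimal places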


2.12
%.2f 2.1234
2.12


In [62]:
# 'and' binds more tightly than 'or', so this is True or (False and False)
True or False and False

True

### Matrix multiplication in NumPy, continued

We can use the `np.matmul` function or the `@` operator to perform matrix multiplication.

The `climate_data` array defined below is a 2-D array, which you may recognize as a matrix with five rows and three columns. Each row represents one region, and the columns represent temperature, rainfall, and humidity, respectively.

NumPy arrays can have any number of dimensions and different lengths along each dimension. We can inspect the length along each dimension using the `.shape` property of an array.

<img src="https://fgnt.github.io/python_crashkurs_doc/_images/numpy_array_t.png" width="420">


In [64]:
climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])

In [74]:
w1, w2, w3 = 0.3, 0.2, 0.5
weights = np.array([w1, w2, w3])

In [75]:
print(weights.dtype)
weights.shape

float64


(3,)

In [76]:
np.matmul(climate_data, weights)

array([56.8, 76.9, 81.9, 57.7, 74.9])

In [77]:
climate_data @ weights

array([56.8, 76.9, 81.9, 57.7, 74.9])
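
Here `np.matmul` treats the 1-D `weights` as a vector, so each output entry is the weighted sum of one row. As a sketch of why, the same numbers come out of broadcasting the `(3,)` weights across the `(5, 3)` matrix with element-wise `*` and then summing each row:

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])
weights = np.array([0.3, 0.2, 0.5])

via_matmul = climate_data @ weights
# weights (3,) broadcasts across each row of the (5, 3) matrix
via_broadcast = (climate_data * weights).sum(axis=1)

print(np.allclose(via_matmul, via_broadcast))    # True
# First region by hand: 73*0.3 + 67*0.2 + 43*0.5 = 21.9 + 13.4 + 21.5 = 56.8
```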

## Working with CSV data files

NumPy also provides helper functions for reading from and writing to files. Let's download a file `climate.txt`, which contains 10,000 climate measurements (temperature, rainfall and humidity) in the following format:


```
temperature,rainfall,humidity
25.00,76.00,99.00
39.00,65.00,70.00
59.00,45.00,77.00
84.00,63.00,38.00
66.00,50.00,52.00
41.00,94.00,77.00
91.00,57.00,96.00
49.00,96.00,99.00
67.00,20.00,28.00
...
```

This format of storing data is known as *comma-separated values* or CSV. 

> **CSVs**: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)


To read this file into a numpy array, we can use the `genfromtxt` function.

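The file can be downloaded first with `urllib.request.urlretrieve`, passing the URL and the local file name:

    import urllib.request
    urllib.request.urlretrieve('LINK', 'climate.txt')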

In [85]:
# Arguments passed to np.genfromtxt:
# - first: the text file name
# - delimiter: the separator between values, e.g. ','
# - skip_header: number of lines to skip at the top (here, the column names)
climate_data = np.genfromtxt('climate.txt', delimiter=',', skip_header=1)

In [86]:
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [88]:
climate_data.shape

(10000, 3)
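
A self-contained way to try `np.genfromtxt` without downloading the file is to pass it a file-like object. This sketch uses `io.StringIO` holding the header and the first three sample records shown above as a stand-in for `climate.txt`:

```python
import io
import numpy as np

# The header and first three records of the sample above, standing in for climate.txt
csv_text = """temperature,rainfall,humidity
25.00,76.00,99.00
39.00,65.00,70.00
59.00,45.00,77.00
"""

sample = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)
print(sample.shape)    # (3, 3)
print(sample[0])       # [25. 76. 99.]
```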

In [89]:
weights = np.array([0.3, 0.2, 0.5])

In [90]:
yields = climate_data @ weights

In [91]:
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [92]:
yields.shape

(10000,)

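Adding the `yields` to `climate_data` as a fourth column using the `np.concatenate` function. Since `yields` is 1-D with shape `(10000,)`, it is first reshaped into a `(10000, 1)` column so that both arrays are 2-D:

- `axis=0` concatenates along the rows (the second array's rows are appended below the first)
- `axis=1` concatenates along the columns (the second array's columns are appended to the right)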

In [95]:
climate_results = np.concatenate((climate_data, yields.reshape(10000, 1)), axis=1)

In [94]:
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])
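
Instead of `reshape(10000, 1)`, the column can also be built with `yields[:, np.newaxis]`, or both steps can be combined with `np.column_stack`, which turns 1-D inputs into columns. A small sketch using the first three sample rows shows that the three forms agree:

```python
import numpy as np

# First three rows of the sample data
data = np.array([[25., 76., 99.],
                 [39., 65., 70.],
                 [59., 45., 77.]])
yields = data @ np.array([0.3, 0.2, 0.5])    # shape (3,)

v1 = np.concatenate((data, yields.reshape(-1, 1)), axis=1)
v2 = np.concatenate((data, yields[:, np.newaxis]), axis=1)
v3 = np.column_stack((data, yields))         # 1-D yields becomes a column

print(v1.shape)    # (3, 4)
```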

In [96]:
np.savetxt('climate_results.txt', 
           climate_results, 
           fmt='%.2f', 
           delimiter=',',
           header='temperature,rainfall,humidity,yield_apples', 
           comments='')

NumPy provides hundreds of functions for performing operations on arrays. Here are some commonly used functions:

* Mathematics: `np.sum`, `np.exp`, `np.round`, arithmetic operators
* Array manipulation: `np.reshape`, `np.stack`, `np.concatenate`, `np.split`
* Linear Algebra: `np.matmul`, `np.dot`, `np.transpose`, `np.linalg.eigvals`
* Statistics: `np.mean`, `np.median`, `np.std`, `np.max`
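
A brief tour of a few of these functions on a small hypothetical matrix. Note that `eigvals` lives in the `np.linalg` submodule:

```python
import numpy as np

# Hypothetical sample matrix
m = np.array([[2., 0.],
              [0., 3.]])

eig = np.linalg.eigvals(m)    # a diagonal matrix's eigenvalues are its diagonal entries
print(eig)
print(np.mean(m), np.median(m), np.max(m))    # 1.25 1.0 3.0

left, right = np.split(np.arange(6), 2)    # two equal pieces: [0 1 2] and [3 4 5]
stacked = np.stack((left, right))          # adds a new leading axis: shape (2, 3)
print(stacked.shape)
```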

> The easiest way to find the right function for a specific operation or use-case is to do a web search. For instance, searching for "How to join numpy arrays" leads to [this tutorial on array concatenation](https://cmdlinetips.com/2018/04/how-to-concatenate-arrays-in-numpy/). 

You can find a full list of array functions here: https://numpy.org/doc/stable/reference/routines.html