# Numpy

In [2]:
import numpy as np

In [33]:
# Climate data of different regions (columns: temperature, rainfall, humidity)
climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])

In [4]:
# Weights
w1, w2, w3 = 0.3, 0.2, 0.5

We can represent the set of weights used in the yield formula (yield = w1 * temperature + w2 * rainfall + w3 * humidity) as a vector.

In [5]:
weights = [w1, w2, w3]

In [6]:
climate_data, weights

(array([[ 73,  67,  43],
        [ 91,  88,  64],
        [ 87, 134,  58],
        [102,  43,  37],
        [ 69,  96,  70]]),
 [0.3, 0.2, 0.5])

Let's check the types of weights and climate_data.

In [7]:
type(weights), type(climate_data)

(list, numpy.ndarray)

weights here, as we know, is a list, and climate_data is of type numpy.ndarray.

In [8]:
weights = np.array(weights)

In [9]:
weights, type(weights)

(array([0.3, 0.2, 0.5]), numpy.ndarray)

(Above) We have converted weights from a list to an array (ndarray).
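Why bother converting? The same operator behaves differently on the two types. A minimal check (the names weights_list and weights_arr are just for illustration):

```python
import numpy as np

weights_list = [0.3, 0.2, 0.5]
weights_arr = np.array(weights_list)

# On a list, * 2 repeats the list
print(weights_list * 2)   # [0.3, 0.2, 0.5, 0.3, 0.2, 0.5]
# On an ndarray, * 2 multiplies every element
print(weights_arr * 2)    # [0.6 0.4 1. ]
# NumPy stores Python floats as 64-bit floats
print(weights_arr.dtype)  # float64
```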

#### .size
* This gives the size of the array, i.e. the total number of elements in it.

In [34]:
climate_data.size

15

#### .shape
* This gives the shape of an ndarray: a tuple with the number of elements along each axis (dimension).

In [18]:
print(f"{climate_data}\n\n{climate_data.shape}")

[[ 73  67  43]
 [ 91  88  64]
 [ 87 134  58]
 [102  43  37]
 [ 69  96  70]]

(5, 3)


Breaking down (5, 3):
* 5 is the length of the outer array, i.e. the number of rows (axis 0). In our case the outer array holds 5 inner arrays.
* 3 is the length of each inner array, i.e. the number of columns (axis 1).
* The shape tuple has 2 entries, so this ndarray is a 2-D array.
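The breakdown above can be confirmed in code. A minimal sketch reusing the 5x3 climate_data from this notebook; ndim is the number of axes, i.e. the length of the shape tuple:

```python
import numpy as np

climate_data = np.array([[73, 67, 43],
                         [91, 88, 64],
                         [87, 134, 58],
                         [102, 43, 37],
                         [69, 96, 70]])

rows, cols = climate_data.shape  # unpack the shape tuple
print(rows, cols)                # 5 3
print(climate_data.ndim)         # 2, same as len(climate_data.shape)
print(climate_data.size)         # 15, same as rows * cols
```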

In [19]:
print(f"{weights}\n\n{weights.shape}")

[0.3 0.2 0.5]

(3,)


Let's try making some n-D arrays.

In [22]:
# 2-D array
array= np.array([
    [0.3,0.2,0.5],
    [0.3,0.2,0.5],
    [0.3,0.2,0.5]
])

In [21]:
array.shape

(3, 3)

In [26]:
# 3-D array
array= np.array([
    [[0.3,0.2],
    [0.4,0.5]],
    [[0.7,0.8],
    [0.6,0.2]]
])

In [27]:
array.shape

(2, 2, 2)

### Benefits of using Numpy arrays
Numpy arrays offer the following benefits over Python lists for operating on numerical data:

* Ease of use: You can write small, concise, and intuitive mathematical expressions like (climate_data[0] * weights).sum() rather than using loops and custom functions like crop_yield.
* Performance: NumPy operations and functions are implemented internally in compiled C code, which makes them much faster than Python statements and loops that are interpreted at runtime.
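To make the "ease of use" point concrete, here is a minimal sketch comparing a plain-Python loop with the NumPy expression for one region's yield. weighted_sum_loop is a hypothetical helper standing in for a custom function like crop_yield; it is not the course's actual code.

```python
import numpy as np

weights = np.array([0.3, 0.2, 0.5])
region = np.array([73, 67, 43])  # temperature, rainfall, humidity of one region

# Plain-Python version: loop over the pairs and accumulate
def weighted_sum_loop(values, ws):
    total = 0
    for v, w in zip(values, ws):
        total += v * w
    return total

# NumPy version: element-wise multiply, then sum, in one expression
vectorized = (region * weights).sum()

print(weighted_sum_loop(region, weights), vectorized)
```

Both give the same value (about 56.8); the NumPy version does the looping in compiled code.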

### Dot Product

In [83]:
a= np.array(
    [[0.3,0.2,0.5],
    [0.6,0.7,0.8],]
    )

In [84]:
b= np.array([0.3,0.2,0.5])

In [85]:
dot_product = np.dot(a,b)

In [86]:
dot_product

array([0.38, 0.72])
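Where does [0.38, 0.72] come from? When the first argument is 2-D and the second is 1-D, each output entry is the dot product of one row of a with b, e.g. 0.3*0.3 + 0.2*0.2 + 0.5*0.5 = 0.38. A minimal sketch recomputing it by hand:

```python
import numpy as np

a = np.array([[0.3, 0.2, 0.5],
              [0.6, 0.7, 0.8]])
b = np.array([0.3, 0.2, 0.5])

# One entry per row of `a`: multiply pairwise with `b` and add up
manual = np.array([sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in a])
print(manual)  # approximately [0.38 0.72]
```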

### Matrix multiplication:
* Entry (i, j) of the result is the dot product of row i of the first matrix and column j of the second.
* If we have two matrices of shapes a x b and c x d, __matrix multiplication works only if b == c__.
* __The resulting matrix has shape a x d.__
* We can also use the __@__ operator instead of np.matmul for matrix multiplication.
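The rules above can be spelled out as a naive triple loop. This is a minimal sketch for learning only (matmul_by_hand is a hypothetical helper, far slower than np.matmul), checked against the @ operator:

```python
import numpy as np

def matmul_by_hand(x, y):
    """Entry (i, j) = dot product of row i of x and column j of y."""
    a, b = x.shape
    c, d = y.shape
    if b != c:  # inner dimensions must match
        raise ValueError(f"inner dimensions differ: {b} != {c}")
    out = np.zeros((a, d), dtype=np.result_type(x, y))  # result is a x d
    for i in range(a):
        for j in range(d):
            out[i, j] = sum(x[i, k] * y[k, j] for k in range(b))
    return out

m1 = np.array([[1, 2], [3, 4]])
m2 = np.array([[5, 6], [7, 8]])
print(matmul_by_hand(m1, m2))  # [[19 22] [43 50]]

# NumPy enforces the same b == c rule
try:
    np.ones((2, 3)) @ np.ones((2, 3))  # inner dims are 3 and 2
except ValueError as err:
    print("ValueError:", err)
```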

In [87]:
# 2x2 matrix
m1= np.array(
    [
        [1,2],
        [3,4]
    ]
    )

In [88]:
# 2x2 matrix
m2= np.array(
    [
        [5,6],
        [7,8]
    ]
    )

In [89]:
# 2x2 :: 2x2
mat_mult = np.matmul(m1,m2)

In [90]:
mat_mult

array([[19, 22],
       [43, 50]])

In [91]:
# 3x2 matrix
m3= np.array(
    [
        [5,6],
        [7,8],
        [1,2]
    ]
    )

In [92]:
# 2x3 matrix
m4 = np.array(
    [
       [1,2,3],
       [4,5,6] 
    ]
)

In [93]:
# 3x2 :: 2x3
mat_mult2 = m3 @ m4

In [94]:
mat_mult2

array([[29, 40, 51],
       [39, 54, 69],
       [ 9, 12, 15]])

In [95]:
# The resulting matrix has shape 3x3
mat_mult2.shape

(3, 3)

#### Working with CSV data files

In [99]:
from urllib import request

request.urlretrieve('https://gist.github.com/BirajCoder/a4ffcb76fd6fb221d76ac2ee2b8584e9/raw/4054f90adfd361b7aa4255e99c2e874664094cea/climate.csv', 
    'climate.txt')

('climate.txt', <http.client.HTTPMessage at 0x18be8873f70>)

##### np.genfromtxt
This is used to create an np.ndarray from a text file.
Options used below:
* file path - in our case just the file name, since the file was saved in the current working directory
* delimiter - the character that separates the values (here a comma)
* skip_header - the number of lines to skip at the start of the file; here we skip just 1 row, the header
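To try these options without downloading anything, genfromtxt also accepts a file-like object. A minimal sketch with a small hypothetical CSV held in memory (io.StringIO stands in for the real file; the rows mirror the first rows shown below):

```python
import io
import numpy as np

# Hypothetical three-row CSV with one header line
csv_text = """temperature,rainfall,humidity
25,76,99
39,65,70
59,45,77
"""

data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data.shape)  # (3, 3)
print(data.dtype)  # float64, genfromtxt's default dtype
```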

In [100]:

climate_data = np.genfromtxt('climate.txt', delimiter=',', skip_header=1)

In [101]:
climate_data

array([[25., 76., 99.],
       [39., 65., 70.],
       [59., 45., 77.],
       ...,
       [99., 62., 58.],
       [70., 71., 91.],
       [92., 39., 76.]])

In [102]:
climate_data.shape

(10000, 3)

In [None]:
weights = np.array([0.3, 0.2, 0.5])

In [103]:
# will give yields for all the regions in the csv dataset
yields = climate_data @ weights

In [104]:
yields

array([72.2, 59.7, 65.2, ..., 71.1, 80.7, 73.4])

In [107]:
# shape is (10000,), that is, we have yields for ten thousand regions
yields.shape

(10000,)

Let's add the yields to climate_data as a fourth column using the np.concatenate function.

In [108]:
climate_data.shape

(10000, 3)

(Above) climate_data.shape is (10000, 3): 10000 is the number of rows and 3 is the number of columns.

(Below) axis=1 means we are concatenating along the columns. climate_data has 2 axes: axis 0 runs down the rows and axis 1 runs across the columns.

We are adding yields as a new column.

In [109]:
climate_results= np.concatenate((climate_data,yields),axis=1)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

We got an expected error: yields.shape is (10000,), a 1-D array with no second axis, while climate_data is 2-D. np.concatenate needs all inputs to have the same number of dimensions.

To fix this, we reshape yields into a 2-D column of shape (10000, 1).

In [111]:
climate_results= np.concatenate((climate_data,yields.reshape(10000,1)),axis=1)

In [114]:
# Fourth column added: yields
climate_results

array([[25. , 76. , 99. , 72.2],
       [39. , 65. , 70. , 59.7],
       [59. , 45. , 77. , 65.2],
       ...,
       [99. , 62. , 58. , 71.1],
       [70. , 71. , 91. , 80.7],
       [92. , 39. , 76. , 73.4]])
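The hard-coded 10000 in reshape is not the only option. A minimal sketch on a small stand-in for climate_data (three rows taken from the output above) showing equivalent ways to append a 1-D array as a column:

```python
import numpy as np

# Small stand-in for climate_data (3 regions x 3 features)
data = np.array([[25., 76., 99.],
                 [39., 65., 70.],
                 [59., 45., 77.]])
weights = np.array([0.3, 0.2, 0.5])
ylds = data @ weights  # shape (3,)

# Equivalent ways to append ylds as a fourth column
a = np.concatenate((data, ylds.reshape(-1, 1)), axis=1)  # -1 lets NumPy infer the row count
b = np.concatenate((data, ylds[:, np.newaxis]), axis=1)  # insert a length-1 axis
c = np.column_stack((data, ylds))                         # treats 1-D inputs as columns

print(a.shape)  # (3, 4)
```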

In [115]:
# what if we change the axis...

climate_results_row= np.concatenate((climate_data,yields.reshape(10000,1)),axis=0)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 3 and the array at index 1 has size 1

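Expected error:

With axis=0, np.concatenate stacks the arrays vertically (it adds rows), so every other dimension, here the number of columns (axis 1), must match. climate_data has 3 columns while the reshaped yields has only 1, hence the error.

With axis=1 it worked because the dimension that had to match was axis 0, and on axis 0 both climate_data and yields have exactly the same number of elements (10000).

So in short: when concatenating arrays, their sizes must be identical along every axis except the one we are concatenating on.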
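A minimal sketch of the rule with small made-up arrays: stacking rows with axis=0 works as soon as the column counts agree, and fails when they don't.

```python
import numpy as np

top = np.ones((2, 3))
bottom = np.zeros((1, 3))  # different number of rows, same number of columns (3)

stacked = np.concatenate((top, bottom), axis=0)
print(stacked.shape)  # (3, 3)

try:
    np.concatenate((top, np.zeros((1, 2))), axis=0)  # 3 columns vs 2 columns
except ValueError as err:
    print("ValueError:", err)
```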

Let's write the final results from our computation above back to a new file using the np.savetxt function.

#### np.savetxt
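* fmt="%.2f" - the format of each value; "%.2f" means a float with 2 decimal places.
* delimiter="," - the separator written between values.
* header - a line of text written at the top of the file (here, the column names).
* comments - the string prepended to the header line (default "# "); pass comments="" for a plain CSV header.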

In [119]:
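# comments="" keeps the header free of a prefix; comments="None" would literally prepend the text "None" to it
np.savetxt('climate_result.txt', climate_results, fmt="%.2f", delimiter=",",
           header='temperature,rainfall,humidity,yield_apples', comments="")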
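To see exactly what savetxt writes, we can write into an in-memory buffer instead of a file and read it back with genfromtxt. A minimal sketch on two rows of the results (io.StringIO stands in for the file):

```python
import io
import numpy as np

results = np.array([[25., 76., 99., 72.2],
                    [39., 65., 70., 59.7]])

buf = io.StringIO()
np.savetxt(buf, results, fmt="%.2f", delimiter=",",
           header="temperature,rainfall,humidity,yield_apples", comments="")
text = buf.getvalue()
print(text)
# temperature,rainfall,humidity,yield_apples
# 25.00,76.00,99.00,72.20
# 39.00,65.00,70.00,59.70

# Round trip: skip the header line and parse the numbers again
back = np.genfromtxt(io.StringIO(text), delimiter=",", skip_header=1)
```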

##### Array Comparison
* Comparison operators work element-wise: each pair of elements is compared, and an ndarray of boolean values is returned.

In [120]:
arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])

In [124]:
# returns an ndarray
arr1 == arr2

array([[False,  True,  True],
       [False, False,  True]])

In [122]:
type(arr1==arr2)

numpy.ndarray

In [125]:
arr_bool = arr1 == arr2

In [126]:
arr_bool

array([[False,  True,  True],
       [False, False,  True]])

In [128]:
# Result will be 3, as we have 3 True values
arr_bool.sum()

3
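Booleans count as 1 (True) and 0 (False), which is why sum() works. The same boolean array can also be used as a mask to pull out matching elements. A minimal sketch with arr1 and arr2 from above:

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [3, 4, 5]])
arr2 = np.array([[2, 2, 3], [1, 2, 5]])
arr_bool = arr1 == arr2

print(np.count_nonzero(arr_bool))      # 3, same as arr_bool.sum()
print(arr1[arr_bool])                  # [2 3 5], elements where the arrays agree
print(arr_bool.any(), arr_bool.all())  # True False
print(np.array_equal(arr1, arr2))      # False, a single answer to "identical?"
```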

In [129]:
arr1>arr2

array([[False, False, False],
       [ True,  True, False]])

In [130]:
arr1>=arr2

array([[False,  True,  True],
       [ True,  True,  True]])

In [132]:
# What happens if the arrays' shapes are not the same? Let's see
arr3 = np.array([[1, 2, 3], [3, 4, 5]])
arr4 = np.array([[2, 2, 3]])

In [135]:
arr3.shape, arr4.shape

((2, 3), (1, 3))

In [134]:
arr3 == arr4

array([[False,  True,  True],
       [False, False, False]])

Nice! Each row of arr3 is compared with the single row of arr4: NumPy broadcasts arr4, with shape (1, 3), along axis 0 to match arr3's shape (2, 3).

Let's change the shape a bit more.

In [136]:
arr5 = np.array([[1, 2, 3], [3, 4, 5]])
arr6 = np.array([[2, 2]])

In [137]:
arr5 == arr6

False

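So the comparison fails: shapes (2, 3) and (1, 2) cannot be broadcast together, because their last dimensions (3 and 2) differ and neither is 1. The NumPy version used here returned a single False along with a warning message; recent NumPy versions raise a ValueError instead.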

Array broadcasting is covered in the Jovian notebook; it's simple, but still worth revising.
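As a quick revision aid, the core broadcasting rule fits in a few lines: line the shapes up from the right, and each pair of dimensions must be equal or one of them must be 1 (missing leading dimensions count as 1). can_broadcast is a hypothetical helper for illustration; np.broadcast_shapes is NumPy's own check.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    """Compare dimensions from the right; each pair must be equal or contain a 1."""
    for da, db in zip(reversed(shape_a), reversed(shape_b)):
        if da != db and da != 1 and db != 1:
            return False
    return True

print(can_broadcast((2, 3), (1, 3)))        # True: why arr3 == arr4 worked
print(can_broadcast((2, 3), (1, 2)))        # False: why arr5 == arr6 failed
print(np.broadcast_shapes((2, 3), (1, 3)))  # (2, 3), the shape of the result
```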

#### Array slicing and indexing

In [4]:
import numpy as np

In [5]:
arr1= np.array(
    [
        [1,2,3],
        [4,5,6],
        [7,8,9]
    ]
)

In [7]:
arr1, arr1.shape

(array([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]),
 (3, 3))

In [8]:
# Let's try to get '6' from this array
arr1[1,2]

6

In [10]:
# Get just the first two elements of the last row
arr1[2,:2]

array([7, 8])

In [11]:
arr3 = np.array([
    [[11, 12, 13, 14], 
     [13, 14, 15, 19]], 
    
    [[15, 16, 17, 21], 
     [63, 92, 36, 18]], 
    
    [[98, 32, 81, 23],      
     [17, 18, 19.5, 43]]])

In [12]:
arr3, arr3.shape

(array([[[11. , 12. , 13. , 14. ],
         [13. , 14. , 15. , 19. ]],
 
        [[15. , 16. , 17. , 21. ],
         [63. , 92. , 36. , 18. ]],
 
        [[98. , 32. , 81. , 23. ],
         [17. , 18. , 19.5, 43. ]]]),
 (3, 2, 4))

In [14]:
# get 15 & 16 from second array and 98 & 32 from third array

arr3[1:,0,:2]

array([[15., 16.],
       [98., 32.]])
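A few more indexing patterns on the 3x3 arr1 from above, plus one behaviour worth knowing: basic slices are views of the original data, so writing through a slice changes the original array, while .copy() gives an independent array. A minimal sketch:

```python
import numpy as np

arr1 = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

print(arr1[-1, :2])    # [7 8], negative indices count from the end
print(arr1[:, 1])      # [2 5 8], every row, column 1
print(arr1[::2, ::2])  # rows 0 and 2, columns 0 and 2: [[1 3] [7 9]]

# Basic slices are views: writing through them changes arr1
view = arr1[0, :]
view[0] = 100
print(arr1[0, 0])      # 100

# .copy() gives an independent array
safe = arr1[1, :].copy()
safe[0] = -1
print(arr1[1, 0])      # 4, unchanged
```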

##### Other ways of creating Numpy arrays

In [15]:
# np.zeros - creates an array of zeros; we just need to pass the shape as a tuple
np.zeros((3,))

array([0., 0., 0.])

In [16]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [17]:
np.zeros((3,2,2))

array([[[0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.]]])

In [18]:
# np.ones
np.ones((2,2,3))

array([[[1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.]]])

In [24]:
# np.eye - this is to create an identity matrix
# Note: np.eye takes an integer N (the number of rows) rather than a shape tuple, since an identity matrix is square by default (an optional M sets the number of columns)
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [27]:
# np.full
np.full(shape=(3,3), fill_value=22)

array([[22, 22, 22],
       [22, 22, 22],
       [22, 22, 22]])

In [29]:
# np.random.rand
np.random.rand(5)

array([0.98338589, 0.21163386, 0.69693179, 0.20718814, 0.20284229])
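The numbers above change on every run. When reproducible random numbers are needed, NumPy's Generator API can be seeded. A minimal sketch; the seed 42 is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed
sample = rng.random(5)                # 5 floats in [0, 1), like np.random.rand(5)

# A new generator with the same seed reproduces the same numbers
again = np.random.default_rng(seed=42).random(5)
print(np.array_equal(sample, again))  # True
```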

In [30]:
# np.arange
# Range with start, end (exclusive) and step
np.arange(10,80,5)

array([10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75])

In [31]:
# np.linspace
# Equally spaced numbers in a range: start, end (inclusive) and how many numbers you need
np.linspace(2,19,6)

array([ 2. ,  5.4,  8.8, 12.2, 15.6, 19. ])
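The key difference between the two: arange excludes the stop value and takes a step, while linspace includes the stop value and takes a count, so its spacing is (stop - start) / (num - 1). A minimal sketch with the same arguments as above:

```python
import numpy as np

a = np.arange(10, 80, 5)         # stop (80) is excluded
l = np.linspace(2, 19, 6)        # stop (19) is included, 6 points in total
step = (19 - 2) / (6 - 1)        # linspace spacing: 17 / 5

print(a[-1], l[-1], step)        # 75 19.0 3.4
print(np.allclose(np.diff(l), step))  # True: every gap equals the step
```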

Creating a 3x3 matrix using arange and reshape (any function that produces 9 elements would work):

In [35]:
np.arange(0,9).reshape(3,3)

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

#### np.flipud()
Flips an array along axis 0 (up/down). For a 1-D vector like the one below, this simply reverses it.

In [36]:
# Creating a vector
vec = np.arange(0,9)
vec

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [37]:
# using flipud()
rev_vec=np.flipud(vec)
rev_vec

array([8, 7, 6, 5, 4, 3, 2, 1, 0])
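On a 2-D array the "ud" (up/down) matters: flipud reverses the order of the rows, fliplr the order of the columns, and flipud is equivalent to slicing with [::-1]. A minimal sketch on the 3x3 matrix from arange:

```python
import numpy as np

m = np.arange(0, 9).reshape(3, 3)

print(np.flipud(m))  # rows reversed:    [[6 7 8] [3 4 5] [0 1 2]]
print(np.fliplr(m))  # columns reversed: [[2 1 0] [5 4 3] [8 7 6]]
print(np.array_equal(np.flipud(m), m[::-1]))  # True
```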

#### numpy.nonzero()
numpy.nonzero() computes the indices of the elements that are non-zero.

* It returns a tuple of arrays, one for each dimension of arr, containing the indices of the non-zero elements in that dimension.
* The corresponding non-zero values in the array can be obtained with arr[nonzero(arr)] . 
* To group the indices by element, rather than dimension we can use transpose(nonzero(arr)).

In [41]:
vec2=np.array(
    [
        [1,3,0],
        [0,5,7],
        [6,0,8]
    ]
    )

vec2

array([[1, 3, 0],
       [0, 5, 7],
       [6, 0, 8]])

In [42]:
np.nonzero(vec2)

(array([0, 0, 1, 1, 2, 2], dtype=int64),
 array([0, 1, 1, 2, 0, 2], dtype=int64))

(Above) Output explanation:
* vec2 is a 3x3 matrix, so each element has a (row, column) index: (0,0), (0,1), (0,2) ... and so on up to (2,2).
* The output gives us two arrays: the first holds the row indices and the second the column indices. Pair the values position by position...
* ...so (0,0), (0,1), (1,1), (1,2), (2,0) and (2,2) are the non-zero positions, holding 1, 3, 5, 7, 6 and 8.
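The two tricks from the nonzero bullets, applied to vec2: index with the tuple to get the non-zero values, and transpose it to get one (row, column) pair per element. np.argwhere returns those pairs in a single call. A minimal sketch:

```python
import numpy as np

vec2 = np.array([[1, 3, 0],
                 [0, 5, 7],
                 [6, 0, 8]])

rows, cols = np.nonzero(vec2)
print(vec2[rows, cols])                # [1 3 5 7 6 8], the non-zero values
print(np.transpose(np.nonzero(vec2)))  # one (row, col) pair per line
print(np.argwhere(vec2))               # same pairs in one call
```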

In [48]:
vec3 = np.array(
    [
        [
            [0,1],
            [2,3]
        ],
        [
            [4,0],
            [0,6]
        ],
    ]
)

In [49]:
vec3, vec3.shape

(array([[[0, 1],
         [2, 3]],
 
        [[4, 0],
         [0, 6]]]),
 (2, 2, 2))

In [50]:
np.nonzero(vec3)

(array([0, 0, 0, 1, 1], dtype=int64),
 array([0, 1, 1, 0, 1], dtype=int64),
 array([1, 0, 1, 0, 1], dtype=int64))

#### Mean, max and min
* np.mean(arr)
* np.max(arr)
* np.min(arr)

In [52]:
np.mean(vec3)

2.0

In [53]:
np.max(vec3)

6

In [54]:
np.min(vec3)

0
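Without an axis argument these functions reduce over the whole array. Passing axis reduces along one axis only, e.g. per-column or per-row statistics (for climate_data, axis=0 would give the average temperature, rainfall and humidity). A minimal sketch on vec2 from above:

```python
import numpy as np

vec2 = np.array([[1, 3, 0],
                 [0, 5, 7],
                 [6, 0, 8]])

print(np.max(vec2, axis=0))   # column maxima: [6 5 8]
print(np.max(vec2, axis=1))   # row maxima:    [3 7 8]
print(np.mean(vec2, axis=0))  # column means
print(vec2.argmax())          # 8, flat index of the overall maximum
```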