## Simple NumPy tutorial using the Iris Dataset

First, import the required libraries.

In [1]:
import numpy as np
import pandas as pd

Let's load the dataset using good old pandas and sklearn.

I am using pandas to load the dataset, but NumPy can also be used. There are options such as <b>numpy.loadtxt</b> or <b>numpy.genfromtxt</b>.
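As a rough sketch of the NumPy-only route, the snippet below reads a tiny CSV with <b>numpy.genfromtxt</b> and <b>numpy.loadtxt</b>. The in-memory text and its column names are made up for illustration (a real file path would be passed the same way); they are not a file shipped with this notebook.

```python
import io
import numpy as np

# Hypothetical in-memory CSV with three Iris-like rows; a file path works the same way.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,target
5.1,3.5,1.4,0.2,0
4.9,3.0,1.4,0.2,0
4.7,3.2,1.3,0.2,0
"""

# skip_header=1 drops the column-name row; values are parsed as float64 by default.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# np.loadtxt does the same job for clean files; its keyword is skiprows.
data_again = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

print(data.shape)                        # (3, 5)
print(np.array_equal(data, data_again))  # True
```

Unlike pandas, both functions return a plain 2D array, so the column names are lost and the integer target column becomes float.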

In [2]:
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['target'] = iris.target
df.head(5)

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [3]:
df['target'].value_counts()

0    50
1    50
2    50
Name: target, dtype: int64

Let's create some 1D NumPy arrays first, then we will move on to the data.

In [4]:
a1=np.array([1,2,3,4])
a1

array([1, 2, 3, 4])

In [5]:
a2=np.arange(1,5)
a2

array([1, 2, 3, 4])

In [6]:
a3=np.arange(0,10,2)
a3

array([0, 2, 4, 6, 8])

In [7]:
a4=np.linspace(0,10,5,dtype=int) 
a4

array([ 0,  2,  5,  7, 10])

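There is another function called <b>np.logspace</b>; you can check that out too.

The difference between the two is that <b>np.linspace</b> returns numbers spaced evenly on a linear scale, while <b>np.logspace</b> returns numbers spaced evenly on a logarithmic scale. For <b>np.logspace</b>, the start and stop arguments are exponents of the base (10 by default), not the end values themselves.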
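Here is a small sketch of the two side by side. The endpoints 0 and 3 are picked only to keep the numbers readable.

```python
import numpy as np

linear = np.linspace(0, 3, 4)   # the values 0, 1, 2, 3
log = np.logspace(0, 3, 4)      # 10**0, 10**1, 10**2, 10**3

print(linear)
print(log)

# logspace is the same as raising the base to the linspace values.
print(np.allclose(log, 10.0 ** linear))   # True
```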

In [8]:
a4=np.linspace(0,10,5,dtype=float)
a4

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

These are some of the basic ways to create NumPy arrays. Please use them, and if you find other easy ways, do let me know!

In [9]:
a5=np.zeros(3)
a5

array([0., 0., 0.])

Let's create some 2D arrays in NumPy.

In [10]:
arr1=np.array([[1,2,3,4],[5,6,7,8]])
arr1

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [11]:
arr2=np.arange(2,20,2).reshape(3,3)
arr2

array([[ 2,  4,  6],
       [ 8, 10, 12],
       [14, 16, 18]])

In [12]:
arr3=np.linspace(0,100,10,dtype=int).reshape(2,5)
arr3

array([[  0,  11,  22,  33,  44],
       [ 55,  66,  77,  88, 100]])

In [13]:
arr4=np.zeros((2,3))
arr4

array([[0., 0., 0.],
       [0., 0., 0.]])

Similarly, we can create higher-dimensional NumPy arrays.

In [14]:
arr3d=np.arange(0,27).reshape(3,3,3)
arr3d

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [15]:
arr3dzeros=np.zeros((3,3,3))
arr3dzeros

array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]]])

Now, let's check some of the attributes of a NumPy array.

In [16]:
arr3.ndim # dimension of the array

2

In [17]:
arr3.shape # shape of the array 

(2, 5)

In [18]:
arr3d.shape

(3, 3, 3)

In [19]:
arr3.dtype

dtype('int32')

In [20]:
type(arr3)

numpy.ndarray

These are some common attributes.

We can also fill a NumPy array with random elements.

In [21]:
array=np.random.rand(3,2)
array

array([[0.35919725, 0.33408911],
       [0.14425616, 0.39319744],
       [0.83573219, 0.37018067]])

In [22]:
array2=np.random.randint(0,10,size=(3,3))
array2

array([[1, 1, 6],
       [3, 6, 3],
       [6, 1, 1]])

There are a lot of functions in numpy.random like <b>numpy.random.normal, numpy.random.lognormal, numpy.random.poisson</b> etc. <br>
<b>numpy.random.randint</b> generates integers from a "discrete uniform" distribution; the lower bound is included and the upper bound is excluded.
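Random cells give different numbers on every run. If you want repeatable results, one option is NumPy's newer Generator interface with a seed. This is a sketch and not what the cells above use; the seed 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)      # 42 is an arbitrary seed for this sketch

floats = rng.random((3, 2))               # uniform floats in [0, 1)
ints = rng.integers(0, 10, size=(3, 3))   # integers from 0 to 9, upper bound excluded

# A second generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(seed=42)
same_floats = rng_again.random((3, 2))
print(np.array_equal(floats, same_floats))   # True
```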

Now, let's apply these to our original dataset.

In [23]:
sepal_length=np.array(df['sepal length (cm)'])
sepal_length[:5]

array([5.1, 4.9, 4.7, 4.6, 5. ])

In [24]:
sepal_lengthTarget=np.array([df['target'],df['sepal length (cm)']])
sepal_lengthTarget=sepal_lengthTarget.T      # transpose
sepal_lengthTarget[:5]

array([[0. , 5.1],
       [0. , 4.9],
       [0. , 4.7],
       [0. , 4.6],
       [0. , 5. ]])

We transpose because building the array from two columns gives shape (2, 150), with each column stored as one row. After transposing we get shape (150, 2), one row per flower, the way it is in the dataset.

In [25]:
sepal_lengthTarget.shape

(150, 2)

In [26]:
sepal_lengthTarget.dtype

dtype('float64')

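The datatype is float64. A NumPy array holds a single dtype, so the integer targets are upcast to float to match the sepal lengths.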

In [27]:
np_df=df.values
np_df[:5]

array([[5.1, 3.5, 1.4, 0.2, 0. ],
       [4.9, 3. , 1.4, 0.2, 0. ],
       [4.7, 3.2, 1.3, 0.2, 0. ],
       [4.6, 3.1, 1.5, 0.2, 0. ],
       [5. , 3.6, 1.4, 0.2, 0. ]])

This can be used to convert the entire pandas DataFrame into a NumPy array. Each row of the DataFrame becomes one row of the 2D array.

In [28]:
np_df.ndim

2

Let's try accessing elements from our NumPy arrays.

In [29]:
sepal_length[3]

4.6

In [30]:
sepal_lengthTarget[1,0] 

0.0

Did you see the difference? <b>sepal_length[3]</b> works just like indexing a Python list, but <b>sepal_lengthTarget[1,0]</b> shows that NumPy arrays are a bit easier to index. If <b>sepal_lengthTarget</b> were a Python list of lists, we would have to write <b>sepal_lengthTarget[1][0]</b>, which gets clumsy for n-dimensional data, so <b>array[i,j,k...]</b> makes life easier.
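A tiny side-by-side sketch of the two indexing styles, using a hand-typed list that mimics the first rows of the array (the variable names are only for illustration):

```python
import numpy as np

nested = [[0.0, 5.1], [0.0, 4.9], [0.0, 4.7]]   # plain Python list of lists
arr = np.array(nested)

from_list = nested[1][1]    # lists need one pair of brackets per level
from_array = arr[1, 1]      # arrays take every index inside one pair of brackets
chained = arr[1][1]         # chained indexing still works on arrays

cube = np.arange(27).reshape(3, 3, 3)
element = cube[2, 0, 1]     # block 2, row 0, column 1 -> 19

print(from_list, from_array, chained, element)
```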

<b>Slicing</b>

Slicing extracts portions of an array based on index ranges. Basic slicing returns a view of the original data, not a copy.
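Because a slice is a view, writing into it changes the original array. The sketch below shows this on a small hand-typed array and uses <b>.copy()</b> when an independent array is wanted.

```python
import numpy as np

original = np.array([5.1, 4.9, 4.7, 4.6, 5.0])

view = original[1:4]          # basic slice: shares memory with original
view[0] = -1.0                # this write shows up in original[1]

independent = original[1:4].copy()
independent[1] = 99.0         # original is not affected by this write

print(original)                            # second element is now -1.0
print(np.shares_memory(original, view))    # True
```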

In [31]:
sepal_length[3:20]

array([4.6, 5. , 5.4, 4.6, 5. , 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7,
       5.4, 5.1, 5.7, 5.1])

In [32]:
sepal_length[-4:-1]

array([6.3, 6.5, 6.2])

In [33]:
sepal_length[-1:-3]

array([], dtype=float64)

Slicing works from left to right by default (the step is +1), so the start must come before the stop; otherwise we get an empty array in return.
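To walk from right to left, pass a negative step as the third slice value. A short sketch on illustrative values:

```python
import numpy as np

tail = np.array([6.7, 6.3, 6.5, 6.2, 5.9])   # illustrative values

empty = tail[-1:-3]          # default step +1 cannot go from -1 to -3
backwards = tail[-1:-3:-1]   # step -1: last element, then the one before it
reversed_all = tail[::-1]    # the whole array in reverse order

print(empty.size)    # 0
print(backwards)     # 5.9 then 6.2
```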

In [34]:
sepal_lengthTarget[0:3,:1]

array([[0.],
       [0.],
       [0.]])

In [35]:
sepal_lengthTarget[-3:-1,0:]

array([[2. , 6.5],
       [2. , 6.2]])

Slicing works as in plain Python, apart from NumPy's comma-separated syntax for indexing several dimensions at once.

<b>Arithmetic operations</b>

In [36]:
sepal_width=np.array(df['sepal width (cm)'])
sepal_width[:5]

array([3.5, 3. , 3.2, 3.1, 3.6])

In [37]:
# Let's add sepal length and sepal width
sepal_width_sepal_length=sepal_length + sepal_width
sepal_width_sepal_length[:5]

array([8.6, 7.9, 7.9, 7.7, 8.6])

So NumPy does element-wise addition, whereas <b>+</b> on plain Python lists concatenates them. Isn't it amazing!<br>
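The contrast with plain Python lists, as a minimal sketch on two hand-typed pairs of values:

```python
import numpy as np

lengths = [5.1, 4.9]
widths = [3.5, 3.0]

concatenated = lengths + widths                      # lists: joined end to end
elementwise = np.array(lengths) + np.array(widths)   # arrays: added pair by pair

print(concatenated)   # [5.1, 4.9, 3.5, 3.0]
print(elementwise)    # roughly 8.6 and 7.9
```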



In [38]:
sepal_area=sepal_length * sepal_width
sepal_area[:5]

array([17.85, 14.7 , 15.04, 14.26, 18.  ])

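Multiplication is element-wise too. Subtraction, division and the other arithmetic operators are also element-wise operations, so do check them out.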

Let's see if we can do matrix multiplication.

In [39]:
sepal_matrix=np.array([df['sepal length (cm)'],df['sepal width (cm)']])
sepal_matrix=sepal_matrix.T      # transpose
sepal_matrix.shape

(150, 2)

In [40]:
petal_matrix=np.array([df['petal length (cm)'],df['petal width (cm)']])
#petal_matrix=petal_matrix.T      # transpose
petal_matrix.shape

(2, 150)

In [41]:
np.matmul(sepal_matrix,petal_matrix)

array([[ 7.84,  7.84,  7.33, ..., 33.52, 35.59, 32.31],
       [ 7.46,  7.46,  6.97, ..., 31.48, 33.36, 30.39],
       [ 7.22,  7.22,  6.75, ..., 30.84, 32.74, 29.73],
       ...,
       [ 9.7 ,  9.7 ,  9.05, ..., 39.8 , 42.  , 38.55],
       [ 9.36,  9.36,  8.74, ..., 39.04, 41.3 , 37.74],
       [ 8.86,  8.86,  8.27, ..., 36.68, 38.76, 35.49]])

In [42]:
sepal_matrix @ petal_matrix

array([[ 7.84,  7.84,  7.33, ..., 33.52, 35.59, 32.31],
       [ 7.46,  7.46,  6.97, ..., 31.48, 33.36, 30.39],
       [ 7.22,  7.22,  6.75, ..., 30.84, 32.74, 29.73],
       ...,
       [ 9.7 ,  9.7 ,  9.05, ..., 39.8 , 42.  , 38.55],
       [ 9.36,  9.36,  8.74, ..., 39.04, 41.3 , 37.74],
       [ 8.86,  8.86,  8.27, ..., 36.68, 38.76, 35.49]])

In [43]:
np.dot(sepal_matrix,petal_matrix)

array([[ 7.84,  7.84,  7.33, ..., 33.52, 35.59, 32.31],
       [ 7.46,  7.46,  6.97, ..., 31.48, 33.36, 30.39],
       [ 7.22,  7.22,  6.75, ..., 30.84, 32.74, 29.73],
       ...,
       [ 9.7 ,  9.7 ,  9.05, ..., 39.8 , 42.  , 38.55],
       [ 9.36,  9.36,  8.74, ..., 39.04, 41.3 , 37.74],
       [ 8.86,  8.86,  8.27, ..., 36.68, 38.76, 35.49]])

In [44]:
np.dot(sepal_length,sepal_width)

2673.4300000000003

We can use any of these ways to do matrix multiplication in NumPy: <b>np.matmul</b>, <b>np.dot</b> or the <b>@ operator</b>. <b>np.dot</b> is very versatile: for two 1D arrays it gives the dot (inner) product, with a scalar it does plain multiplication, and for 2D arrays (matrices) it gives the matrix product.
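The sketch below puts the three spellings next to each other on tiny hand-made arrays, and also shows how they differ from element-wise <b>*</b> and from each other when a scalar is involved.

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
inner = np.dot(u, v)             # 1*4 + 2*5 + 3*6 = 32.0

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
product = A @ B                  # matrix product: [[19, 22], [43, 50]]
elementwise = A * B              # element-wise:   [[5, 12], [21, 32]]
same = np.array_equal(product, np.dot(A, B)) and np.array_equal(product, np.matmul(A, B))

scaled = np.dot(A, 2)            # np.dot accepts a scalar and just scales A
try:
    np.matmul(A, 2)              # np.matmul does not accept scalars
    matmul_accepts_scalar = True
except ValueError:
    matmul_accepts_scalar = False

print(inner, same, matmul_accepts_scalar)
```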

<b>Comparisons, Logical Operations</b>

In [108]:
petal_length=np.array(df['petal length (cm)'])
petal_width=np.array(df['petal width (cm)'])

In [46]:
temp_length=np.array(sepal_length==petal_length)
temp_length[:5]

array([False, False, False, False, False])

In [47]:
temp_width=np.array(sepal_width==petal_width)
temp_width[:5]

array([False, False, False, False, False])

In [48]:
np.array_equal(temp_length,temp_width)

True

If comparison operators like <b>'==', '>', '<'</b>, etc. are used with NumPy arrays, they do element-wise comparisons and return a boolean array. To check whether two whole arrays are equal, we can use the <b>np.array_equal</b> function.

One very important use of these boolean arrays is boolean masking: indexing an array with a mask returns a new array containing only the elements where the mask is True.

In [49]:
petal_length[petal_length>6]

array([6.6, 6.3, 6.1, 6.7, 6.9, 6.7, 6.1, 6.4, 6.1])

In [50]:
sepal_width[sepal_width>4]

array([4.4, 4.1, 4.2])

In [51]:
sepal_width[(sepal_width>2) & (sepal_width<2.5)]

array([2.3, 2.3, 2.4, 2.2, 2.2, 2.4, 2.4, 2.3, 2.3, 2.2])
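Masks can be combined with <b>&</b> (and), <b>|</b> (or) and <b>~</b> (not). Each comparison needs its own parentheses, and the Python keywords <b>and</b>/<b>or</b> do not work element-wise on arrays. A small sketch on hand-typed widths:

```python
import numpy as np

widths = np.array([3.5, 3.0, 2.3, 4.4, 2.2, 2.9])

narrow = (widths > 2) & (widths < 2.5)     # both conditions hold
extreme = (widths < 2.5) | (widths > 4)    # either condition holds
not_narrow = ~narrow                       # inverted mask

print(widths[narrow])               # the values 2.3 and 2.2
print(np.count_nonzero(extreme))    # 3
print(widths[not_narrow].size)      # 4
```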

<b>Universal Functions</b>

We have already used some built-in NumPy attributes. Let's now turn to a few common NumPy functions that are used a lot.

In [52]:
np.sum(sepal_length)

876.5

In [53]:
np.sum(sepal_matrix)

1335.1000000000001

In [54]:
np.sum(sepal_matrix,axis=0)

array([876.5, 458.6])

In [55]:
np.sum(sepal_matrix,axis=1)[:5]

array([8.6, 7.9, 7.9, 7.7, 8.6])

In [56]:
sepal_matrix[:3,:3]

array([[5.1, 3.5],
       [4.9, 3. ],
       [4.7, 3.2]])

So, we use the <b>np.sum</b> function to add up the elements of a NumPy array. For a 1D array it is pretty simple: pass the array as a parameter and it returns the sum of all the elements. For 2D or higher-dimensional arrays we can also specify an axis, which gives us more flexibility; without it we still get the sum of all elements. The axis parameter makes sure the sum runs along one axis only. Now the question arises: what are the axis values, given that NumPy handles n-dimensional data?


![numpy_axes_explanation.png](attachment:numpy_axes_explanation.png)

<p>
  In the context of NumPy, the concept of "axis" is pivotal and determines the direction along which operations are performed on arrays. For a 2D matrix, like our <code>sepal_matrix</code>, it has two axes: axis 0 and axis 1.
</p>
<ul>
  <li>When we specify <code>axis=0</code>, the operation runs down the rows and collapses the row axis. Each column is reduced independently, so a sum yields one total per column.</li>
  <li>Conversely, when we use <code>axis=1</code>, the operation runs across the columns and collapses the column axis. Each row is reduced independently, yielding one total per row.</li>
</ul>
<p>
  So, in essence, an axis is identified by its position in the array's shape: numbering starts at 0 and increases by one for each subsequent axis, and the axis you pass is the one that gets collapsed. This distinction becomes particularly crucial when dealing with multi-dimensional arrays, guiding us on how to aggregate, reshape, or broadcast operations effectively.
</p>
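One way to remember the axis rule is to look at shapes: the axis you pass disappears from the result. A minimal sketch on small made-up arrays:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])            # shape (2, 3)

col_totals = m.sum(axis=0)           # collapses the 2 rows    -> [5, 7, 9]
row_totals = m.sum(axis=1)           # collapses the 3 columns -> [6, 15]

cube = np.arange(24).reshape(2, 3, 4)
print(cube.sum(axis=0).shape)        # (3, 4)
print(cube.sum(axis=1).shape)        # (2, 4)
print(cube.sum(axis=2).shape)        # (2, 3)

# keepdims=True keeps the collapsed axis with length 1.
kept = m.sum(axis=1, keepdims=True)  # shape (2, 1)
```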



In [57]:
np.mean(sepal_matrix,axis=0)

array([5.84333333, 3.05733333])

In [58]:
np.sum(sepal_matrix,axis=1)[:5]

array([8.6, 7.9, 7.9, 7.7, 8.6])

In [59]:
np.min(sepal_matrix)

2.0

In the same way, you can pass the axis parameter to get row-wise or column-wise results.

Let's see the arg functions.

In [60]:
np.argmin(sepal_length)

13

In [61]:
sepal_length[13]

4.3

In [62]:
np.argmin(sepal_matrix)

121

In [63]:
overall_min_index = np.unravel_index(np.argmin(sepal_matrix), sepal_matrix.shape)
overall_min_index

(60, 1)

In [64]:
sepal_matrix[60,1]

2.0


  <p>The <code>np.argmin</code> function is used to find the index of the minimum value in an array. In the case of a 1D array like <code>sepal_length</code>, it returns the index of the first occurrence of the minimum value.</p>

  <p>For a 2D array such as <code>sepal_matrix</code>, <code>np.argmin</code> returns a single flattened index corresponding to the minimum value in the entire array.</p>

  <p>However, to interpret this flattened index correctly in the context of the original 2D shape, we use <code>np.unravel_index</code>. This function takes the flattened index and the shape of the array and returns a tuple of indices. This tuple represents the position of the minimum value in the original shape of the 2D array.</p>

  <p>In summary, <code>np.unravel_index</code> is crucial for converting the flattened index obtained from <code>np.argmin</code> back into the corresponding indices in the original shape of the array, especially when dealing with multidimensional arrays.</p>
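The same round trip on a tiny hand-made matrix, so the arithmetic is easy to follow by eye (with 2 columns, flat index = row * 2 + column):

```python
import numpy as np

m = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [5.0, 2.0]])

flat_index = np.argmin(m)                         # 5, counting row by row
position = np.unravel_index(flat_index, m.shape)  # (2, 1) because 5 = 2*2 + 1

per_column = np.argmin(m, axis=0)                 # row of the minimum in each column

print(flat_index, position, m[position])
print(per_column)                                 # [1 2]
```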



In the same way, <code>np.argmin</code> accepts the axis parameter to give row-wise or column-wise results.

In [74]:
np.argwhere(sepal_matrix>5)[:5]

array([[ 0,  0],
       [ 5,  0],
       [10,  0],
       [14,  0],
       [15,  0]], dtype=int64)



  <p><code>np.argmin(sepal_matrix)</code> returns the flattened index of the minimum value in the 2D array <code>sepal_matrix</code>. It does not provide the row and column indices separately.</p>

  <p><code>np.argwhere(sepal_matrix > 5)</code> returns the indices of elements greater than 5 in <code>sepal_matrix</code>. It provides both row and column indices, giving a more detailed location of the elements that satisfy the condition. Each function behaves differently, so do play with them a bit more to explore these nuances.</p>
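A small sketch comparing <code>np.argwhere</code> with <code>np.nonzero</code>, which returns the same positions as separate row and column arrays that can be used directly for indexing. The matrix is hand-typed for illustration.

```python
import numpy as np

m = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [5.4, 3.9]])

coords = np.argwhere(m > 5)       # one [row, column] pair per match
rows, cols = np.nonzero(m > 5)    # the same matches, split by axis

print(coords)            # [[0 0], [2 0]]
print(m[rows, cols])     # the matching values 5.1 and 5.4
```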


<b>Sorting</b>

In [80]:
sepal_matrix.sort()
sepal_matrix[:5]

array([[3.5, 5.1],
       [3. , 4.9],
       [3.2, 4.7],
       [3.1, 4.6],
       [3.6, 5. ]])

In [76]:
np.sort(sepal_matrix)[:5]

array([[3.5, 5.1],
       [3. , 4.9],
       [3.2, 4.7],
       [3.1, 4.6],
       [3.6, 5. ]])

In [81]:
np.argsort(sepal_matrix)[:5]

array([[0, 1],
       [0, 1],
       [0, 1],
       [0, 1],
       [0, 1]], dtype=int64)



  <p><code>sepal_matrix.sort()</code> sorts the original array <code>sepal_matrix</code> in-place along the last axis, so here the two values in each row are put in ascending order. It does not return anything but directly modifies the array.</p>

  <p><code>np.sort(sepal_matrix)</code> returns a sorted copy of <code>sepal_matrix</code> along the last axis. The original array remains unchanged.</p>

  <p><code>np.argsort(sepal_matrix)</code> returns the indices that would sort the array <code>sepal_matrix</code> along the last axis. Since the in-place sort had already ordered every row, the result here is <code>[0, 1]</code> for each row.</p>
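A sketch of the three sorting tools on a fresh, unsorted toy matrix, plus one common use of <code>np.argsort</code>: reordering whole rows by a single column. The matrix values are hand-typed for illustration.

```python
import numpy as np

m = np.array([[5.1, 3.5],
              [4.9, 3.0],
              [4.7, 3.2]])

sorted_copy = np.sort(m)     # each row sorted; m itself is unchanged
order = np.argsort(m)        # [[1, 0], [1, 0], [1, 0]]

# Applying the argsort indices reproduces the sorted copy.
rebuilt = np.take_along_axis(m, order, axis=-1)

# Reorder the rows by their first column without touching the pairs.
by_first_column = m[np.argsort(m[:, 0])]

work = m.copy()
returned = work.sort()       # in-place: work changes, the return value is None

print(by_first_column)
print(returned)              # None
```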



Let's look at some logical functions.

In [67]:
np.any(sepal_length>sepal_width)

True

In [71]:
np.all(sepal_length<sepal_width)

False

In [69]:
np.where(sepal_length>5,1,0)[:5]

array([1, 0, 0, 0, 0])



  <p><code>np.any(sepal_length > sepal_width)</code> checks if there is any element in <code>sepal_length</code> greater than the corresponding element in <code>sepal_width</code>.</p>

  <p><code>np.all(sepal_length < sepal_width)</code> checks if all elements in <code>sepal_length</code> are less than the corresponding elements in <code>sepal_width</code>.</p>

  <p><code>np.where(sepal_length > 5, 1, 0)[:5]</code> creates a new array where elements greater than 5 are replaced with 1 and others with 0. The <code>[:5]</code> part displays the first 5 elements of the resulting array.</p>
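<code>np.where</code> has two modes worth knowing: with three arguments it picks values element by element, and with only a condition it returns the matching indices. A sketch on four hand-typed measurements:

```python
import numpy as np

lengths = np.array([5.1, 4.9, 4.7, 5.4])
widths = np.array([3.5, 3.0, 3.2, 3.9])

flags = np.where(lengths > 5, 1, 0)     # 1 where the condition holds, else 0
indices = np.where(lengths > 5)[0]      # positions where the condition holds

any_longer = np.any(lengths > widths)   # True if at least one pair satisfies it
all_longer = np.all(lengths > widths)   # True only if every pair satisfies it

print(flags, indices, any_longer, all_longer)
```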



<b>Array Manipulation - Reshaping, Merging and Splitting</b>

In [82]:
temp=sepal_length[:9]
temp.reshape(3,3)

array([[5.1, 4.9, 4.7],
       [4.6, 5. , 5.4],
       [4.6, 5. , 4.4]])

In [84]:
temp.flatten()

array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4])

In [85]:
temp.ravel()

array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4])

In [86]:
temp.T

array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.6, 5. , 4.4])



  <p><code>temp.reshape(3,3)</code> reshapes the 1D array <code>temp</code> into a 2D array with a shape of (3, 3).</p>

  <p><code>temp.flatten()</code> returns a flattened 1D copy of the <code>temp</code> array. It creates a new array with the same data.</p>

  <p><code>temp.ravel()</code> also returns a flattened 1D array, but it returns a view of the original data whenever possible and makes a copy only when the memory layout requires it.</p>

  <p><code>temp.T</code> returns the transpose of the <code>temp</code> array. For a 1D array, the transpose doesn't change the array.</p>
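The copy-versus-view behaviour can be checked with <code>np.shares_memory</code>. This is a sketch on a small made-up grid; it also shows one way to turn a 1D array into a column, since <code>.T</code> alone does nothing there.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

flat_copy = grid.flatten()        # always a new array
flat_view = grid.ravel()          # a view here, since grid is contiguous
from_transpose = grid.T.ravel()   # the transposed layout forces a copy

print(np.shares_memory(grid, flat_copy))        # False
print(np.shares_memory(grid, flat_view))        # True
print(np.shares_memory(grid, from_transpose))   # False

row = np.array([5.1, 4.9, 4.7])
column = row.reshape(-1, 1)       # shape (3, 1); row.T would still be (3,)
```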


In [99]:
split_arrays=np.split(sepal_matrix,indices_or_sections = [1, 3])
[arr[:2] for arr in split_arrays]

[array([[3.5, 5.1]]),
 array([[3. , 4.9],
        [3.2, 4.7]]),
 array([[3.1, 4.6],
        [3.6, 5. ]])]

In [103]:
split_arrays=np.split(sepal_matrix,3)
[arr[:2] for arr in split_arrays]

[array([[3.5, 5.1],
        [3. , 4.9]]),
 array([[3.2, 7. ],
        [3.2, 6.4]]),
 array([[3.3, 6.3],
        [2.7, 5.8]])]

In [105]:
split_arrays=np.hsplit(sepal_matrix, [1, 2])
[arr[:2] for arr in split_arrays]

[array([[3.5],
        [3. ]]),
 array([[5.1],
        [4.9]]),
 array([], shape=(2, 0), dtype=float64)]

In [106]:
split_arrays=np.vsplit(sepal_matrix, 2)
[arr[:2] for arr in split_arrays]

[array([[3.5, 5.1],
        [3. , 4.9]]),
 array([[3. , 6.6],
        [2.8, 6.8]])]



  <p><code>np.split(sepal_matrix, indices_or_sections=[1, 3])</code> splits the <code>sepal_matrix</code> array along axis 0 (the rows), which is the default, at row indices 1 and 3. This gives three parts: row 0, rows 1 to 2, and rows 3 onward. The result is a list of arrays.</p>

  <p><code>np.split(sepal_matrix, 3)</code> splits the <code>sepal_matrix</code> array into 3 equal parts of 50 rows each, again along axis 0. The result is a list of arrays.</p>

  <p><code>np.hsplit(sepal_matrix, [1, 2])</code> horizontally splits the <code>sepal_matrix</code> array at column indices 1 and 2. Since the array has only two columns, the third part is empty. The result is a list of arrays representing the split along the columns.</p>

  <p><code>np.vsplit(sepal_matrix, 2)</code> vertically splits the <code>sepal_matrix</code> array into two equal parts along the vertical axis (row-wise). The result is a list of arrays representing the split along the rows.</p>
  



  <p>The list comprehension <code>[arr[:2] for arr in split_arrays]</code> iterates over each array in the <code>split_arrays</code> list and takes the first two rows of each one. The result is a list containing the first two rows of each split array. I did this only to save notebook real estate; otherwise the results would be too long to display.</p>
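The shapes make the split directions easy to see. Below is a sketch on a made-up 6 x 2 array; it also includes <code>np.array_split</code>, which allows uneven parts where <code>np.split</code> would raise an error.

```python
import numpy as np

m = np.arange(12).reshape(6, 2)

at_indices = np.split(m, [1, 3])   # rows 0 | 1-2 | 3-5
thirds = np.split(m, 3)            # three blocks of 2 rows
columns = np.hsplit(m, 2)          # two blocks of 1 column
halves = np.vsplit(m, 2)           # two blocks of 3 rows
uneven = np.array_split(m, 4)      # 6 rows into 4 parts: 2, 2, 1, 1 rows

print([part.shape for part in at_indices])   # [(1, 2), (2, 2), (3, 2)]
print([part.shape for part in columns])      # [(6, 1), (6, 1)]
print([part.shape[0] for part in uneven])    # [2, 2, 1, 1]
```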
  
  


In [111]:
temp_sepal=sepal_length[:5]
temp_petal=petal_length[:5]
np.concatenate((temp_sepal,temp_petal))

array([5.1, 4.9, 4.7, 4.6, 5. , 1.4, 1.4, 1.3, 1.5, 1.4])

In [113]:
temp_sepal_petal=np.hstack((temp_sepal,temp_petal))
temp_sepal_petal

array([5.1, 4.9, 4.7, 4.6, 5. , 1.4, 1.4, 1.3, 1.5, 1.4])

In [115]:
temp_sepal_petal2=np.vstack((temp_sepal,temp_petal))
temp_sepal_petal2

array([[5.1, 4.9, 4.7, 4.6, 5. ],
       [1.4, 1.4, 1.3, 1.5, 1.4]])



  <p>The arrays <code>temp_sepal</code> and <code>temp_petal</code> represent subsets of sepal lengths and petal lengths, respectively.</p>
  
  <p><code>np.concatenate((temp_sepal, temp_petal))</code> concatenates the two arrays along the default axis (axis=0), giving a single 1D array of 10 elements.</p>
  
  <p><code>temp_sepal_petal = np.hstack((temp_sepal, temp_petal))</code> horizontally stacks the arrays. For 1D inputs this gives the same result as <code>np.concatenate</code>: a 1D array with the sepal lengths followed by the petal lengths.</p>
  
  <p><code>temp_sepal_petal2 = np.vstack((temp_sepal, temp_petal))</code> vertically stacks the arrays, creating a 2D array with sepal lengths in the first row and petal lengths in the second row.</p>
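Comparing result shapes is a quick way to tell the stacking functions apart. The sketch below uses two short hand-typed arrays and adds <code>np.column_stack</code>, which pairs the values up as columns.

```python
import numpy as np

sepal = np.array([5.1, 4.9, 4.7])
petal = np.array([1.4, 1.4, 1.3])

joined = np.concatenate((sepal, petal))     # shape (6,)
side_by_side = np.hstack((sepal, petal))    # shape (6,), same as concatenate for 1D
as_rows = np.vstack((sepal, petal))         # shape (2, 3), one array per row
as_columns = np.column_stack((sepal, petal))  # shape (3, 2), one array per column

print(joined.shape, side_by_side.shape, as_rows.shape, as_columns.shape)
print(np.array_equal(as_columns, as_rows.T))   # True
```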


Understanding stacking and array shapes is very helpful for understanding <b>broadcasting</b>, which is a very important feature of NumPy.

In [117]:

temp_sepal = sepal_length[:5]
temp_petal = petal_length[:5]

result_broadcast = temp_sepal[:, np.newaxis] + temp_petal

print(result_broadcast)


[[6.5 6.5 6.4 6.6 6.5]
 [6.3 6.3 6.2 6.4 6.3]
 [6.1 6.1 6.  6.2 6.1]
 [6.  6.  5.9 6.1 6. ]
 [6.4 6.4 6.3 6.5 6.4]]




<p>Broadcasting is a powerful feature in NumPy that enables operations on arrays with different shapes without explicitly reshaping them. </p>

<ul>
  <li>
    <strong>Definition:</strong>
    <p>Broadcasting allows NumPy to work with arrays of different shapes during arithmetic operations.</p>
  </li>

  <li>
    <strong>Rules:</strong>
    <ul>
      <li>Shapes are compared dimension by dimension, starting from the last one. Two dimensions are compatible if they are equal or one of them is 1, and a missing leading dimension is treated as 1.</li>
      <li>The size of the resulting array is the maximum size along each dimension of the input arrays.</li>
    </ul>
  </li>

  <li>
    <strong>Example:</strong>
    <p>When performing operations between arrays with different shapes, NumPy virtually stretches the size-1 dimensions to match, without copying data. For instance, adding a scalar to a 1D array or adding a column vector to a 2D array. In the cell above, shape (5, 1) plus shape (5,) gives shape (5, 5).</p>
  </li>
</ul>

<p>Broadcasting simplifies syntax, making it more convenient to perform element-wise operations on arrays of different shapes. It's a crucial concept for efficient array computations in NumPy.</p>
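Here is a small sketch of the rules in action on made-up arrays. It assumes NumPy 1.20 or newer for <code>np.broadcast_shapes</code>; the rest works on older versions too.

```python
import numpy as np

column = np.array([[0], [10], [20]])   # shape (3, 1)
row = np.array([1, 2, 3])              # shape (3,), treated as (1, 3)

table = column + row                   # shape (3, 3)
print(table)

# Ask NumPy for the result shape without doing the arithmetic.
print(np.broadcast_shapes((3, 1), (3,)))   # (3, 3)

# A typical use: subtract each column's mean from a 2D array.
data = np.array([[1.0, 10.0], [3.0, 30.0]])
centered = data - data.mean(axis=0)    # (2, 2) minus (2,)

# Incompatible trailing dimensions (2 vs 3) raise an error.
try:
    np.ones((3, 2)) + np.ones(3)
    compatible = True
except ValueError:
    compatible = False
print(compatible)                      # False
```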
