# 101 NumPy Exercises for Data Analysis

*by Selva Prabhakaran*

From the website: https://www.machinelearningplus.com/python/101-numpy-exercises-python/

*The goal of the numpy exercises is to serve as a reference as well as to get you to apply numpy beyond the basics.*
*The questions come in 4 levels of difficulty, with L1 being the easiest and L4 the hardest.*

**NOTE**: Run the next cell to load the data and the libraries needed for the exercises.

In [1]:
pip install -r requirements.txt


[notice] A new release of pip is available: 23.2.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
Note: you may need to restart the kernel to use updated packages.


(L1) **Import numpy as `np` and print the version number.**

In [2]:
import numpy as np
print(np.version.version)

1.26.4


(L1) **Create a 1D array of numbers from 0 to 9.**

In [3]:
array = np.arange(0, 10)
array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

(L1) **Create a 3×3 numpy array of all True’s.**

In [4]:
all_trues = np.full((3, 3), True, dtype=bool)
print(all_trues)

all_trues = np.ones((3, 3), dtype=bool)
print(all_trues)

[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]
[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]


(L1) **Extract all odd numbers from `arr`.**

In [5]:
# Input
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

odd = arr[arr % 2 == 1]

print(arr, odd)

[0 1 2 3 4 5 6 7 8 9] [1 3 5 7 9]


(L1) **Replace all odd numbers in `arr` with -1.**

In [6]:
odds_replaced = arr.copy()
odds_replaced[odds_replaced % 2 == 1] = -1
print(odds_replaced)

[ 0 -1  2 -1  4 -1  6 -1  8 -1]


(L2) **Replace all odd numbers in arr with -1 without changing `arr`**

In [7]:
out_of_place_replace = np.where(arr % 2 == 1, -1, arr)
print(out_of_place_replace, arr)

[ 0 -1  2 -1  4 -1  6 -1  8 -1] [0 1 2 3 4 5 6 7 8 9]


(L1) **Convert a 1D array to a 2D array with 2 rows.**

In [8]:
reshaped_arr = arr.reshape((2, 5))
print(reshaped_arr)

# -1 sets the dimension automatically
reshaped_arr = arr.reshape((2, -1))
print(reshaped_arr)

[[0 1 2 3 4]
 [5 6 7 8 9]]
[[0 1 2 3 4]
 [5 6 7 8 9]]


(L2) **Stack arrays `a` and `b` vertically.**

In [9]:
# Input
a = np.arange(10).reshape(2,-1)
b = np.repeat(1, 10).reshape(2,-1)

print(a, b)

c = np.vstack([a, b])
print(c)

[[0 1 2 3 4]
 [5 6 7 8 9]] [[1 1 1 1 1]
 [1 1 1 1 1]]
[[0 1 2 3 4]
 [5 6 7 8 9]
 [1 1 1 1 1]
 [1 1 1 1 1]]


(L2) **Stack the arrays `a` and `b` horizontally.**

In [10]:
d = np.hstack([a, b])
print(d)

[[0 1 2 3 4 1 1 1 1 1]
 [5 6 7 8 9 1 1 1 1 1]]


(L2) **Create the following pattern without hardcoding. Use only numpy functions and the below input array `a`.**

In [11]:
# Input
a = np.array([1,2,3])

# This repeats EACH ELEMENT along given axis
b = a.repeat(3)

# This repeats WHOLE ARRAY, as tiles
triple_a = np.tile(a, 3)

c = np.hstack([b, triple_a])
print(c)

[1 1 1 2 2 2 3 3 3 1 2 3 1 2 3 1 2 3]


(L2) **Get the common items between a and b.**

In [12]:
# Input
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

# Indexing and unique (only catches values that match at the same position)
c = np.unique(a[a == b])
print(c)

# Via intersect method
d = np.intersect1d(ar1=a, ar2=b)
print(d)

[2 4]
[2 4]
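
The two approaches agree on this input because every shared value also happens to sit at a matching position in both arrays. A small sketch with made-up arrays (`x` and `y` are illustrative, not part of the exercise) shows how they can differ:

```python
import numpy as np

# Hypothetical inputs: all three values are shared, but none line up by position
x = np.array([1, 2, 3])
y = np.array([3, 1, 2])

positional = np.unique(x[x == y])  # values that are equal at the same index
common = np.intersect1d(x, y)      # values present anywhere in both arrays

print(positional)
print(common)
```

For "common items" in the set sense, `np.intersect1d` is likely the safer choice.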


(L2) **How to remove from one array those items that exist in another?**

In [13]:
# Input
a = np.array([1,2,3,4,5])
b = np.array([5,6,7,8,9])

# Returns difference between arrays
c = np.setdiff1d(a , b)
print(c)

[1 2 3 4]


(L2) **Get the positions where elements of a and b match.**

In [14]:
# Input
a = np.array([1,2,3,2,3,4,3,4,5,6])
b = np.array([7,2,10,2,7,4,9,4,9,8])

'''
From docs: 

nonzero: Returns a tuple of arrays, one for each dimension of a, containing the indices of the non-zero elements in that dimension. The values in a are always tested and returned in row-major, C-style order.
asarray: Convert the input to an array.
'''

c = np.asarray(a == b).nonzero()
print(c)

d = np.where(a == b)
print(d)

(array([1, 3, 5, 7]),)
(array([1, 3, 5, 7]),)


(L2) **Get all items between 5 and 10 from `a`.**

In [15]:
# Input
a = np.array([2, 6, 1, 9, 10, 3, 27])

# Middle ground
b = a[np.where((a <= 10) & (a >= 5))]
print(b)

# Shorthand
c = a[(a >= 5) & (a <= 10)]
print(c)

# Extended
d = a[np.where(np.logical_and(a >=5, a <= 10))]
print(d)

[ 6  9 10]
[ 6  9 10]
[ 6  9 10]


(L2) **Convert the function maxx that works on two scalars, to work on two arrays.**

In [16]:
# Base function
def maxx(x, y):
    """Get the maximum of two items"""
    if x >= y:
        return x
    else:
        return y

# Inputs
a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

# Direct definition
def pair_max(input_a, input_b):
    return np.where(input_a > input_b, input_a, input_b)

result = pair_max(a, b)
print(result)

# Vectorize
'''
From docs:
Returns an object that acts like pyfunc, but takes arrays as input.
Define a vectorized function which takes a nested sequence of objects or numpy arrays as inputs and returns a single numpy array or a tuple of numpy arrays.
The vectorized function evaluates pyfunc over successive tuples of the input arrays like the python map function, except it uses the broadcasting rules of numpy

tl;dr - A map array method
'''
pair_max = np.vectorize(maxx, otypes=[float])

result = pair_max(a, b)
print(result)

[6 7 9 8 9 7 5]
[6. 7. 9. 8. 9. 7. 5.]
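
For this particular function NumPy already ships an element-wise equivalent, `np.maximum`. A minimal sketch on the same inputs, shown as an alternative rather than as the intended answer to the exercise:

```python
import numpy as np

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

# np.maximum is a built-in ufunc that compares the arrays element by element
result = np.maximum(a, b)
print(result)
```

Unlike the `np.vectorize` version with `otypes=[float]`, this keeps the integer dtype of the inputs and avoids calling a Python function per element.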


(L2) **Swap columns 1 and 2 in the array `arr`.**

In [17]:
# Input
arr = np.arange(9).reshape(3,3)
print(arr)

swapped_cols = arr[:, [1, 0, 2]]
print(swapped_cols)

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[1 0 2]
 [4 3 5]
 [7 6 8]]


(L2) **Swap rows 1 and 2 in the array `arr`.**

In [18]:
swapped_rows = arr[[1, 0, 2], :]
print(swapped_rows)

[[3 4 5]
 [0 1 2]
 [6 7 8]]


(L2) **Reverse the rows of a 2D array `arr`.**

In [19]:
# Via flip
reversed_rows = np.flip(arr, axis=0)
print(reversed_rows)

# With a negative-step slice (basic slicing, not fancy indexing)
print(arr[::-1])

[[6 7 8]
 [3 4 5]
 [0 1 2]]
[[6 7 8]
 [3 4 5]
 [0 1 2]]
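
Strictly speaking, `arr[::-1]` is a slice rather than fancy (integer-array) indexing, and the difference shows up in memory behaviour. The sketch below checks this with `np.shares_memory`; the variable names are illustrative:

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)

sliced = arr[::-1]              # basic slicing with a negative step
indexed = arr[[2, 1, 0]]        # integer-array ("fancy") indexing
flipped = np.flip(arr, axis=0)  # documented to return a view

print(np.shares_memory(arr, sliced))   # True: a view on the same buffer
print(np.shares_memory(arr, indexed))  # False: fancy indexing copies
print(np.shares_memory(arr, flipped))  # True: also a view
```

Writing into `sliced` or `flipped` would therefore modify `arr` as well, while `indexed` is independent.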


(L2) **Reverse the columns of a 2D array `arr`.**

In [20]:
# Via flip
reversed_cols = np.flip(arr, axis=1)
print(reversed_cols)

[[2 1 0]
 [5 4 3]
 [8 7 6]]


(L2) **Create a 2D array of shape 5x3 to contain random decimal numbers between 5 and 10.**

In [21]:
MIN = 5
MAX = 10

# Manual sampling from uniform dist
random_arr = (MAX - MIN) * np.random.random_sample((5, 3)) + MIN
print(random_arr)

# Sampling from uniform distribution via uniform() method
random_arr = np.random.uniform(MIN, MAX, (5,3))
print(random_arr)

[[9.79042908 5.94291901 5.30362951]
 [7.49722526 7.39420196 8.69099207]
 [9.40784392 9.71094214 7.27462878]
 [8.24908406 8.93345702 9.30503599]
 [8.89910504 9.38098198 6.60205501]]
[[9.05774534 5.50532299 6.55847554]
 [7.03397343 9.11547426 9.30723225]
 [7.77681898 8.1365936  7.72654205]
 [6.22757294 5.15627775 5.66857085]
 [9.69795907 9.01604968 6.46086976]]
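
Newer NumPy code often uses the `Generator` API instead of the legacy `np.random.*` functions. A minimal sketch of the same task, where the seed value is an arbitrary choice made only for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, assumed for this example

# Uniform samples in the half-open interval [5, 10)
random_arr = rng.uniform(5, 10, size=(5, 3))
print(random_arr.shape)
print(random_arr.min() >= 5, random_arr.max() < 10)
```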


(L2) **Print or show only 3 decimal places of the numpy array `rand`.**

In [22]:
rand = np.random.random((5,3))

# Using printoptions as ctx manager makes the changes temporary!
with np.printoptions(precision=3):
    print(rand)

# Manual change and revert
np.set_printoptions(precision=3)
print(rand)

# Reset printoptions to NumPy's default values
np.set_printoptions(
    edgeitems=3,
    infstr='inf',
    linewidth=75,
    nanstr='nan',
    precision=8,
    suppress=False,
    threshold=1000,
    formatter=None
)


[[0.238 0.542 0.466]
 [0.379 0.055 0.242]
 [0.067 0.161 0.252]
 [0.983 0.597 0.095]
 [0.945 0.659 0.686]]
[[0.238 0.542 0.466]
 [0.379 0.055 0.242]
 [0.067 0.161 0.252]
 [0.983 0.597 0.095]
 [0.945 0.659 0.686]]


(L2) **Pretty print `rand` by suppressing the scientific notation (like 1e10).**

In [23]:
np.random.seed(100)
rand = np.random.random([3,3])/1e3
with np.printoptions(suppress=True):
    print(rand)

[[0.0005434  0.00027837 0.00042452]
 [0.00084478 0.00000472 0.00012157]
 [0.00067075 0.00082585 0.00013671]]


(L1) **Limit the number of items printed in python numpy array a to a maximum of 6 elements.**

In [24]:
a = np.arange(15)
with np.printoptions(threshold=6):
    print(a)

[ 0  1  2 ... 12 13 14]


(L1) **Print the full numpy array a without truncating.**

In [25]:
import sys

# sys.maxsize is the documented way to always print the full array
with np.printoptions(threshold=sys.maxsize):
    print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
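
The 15-element array above is short enough to print in full under the default settings anyway. For arrays longer than the default threshold of 1000 elements, a sketch along these lines (the array `big` is made up for illustration) shows the effect of setting `threshold` to `sys.maxsize`:

```python
import sys
import numpy as np

big = np.arange(2000)  # longer than the default threshold of 1000

# Default settings: the middle of the array is summarized with "..."
default_text = np.array2string(big)

# With a very large threshold the array is rendered in full
with np.printoptions(threshold=sys.maxsize):
    full_text = np.array2string(big)

print("..." in default_text)
print("..." in full_text)
```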


(L2) **Import the iris dataset keeping the text intact.**

In [26]:
IRIS_DATASET_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# dtype=object keeps every field as raw bytes in a single 2D array.
# dtype=None would instead return a 1D structured array of rows (as tuples),
# since dtypes are not uniform across all columns
iris_dataset = np.genfromtxt(IRIS_DATASET_URL, delimiter=',', dtype=object)
print(iris_dataset)

[[b'5.1' b'3.5' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.0' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa']
 [b'5.0' b'3.6' b'1.4' b'0.2' b'Iris-setosa']
 [b'5.4' b'3.9' b'1.7' b'0.4' b'Iris-setosa']
 [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa']
 [b'5.0' b'3.4' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.4' b'2.9' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']
 [b'5.4' b'3.7' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.8' b'3.4' b'1.6' b'0.2' b'Iris-setosa']
 [b'4.8' b'3.0' b'1.4' b'0.1' b'Iris-setosa']
 [b'4.3' b'3.0' b'1.1' b'0.1' b'Iris-setosa']
 [b'5.8' b'4.0' b'1.2' b'0.2' b'Iris-setosa']
 [b'5.7' b'4.4' b'1.5' b'0.4' b'Iris-setosa']
 [b'5.4' b'3.9' b'1.3' b'0.4' b'Iris-setosa']
 [b'5.1' b'3.5' b'1.4' b'0.3' b'Iris-setosa']
 [b'5.7' b'3.8' b'1.7' b'0.3' b'Iris-setosa']
 [b'5.1' b'3.8' b'1.5' b'0.3' b'Iris-setosa']
 [b'5.4' b'3.4' b'1.7' b'0.2' b'Iris-setosa']
 [b'5.1' b'3.7' b'1.5' b'0.4' b'Ir

(L2) **Extract the text column species from the 1D iris imported in previous question.**

In [27]:
COLUMN_NAMES = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')

species = [
    row[4]
    for row in iris_dataset
]
print(species)

[b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-setosa', b'Iris-versicolor', b'Iris-versicolor', b'Iris-versicolor', b'Iris-versicolor', b'Iris-versicolor', b'Iris-versicolor', b'Iris-versicolor', b'Iris-versicolor', b'Iris-versicolor', b'Iris-versicolor',

(L2) **Convert the 1D iris to 2D array iris_2d by omitting the species text field.**

In [28]:
# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'

# Removing string column makes all columns of uniform type, so it can be inferred
# and NumPy sees that it must be a matrix of floats
iris_2d = np.genfromtxt(url, delimiter=',', dtype=None, usecols=[0,1,2,3])

print(iris_2d.shape)

(150, 4)


(L1) **Find the mean, median, standard deviation of iris's sepallength (1st column).**

In [29]:
for measure in (np.mean, np.median, np.std):
    print(
        measure(
            iris_2d[:,0]
        )
    )

5.843333333333334
5.8
0.8253012917851409


(L2) **Create a normalized form of iris's sepallength whose values range exactly between 0 and 1 so that the minimum has value 0 and maximum has value 1.**

In [30]:
sepallength = iris_2d[:,0]

maximum_sepallength = np.max(sepallength)
minimum_sepallength = np.min(sepallength)

normalized_sepallength = (sepallength.copy() - minimum_sepallength) / (maximum_sepallength - minimum_sepallength)
print(normalized_sepallength)

[0.22222222 0.16666667 0.11111111 0.08333333 0.19444444 0.30555556
 0.08333333 0.19444444 0.02777778 0.16666667 0.30555556 0.13888889
 0.13888889 0.         0.41666667 0.38888889 0.30555556 0.22222222
 0.38888889 0.22222222 0.30555556 0.22222222 0.08333333 0.22222222
 0.13888889 0.19444444 0.19444444 0.25       0.25       0.11111111
 0.13888889 0.30555556 0.25       0.33333333 0.16666667 0.19444444
 0.33333333 0.16666667 0.02777778 0.22222222 0.19444444 0.05555556
 0.02777778 0.19444444 0.22222222 0.13888889 0.22222222 0.08333333
 0.27777778 0.19444444 0.75       0.58333333 0.72222222 0.33333333
 0.61111111 0.38888889 0.55555556 0.16666667 0.63888889 0.25
 0.19444444 0.44444444 0.47222222 0.5        0.36111111 0.66666667
 0.36111111 0.41666667 0.52777778 0.36111111 0.44444444 0.5
 0.55555556 0.5        0.58333333 0.63888889 0.69444444 0.66666667
 0.47222222 0.38888889 0.33333333 0.33333333 0.41666667 0.47222222
 0.30555556 0.47222222 0.66666667 0.55555556 0.36111111 0.33333333
 0.33333
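
The same min-max rescaling can be wrapped in a small helper. This is a sketch under the assumption that a constant input should map to all zeros; `min_max_scale` and `lengths` are hypothetical names, not part of the exercise:

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    value_range = np.ptp(values)  # peak to peak, i.e. max - min
    if value_range == 0:
        # All values identical: avoid dividing by zero
        return np.zeros_like(values, dtype=float)
    return (values - values.min()) / value_range

lengths = np.array([4.3, 5.8, 7.9])  # illustrative values
scaled = min_max_scale(lengths)
print(scaled)
```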

(L3) **Compute the softmax score of sepallength.**

In [31]:
# https://stackoverflow.com/questions/34968722/how-to-implement-the-softmax-function-in-python
softmax_exponentials = np.exp(sepallength - np.max(sepallength))
softmax = softmax_exponentials / softmax_exponentials.sum(axis=0)

with np.printoptions(precision=3):
    print(softmax)

[0.002 0.002 0.001 0.001 0.002 0.003 0.001 0.002 0.001 0.002 0.003 0.002
 0.002 0.001 0.004 0.004 0.003 0.002 0.004 0.002 0.003 0.002 0.001 0.002
 0.002 0.002 0.002 0.002 0.002 0.001 0.002 0.003 0.002 0.003 0.002 0.002
 0.003 0.002 0.001 0.002 0.002 0.001 0.001 0.002 0.002 0.002 0.002 0.001
 0.003 0.002 0.015 0.008 0.013 0.003 0.009 0.004 0.007 0.002 0.01  0.002
 0.002 0.005 0.005 0.006 0.004 0.011 0.004 0.004 0.007 0.004 0.005 0.006
 0.007 0.006 0.008 0.01  0.012 0.011 0.005 0.004 0.003 0.003 0.004 0.005
 0.003 0.005 0.011 0.007 0.004 0.003 0.003 0.006 0.004 0.002 0.004 0.004
 0.004 0.007 0.002 0.004 0.007 0.004 0.016 0.007 0.009 0.027 0.002 0.02
 0.011 0.018 0.009 0.008 0.012 0.004 0.004 0.008 0.009 0.03  0.03  0.005
 0.013 0.004 0.03  0.007 0.011 0.018 0.007 0.006 0.008 0.018 0.022 0.037
 0.008 0.007 0.006 0.03  0.007 0.008 0.005 0.013 0.011 0.013 0.004 0.012
 0.011 0.011 0.007 0.009 0.007 0.005]
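
Subtracting the maximum before exponentiating does not change the result, because the common factor cancels in the ratio, but it keeps `np.exp` from overflowing. A minimal sketch with deliberately large, made-up scores:

```python
import numpy as np

def softmax(values):
    """Numerically stable softmax of a 1D array."""
    shifted = values - np.max(values)  # largest exponent becomes 0
    exponentials = np.exp(shifted)
    return exponentials / exponentials.sum()

# Illustrative scores: a naive np.exp(1000.0) would overflow to inf
scores = np.array([1000.0, 1001.0, 1002.0])
probabilities = softmax(scores)

print(probabilities)
print(probabilities.sum())
```

Shifting every score by the same constant leaves the softmax unchanged, so these probabilities match those for `[0.0, 1.0, 2.0]`.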


(L1) **Find the 5th and 95th percentile of iris's sepallength.**

In [32]:
five_percentile, nine_five_percentile = np.percentile(sepallength, 5), np.percentile(sepallength, 95)
print(five_percentile, nine_five_percentile)

# Doing both at once
print(
    np.percentile(sepallength, [5, 95])
)

4.6 7.254999999999998
[4.6   7.255]


(L2) **Find the correlation between SepalLength(1st column) and PetalLength(3rd column) in `iris_2d`**

In [33]:
petallength = iris_2d[:,2]

correlation = np.corrcoef(x=sepallength, y=petallength)
print(correlation)

[[1.         0.87175416]
 [0.87175416 1.        ]]
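
`np.corrcoef` returns the full correlation matrix, so the coefficient itself is the off-diagonal entry. The sketch below, on made-up measurements rather than the iris data, extracts it and compares it with Pearson's formula written out by hand:

```python
import numpy as np

# Illustrative measurements, not taken from the iris data
x = np.array([5.1, 4.9, 6.3, 5.8, 7.1])
y = np.array([1.4, 1.4, 4.9, 4.0, 5.9])

# The off-diagonal entry of the 2x2 matrix is the correlation coefficient
r = np.corrcoef(x, y)[0, 1]

# Pearson's r: covariance divided by the product of the standard deviations
x_centered = x - x.mean()
y_centered = y - y.mean()
r_manual = (x_centered * y_centered).sum() / np.sqrt(
    (x_centered ** 2).sum() * (y_centered ** 2).sum()
)

print(np.isclose(r, r_manual))
```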


(L2) **Find out if `iris_2d` has any missing values.**

In [34]:
nans = np.isnan(iris_2d)
nans.any()

False

(L2) **Replace all occurrences of nan with 0 in numpy array.**

In [35]:
# Input
nanified_iris_2d = iris_2d.copy()
nanified_iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan

nanified_iris_2d[np.isnan(nanified_iris_2d)] = 0
np.isnan(nanified_iris_2d).any()

False
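
When the original array should stay untouched, `np.nan_to_num` is one alternative to boolean-mask assignment. A minimal sketch on a made-up array:

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0, np.nan])  # illustrative input

# Returns a new array by default (copy=True), leaving `values` as it was
cleaned = np.nan_to_num(values, nan=0.0)

print(cleaned)
print(np.isnan(values).sum())
```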

(L2) **Find the unique values and the count of unique values in iris's species.**

In [36]:
# Iterate manually
for value in np.unique(species):
    print(value, species.count(value))

# Using the unique.return_counts parameter
print(
    np.unique(species, return_counts=True)
)

b'Iris-setosa' 50
b'Iris-versicolor' 50
b'Iris-virginica' 50
(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
      dtype='|S15'), array([50, 50, 50]))



(L2) **Bin the petal length (3rd) column of iris_2d to form a text array, such that if petal length is:**

* Less than 3 --> 'small'
* 3-5 --> 'medium'
* >=5 --> 'large'

In [37]:
bins = [0.0, 3.0, 5.0]
bins_labels = ['small', 'medium', 'large']
petallength_put_in_bins = np.digitize(petallength, bins)
petallength_categories = [
    bins_labels[bin_number-1]
    for bin_number in petallength_put_in_bins
]
print(petallength_categories)

['small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'small', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'large', 'medium', 'medium', 'medium', 'medium', 'medium', 'large', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'medium', 'large', 'large', 'large', 'large', 'large', 'large
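
The list comprehension can also be replaced by indexing a label array with the bin numbers. This sketch uses only the two inner edges, so `np.digitize` returns 0, 1 or 2 directly; the sample values are made up:

```python
import numpy as np

petal_lengths = np.array([1.4, 3.0, 4.9, 5.0, 6.1])  # illustrative values
bin_edges = [3.0, 5.0]
labels = np.array(['small', 'medium', 'large'])

# digitize gives 0 for x < 3, 1 for 3 <= x < 5, and 2 for x >= 5
bin_index = np.digitize(petal_lengths, bin_edges)
categories = labels[bin_index]

print(categories)
```

The result is a NumPy string array rather than a Python list, which matches the "text array" asked for in the exercise.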

(L2) **Create a new column for volume in iris_2d, where volume is `(pi x petallength x sepal_length^2)/3`**

In [38]:
np.column_stack(
    [
        iris_2d,  # Old array
        (np.pi * petallength * sepallength ** 2) / 3  # New column with volumes, calculated on the fly
    ])

array([[5.10000000e+00, 3.50000000e+00, 1.40000000e+00, 2.00000000e-01,
        3.81326516e+01],
       [4.90000000e+00, 3.00000000e+00, 1.40000000e+00, 2.00000000e-01,
        3.52004985e+01],
       [4.70000000e+00, 3.20000000e+00, 1.30000000e+00, 2.00000000e-01,
        3.00723721e+01],
       [4.60000000e+00, 3.10000000e+00, 1.50000000e+00, 2.00000000e-01,
        3.32380503e+01],
       [5.00000000e+00, 3.60000000e+00, 1.40000000e+00, 2.00000000e-01,
        3.66519143e+01],
       [5.40000000e+00, 3.90000000e+00, 1.70000000e+00, 4.00000000e-01,
        5.19116770e+01],
       [4.60000000e+00, 3.40000000e+00, 1.40000000e+00, 3.00000000e-01,
        3.10221803e+01],
       [5.00000000e+00, 3.40000000e+00, 1.50000000e+00, 2.00000000e-01,
        3.92699082e+01],
       [4.40000000e+00, 2.90000000e+00, 1.40000000e+00, 2.00000000e-01,
        2.83832424e+01],
       [4.90000000e+00, 3.10000000e+00, 1.50000000e+00, 1.00000000e-01,
        3.77148198e+01],
       [5.40000000e+00, 3.7000

(L3) **Randomly sample iris's species such that setosa is twice the number of versicolor and virginica.**

In [39]:
probabilities_map = {
    b'Iris-setosa': 0.5,
    b'Iris-versicolor': 0.25,
    b'Iris-virginica': 0.25
}
sampling_probabilities = np.array([
    probabilities_map.get(name)
    for name in species
])
sampling_distribution = sampling_probabilities / sum(sampling_probabilities)
sample = np.random.choice(species, size=100, p=sampling_distribution)
np.unique(sample, return_counts=True)

(array([b'Iris-setosa', b'Iris-versicolor', b'Iris-virginica'],
       dtype='|S15'),
 array([51, 26, 23]))
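
An alternative is to sample the three labels directly with class-level probabilities instead of assigning a weight to each of the 150 rows. This is a sketch of that variation, not the approach used above; the seed and sample size are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed for reproducibility

species_names = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

# Setosa is drawn with twice the probability of each other species
sample = rng.choice(species_names, size=1000, p=[0.5, 0.25, 0.25])

names, counts = np.unique(sample, return_counts=True)
print(names)
print(counts)
```

Both versions give setosa about half of the draws. The per-row version additionally keeps the option of recovering which row each draw came from if indices are sampled instead of labels.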

(L3) **What is the value of the second longest petallength of species setosa?**

In [40]:
np.unique(  # unique() removes repetitions and sorts ascending, so [-2] is the second-largest VALUE, not the second-to-last POSITION
    [
        float(iris_dataset_row[2])  # Compare as numbers rather than as byte strings
        for iris_dataset_row in iris_dataset
        if iris_dataset_row[4] == b'Iris-setosa'
    ]
)[-2]

1.7

(L2) **Sort the iris dataset based on sepallength column.**

In [41]:
sorted_sepallength_indices = iris_2d[:,0].argsort()
sorted_iris_array = iris_dataset[sorted_sepallength_indices]
print(sorted_iris_array)

[[b'4.3' b'3.0' b'1.1' b'0.1' b'Iris-setosa']
 [b'4.4' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.4' b'3.0' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.4' b'2.9' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.5' b'2.3' b'1.3' b'0.3' b'Iris-setosa']
 [b'4.6' b'3.6' b'1.0' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.1' b'1.5' b'0.2' b'Iris-setosa']
 [b'4.6' b'3.4' b'1.4' b'0.3' b'Iris-setosa']
 [b'4.6' b'3.2' b'1.4' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.3' b'0.2' b'Iris-setosa']
 [b'4.7' b'3.2' b'1.6' b'0.2' b'Iris-setosa']
 [b'4.8' b'3.0' b'1.4' b'0.1' b'Iris-setosa']
 [b'4.8' b'3.0' b'1.4' b'0.3' b'Iris-setosa']
 [b'4.8' b'3.4' b'1.9' b'0.2' b'Iris-setosa']
 [b'4.8' b'3.4' b'1.6' b'0.2' b'Iris-setosa']
 [b'4.8' b'3.1' b'1.6' b'0.2' b'Iris-setosa']
 [b'4.9' b'2.4' b'3.3' b'1.0' b'Iris-versicolor']
 [b'4.9' b'2.5' b'4.5' b'1.7' b'Iris-virginica']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']
 [b'4.9' b'3.1' b'1.5' b'0.1' b'Iris-setosa']
 [b'4.9' b'3.0' b'1.4' b'0.

(L1) **Find the most frequent value of petal length (3rd column) in iris dataset.**

In [42]:
values, counts = np.unique(iris_dataset[:,2], return_counts=True)
maximum_unique_value_index = counts.argmax()
print(values[maximum_unique_value_index])

b'1.5'


(L2) **Find the position of the first occurrence of a value greater than 1.0 in petalwidth (4th column) of the iris dataset.**

In [43]:
petalwidth = iris_2d[:, 3]
np.argwhere(petalwidth > 1.0)[0]

array([50])
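
`np.argwhere` builds the full list of matching positions before the first one is taken. When only the first match is needed, `np.argmax` on the boolean mask is a common shortcut. A sketch on made-up values, with a guard for the case where nothing matches:

```python
import numpy as np

petal_widths = np.array([0.2, 0.4, 1.0, 1.4, 0.3, 1.5])  # illustrative values
mask = petal_widths > 1.0

# argmax returns the index of the first True. On an all-False mask it would
# return 0, which is misleading, so check mask.any() first.
first_position = int(np.argmax(mask)) if mask.any() else None

print(first_position)
```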

(L2) **From the array `a`, replace all values greater than 30 with 30 and all values less than 10 with 10.**

In [47]:
np.random.seed(100)
a = np.random.uniform(1,50, 20)
print(a)
a[a > 30] = 30
a[a < 10] = 10
print(a)

[27.62684215 14.64009987 21.80136195 42.39403048  1.23122395  6.95688692
 33.86670515 41.466785    7.69862289 29.17957314 44.67477576 11.25090398
 10.08108276  6.31046763 11.76517714 48.95256545 40.77247431  9.42510962
 40.99501269 14.42961361]
[27.62684215 14.64009987 21.80136195 30.         10.         10.
 30.         30.         10.         29.17957314 30.         11.25090398
 10.08108276 10.         11.76517714 30.         30.         10.
 30.         14.42961361]
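
The two mask assignments can also be expressed with `np.clip`, which returns a new array and leaves the input unchanged. A minimal sketch; it uses a seeded `Generator`, so the values differ from the `np.random.seed(100)` output above:

```python
import numpy as np

rng = np.random.default_rng(100)  # seeded Generator, assumed for this example
a = rng.uniform(1, 50, 20)

# Values below 10 become 10, values above 30 become 30, the rest are kept
clipped = np.clip(a, 10, 30)

print(clipped)
```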
