# Iris Dataset with NumPy
Goal: In this part of the lab, you will load the Iris dataset, extract its features
and labels, and analyze it using NumPy’s statistical tools.
## Feature extraction and statistical analysis
In this exercise, you will work with the Iris dataset to practice basic data handling and statistical analysis using NumPy. After downloading the
iris.csv file and saving it in your working directory, load the four feature columns into a float array X and the class labels into a string array y using the
following commands:

In [36]:
import numpy as np
X = np.genfromtxt('iris.csv', delimiter=',', usecols=(0, 1, 2, 3), dtype=float)
y = np.genfromtxt('iris.csv', delimiter=',', usecols=4, dtype=str)

print(X.shape, X.dtype)
print(y.shape, y.dtype)


feature_names = ["sepal length", "sepal width", "petal length", "petal width"]

means = np.mean(X, axis=0)
stds  = np.std(X, axis=0)

for name, mean, std in zip(feature_names, means, stds):
    print(f"{name:12}: mean = {mean:.10f}, std = {std:.10f}")

# print(X[:, 1:2])  # selects a single column while keeping a 2-D shape of (150, 1)


(150, 4) float64
(150,) <U15
sepal length: mean = 5.8433333333, std = 0.8253012918
sepal width : mean = 3.0540000000, std = 0.4321465801
petal length: mean = 3.7586666667, std = 1.7585291834
petal width : mean = 1.1986666667, std = 0.7606126186
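
The standard deviations printed above are population values, because np.std divides by N by default (ddof=0). As a minimal sketch of the difference, the example below compares ddof=0 with ddof=1 (division by N - 1) on a small made-up array; the name toy and its values are illustrative assumptions and are not taken from iris.csv.

```python
import numpy as np

# Hypothetical toy data (not from iris.csv): 4 samples, 2 features.
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0],
                [4.0, 40.0]])

# Default behaviour: population standard deviation, divides by N.
pop_std = np.std(toy, axis=0)

# Sample standard deviation: divides by N - 1.
sample_std = np.std(toy, axis=0, ddof=1)

print("ddof=0:", pop_std)
print("ddof=1:", sample_std)
```

With only a few samples the two estimates differ noticeably; with 150 rows, as in the Iris data, the gap is small but still visible in the later decimal places.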


## Computing global statistics
Using NumPy’s aggregation functions, compute the mean and standard deviation for each of the four features
across the entire dataset (sepal length, sepal width, petal length, petal
width).


In [64]:
classes = np.unique(y)
#print(classes)
for c in classes:
    mask = (y == c)
    # y has shape (150,), so mask is a 1-D boolean array with one entry per row of X.
    # X[mask] uses boolean indexing to keep only the rows of this class (no broadcasting is involved).
    mean = X[mask].mean(axis=0)
    std = X[mask].std(axis=0)
    print(c, mean, std)
    #print (mask)

Iris-setosa [5.006 3.418 1.464 0.244] [0.34894699 0.37719491 0.17176728 0.10613199]
Iris-versicolor [5.936 2.77  4.26  1.326] [0.51098337 0.31064449 0.46518813 0.19576517]
Iris-virginica [6.588 2.974 5.552 2.026] [0.62948868 0.31925538 0.54634787 0.27188968]
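
The loop above relies on boolean mask indexing: comparing the label array with a class name yields one True/False value per row, and indexing the feature array with that mask keeps only the matching rows. The sketch below shows the same mechanism on a tiny stand-in dataset; X_toy, y_toy, and the labels "a" and "b" are hypothetical and chosen only to make the shapes easy to follow.

```python
import numpy as np

# Hypothetical stand-ins for X and y: 4 samples, 2 features, 2 classes.
X_toy = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0],
                  [7.0, 8.0]])
y_toy = np.array(["a", "b", "a", "b"])

# Element-wise comparison gives a 1-D boolean array, one entry per row.
mask = (y_toy == "a")

# Boolean indexing along the first axis keeps rows 0 and 2.
subset = X_toy[mask]

# Per-feature mean of the selected rows only.
class_mean = subset.mean(axis=0)

print(mask.shape, subset.shape)
print(class_mean)
```

The mask must have the same length as the first axis of the array it indexes, which is why y and X need one entry per sample in the same order.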


In [69]:
print(X.shape)

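# Standardize each feature (z-score): subtract the column mean, divide by the column std.
x_mean = X.mean(axis=0)
x_std = X.std(axis=0)
x_norm = (X - x_mean) / x_std  # (150, 4) - (4,) broadcasts across rows

print(x_norm)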


(150, 4)
[[-9.00681170e-01  1.03205722e+00 -1.34127240e+00 -1.31297673e+00]
 [-1.14301691e+00 -1.24957601e-01 -1.34127240e+00 -1.31297673e+00]
 [-1.38535265e+00  3.37848329e-01 -1.39813811e+00 -1.31297673e+00]
 [-1.50652052e+00  1.06445364e-01 -1.28440670e+00 -1.31297673e+00]
 [-1.02184904e+00  1.26346019e+00 -1.34127240e+00 -1.31297673e+00]
 [-5.37177559e-01  1.95766909e+00 -1.17067529e+00 -1.05003079e+00]
 [-1.50652052e+00  8.00654259e-01 -1.34127240e+00 -1.18150376e+00]
 [-1.02184904e+00  8.00654259e-01 -1.28440670e+00 -1.31297673e+00]
 [-1.74885626e+00 -3.56360566e-01 -1.34127240e+00 -1.31297673e+00]
 [-1.14301691e+00  1.06445364e-01 -1.28440670e+00 -1.44444970e+00]
 [-5.37177559e-01  1.49486315e+00 -1.28440670e+00 -1.31297673e+00]
 [-1.26418478e+00  8.00654259e-01 -1.22754100e+00 -1.31297673e+00]
 [-1.26418478e+00 -1.24957601e-01 -1.34127240e+00 -1.44444970e+00]
 [-1.87002413e+00 -1.24957601e-01 -1.51186952e+00 -1.44444970e+00]
 [-5.25060772e-02  2.18907205e+00 -1.45500381e+00 -1.