
Graph a dataset using matplotlib

Khelil Sator edited this page Jun 25, 2019 · 2 revisions

The iris flowers data set quantifies the morphologic variation of Iris flowers of three related species.
The iris dataset consists of measurements of three types of Iris flowers: Iris Setosa, Iris Versicolor, and Iris Virginica.

The iris dataset is suited to supervised machine learning because every sample is labeled with its species.
It is a classification problem: the goal is to predict the species of a flower from its measurements.

The dataset contains a set of 150 records under five attributes: petal length, petal width, sepal length, sepal width and species.

The data set consists of 50 samples from each of the three species.
Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.
Based on the combination of these four features, we can distinguish the species.

Classes: 3
Samples per class: 50
Samples total: 150
Dimensionality: 4

Let's use the matplotlib Python library to plot the data set.

>>> from sklearn.datasets import load_iris
>>> import matplotlib.pyplot as plt

Load the data set

>>> iris=load_iris()

Examine the dataset
It has 150 rows (samples) and 4 columns (features).

>>> iris.data.shape
(150, 4)
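The class counts listed in the summary earlier can be checked in the same way. The following is a minimal sketch, written as a script rather than a REPL session; the variable names `n_samples`, `n_features` and `samples_per_class` are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Rows are samples, columns are features
n_samples, n_features = iris.data.shape

# iris.target holds one integer label (0, 1 or 2) per sample;
# np.bincount counts how many times each label occurs
samples_per_class = np.bincount(iris.target)

print(n_samples, n_features)   # 150 4
print(samples_per_class)       # [50 50 50]
```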

Feature names

>>> iris["feature_names"]
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Label names

>>> iris["target_names"]
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

Last data point (its label 2 corresponds to the virginica species)

>>> iris.data[-1]
array([5.9, 3. , 5.1, 1.8])
>>> iris.target[-1]
2
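The integer in `iris.target` is an index into `iris["target_names"]`, which is how the label 2 maps to virginica. A short sketch of that lookup, assuming the same `load_iris()` data (the names `last_label`, `last_species` and `first_three_species` are illustrative):

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Look up the species name of the last sample from its integer label
last_label = iris.target[-1]
last_species = iris.target_names[last_label]
print(last_species)  # virginica

# Indexing with an array of labels translates several samples at once
first_three_species = iris.target_names[iris.target[0:3]]
print(first_three_species)  # ['setosa' 'setosa' 'setosa']
```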

First 3 data points

>>> iris.data[0:3]
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2]])
>>> iris.target[0:3]
array([0, 0, 0])

Extract the first column (sepal length) from the array

>>> iris.data[:,[0]]

First column (sepal length) of the first 3 data points, selected in two equivalent ways

>>> iris.data[0:3,[0]]
array([[5.1],
       [4.9],
       [4.7]])
>>> iris.data[:,[0]][0:3]
array([[5.1],
       [4.9],
       [4.7]])
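Indexing with the list `[0]` keeps the result two-dimensional, which is why each value above appears in its own row. Indexing with the plain integer `0` returns a flat one-dimensional array instead. A brief sketch comparing the two (the names `column_2d` and `column_1d` are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# List index: the column axis is kept, so the result is a 150x1 array
column_2d = iris.data[:, [0]]
print(column_2d.shape)  # (150, 1)

# Integer index: the column axis is dropped, so the result is a flat array
column_1d = iris.data[:, 0]
print(column_1d.shape)  # (150,)

# Both hold the same sepal length values
print(np.array_equal(column_2d.ravel(), column_1d))  # True
```

Either form can be passed to `plt.plot`; the 150x1 array is drawn as a single line.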

Graph the sepal length

>>> plt.plot(iris.data[:,[0]])
>>> plt.title('iris')
>>> plt.ylabel('sepal length (cm)')
>>> plt.show(block=False)

Figure: iris sepal length
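The plot above draws all 150 sepal lengths as one line against the sample index, so the species are not visually separated. One possible variation, offered as a sketch and not as the page's original method, is to plot each species as its own series with a legend. The boolean `mask` and the other variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
sepal_length = iris.data[:, 0]

fig, ax = plt.subplots()

# Draw one series of markers per species
for class_index, species in enumerate(iris.target_names):
    mask = iris.target == class_index       # True for samples of this species
    sample_index = np.flatnonzero(mask)     # positions of those samples
    ax.plot(sample_index, sepal_length[mask],
            marker="o", linestyle="", label=species)

ax.set_title("iris")
ax.set_xlabel("sample index")
ax.set_ylabel(iris.feature_names[0])        # 'sepal length (cm)'
ax.legend()
```

Calling `plt.show(block=False)` afterwards, as in the snippet above, displays the figure.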