# Understanding the iris flower data set

This notebook explores the contents of the iris flower data set.


## Acknowledgments

- Ronald Fisher: the data set was outlined in his 1936 paper "The use of multiple measurements in taxonomic problems" as an example of linear discriminant analysis

- Some Python packages, such as **sklearn**

- https://archive.ics.uci.edu/ml/datasets/iris

- https://en.wikipedia.org/wiki/Iris_(plant)


# Iris flower data set

![irisparts.png](datasets/iris/irisparts.png)

1. Iris plant
    - Flowering plants with showy flowers
    - Between 260 and 300 different species
    - Many colors


2. Uses
    - Botany: aromatic rhizomes
    - Art: see Vincent van Gogh's painting "Irises" (1889)
    - Geopolitics: coats of arms
    - Science and technology: computer science, data science


3. The dataset description
    - Many observations/measurements/recordings of the characteristics/attributes/variables of some iris flowers
    - Three iris varieties are used: setosa, versicolor, virginica
    - Variables: sepal and petal length and width (4 variables) plus the type of flower (1 variable)
    - Total number of observations: 150 (50 for each type of flower)


4. Images

![irismeasurements1.png](datasets/iris/irismeasurements1.png)





![irisclasses.png](datasets/iris/irisclasses.png)
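
The dataset description above (150 observations, three varieties, four measurements plus the flower type) can be cross-checked against the copy of the data that ships with **sklearn**. This is only a sketch: it assumes sklearn 0.23 or newer (for `as_frame=True`) and uses sklearn's bundled copy, not the local `iris.csv` file, so column names and class labels differ from the ones used later in this notebook.

```python
from sklearn.datasets import load_iris

# as_frame=True asks sklearn for pandas objects instead of NumPy arrays
iris = load_iris(as_frame=True)

# iris.frame holds the four measurement columns plus a numeric "target" column
frame = iris.frame

print(frame.shape)                 # (rows, columns)
print(list(iris.feature_names))    # names of the four measurements
print(list(iris.target_names))     # names of the three varieties

# Observations per variety, ordered by the numeric class code
class_counts = frame["target"].value_counts().sort_index().tolist()
print(class_counts)
```

In this copy the varieties are encoded as the integers 0, 1 and 2, and `iris.target_names` maps those codes back to the variety names.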




# Importing and inspecting the data

In [4]:
# Import the packages that we will be using
import pandas as pd                 # For data handling
import seaborn as sns               # For plotting (not used in this cell)

# Define the column names for the iris dataset
colnames = ["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width", "Flower_Name"]

# Path to the local copy of the dataset (originally from the UCI machine learning repository)
url = "datasets/iris/iris.csv"

# Load the dataset; the file has no header row, so the column names are supplied explicitly
df = pd.read_csv(url, header=None, names=colnames)
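
To see what `header=None` and `names=colnames` do without needing the file on disk, here is a minimal sketch that reads a three-line in-memory sample. The sample rows are the first rows of the data, and the sketch assumes, as the loading cell does, that the raw file has no header line. `sample_csv` and `sample` are illustrative names, not part of the notebook.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file: three data lines, no header line
sample_csv = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

colnames = ["Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width", "Flower_Name"]

# header=None: the first line is data, not column labels
# names=colnames: use these labels for the columns instead
sample = pd.read_csv(sample_csv, header=None, names=colnames)

print(sample.shape)
print(list(sample.columns))
```

Without `header=None`, pandas would treat the first data line as column labels and the sample would lose one observation.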



In [10]:
# Number of rows and columns
Nf, Nc = df.shape
print("Rows:", Nf, "\nColumns:", Nc)
print("")

# Number of observations per class
print(df.Flower_Name.value_counts())
print("")

# Variable types
print(df.dtypes)
print("")

df

Rows: 150 
Columns: 5

Iris-versicolor    50
Iris-setosa        50
Iris-virginica     50
Name: Flower_Name, dtype: int64

Sepal_Length    float64
Sepal_Width     float64
Petal_Length    float64
Petal_Width     float64
Flower_Name      object
dtype: object



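|     | Sepal_Length | Sepal_Width | Petal_Length | Petal_Width | Flower_Name    |
|-----|--------------|-------------|--------------|-------------|----------------|
| 0   | 5.1          | 3.5         | 1.4          | 0.2         | Iris-setosa    |
| 1   | 4.9          | 3.0         | 1.4          | 0.2         | Iris-setosa    |
| 2   | 4.7          | 3.2         | 1.3          | 0.2         | Iris-setosa    |
| 3   | 4.6          | 3.1         | 1.5          | 0.2         | Iris-setosa    |
| 4   | 5.0          | 3.6         | 1.4          | 0.2         | Iris-setosa    |
| ... | ...          | ...         | ...          | ...         | ...            |
| 145 | 6.7          | 3.0         | 5.2          | 2.3         | Iris-virginica |
| 146 | 6.3          | 2.5         | 5.0          | 1.9         | Iris-virginica |
| 147 | 6.5          | 3.0         | 5.2          | 2.0         | Iris-virginica |
| 148 | 6.2          | 3.4         | 5.4          | 2.3         | Iris-virginica |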


# Activity: work with the iris dataset

1. Load the iris.csv file on your computer and compare it with the data printed above.


2. How many observations (rows) are there in total? Answer: 150


3. How many variables (columns) are there in total? What do they represent? Answer: 5 (sepal length, sepal width, petal length, petal width, and the type of flower)


4. How many observations are there for each type of flower? Answer: 50


5. What is the data type of each variable? Answer: Sepal_Length, Sepal_Width, Petal_Length and Petal_Width are float64; Flower_Name is object


6. What are the units of each variable? Answer: sepal and petal length and width are in cm; Flower_Name is a categorical label with no unit (Iris-setosa, Iris-versicolor, Iris-virginica)
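
The counting questions in this activity can also be answered programmatically. The sketch below wraps the same calls used earlier (`shape`, `value_counts`, `dtypes`) in a helper; `summarize` is a hypothetical name introduced here, and the four-row `sample` is a hand-made stand-in built from rows shown in the table above, not the full file.

```python
import pandas as pd


def summarize(df, label_col="Flower_Name"):
    """Collect row/column counts, per-class counts and column dtypes."""
    n_rows, n_cols = df.shape
    return {
        "rows": n_rows,
        "columns": n_cols,
        "per_class": df[label_col].value_counts().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Hand-made sample: two setosa rows and two virginica rows from the table above
sample = pd.DataFrame(
    {
        "Sepal_Length": [5.1, 4.9, 6.7, 6.3],
        "Sepal_Width": [3.5, 3.0, 3.0, 2.5],
        "Petal_Length": [1.4, 1.4, 5.2, 5.0],
        "Petal_Width": [0.2, 0.2, 2.3, 1.9],
        "Flower_Name": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
    }
)

summary = summarize(sample)
print(summary)
```

Calling `summarize(df)` on the full data frame loaded earlier should reproduce the answers to questions 2 to 5. The units in question 6 are not stored in the file, so they cannot be recovered this way.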