# Fisher's Iris data set

Fisher's Iris dataset is a well-known dataset in the field of machine learning and statistics. It was introduced by the British statistician and biologist Ronald A. Fisher in 1936 as an example of discriminant analysis. The dataset consists of measurements from three species of iris flowers: setosa, versicolor, and virginica. Each species is represented by 50 samples, making a total of 150 data points. The dataset includes four features (variables) for each sample, which are measurements in centimeters:

* Sepal Length: the length of the iris flower's sepal (the outermost petal-like part).
* Sepal Width: the width of the iris flower's sepal.
* Petal Length: the length of the iris flower's petal (the inner, often colorful part).
* Petal Width: the width of the iris flower's petal.

## Classification of variables

*Sepal Length*:  
* Variable Type: Continuous
* Scale of Measurement: Ratio
* Python Data Type: Float

*Sepal Width*:   
* Variable Type: Continuous
* Scale of Measurement: Ratio
* Python Data Type: Float

*Petal Length*:   
* Variable Type: Continuous
* Scale of Measurement: Ratio
* Python Data Type: Float

*Petal Width*:   
* Variable Type: Continuous
* Scale of Measurement: Ratio
* Python Data Type: Float

These variables are continuous and measured on a ratio scale: they have a true zero point (0 cm means no length or width at all), so both differences and ratios are meaningful. For example, a petal 4 cm long is twice as long as one 2 cm long.

Additionally, there is a categorical variable, which is the species of the iris flower:

*Species*:
* Variable Type: Categorical
* Scale of Measurement: Nominal
* Python Data Type: String or Categorical Variable

The "species" variable represents the class label or target variable in classification tasks, and it is not on a continuous scale like the other four variables. Instead, it represents categories or groups, and you can use Python's string or categorical data types to represent it.
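
The classification above can be checked directly in pandas. The following is a minimal sketch, assuming a small hand-built DataFrame named `sample` (three illustrative rows, one per species, not the full dataset) with column names chosen for this example. It shows the four measurements stored as floats and the species stored as an unordered categorical.

```python
import pandas as pd

# Three illustrative rows (one per species); not the full 150-row dataset.
sample = pd.DataFrame(
    {
        "Sepal Length (cm)": [5.1, 7.0, 6.3],
        "Sepal Width (cm)": [3.5, 3.2, 3.3],
        "Petal Length (cm)": [1.4, 4.7, 6.0],
        "Petal Width (cm)": [0.2, 1.4, 2.5],
        "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    }
)

# Nominal variable: convert to a pandas categorical (unordered by default).
sample["Species"] = sample["Species"].astype("category")

# Ratio scale: dividing one measurement by another gives a meaningful number.
length_to_width = sample["Petal Length (cm)"] / sample["Petal Width (cm)"]

print(sample.dtypes)
print(length_to_width)
```

Keeping the species as a categorical rather than a plain string makes the nominal scale explicit: the labels can be compared for equality and counted, but they have no order and no arithmetic.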


## Summary Statistics

The variables include sepal length, sepal width, petal length, and petal width. They are all measured in centimeters. The statistics below can be used to describe the data:

* Mean: The mean sepal/petal length can be calculated to provide the average length of the sepals/petals in the dataset. It gives a sense of the central tendency of this variable.
* Median: The median sepal/petal length is the middle value when the data is sorted. It is less sensitive to extreme values than the mean, so a large gap between the two can point to outliers or a skewed distribution.
* Standard Deviation: The standard deviation measures the dispersion or spread of sepal/petal lengths, indicating how much individual data points deviate from the mean.
* Minimum and Maximum: Providing the minimum and maximum values helps identify the range of sepal/petal lengths in the dataset.
* Percentiles (e.g., 25th and 75th Percentiles): These can help describe the spread and distribution of sepal/petal lengths.
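
To make these definitions concrete, here is a small worked sketch using only Python's standard `statistics` module. The five values in `lengths` are hypothetical petal lengths chosen so the results are easy to check by hand, and the `with_typo` list is an invented data-entry error used to show how the mean and median react differently.

```python
import statistics

# Hypothetical petal lengths in cm, chosen for easy hand-checking.
lengths = [1.4, 1.5, 4.7, 5.1, 6.0]

mean = statistics.mean(lengths)
median = statistics.median(lengths)
std_dev = statistics.stdev(lengths)  # sample standard deviation (divides by n - 1)
minimum, maximum = min(lengths), max(lengths)
# "inclusive" interpolates linearly between the sorted data points.
q1, q2, q3 = statistics.quantiles(lengths, n=4, method="inclusive")

# Invented data-entry error: 60.0 typed instead of 6.0.
with_typo = [1.4, 1.5, 4.7, 5.1, 60.0]
mean_typo = statistics.mean(with_typo)
median_typo = statistics.median(with_typo)

print(f"Mean: {mean:.2f}, Median: {median}, Std dev: {std_dev:.2f}")
print(f"Min: {minimum}, Max: {maximum}, Quartiles: {q1}, {q2}, {q3}")
print(f"With typo -> Mean: {mean_typo:.2f}, Median: {median_typo}")
```

The single mistyped value pulls the mean far upward while the median does not move, which is why comparing the two is a quick first check for outliers.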

```python
import numpy as np
import pandas as pd

# Data file from: https://archive.ics.uci.edu/ml/datasets/iris
# Load the iris data file with pandas; the file has no header row,
# so column headings are supplied explicitly.
data = pd.read_csv(
    "data/iris.csv",
    sep=",",
    header=None,
    names=["Sepal Length (cm)", "Sepal Width (cm)", "Petal Length (cm)", "Petal Width (cm)", "Species"],
)

# List of numeric columns to summarize
columns = ["Sepal Length (cm)", "Sepal Width (cm)", "Petal Length (cm)", "Petal Width (cm)"]

for column in columns:
    # Calculate statistics for the current column
    mean = data[column].mean()
    median = data[column].median()
    mode = data[column].mode().values[0]  # first mode if there are ties
    std_dev = data[column].std()  # sample standard deviation (ddof=1)
    variance = data[column].var()  # sample variance (ddof=1)
    minimum = data[column].min()
    maximum = data[column].max()
    data_range = maximum - minimum
    # 25th, 50th and 75th percentiles (linear interpolation by default)
    q1, q2, q3 = np.percentile(data[column], [25, 50, 75])
    iqr = q3 - q1

    print(f"Statistics for {column}:")
    print(f"Mean: {mean}")
    print(f"Median: {median}")
    print(f"Mode: {mode}")
    print(f"Standard Deviation: {std_dev}")
    print(f"Variance: {variance}")
    print(f"Minimum: {minimum}")
    print(f"Maximum: {maximum}")
    print(f"Range: {data_range}")
    print(f"Q1 (25th percentile): {q1}")
    print(f"Q2 (50th percentile, Median): {q2}")
    print(f"Q3 (75th percentile): {q3}")
    print(f"IQR (Interquartile Range): {iqr}")
```
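
As a more compact alternative to computing each statistic by hand, pandas' `DataFrame.describe()` returns the count, mean, standard deviation, minimum, quartiles, and maximum in one table. The sketch below assumes a small hand-built DataFrame named `sample` (six illustrative rows and two columns, not the full dataset) so that it runs without the data file. The extra `iqr` and `range` rows are additions made for this example.

```python
import pandas as pd

# Six illustrative rows, not the full dataset.
sample = pd.DataFrame(
    {
        "Sepal Length (cm)": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "Petal Length (cm)": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    }
)

# One row per statistic, one column per variable.
summary = sample.describe()

# Derived rows that describe() does not provide on its own.
summary.loc["iqr"] = summary.loc["75%"] - summary.loc["25%"]
summary.loc["range"] = summary.loc["max"] - summary.loc["min"]

print(summary)
```

The loop above remains useful when the mode or variance is needed, since `describe()` does not report either for numeric columns.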


Statistics for Sepal Length (cm):
Mean: 5.843333333333335
Median: 5.8
Mode: 5.0
Standard Deviation: 0.8280661279778629
Variance: 0.6856935123042505
Minimum: 4.3
Maximum: 7.9
Range: 3.6000000000000005
Q1 (25th percentile): 5.1
Q2 (50th percentile, Median): 5.8
Q3 (75th percentile): 6.4
IQR (Interquartile Range): 1.3000000000000007
Statistics for Sepal Width (cm):
Mean: 3.0540000000000007
Median: 3.0
Mode: 3.0
Standard Deviation: 0.4335943113621737
Variance: 0.18800402684563763
Minimum: 2.0
Maximum: 4.4
Range: 2.4000000000000004
Q1 (25th percentile): 2.8
Q2 (50th percentile, Median): 3.0
Q3 (75th percentile): 3.3
IQR (Interquartile Range): 0.5
Statistics for Petal Length (cm):
Mean: 3.7586666666666693
Median: 4.35
Mode: 1.5
Standard Deviation: 1.7644204199522617
Variance: 3.1131794183445156
Minimum: 1.0
Maximum: 6.9
Range: 5.9
Q1 (25th percentile): 1.6
Q2 (50th percentile, Median): 4.35
Q3 (75th percentile): 5.1
IQR (Interquartile Range): 3.4999999999999996
Statistics for Petal Width (cm