Problem Statement: **Descriptive Statistics**

* Provide summary statistics (mean, median, min, max, standard deviation) for numeric variables, grouped by a categorical variable.
* Create a list of the numeric values for each level (category) of the categorical variable.
* Write a Python program to display basic statistical details (percentile, mean, standard deviation, etc.) for the species in the Iris dataset.
  * Focus on the species 'Iris-setosa', 'Iris-versicolor', and 'Iris-virginica'.
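
The second requirement above (a list of numeric values per category) can be sketched as follows. This is a minimal illustration, not the notebook's own code: the `sample` frame is a hypothetical four-row stand-in with the same column names as the Iris CSV, and `values_by_species` is a name introduced here.

```python
import pandas as pd

# Hypothetical four-row stand-in for the Iris data (illustration only).
sample = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 6.7, 6.3],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

# Group by the categorical column and collect each group's numeric values into a list.
values_by_species = sample.groupby("Species")["SepalLengthCm"].apply(list).to_dict()

for species, values in values_by_species.items():
    print(species, values)
```

On the full dataset the same expression would presumably be applied to `df` once per numeric column.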

### Import Required Libraries

In [1]:
import pandas as pd
import numpy as np

### Read CSV File

In [2]:
df = pd.read_csv("6_Iris.csv")

In [3]:
df.head()

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa


### Drop the 'Id' column as it is not needed

In [4]:
df.drop("Id", axis=1, inplace=True)

### Get the list of species

In [5]:
species_list = df['Species'].unique()

### Print statistical details

In [6]:
cat_col = "Species"
num_col = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']

In [7]:
for i in num_col:
    print("Mean of", i, ": ", df[i].mean() )

Mean of SepalLengthCm :  5.843333333333334
Mean of SepalWidthCm :  3.0540000000000003
Mean of PetalLengthCm :  3.758666666666666
Mean of PetalWidthCm :  1.1986666666666668


In [8]:
for i in num_col:
    print("Standard Deviation of", i, ": ", df[i].std() )

Standard Deviation of SepalLengthCm :  0.828066127977863
Standard Deviation of SepalWidthCm :  0.4335943113621737
Standard Deviation of PetalLengthCm :  1.7644204199522626
Standard Deviation of PetalWidthCm :  0.7631607417008411


In [10]:
for i in num_col:
    print("25th Percentile", i, ": ", df[i].quantile(0.25) )

25th Percentile SepalLengthCm :  5.1
25th Percentile SepalWidthCm :  2.8
25th Percentile PetalLengthCm :  1.6
25th Percentile PetalWidthCm :  0.3
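
The per-column loops above cover the mean, standard deviation, and 25th percentile. The median, minimum, and maximum named in the problem statement can be obtained in one call with `DataFrame.agg`. The sketch below assumes a small hypothetical `sample` frame rather than the full CSV, and `summary` is a name introduced here.

```python
import pandas as pd

# Hypothetical five-row stand-in with two of the numeric columns (illustration only).
sample = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 4.7, 6.7, 6.3],
    "PetalLengthCm": [1.4, 1.4, 1.3, 5.2, 5.0],
})

# One row per statistic, one column per numeric variable.
summary = sample.agg(["mean", "median", "min", "max", "std"])
print(summary)
```

With the notebook's data, the equivalent call would be `df[num_col].agg([...])`, which returns all statistics in a single table instead of one printed line per column.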


### Print basic statistical details for each species

In [11]:
for species in species_list:
    print(f"\nStatistics for: {species}")
    
    # Filter the dataframe for current species
    species_df = df[df['Species'] == species]
    
    # Describe gives count, mean, std, min, 25%, 50%, 75%, max
    stats = species_df.describe()
    
    # Display the statistics
    print(stats)


Statistics for: Iris-setosa
       SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count       50.00000     50.000000      50.000000      50.00000
mean         5.00600      3.418000       1.464000       0.24400
std          0.35249      0.381024       0.173511       0.10721
min          4.30000      2.300000       1.000000       0.10000
25%          4.80000      3.125000       1.400000       0.20000
50%          5.00000      3.400000       1.500000       0.20000
75%          5.20000      3.675000       1.575000       0.30000
max          5.80000      4.400000       1.900000       0.60000

Statistics for: Iris-versicolor
       SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count      50.000000     50.000000      50.000000     50.000000
mean        5.936000      2.770000       4.260000      1.326000
std         0.516171      0.313798       0.469911      0.197753
min         4.900000      2.000000       3.000000      1.000000
25%         5.600000      2.525000       4
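
As an alternative to filtering the dataframe once per species, the same grouped summary can be sketched with a single `groupby` followed by `agg`. This is an illustrative variant, not the notebook's method: `sample` is a hypothetical four-row stand-in and `grouped` is a name introduced here.

```python
import pandas as pd

# Hypothetical four-row stand-in for the Iris data (illustration only).
sample = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 6.7, 6.3],
    "PetalLengthCm": [1.4, 1.4, 5.2, 5.0],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-virginica", "Iris-virginica"],
})

# One row per species; columns form a (variable, statistic) MultiIndex.
grouped = sample.groupby("Species").agg(["mean", "median", "min", "max", "std"])
print(grouped)

# A single statistic is addressed with a (column, statistic) tuple.
setosa_mean = grouped.loc["Iris-setosa", ("SepalLengthCm", "mean")]
print("Setosa mean sepal length:", setosa_mean)
```

The result keeps every species in one table, which makes it easier to compare a statistic across groups than reading separate `describe()` printouts.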

### Categorical to Numerical Conversion

In [12]:
from sklearn.preprocessing import LabelEncoder

In [13]:
le = LabelEncoder()

In [14]:
df["Species"] = le.fit_transform(df["Species"])

In [15]:
df

     SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  Species
0              5.1           3.5            1.4           0.2        0
1              4.9           3.0            1.4           0.2        0
2              4.7           3.2            1.3           0.2        0
3              4.6           3.1            1.5           0.2        0
4              5.0           3.6            1.4           0.2        0
..             ...           ...            ...           ...      ...
145            6.7           3.0            5.2           2.3        2
146            6.3           2.5            5.0           1.9        2
147            6.5           3.0            5.2           2.0        2
148            6.2           3.4            5.4           2.3        2
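
After encoding, the integer codes no longer show which species they stand for. The sketch below shows one way to recover that mapping from a fitted `LabelEncoder` through its `classes_` attribute and `inverse_transform` method. The `species` series is a hypothetical four-value sample, and `mapping` and `decoded` are names introduced here.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of species labels (illustration only).
species = pd.Series(["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"])

le = LabelEncoder()
codes = le.fit_transform(species)

# classes_ holds the original labels in sorted order; the position is the code.
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)

# inverse_transform converts codes back to the original labels.
decoded = le.inverse_transform(codes)
print(list(decoded))
```

Because `LabelEncoder` sorts the labels before assigning codes, the three species in this dataset map to 0, 1, and 2 in alphabetical order.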
