## Libraries:

In [1]:
# 1. Importing all the libraries necessary for this assignment
import numpy as np
import pandas as pd
from scipy import stats

For this task, we will be using the famous Iris dataset, which includes measurements of 3 different species of Iris. Unfortunately, the download file was corrupted, viewed and saved in the wrong format, lost some data when transferred from MacOS to Windows, and someone accidentally spilled coffee on it. Fortunately, we believe the bulk of the data is still there. Our task will be to clean and transform the data (i.e., feature engineer) into a format that can be analyzed.

## Dataset:

First up, let's retrieve the data available from the damaged file.

In [2]:
# 2. Retrieving data from the damaged Excel file
print('-------------------------------- 2a ------------------------------')
df = pd.read_excel("messed_up_iris.xlsx")

# 2a. Showing the shape of the data
df.shape

-------------------------------- 2a ------------------------------


(150, 8)

In [3]:
# 2b. Showing the head of the data
print('-------------------------------- 2b ------------------------------')
df.head()

-------------------------------- 2b ------------------------------


Unnamed: 0.1,Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,color,origin
0,0,5.1,3.5,1.4,0.2,setosa,green,usa
1,1,4.9,3.0,1.4,0.2,setosa,yellow,usa
2,2,4.7,3.2,1.3,0.2,setosa,green,usa
3,3,4.6,3.1,1.5,0.2,setosa,orange,japan
4,4,5.0,3.6,1.4,0.2,setosa,blue,europe


In [4]:
# 2c. Removing the extra index column
print('-------------------------------- 2c ------------------------------')
df = df.set_index('Unnamed: 0')

df.head()

-------------------------------- 2c ------------------------------


Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,color,origin
0,5.1,3.5,1.4,0.2,setosa,green,usa
1,4.9,3.0,1.4,0.2,setosa,yellow,usa
2,4.7,3.2,1.3,0.2,setosa,green,usa
3,4.6,3.1,1.5,0.2,setosa,orange,japan
4,5.0,3.6,1.4,0.2,setosa,blue,europe


Now that we've retrieved the damaged Excel file, let's see what we can do to clean it up.

## Data cleaning:

We have a lot of missing data here. Let's start by removing the columns and rows that are missing more than 50% of their data.

In [5]:
# 3. Removing more than 50% missing data
# 3a. Removing columns with more than 50% missing data
print('-------------------------------- 3a ------------------------------')
threshold = len(df) * .50
df = df.dropna(thresh=threshold, axis=1)

df.shape

-------------------------------- 3a ------------------------------


(150, 6)

In [6]:
# 3b. Removing rows with more than 50% missing data
print('-------------------------------- 3b ------------------------------')
# We have 6 columns, so a row is missing more than 50% of its data when 4 or more
# values are missing; thresh=3 keeps every row with at least 3 non-missing values
df = df.dropna(thresh=df.shape[1]-3)

df.shape

-------------------------------- 3b ------------------------------


(145, 6)
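To make the `thresh` semantics concrete, here is a minimal sketch on a made-up four-row frame (the data and the names `toy`, `cols_kept`, and `rows_kept` are hypothetical, not taken from the damaged file). `thresh` is the minimum number of non-missing values needed to keep a column or row, so rounding half the length up gives a "drop if more than 50% is missing" rule.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "c" and row 3 are mostly missing.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, np.nan],
    "b": [1.0, np.nan, 3.0, np.nan],
    "c": [np.nan, np.nan, np.nan, 4.0],
    "d": [1.0, 2.0, 3.0, np.nan],
})

# Keep columns that have at least half of their values present.
col_thresh = int(np.ceil(len(toy) * 0.5))
cols_kept = toy.dropna(axis=1, thresh=col_thresh)

# Keep rows that have at least half of the remaining columns present.
row_thresh = int(np.ceil(cols_kept.shape[1] * 0.5))
rows_kept = cols_kept.dropna(axis=0, thresh=row_thresh)

print(cols_kept.columns.tolist())  # ['a', 'b', 'd']
print(rows_kept.index.tolist())    # [0, 1, 2]
```

Column `b` is exactly half missing and survives, which matches the "more than 50%" wording.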

Next, we'll remove duplicate data, if there is any. 

In [7]:
# 4. Removing all possible duplicate rows based on all columns
print('--------------------------------- 4 ------------------------------')
df = df.drop_duplicates()

df.shape

--------------------------------- 4 ------------------------------


(140, 6)
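Before dropping duplicates, it can help to count them first so the change in shape is not a surprise. A small sketch on made-up rows (the frame `toy` is hypothetical):

```python
import pandas as pd

# Hypothetical toy data: row 2 repeats row 0 exactly.
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 5.1, 4.9],
    "species": ["setosa", "setosa", "setosa", "virginica"],
})

# duplicated() flags every repeat of an earlier row (the first copy is not flagged).
n_duplicates = int(toy.duplicated().sum())
deduped = toy.drop_duplicates()

print(n_duplicates)   # 1
print(deduped.shape)  # (3, 2)
```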

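Great! We'll go ahead and dummy code the data. Note that the call below dummy codes the four numeric measurement columns as well as the categorical ones, which is why the result has one column per distinct value.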

In [8]:
# 5. Dummy coding the categorical data
print('--------------------------------- 5 ------------------------------')
df = pd.get_dummies(df, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species', 'origin'], drop_first=True, prefix=['SepalL', 'SepalW', 'PetalL', 'PetalW', 'Species', 'Origin'])

df.head()

--------------------------------- 5 ------------------------------


Unnamed: 0_level_0,SepalL_4.4,SepalL_4.5,SepalL_4.6,SepalL_4.7,SepalL_4.8,SepalL_4.9,SepalL_5.0,SepalL_5.1,SepalL_5.2,SepalL_5.3,SepalL_5.4,SepalL_5.5,SepalL_5.6,SepalL_5.7,SepalL_5.8,SepalL_5.9,SepalL_6.0,SepalL_6.1,SepalL_6.2,SepalL_6.3,SepalL_6.4,SepalL_6.5,SepalL_6.6,SepalL_6.7,SepalL_6.8,SepalL_6.9,SepalL_7.0,SepalL_7.1,SepalL_7.2,SepalL_7.3,SepalL_7.4,SepalL_7.6,SepalL_7.7,SepalL_7.9,SepalL_51.0,SepalL_55.0,SepalL_69.0,SepalL_77.0,SepalW_2.2,SepalW_2.3,...,PetalL_6.3,PetalL_6.4,PetalL_6.7,PetalL_6.9,PetalL_51.0,PetalL_66.0,PetalL_67.0,PetalW_0.2,PetalW_0.3,PetalW_0.4,PetalW_0.5,PetalW_0.6,PetalW_1.0,PetalW_1.1,PetalW_1.2,PetalW_1.3,PetalW_1.4,PetalW_1.5,PetalW_1.6,PetalW_1.8,PetalW_1.9,PetalW_2.0,PetalW_2.1,PetalW_2.2,PetalW_2.3,PetalW_2.4,PetalW_2.5,PetalW_19.0,PetalW_24.0,PetalW_30.0,Species_setosa,Species_versicolor,Species_versicolr,Species_virginia,Species_virginica,Species_west virginia,Origin_europe,Origin_japan,Origin_uas,Origin_usa
0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1
1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1
2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1
3,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0
4,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0
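The column names above show misspelled labels such as `Species_versicolr`, `Species_virginia`, and `Origin_uas`. One possible alternative, sketched here on made-up rows, is to normalize those labels first and then dummy code only the categorical columns. The correction mappings are assumptions about what the labels were meant to be, and the frame `toy` is hypothetical.

```python
import pandas as pd

# Hypothetical rows that mimic the misspellings visible in the column names.
toy = pd.DataFrame({
    "sepal_length": [5.1, 6.4, 6.3, 5.8],
    "species": ["setosa", "versicolr", "virginia", "west virginia"],
    "origin": ["usa", "uas", "japan", "europe"],
})

# Assumed corrections: a guess based on the three standard Iris species names.
species_fixes = {
    "versicolr": "versicolor",
    "virginia": "virginica",
    "west virginia": "virginica",
}
origin_fixes = {"uas": "usa"}

toy["species"] = toy["species"].replace(species_fixes)
toy["origin"] = toy["origin"].replace(origin_fixes)

# Dummy code only the categorical columns; numeric measurements stay numeric.
encoded = pd.get_dummies(toy, columns=["species", "origin"],
                         prefix=["Species", "Origin"])
print(encoded.columns.tolist())
```

Leaving the measurements numeric keeps them usable for later steps such as z-scores and median imputation.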


Now we will drop the redundant columns from the dataframe, meaning any dummy coded column whose values exactly duplicate an earlier column.

In [9]:
# 6. Dropping redundant columns
print('--------------------------------- 6 ------------------------------')
df = df.T.drop_duplicates().T

df.head()

--------------------------------- 6 ------------------------------


Unnamed: 0_level_0,SepalL_4.4,SepalL_4.5,SepalL_4.6,SepalL_4.7,SepalL_4.8,SepalL_4.9,SepalL_5.0,SepalL_5.1,SepalL_5.2,SepalL_5.3,SepalL_5.4,SepalL_5.5,SepalL_5.6,SepalL_5.7,SepalL_5.8,SepalL_5.9,SepalL_6.0,SepalL_6.1,SepalL_6.2,SepalL_6.3,SepalL_6.4,SepalL_6.5,SepalL_6.6,SepalL_6.7,SepalL_6.8,SepalL_6.9,SepalL_7.0,SepalL_7.1,SepalL_7.2,SepalL_7.3,SepalL_7.4,SepalL_7.6,SepalL_7.7,SepalL_7.9,SepalL_51.0,SepalL_55.0,SepalL_69.0,SepalL_77.0,SepalW_2.2,SepalW_2.3,...,PetalL_5.6,PetalL_5.7,PetalL_5.8,PetalL_5.9,PetalL_6.0,PetalL_6.1,PetalL_6.9,PetalL_51.0,PetalL_67.0,PetalW_0.2,PetalW_0.3,PetalW_0.4,PetalW_0.6,PetalW_1.0,PetalW_1.1,PetalW_1.2,PetalW_1.3,PetalW_1.4,PetalW_1.5,PetalW_1.6,PetalW_1.8,PetalW_1.9,PetalW_2.0,PetalW_2.1,PetalW_2.2,PetalW_2.3,PetalW_2.4,PetalW_2.5,PetalW_19.0,PetalW_24.0,PetalW_30.0,Species_setosa,Species_versicolor,Species_versicolr,Species_virginia,Species_virginica,Origin_europe,Origin_japan,Origin_uas,Origin_usa
0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1
1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1
2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1
3,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0
4,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0
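To see what the transpose trick does, here is a small sketch with made-up indicator values (the values are hypothetical and only the column names echo the output above). Transposing turns columns into rows, so `drop_duplicates` removes any column whose values exactly match an earlier column.

```python
import pandas as pd

# Hypothetical indicator columns: the first two hold identical values.
toy = pd.DataFrame({
    "SepalL_51.0": [0, 1, 0],
    "PetalL_66.0": [0, 1, 0],
    "Origin_usa": [1, 0, 1],
})

# After transposing, each original column is a row; the first copy is kept.
deduped = toy.T.drop_duplicates().T
print(deduped.columns.tolist())  # ['SepalL_51.0', 'Origin_usa']
```

Two columns with identical values may still describe different features, so it is worth checking which columns disappear.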


Time to remove outliers that lie more than 2 standard deviations from the mean.

In [14]:
# 7. Removing outliers above 2 standard deviations
print('--------------------------------- 7 ------------------------------')
# I had some difficulty figuring out how to remove the outliers using z-scores;
# the attempt below fails because df has no column named 'data'
df['z_score']=stats.zscore(df['data'])

--------------------------------- 7 ------------------------------


KeyError: ignored
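For reference, here is one way the z-score filter could look. This is only a sketch on made-up measurements, and it assumes the filter runs on the raw numeric columns (before dummy coding), since z-scores of 0/1 indicator columns are not meaningful for this purpose. The frame `toy` and the value 51.0 are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical measurements; 51.0 mimics a value with a misplaced decimal point.
toy = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 5.0, 5.4, 4.6, 51.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.6, 3.9, 3.1, 3.4],
})

# Column-wise absolute z-scores (scipy's default uses the population std, ddof=0).
z = np.abs(stats.zscore(toy.to_numpy(), axis=0))

# Keep rows whose values all lie within 2 standard deviations of the column mean.
within_2_sd = (z <= 2).all(axis=1)
cleaned = toy[within_2_sd]

print(cleaned.shape)  # (6, 2)
```

If the columns still contain missing values, they would need to be imputed or skipped first, because a single NaN makes the whole column's z-scores NaN under scipy's default settings.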

Lastly, we'll replace any remaining missing values with the column median.

In [15]:
# 8. Replacing missing data with the median value
print('--------------------------------- 8 ------------------------------')
# fillna returns a new dataframe, so the result has to be assigned back
df = df.fillna(df.median())

df.head()

--------------------------------- 8 ------------------------------


Unnamed: 0_level_0,SepalL_4.4,SepalL_4.5,SepalL_4.6,SepalL_4.7,SepalL_4.8,SepalL_4.9,SepalL_5.0,SepalL_5.1,SepalL_5.2,SepalL_5.3,SepalL_5.4,SepalL_5.5,SepalL_5.6,SepalL_5.7,SepalL_5.8,SepalL_5.9,SepalL_6.0,SepalL_6.1,SepalL_6.2,SepalL_6.3,SepalL_6.4,SepalL_6.5,SepalL_6.6,SepalL_6.7,SepalL_6.8,SepalL_6.9,SepalL_7.0,SepalL_7.1,SepalL_7.2,SepalL_7.3,SepalL_7.4,SepalL_7.6,SepalL_7.7,SepalL_7.9,SepalL_51.0,SepalL_55.0,SepalL_69.0,SepalL_77.0,SepalW_2.2,SepalW_2.3,...,PetalL_5.6,PetalL_5.7,PetalL_5.8,PetalL_5.9,PetalL_6.0,PetalL_6.1,PetalL_6.9,PetalL_51.0,PetalL_67.0,PetalW_0.2,PetalW_0.3,PetalW_0.4,PetalW_0.6,PetalW_1.0,PetalW_1.1,PetalW_1.2,PetalW_1.3,PetalW_1.4,PetalW_1.5,PetalW_1.6,PetalW_1.8,PetalW_1.9,PetalW_2.0,PetalW_2.1,PetalW_2.2,PetalW_2.3,PetalW_2.4,PetalW_2.5,PetalW_19.0,PetalW_24.0,PetalW_30.0,Species_setosa,Species_versicolor,Species_versicolr,Species_virginia,Species_virginica,Origin_europe,Origin_japan,Origin_uas,Origin_usa
0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1
1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1
2,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1
3,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0
4,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0
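A quick sketch of median imputation on made-up values (the frame `toy` is hypothetical). It also shows that `fillna` leaves the original frame untouched unless the result is assigned.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column.
toy = pd.DataFrame({
    "petal_length": [1.4, np.nan, 1.3, 1.5],
    "petal_width": [0.2, 0.2, np.nan, 0.4],
})

# Each column's NaN is replaced with that column's own median.
filled = toy.fillna(toy.median(numeric_only=True))

print(int(toy.isna().sum().sum()))     # 2, the original is unchanged
print(int(filled.isna().sum().sum()))  # 0
```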


## Final dataset:

Here is our new dataset.

In [16]:
# 10. Printing our new dataset
print('-------------------------------- 10 ------------------------------')
print(df)

-------------------------------- 10 ------------------------------
            SepalL_4.4  SepalL_4.5  ...  Origin_uas  Origin_usa
Unnamed: 0                          ...                        
0                    0           0  ...           0           1
1                    0           0  ...           0           1
2                    0           0  ...           0           1
3                    0           0  ...           0           0
4                    0           0  ...           0           0
...                ...         ...  ...         ...         ...
145                  0           0  ...           0           0
146                  0           0  ...           0           0
147                  0           0  ...           0           0
148                  0           0  ...           0           0
149                  0           0  ...           0           1

[140 rows x 130 columns]


In [17]:
df.describe()

Unnamed: 0,SepalL_4.4,SepalL_4.5,SepalL_4.6,SepalL_4.7,SepalL_4.8,SepalL_4.9,SepalL_5.0,SepalL_5.1,SepalL_5.2,SepalL_5.3,SepalL_5.4,SepalL_5.5,SepalL_5.6,SepalL_5.7,SepalL_5.8,SepalL_5.9,SepalL_6.0,SepalL_6.1,SepalL_6.2,SepalL_6.3,SepalL_6.4,SepalL_6.5,SepalL_6.6,SepalL_6.7,SepalL_6.8,SepalL_6.9,SepalL_7.0,SepalL_7.1,SepalL_7.2,SepalL_7.3,SepalL_7.4,SepalL_7.6,SepalL_7.7,SepalL_7.9,SepalL_51.0,SepalL_55.0,SepalL_69.0,SepalL_77.0,SepalW_2.2,SepalW_2.3,...,PetalL_5.6,PetalL_5.7,PetalL_5.8,PetalL_5.9,PetalL_6.0,PetalL_6.1,PetalL_6.9,PetalL_51.0,PetalL_67.0,PetalW_0.2,PetalW_0.3,PetalW_0.4,PetalW_0.6,PetalW_1.0,PetalW_1.1,PetalW_1.2,PetalW_1.3,PetalW_1.4,PetalW_1.5,PetalW_1.6,PetalW_1.8,PetalW_1.9,PetalW_2.0,PetalW_2.1,PetalW_2.2,PetalW_2.3,PetalW_2.4,PetalW_2.5,PetalW_19.0,PetalW_24.0,PetalW_30.0,Species_setosa,Species_versicolor,Species_versicolr,Species_virginia,Species_virginica,Origin_europe,Origin_japan,Origin_uas,Origin_usa
count,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,...,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0,140.0
mean,0.014286,0.007143,0.028571,0.007143,0.021429,0.035714,0.057143,0.064286,0.014286,0.007143,0.035714,0.042857,0.028571,0.057143,0.042857,0.021429,0.035714,0.035714,0.014286,0.057143,0.05,0.035714,0.014286,0.05,0.021429,0.021429,0.007143,0.007143,0.021429,0.007143,0.007143,0.007143,0.021429,0.007143,0.007143,0.007143,0.007143,0.007143,0.021429,0.021429,...,0.042857,0.014286,0.021429,0.014286,0.014286,0.021429,0.007143,0.007143,0.007143,0.178571,0.042857,0.042857,0.007143,0.05,0.021429,0.028571,0.071429,0.05,0.078571,0.028571,0.085714,0.028571,0.042857,0.042857,0.021429,0.057143,0.014286,0.021429,0.007143,0.007143,0.007143,0.328571,0.3,0.007143,0.007143,0.335714,0.328571,0.3,0.014286,0.35
std,0.119092,0.084515,0.167197,0.084515,0.145328,0.186243,0.232949,0.246142,0.119092,0.084515,0.186243,0.203262,0.167197,0.232949,0.203262,0.145328,0.186243,0.186243,0.119092,0.232949,0.218728,0.186243,0.119092,0.218728,0.145328,0.145328,0.084515,0.084515,0.145328,0.084515,0.084515,0.084515,0.145328,0.084515,0.084515,0.084515,0.084515,0.084515,0.145328,0.145328,...,0.203262,0.119092,0.145328,0.119092,0.119092,0.145328,0.084515,0.084515,0.084515,0.384368,0.203262,0.203262,0.084515,0.218728,0.145328,0.167197,0.258464,0.218728,0.270035,0.167197,0.280947,0.167197,0.203262,0.203262,0.145328,0.232949,0.119092,0.145328,0.084515,0.084515,0.084515,0.47138,0.459903,0.084515,0.084515,0.473935,0.47138,0.459903,0.119092,0.478682
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,0.0,1.0,1.0,1.0,0.0,1.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
