This project demonstrates data loading, exploration, analysis, and visualization in Python using pandas and matplotlib (with seaborn for styling).
The analysis uses the Iris dataset from `sklearn.datasets`. It is a classic dataset in machine learning and statistics, consisting of 150 samples, 50 from each of three species of Iris (Iris setosa, Iris versicolor, and Iris virginica). Four features, all in centimeters, were measured on each sample: sepal length, sepal width, petal length, and petal width.
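A minimal sketch of the loading step follows. It assumes the species labels are stored in a column named `species` (the name this README uses for grouping) and keeps scikit-learn's default lowercase feature names such as `'petal length (cm)'`; the actual `analysis.py` may build the DataFrame differently.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the Iris data bundled with scikit-learn (no download needed)
iris = load_iris()

# One column per feature; names come from iris.feature_names, e.g. 'sepal length (cm)'
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map integer class labels (0, 1, 2) to species names via NumPy fancy indexing
df["species"] = iris.target_names[iris.target]

print(df.shape)                      # (150, 5): 4 features + species
print(df["species"].value_counts())  # 50 samples per species
```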
- Loaded the Iris dataset using `sklearn.datasets.load_iris()` and converted it to a pandas DataFrame.
- Displayed the first 5 rows using `.head()`.
- Inspected the dataset structure and data types using `.info()`.
- Checked for missing values using `.isnull().sum()`, confirming that the Iris dataset has no missing values.
- Computed descriptive statistics for the numerical columns using `.describe()`.
- Grouped the dataset by 'species' and calculated the mean of each feature per species.
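The exploration and analysis steps above can be sketched as follows. This is an illustration, not the project's exact code; it assumes the same `species` column as the loading step and relies on `species` being the only non-numeric column, so `groupby(...).mean()` averages all four features.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]

print(df.head())   # first 5 rows
df.info()          # prints dtypes and non-null counts directly (returns None)

missing = df.isnull().sum()
print(missing)     # per-column count of missing values

print(df.describe())  # count, mean, std, min, quartiles, max for numeric columns

# Mean of each feature per species (species becomes the index)
species_means = df.groupby("species").mean()
print(species_means.round(2))
```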
Four types of visualizations were created to explore the dataset:
- Histogram: Shows the distribution of 'Petal Length (cm)'.
- Scatter Plot: Illustrates the relationship between 'Sepal Length (cm)' and 'Sepal Width (cm)', with points colored by 'species'.
- Bar Chart: Compares the average 'Petal Length (cm)' across different Iris species.
- Line Chart: Displays the mean of each feature across the three species, providing a comparative view of feature measurements.
All plots are saved in the `plots/` directory.
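The four plots described above can be produced with a sketch like the one below. The output file names are hypothetical, and the axis labels are written in title case to match this README while the DataFrame keeps scikit-learn's lowercase column names; the project's actual names and styling may differ. Seaborn is used only to set the theme, mirroring its "styling" role here, while matplotlib draws the charts.

```python
import os

import matplotlib
matplotlib.use("Agg")  # render straight to files; no display required
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]
means = df.groupby("species").mean()  # per-species feature means

sns.set_theme(style="whitegrid")  # seaborn only sets the visual style
os.makedirs("plots", exist_ok=True)

# 1. Histogram: distribution of petal length
fig, ax = plt.subplots()
ax.hist(df["petal length (cm)"], bins=20, edgecolor="black")
ax.set(title="Distribution of Petal Length", xlabel="Petal Length (cm)", ylabel="Count")
fig.savefig("plots/histogram_petal_length.png")  # hypothetical file name
plt.close(fig)

# 2. Scatter plot: sepal length vs. sepal width, one color per species
fig, ax = plt.subplots()
for name, group in df.groupby("species"):
    ax.scatter(group["sepal length (cm)"], group["sepal width (cm)"], label=name)
ax.set(title="Sepal Length vs. Sepal Width",
       xlabel="Sepal Length (cm)", ylabel="Sepal Width (cm)")
ax.legend(title="Species")
fig.savefig("plots/scatter_sepal.png")
plt.close(fig)

# 3. Bar chart: average petal length per species
fig, ax = plt.subplots()
ax.bar(means.index, means["petal length (cm)"])
ax.set(title="Average Petal Length by Species", xlabel="Species", ylabel="Petal Length (cm)")
fig.savefig("plots/bar_avg_petal_length.png")
plt.close(fig)

# 4. Line chart: one line per feature, species along the x-axis
fig, ax = plt.subplots()
for feature in iris.feature_names:
    ax.plot(means.index, means[feature], marker="o", label=feature)
ax.set(title="Mean Feature Values by Species", xlabel="Species", ylabel="Mean (cm)")
ax.legend()
fig.savefig("plots/line_feature_means.png")
plt.close(fig)
```

Closing each figure after saving keeps memory use flat when a script generates many plots in a loop.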
- Ensure you have Python installed (preferably Python 3.8+).
- Navigate to the project directory.
- Create and activate a virtual environment (recommended): `python3 -m venv venv`, then `source venv/bin/activate`.
- Install the required libraries: `pip install pandas scikit-learn matplotlib seaborn`
- Run the script with `python3 analysis.py`. This prints the analysis output to the console and saves the plots in the `plots/` directory.
- Species Differentiation: Petal length and petal width are highly effective features for distinguishing between Iris species, especially for separating Setosa from the other two.
- Setosa Characteristics: Iris setosa has the smallest petals and the shortest sepals of the three species, although its sepals are the widest on average.
- Virginica Characteristics: Iris virginica typically has the longest sepals and the largest petals of the three species.
- Feature Overlap: There is some overlap in sepal measurements between Versicolor and Virginica, but petal measurements provide clearer separation.
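The separation and overlap findings can be checked numerically with a short sketch like this one. It compares per-species minimum and maximum values, which is a coarse range test rather than a full measure of overlap, and it assumes the same DataFrame layout as the earlier steps.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]

# Min/max of petal and sepal length for each species
ranges = df.groupby("species")[["petal length (cm)", "sepal length (cm)"]].agg(["min", "max"])
print(ranges)

# Setosa is separable by petal length alone if its longest petal is
# shorter than the shortest petal of the other two species
setosa_max = df.loc[df["species"] == "setosa", "petal length (cm)"].max()
others_min = df.loc[df["species"] != "setosa", "petal length (cm)"].min()
print("Setosa separable by petal length:", setosa_max < others_min)

# Versicolor and Virginica sepal-length ranges overlap if they intersect
vers = df[df["species"] == "versicolor"]
virg = df[df["species"] == "virginica"]
sepal_overlap = vers["sepal length (cm)"].max() >= virg["sepal length (cm)"].min()
print("Versicolor/Virginica sepal length ranges overlap:", sepal_overlap)
```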