mokwathedeveloper/Pandas_Matplotlib_Analysis

Pandas Matplotlib Analysis

Project Purpose

This project demonstrates data loading, exploration, analysis, and visualization using pandas and matplotlib (with seaborn for styling) in Python.

Dataset Used

The Iris dataset from sklearn.datasets is used for this analysis. It is a classic dataset in machine learning and statistics, containing 150 samples: 50 from each of three species of Iris (Iris setosa, Iris versicolor, and Iris virginica). Four features were measured for each sample, all in centimetres: sepal length, sepal width, petal length, and petal width.

Analysis Tasks and Visualizations Created

Data Loading and Exploration

  • Loaded the Iris dataset using sklearn.datasets.load_iris() and converted it to a pandas DataFrame.
  • Displayed the first 5 rows using .head().
  • Inspected the dataset structure and data types using .info().
  • Checked for missing values using .isnull().sum(), confirming no missing values in the Iris dataset.
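
The loading and exploration steps above can be sketched as follows. This is a minimal illustration, not the contents of analysis.py: the helper name load_iris_frame is hypothetical, and the column names are the ones scikit-learn provides (for example 'petal length (cm)'), which may differ in capitalisation from the labels the script uses.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame():
    """Return the Iris data as a DataFrame with a 'species' column (hypothetical helper)."""
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    # iris.target holds integer codes 0..2; index into target_names to get species names.
    df["species"] = iris.target_names[iris.target]
    return df


if __name__ == "__main__":
    df = load_iris_frame()
    print(df.head())          # first 5 rows
    df.info()                 # structure and data types
    print(df.isnull().sum())  # missing values per column
```

Building the 'species' column from the integer targets keeps the later group-by and colouring steps readable.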

Basic Data Analysis

  • Computed descriptive statistics for numerical columns using .describe().
  • Grouped the dataset by 'species' and calculated the average measurements for each feature.
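
A short sketch of these two analysis steps, assuming the DataFrame is built as in scikit-learn's default layout with an added 'species' column. The function name summarize_iris is an assumption made for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris


def summarize_iris():
    """Return (descriptive statistics, per-species feature means)."""
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df["species"] = iris.target_names[iris.target]

    # count, mean, std, min, quartiles, and max for each numeric column
    summary = df.describe()
    # one row per species, one column per feature
    species_means = df.groupby("species").mean(numeric_only=True)
    return summary, species_means


if __name__ == "__main__":
    summary, species_means = summarize_iris()
    print(summary)
    print(species_means)
```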

Data Visualizations

Four different types of visualizations were created to understand the dataset better:

  1. Histogram: Shows the distribution of 'Petal Length (cm)'.
  2. Scatter Plot: Illustrates the relationship between 'Sepal Length (cm)' and 'Sepal Width (cm)', with points colored by 'species'.
  3. Bar Chart: Compares the average 'Petal Length (cm)' across different Iris species.
  4. Line Chart: Displays the mean of each feature across the three species, providing a comparative view of feature measurements.

All plots are saved in the plots/ directory.
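
The four chart types could be produced along these lines. This is a sketch under stated assumptions: the file names (histogram.png and so on) and the function make_plots are hypothetical, seaborn styling is left out to keep the example short, and the line chart draws one line per feature with species on the x-axis, which is one reading of the description above.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris


def make_plots(out_dir="plots"):
    """Create the four chart types and return the saved file paths."""
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df["species"] = iris.target_names[iris.target]
    means = df.groupby("species").mean(numeric_only=True)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []

    def save(fig, name):
        path = out / name
        fig.savefig(path, bbox_inches="tight")
        plt.close(fig)  # free the figure once it is on disk
        saved.append(path)

    # 1. Histogram of petal length
    fig, ax = plt.subplots()
    ax.hist(df["petal length (cm)"], bins=20)
    ax.set(title="Distribution of petal length", xlabel="petal length (cm)", ylabel="count")
    save(fig, "histogram.png")

    # 2. Scatter plot of sepal length vs. sepal width, one colour per species
    fig, ax = plt.subplots()
    for name, group in df.groupby("species"):
        ax.scatter(group["sepal length (cm)"], group["sepal width (cm)"], label=name)
    ax.set(title="Sepal length vs. sepal width", xlabel="sepal length (cm)", ylabel="sepal width (cm)")
    ax.legend(title="species")
    save(fig, "scatter.png")

    # 3. Bar chart of mean petal length per species
    fig, ax = plt.subplots()
    ax.bar(list(means.index), means["petal length (cm)"])
    ax.set(title="Mean petal length by species", ylabel="petal length (cm)")
    save(fig, "bar.png")

    # 4. Line chart: one line per feature, species on the x-axis
    fig, ax = plt.subplots()
    for feature in means.columns:
        ax.plot(list(means.index), means[feature], marker="o", label=feature)
    ax.set(title="Mean feature values by species", ylabel="cm")
    ax.legend()
    save(fig, "line.png")

    return saved


if __name__ == "__main__":
    for path in make_plots():
        print(f"saved {path}")
```

Creating the output directory with exist_ok=True means the function can be rerun without failing when plots/ already exists.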

How to Run the Script

  1. Ensure you have Python installed (preferably Python 3.8+).
  2. Navigate to the project directory.
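  3. Create and activate a virtual environment (recommended). The commands below are for Linux and macOS; on Windows, activate with venv\Scripts\activate instead:
    python3 -m venv venv
    source venv/bin/activate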
  4. Install the required libraries:
    pip install pandas scikit-learn matplotlib seaborn
  5. Run the script:
    python3 analysis.py
    This will generate the analysis output in the console and save the plots in the plots/ directory.
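
Before running the script, it can help to confirm that the four libraries from step 4 are importable in the active environment. The check below is an optional sketch, not part of the project; the function name missing_packages is hypothetical. Note that scikit-learn is installed under that name but imported as sklearn.

```python
import importlib.util

# pip package name -> top-level import name
REQUIRED = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```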

Key Insights or Findings

  • Species Differentiation: Petal length and petal width are highly effective features for distinguishing between Iris species, especially between Setosa and the other two.
  • Setosa Characteristics: Iris setosa has the shortest sepals and the smallest petals (in both length and width) of the three species, but on average the widest sepals.
  • Virginica Characteristics: Iris virginica has, on average, the longest sepals and the largest petals. Its sepals are narrower on average than those of Iris setosa.
  • Feature Overlap: There is some overlap in sepal measurements between Versicolor and Virginica, but petal measurements provide clearer separation.
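
These findings can be checked directly from per-species summaries. The sketch below is an illustration rather than part of analysis.py, and the function name petal_length_ranges is an assumption. It compares the range of petal length within each species: if the largest setosa value is below the smallest value of the other two species, petal length alone separates setosa from them.

```python
import pandas as pd
from sklearn.datasets import load_iris


def petal_length_ranges():
    """Return the min and max petal length for each species."""
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df["species"] = iris.target_names[iris.target]
    return df.groupby("species")["petal length (cm)"].agg(["min", "max"])


if __name__ == "__main__":
    ranges = petal_length_ranges()
    print(ranges)

    others_min = ranges.loc[["versicolor", "virginica"], "min"].min()
    print("setosa separable by petal length:", ranges.loc["setosa", "max"] < others_min)

    # Versicolor and virginica overlap when versicolor's max exceeds virginica's min.
    overlap = ranges.loc["versicolor", "max"] > ranges.loc["virginica", "min"]
    print("versicolor/virginica petal lengths overlap:", overlap)
```

The same group-by with a sepal column in place of 'petal length (cm)' shows how much more the sepal ranges overlap.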
