Skip to content

jimmygitz3/python_wk7_assignment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 

Repository files navigation

🌿 Iris Dataset Analysis & Visualization A Journey Through Data Exploration, Insight, and Visual Storytelling

📌 Project Overview This project demonstrates a complete data analysis workflow using the classic Iris dataset. It walks through three essential stages:

  • Loading and Exploring the Dataset
  • Performing Basic Statistical Analysis
  • Creating Insightful Visualizations The goal is to uncover patterns, understand relationships, and present findings in a clear, engaging way—while practicing good coding habits like error handling, modularity, and plot customization.

🧠 Objectives

  • Load and clean a real-world dataset using pandas
  • Perform descriptive and grouped statistical analysis
  • Visualize key relationships and distributions using matplotlib and seaborn
  • Apply best practices in data science: clarity, reproducibility, and insight

📂 Dataset We use the Iris dataset, a well-known multivariate dataset introduced by Ronald Fisher. It contains 150 samples of iris flowers from three species (Setosa, Versicolor, Virginica), with four features:

  • Sepal Length
  • Sepal Width
  • Petal Length
  • Petal Width Each sample is labeled with its species.

🧪 Task 1: Load and Explore the Dataset

  • Load the dataset using pandas and sklearn.datasets.load_iris()
  • Display the first few rows using .head()
  • Check data types and identify missing values
  • Clean the dataset by dropping or filling missing entries
  • Handle file reading and data integrity errors using try-except blocks

📊 Task 2: Basic Data Analysis

  • Use .describe() to compute mean, median, standard deviation, and more
  • Group data by species and compute mean values for each numerical feature
  • Identify patterns such as:
  • Setosa has the smallest petal dimensions
  • Virginica shows the largest overall measurements
  • Strong correlation between sepal and petal lengths

📈 Task 3: Data Visualization We create four customized visualizations to bring the data to life: 1️⃣ Line Chart Simulated time-series of sepal length to show trends across sorted samples. 2️⃣ Bar Chart Average petal length per species for categorical comparison. 3️⃣ Histogram Distribution of sepal width to understand spread and skewness. 4️⃣ Scatter Plot Relationship between sepal length and petal length, colored by species. Each plot includes:

  • Clear titles
  • Labeled axes
  • Legends for context
  • Aesthetic styling using seaborn

🛠️ Technologies Used

  • Python 3.x
  • pandas
  • matplotlib
  • seaborn
  • scikit-learn

🧘 Ubuntu Principles in Practice

  • Community: Connects to open datasets and shares insights
  • Respect: Handles errors gracefully and documents each step
  • Sharing: Organizes visual outputs for easy interpretation
  • Practicality: Solves a real analytical challenge with clarity

📤 How to Run

  • Clone the repository
  • Install dependencies: pip install pandas matplotlib seaborn scikit-learn
  • Run the script in VS Code or Jupyter Notebook

📌 Final Reflection This project is more than just numbers and charts—it’s a demonstration of how data can be transformed into understanding. From loading raw values to crafting visual stories, each step reflects the power of curiosity, structure, and thoughtful design.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages