## "Iris.csv" Outcomes Using pandas

## Task 1: Identifying and Imputing Missing Data
1. Count the missing values in each column to identify which columns need imputation.
2. Fill missing values:
   - **Numerical Columns:** Use the median of each column.
   - **Categorical Column (`species`):** Use the most frequent value (mode).


In [25]:
import pandas as pd

# Load the Iris dataset
file_path = "Iris.csv"
data = pd.read_csv(file_path)

# Rename columns for clarity and consistent naming
data.rename(columns={
    'SepalLengthCm': 'sepal_length',
    'SepalWidthCm': 'sepal_width',
    'PetalLengthCm': 'petal_length',
    'PetalWidthCm': 'petal_width',
    'Species': 'species'
}, inplace=True)

# Task 1: Identifying and Imputing Missing Data

# Locate Missing Data
print("Missing values in each column before imputation:")
print(data.isnull().sum())

# Handle Missing Data in Numerical Columns
for column in ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']:
    median_value = data[column].median()
    data[column] = data[column].fillna(median_value)

# Handle Missing Data in Categorical Columns
if data['species'].isnull().sum() > 0:
    mode_value = data['species'].mode()[0]
    data['species'] = data['species'].fillna(mode_value)





Missing values in each column before imputation:
Id              0
sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
species         0
dtype: int64
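
Every count in the output above is zero, so on this file the imputation step leaves the data unchanged. To see the same median and mode logic act on real gaps, here is a minimal sketch on a small hypothetical frame. The `toy` name and its values are invented for illustration and are not taken from `Iris.csv`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps; not taken from Iris.csv
toy = pd.DataFrame({
    "sepal_length": [5.0, np.nan, 6.0, 7.0],
    "species": ["setosa", None, "setosa", "virginica"],
})

# Numerical column: fill with the median of the observed values
toy["sepal_length"] = toy["sepal_length"].fillna(toy["sepal_length"].median())

# Categorical column: fill with the most frequent value (mode)
toy["species"] = toy["species"].fillna(toy["species"].mode()[0])

print(toy)
```

Both `median()` and `mode()` skip missing values by default, so the fill values are computed from the observed entries only.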


## Task 2: Data Integrity and Transformation
1. Remove duplicate rows to ensure only unique entries remain.
2. Create a new feature:
   - Calculate `sepal_area` and `petal_area` as length × width.
   - Add the two areas to form a new column `total_area`.
3. Drop rows with unresolved missing values.


In [26]:
# Task 2: Data Integrity and Transformation

# Remove Duplicate Records
# Note: all columns are compared, including Id, so only rows identical in every column are dropped
data.drop_duplicates(inplace=True)

# Feature Engineering: Creating total_area
data['sepal_area'] = data['sepal_length'] * data['sepal_width']
data['petal_area'] = data['petal_length'] * data['petal_width']
data['total_area'] = data['sepal_area'] + data['petal_area']

# Drop rows with any remaining missing values
data.dropna(inplace=True)
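
`drop_duplicates()` with no arguments compares every column, including `Id`. If `Id` is unique per row, as is typical for an identifier column, no row can match another and nothing is removed. The sketch below shows one possible alternative, assuming the intent is to treat rows with identical measurements and species as duplicates. The `toy` frame and the `feature_cols` name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: rows 1 and 2 differ only in Id
toy = pd.DataFrame({
    "Id": [1, 2, 3],
    "sepal_length": [5.1, 5.1, 4.9],
    "sepal_width": [3.5, 3.5, 3.0],
    "species": ["setosa", "setosa", "setosa"],
})

# Comparing all columns: the unique Id makes every row distinct
all_cols = toy.drop_duplicates()

# Comparing everything except Id: the second row is a duplicate of the first
feature_cols = [c for c in toy.columns if c != "Id"]
deduped = toy.drop_duplicates(subset=feature_cols)

print(len(all_cols), len(deduped))
```

Whether to ignore `Id` depends on the data: two flowers can legitimately share the same measurements, so dropping such rows is a modelling decision rather than a pure cleanup step.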


## Task 3: Aggregation and Data Transformation
1. Convert the categorical `species` column into integer codes using a mapping (codes are assigned in order of first appearance).
2. Group data by `species_num` and calculate the sum of each measurement column:
   - `sepal_length`, `sepal_width`, `petal_length`, `petal_width`.


In [27]:
# Task 3: Aggregation and Data Transformation

# Numerical Conversion of Categorical Data
species_mapping = {species: idx for idx, species in enumerate(data['species'].unique())}
data['species_num'] = data['species'].map(species_mapping)

# Apply Grouped Aggregation
grouped_data = data.groupby('species_num')[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].sum()
print("Grouped aggregation by species:")
print(grouped_data)


Grouped aggregation by species:
             sepal_length  sepal_width  petal_length  petal_width
species_num                                                      
0                   250.3        170.9          73.2         12.2
1                   296.8        138.5         213.0         66.3
2                   329.4        148.7         277.6        101.3
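
The grouped table is indexed by integer code, so the species behind each row is not visible. A minimal sketch of one way to recover the names is to invert the mapping and attach it to the aggregated result. The `toy` frame, its values, and the `code_to_name` name are hypothetical.

```python
import pandas as pd

# Hypothetical toy data; not taken from Iris.csv
toy = pd.DataFrame({
    "species": ["setosa", "versicolor", "setosa", "virginica"],
    "petal_length": [1.4, 4.7, 1.3, 6.0],
})

# Same mapping technique as above: codes follow order of first appearance
species_mapping = {name: idx for idx, name in enumerate(toy["species"].unique())}
toy["species_num"] = toy["species"].map(species_mapping)

# Invert the mapping so each code can be translated back to its name
code_to_name = {idx: name for name, idx in species_mapping.items()}

totals = toy.groupby("species_num")[["petal_length"]].sum()
totals["species"] = [code_to_name[code] for code in totals.index]

print(totals)
```

Because the codes depend on row order, the same species can receive a different code if the input is sorted differently. Keeping the inverted mapping next to the result avoids misreading the table.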


## Task 4: Data Reshaping
1. Reshape the dataset into a **long format**:
   - Stack the flower attributes (`sepal_length`, `sepal_width`, `petal_length`, `petal_width`, `total_area`) into a single column.
   - Create one new column for the attribute name and another for its value.


In [28]:
# Task 4: Data Reshaping

# Reshape Dataset into Long Format
long_format = pd.melt(data, id_vars=['species', 'species_num'], 
                      value_vars=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'total_area'],
                      var_name='attribute', value_name='value')

# Save the cleaned data and reshaped data for further inspection
data.to_csv("cleaned_iris.csv", index=False)
long_format.to_csv("long_format_iris.csv", index=False)

print("Cleaning and transformation complete. Check 'cleaned_iris.csv' and 'long_format_iris.csv' for results.")


Cleaning and transformation complete. Check 'cleaned_iris.csv' and 'long_format_iris.csv' for results.
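
The reshaped file is written to disk without being displayed, so the effect of `pd.melt` is easier to see on a small frame. The sketch below uses a hypothetical two-row `toy` frame with invented values and the same `var_name` and `value_name` arguments as the cell above.

```python
import pandas as pd

# Hypothetical wide-format toy data; not taken from Iris.csv
toy = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "petal_length": [1.4, 6.0],
})

# Each value column becomes a block of rows: one row per (original row, attribute)
long_toy = pd.melt(
    toy,
    id_vars=["species"],
    value_vars=["sepal_length", "petal_length"],
    var_name="attribute",
    value_name="value",
)

print(long_toy)
```

Two rows and two value columns yield four rows in long format. Columns listed in neither `id_vars` nor `value_vars` are dropped from the result, which is why `Id`, `sepal_area`, and `petal_area` do not appear in `long_format_iris.csv`.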
