<a href="https://colab.research.google.com/github/jatingahlyan/Codsoft-datascience/blob/main/Iris_Flower_Classification_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import LabelEncoder

# --- 1. Load the Dataset ---
try:
    df = pd.read_csv('IRIS.csv')
    print("Dataset loaded successfully!")
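except FileNotFoundError:
    print("Error: 'IRIS.csv' not found. Please make sure the file is in the current working directory.")
    # Re-raise to stop with a clear traceback; calling exit() inside a notebook can shut down the kernel
    raise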

# --- 2. Preprocess the Data ---
# The Iris dataset has no missing values, so no imputation is needed.
# The only preprocessing step is converting the text species names into integer codes.
print("\n--- Data Preprocessing ---")

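# The target variable 'species' is categorical (text).
# We convert it to integer codes. This is optional (scikit-learn classifiers also accept
# string labels directly), but it keeps the model's outputs numeric.
# LabelEncoder sorts the class names alphabetically, so 'Iris-setosa' -> 0,
# 'Iris-versicolor' -> 1, and 'Iris-virginica' -> 2.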
le = LabelEncoder()
df['species_encoded'] = le.fit_transform(df['species'])

# We can create a mapping to remember which number corresponds to which species.
species_mapping = {i: s for i, s in enumerate(le.classes_)}
print("Species have been encoded:")
print(species_mapping)

print("\nProcessed data preview:")
print(df.head())


# --- 3. Define Features (X) and Target (y) ---
# X contains our features (the flower measurements).
# y contains our target (the encoded species).
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
target = 'species_encoded'

X = df[features]
y = df[target]


# --- 4. Split the Data into Training and Testing Sets ---
# We'll use 80% of the data for training and 20% for testing;
# random_state=42 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(f"\nData split into training ({len(X_train)} rows) and testing ({len(X_test)} rows) sets.")


# --- 5. Build and Train the Model ---
# We will use the K-Nearest Neighbors (KNN) algorithm.
# KNN stores the training data and labels a new point with the majority class
# among its k closest training points (Euclidean distance, by default).
print("\n--- Model Training ---")
# We'll use 3 neighbors.
model = KNeighborsClassifier(n_neighbors=3)

# Train the model
model.fit(X_train, y_train)
print("Model training complete!")


# --- 6. Evaluate the Model ---
print("\n--- Model Evaluation ---")
# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.4f}")

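# On this split the model classifies every test sample correctly (accuracy 1.0).
# Iris is small and its classes are largely well separated, so a perfect score on a single
# 30-sample test set is not unusual, but it should not be read as a guarantee for new flowers.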

print("\nConfusion Matrix:")
# Rows are the true classes and columns the predicted classes (encoded order 0, 1, 2).
# The diagonal counts correct predictions, so with 100% accuracy every count is on the diagonal.
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
# This gives us a detailed breakdown of the model's performance for each species.
print(classification_report(y_test, y_pred, target_names=le.classes_))


# --- 7. Demonstrate with a Prediction ---
# Let's predict the species of a new, hypothetical Iris flower.
print("\n--- Example Prediction ---")
# Measurements: sepal length=5.5, sepal width=2.4, petal length=3.7, petal width=1.1
hypothetical_flower = [[5.5, 2.4, 3.7, 1.1]]

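# Predict the class (0, 1, or 2). Passing a DataFrame with the training column names
# avoids scikit-learn's "X does not have valid feature names" warning.
predicted_class = model.predict(pd.DataFrame(hypothetical_flower, columns=features))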

# Convert the numerical prediction back to the species name
predicted_species = species_mapping[predicted_class[0]]

print(f"Measurements: {hypothetical_flower[0]}")
print(f"Predicted Species: {predicted_species}")

Dataset loaded successfully!

--- Data Preprocessing ---
Species have been encoded:
{0: 'Iris-setosa', 1: 'Iris-versicolor', 2: 'Iris-virginica'}

Processed data preview:
   sepal_length  sepal_width  petal_length  petal_width      species  \
0           5.1          3.5           1.4          0.2  Iris-setosa   
1           4.9          3.0           1.4          0.2  Iris-setosa   
2           4.7          3.2           1.3          0.2  Iris-setosa   
3           4.6          3.1           1.5          0.2  Iris-setosa   
4           5.0          3.6           1.4          0.2  Iris-setosa   

   species_encoded  
0                0  
1                0  
2                0  
3                0  
4                0  

Data split into training (120 rows) and testing (30 rows) sets.

--- Model Training ---
Model training complete!

--- Model Evaluation ---
Model Accuracy: 1.0000

Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Classification Report:
                 precision 

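The encoding step in the notebook relies on LabelEncoder storing the distinct labels in sorted order and using each label's position as its code. The sketch below reproduces that mapping with plain Python and checks it against scikit-learn, including the reverse lookup that the prediction step performs with species_mapping. The short label list is hypothetical, not the full species column of IRIS.csv.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of labels (not the full IRIS.csv column)
labels = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]

# Plain-Python equivalent: sort the distinct labels, then use each label's position as its code
classes = sorted(set(labels))
manual_codes = [classes.index(label) for label in labels]

le = LabelEncoder()
sklearn_codes = le.fit_transform(labels)

print(classes)                 # ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
print(manual_codes)            # [2, 0, 1, 0]
print(sklearn_codes.tolist())  # [2, 0, 1, 0]

# inverse_transform maps codes back to names, like species_mapping in the notebook
print(le.inverse_transform([0, 2]).tolist())  # ['Iris-setosa', 'Iris-virginica']
```

Because the order comes from sorting, the codes depend only on the set of names, not on the order in which rows appear in the file.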

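The KNN rule used in step 5 (majority vote among the k nearest training points) fits in a few lines. This is a from-scratch illustration, not scikit-learn's implementation: it uses plain Euclidean distance and an unweighted vote (matching KNeighborsClassifier's defaults of metric='minkowski' with p=2 and weights='uniform'), searches all points by brute force instead of using a tree, and runs on hypothetical toy data rather than IRIS.csv. The helper name knn_predict is made up for this example.

```python
from collections import Counter

import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=3):
    """Label each query point by majority vote among its k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in np.asarray(X_query, dtype=float):
        # Euclidean distance from the query point to every training point
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k smallest distances
        nearest = np.argsort(distances)[:k]
        # Majority vote among the neighbours' labels
        votes = Counter(y_train[nearest].tolist())
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Hypothetical toy data: two clusters plus one class-1 point inside the class-0 cluster
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [1.5, 1.5],
           [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_train = [0, 0, 0, 1, 1, 1, 1]
X_query = [[1.4, 1.4], [5.1, 5.0]]

manual = knn_predict(X_train, y_train, X_query, k=3)
reference = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train).predict(X_query)

print(manual)              # [0, 1]
print(reference.tolist())  # [0, 1]
```

With k=3 the query [1.4, 1.4] is labelled 0 even though its single closest point, [1.5, 1.5], is class 1; with k=1 the same query would be labelled 1. A larger k smooths over isolated points, while k=1 follows the nearest sample exactly.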
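The confusion matrix and accuracy printed in step 6 can be reproduced by counting (true, predicted) pairs. The sketch below uses hypothetical encoded labels with two deliberate mistakes, so some off-diagonal cells are non-zero; confusion_counts is an illustrative helper, not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def confusion_counts(y_true, y_pred, n_classes):
    """Count pairs: row = true class, column = predicted class."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[true_label, pred_label] += 1
    return matrix


# Hypothetical labels (encoded 0/1/2 as in the notebook) with two mistakes
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 2, 2, 1]

manual = confusion_counts(y_true, y_pred, n_classes=3)
print(manual)
# [[2 0 0]
#  [0 2 1]
#  [0 1 2]]
print(np.array_equal(manual, confusion_matrix(y_true, y_pred)))  # True

# Accuracy = correct predictions (the diagonal) / all predictions
manual_accuracy = np.trace(manual) / manual.sum()
print(manual_accuracy, accuracy_score(y_true, y_pred))  # 0.75 0.75
```

In the notebook's own output every test sample lands on the diagonal, which is why the matrix there has zeros everywhere else and the accuracy is 1.0.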