<a href="https://colab.research.google.com/github/rasiq-gulzar/Encryptix/blob/main/IRIS_FLOWER_CLASSIFICATION_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Import necessary libraries
import pandas as pd              # For data manipulation and analysis
import numpy as np               # For numerical operations
import seaborn as sns            # For statistical data visualization
import matplotlib.pyplot as plt  # For creating plots and visualizations
from sklearn.model_selection import train_test_split  # To split data into training and testing sets
from sklearn.preprocessing import LabelEncoder         # To encode categorical variables into numbers
from sklearn.impute import SimpleImputer               # To handle missing values in the dataset
from sklearn.linear_model import LogisticRegression    # The classification algorithm we'll use
from sklearn.metrics import accuracy_score, classification_report  # For evaluating model performance

In [None]:
# Load the iris dataset from a local CSV file
# If the default UTF-8 decoding fails, pass encoding='latin-1' or encoding='cp1252' to read_csv
df = pd.read_csv('IRIS.csv')

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [None]:
df.shape

(150, 5)

In [None]:
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [None]:
df['species'].value_counts()

species,count
Iris-setosa,50
Iris-versicolor,50
Iris-virginica,50
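
Since the three classes are perfectly balanced (50 each), it can be worth keeping that balance in the train/test split as well. The following is a minimal sketch, not part of the original notebook: it assumes scikit-learn's bundled iris data as a stand-in for `IRIS.csv`, and the names `X_demo`, `y_demo`, and `test_counts` are hypothetical.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled iris data stands in for IRIS.csv (same 150 rows, 3 classes)
iris = load_iris()
X_demo, y_demo = iris.data, iris.target

# stratify=y_demo asks train_test_split to keep class proportions equal in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Count how many test samples fall into each class
test_counts = Counter(y_te.tolist())
print(test_counts)
```

Without `stratify`, a random split may leave one class under-represented in the test set, which makes per-class metrics noisier.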


In [None]:
# Encode the species names as integers (1, 2, 3)
# Assigning the result back to the column avoids the chained inplace replace,
# which pandas warns will stop working in pandas 3.0
df['species'] = df['species'].map({
    'Iris-setosa': 1,
    'Iris-versicolor': 2,
    'Iris-virginica': 3,
})
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,1
1,4.9,3.0,1.4,0.2,1
2,4.7,3.2,1.3,0.2,1
3,4.6,3.1,1.5,0.2,1
4,5.0,3.6,1.4,0.2,1
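
Once the species are stored as integers, it helps to keep the mapping around so predictions can be turned back into names. A minimal sketch of that round trip, assuming the same 1/2/3 encoding used above; the names `species_to_code`, `code_to_species`, and the small demo Series are hypothetical and not part of the original notebook.

```python
import pandas as pd

# Assumption: same encoding as the notebook (setosa=1, versicolor=2, virginica=3)
species_to_code = {"Iris-setosa": 1, "Iris-versicolor": 2, "Iris-virginica": 3}
# Invert the dictionary to decode integer labels back to species names
code_to_species = {code: name for name, code in species_to_code.items()}

# Tiny illustrative Series instead of the full dataset
names = pd.Series(["Iris-setosa", "Iris-virginica", "Iris-versicolor"])
codes = names.map(species_to_code)      # encode names -> integers
decoded = codes.map(code_to_species)    # decode integers -> names

print(codes.tolist())
print(decoded.tolist())
```

The same `code_to_species` lookup could be applied to the array returned by `model.predict` to report a species name instead of a bare integer.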


In [None]:

df = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']]

# No imputation or extra encoding is needed for this dataset:
# df.info() shows 150 non-null values in every column, and the species column
# has already been converted to integers, so SimpleImputer and LabelEncoder are not used here

# Split the data into features (X) and target variable (y)
X = df.drop("species", axis=1)  # Features: the four sepal and petal measurements
y = df["species"]               # Target: the encoded species label (what we want to predict)

# Split the data into training set (80%) and testing set (20%)
# This allows us to train the model and then test it on unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# random_state=0 ensures reproducible results (same split every time the code runs)

# Create and train the logistic regression model
# scikit-learn's LogisticRegression also handles multiclass problems such as the three iris species
model = LogisticRegression()  # Initialize the logistic regression model
model.fit(X_train,y_train)   # Train the model using the training data

# Use the trained model to predict the species for the test set
y_pred = model.predict(X_test)  # Make predictions on the test data

# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)  # Calculate the percentage of correct predictions
print(f"Accuracy: {accuracy:.2f}")          # Print the accuracy with 2 decimal places

# Print a detailed classification report
# This shows precision, recall, f1-score and support for each of the three species classes
print(classification_report(y_test, y_pred))


# Create sample data with exactly matching feature names
sample_data = pd.DataFrame(
    [[5.1, 3.5, 1.4, 0.2]],  # Your values here
    columns=X.columns,       # Reuse the training column names so they match the fitted model
)



# Predict the species for the sample flower
sample_predictions = model.predict(sample_data)
print("Predicted species label for the sample flower:", sample_predictions)
# This shows the predicted label (1 = setosa, 2 = versicolor, 3 = virginica) for each row of the sample data

Accuracy: 1.00
              precision    recall  f1-score   support

           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00        13
           3       1.00      1.00      1.00         6

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Predicted species label for the sample flower: [1]
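
A perfect score on a 30-sample test set is encouraging but rests on a single small split. One hedged way to get a steadier estimate is k-fold cross-validation. The sketch below is not part of the original notebook: it assumes scikit-learn's bundled iris data in place of `IRIS.csv`, and it sets `max_iter=1000` as an assumption to give the solver more room to converge. The names `X_cv`, `y_cv`, `clf`, and `scores` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumption: the bundled iris data stands in for IRIS.csv
X_cv, y_cv = load_iris(return_X_y=True)

# Assumption: a higher max_iter than the default, to reduce the chance of convergence warnings
clf = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each sample is used for testing exactly once
scores = cross_val_score(clf, X_cv, y_cv, cv=5)

print(f"Fold accuracies: {scores.round(2)}")
print(f"Mean accuracy: {scores.mean():.2f}")
```

If the fold accuracies vary noticeably, the single-split accuracy above should be read as one sample from that range rather than a fixed property of the model.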


