
## Problem: Classification (Predicting Species)

Goal: Predict the flower's species from its measurements using a linear model that estimates the probability of belonging to each class.

Key Library: Scikit-learn 
Algorithm: Logistic Regression (a generalized linear model for classification)
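To make "estimates the probability of belonging to each class" concrete, here is a from-scratch sketch in plain Python (not scikit-learn's implementation) of how a linear model turns one score per class into probabilities. The weights and biases are hypothetical numbers chosen for illustration, not fitted coefficients. `softmax` is the multinomial form; `ovr_probabilities` approximates how probabilities are reported for a one-vs-rest model such as the one the `liblinear` solver below fits: each class gets its own sigmoid, and the results are renormalized to sum to 1.

```python
import math

def linear_scores(x, weights, biases):
    """One linear score per class: z_k = w_k . x + b_k."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)]

def softmax(scores):
    """Multinomial form: exp(z_k) / sum_j exp(z_j)."""
    m = max(scores)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def ovr_probabilities(scores):
    """One-vs-rest: one sigmoid per class, renormalized to sum to 1."""
    raw = [sigmoid(s) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical coefficients for (sepal_length, sepal_width, petal_length, petal_width);
# illustrative numbers only, NOT the fitted model's coefficients.
weights = [
    [0.4, 1.4, -2.2, -1.0],   # Iris-setosa
    [0.4, -1.6, 0.6, -1.4],   # Iris-versicolor
    [-1.6, -1.4, 2.4, 2.4],   # Iris-virginica
]
biases = [0.3, 1.0, -1.2]
x = [5.1, 3.5, 1.4, 0.2]  # first row of IRIS.csv (an Iris-setosa)

scores = linear_scores(x, weights, biases)
p_multi = softmax(scores)
p_ovr = ovr_probabilities(scores)
print([round(p, 3) for p in p_multi])
print([round(p, 3) for p in p_ovr])
```

Both functions preserve the ranking of the scores, so the predicted class (the largest probability) is the same either way; only the probability values differ.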

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report


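# Load the data: four numeric measurement columns plus the 'species' label
file_name = pd.read_csv('IRIS.csv')
X = file_name.drop(columns=['species'])
y = file_name['species']

# Hold out 30% of rows for testing; stratify=y keeps the class proportions
# the same in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,
    random_state=42,
    stratify=y
)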

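# Train the classifier. The 'liblinear' solver handles the three classes
# one-vs-rest (one binary model per species) rather than with a single softmax model.
model_logreg = LogisticRegression(max_iter=200, solver='liblinear', random_state=42)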
model_logreg.fit(X_train, y_train)

# Predict
y_pred = model_logreg.predict(X_test)

# Evaluate and Report Metrics
accuracy = accuracy_score(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print("--- Logistic Regression Model Performance ---")
print(f"Overall Accuracy: {accuracy:.4f}\n")
print(class_report)

--- Logistic Regression Model Performance ---
Overall Accuracy: 0.9111

                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        15
Iris-versicolor       1.00      0.73      0.85        15
 Iris-virginica       0.79      1.00      0.88        15

       accuracy                           0.91        45
      macro avg       0.93      0.91      0.91        45
   weighted avg       0.93      0.91      0.91        45
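The per-class numbers in the report follow from simple counting. The confusion matrix below is reconstructed from the report above rather than produced by re-running the notebook: each class has support 15, versicolor recall 0.73 means 11 of 15 were correct, and virginica precision 0.79 means 15 of 19 virginica predictions were correct, so 4 versicolor flowers were predicted as virginica. The plain-Python sketch recomputes precision, recall, and F1 from those counts.

```python
# Confusion matrix reconstructed from the classification report above
# (rows = true species, columns = predicted species, same order as labels).
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
cm = [
    [15, 0, 0],   # all 15 setosa correct
    [0, 11, 4],   # 11 of 15 versicolor correct, 4 predicted as virginica
    [0, 0, 15],   # all 15 virginica correct
]

def per_class_metrics(cm):
    """Return (precision, recall, f1, support) for each class index."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum: predicted as k
        support_k = sum(cm[k])                         # row sum: truly k
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / support_k if support_k else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results.append((precision, recall, f1, support_k))
    return results

metrics = per_class_metrics(cm)
accuracy = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
for name, (p, r, f1, s) in zip(labels, metrics):
    print(f"{name:>16}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}  support={s}")
print(f"accuracy={accuracy:.4f}")  # 41 correct out of 45
```

The macro averages are plain means of the per-class values; for example, macro precision = (1.00 + 1.00 + 15/19) / 3 ≈ 0.93, which matches the report.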



## Problem: Regression (Predicting Sepal Length)

Goal: Predict a continuous numerical value (the sepal length) from the other measurements.

Key Library: Scikit-learn 
Algorithm: Linear Regression
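Linear Regression fits ordinary least squares: it chooses an intercept and one coefficient per feature that minimize the sum of squared residuals. A minimal plain-Python sketch solves the normal equations (X^T X) beta = X^T y by Gaussian elimination. The rows are synthetic, generated from known coefficients with no noise so the fit can be checked; the helper names are hypothetical. scikit-learn solves the same least-squares problem with a dedicated solver rather than forming X^T X explicitly, which is more robust when features are strongly correlated.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_ols(rows, y):
    """Return [intercept, coef_1, ..., coef_p] minimizing the squared error."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 column = intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

# Hypothetical (sepal_width, petal_length, petal_width)-style rows
rows = [
    (3.5, 1.4, 0.2),
    (3.0, 4.5, 1.5),
    (3.3, 6.0, 2.5),
    (2.8, 4.1, 1.3),
    (3.1, 5.5, 1.8),
    (3.4, 1.5, 0.4),
]
true_beta = [2.0, 0.5, 0.8, -0.3]          # intercept, then one coefficient per feature
y = [predict(true_beta, r) for r in rows]  # noise-free synthetic targets

beta = fit_ols(rows, y)
print([round(b, 4) for b in beta])
```

With noise-free targets the fit recovers the generating coefficients; on real data such as IRIS.csv the residuals are nonzero and the coefficients are only the best linear approximation.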

In [27]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

file_name = pd.read_csv('IRIS.csv')
print(file_name.head())
# 1. Define X and the new continuous target (y_reg)
X_reg = file_name[['sepal_width', 'petal_length', 'petal_width']]
y_reg = file_name['sepal_length'] # The continuous value we want to predict

# 2. Split Regression Data
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=879823
)

# 3. Instantiate and Train the Regressor
model_reg = LinearRegression()
model_reg.fit(X_reg_train, y_reg_train)

# 4. Predict and Evaluate
y_reg_pred = model_reg.predict(X_reg_test)
mse = mean_squared_error(y_reg_test, y_reg_pred)

print("Regression Problem (Predicting Sepal Length):")
print("Model: Linear Regression")
print(f"Mean Squared Error (MSE): {mse:.4f}")
# MSE is the average squared difference between predictions and actual values,
# expressed in squared units of sepal_length (lower is better).

   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
Regression Problem (Predicting Sepal Length):
Model: Linear Regression
Mean Squared Error (MSE): 0.0906
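Because MSE averages squared errors, it is in squared units of `sepal_length`; its square root (RMSE) is back in the target's own units. The sketch below computes MSE by hand on hypothetical values (not taken from the notebook) and converts the reported MSE: sqrt(0.0906) ≈ 0.301, a root-mean-square error of about 0.30 in the same units as `sepal_length`. The function name `mse_from_scratch` is hypothetical and chosen so it does not shadow scikit-learn's `mean_squared_error`.

```python
import math

def mse_from_scratch(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual vs. predicted sepal lengths (illustration only)
y_true = [5.1, 4.9, 6.3, 5.8]
y_pred = [5.0, 5.2, 6.1, 5.8]

mse = mse_from_scratch(y_true, y_pred)  # (0.01 + 0.09 + 0.04 + 0.0) / 4 = 0.035
rmse = math.sqrt(mse)

# Converting the MSE reported above back to the target's units
reported_rmse = math.sqrt(0.0906)
print(f"MSE={mse:.4f}, RMSE={rmse:.4f}, reported RMSE={reported_rmse:.3f}")
```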


## Problem: Clustering (Finding Groups)

Goal: Discover groups in the data without using the known labels (unsupervised learning).

Key Library: Scikit-learn
Algorithm: K-Means Clustering
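K-Means alternates two steps until the assignments stop changing: assign each point to its nearest center, then move each center to the mean of its assigned points (Lloyd's algorithm). The plain-Python sketch below runs that loop after z-score scaling in the style of `StandardScaler`. It starts from fixed centers, whereas scikit-learn's `KMeans` uses k-means++ seeding by default and, with `n_init=10`, keeps the best of 10 runs by inertia. The 2-D points and helper names are hypothetical.

```python
import math

def standardize(rows):
    """Z-score each column: (value - mean) / std, using the population std
    (ddof=0) as StandardScaler does; constant columns are only centered."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centers, max_iter=100):
    """Lloyd's algorithm from fixed starting centers; returns (labels, centers)."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous center if a cluster ends up empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical 2-D measurements forming two well-separated groups
raw = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.1], [8.0, 9.0], [8.2, 8.8], [7.9, 9.1]]
scaled = standardize(raw)
labels, centers = kmeans(scaled, initial_centers=[scaled[0], scaled[3]])
print(labels)
```

A poor choice of starting centers can leave Lloyd's algorithm in a local optimum, which is what k-means++ seeding and multiple restarts (`n_init`) are meant to guard against.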

In [26]:
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import adjusted_rand_score

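# Scale the data: K-Means relies on Euclidean distances, so without scaling the
# features with the largest ranges would dominate the cluster assignments
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # all four measurements (X from the classification cell)

# Instantiate and fit the clusterer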
model_kmeans = KMeans(n_clusters=3, random_state=482738792, n_init=10)
model_kmeans.fit(X_scaled)
cluster_labels = model_kmeans.labels_

# Evaluate Clustering Quality (Comparing found clusters to true species labels)
le = LabelEncoder()
true_labels_int = le.fit_transform(file_name['species'])

ari = adjusted_rand_score(true_labels_int, cluster_labels)

print("Clustering Problem (Finding 3 Groups):")
print("Model: K-Means Clustering")
print(f"Adjusted Rand Index (ARI): {ari:.4f}")

Clustering Problem (Finding 3 Groups):
Model: K-Means Clustering
Adjusted Rand Index (ARI): 0.6201
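The Adjusted Rand Index compares two labelings by counting pairs of points that are grouped together or kept apart in both, corrected for chance: 1.0 means identical partitions, random labelings score around 0, and negative values mean worse than chance. Because only pair agreement matters, it does not depend on which integer K-Means assigns to which cluster, which is why the cluster IDs can be compared directly with the `LabelEncoder` codes. A from-scratch sketch of the standard pair-counting formula (not scikit-learn's code):

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI:
    (index - expected) / (max_index - expected), where
      index     = sum over contingency cells n_ij of C(n_ij, 2)
      expected  = sum_i C(a_i, 2) * sum_j C(b_j, 2) / C(n, 2)
      max_index = (sum_i C(a_i, 2) + sum_j C(b_j, 2)) / 2
    with a_i and b_j the cluster sizes in each labeling."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    index = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Happens only when both labelings are one cluster or both are all singletons
        return 1.0
    return (index - expected) / (max_index - expected)

same_up_to_relabel = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
partial_match = adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])
opposite = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_up_to_relabel, partial_match, opposite)
```

On this scale, the notebook's ARI of 0.6201 sits well above the chance level of 0 but below perfect agreement (1.0) with the species labels.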
