# Logistic regression

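Logistic regression is a statistical method for binary classification, that is, for predicting an outcome that can take only one of two classes. It models the probability that an observation belongs to the positive class as a function of its features.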
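Here is a rough sketch of how the model turns features into a class. The pure-Python code below computes a weighted sum `z = sum(w_i * x_i) + b`. It then maps `z` into a probability of class 1 with the logistic (sigmoid) function `1 / (1 + exp(-z))`, and predicts class 1 when that probability exceeds 0.5. The weights `w` and bias `b` are hypothetical values chosen for illustration. A fitted model learns them from the training data.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the interval [0, 1]."""
    # Two branches avoid overflow in math.exp for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x: list[float], w: list[float], b: float) -> float:
    """Probability of class 1 for one sample, given weights w and bias b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def predict(x: list[float], w: list[float], b: float, threshold: float = 0.5) -> int:
    """Hard label: 1 if P(class 1) exceeds the threshold, else 0."""
    return 1 if predict_proba(x, w, b) > threshold else 0

# Hypothetical weights and bias, for illustration only
w = [0.5, -1.0]
b = 0.25

print(sigmoid(0.0))               # 0.5
print(predict([2.0, 0.5], w, b))  # z = 1.0 - 0.5 + 0.25 = 0.75  -> 1
print(predict([0.0, 1.0], w, b))  # z = 0.0 - 1.0 + 0.25 = -0.75 -> 0
```

Because the sigmoid is monotonic, thresholding the probability at 0.5 is the same as checking whether `z` is positive.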

In [80]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

In [93]:
# Load the Iris dataset
iris = load_iris()
print(iris.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
            - Iris-Setosa
            - Iris-Versicolour
            - Iris-Virginica

:Summary Statistics:

                Min  Max   Mean    SD   Class Correlation
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper.

In [82]:
dir(iris)

['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']

In [83]:
X, y = iris.data, iris.target

In [84]:
# Each class has exactly 50 samples and the rows are sorted by class,
# so the first 100 rows contain only classes 0 and 1
index = 100
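Slicing by position relies on the assumption stated in the comment above. A quick sanity check, sketched below, confirms it before the slice is used. It counts samples per class, checks that the labels are sorted, and lists which classes fall in the first 100 rows.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
y = iris.target

# Samples per class: the description above says 50 for each of the 3 classes
counts = np.bincount(y)
print(counts)

# Labels never decrease from one row to the next, i.e. rows are sorted by class
is_sorted = bool(np.all(np.diff(y) >= 0))
print(is_sorted)

# Which classes (and species names) appear in the first 100 rows
print(np.unique(y[:100]), iris.target_names[:2])
```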

In [85]:
# Preprocess the data for binary classification:
# keep only Setosa (class 0) and Versicolor (class 1)
X_binary = X[:index]
y_binary = y[:index]
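Positional slicing breaks if the rows are ever shuffled or reordered. A boolean mask on the labels is one alternative that does not depend on row order. The sketch below uses a small, deliberately shuffled toy label array as a hypothetical stand-in for `iris.target`.

```python
import numpy as np

# Hypothetical shuffled labels and a toy feature matrix (2 features per sample)
y = np.array([2, 0, 1, 1, 0, 2, 0, 1])
X = np.arange(16, dtype=float).reshape(8, 2)

# Keep every row whose label is 0 or 1, wherever it appears
mask = np.isin(y, [0, 1])
X_binary, y_binary = X[mask], y[mask]

print(y_binary)        # [0 1 1 0 0 1]
print(X_binary.shape)  # (6, 2)
```

On the real data, `X[y < 2]` and `y[y < 2]` select the same two classes as the slice above.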

In [86]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_binary, y_binary, test_size=0.2, random_state=42
)
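The random split above is not stratified. Judging by the confusion matrix at the end of this notebook, its test set happened to contain 12 Setosa and 8 Versicolor samples. Passing the labels to `train_test_split` through its `stratify` parameter keeps the class proportions equal in both splits. The sketch below shows this option; it is not part of the original workflow.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_binary, y_binary = X[y < 2], y[y < 2]

X_train, X_test, y_train, y_test = train_test_split(
    X_binary, y_binary, test_size=0.2, random_state=42, stratify=y_binary
)

# Both splits keep the 50/50 class ratio of y_binary
print(np.bincount(y_train), np.bincount(y_test))
```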

In [87]:
# Initialize the logistic regression model
model = LogisticRegression()

In [88]:
# Fit the model to the data
model.fit(X_train, y_train);
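Fitting stores the learned weights in `coef_` and the bias in `intercept_`. The sketch below refits on the same split and recomputes the class-1 probability by hand as `sigmoid(X @ w + b)`. It then checks that this matches `predict_proba` for a binary problem.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_binary, y_binary = X[y < 2], y[y < 2]
X_train, X_test, y_train, y_test = train_test_split(
    X_binary, y_binary, test_size=0.2, random_state=42
)
model = LogisticRegression().fit(X_train, y_train)

# One weight per feature and a single bias term in the binary case
print(model.coef_.shape, model.intercept_.shape)  # (1, 4) (1,)

# Recompute P(class 1) by hand: sigmoid of the linear score
z = X_test @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))
print(np.allclose(p_manual, model.predict_proba(X_test)[:, 1]))
```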

In [89]:
# Predict new values
y_pred = model.predict(X_test)
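`predict` returns hard labels, while `predict_proba` returns the underlying probabilities. The sketch below fits the same estimator on a hypothetical six-point, one-feature toy dataset, not the Iris data. It checks that `predict` agrees with thresholding the class-1 probability at 0.5. The 0.8 threshold is an arbitrary illustrative value.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: small x values are class 0, large x values are class 1
X_toy = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X_toy, y_toy)

proba = clf.predict_proba(X_toy)[:, 1]  # P(class 1) for each sample
labels = clf.predict(X_toy)

# predict() matches thresholding the class-1 probability at 0.5
print(np.array_equal(labels, (proba > 0.5).astype(int)))

# A stricter threshold labels fewer samples as class 1
strict_labels = (proba > 0.8).astype(int)
print(labels, strict_labels)
```

Raising the threshold typically increases precision at the cost of recall, so it can be tuned when the two kinds of error have different costs.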

In [90]:
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(conf_matrix)

Accuracy: 1.0
[[12  0]
 [ 0  8]]
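In scikit-learn's `confusion_matrix`, rows are true classes and columns are predicted classes. Accuracy is therefore the diagonal sum divided by the total. The sketch below derives accuracy, precision and recall by hand from the matrix printed above. A perfect score on 20 test samples is unsurprising here, because Setosa is known to be linearly separable from the other two species.

```python
import numpy as np

# Confusion matrix as printed above: rows = true class, columns = predicted class
cm = np.array([[12, 0],
               [0, 8]])

accuracy = np.trace(cm) / cm.sum()  # correct predictions / all predictions
tn, fp, fn, tp = cm.ravel()         # binary layout: [[TN, FP], [FN, TP]]
precision = tp / (tp + fp)          # share of predicted class 1 that is truly class 1
recall = tp / (tp + fn)             # share of true class 1 that was predicted as class 1

print(accuracy, precision, recall)  # 1.0 1.0 1.0
```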
