# Ungraded Lab: Logistic Regression using Scikit-Learn




## Goals
In this lab you will:
-  Train a logistic regression model using scikit-learn.


## Dataset 
Let's start with the same dataset as before.

In [1]:
import numpy as np

X = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

## Fit the model

The code below imports the [logistic regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression) from scikit-learn. You can fit this model on the training data by calling its `fit` method.

In [2]:
from sklearn.linear_model import LogisticRegression

lr_model = LogisticRegression()
lr_model.fit(X, y)

LogisticRegression()
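
Fitting stores the learned parameters on the model as `coef_` (weights) and `intercept_` (bias). The sketch below, which assumes the same `X` and `y` as above and a binary problem, shows one way to read them and pass them through the sigmoid to get the estimated probability of class 1. The names `w`, `b`, `z`, and `p_class1` are chosen here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.5, 1.5], [1, 1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])

lr_model = LogisticRegression()
lr_model.fit(X, y)

# For a binary problem, coef_ has shape (1, n_features) and intercept_ has shape (1,).
w = lr_model.coef_[0]
b = lr_model.intercept_[0]

# Linear score for each sample, then the sigmoid to map it into (0, 1).
z = X @ w + b
p_class1 = 1.0 / (1.0 + np.exp(-z))

print("Weights:", w)
print("Bias:", b)
print("Estimated probability of class 1:", p_class1)
```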

## Make Predictions

You can see the predictions made by this model by calling the `predict` method.

In [3]:
y_pred = lr_model.predict(X)

print("Prediction on training set:", y_pred)

Prediction on training set: [0 0 0 1 1 1]


## Probability Predictions
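You can also see the class probabilities the model predicts. Each row of the output gives the probability of class 0 followed by the probability of class 1.  
The class labels returned by `predict` above are obtained by converting these probabilities into discrete classes using a threshold of 0.5.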

In [5]:
y_pred_probs = lr_model.predict_proba(X)

print("Probability Prediction on training set:", y_pred_probs)

Probability Prediction on training set: [[0.68521578 0.31478422]
 [0.66679559 0.33320441]
 [0.64785146 0.35214854]
 [0.32157091 0.67842909]
 [0.27963728 0.72036272]
 [0.39889156 0.60110844]]
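
As a quick check of the thresholding step, the sketch below applies a 0.5 cutoff to the class-1 probabilities printed above (the second column) and recovers the same labels that `predict` returned. The variable names and the alternative cutoff of 0.7 are illustrative assumptions, not part of the lab.

```python
import numpy as np

# Class-1 probabilities copied from the second column of the output above.
p_class1 = np.array([0.31478422, 0.33320441, 0.35214854,
                     0.67842909, 0.72036272, 0.60110844])

# A probability at or above the threshold is labeled 1, otherwise 0.
threshold = 0.5
y_from_probs = (p_class1 >= threshold).astype(int)
print("Labels at threshold 0.5:", y_from_probs)  # [0 0 0 1 1 1]

# A stricter, hypothetical threshold labels fewer samples as class 1.
y_strict = (p_class1 >= 0.7).astype(int)
print("Labels at threshold 0.7:", y_strict)  # [0 0 0 0 1 0]
```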


## Calculate accuracy

The **accuracy score** is a metric used to measure how well a classification model performs.

It is calculated as:

$$
\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}}
$$

For binary classification, this is equivalent to:

$$
\text{Accuracy} = \frac{\text{True Positives + True Negatives}}{\text{Total samples}}
$$


To compute it, call the `score` method, which returns the mean accuracy on the given data and labels.

In [4]:
print("Accuracy on training set:", lr_model.score(X, y))

Accuracy on training set: 1.0
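
To connect the formula to the number above, here is a small sketch that computes accuracy by hand from the true labels and the predictions printed earlier in this lab. It uses only NumPy; the names `tp` and `tn` are introduced here for illustration.

```python
import numpy as np

# True labels and the predictions printed earlier in this lab.
y = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1])

# Count the correct predictions for each class.
tp = np.sum((y_pred == 1) & (y == 1))  # true positives
tn = np.sum((y_pred == 0) & (y == 0))  # true negatives

accuracy = (tp + tn) / len(y)
print("Accuracy computed by hand:", accuracy)  # 1.0

# Equivalent shortcut: the fraction of predictions that match the labels.
print("Accuracy via np.mean:", np.mean(y_pred == y))  # 1.0
```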


**Note:**
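- For **large datasets**, `SGDClassifier(loss='log_loss')` (named `loss='log'` in scikit-learn versions before 1.1) is usually **faster and more scalable** than `LogisticRegression()` with solvers `'lbfgs'` or `'liblinear'`, because it updates the weights one sample at a time instead of processing the full dataset in each iteration.

- For **small to medium datasets**, standard `LogisticRegression()` is **fine and often more accurate**, because solvers such as the default `'lbfgs'` converge more reliably than stochastic gradient descent.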
