# Logistic Regression

Logistic regression is used for classification problems. Unlike linear regression, which predicts a continuous outcome, it predicts a categorical outcome.

In the simplest case there are two possible outcomes, which is called binomial. An example is predicting whether a tumor is malignant or benign. When there are more than two outcomes to classify, it is called multinomial. A common example of multinomial logistic regression is predicting which of 3 species an iris flower belongs to.
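The multinomial case can be sketched with the iris dataset that ships with scikit-learn. This is a minimal illustration that assumes a recent scikit-learn release. In those releases, `LogisticRegression` with its default `lbfgs` solver fits a single multinomial (softmax) model when there are more than two classes. The names `X_iris`, `y_iris`, `clf` and `probs` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris: 150 flowers, 4 numeric features, 3 species labelled 0, 1, 2
X_iris, y_iris = load_iris(return_X_y=True)

# max_iter is raised so the lbfgs solver has room to converge on unscaled features
clf = LogisticRegression(max_iter=1000)
clf.fit(X_iris, y_iris)

# One probability per species for each of the first three flowers
probs = clf.predict_proba(X_iris[:3])
print(clf.classes_)                   # [0 1 2]
print(probs.shape)                    # (3, 3)
print(np.round(probs.sum(axis=1), 6))  # each row sums to 1
```

Each row of `predict_proba` holds one probability per species. `predict` returns the species with the largest probability.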

```python
import numpy as np

# X holds one numeric feature per tumor. scikit-learn expects a 2D array of
# shape (n_samples, n_features), so reshape(-1, 1) turns the row into a column.
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
# y represents whether or not the tumor is cancerous (0 for "No", 1 for "Yes").
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
```

```python
from sklearn import linear_model

log = linear_model.LogisticRegression()
log.fit(X, y)
```

```
LogisticRegression()
```

### Prediction 

```python
prediction = log.predict(np.array([2.44]).reshape(-1, 1))
prediction
```

```
array([0])
```
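For a binary model, `predict` returns class 1 exactly when the predicted probability of class 1 is above 0.5, which is the same as the log-odds being positive. The sketch below checks this against scikit-learn's `predict_proba`. It rebuilds the toy data so that it runs on its own. The extra input value 4.0 and the names `new_X`, `p_yes` and `manual` are illustrative.

```python
import numpy as np
from sklearn import linear_model

# Same toy data as above: one feature per sample, binary labels
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
log = linear_model.LogisticRegression().fit(X, y)

new_X = np.array([2.44, 4.0]).reshape(-1, 1)
# Column 1 of predict_proba is P(y = 1), because log.classes_ is [0, 1]
p_yes = log.predict_proba(new_X)[:, 1]
# Apply the 0.5 threshold by hand
manual = (p_yes > 0.5).astype(int)

print(p_yes)
print(manual, log.predict(new_X))
```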

### Coefficient

In logistic regression, the coefficient is the expected change in the log-odds of the outcome (here, y = 1) for each one-unit increase in X.
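The following equation expresses this for the one-feature model. The symbols $p$, $\beta_0$ and $\beta_1$ are introduced here for illustration. $p(x)$ is the predicted probability that $y = 1$. scikit-learn stores $\beta_0$ in `intercept_` and $\beta_1$ in `coef_`.

$$
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta_1 x
$$

$$
\log\frac{p(x+1)}{1 - p(x+1)} - \log\frac{p(x)}{1 - p(x)} = \big(\beta_0 + \beta_1 (x+1)\big) - \big(\beta_0 + \beta_1 x\big) = \beta_1
$$

The difference does not depend on $x$. Increasing X by one unit therefore always adds $\beta_1$ to the log-odds, whatever value X starts from.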

Log-odds are not very intuitive, so let's exponentiate the coefficient to get something easier to interpret: an odds ratio.

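```python
log_odds = log.coef_
# e^(coefficient) is the odds ratio for a one-unit increase in X
odds = np.exp(log_odds)
odds
```

```
array([[4.03541657]])
```

Each one-unit increase in X multiplies the odds of the tumor being cancerous by about 4.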
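The next sketch checks that `np.exp(log.coef_)` is the same multiplicative factor on the odds no matter where X starts. It compares predicted odds one unit apart at a few arbitrary points. It rebuilds the toy data so that it runs on its own. `odds_at` is a hypothetical helper. The sketch assumes that column 1 of `predict_proba` is the probability of class 1, which holds here because `classes_` is `[0, 1]`.

```python
import numpy as np
from sklearn import linear_model

# Same toy data as above (single feature, binary labels)
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
log = linear_model.LogisticRegression().fit(X, y)


def odds_at(model, x):
    """Hypothetical helper: predicted odds P(y=1) / P(y=0) at feature value x."""
    p = model.predict_proba(np.array([[x]]))[0, 1]
    return p / (1 - p)


# Odds at x + 1 divided by odds at x, for a few arbitrary starting points
ratios = [odds_at(log, x + 1) / odds_at(log, x) for x in (0.5, 2.0, 4.0)]
odds_ratio = np.exp(log.coef_[0, 0])  # e^(coefficient)

print(ratios)
print(odds_ratio)
```

All three ratios should agree with `odds_ratio` up to floating-point error.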

### Probability

The coefficient and intercept can be used to compute the probability that each sample belongs to class 1 (cancerous).
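Inverting the log-odds gives the probability formula. In this notation, $\beta_0$ is the intercept, $\beta_1$ is the coefficient, and $p$ is the probability that $y = 1$. These symbols are illustrative.

$$
\text{odds} = \frac{p}{1-p} = e^{\beta_0 + \beta_1 x}
\qquad\Longrightarrow\qquad
p = \frac{\text{odds}}{1 + \text{odds}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
$$

The middle form, odds divided by one plus the odds, is what `log2prob` computes. The right-hand form is the logistic (sigmoid) function of the log-odds.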

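```python
def log2prob(model, X):
    # Log-odds for each sample: intercept + coefficient * x
    # (the matrix product also works with more than one feature)
    log_odds = X @ model.coef_.T + model.intercept_
    # Convert log-odds to odds, then odds to a probability in (0, 1)
    odds = np.exp(log_odds)
    probability = odds / (1 + odds)
    return probability

print(log2prob(log, X))
```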

```
[[0.60749955]
 [0.19268876]
 [0.12775886]
 [0.00955221]
 [0.08038616]
 [0.07345637]
 [0.88362743]
 [0.77901378]
 [0.88924409]
 [0.81293497]
 [0.57719129]
 [0.96664243]]
```
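As a cross-check, the same probabilities can be computed with the sigmoid form $1/(1+e^{-z})$, where $z$ is the log-odds, and compared with scikit-learn's `predict_proba`. This minimal sketch rebuilds the toy data so that it runs on its own. The names `z`, `manual` and `builtin` are illustrative.

```python
import numpy as np
from sklearn import linear_model

# Same toy data as above (single feature, binary labels)
X = np.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
log = linear_model.LogisticRegression().fit(X, y)

# Log-odds for every sample: intercept + coefficient * x, flattened to shape (12,)
z = (X @ log.coef_.T + log.intercept_).ravel()
# Sigmoid form of the probability of class 1
manual = 1 / (1 + np.exp(-z))
# scikit-learn's own estimate; column 1 is P(y = 1) because classes_ is [0, 1]
builtin = log.predict_proba(X)[:, 1]

print(np.allclose(manual, builtin))              # True
print(np.allclose(z, log.decision_function(X)))  # True
```

For a binary model, `decision_function` returns these same log-odds directly, so the manual matrix product is only needed to show where the numbers come from.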
