## Logistic regression model

* The output $y$ can take only a small handful of possible values, i.e. a few "classes" or "categories".
* Logistic regression fits an S-shaped curve $g(z)$, called the sigmoid function or logistic function, to the data.
* The output of the sigmoid function is always between 0 and 1.
* $g(z)$ = <font size=4>$\frac{1}{1 + e^{-z}}$</font>, where <font size=2>$0<g(z)<1; e\approx2.7$</font>
<br><br> picture of sigmoid <br><br>
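
A minimal Python sketch of $g(z)$. The branch on the sign of $z$ is a common numerical trick, not part of the formula itself: it avoids overflow in `math.exp` for large negative $z$, and both branches compute the same value.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, rewrite g(z) as e^z / (1 + e^z) so math.exp never overflows
    e = math.exp(z)
    return e / (1.0 + e)


for z in (-6, -2, 0, 2, 6):
    print(f"g({z:>2}) = {sigmoid(z):.4f}")
```

For very large $|z|$ the result rounds to exactly 0.0 or 1.0 in floating point, even though mathematically $0 < g(z) < 1$.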


* How do we connect $g(z)$ to the linear model $f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b$ from regression?
* * Substitute $z$ in $g(z)$ with the familiar linear expression $z = \vec{w} \cdot \vec{x} + b$, so the logistic regression model becomes $f_{\vec{w},b}(\vec{x}) = g(\vec{w} \cdot \vec{x} + b) = \frac{1}{1 + e^{-(\vec{w} \cdot \vec{x} + b)}}$.
* * The parameters $\vec{w}, b$ are still fitted with gradient descent, but logistic regression uses its own cost function (the log loss) rather than the squared-error cost of linear regression.
* * As a result, the model takes the same set of input features $\vec{x}$ as before, but outputs a number between 0 and 1 for every $\vec{x}$.
* Think of the output of logistic regression as the "probability" that the label $y$ is true (i.e. equal to 1) rather than false (equal to 0), given a certain input $\vec{x}$.
* Example:
* * Let $\vec{x}$ be a feature vector denoting tumor size.
* * Let $y$ be a label (class) denoting malignancy: 1 if the tumor is malignant, and 0 if not.
* * If the model outputs $f_{\vec{w},b}(\vec{x}) = 0.7$, there is a 70% chance that $y$ is "true" (i.e. equal to 1), and therefore that the tumor is malignant.
<br><br> picture of sigmoid from the example here <br><br>
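
The model $f_{\vec{w},b}(\vec{x}) = g(\vec{w} \cdot \vec{x} + b)$ can be sketched in a few lines of Python. The parameters below (`w = [1.5]`, `b = -4.0`) and the tumor sizes are made up for illustration; a real model would learn $\vec{w}, b$ from labeled data.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def f(x: list[float], w: list[float], b: float) -> float:
    """Logistic regression model f_{w,b}(x) = g(w . x + b)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b  # linear part, as in regression
    return sigmoid(z)                             # squash into (0, 1)


# Hypothetical parameters for a single feature, tumor size in cm
w, b = [1.5], -4.0

for size in (1.0, 2.0, 3.0, 4.0):
    p = f([size], w, b)
    print(f"size {size} cm: P(malignant) = {p:.2f}")
```

With these hypothetical parameters, a 3 cm tumor gets $z = 0.5$ and $f \approx 0.62$, i.e. an estimated 62% chance that $y = 1$ (malignant).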


* Another common notation for logistic regression is $f_{\vec{w},b}(\vec{x}) = P(y=1 \mid \vec{x};\vec{w},b)$, where $P(y=1 \mid \vec{x};\vec{w},b) + P(y=0 \mid \vec{x};\vec{w},b) = 1$.
* * This literally means "the probability that $y$ is 1, given input $\vec{x}$ and parameters $\vec{w},b$".
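
Because the two probabilities add up to 1, the probability of the other class follows directly from the sigmoid, with $z = \vec{w} \cdot \vec{x} + b$:

$$
P(y=0 \mid \vec{x};\vec{w},b) = 1 - g(z) = 1 - \frac{1}{1+e^{-z}} = \frac{e^{-z}}{1+e^{-z}} = \frac{1}{e^{z}+1} = g(-z)
$$

For example, a prediction of 0.7 for $y=1$ leaves $1 - 0.7 = 0.3$ for $y=0$.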
 

## Decision boundary of logistic regression

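* The decision boundary is the set of points where $z = \vec{w} \cdot \vec{x} + b = 0$; with linear features it is a straight line (a hyperplane in higher dimensions).
* Using the usual threshold of 0.5, the model predicts "true" ($\hat{y} = 1$) when $g(z) \geq 0.5$. Since $g(0) = 0.5$ and $g$ is increasing, this happens exactly when $z = \vec{w} \cdot \vec{x} + b \geq 0$.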
* Example <b>I</b>:
* * $f_{\vec{w},b}(\vec{x}) = g(z) = g(w_1x_1 + w_2x_2 + b)$, where $\vec{w}=(1, 1), b=-3$
* * $z = \vec{w} \cdot \vec{x} + b = x_1 + x_2 - 3 = 0$
* * The decision boundary is the straight line $x_1 + x_2 = 3$: the model predicts $y = 1$ on or above it and $y = 0$ below it.
<br><br> picture of decision boundary from the example 1 here <br><br>
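
A minimal sketch of Example I in Python, assuming the usual 0.5 threshold on $g(z)$. The test points are arbitrary; the point $(1.5, 1.5)$ lies exactly on the boundary, where $g(z) = 0.5$.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x1: float, x2: float, w=(1.0, 1.0), b=-3.0) -> int:
    """Predict y = 1 when g(z) >= 0.5, which is equivalent to z >= 0."""
    z = w[0] * x1 + w[1] * x2 + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Points below, on, and above the line x1 + x2 = 3
for point in [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0), (0.0, 4.0)]:
    print(point, "->", predict(*point))
```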

* The decision boundary can also be different from a straight line.
* With higher-order polynomial terms in $z$, the decision boundary can be a fairly complex curve, so logistic regression can fit fairly complex data.
* Example <b>II</b>:
* * $f_{\vec{w},b}(\vec{x}) = g(z) = g(w_1x_1^2 + w_2x_2^2 + b)$, where $\vec{w}=(1, 1), b=-1$
* * $z = w_1x_1^2 + w_2x_2^2 + b = x_1^2 + x_2^2 - 1 = 0$
* * The decision boundary is the circle $x_1^2 + x_2^2 = 1$: the model predicts $y = 1$ on or outside the circle and $y = 0$ inside it.
<br><br> picture of decision boundary from the example 2 here <br><br>
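
The model in Example II is still linear in its parameters; only the features change. A sketch, assuming a hypothetical helper `map_features` that turns $(x_1, x_2)$ into the feature vector $(x_1^2, x_2^2)$, after which the usual $\vec{w} \cdot \vec{x} + b$ applies:

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function g(z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def map_features(x1: float, x2: float) -> list[float]:
    """Hypothetical helper: map raw inputs to the polynomial features (x1^2, x2^2)."""
    return [x1 ** 2, x2 ** 2]


def predict(x1: float, x2: float, w=(1.0, 1.0), b=-1.0) -> int:
    """Predict y = 1 when g(w . features + b) >= 0.5."""
    features = map_features(x1, x2)
    z = sum(wj * fj for wj, fj in zip(w, features)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


for point in [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (1.0, 1.0), (-2.0, 0.0)]:
    print(point, "->", predict(*point))
```

Points inside the unit circle are classified as 0, and points on or outside it as 1.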

## How to do classification with scikit-learn

You can use scikit-learn to perform classification using any of its numerous classification algorithms (also known as classifiers), including: 

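* Decision Tree / Random Forest: a decision tree classifies an input by following a tree of tests on its features (internal nodes) down to a leaf that holds a class label. The Random Forest classifier is a meta-estimator that fits many decision trees on random sub-samples of the dataset and averages their predictions to improve accuracy and control overfitting.
* K-Nearest Neighbors (KNN): a simple classification algorithm that assigns a new input the majority class among its K closest training records. K is chosen by the user; the square root of the number of training records is a common rule of thumb for it.
* Linear Discriminant Analysis: models each class as a Gaussian distribution with a covariance matrix shared by all classes, then uses Bayes' theorem to estimate the probability that a new input belongs to each class. The resulting decision boundaries are linear.
* Logistic Regression: the model described above. It estimates the probability that the output $y$ is 1 (yes) rather than 0 (no) and predicts the discrete class from that probability.
* Naive Bayes: a family of comparatively fast and accurate classifiers that apply Bayes' theorem with the "naive" assumption that the features are conditionally independent given the class.
* Support Vector Machines (SVMs), also known as support-vector networks: find the boundary (hyperplane) that separates the classes with the largest possible margin; kernel functions allow non-linear boundaries.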
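
All of these classifiers share the same `fit`/`predict` interface in scikit-learn, so swapping one for another is usually a one-line change. A minimal sketch, assuming scikit-learn is installed; the tiny tumor-size dataset is made up for illustration and `n_neighbors=3` is an arbitrary choice of K:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Made-up toy data: tumor size in cm -> malignant (1) or not (0)
X = [[1.0], [1.5], [2.0], [2.5], [3.5], [4.0], [4.5], [5.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

classifiers = {
    "logistic regression": LogisticRegression(),
    "k-nearest neighbors (K=3)": KNeighborsClassifier(n_neighbors=3),
    "Gaussian naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X, y)  # same interface for every classifier
    print(f"{name}: {clf.predict([[1.2], [4.8]])}")

# Logistic regression also exposes the estimated probabilities;
# column 1 corresponds to class 1, i.e. P(y=1 | x)
proba = classifiers["logistic regression"].predict_proba([[3.0]])[0, 1]
print(f"P(malignant | size = 3.0 cm) = {proba:.2f}")
```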
