<a href="https://colab.research.google.com/github/pratikagithub/Machine-Learning-All-Algorithms/blob/main/Stochastic_Gradient_Descent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Stochastic Gradient Descent (SGD)** is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions, such as those of (linear) Support Vector Machines and Logistic Regression. Even though SGD has been around in the machine learning community for a long time, it has received a considerable amount of attention only recently, in the context of large-scale learning.
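To make the idea concrete, here is a minimal NumPy sketch of plain SGD for a linear SVM (hinge loss plus an L2 penalty). It rests on simplifying assumptions: a constant learning rate eta, labels in {-1, +1}, and a hypothetical helper name, sgd_hinge. It is not scikit-learn's implementation, which uses its own learning-rate schedules.

```python
import numpy as np


def sgd_hinge(X, y, alpha=0.01, eta=0.1, n_epochs=20, seed=0):
    """Plain SGD on the hinge loss plus (alpha / 2) * ||w||^2.

    Hypothetical helper for illustration only; labels y must be in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_epochs):
        # Visit the training samples in a new random order each epoch
        for i in rng.permutation(n_samples):
            margin = y[i] * (X[i] @ w + b)
            # Gradient step for the L2 penalty (applied at every step)
            w -= eta * alpha * w
            # The hinge loss has a nonzero gradient only if the margin is violated
            if margin < 1:
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b


X_sgd = np.array([[0., 0.], [1., 1.]])
y_sgd = np.array([-1, 1])
w_sgd, b_sgd = sgd_hinge(X_sgd, y_sgd)
print(np.sign(X_sgd @ w_sgd + b_sgd))  # predicted labels for the training samples
```

Each update looks at a single sample, so the cost of one step does not grow with the size of the training set.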

SGD has been successfully applied to large-scale and sparse machine learning problems, which are often encountered in text classification and natural language processing. Given sparse data, the SGD classifiers in scikit-learn's sklearn.linear_model module easily scale to problems with more than $10^5$ training examples and more than $10^5$ features.
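As a small illustration of the sparse text setting, the sketch below vectorizes a tiny corpus with scikit-learn's TfidfVectorizer, which returns a SciPy sparse matrix, and fits SGDClassifier on that matrix directly without converting it to a dense array. The documents and labels are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy corpus with binary sentiment labels
docs = [
    "great movie, loved it",
    "wonderful acting and a great plot",
    "terrible movie, hated it",
    "boring plot and awful acting",
]
labels = [1, 1, 0, 0]

# TfidfVectorizer returns a SciPy sparse matrix; SGDClassifier accepts it as-is
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(docs)
print(type(X_text).__name__, X_text.shape, X_text.nnz, "stored non-zeros")

clf_text = SGDClassifier(loss="hinge", random_state=0).fit(X_text, labels)
print(clf_text.predict(vectorizer.transform(["loved the great acting"])))
```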

**The advantages of Stochastic Gradient Descent are:**

1. Efficiency.

2. Ease of implementation (lots of opportunities for code tuning).

**The disadvantages of Stochastic Gradient Descent include:**

1. SGD requires a number of hyperparameters such as the regularization parameter and the number of iterations.

2. SGD is sensitive to feature scaling.
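Because of the sensitivity to feature scaling, a common remedy is to standardize the features before fitting, for example with StandardScaler inside a pipeline so the same scaling is reused at prediction time. This sketch assumes synthetic data from make_classification; alpha (the regularization parameter) and max_iter (the maximum number of passes over the data) are two of the hyperparameters mentioned above, set here to their default values.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; random_state values are arbitrary, fixed for reproducibility
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
X_demo[:, 0] *= 1000  # exaggerate the scale of one feature

# Standardize each feature to zero mean and unit variance, then run SGD;
# the pipeline reapplies the fitted scaling whenever it predicts
model = make_pipeline(
    StandardScaler(),
    SGDClassifier(alpha=1e-4, max_iter=1000, random_state=0),
)
model.fit(X_demo, y_demo)
print(model.score(X_demo, y_demo))  # training accuracy
```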

Like other classifiers, SGD has to be fitted with two arrays: an array X of shape (n_samples, n_features) holding the training samples, and an array y of shape (n_samples,) holding the target values (class labels) for the training samples:

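```python
from sklearn.linear_model import SGDClassifier

# Two training samples with two features each, and their class labels
X = [[0., 0.], [1., 1.]]
y = [0, 1]

# Linear SVM (hinge loss) with an L2 penalty, limited to 5 passes over the data
clf = SGDClassifier(loss="hinge", penalty="l2", max_iter=5)
clf.fit(X, y)
```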

After being fitted, the model can then be used to predict new values:

```python
clf.predict([[2., 2.]])
```

```
array([1])
```

SGD fits a linear model to the training data. The coef_ attribute holds the model parameters:

```python
clf.coef_
```

```
array([[9.91080278, 9.91080278]])
```

The intercept_ attribute holds the intercept (aka offset or bias):

```python
clf.intercept_
```

```
array([-9.99002993])
```

Whether or not the model should use an intercept, i.e. a biased hyperplane, is controlled by the parameter fit_intercept.
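For comparison, here is a sketch of the same toy problem fitted with fit_intercept=False. The intercept is then not learned and intercept_ stays at zero, so the separating hyperplane passes through the origin. The remaining settings are defaults, except random_state, which is fixed here only for reproducibility.

```python
from sklearn.linear_model import SGDClassifier

X_toy = [[0., 0.], [1., 1.]]
y_toy = [0, 1]

# fit_intercept=False: no bias term is learned, so the hyperplane goes through the origin
clf_no_bias = SGDClassifier(loss="hinge", penalty="l2", fit_intercept=False, random_state=0)
clf_no_bias.fit(X_toy, y_toy)
print(clf_no_bias.intercept_)  # left at its initial value of zero
```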

The signed distance to the hyperplane (computed as the dot product between the coefficients and the input sample, plus the intercept) is given by SGDClassifier.decision_function:

```python
clf.decision_function([[2., 2.]])
```

```
array([29.65318117])
```
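The value returned by decision_function can be reproduced by hand from coef_ and intercept_. The sketch below refits the toy model (max_iter=1000 and random_state=0 are choices made here for a converged, reproducible fit, so the numbers differ from the output above) and compares both. Strictly speaking, this score is proportional to the geometric distance; dividing it by the norm of coef_ gives the Euclidean distance to the hyperplane.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

X_toy = np.array([[0., 0.], [1., 1.]])
y_toy = np.array([0, 1])
clf_svm = SGDClassifier(loss="hinge", penalty="l2", max_iter=1000, random_state=0).fit(X_toy, y_toy)

x_new = np.array([[2., 2.]])
# Dot product with the coefficients plus the intercept
manual_score = x_new @ clf_svm.coef_.T + clf_svm.intercept_
# Dividing by the norm of coef_ gives the Euclidean distance to the hyperplane
euclidean = manual_score / np.linalg.norm(clf_svm.coef_)
print(manual_score.ravel(), clf_svm.decision_function(x_new), euclidean.ravel())
```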

The concrete loss function can be set via the loss parameter. SGDClassifier supports the following loss functions:

1. loss="hinge": (soft-margin) linear Support Vector Machine.

2. loss="modified_huber": smoothed hinge loss.

3. loss="log_loss": logistic regression.

4. All regression losses described below. In this case, the target is encoded as -1 or 1 and the problem is treated as a regression problem; the predicted class then corresponds to the sign of the predicted target.

Please refer to the mathematical section below for formulas. The first two loss functions are lazy: they only update the model parameters if an example violates the margin constraint. This makes training very efficient and may result in sparser models (i.e. with more zero coefficients), even when an L2 penalty is used.
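To see why hinge and modified_huber are lazy, it helps to write the losses as functions of the margin $z = y\,f(x)$ with $y \in \{-1, +1\}$. The NumPy sketch below uses the standard definitions: hinge $\max(0, 1 - z)$, modified Huber $\max(0, 1 - z)^2$ for $z \ge -1$ and $-4z$ otherwise, and log loss $\log(1 + e^{-z})$. The first two are exactly zero, with zero gradient, once $z \ge 1$, so a correctly classified example outside the margin triggers no update; the log loss stays positive everywhere.

```python
import numpy as np

# Margin z = y * f(x), with y in {-1, +1} and f(x) the decision function value
z = np.array([-2.0, 0.0, 0.5, 1.0, 2.0])

hinge = np.maximum(0.0, 1.0 - z)
# Modified Huber: squared hinge for z >= -1, linear (-4z) below that
modified_huber = np.where(z >= -1.0, np.maximum(0.0, 1.0 - z) ** 2, -4.0 * z)
log_loss = np.log1p(np.exp(-z))

for name, values in [("hinge", hinge), ("modified_huber", modified_huber), ("log_loss", log_loss)]:
    print(name, np.round(values, 3))
```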

Using loss="log_loss" or loss="modified_huber" enables the predict_proba method, which gives a vector of probability estimates $P(y|x)$ per sample $x$:

```python
# Logistic regression via the log loss, which supports probability estimates
clf = SGDClassifier(loss="log_loss", max_iter=5).fit(X, y)
clf.predict_proba([[1., 1.]])
```

```
array([[0.00459185, 0.99540815]])
```

The concrete penalty can be set via the penalty parameter. SGD supports the following penalties:

1. penalty="l2": L2 norm penalty on coef_.

2. penalty="l1": L1 norm penalty on coef_.

3. penalty="elasticnet": convex combination of L2 and L1, (1 - l1_ratio) * L2 + l1_ratio * L1.
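The three penalties can be written out as a small NumPy function. This is a sketch with a hypothetical helper, penalty_value. It assumes the L2 term is $\frac{1}{2}\lVert w \rVert_2^2$, the convention used in scikit-learn's mathematical description of SGD, and it leaves out the regularization strength alpha that multiplies the penalty in the training objective. The default l1_ratio=0.15 matches SGDClassifier's default.

```python
import numpy as np


def penalty_value(w, penalty="l2", l1_ratio=0.15):
    """Regularization term R(w) for the three penalty options (hypothetical helper).

    Assumption: the L2 term is 0.5 * ||w||_2^2, following scikit-learn's
    mathematical formulation; alpha is not included.
    """
    l2 = 0.5 * np.dot(w, w)
    l1 = np.sum(np.abs(w))
    if penalty == "l2":
        return l2
    if penalty == "l1":
        return l1
    if penalty == "elasticnet":
        # Convex combination: (1 - l1_ratio) * L2 + l1_ratio * L1
        return (1.0 - l1_ratio) * l2 + l1_ratio * l1
    raise ValueError(f"unknown penalty: {penalty!r}")


w_example = np.array([3.0, -4.0])
for p in ["l2", "l1", "elasticnet"]:
    print(p, penalty_value(w_example, p))
```

For w = (3, -4) this gives 12.5 for L2, 7.0 for L1 and approximately 11.675 for the elastic net with l1_ratio=0.15.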

The default setting is penalty="l2". The L1 penalty leads to sparse solutions, driving most coefficients to zero. The Elastic Net [11] solves some deficiencies of the L1 penalty in the presence of highly correlated attributes. The parameter l1_ratio controls the convex combination of L1 and L2 penalty.
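A quick way to see the sparsity effect is to count the coefficients that end up exactly at zero under each penalty. The sketch below uses synthetic data in which only 5 of 50 features are informative; alpha=0.01 and l1_ratio=0.5 are arbitrary values picked to make the effect visible, and the exact counts depend on the data and settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic data: 50 features, only 5 of them informative (arbitrary choice)
X_many, y_many = make_classification(
    n_samples=500, n_features=50, n_informative=5, n_redundant=0, random_state=0
)
X_many = StandardScaler().fit_transform(X_many)

zeros = {}
for penalty in ["l2", "l1", "elasticnet"]:
    clf_pen = SGDClassifier(penalty=penalty, alpha=0.01, l1_ratio=0.5, max_iter=1000, random_state=0)
    clf_pen.fit(X_many, y_many)
    # Count coefficients that the penalty drove exactly to zero
    zeros[penalty] = int(np.sum(clf_pen.coef_ == 0))
    print(f"{penalty}: {zeros[penalty]} of {clf_pen.coef_.size} coefficients are zero")
```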