
# **SGD Classification Example with SGDClassifier in Python**

Applying stochastic gradient descent (SGD) to regularized linear models yields estimators for both classification and regression problems.

The scikit-learn API provides the SGDClassifier class, which implements the SGD method for classification problems. SGDClassifier fits a regularized linear model with SGD learning. It works well with large-scale datasets and is efficient and easy to implement.
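
To make "a regularized linear model with SGD learning" concrete, here is a minimal pure-Python sketch for a two-class problem: hinge loss with an L2 penalty, updated one sample at a time. The names `sgd_hinge_fit` and `predict`, the fixed learning rate `eta`, and the toy data are assumptions made for illustration. SGDClassifier itself uses a more refined learning-rate schedule and stopping rule, so this is not its actual implementation.

```python
import random

def sgd_hinge_fit(X, y, alpha=0.0001, eta=0.1, epochs=50, seed=0):
    """Fit weights w and intercept b for labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # The L2 penalty shrinks the weights a little at every step
            w = [wj * (1.0 - eta * alpha) for wj in w]
            # The hinge loss contributes a gradient only inside the margin
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def predict(w, b, row):
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1

# Hypothetical, linearly separable toy data
X = [[2.0, 1.0], [3.0, 2.5], [2.5, 3.0],
     [-2.0, -1.0], [-3.0, -2.0], [-1.5, -2.5]]
y = [1, 1, 1, -1, -1, -1]

w, b = sgd_hinge_fit(X, y)
train_accuracy = sum(predict(w, b, r) == t for r, t in zip(X, y)) / len(y)
print(w, b, train_accuracy)
```

Each update looks at a single sample, which is why the method stays cheap on large datasets.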

In this tutorial, we'll briefly learn how to classify data by using the SGDClassifier class in Python. The tutorial covers:

1. Preparing the data
2. Training the model
3. Predicting and accuracy check
4. Iris dataset classification example
5. Source code listing

We'll start by loading the required libraries and functions.

In [None]:
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.preprocessing import scale

**Preparing the data**

First, we'll generate a random classification dataset with the make_classification() function. The dataset contains 5000 samples with 10 features, divided into 3 classes.

In [None]:
x, y = make_classification(n_samples=5000, n_features=10, 
                           n_classes=3, 
                           n_clusters_per_class=1)
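
As a quick check on what make_classification() returns, the sketch below inspects the array shapes and the class labels. The `random_state=0` argument is an addition not used in the tutorial's own code; it is assumed here only so that the same data is generated on every run.

```python
from collections import Counter

from sklearn.datasets import make_classification

# random_state is an added assumption for reproducibility
x, y = make_classification(n_samples=5000, n_features=10,
                           n_classes=3, n_clusters_per_class=1,
                           random_state=0)

print(x.shape)  # (5000, 10)
print(y.shape)  # (5000,)

# Count how many samples fall into each class
class_counts = Counter(y.tolist())
print(sorted(class_counts.items()))
```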

Then we'll split the data into training and test sets, holding out 15 percent of it as test data.

In [None]:
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size = 0.15)
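
With 5000 samples, a 15 percent test split leaves 4250 rows for training and 750 for testing. The sketch below confirms the sizes. Both `random_state=0` and `stratify=y` are optional additions assumed for illustration: the first makes the split repeatable, the second keeps the class proportions similar in both parts.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

x, y = make_classification(n_samples=5000, n_features=10,
                           n_classes=3, n_clusters_per_class=1,
                           random_state=0)

# random_state and stratify are assumptions, not part of the tutorial's call
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0, stratify=y)

print(xtrain.shape)  # (4250, 10)
print(xtest.shape)   # (750, 10)
```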

**Training the model**

Next, we'll define the classifier with the SGDClassifier class and fit it on the training data.

In [None]:
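# max_iter caps the number of passes over the training data;
# tol is the stopping tolerance on the loss improvement
sgdc = SGDClassifier(max_iter=1000, tol=0.01)
print(sgdc)

sgdc.fit(xtrain, ytrain)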

After training the classifier, we'll check its accuracy score on the training data.


In [None]:
score = sgdc.score(xtrain, ytrain)
print("Training score: ", score) 
 
# Example output (varies between runs; two recorded values):
# Training score:  0.8454117647058823
# Training score:  0.9670588235294117
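
The score above is measured on the same data the model was fitted on, so it tends to be optimistic. A minimal sketch of comparing it with the score on the held-out test set follows. The `random_state=0` arguments are assumptions added for repeatability, and the printed values will still differ from the example output above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

x, y = make_classification(n_samples=5000, n_features=10,
                           n_classes=3, n_clusters_per_class=1,
                           random_state=0)
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0)

# random_state fixes the shuffling order used during SGD training
sgdc = SGDClassifier(max_iter=1000, tol=0.01, random_state=0)
sgdc.fit(xtrain, ytrain)

# score() returns the mean accuracy on the given data
train_score = sgdc.score(xtrain, ytrain)
test_score = sgdc.score(xtest, ytest)
print("Training score: ", train_score)
print("Test score: ", test_score)
```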

**Predicting and accuracy check**

Now we can predict the test data with the trained model. After the prediction, we'll check the results with the confusion_matrix() function.

In [None]:
ypred = sgdc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm) 
 
'''[[215   6  30]
 [  8 236   4]
 [ 54  21 176]]'''
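
In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions. As a worked example, the sketch below derives accuracy, recall, and precision from the matrix printed above, using plain Python only.

```python
# Confusion matrix from the example output above
cm = [[215, 6, 30],
      [8, 236, 4],
      [54, 21, 176]]

n = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n))

# Accuracy: correct predictions over all predictions
accuracy = correct / total
print(round(accuracy, 3))  # 0.836

# Recall per class: diagonal entry divided by its row sum (true counts)
recall = [cm[i][i] / sum(cm[i]) for i in range(n)]
print([round(r, 3) for r in recall])  # [0.857, 0.952, 0.701]

# Precision per class: diagonal entry divided by its column sum (predicted counts)
precision = [cm[i][i] / sum(row[i] for row in cm) for i in range(n)]
print([round(p, 3) for p in precision])  # [0.776, 0.897, 0.838]
```

In this run, the third class has the lowest recall: 54 of its 251 samples were predicted as the first class.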

We can also create a classification report with the classification_report() function to check other metrics of the predictions, such as precision, recall, and F1-score.

In [None]:
cr = classification_report(ytest, ypred)
print(cr)
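
When the metrics are needed as numbers rather than printed text, classification_report() can return a dictionary through its `output_dict=True` argument. The sketch below uses small hypothetical label lists, not the tutorial's data, so that the values are easy to verify by hand.

```python
from sklearn.metrics import classification_report

# Hypothetical true and predicted labels for illustration
ytest = [0, 0, 1, 1, 2, 2, 2, 1]
ypred = [0, 1, 1, 1, 2, 0, 2, 1]

report = classification_report(ytest, ypred, output_dict=True)

# Class labels become string keys in the dictionary
print(report["accuracy"])      # 0.75
print(report["1"]["recall"])   # 1.0
print(report["macro avg"]["f1-score"])
```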

**Iris dataset classification example**

We'll load the Iris dataset with the load_iris() function, extract the x and y parts, scale the features, and then split the data into training and test sets. SGD is sensitive to feature scaling, so standardizing the data usually improves training.

In [None]:
iris = load_iris()
x, y = iris.data, iris.target
x = scale(x)
xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)
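
The cell above scales the whole dataset before splitting, so the test rows influence the mean and variance used for scaling. One alternative, sketched below as an assumption rather than as the tutorial's method, is to split first and put a StandardScaler in a pipeline, so the scaling statistics come from the training data only. StandardScaler applies the same standardization as scale(). The `random_state=0` and `stratify=y` arguments are also additions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
x, y = iris.data, iris.target

# Split the raw data first; scaling happens inside the pipeline
xtrain, xtest, ytrain, ytest = train_test_split(
    x, y, test_size=0.15, random_state=0, stratify=y)
print(len(xtrain), len(xtest))  # 127 23

# fit() learns the scaling statistics from xtrain only, then trains the classifier
model = make_pipeline(
    StandardScaler(),
    SGDClassifier(max_iter=1000, tol=0.01, random_state=0))
model.fit(xtrain, ytrain)

# score() and predict() reuse the training statistics to scale new data
test_score = model.score(xtest, ytest)
print("Test score: ", test_score)
```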

Then we'll train and evaluate the classifier in the same way as above.

In [None]:
sgdc = SGDClassifier(max_iter=1000, tol=0.01)
print(sgdc)

sgdc.fit(xtrain, ytrain)
score = sgdc.score(xtrain, ytrain)
print("Score: ", score)

ypred = sgdc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr) 


In this tutorial, we've briefly learned how to classify data by using scikit-learn's SGDClassifier class in Python. The full source code is listed below.

In [None]:
#Full source code:
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import load_iris
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.preprocessing import scale

x, y = make_classification(n_samples=5000, n_features=10, 
                           n_classes=3, n_clusters_per_class=1)

xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)

sgdc = SGDClassifier(max_iter=1000, tol=0.01)
print(sgdc)

sgdc.fit(xtrain, ytrain)

score = sgdc.score(xtrain, ytrain)
print("Training score: ", score)

ypred = sgdc.predict(xtest)
cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)


# Iris dataset example
iris = load_iris()
x, y = iris.data, iris.target
x = scale(x)

xtrain, xtest, ytrain, ytest=train_test_split(x, y, test_size=0.15)

sgdc = SGDClassifier(max_iter=1000, tol=0.01)
print(sgdc)

sgdc.fit(xtrain, ytrain)
score = sgdc.score(xtrain, ytrain)
print("Score: ", score)

ypred = sgdc.predict(xtest)

cm = confusion_matrix(ytest, ypred)
print(cm)

cr = classification_report(ytest, ypred)
print(cr)