# USING PASSIVE AGGRESSIVE CLASSIFIER

In [1]:
import numpy as np
import pandas as pd
import itertools
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

In [2]:
#Read the data
df=pd.read_csv('DATASET.csv')

#Get shape and head
df.shape
df.head()

Unnamed: 0.1,Unnamed: 0,title,text,label
0,8476,You Can Smell Hillary’s Fear,"Daniel Greenfield, a Shillman Journalism Fello...",FAKE
1,10294,Watch The Exact Moment Paul Ryan Committed Pol...,Google Pinterest Digg Linkedin Reddit Stumbleu...,FAKE
2,3608,Kerry to go to Paris in gesture of sympathy,U.S. Secretary of State John F. Kerry said Mon...,REAL
3,10142,Bernie supporters on Twitter erupt in anger ag...,"— Kaydee King (@KaydeeKing) November 9, 2016 T...",FAKE
4,875,The Battle of New York: Why This Primary Matters,It's primary day in New York and front-runners...,REAL


In [3]:
#DataFlair - Get the labels
labels=df.label
labels.head()

0    FAKE
1    FAKE
2    REAL
3    FAKE
4    REAL
Name: label, dtype: object

In [4]:
df.head()

Unnamed: 0.1,Unnamed: 0,title,text,label
0,8476,You Can Smell Hillary’s Fear,"Daniel Greenfield, a Shillman Journalism Fello...",FAKE
1,10294,Watch The Exact Moment Paul Ryan Committed Pol...,Google Pinterest Digg Linkedin Reddit Stumbleu...,FAKE
2,3608,Kerry to go to Paris in gesture of sympathy,U.S. Secretary of State John F. Kerry said Mon...,REAL
3,10142,Bernie supporters on Twitter erupt in anger ag...,"— Kaydee King (@KaydeeKing) November 9, 2016 T...",FAKE
4,875,The Battle of New York: Why This Primary Matters,It's primary day in New York and front-runners...,REAL


In [6]:
#DataFlair - Split the dataset
x_train,x_test,y_train,y_test=train_test_split(df['text'], labels, test_size=0.2, random_state=1)

The TfidfVectorizer tokenizes documents, learns the vocabulary and the inverse document frequency weightings,
and lets you encode new documents with that learned vocabulary.
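The fit/transform split described above can be seen on a tiny corpus. This is a minimal sketch: the three training sentences and the new sentence are made up for illustration and are not taken from DATASET.csv.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical sentences, for illustration only)
train_docs = [
    "the senate passed the budget vote",
    "the shocking secret the senate is hiding",
    "budget talks continue in the senate",
]
new_docs = ["shocking budget secret revealed"]

vectorizer = TfidfVectorizer(stop_words="english")

# fit_transform learns the vocabulary and IDF weights from the training documents
train_matrix = vectorizer.fit_transform(train_docs)

# transform reuses the learned vocabulary; a word never seen during fitting
# ("revealed") has no column and is silently ignored
new_matrix = vectorizer.transform(new_docs)

print(sorted(vectorizer.vocabulary_))
print(train_matrix.shape, new_matrix.shape)
```

Both matrices have the same number of columns, which is why the test set must only be transformed, never fitted.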

In [7]:
#DataFlair - Initialize a TfidfVectorizer
tfidf_vectorizer=TfidfVectorizer(stop_words='english', max_df=0.7)

#DataFlair - Fit and transform train set, transform test set
tfidf_train=tfidf_vectorizer.fit_transform(x_train) 
tfidf_test=tfidf_vectorizer.transform(x_test)

- Next, we’ll initialize a PassiveAggressiveClassifier, an online-learning linear classifier, and fit it on tfidf_train and y_train.
- Then, we’ll predict on the test set, using the tfidf_test features produced by the TfidfVectorizer.
- Finally, we’ll calculate the accuracy with accuracy_score() from sklearn.metrics.

Passive-Aggressive algorithms are a family of machine learning algorithms that 
are not well known among beginners or even intermediate machine learning enthusiasts. 
However, they can be very useful and efficient for certain applications.

Note: This is a high-level overview of the algorithm, explaining how it works and when to use it. 
It does not go deep into the underlying mathematics.

Passive-Aggressive algorithms are generally used for large-scale learning.
They are among the few ‘online-learning algorithms’. In online machine learning, 
the input data arrives sequentially and the model is updated step by step, 
as opposed to batch learning, where the entire training dataset is used at once. This is very useful 
when the dataset is so large that training on all of it at once is computationally infeasible. 
Put simply, an online-learning algorithm gets a training example, updates the classifier, 
and then throws the example away.
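The "get an example, update, throw it away" loop can be sketched with `partial_fit`, which scikit-learn's PassiveAggressiveClassifier provides for incremental training. The data here is a synthetic, linearly separable stream generated with NumPy (an assumption made for illustration, not the news dataset), and `next_batch` is a hypothetical helper.

```python
import numpy as np
from sklearn.linear_model import PassiveAggressiveClassifier

rng = np.random.default_rng(0)

# Hidden rule that labels the synthetic stream (illustrative assumption)
true_w = rng.normal(size=20)

def next_batch(n=100):
    """Hypothetical helper: return the next n examples of the stream."""
    X = rng.normal(size=(n, 20))
    y = np.where(X @ true_w > 0, "REAL", "FAKE")
    return X, y

clf = PassiveAggressiveClassifier(random_state=0)
all_classes = np.array(["FAKE", "REAL"])

for _ in range(30):
    X_batch, y_batch = next_batch()
    # classes lists every label up front, since a single batch may miss one
    clf.partial_fit(X_batch, y_batch, classes=all_classes)
    # the batch is not needed again and can be discarded here

# Evaluate on examples the model has never seen
X_new, y_new = next_batch(1000)
stream_accuracy = clf.score(X_new, y_new)
print(f"Accuracy on unseen examples: {stream_accuracy:.3f}")
```

Only one batch is held in memory at a time, which is the property that makes this style of training practical for very large datasets.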

A good example is detecting fake news on a social media website like Twitter, 
where new data is added every second. Reading data from Twitter continuously 
would produce a huge volume of data, so an online-learning algorithm would be ideal.

Passive-Aggressive algorithms are somewhat similar to a Perceptron model, 
in the sense that they do not require a learning rate. Unlike the Perceptron, however, they include a regularization parameter (`C` in scikit-learn).

How Passive-Aggressive Algorithms Work:
Passive-Aggressive algorithms get their name from how they react to each training example:

- Passive: If the prediction is correct, keep the model and do not make any changes, i.e., the example does not carry enough information to warrant a change.
- Aggressive: If the prediction is incorrect, update the model, i.e., change it just enough to correct the prediction on that example.
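The two behaviours can be written as a single update step. The sketch below follows the commonly described PA-I variant for binary labels in {-1, +1}; it is not scikit-learn's actual implementation, and `pa_update` is a hypothetical name. In this formulation "correct" means correct with a margin of at least 1 (zero hinge loss), which is slightly stricter than the informal description above.

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One PA-I style step for a label y in {-1, +1} (illustrative sketch)."""
    # Hinge loss: zero when the example is classified correctly with margin >= 1
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss == 0.0:
        return w  # passive: leave the model unchanged
    # Step size: just large enough to fix this example, capped by C
    tau = min(C, loss / np.dot(x, x))
    return w + tau * y * x  # aggressive: move the weights toward the example

w0 = np.zeros(2)
x = np.array([1.0, 2.0])
y = 1

w1 = pa_update(w0, x, y)  # the model is wrong on (x, y), so it updates
w2 = pa_update(w1, x, y)  # the same example now has (near) zero loss

print("after first step: ", w1)
print("after second step:", w2)
```

The regularization parameter `C` caps the step size, so a single noisy example cannot move the weights arbitrarily far.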

In [9]:
#DataFlair - Initialize a PassiveAggressiveClassifier
pac=PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train,y_train)

#DataFlair - Predict on the test set and calculate accuracy
y_pred=pac.predict(tfidf_test)
score=accuracy_score(y_test,y_pred)
print(f'Accuracy: {round(score*100,2)}%')

Accuracy: 94.16%


# Confusion Matrix

In [10]:
#DataFlair - Build confusion matrix
confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])

array([[611,  40],
       [ 34, 582]])
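
The matrix can be read as follows: rows are the true labels and columns are the predicted labels, both in the order ['FAKE', 'REAL']. So 611 fake articles were correctly flagged, 40 fake articles were mistaken for real, 34 real articles were mistaken for fake, and 582 real articles were correctly recognised. As a quick check, the sketch below recomputes the accuracy and the precision and recall for the FAKE class from those four numbers; the variable names are illustrative.

```python
import numpy as np

# Values copied from the confusion matrix output above
# (rows = true labels, columns = predicted labels, order ['FAKE', 'REAL'])
cm = np.array([[611, 40],
               [34, 582]])

true_fake, fake_as_real = cm[0]
real_as_fake, true_real = cm[1]

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = (true_fake + true_real) / cm.sum()

# Precision for FAKE: of everything predicted FAKE, how much really was FAKE
fake_precision = true_fake / (true_fake + real_as_fake)

# Recall for FAKE: of all truly FAKE articles, how many were caught
fake_recall = true_fake / (true_fake + fake_as_real)

print(f"Accuracy:       {accuracy:.2%}")
print(f"FAKE precision: {fake_precision:.2%}")
print(f"FAKE recall:    {fake_recall:.2%}")
```

The recomputed accuracy, 1193 correct out of 1267, agrees with the 94.16% reported by accuracy_score().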