## PassiveAggressiveClassifier

### Passive: if the example is classified correctly (with enough margin), keep the model; Aggressive: otherwise, update the model to correct for this example.

Passive-Aggressive algorithms are generally used for large-scale learning. They belong to the family of online-learning algorithms. In online learning, the input data arrives sequentially and the model is updated step by step. Batch learning, by contrast, uses the entire training dataset at once. Online learning is very useful when there is a huge amount of data and it is computationally infeasible to train on the entire dataset at once because of its sheer size. Put simply, an online-learning algorithm receives a training example, updates the classifier, and then throws the example away.
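The passive/aggressive rule can be made concrete with a small from-scratch sketch of the PA-I update for binary labels in {-1, +1}. It rests on simplifying assumptions: hinge loss with margin 1, no intercept term, and an aggressiveness cap `C`. It is an illustration, not scikit-learn's implementation, and `pa1_fit_online` and `predict_sign` are hypothetical helper names.

```python
import numpy as np

def pa1_fit_online(X, y, C=1.0, epochs=1):
    """Online PA-I training for binary labels y in {-1, +1} (no intercept).

    Each example is used for at most one update and is then discarded.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_t, y_t in zip(X, y):
            margin = y_t * np.dot(w, x_t)
            loss = max(0.0, 1.0 - margin)  # hinge loss with margin 1
            if loss == 0.0:
                continue  # passive: correct with margin >= 1, keep w unchanged
            sq_norm = np.dot(x_t, x_t)
            if sq_norm == 0.0:
                continue  # an all-zero example carries no information
            # aggressive: step just large enough to reach margin 1, capped at C
            tau = min(C, loss / sq_norm)
            w = w + tau * y_t * x_t
    return w

def predict_sign(w, X):
    """Predict +1 / -1 from the sign of the linear score."""
    return np.where(X @ w >= 0.0, 1, -1)

# Tiny hand-made, linearly separable example (hypothetical data).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -2.5]])
y = np.array([1, 1, -1, -1])
w = pa1_fit_online(X, y)
print(w, predict_sign(w, X))
```

In this run the second and third points already have margin 1 after the first update, so the model stays passive on them. The fourth point has margin 0.9, so the aggressive step moves `w` just enough to give it margin exactly 1.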

## Let's start the work

In [127]:
import os
os.chdir(r"A:\Thesis\Version7\FakeNewsWithUI")  # raw string: backslashes in the Windows path are not escape sequences

In [128]:
import pandas as pd

In [129]:
dataframe = pd.read_csv('dataset.csv')
dataframe = dataframe.dropna()
dataframe.head()

| Unnamed: 0 | title | text | label |
|---|---|---|---|
| 0 | ABS-CBN News | A Filipina living and working in Trinidad and ... | REAL |
| 1 | ABS-CBN News | UAE blocks drone attack; shadowy group claims ... | REAL |
| 2 | ABS-CBN News | Kompanya ni Paolo Bediones, inirereklamo ng co... | REAL |
| 3 | ABS-CBN News | Football: World Cup-bound Pinays have busy sch... | REAL |
| 4 | ABS-CBN News | Meta's shock share price drop shakes world tec... | REAL |
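The `Unnamed: 0` column is what pandas produces when a CSV was written together with its index, for example by `DataFrame.to_csv` with the default `index=True`. Passing `index_col=0` to `read_csv` turns that column back into the index. The sketch below uses a small in-memory CSV (hypothetical content, not `dataset.csv`). It also shows that `dropna()` removes every row with a missing value in any column.

```python
import io
import pandas as pd

# Hypothetical CSV written with its index; the second row has an empty text field.
csv_text = """,title,text,label
0,Outlet A,some article text,REAL
1,Outlet B,,FAKE
2,Outlet C,another article text,FAKE
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first column becomes the index
print(df.columns.tolist())  # only title, text, label remain as columns

df = df.dropna()  # drops the row whose text is missing (NaN)
print(len(df))
```

To drop rows only when specific columns are missing, `dropna(subset=['text', 'label'])` restricts the check to those columns.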


In [130]:
x = dataframe['text']
y = dataframe['label']

In [131]:
x

0      A Filipina living and working in Trinidad and ...
1      UAE blocks drone attack; shadowy group claims ...
2      Kompanya ni Paolo Bediones, inirereklamo ng co...
3      Football: World Cup-bound Pinays have busy sch...
4      Meta's shock share price drop shakes world tec...
                             ...                        
307                                  maayos system namin
308                                  hindi maayos system
309                                                    c
310                                                    e
311                                               qweqwe
Name: text, Length: 307, dtype: object

In [132]:
y

0      REAL
1      REAL
2      REAL
3      REAL
4      REAL
       ... 
307    FAKE
308    FAKE
309    FAKE
310    FAKE
311    FAKE
Name: label, Length: 307, dtype: object

In [133]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

In [134]:
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=0)
y_train

7      REAL
45     REAL
97     REAL
92     REAL
137    REAL
       ... 
253    REAL
194    REAL
117    REAL
47     REAL
174    REAL
Name: label, Length: 245, dtype: object
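The `y_train` preview above shows only REAL labels, and the confusion matrix further down shows a single FAKE article in the test split. With imbalanced labels, it is worth checking the class counts and passing `stratify` to `train_test_split`, which keeps the class proportions equal in both splits. The sketch uses hypothetical toy data with a 10% minority class rather than `dataset.csv`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 FAKE and 90 REAL labels.
y = pd.Series(["FAKE"] * 10 + ["REAL"] * 90)
x = pd.Series([f"document {i}" for i in range(100)])

print(y.value_counts())  # inspect the class balance before splitting

# stratify=y keeps the 10% / 90% ratio in both the train and the test split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y
)
print(y_test.value_counts())
```

With 20 test rows and a 10% minority class, the stratified test split receives 2 FAKE and 18 REAL labels.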


In [136]:
tfvect = TfidfVectorizer(stop_words='english',max_df=0.7)
tfid_x_train = tfvect.fit_transform(x_train)
tfid_x_test = tfvect.transform(x_test)

* `max_df=0.7` (used above) means "ignore terms that appear in more than 70% of the documents". A float is read as a proportion of documents.
* An integer is read as an absolute count: `max_df=25` means "ignore terms that appear in more than 25 documents".

In [137]:
classifier = PassiveAggressiveClassifier(max_iter=50)
classifier.fit(tfid_x_train,y_train)

PassiveAggressiveClassifier(max_iter=50)
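The cell above trains in batch mode with `fit`, even though the algorithm is online. scikit-learn's `PassiveAggressiveClassifier` also provides `partial_fit`, which updates the model one mini-batch at a time. The full list of classes must be given on the first call. The sketch below makes two assumptions. First, it uses a hypothetical stream of toy mini-batches instead of `dataset.csv`. Second, it swaps in `HashingVectorizer` for `TfidfVectorizer`, because the hashing vectorizer is stateless: new batches can be vectorized without refitting a vocabulary. This is a choice for the streaming setting, not what this notebook does.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Stateless vectorizer: no fit step, so every incoming batch is transformed independently.
vectorizer = HashingVectorizer(alternate_sign=False)
clf = PassiveAggressiveClassifier(random_state=0)
CLASSES = np.array(["FAKE", "REAL"])  # required on the first partial_fit call

# Hypothetical stream of (texts, labels) mini-batches.
stream = [
    (["shocking secret cure doctors hate", "senate passes budget bill"], ["FAKE", "REAL"]),
    (["miracle cure shocking truth revealed", "central bank holds interest rates"], ["FAKE", "REAL"]),
]

for epoch in range(3):
    for texts, labels in stream:
        X_batch = vectorizer.transform(texts)
        clf.partial_fit(X_batch, labels, classes=CLASSES)
        # after this update the batch itself no longer needs to be kept

pred = clf.predict(vectorizer.transform(["shocking miracle cure", "senate budget bill"]))
print(pred)
```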

In [138]:
y_pred = classifier.predict(tfid_x_test)
score = accuracy_score(y_test,y_pred)
print(f'Accuracy: {round(score*100,2)}%')

Accuracy: 100.0%


In [139]:
cf = confusion_matrix(y_test,y_pred, labels=['FAKE','REAL'])
print(cf)

[[ 1  0]
 [ 0 61]]
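The confusion matrix rows are the true labels in the order `['FAKE', 'REAL']`, so the test split holds 1 FAKE and 61 REAL articles. With that balance, 100% accuracy says little about FAKE detection. The sketch below shows why with a hypothetical baseline that always answers REAL, scored with scikit-learn's `balanced_accuracy_score` and `recall_score`.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Same class balance as the test split above: 1 FAKE, 61 REAL.
y_true = ["FAKE"] + ["REAL"] * 61
# Hypothetical trivial baseline that never predicts FAKE.
y_baseline = ["REAL"] * 62

acc = accuracy_score(y_true, y_baseline)           # 61 correct out of 62
bal = balanced_accuracy_score(y_true, y_baseline)  # mean of per-class recall
fake_recall = recall_score(y_true, y_baseline, pos_label="FAKE")

print(f"Accuracy: {round(acc * 100, 2)}%")
print(f"Balanced accuracy: {bal}")
print(f"FAKE recall: {fake_recall}")
```

The baseline reaches about 98.39% accuracy while catching no FAKE article at all. Per-class recall or balanced accuracy exposes this, and a stratified split with more FAKE examples would make the evaluation more informative.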


In [140]:
def fake_news_det(news):
    input_data = [news]
    vectorized_input_data = tfvect.transform(input_data)
    prediction = classifier.predict(vectorized_input_data)
    print(prediction)
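`fake_news_det` prints its result, so the label cannot be reused by other code, for example a UI. The variant below returns it instead. `PassiveAggressiveClassifier` has no `predict_proba`, but `decision_function` returns a signed score that can serve as a rough confidence. The data and the name `detect` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical toy training data standing in for x_train / y_train.
texts = ["shocking secret cure doctors hate", "miracle cure shocking truth revealed",
         "senate passes budget bill", "central bank holds interest rates"]
labels = ["FAKE", "FAKE", "REAL", "REAL"]

vect = TfidfVectorizer()
clf = PassiveAggressiveClassifier(max_iter=50, random_state=0)
clf.fit(vect.fit_transform(texts), labels)

def detect(news):
    """Return (label, score) for one article instead of printing it."""
    X = vect.transform([news])
    label = clf.predict(X)[0]
    score = clf.decision_function(X)[0]  # > 0 favours clf.classes_[1], otherwise classes_[0]
    return label, score

label, score = detect("senate budget bill")
print(label, round(score, 3))
```

The larger the absolute score, the farther the article lies from the decision boundary.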

In [141]:
fake_news_det('U.S. Secretary of State John F. Kerry said Monday that he will stop in Paris later this week, amid criticism that no top American officials attended Sunday’s unity march against terrorism.')

['REAL']


In [142]:
fake_news_det("""Go to Article 
President Barack Obama has been campaigning hard for the woman who is supposedly going to extend his legacy four more years. The only problem with stumping for Hillary Clinton, however, is she’s not exactly a candidate easy to get too enthused about.  """)

['REAL']


In [143]:
import pickle
with open('model.pkl', 'wb') as f:  # the with-block flushes and closes the file
    pickle.dump(classifier, f)

In [144]:
# load the model from disk
with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)
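`model.pkl` holds only the classifier. `fake_news_det1` still relies on the `tfvect` object that lives in the current session, so loading the model in a fresh process, such as a UI backend, would also require the fitted vectorizer. One option, sketched here under the assumption that retraining is acceptable, is to wrap both steps in a scikit-learn `Pipeline` and pickle that single object. The data is hypothetical, and the sketch pickles in memory. In the notebook one would write to a file with `with open(..., 'wb')` as above.

```python
import pickle
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Hypothetical toy training data standing in for x_train / y_train.
texts = ["shocking secret cure doctors hate", "miracle cure shocking truth revealed",
         "senate passes budget bill", "central bank holds interest rates"]
labels = ["FAKE", "FAKE", "REAL", "REAL"]

# Vectorizer and classifier are fitted together and saved as one object.
pipe = make_pipeline(
    TfidfVectorizer(stop_words="english", max_df=0.7),
    PassiveAggressiveClassifier(max_iter=50, random_state=0),
)
pipe.fit(texts, labels)

blob = pickle.dumps(pipe)       # same as pickle.dump(pipe, f) for an open file f
restored = pickle.loads(blob)

print(restored.predict(["miracle cure revealed"]))  # raw text in, label out
```

Because the restored pipeline carries its own fitted vectorizer, it accepts raw strings directly and no longer depends on a separate `tfvect` object.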

In [145]:
def fake_news_det1(news):
    input_data = [news]
    vectorized_input_data = tfvect.transform(input_data)
    prediction = loaded_model.predict(vectorized_input_data)
    print(prediction)

In [146]:
fake_news_det1("""Go to Article 
President Barack Obama has been campaigning hard for the woman who is supposedly going to extend his legacy four more years. The only problem with stumping for Hillary Clinton, however, is she’s not exactly a candidate easy to get too enthused about.  """)

['REAL']


In [147]:
fake_news_det1("""U.S. Secretary of State John F. Kerry said Monday that he will stop in Paris later this week, amid criticism that no top American officials attended Sunday’s unity march against terrorism.""")

['REAL']


In [148]:
fake_news_det('''U.S. Secretary of State John F. Kerry said Monday that he will stop in Paris later this week, amid criticism that no top American officials attended Sunday’s unity march against terrorism.''')

['REAL']
