### **1. Setup & Imports**

In [12]:
import pandas as pd
from fakenews import FakeNewsPipeline, load_dataset, preview
from fakenews.models import LogisticNewsModel, NaiveBayesNewsModel, SVMNewsModel
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

### **2. Load Dataset**

In [2]:
df = load_dataset("news.csv")
preview(df, rows=5)

| Unnamed: 0 | text | label |
|---|---|---|
| 0 | Daniel Greenfield, a Shillman Journalism Fello... | FAKE |
| 1 | Google Pinterest Digg Linkedin Reddit Stumbleu... | FAKE |
| 2 | U.S. Secretary of State John F. Kerry said Mon... | REAL |
| 3 | — Kaydee King (@KaydeeKing) November 9, 2016 T... | FAKE |
| 4 | It's primary day in New York and front-runners... | REAL |
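`load_dataset` is not shown here. `Unnamed: 0` is the name pandas gives a CSV column whose header cell is empty, which usually means an index was written out by an earlier `to_csv` call; `pd.read_csv("news.csv", index_col=0)` would use that column as the index instead. Because article texts contain commas and line breaks, they need a real CSV parser rather than line splitting. A minimal stdlib sketch, assuming news.csv has an empty first header cell followed by `text` and `label` (`read_news_csv` is a hypothetical helper):

```python
import csv
import io

def read_news_csv(handle):
    """Parse a news CSV into dicts, dropping an unnamed leading index column."""
    rows = []
    for row in csv.DictReader(handle):
        row.pop("", None)  # the empty header cell pandas would call "Unnamed: 0"
        rows.append(row)
    return rows

# Hypothetical in-memory file: the first article contains a comma and a line break.
sample = io.StringIO(
    ',text,label\n'
    '0,"Hello, world\nsecond line",REAL\n'
    '1,Plain text,FAKE\n'
)
rows = read_news_csv(sample)
print(len(rows), rows[0]["text"].count("\n"))  # 2 1
```

For a real file, open it with `open("news.csv", newline="", encoding="utf-8")` so the `csv` module sees quoted line breaks intact.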


### **3. Preprocess Labels**

In [3]:
df["label"] = df["label"].str.upper()
df["label"].value_counts()

label
REAL    3171
FAKE    3164
Name: count, dtype: int64
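`str.upper()` only normalizes case. A minimal sketch, assuming labels could also carry stray whitespace (nothing in this dataset shows that), which trims and upper-cases labels and then tallies them the way `value_counts()` does; `normalize_labels` is a hypothetical helper, and the pandas equivalent would be `df["label"].str.strip().str.upper()`:

```python
from collections import Counter

def normalize_labels(labels):
    """Trim and upper-case labels so 'real', ' Real' and 'REAL ' become one class."""
    return [label.strip().upper() for label in labels]

raw = ["REAL", "fake", " Real", "FAKE ", "real"]
counts = Counter(normalize_labels(raw))
print(counts.most_common())  # [('REAL', 3), ('FAKE', 2)]
```

The counts above (3171 REAL, 3164 FAKE) show the classes are almost exactly balanced, about 50.1% REAL, so plain accuracy is a reasonable headline metric for this dataset.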

### **4. Initialize Pipeline**

In [4]:
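pipeline = FakeNewsPipeline(
    model=NaiveBayesNewsModel(),  # multinomial Naive Bayes classifier
    remove_stopwords=False,       # keep common words such as "the" and "is"
    stem=False,                   # keep words in their original inflected form
    feature_max_features=5000     # cap on the vocabulary size (number of features)
)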

print(pipeline)

FakeNewsPipeline(model=NaiveBayesNewsModel(MultinomialNB), preprocessor=Preprocessor(remove_stopwords=False, stem=False))
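The notebook does not document `feature_max_features`. Assuming it behaves like scikit-learn's `max_features` (keep only the most frequent terms across the corpus), the selection step looks roughly like this. `build_vocabulary` is a hypothetical helper, and its alphabetical tie-breaking and frequency-ordered indices are choices made for this sketch, not library behaviour:

```python
from collections import Counter

def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent whitespace tokens across the corpus."""
    counts = Counter(token for doc in docs for token in doc.lower().split())
    # Sort by descending frequency, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {token: index for index, (token, _) in enumerate(ranked[:max_features])}

docs = [
    "the fbi is under attack",
    "the election is over",
    "the fbi director spoke",
]
vocab = build_vocabulary(docs, max_features=3)
print(vocab)  # {'the': 0, 'fbi': 1, 'is': 2}
```

With `remove_stopwords=False`, very common words like "the" stay in the text and tend to claim some of the 5000 slots.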


### **5. Train the Model**

In [5]:
pipeline.fit(df["text"], df["label"])
print("Training complete!")

Training complete!
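The printed pipeline shows a `MultinomialNB` model. As an illustration of what fitting such a model involves (a from-scratch sketch, not the fakenews implementation), here is multinomial Naive Bayes on whitespace tokens with Laplace smoothing `alpha=1.0`, the same default scikit-learn's `MultinomialNB` uses. The toy corpus is invented:

```python
import math
from collections import Counter, defaultdict

def fit_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods for each class."""
    vocab = {tok for doc in docs for tok in doc.split()}
    docs_per_class = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> token -> count
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc.split())
    model = {}
    for label, n_docs in docs_per_class.items():
        total = sum(token_counts[label].values())
        denom = total + alpha * len(vocab)
        log_lik = {
            tok: math.log((token_counts[label][tok] + alpha) / denom) for tok in vocab
        }
        model[label] = (math.log(n_docs / len(docs)), log_lik)
    return model

def predict_multinomial_nb(model, doc):
    """Pick the class with the highest log posterior; unseen tokens are skipped."""
    scores = {
        label: log_prior + sum(log_lik[tok] for tok in doc.split() if tok in log_lik)
        for label, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)

# Toy corpus, invented for illustration.
train_docs = ["shocking secret cure", "secret plot exposed",
              "senate passes budget", "budget vote in senate"]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]
nb = fit_multinomial_nb(train_docs, train_labels)
print(predict_multinomial_nb(nb, "secret senate plot"))  # FAKE
```

Working in log space avoids underflow when multiplying thousands of small per-word probabilities in long articles.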


### **6. Make Predictions**

In [6]:
preds = pipeline.predict(df["text"][:10])
print("Sample Predictions:", preds)

all_preds = pipeline.predict(df["text"])

Sample Predictions: ['FAKE', 'REAL', 'REAL', 'REAL', 'REAL', 'FAKE', 'FAKE', 'FAKE', 'REAL', 'REAL']
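Only the first five labels are visible in the preview from section 2 (FAKE, FAKE, REAL, FAKE, REAL), so the first five sample predictions can be checked by hand. A small sketch that lists the positions where prediction and label disagree:

```python
# Labels of the first five rows, copied from the preview in section 2.
true_first5 = ["FAKE", "FAKE", "REAL", "FAKE", "REAL"]
# First five Naive Bayes predictions, copied from the output above.
nb_first5 = ["FAKE", "REAL", "REAL", "REAL", "REAL"]

def mismatches(predicted, actual):
    """Return the row positions where a prediction differs from the true label."""
    return [i for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]

print(mismatches(nb_first5, true_first5))  # [1, 3]
```

Rows 1 and 3 are misclassified even though the model was trained on them.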


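### **7. Evaluate Accuracy**

This accuracy is computed on the same articles the model was trained on, so it measures how well the model fits its training data and likely overstates accuracy on unseen news.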

In [7]:
results = pipeline.evaluate(df["text"], df["label"])
print("Overall Accuracy:", results["accuracy"])

Overall Accuracy: 0.8909234411996843
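A held-out estimate needs articles the model never saw during `fit`. A minimal stdlib sketch of a stratified train/test split (each class keeps the same share in the test set) plus an accuracy function; `stratified_split` and `accuracy` are hypothetical helpers and the 20 ids stand in for the real articles:

```python
import random
from collections import defaultdict

def stratified_split(items, labels, test_fraction=0.2, seed=42):
    """Shuffle each class separately and hold out the same fraction of each."""
    by_class = defaultdict(list)
    for item, label in zip(items, labels):
        by_class[label].append(item)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label, members in by_class.items():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test += [(m, label) for m in members[:n_test]]
        train += [(m, label) for m in members[n_test:]]
    return train, test

def accuracy(predicted, actual):
    """Fraction of positions where the prediction equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Toy stand-in for the articles: 10 REAL and 10 FAKE ids.
items = list(range(20))
labels = ["REAL"] * 10 + ["FAKE"] * 10
train, test = stratified_split(items, labels)
print(len(train), len(test))  # 16 4
```

In the notebook itself, scikit-learn's `train_test_split(df["text"], df["label"], test_size=0.2, stratify=df["label"], random_state=42)` performs the same kind of split; fitting the pipeline on the training part and passing only the test part to `pipeline.evaluate` would give a held-out accuracy to compare with the 0.891 above.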


### **8. Inspect Preprocessing**

In [8]:
print("Original text:", df["text"][0])
print("Cleaned text:", pipeline.preprocessor.clean(df["text"][0]))

Original text: Daniel Greenfield, a Shillman Journalism Fellow at the Freedom Center, is a New York writer focusing on radical Islam. 
In the final stretch of the election, Hillary Rodham Clinton has gone to war with the FBI. 
The word “unprecedented” has been thrown around so often this election that it ought to be retired. But it’s still unprecedented for the nominee of a major political party to go war with the FBI. 
But that’s exactly what Hillary and her people have done. Coma patients just waking up now and watching an hour of CNN from their hospital beds would assume that FBI Director James Comey is Hillary’s opponent in this election. 
The FBI is under attack by everyone from Obama to CNN. Hillary’s people have circulated a letter attacking Comey. There are currently more media hit pieces lambasting him than targeting Trump. It wouldn’t be too surprising if the Clintons or their allies were to start running attack ads against the FBI. 
The FBI’s leadership is being warned that 
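The notebook calls `pipeline.preprocessor.clean` without showing its code, and keeps both `remove_stopwords` and `stem` off. A hypothetical cleaner with the same two switches; the tokenization rule, the stop-word list and the suffix rules are all assumptions, not the fakenews implementation:

```python
import re

# Tiny illustrative stop-word list; real lists (e.g. NLTK's or scikit-learn's) are much longer.
STOP_WORDS = {"a", "an", "and", "are", "at", "by", "from", "him", "in", "is",
              "of", "on", "than", "the", "there", "to", "with"}
# Crude suffix stripping, standing in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "s")

def strip_suffix(token):
    """Drop the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def clean(text, remove_stopwords=False, stem=False):
    """Hypothetical cleaner: lowercase, keep alphanumeric runs, optionally filter and stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if stem:
        tokens = [strip_suffix(t) for t in tokens]
    return " ".join(tokens)

# Sentence taken from the first article above.
sample = "There are currently more media hit pieces lambasting him than targeting Trump."
print(clean(sample))
# there are currently more media hit pieces lambasting him than targeting trump
print(clean(sample, remove_stopwords=True, stem=True))
# currently more media hit piece lambast target trump
```

Turning the switches on shrinks the vocabulary and merges inflected forms such as "targeting" and "target", at the cost of discarding some wording cues a classifier might use.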

### **9. Try Different Models**

In [13]:
pipeline.model = SVMNewsModel()
pipeline.fit(df["text"], df["label"])
svm_preds = pipeline.predict(df["text"][:10])
print("SVM Predictions:", svm_preds)

SVM Predictions: ['FAKE', 'FAKE', 'REAL', 'FAKE', 'REAL', 'FAKE', 'FAKE', 'FAKE', 'REAL', 'REAL']
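On the five rows whose labels appear in the preview (FAKE, FAKE, REAL, FAKE, REAL), the SVM output above matches all five, whereas Naive Bayes missed rows 1 and 3; these are training rows, so the comparison says nothing yet about unseen articles. `SVMNewsModel` itself is not shown. Assuming it wraps a linear SVM such as scikit-learn's `LinearSVC`, the sketch below trains one from scratch with the Pegasos algorithm (stochastic sub-gradient descent on the L2-regularised hinge loss). The +1/-1 label encoding, `lam`, `epochs`, and regularising the bias along with the other weights are simplifications chosen for this sketch:

```python
import random

def featurize(doc):
    """Bag-of-words counts plus a constant feature that acts as the bias term."""
    feats = {"__bias__": 1.0}
    for tok in doc.split():
        feats[tok] = feats.get(tok, 0.0) + 1.0
    return feats

def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(docs, labels, lam=0.1, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss."""
    data = [(featurize(d), 1.0 if y == "REAL" else -1.0) for d, y in zip(docs, labels)]
    rng = random.Random(seed)
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)                # decreasing step size
            violates = y * dot(w, x) < 1.0       # inside the margin or misclassified
            w = {k: (1.0 - eta * lam) * v for k, v in w.items()}  # L2 shrinkage
            if violates:                         # hinge-loss sub-gradient step
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w

def predict_svm(w, doc):
    return "REAL" if dot(w, featurize(doc)) >= 0 else "FAKE"

# Invented toy corpus; REAL is encoded as +1, FAKE as -1.
train_docs = ["shocking secret cure", "secret plot exposed",
              "senate passes budget", "budget vote in senate"]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]
w = train_linear_svm(train_docs, train_labels)
print([predict_svm(w, d) for d in train_docs])  # ['FAKE', 'FAKE', 'REAL', 'REAL']
```

Unlike Naive Bayes, which estimates per-class word frequencies independently, the SVM learns one weight per feature by directly pushing training articles to the correct side of a margin.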


### **Optional: Feature Engineering**

In [14]:
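from sklearn.feature_extraction.text import TfidfVectorizer

# Replace the feature extractor with unigram + bigram TF-IDF features.
# The model is still the SVM set in section 9.
# A custom `preprocessor` replaces TfidfVectorizer's built-in lowercasing and accent stripping.
pipeline.feature_extractor.vectorizer = TfidfVectorizer(
    max_features=5000,
    ngram_range=(1, 2),
    preprocessor=pipeline.preprocessor.clean
)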

pipeline.fit(df["text"], df["label"])
preds = pipeline.predict(df["text"][:10])
print("Predictions with bigrams:", preds)

Predictions with bigrams: ['FAKE', 'FAKE', 'REAL', 'FAKE', 'REAL', 'FAKE', 'FAKE', 'FAKE', 'REAL', 'REAL']
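With `ngram_range=(1, 2)` the vectorizer adds adjacent word pairs to the single words, and TF-IDF down-weights terms that occur in many documents. A stdlib sketch of both steps, written to follow scikit-learn's documented defaults (token pattern `(?u)\b\w\w+\b`, `smooth_idf=True` giving idf = ln((1 + n) / (1 + df)) + 1, L2 row normalisation). It is a simplified re-implementation for illustration, and `tokenize` lowercases on its own as a stand-in for the custom preprocessor:

```python
import math
import re
from collections import Counter

def tokenize(text):
    """scikit-learn's default token pattern: runs of two or more word characters."""
    return re.findall(r"(?u)\b\w\w+\b", text.lower())

def ngrams(tokens, lo=1, hi=2):
    """All contiguous n-grams with lo <= n <= hi, joined by single spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf(docs, lo=1, hi=2):
    """Raw counts times smoothed idf, then L2-normalised rows."""
    grams = [Counter(ngrams(tokenize(d), lo, hi)) for d in docs]
    n = len(docs)
    df = Counter(g for doc in grams for g in doc)  # iterating a Counter yields distinct keys
    idf = {g: math.log((1 + n) / (1 + d)) + 1 for g, d in df.items()}
    rows = []
    for doc in grams:
        vec = {g: c * idf[g] for g, c in doc.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0  # guard empty docs
        rows.append({g: v / norm for g, v in vec.items()})
    return rows

print(ngrams(tokenize("The FBI is under attack")))
# ['the', 'fbi', 'is', 'under', 'attack', 'the fbi', 'fbi is', 'is under', 'under attack']
vectors = tfidf(["The FBI is under attack", "The FBI director"])
```

Shared phrases such as "the fbi" get idf 1, while terms found in only one of the two documents get ln(3/2) + 1 ≈ 1.41, so the distinctive bigram "under attack" outweighs "the fbi".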


### **Quick Dataset Stats**

In [15]:
print("Dataset size:", len(df))
print(df["label"].value_counts())

Dataset size: 6335
label
REAL    3171
FAKE    3164
Name: count, dtype: int64
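Beyond class counts, article length is a quick sanity check, since very short or empty texts give a bag-of-words model little to work with. A stdlib sketch of a word-count summary (`length_stats` is a hypothetical helper); in the notebook, `df["text"].str.split().str.len().describe()` gives a fuller pandas summary:

```python
from statistics import median

def length_stats(texts):
    """Return (min, median, max) word counts over a collection of documents."""
    lengths = [len(text.split()) for text in texts]
    return min(lengths), median(lengths), max(lengths)

print(length_stats(["one two three", "one", "one two"]))  # (1, 2, 3)
```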
