# TruthyNet

Given a short social media claim and some metadata, predict whether it is likely true (1) or likely false/misleading (0), based on patterns we see in previously fact-checked posts.

What we'll learn here is that a neural network doesn't actually understand whether something is **TRUTH** or **A LIE**. It only picks up on patterns in the features we feed it. In fact, if we're clever, we can quite easily trick a neural network into giving us the answer we want rather than the one we'd expect a sensible model to predict.

We're going to start with a set of metadata that describes a social media post or news story. These are engineered features derived from posts using NLP and other feature engineering functions.
1. source_credibility (0–1)
    * 0 = new or historically unreliable account
    * 1 = long-lived, verified, trusted fact-based account
2. has_citation (0/1)
    * 1 if the post includes a link to a source marked “reputable” in this synthetic dataset (journal, gov site); 0 otherwise.
3. emotional_tone (0–1)
    * 0 = neutral/clinical
    * 1 = extremely emotional (“outrage!!!”, “SHOCKING”, etc.).
4. all_caps_ratio (0–1)
    * Fraction of characters that are uppercase.
5. exclamation_count (integer count, optionally capped at, say, 5–10).
6. reading_level (rough scale, e.g. 1–10)
    * Higher = more complex, formal language.
7. user_past_accuracy (0–1)
    * Historical fraction of this user’s posts that were labeled true in the past.
8. label_is_true (the target)
    * 1 = truthy
    * 0 = not-truthy


For example, the original posts might have looked something like these:

_“City health officials report that vaccination rates increased from 72% to 79% in the past year, according to the 2024 Public Health Annual Report.”_
* source_credibility: 0.93 --> official city health account
* has_citation: 1 --> yes
* emotional_tone: 0.10 --> neutral
* all_caps_ratio: 0.00 --> no shouting
* exclamation_count: 0 --> none
* reading_level: 8.2 --> slightly technical
* user_past_accuracy: 0.91 --> historically accurate source
* label: 1 (truthy)

_“BREAKING!!! The city just canceled ALL property taxes for anyone who shares this post TODAY!!!”_
* source_credibility: 0.18 --> random account, not trusted
* has_citation: 0 --> no
* emotional_tone: 0.91 --> lots of urgency
* all_caps_ratio: 0.35 --> fair amount of caps
* exclamation_count: 7 --> very shouty
* reading_level: 4.1 --> simple
* user_past_accuracy: 0.27 --> often wrong or no evidence of being right
* label: 0 (not truthy)

## Part 1: Create a small MLP
* Input Layer: our 7 features
* Hidden Layer: 1-2 hidden layers, 8-16 units; activation function ReLU or tanh (compare them)
* Output Layer: 1 unit; activation function sigmoid; outputs "probability this is true"
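
To make the architecture above concrete, here is a minimal NumPy sketch of the forward pass of a 7-input, 8-hidden-unit, 1-output network. The weights are random placeholders (an assumption for illustration, not the trained model), and the `forward` helper is a hypothetical name, not part of scikit-learn.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # ReLU: pass positive values through, clamp negatives to zero
    return np.maximum(z, 0.0)

def sigmoid(z):
    # Squash any real number into (0, 1) so it can be read as a probability
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2, hidden_activation=relu):
    """Hypothetical helper: x has shape (n_samples, 7); returns P(true) per sample."""
    hidden = hidden_activation(x @ W1 + b1)   # shape (n_samples, 8)
    logit = hidden @ W2 + b2                  # shape (n_samples, 1)
    return sigmoid(logit).ravel()

# Random placeholder weights, NOT the trained model
W1 = rng.normal(scale=0.5, size=(7, 8))
b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1))
b2 = np.zeros(1)

x = rng.normal(size=(3, 7))  # three made-up, already-scaled posts
probs_relu = forward(x, W1, b1, W2, b2, relu)
probs_tanh = forward(x, W1, b1, W2, b2, np.tanh)
print("ReLU hidden layer:", probs_relu)
print("tanh hidden layer:", probs_tanh)
```

Swapping `relu` for `np.tanh` changes only the hidden layer; the sigmoid output is what lets us read the result as "probability this is true".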

In [145]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [146]:
truths_df = pd.read_csv('social_truthy_dataset.csv')
truths_df.head()

| | post_id | source_credibility | has_citation | emotional_tone | all_caps_ratio | exclamation_count | reading_level | user_past_accuracy | label_is_true |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.639738 | 1 | 0.052059 | 0.040442 | 0 | 8.50742 | 0.88185 | 1 |
| 1 | 2 | 0.905696 | 1 | 0.137043 | 0.188744 | 0 | 5.981962 | 0.66958 | 1 |
| 2 | 3 | 0.893307 | 1 | 0.330387 | 0.102059 | 1 | 6.082793 | 0.838317 | 1 |
| 3 | 4 | 0.878963 | 0 | 0.334067 | 0.019346 | 0 | 10.0 | 0.803179 | 1 |
| 4 | 5 | 0.969615 | 1 | 0.290018 | 0.063099 | 0 | 7.871881 | 0.71993 | 1 |


In [147]:
# Split into training / testing sets
feature_cols = [
    'source_credibility',
    'has_citation',
    'emotional_tone',
    'all_caps_ratio',
    'exclamation_count',
    'reading_level',
    'user_past_accuracy'
]

X = truths_df[feature_cols].values
y = truths_df['label_is_true'].values

# IMPORTANT NOTE:
#   The "stratify=y" parameter ensures that we have similar balances of 0/1 values in our test and train data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=67, stratify=y
)
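
As a quick check of what `stratify=y` buys us, the sketch below splits a made-up label vector and compares class balance in each part. The 58/42 class mix and the `demo_` names are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up labels: 580 true posts and 420 false posts (illustrative only)
demo_y = np.array([1] * 580 + [0] * 420)
demo_X = np.arange(len(demo_y)).reshape(-1, 1)  # dummy single feature

_, _, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=0, stratify=demo_y
)

print(f"Overall fraction true: {demo_y.mean():.3f}")
print(f"Train fraction true:   {demo_y_train.mean():.3f}")
print(f"Test fraction true:    {demo_y_test.mean():.3f}")
```

Without `stratify`, the train and test fractions can drift apart by chance, which makes test metrics harder to interpret.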

In [148]:
# Don't forget to scale!!
# IMPORTANT NOTE:
#   See how we did fit_transform() on X_train and then just transform() on X_test?
#   This is very important. The scaler should be fit once, on the training data only.
#   If you refit the scaler on the test data, the means and standard deviations get rewritten,
#   and the same raw value no longer maps to the same scaled value in the training and test data.

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
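
The effect described in the note above can be seen on a single made-up feature. This is a sketch under assumed data (a test set whose mean has shifted); the `demo_` names are hypothetical and chosen to avoid overwriting the notebook's own `scaler`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
demo_train = rng.normal(loc=5.0, scale=2.0, size=(200, 1))
demo_test = rng.normal(loc=7.0, scale=2.0, size=(50, 1))  # shifted distribution

raw_value = np.array([[7.0]])  # the same raw value, scaled two ways

# Correct: reuse the scaler that was fit on the training data
train_scaler = StandardScaler().fit(demo_train)
consistent_z = train_scaler.transform(raw_value)[0, 0]

# Incorrect: refit a new scaler on the test data
refit_z = StandardScaler().fit(demo_test).transform(raw_value)[0, 0]

print(f"Scaled with the training scaler: {consistent_z:.2f}")
print(f"Scaled with a refit test scaler: {refit_z:.2f}")
```

The model learned its weights against the training scaler's units, so the refit value means something different to it.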

In [149]:
# Build and train a small multi-layer perceptron (MLP)
#   * hidden_layer_sizes = (a,b,c,d,...): one entry per hidden layer, giving the number of units in that layer
#   * activation = type of activation function shape we want
#   * See documentation for more details on parameters
#     https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html

mlp = MLPClassifier(
    hidden_layer_sizes=(8,),
    activation="relu",
    solver="adam",
    max_iter=500,
    random_state=67
)

mlp.fit(X_train_scaled, y_train)

In [150]:
# n_iter_ shows how many iterations training ran for and
# loss_ is the training loss at the end of fitting
print(f'Number of iterations performed: {mlp.n_iter_}')
print(f'Current loss value: {mlp.loss_}')

Number of iterations performed: 134
Current loss value: 0.32274261294292417
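
Part 1 suggests comparing ReLU and tanh. One way to do that is sketched below on synthetic data from `make_classification`, which stands in for the real dataset (an assumption so the block runs on its own). In the notebook you would reuse `X_train_scaled`, `y_train`, `X_test_scaled`, and `y_test` instead of the `demo_` variables.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 7 features (NOT the social_truthy dataset)
demo_X, demo_y = make_classification(
    n_samples=600, n_features=7, n_informative=4, n_redundant=1, random_state=0
)
demo_X_tr, demo_X_te, demo_y_tr, demo_y_te = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=0, stratify=demo_y
)
demo_scaler = StandardScaler()
demo_X_tr_s = demo_scaler.fit_transform(demo_X_tr)
demo_X_te_s = demo_scaler.transform(demo_X_te)

results = {}
for activation in ("relu", "tanh"):
    model = MLPClassifier(
        hidden_layer_sizes=(8,),
        activation=activation,
        solver="adam",
        max_iter=500,
        random_state=0,
    )
    model.fit(demo_X_tr_s, demo_y_tr)
    results[activation] = {
        "n_iter": model.n_iter_,
        "final_loss": model.loss_,
        "test_accuracy": model.score(demo_X_te_s, demo_y_te),
    }

for name, stats in results.items():
    print(name, stats)
```

Differences between the two on a small network are often modest, so compare test accuracy rather than training loss alone.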


## Part 2: What makes the NN _think_ something is true?

Look at the weights and try to describe what it is that makes something more "truthy" to this NN.

In [151]:
# Look at model performance using our test data
y_pred = mlp.predict(X_test_scaled)
y_proba = mlp.predict_proba(X_test_scaled)[:, 1]

acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.3f}\n")

print("Classification report:")
print(classification_report(y_test, y_pred))

print("Confusion matrix:")
print(confusion_matrix(y_test, y_pred))

Accuracy: 0.892

Classification report:
              precision    recall  f1-score   support

           0       0.89      0.85      0.87       168
           1       0.89      0.93      0.91       232

    accuracy                           0.89       400
   macro avg       0.89      0.89      0.89       400
weighted avg       0.89      0.89      0.89       400

Confusion matrix:
[[142  26]
 [ 17 215]]


### Is it a "good" model?
* How's the accuracy?
* How is the balance in the confusion matrix?
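
To help answer these questions, the numbers in the classification report can be recomputed by hand from the confusion matrix printed above. In scikit-learn's layout, rows are true labels and columns are predicted labels.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[142, 26],
               [17, 215]])

# For binary labels {0, 1}, ravel() yields tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision_true = tp / (tp + fp)    # of posts predicted true, how many were true
recall_true = tp / (tp + fn)       # of truly true posts, how many we caught
precision_false = tn / (tn + fn)
recall_false = tn / (tn + fp)

print(f"accuracy:        {accuracy:.3f}")
print(f"class 1 (true):  precision={precision_true:.2f}, recall={recall_true:.2f}")
print(f"class 0 (false): precision={precision_false:.2f}, recall={recall_false:.2f}")
print(f"false posts passed off as true: {fp}, true posts flagged as false: {fn}")
```

The model lets through more false posts (26) than it wrongly flags true ones (17), which shows up as the lower recall for class 0.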

In [152]:
# MLPClassifier stores:
# - mlp.coefs_[0]: weights from input layer to hidden layer (shape: n_features x n_hidden)
# - mlp.coefs_[1]: weights from hidden layer to output layer (shape: n_hidden x n_outputs)

input_hidden_weights = mlp.coefs_[0]   # shape (7, 8)
hidden_output_weights = mlp.coefs_[1]  # shape (8, 1)

input_hidden_df = pd.DataFrame(
    input_hidden_weights,
    index=feature_cols,
    columns=[f"h{i}" for i in range(input_hidden_weights.shape[1])]
)

hidden_output_df = pd.DataFrame(
    hidden_output_weights,
    index=[f"h{i}" for i in range(hidden_output_weights.shape[0])],
    columns=["output"]
)

input_hidden_df

| | h0 | h1 | h2 | h3 | h4 | h5 | h6 | h7 |
|---|---|---|---|---|---|---|---|---|
| source_credibility | 0.148794 | 0.372811 | 0.349167 | -0.425671 | -0.634031 | 0.038583 | -0.600702 | 0.582214 |
| has_citation | 0.554407 | -0.631576 | 0.728768 | 0.250496 | -0.208013 | -0.276909 | -0.130979 | 0.063361 |
| emotional_tone | -0.379482 | -0.483993 | 0.079162 | 0.257569 | -0.281377 | 0.317323 | 0.29569 | 0.146954 |
| all_caps_ratio | -0.340121 | -0.186697 | -0.057406 | 0.361622 | 0.668861 | 0.164485 | 0.107976 | -0.385506 |
| exclamation_count | 0.397987 | -0.203225 | -0.326629 | 0.559749 | -0.314203 | 0.170846 | 0.384783 | -0.620323 |
| reading_level | 0.47739 | -0.373818 | 0.324018 | -0.22304 | -0.385018 | -0.346125 | -0.369427 | 0.471718 |
| user_past_accuracy | -0.476145 | 0.459098 | -0.132422 | 0.387248 | -0.355152 | 0.041556 | -0.378661 | 0.51714 |


### How would you characterize each of these hidden units?

Remember:
* A larger positive weight means that the larger that input is, the more excited the neuron is
* A more negative weight means that the larger that input is, the more inhibited the neuron is
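
One rough way to start characterizing the hidden units is to list, for each one, the input with the most positive weight and the input with the most negative weight. The sketch below copies the input-to-hidden weights from the table above; the `summary` dictionary is a hypothetical helper, and biases are ignored here.

```python
import numpy as np

features = [
    "source_credibility", "has_citation", "emotional_tone", "all_caps_ratio",
    "exclamation_count", "reading_level", "user_past_accuracy",
]

# Input->hidden weights copied from the table above (rows = features, columns = h0..h7)
W1 = np.array([
    [0.148794, 0.372811, 0.349167, -0.425671, -0.634031, 0.038583, -0.600702, 0.582214],
    [0.554407, -0.631576, 0.728768, 0.250496, -0.208013, -0.276909, -0.130979, 0.063361],
    [-0.379482, -0.483993, 0.079162, 0.257569, -0.281377, 0.317323, 0.29569, 0.146954],
    [-0.340121, -0.186697, -0.057406, 0.361622, 0.668861, 0.164485, 0.107976, -0.385506],
    [0.397987, -0.203225, -0.326629, 0.559749, -0.314203, 0.170846, 0.384783, -0.620323],
    [0.47739, -0.373818, 0.324018, -0.22304, -0.385018, -0.346125, -0.369427, 0.471718],
    [-0.476145, 0.459098, -0.132422, 0.387248, -0.355152, 0.041556, -0.378661, 0.51714],
])

summary = {}
for j in range(W1.shape[1]):
    column = W1[:, j]
    most_exciting = features[int(np.argmax(column))]     # largest positive weight
    most_inhibiting = features[int(np.argmin(column))]   # most negative weight
    summary[f"h{j}"] = (most_exciting, most_inhibiting)
    print(f"h{j}: excited most by {most_exciting}, inhibited most by {most_inhibiting}")
```

For instance, h4 is excited most by `all_caps_ratio` and inhibited most by `source_credibility`, so it reads like a "shouting from an untrusted account" detector, while h7 looks roughly like the opposite. This is only a first impression, since it ignores each unit's bias and all but two of its weights.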

In [153]:
hidden_output_df

| | output |
|---|---|
| h0 | 0.357139 |
| h1 | 0.705156 |
| h2 | 0.276714 |
| h3 | -0.496456 |
| h4 | -0.723756 |
| h5 | 0.326015 |
| h6 | -0.481304 |
| h7 | 0.568857 |


In [154]:
# Approximate feature influence on the output
# This is a rough, linearized view that ignores the ReLU nonlinearity and the biases
# Multiply input->hidden weights by hidden->output weights (a matrix-vector product that sums over the hidden units)

approx_feature_influence = input_hidden_weights @ hidden_output_weights[:, 0]

feature_influence_df = pd.DataFrame({
    "feature": feature_cols,
    "approx_output_weight": approx_feature_influence
}).sort_values("approx_output_weight", ascending=False)

feature_influence_df

| | feature | approx_output_weight |
|---|---|---|
| 0 | source_credibility | 1.715756 |
| 5 | reading_level | 0.719248 |
| 6 | user_past_accuracy | 0.671812 |
| 1 | has_citation | -0.010701 |
| 2 | emotional_tone | -0.334405 |
| 4 | exclamation_count | -0.624411 |
| 3 | all_caps_ratio | -1.150271 |
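
As a worked example of where these numbers come from, write $W^{(1)}$ for `mlp.coefs_[0]` and $w^{(2)}$ for the single column of `mlp.coefs_[1]` (this notation is introduced here for illustration). The approximate influence of feature $i$ is the sum over the hidden units $h_0, \dots, h_7$:

$$
\text{influence}_i = \sum_{j=0}^{7} W^{(1)}_{ij}\, w^{(2)}_{j}
$$

For `source_credibility`, using the two weight tables above (values rounded to three decimals):

$$
\begin{aligned}
\text{influence}_{\text{source\_credibility}} \approx{} & (0.149)(0.357) + (0.373)(0.705) + (0.349)(0.277) + (-0.426)(-0.496) \\
& + (-0.634)(-0.724) + (0.039)(0.326) + (-0.601)(-0.481) + (0.582)(0.569) \\
\approx{} & 0.053 + 0.263 + 0.097 + 0.211 + 0.459 + 0.013 + 0.289 + 0.331 \\
\approx{} & 1.716
\end{aligned}
$$

Notice that the two largest contributions come from h4 and h7. For h4 and h6, a negative input weight multiplied by a negative output weight still pushes the output up. Because the inputs were standardized, each value is an influence per standard deviation of the feature, not per raw unit. ReLU switches some hidden units off for a given post, so the true effect varies from post to post.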


## Part 3: Let's outsmart our NN

What would an obviously absurd claim look like that still looks very "truthy" to our NN?

In [155]:
# Here's a handy function for doing a single prediction...
def predict_single_post_prob(feature_dict):
    """
    feature_dict: dict mapping feature name -> value (original scale, not standardized)
    """
    x = np.array([[feature_dict[col] for col in feature_cols]])
    x_scaled = scaler.transform(x)
    prob_true = mlp.predict_proba(x_scaled)[0, 1]
    return prob_true

In [156]:
# Let's create some scores for this post
#
# “In 2025, the city of Springfield officially extended the calendar to 14 months per year to improve budget planning, according to the 2025 Municipal Finance Review.”

absurd_truthy_post = {
    "source_credibility": 0.95,  # very trusted source
    "has_citation": 1,
    "emotional_tone": 0.05,      # very calm
    "all_caps_ratio": 0.0,
    "exclamation_count": 0,
    "reading_level": 8.5,        # formal language
    "user_past_accuracy": 0.92   # historically accurate user
}

truthy = predict_single_post_prob(absurd_truthy_post)*100
print(f'{truthy:0.2f}% truthy')


97.10% truthy


In [157]:
# True but "clickbaity" post
#
# “BREAKING!!! The city council FINALLY passed the clean water bill TODAY!!!”

true_but_shouty_post = {
    "source_credibility": 0.6,   # mid-level
    "has_citation": 0,           # no citation in the text
    "emotional_tone": 0.9,       # very emotional
    "all_caps_ratio": 0.4,
    "exclamation_count": 8,
    "reading_level": 5.0,
    "user_past_accuracy": 0.6    # mixed record
}

truthy = predict_single_post_prob(true_but_shouty_post)*100
print(f'{truthy:0.2f}% truthy')

4.25% truthy
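
To see which features move the score the most, one option is to vary a single feature while holding the rest fixed. The sketch below is self-contained, so it uses `toy_prob_true`, a made-up logistic scorer with invented coefficients (an assumption, not the trained model). In the notebook you would pass `predict_single_post_prob` instead. `sweep_feature` is a hypothetical helper name.

```python
import math

def sweep_feature(predict_fn, base_post, feature, values):
    """Return (value, probability) pairs, changing only `feature` in a copy of base_post."""
    results = []
    for value in values:
        post = dict(base_post)   # copy so the original dict is left untouched
        post[feature] = value
        results.append((value, predict_fn(post)))
    return results

def toy_prob_true(post):
    # Made-up stand-in scorer with invented coefficients (NOT the trained MLP)
    z = (
        3.0 * post["source_credibility"]
        - 4.0 * post["all_caps_ratio"]
        - 0.3 * post["exclamation_count"]
    )
    return 1.0 / (1.0 + math.exp(-z))

base_post = {
    "source_credibility": 0.6,
    "has_citation": 0,
    "emotional_tone": 0.9,
    "all_caps_ratio": 0.4,
    "exclamation_count": 8,
    "reading_level": 5.0,
    "user_past_accuracy": 0.6,
}

sweep = sweep_feature(toy_prob_true, base_post, "exclamation_count", range(0, 9, 2))
for value, prob in sweep:
    print(f"exclamation_count={value}: {prob * 100:0.2f}% truthy")
```

Running the same sweep against the real model for each feature shows how little needs to change about a post's style, rather than its content, to flip the prediction.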


#### How and why would you take advantage of your newfound knowledge of the weights in this model?