### **Step-by-Step Workflow to Apply the DANES Methodology to the `facebook-fact-check.csv` Dataset**

---

## **📌 Step 1: Install Required Libraries**
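
The libraries used in later steps are pandas, NumPy, scikit-learn, and TensorFlow. Here is a hedged sketch that checks which of them can be imported and prints a `pip install` command for any that are missing (in Colab, run that command in a cell prefixed with `!`). `missing_packages` is a hypothetical helper, and the package list is inferred from the imports in Steps 4 to 10.

```python
import importlib.util

# Import name -> pip package name (inferred from the imports used in later steps)
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "sklearn": "scikit-learn",
    "tensorflow": "tensorflow",
}

def missing_packages(required):
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Install the missing packages with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```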

---
## **📌 Step 2: Load and Explore the Dataset**
Start by loading and understanding your dataset.
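
Here is a minimal loading sketch. It assumes the CSV sits in the notebook's working directory, and that path is an assumption. `load_dataset` is a hypothetical helper. It prints the shape, the column names, and the first rows, so you can confirm that `Rating`, `Post Type`, and the three engagement counts used later are present.

```python
import os
import pandas as pd

CSV_PATH = "facebook-fact-check.csv"  # assumption: file is in the working directory

def load_dataset(path_or_buffer):
    """Read the CSV and print its shape, column names, and first rows."""
    df = pd.read_csv(path_or_buffer)
    print("Shape:", df.shape)
    print("Columns:", list(df.columns))
    print(df.head())
    return df

if os.path.exists(CSV_PATH):
    df = load_dataset(CSV_PATH)
    if "Rating" in df.columns:
        # Label distribution, useful when defining the target in Step 3
        print(df["Rating"].value_counts(dropna=False))
```

The existence check keeps the cell from crashing when the file has not been uploaded yet.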

### 🔹 **Check for Missing Values**
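
Here is a sketch that counts missing values per column. `missing_value_report` is a hypothetical helper, and `demo` is made-up toy data, not the real dataset. On the real data, call `missing_value_report(df)` after loading.

```python
import numpy as np
import pandas as pd

def missing_value_report(df):
    """Per-column count and percentage of missing values, worst columns first."""
    counts = df.isna().sum()
    report = pd.DataFrame({"missing": counts, "percent": (100 * counts / len(df)).round(2)})
    return report.sort_values("missing", ascending=False)

# Toy frame (illustrative only) to show the output format
demo = pd.DataFrame({
    "share_count": [10, np.nan, 3],
    "Rating": ["mostly true", "mostly false", None],
})
print(missing_value_report(demo))
```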
---
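## **📌 Step 3: Define Target Variable**
- The `Rating` column holds the fact-checkers' labels. The model in Step 7 ends in a single sigmoid unit trained with binary cross-entropy, so `Rating` must be encoded as a 0/1 target. A multi-class target would instead need a softmax output layer and a categorical loss.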
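
Here is a minimal binary-encoding sketch. The label strings and how they are grouped are assumptions. The ratings are assumed to include values such as `mostly true`, `mostly false`, `mixture of true and false`, and `no factual content`, so check `df["Rating"].unique()` and adjust `FAKE_LABELS` and `REAL_LABELS` (hypothetical names) to match. Rows that fit neither group are dropped. The encoded labels overwrite `Rating`, so `y = df["Rating"].values` in Step 6 picks them up. Usage: `df = encode_rating(df)`.

```python
import pandas as pd

# ASSUMPTION: illustrative label strings; confirm them with df["Rating"].unique()
FAKE_LABELS = {"mostly false", "mixture of true and false"}
REAL_LABELS = {"mostly true"}

def encode_rating(df, column="Rating"):
    """Return a copy with `column` mapped to 1 (fake) / 0 (real); other rows are dropped."""
    mapping = {label: 1 for label in FAKE_LABELS}
    mapping.update({label: 0 for label in REAL_LABELS})
    # Normalize case and whitespace; unmapped labels (including missing ones) become NaN
    encoded = df[column].astype(str).str.strip().str.lower().map(mapping)
    keep = encoded.notna()
    out = df.loc[keep].copy()
    out[column] = encoded[keep].astype(int)
    return out.reset_index(drop=True)
```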
---

## **📌 Step 4: Preprocess Text Data (Text Branch)**
Since DANES uses deep learning models, we need to **clean, tokenize, and embed** the text.
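
The tokenizer reads a `cleaned_text` column that this section never creates. Here is a minimal cleaning sketch: it lowercases the text, strips URLs, replaces non-alphanumeric characters with spaces, and collapses whitespace. `clean_text` is a hypothetical helper. In the usage comment, `text` stands in for whichever column holds the raw post text, which this section does not name.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
MULTI_SPACE_RE = re.compile(r"\s+")

def clean_text(text):
    """Lowercase, drop URLs and punctuation, and collapse runs of whitespace."""
    if not isinstance(text, str):  # NaN / None -> empty string
        return ""
    text = URL_RE.sub(" ", text.lower())
    text = NON_ALNUM_RE.sub(" ", text)
    return MULTI_SPACE_RE.sub(" ", text).strip()

# Usage (hypothetical column name "text"):
# df["cleaned_text"] = df["text"].apply(clean_text)
```

The character filter keeps only ASCII letters and digits, so accented characters and emoji are dropped. This simplification may need relaxing for multilingual posts.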

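### 🔹 **Tokenize, Pad, and Embed the Text**
The tokenizer maps each word to an integer index, and every post is padded or truncated to the same length. The `Embedding` layer in Step 7 then learns a 128-dimensional vector for each index. Pretrained **Word2Vec, FastText, or GloVe** vectors can optionally initialize that layer instead.
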
```python
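# Legacy Keras 2 preprocessing API (deprecated and not available in Keras 3,
# where tf.keras.layers.TextVectorization is the replacement)
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Word indices >= 5000 (rarer words) are replaced by the <OOV> token
tokenizer = Tokenizer(num_words=5000, oov_token="<OOV>")
tokenizer.fit_on_texts(df["cleaned_text"])

# Convert to integer sequences and pad/truncate them to a fixed length
sequences = tokenizer.texts_to_sequences(df["cleaned_text"])
max_length = 100  # Adjust based on text length distribution
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding="post", truncating="post")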
```
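
To use pretrained Word2Vec, FastText, or GloVe vectors instead of learning embeddings from scratch, one common approach is to build a matrix whose row `i` holds the vector of the word with tokenizer index `i`. This is a sketch, not necessarily the DANES setup. `build_embedding_matrix` is a hypothetical helper. `vectors` is assumed to be a dict-like mapping from each word to a NumPy array of length `dim`, such as vectors parsed from a GloVe text file.

```python
import numpy as np

def build_embedding_matrix(word_index, vectors, dim, num_words=5000, seed=42):
    """Return a (num_words, dim) matrix; row i holds the vector for tokenizer index i."""
    rng = np.random.default_rng(seed)
    matrix = np.zeros((num_words, dim), dtype="float32")  # row 0 stays zero for padding
    for word, idx in word_index.items():
        if idx >= num_words:  # the Tokenizer only emits indices below num_words
            continue
        if word in vectors:
            matrix[idx] = vectors[word]
        else:
            matrix[idx] = rng.normal(0.0, 0.1, dim)  # no pretrained vector: small random init
    return matrix
```

Usage: `matrix = build_embedding_matrix(tokenizer.word_index, vectors, dim=100)`. In TF 2.x Keras, the matrix can then seed Step 7's layer. For example, use `Embedding(input_dim=5000, output_dim=100, embeddings_initializer=tf.keras.initializers.Constant(matrix), trainable=False)`, where `output_dim` is changed from 128 to match the pretrained vector size.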

---

## **📌 Step 5: Prepare Social Context Features (Social Branch)**
Standardize the numerical engagement features (`share_count`, `reaction_count`, `comment_count`) to zero mean and unit variance.

```python
from sklearn.preprocessing import StandardScaler

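social_features = ["share_count", "reaction_count", "comment_count"]
# NaN counts would propagate into NaN losses; filling them with 0 is an assumption
df[social_features] = df[social_features].fillna(0)
scaler = StandardScaler()
# Fitting on all rows leaks test-set statistics; strictly, fit on the training split only
df[social_features] = scaler.fit_transform(df[social_features])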
```

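If categorical features such as `Post Type` are useful, one-hot encode them and append the new indicator columns to `social_features`. Otherwise, Step 6 never uses them:
```python
import pandas as pd
df = pd.get_dummies(df, columns=["Post Type"], drop_first=True, dtype=int)
social_features += [col for col in df.columns if col.startswith("Post Type_")]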
```
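
Engagement counts tend to be heavy-tailed, with a few viral posts dominating the scale. Here is a hedged alternative to the in-place scaling above; it is not taken from DANES. It applies `log1p` to the raw counts and computes the scaling statistics from the training rows only. `fit_log_scaler` and `transform_log_scaler` are hypothetical helpers that expect 2-D arrays. If you use them, skip the `fit_transform` above and apply them to the unscaled count columns after the Step 6 split.

```python
import numpy as np

def fit_log_scaler(train_counts):
    """Mean and std of log1p(counts), computed on training rows only (2-D input)."""
    logged = np.log1p(np.asarray(train_counts, dtype=float))
    mean = logged.mean(axis=0)
    std = logged.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return mean, std

def transform_log_scaler(counts, mean, std):
    """Apply log1p, then standardize with the training statistics."""
    return (np.log1p(np.asarray(counts, dtype=float)) - mean) / std
```

Usage: call `mean, std = fit_log_scaler(X_social_train)`, then transform both `X_social_train` and `X_social_test` with `transform_log_scaler`. With scikit-learn, the equivalent is `StandardScaler().fit(np.log1p(X_social_train))`.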

---

## **📌 Step 6: Train-Test Split**
```python
from sklearn.model_selection import train_test_split
import numpy as np

X_text = np.array(padded_sequences)
X_social = df[social_features].values.astype("float32")
y = df["Rating"].values  # Target: must already be encoded as 0/1 in Step 3

# Split data (80% train, 20% test); stratify=y keeps the class ratio the same in both splits
X_text_train, X_text_test, X_social_train, X_social_test, y_train, y_test = train_test_split(
    X_text, X_social, y, test_size=0.2, random_state=42, stratify=y
)
```
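
Here is a quick sanity check on the split, sketched with a hypothetical helper `class_proportions`. It compares the label fractions in the training and test sets, which a stratified split keeps nearly identical. A strong skew toward one class also means accuracy alone will overstate performance.

```python
import numpy as np

def class_proportions(labels):
    """Return {class: fraction of rows} for a 1-D label array."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return {v.item(): float(c / counts.sum()) for v, c in zip(values, counts)}

# Usage:
# print("train:", class_proportions(y_train))
# print("test: ", class_proportions(y_test))
```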

---

## **📌 Step 7: Build the DANES Model (Deep Learning)**
Now we create the **Text Branch (LSTM)** and the **Social Branch (a small MLP)**, concatenate their outputs, and feed them to a sigmoid classifier.

```python
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Embedding, LSTM, Dense, Input, Concatenate, Dropout

# Define Text Branch
input_text = Input(shape=(max_length,))
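# mask_zero=True makes the LSTM skip the zero padding added in Step 4;
# input_length is omitted because Input(shape=(max_length,)) already fixes it (Keras 3 deprecates the argument)
embedding_layer = Embedding(input_dim=5000, output_dim=128, mask_zero=True)(input_text)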
lstm_layer = LSTM(64, return_sequences=False)(embedding_layer)

# Define Social Branch
input_social = Input(shape=(len(social_features),))
dense_layer = Dense(32, activation="relu")(input_social)

# Concatenate Text & Social Branch
merged = Concatenate()([lstm_layer, dense_layer])
dropout = Dropout(0.3)(merged)
output = Dense(1, activation="sigmoid")(dropout)  # Binary classification

# Build Model
model = Model(inputs=[input_text, input_social], outputs=output)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```
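
To cross-check `model.summary()`, the trainable parameter counts can be worked out by hand. This sketch assumes the three numeric social features (no `Post Type` dummies). It also assumes Keras's standard LSTM, where each of the four gates has an input kernel, a recurrent kernel, and a bias.

```python
def lstm_params(input_dim, units):
    """Keras LSTM: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Dense layer: weight matrix plus one bias per unit."""
    return input_dim * units + units

n_social = 3  # assumption: share_count, reaction_count, comment_count only
counts = {
    "embedding": 5000 * 128,                # vocabulary x embedding size
    "lstm": lstm_params(128, 64),
    "social_dense": dense_params(n_social, 32),
    "output": dense_params(64 + 32, 1),     # concatenated LSTM + social outputs
}
total = sum(counts.values())
print(counts, total)  # Concatenate, Dropout, and Input layers add no parameters
```

The embedding table accounts for about 93% of the total. This is one reason a smaller vocabulary or frozen pretrained vectors can shrink the model noticeably.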

---

## **📌 Step 8: Train the Model**
```python
model.fit([X_text_train, X_social_train], y_train, validation_split=0.2, epochs=10, batch_size=32)
```
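
If the encoded labels are imbalanced, Keras's `fit` accepts a `class_weight` dictionary that scales each sample's loss by its class weight. This sketch uses the "balanced" heuristic, `n_samples / (n_classes * count_c)`, which is the same formula scikit-learn uses for `class_weight="balanced"`. `balanced_class_weights` is a hypothetical helper.

```python
import numpy as np

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c), keyed by integer class label."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    n_samples, n_classes = counts.sum(), len(values)
    return {int(v): float(n_samples / (n_classes * c)) for v, c in zip(values, counts)}

# Usage:
# model.fit([X_text_train, X_social_train], y_train, validation_split=0.2,
#           epochs=10, batch_size=32, class_weight=balanced_class_weights(y_train))
```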

---

## **📌 Step 9: Evaluate the Model**
```python
loss, accuracy = model.evaluate([X_text_test, X_social_test], y_test)
print(f"Test Accuracy: {accuracy:.2f}")
```

Accuracy can be misleading when the classes are imbalanced, so also check per-class precision, recall, and F1:
```python
from sklearn.metrics import classification_report

y_pred = (model.predict([X_text_test, X_social_test]) > 0.5).astype(int).ravel()  # (n, 1) -> (n,)
print(classification_report(y_test, y_pred))
```
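
The 0.5 cutoff in `y_pred` is only a default. This sketch picks the cutoff that maximizes F1 for the fake class; `f1_at_threshold` and `best_threshold` are hypothetical helpers. Tune the threshold on validation data held out from training, not on the test set, or the reported test score becomes optimistic.

```python
import numpy as np

def f1_at_threshold(y_true, probs, threshold):
    """F1 for the positive class (1 = fake) when predicting 1 wherever prob > threshold."""
    y_true = np.asarray(y_true).ravel()
    y_pred = (np.asarray(probs).ravel() > threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(y_true, probs, candidates=None):
    """Return the candidate cutoff with the highest F1 (the first one on ties)."""
    if candidates is None:
        candidates = np.linspace(0.1, 0.9, 17)  # 0.10, 0.15, ..., 0.90
    scores = [f1_at_threshold(y_true, probs, t) for t in candidates]
    return float(candidates[int(np.argmax(scores))])
```

Usage: `t = best_threshold(y_val, model.predict([X_text_val, X_social_val]))`, where the `_val` arrays are a hypothetical validation split carved from the training data.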

---

## **📌 Step 10: Save the Model for Future Use**
```python
# .h5 is the legacy HDF5 format; newer Keras versions recommend the native .keras extension
model.save("danes_facebook_fake_news.h5")
```
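
The saved model cannot score new posts on its own. The tokenizer vocabulary, `max_length`, the social feature list, and the scaler statistics must all be reproduced exactly at inference time. Here is a minimal sketch that stores them as JSON. The helper names and the file name are hypothetical; `mean_` and `scale_` are the fitted attributes of scikit-learn's `StandardScaler`.

```python
import json

def save_preprocessing(path, word_index, num_words, max_length,
                       social_features, scaler_mean, scaler_scale):
    """Store the settings needed to rebuild model inputs for new posts."""
    config = {
        "word_index": word_index,
        "num_words": num_words,
        "max_length": max_length,
        "social_features": list(social_features),
        "scaler_mean": [float(v) for v in scaler_mean],
        "scaler_scale": [float(v) for v in scaler_scale],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f)

def load_preprocessing(path):
    """Read back the dictionary written by save_preprocessing."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Usage: `save_preprocessing("danes_preprocessing.json", tokenizer.word_index, 5000, max_length, social_features, scaler.mean_, scaler.scale_)`. The scaler statistics cover only the three count columns it was fitted on.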