<a href="https://colab.research.google.com/github/anupammaurya6767/FakeProfileIdentifier/blob/main/fakeProfile.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project Title: 🕵️‍♂️ Fake Profile Detector 🕵️‍♀️

## Introduction:

In the era of social media and online interactions, the proliferation of fake profiles has become a significant concern. 🌐 Fake profiles can be used for various malicious purposes, including misinformation, scams, and cyberattacks. Detecting these deceptive profiles is crucial for maintaining a secure and trustworthy online environment.

Our project, the Fake Profile Detector, aims to tackle this challenge using the power of machine learning and data analysis. 🤖💻 By building an intelligent model, we endeavor to distinguish between genuine and fake profiles with a high degree of accuracy.

Our mission is to create a tool that safeguards online communities and platforms by automatically identifying suspicious accounts. 👤❌ We believe that with the right combination of data preprocessing, feature engineering, and model selection, we can significantly improve the accuracy of fake profile detection, contributing to a safer online experience for all users.

## Random Forest Classifier

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Load the training dataset
# (assumes Google Drive is already mounted at /content/drive, e.g. via google.colab.drive.mount)
train_data = pd.read_csv('/content/drive/MyDrive/data/train.csv')
# Load the testing dataset
test_data = pd.read_csv('/content/drive/MyDrive/data/test.csv')

# Split the training dataset into features (X_train) and the target variable (y_train)
X_train = train_data.drop('fake', axis=1)  # Features for training
y_train = train_data['fake']  # Target variable for training

# Split the testing dataset into features (X_test) and the target variable (y_test)
X_test = test_data.drop('fake', axis=1)  # Features for testing
y_test = test_data['fake']  # Target variable for testing

# Create a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model on the training data
clf.fit(X_train, y_train)

# Make predictions on the test data
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Print per-class precision, recall and F1, then the confusion matrix
# (rows = true label, columns = predicted label)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))


Accuracy: 0.92
              precision    recall  f1-score   support

           0       0.90      0.93      0.92        60
           1       0.93      0.90      0.92        60

    accuracy                           0.92       120
   macro avg       0.92      0.92      0.92       120
weighted avg       0.92      0.92      0.92       120

[[56  4]
 [ 6 54]]
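The per-class numbers in the report follow directly from the confusion matrix, where scikit-learn puts true labels on rows and predicted labels on columns. As a quick check, a few lines of plain Python recompute them from the matrix printed above (the matrix values are copied from that output; `per_class_metrics` is just an illustrative helper):

```python
# Confusion matrix copied from the Random Forest output above.
# scikit-learn convention: rows = true class, columns = predicted class.
cm = [[56, 4],
      [6, 54]]


def per_class_metrics(cm, k):
    """Precision, recall and F1 for class index k of a square confusion matrix."""
    tp = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
    actual_k = sum(cm[k])                    # row sum: everything that is truly k
    precision = tp / predicted_k
    recall = tp / actual_k
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / total  # correct predictions / all predictions

print(f"accuracy: {accuracy:.2f}")
for k in range(len(cm)):
    p, r, f = per_class_metrics(cm, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For class 1 (fake), precision is 54 / (4 + 54) ≈ 0.93 and recall is 54 / (6 + 54) = 0.90, matching the report; the 4 genuine profiles flagged as fake and the 6 fake profiles missed are the two error types worth tracking separately.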


## Data Augmentation:

Augment the dataset by generating synthetic examples of whichever class (fake or genuine) is under-represented. Techniques such as random oversampling, SMOTE (Synthetic Minority Over-sampling Technique), or data augmentation for text data can be applied to balance the dataset and may improve model performance. Resampling belongs on the training split only, so that the test set still reflects real, unmodified profiles.

In [2]:
from imblearn.over_sampling import SMOTE

# Apply SMOTE to the training data
smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

# Now, you can train your model on X_train_resampled and y_train_resampled
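The `smote.fit_resample` call hides the mechanics: SMOTE does not duplicate rows, but for each synthetic sample it picks a minority-class point, picks one of its k nearest minority-class neighbours, and places the new point at a random position on the line segment between them. The NumPy sketch below illustrates only that core interpolation step; `smote_sketch` is a hypothetical helper, not part of imbalanced-learn, and it ignores the library's sampling strategies, categorical-feature variants, and edge-case handling.

```python
import numpy as np


def smote_sketch(X_minority, n_new, k=5, seed=42):
    """Create n_new synthetic rows by interpolating between minority-class
    points and one of their k nearest minority-class neighbours (Euclidean)."""
    X = np.asarray(X_minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority-class rows to interpolate")
    k = min(k, n - 1)  # a point cannot have more neighbours than there are other points
    rng = np.random.default_rng(seed)

    # Pairwise squared distances; inf on the diagonal so no point is its own neighbour.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    neighbours = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbours of each row

    base = rng.integers(0, n, size=n_new)                     # starting minority point
    other = neighbours[base, rng.integers(0, k, size=n_new)]  # one of its neighbours
    gap = rng.random((n_new, 1))                              # position along the segment, in [0, 1)
    return X[base] + gap * (X[other] - X[base])


# Toy minority class: the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_min, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

With `k=2`, each corner's two nearest neighbours are its adjacent corners, so every synthetic point lands on an edge of the square rather than on an existing row; that interpolation, rather than copying, is what distinguishes SMOTE from plain random oversampling.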


## Model Selection:

Experiment with different machine learning algorithms beyond Random Forest, such as Gradient Boosting (e.g., XGBoost, LightGBM), Support Vector Machines (SVM), or neural networks. Each algorithm may capture different patterns in the data.

In [3]:
import xgboost as xgb

# Create an XGBoost classifier
xgb_model = xgb.XGBClassifier(n_estimators=100, random_state=42)

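# Train the XGBoost model on the SMOTE-resampled training data
# (note: the Random Forest above was trained on the original X_train, y_train)
xgb_model.fit(X_train_resampled, y_train_resampled)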

# Make predictions
y_pred_xgb = xgb_model.predict(X_test)

# Evaluate the XGBoost model
accuracy_xgb = accuracy_score(y_test, y_pred_xgb)
print(f"XGBoost Accuracy: {accuracy_xgb:.2f}")


XGBoost Accuracy: 0.95
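One caveat when reading these two numbers: the Random Forest was trained on the original `X_train`, while XGBoost was trained on the SMOTE-resampled data, so the 0.92 vs 0.95 gap mixes the effect of the algorithm with the effect of resampling, and each figure comes from a single 120-row test set. A fairer comparison keeps the training data fixed and scores every candidate on the same cross-validation folds. The sketch below shows that pattern with scikit-learn only. It assumes a synthetic dataset from `make_classification` as a stand-in for the profile features (the real columns are not shown here), and it uses scikit-learn's `GradientBoostingClassifier` in place of XGBoost to avoid an extra dependency. The SVM is wrapped in a pipeline with `StandardScaler` because SVMs are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for the profile features; replace with X_train, y_train.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5, random_state=42)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    # Scaling inside the pipeline means the scaler is fit on each training fold only.
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Same shuffled, stratified folds for every model, so differences come from the model.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

If SMOTE is part of the workflow, it should run inside each training fold (for example with imbalanced-learn's `Pipeline`, which accepts samplers as steps) rather than before cross-validation; resampling first would let synthetic points derived from validation rows leak into training.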
