## **AI GENERATED IMAGE DETECTION**

**Problem Statement**

With the advent of generative AI, it has become increasingly difficult to distinguish real images from AI-generated ones.
The goal is to develop a model that can identify fake photos created by AI.

### **IMPORTING PACKAGES**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import f1_score
from sklearn.metrics import precision_recall_curve
from sklearn.metrics import confusion_matrix
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import History

%matplotlib inline

### **LOAD TRAIN DATA**
* *The training dataset contains 5250 data points, each with 1200 features*

In [None]:
df_train = pd.read_csv('Dataset/train.csv')

### **DATA PREPROCESSING**

#### Split train data into features and labels

In [None]:
imgFeaturesTrain = df_train.drop(['labels'], axis=1)
imgLabelsTrain = df_train[['labels']]

#### Normalize the data in each row

In [None]:
scaler = MinMaxScaler()
normalized_data = imgFeaturesTrain.apply(lambda row: scaler.fit_transform(row.values.reshape(-1, 1)).flatten(), axis=1)

# Create a new DataFrame with the normalized data
normalized_imgFeatures = pd.DataFrame(normalized_data.tolist(), columns=imgFeaturesTrain.columns)

normalized_imgFeatures.head()
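
The cell above fits `MinMaxScaler` once per image, which can be slow for thousands of rows. A minimal sketch of a vectorized NumPy alternative follows, assuming the intent is to map each row to [0, 1] using that row's own minimum and maximum. The helper name `minmax_rows` and the toy array are hypothetical and not part of the original notebook.

```python
import numpy as np

def minmax_rows(values):
    """Scale each row to [0, 1] using that row's own min and max (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    row_min = values.min(axis=1, keepdims=True)
    row_range = values.max(axis=1, keepdims=True) - row_min
    # Constant rows have zero range; divide by 1 so they map to 0 instead of NaN
    row_range[row_range == 0] = 1.0
    return (values - row_min) / row_range

# Toy data for illustration only: one ordinary row and one constant row
toy = np.array([[0.0, 5.0, 10.0],
                [2.0, 2.0, 2.0]])
scaled = minmax_rows(toy)
print(scaled)
```

Mapping constant rows to zeros mirrors how `MinMaxScaler` treats zero-range input, so the two approaches should agree on such rows.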

#### Converting normalized_imgFeatures and imgLabels to NumPy arrays

In [None]:
normalized_imgFeatures = normalized_imgFeatures.to_numpy()
imgLabels = imgLabelsTrain.to_numpy()

#### Reshape the normalized_imgFeatures array

In [None]:
reshapeImgFeaturesTrain = np.reshape(normalized_imgFeatures, (-1, 20, 20, 3))
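
The reshape treats each 1200-value row as a 20 x 20 image with 3 channels. The sketch below shows which flat column ends up at which pixel, under the assumption that the CSV stores values in row-major, channels-last order; the source does not state the column order, so this is worth checking against the dataset description. The helper `flat_index` is hypothetical.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 20, 20, 3

# One fake "image" whose values equal their own flat column index
flat = np.arange(HEIGHT * WIDTH * CHANNELS).reshape(1, -1)
images = np.reshape(flat, (-1, HEIGHT, WIDTH, CHANNELS))

def flat_index(row, col, channel):
    """Flat column that lands at images[:, row, col, channel] under row-major order."""
    return (row * WIDTH + col) * CHANNELS + channel

print(images.shape)        # (1, 20, 20, 3)
print(images[0, 0, 0])     # first pixel holds flat columns 0, 1, 2
print(images[0, 1, 0, 0])  # start of the second image row is flat column 60
```

If the columns were instead stored channel by channel, a reshape to `(-1, 3, 20, 20)` followed by a transpose would be needed.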

#### Split the data into training and held-out sets

In [None]:
X_train, X_test, y_train, y_test = train_test_split(reshapeImgFeaturesTrain, imgLabels, test_size=0.3, random_state=42)

### **CNN MODEL FOR CLASSIFICATION**

In [None]:
# Define the CNN model
cnnModel = models.Sequential([
    layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(20, 20, 3)),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid')
])

# Compile the model
cnnModel.compile(optimizer='adam',
                 loss='binary_crossentropy',
                 metrics=['accuracy'])
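
To see how the 20 x 20 x 3 input shrinks through the network, the following sketch works out the layer output sizes and parameter counts by hand. It assumes the Keras defaults used above (no padding and stride 1 for `Conv2D`, pooling stride equal to the pool size); the helper names are hypothetical, and the totals should match what `cnnModel.summary()` reports.

```python
def conv_out(size, kernel, stride=1):
    """Output length of a 'valid' (no padding) convolution along one axis."""
    return (size - kernel) // stride + 1

def pool_out(size, pool):
    """Output length of max pooling with stride equal to the pool size."""
    return size // pool

size, channels = 20, 3
params = {}

# Conv2D(32, 3x3): kernel_h * kernel_w * in_channels * filters weights, plus one bias per filter
params["conv1"] = 3 * 3 * channels * 32 + 32
size, channels = pool_out(conv_out(size, 3), 2), 32   # 20 -> 18 -> 9

# Conv2D(64, 3x3)
params["conv2"] = 3 * 3 * channels * 64 + 64
size, channels = pool_out(conv_out(size, 3), 2), 64   # 9 -> 7 -> 3

# Flatten, then Dense(128) and Dense(1); Dropout has no parameters
flat_units = size * size * channels
params["dense1"] = flat_units * 128 + 128
params["dense2"] = 128 * 1 + 1

total_params = sum(params.values())
print(flat_units, params, total_params)
```

Most of the parameters sit in the first dense layer, which is one reason the `Dropout(0.5)` is placed right after it.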

#### Train the model

In [None]:
cnnModel.fit(X_train, y_train, epochs=20, batch_size=50)

#### Evaluate the model on test data

In [None]:
loss, accuracy = cnnModel.evaluate(X_test, y_test)
print(f"Test loss: {loss:.4f}, test accuracy: {accuracy:.4f}")

#### Make predictions

In [None]:
predictions = cnnModel.predict(X_test)
binary_predictions = (predictions > 0.5).astype(int)

### **EVALUATION METRICS**

#### **CONFUSION MATRIX**

In [None]:
# Compute confusion matrix
cm = confusion_matrix(y_test, binary_predictions)

# Define class labels 
class_labels = ['Class 0', 'Class 1']

# Plot confusion matrix
plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Reds)
plt.colorbar()
tick_marks = np.arange(len(class_labels))
plt.xticks(tick_marks, class_labels)
plt.yticks(tick_marks, class_labels)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')

# Add text annotations in each cell
thresh = cm.max() / 2
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        plt.text(j, i, format(cm[i, j], 'd'),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

plt.tight_layout()
plt.show()


#### **F1 SCORE CALCULATION**

In [None]:
f1 = f1_score(y_test, binary_predictions)

print("F1 Score:", f1)
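
The F1 score can also be read directly off the confusion matrix. The sketch below assumes the layout returned by scikit-learn's `confusion_matrix` for binary labels, `[[TN, FP], [FN, TP]]`, with class 1 as the positive class. The helper `f1_from_confusion` is hypothetical and the counts are made up for illustration; they are not results from this notebook.

```python
def f1_from_confusion(cm):
    """Precision, recall and F1 for class 1 from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up counts for illustration only
toy_cm = [[50, 10],
          [5, 35]]
toy_precision, toy_recall, toy_f1 = f1_from_confusion(toy_cm)
print(f"precision={toy_precision:.3f} recall={toy_recall:.3f} f1={toy_f1:.3f}")
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when the two classes are imbalanced.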

### **USING TEST DATA TO MAKE PREDICTIONS**

#### **Load Test Data**
* *The test dataset contains 2250 data points, each with 1200 features*

In [None]:
df_test = pd.read_csv('Dataset/test.csv')

In [None]:
# Splitting test data into features and id
df_test_features = df_test.drop(['id'], axis=1)
df_test_id = df_test[['id']]

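In [None]:
# Apply the same per-row min-max normalization that was applied to the training features
test_features = df_test_features.apply(lambda row: scaler.fit_transform(row.values.reshape(-1, 1)).flatten(), axis=1)

In [None]:
test_features = np.array(test_features.tolist())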

In [None]:
test_features = np.reshape(test_features, (-1, 20, 20, 3))

In [None]:
predictions_Test = cnnModel.predict(test_features)

In [None]:
binary_predictions_Test = (predictions_Test > 0.5).astype(int)

In [None]:
labels = pd.DataFrame(binary_predictions_Test)

### **SOLUTION DATAFRAME**

In [None]:
solDF = pd.DataFrame({
    'id': df_test['id'],
    'labels': binary_predictions_Test.ravel()
})

In [None]:
solDF.head(10)