### CS 697
### Real vs Fake Face Detection Group Project
### ZIRADDIN KAZIMLI & MOHIT

In [1]:
import tensorflow as tf
import cv2
from cv2 import imread, imshow
import os  # accessing directory structure
import numpy as np
from tensorflow import keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.applications import VGG16
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.utils import shuffle 

from tkinter import Tk
from tkinter.filedialog import askopenfilename # Interactive mode

In [2]:
# Define paths to your dataset folders
real_folder = '/Users/ziko/Desktop/CSFall23/CS697/project/training_real'
fake_folder = '/Users/ziko/Desktop/CSFall23/CS697/project/training_fake'

In [None]:
# Lists to store image paths and labels
image_paths = []
labels = []

In [4]:
# Load real images
real_images = [os.path.join(real_folder, filename) for filename in os.listdir(real_folder)]
image_paths.extend(real_images)
labels.extend([1] * len(real_images))  # 1 for real

# Load fake images
fake_images = [os.path.join(fake_folder, filename) for filename in os.listdir(fake_folder)]
image_paths.extend(fake_images)
labels.extend([0] * len(fake_images))  # 0 for fake

In [5]:
# Convert the lists to NumPy arrays
image_paths = np.array(image_paths)
labels = np.array(labels)

#### Converting the lists to NumPy arrays can be beneficial for several reasons:

#### Efficiency: 
NumPy arrays are more memory-efficient and faster for numerical operations compared to Python lists. This is especially important when you're working with large datasets and performing operations on the data.

#### Uniform Data Structure: 
NumPy arrays provide a uniform data structure, which can simplify many aspects of data manipulation and analysis. It ensures that all elements have the same data type, which can help prevent unexpected data type-related errors.

#### Compatibility with Machine Learning Libraries: 
Many machine learning libraries, including TensorFlow and PyTorch, expect input data in the form of NumPy arrays. Converting our data to NumPy arrays makes it easier to work with these libraries.

In [6]:
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(image_paths, labels, test_size=0.2, random_state=42)


In [7]:
# Define a function to load and preprocess an image
def load_and_preprocess_image(image_path, target_size):
    image = cv2.imread(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) # Convert image to RGB
    image = cv2.resize(image, target_size)
    image = image/255.0 # Normalize pixel values
    return image

In [8]:
# Define the target size for your images (e.g., (224, 224) for a 224x224 size)
target_size = (224, 224)

In [9]:
# Define your CNN model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Print the model summary
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 222, 222, 32)      896       
                                                                 
 max_pooling2d (MaxPooling2  (None, 111, 111, 32)      0         
 D)                                                              
                                                                 
 conv2d_1 (Conv2D)           (None, 109, 109, 64)      18496     
                                                                 
 max_pooling2d_1 (MaxPoolin  (None, 54, 54, 64)        0         
 g2D)                                                            
                                                                 
 conv2d_2 (Conv2D)           (None, 52, 52, 128)       73856     
                                                                 
 max_pooling2d_2 (MaxPoolin  (None, 26, 26, 128)       0

In [10]:
def classify_and_display_new_image(image_path):
    while True:
        # Load and preprocess the new image
        new_image = load_and_preprocess_image(image_path, target_size)

        # Make predictions
        prediction = model.predict(np.expand_dims(new_image, axis=0))

        # Determine the result
        if prediction > 0.5:
            result = "Real"
        else:
            result = "Fake"

        # Display the image and prediction
        image = cv2.imread(image_path)
        cv2.putText(image, f"Prediction: {result}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

        # Create a pop-up window to display the image
        cv2.imshow("Real vs. Fake Image Detection", image)
        
        # Wait for a key event
        key = cv2.waitKey(0)

        # Check if the key is the ESC key (27)
        if key == 27:
            cv2.destroyAllWindows()
            break  # Exit the loop if ESC key is pressed
        elif key == ord('n'):
            # Let the user choose another image
            new_image_path = askopenfilename(title="Select another Image")
            if not new_image_path:
                break  # Exit the loop if the user cancels the selection
            image_path = new_image_path

In [11]:
# Interactive mode: Let the user choose an image to classify
root = Tk()
root.withdraw()
image_path = askopenfilename(title="Select an Image")

# Classify and display the chosen image
classify_and_display_new_image(image_path)



error: OpenCV(4.5.5) /Users/xperience/actions-runner/_work/opencv-python/opencv-python/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'


### SUMMARY

#### Data Loading:
The code loads real and fake images from specified folders.
It creates lists (image_paths and labels) to store image paths and corresponding labels (1 for real, 0 for fake).
The data is then split into training and test sets.

#### Image Preprocessing:
There's a function (load_and_preprocess_image) to read, convert to RGB, resize, and normalize pixel values of an image.

#### Convolutional Neural Network (CNN) Model:
A simple CNN model is defined using the Sequential API from TensorFlow's Keras.
The model has convolutional layers with max pooling, followed by flattening, dense layers, and a final output layer with sigmoid activation for binary classification.

#### Model Compilation:
The model is compiled using the Adam optimizer and binary cross-entropy loss.

#### Prediction and Display:
There's a function (classify_and_display_new_image) to classify and display a chosen image using the trained model. It uses OpenCV to show the image with a prediction label.
The user can press the 'n' key to select another image or the ESC key to exit.