## Step 1: Data Exploration and Preparation

In this notebook, we'll start by loading the deepfake detection dataset from Hugging Face. We will explore its structure and visualize some examples to understand what we're working with.

### 1.1: Install and Import Libraries

First, let's make sure we have the necessary libraries installed. `datasets` is the Hugging Face library we'll use to download the data, `matplotlib` is for plotting images, and `Pillow` is what `datasets` uses to decode the image files.

In [None]:
%pip install datasets matplotlib Pillow

In [None]:
from datasets import load_dataset
import matplotlib.pyplot as plt
import random

### 1.2: Load the Dataset

We'll load the `saakshigupta/deepfake-detection-dataset-v3` dataset. It's already split into `train` and `test` sets, which is convenient.

In [None]:
dataset = load_dataset("saakshigupta/deepfake-detection-dataset-v3")

print("Dataset loaded successfully!")
print(dataset)
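
To see at a glance how many examples each split holds, a small helper can map split names to their sizes. This is a minimal sketch: `summarize_splits` is a hypothetical name, not part of `datasets`, and the toy dictionary below only stands in for the real object so the snippet runs without a download. It assumes the loaded `dataset` behaves like a dictionary of split name to dataset, each supporting `len()`.

```python
def summarize_splits(splits):
    """Map each split name to its number of examples.

    `splits` is assumed to be dict-like (split name -> sized collection).
    """
    return {name: len(split) for name, split in splits.items()}


# Stand-in data so the sketch runs on its own; with the dataset loaded
# above you would call summarize_splits(dataset) instead.
toy_splits = {"train": ["a", "b", "c"], "test": ["d"]}
print(summarize_splits(toy_splits))
```

Comparing the sizes this way is a quick check that the `train` and `test` splits are roughly the proportions you expect before going further.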

### 1.3: Explore the Dataset Structure

Let's look at a single example from the training set to understand the features.

In [None]:
train_dataset = dataset['train']
sample = train_dataset[0]

print("Sample from the dataset:")
print({k: v for k, v in sample.items() if k != 'image'}) # Print all except the image data itself
sample['image']
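
Beyond a single example, it can help to check how many examples fall under each label, since a heavily imbalanced dataset affects how we train and evaluate later. The following is a minimal sketch: `count_labels` is a hypothetical helper, and it assumes the dataset exposes an integer `label` column that can be read as a plain sequence.

```python
from collections import Counter


def count_labels(labels):
    """Return a {label: count} dict for an iterable of class labels."""
    return dict(Counter(labels))


# Toy labels so the sketch runs on its own. With the dataset loaded above
# (assuming an integer `label` column) you would instead call:
#   count_labels(train_dataset['label'])
print(count_labels([0, 1, 1, 0, 1]))
```

Reading only the `label` column avoids decoding any images, so this stays fast even on a large split.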

### 1.4: Visualize Real vs. Fake Images

Looking at actual images helps us sanity-check the labels before training. Let's find one real image and one fake image to display side by side. The `label` is 0 for fake and 1 for real.

In [None]:
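# Find one real and one fake image.
# Read only the label column so we don't decode every image just to filter.
labels = train_dataset['label']
real_indices = [i for i, label in enumerate(labels) if label == 1]
fake_indices = [i for i, label in enumerate(labels) if label == 0]

# Pick a random example of each class; only these two images get decoded
real_sample = train_dataset[random.choice(real_indices)]
fake_sample = train_dataset[random.choice(fake_indices)]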

# Create a figure to display the images
fig, ax = plt.subplots(1, 2, figsize=(12, 6))

ax[0].imshow(real_sample['image'])
ax[0].set_title(f"Real Image\nLabel: {real_sample['label']}")
ax[0].axis('off')

ax[1].imshow(fake_sample['image'])
ax[1].set_title(f"Fake Image\nLabel: {fake_sample['label']}")
ax[1].axis('off')

plt.show()

## Next Steps

Now that we have loaded and explored the data, the next step is to preprocess it for training our CNN model. This will involve:

1.  **Resizing and Normalizing Images**: Ensuring all images are the same size and their pixel values are scaled appropriately.
2.  **Creating DataLoaders**: Setting up efficient pipelines to feed data to our model during training.

We will tackle this in the next notebook, `2_Model_Training.ipynb`.