## Preprocessing

Below is example code for preprocessing a raw dataset of videos into face frames with `dlib` and `facenet_pytorch.MTCNN`. The dataset used here is a small sample dataset with 10 images in each class, and its directory is structured as described in the [README.md](../README.md).

**Please be aware:** you might need to intervene manually (re-run), because the face detection algorithms of `dlib` and `facenet_pytorch.MTCNN` might not detect a face in every frame. In such cases, `preprocess()` logs an error like the ones below:

```bash
ERROR:root:No face detected on ./videos-organized/Combined/FAKE/acdkfksyev.mp4 frame 233
ERROR:root:No face detected on ./videos-organized/Combined/FAKE/alnkzqihau.mp4 frame 102
```
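To decide which videos need a re-run, it can help to collect these messages programmatically instead of reading them off the console. The following is a minimal sketch, assuming the errors are emitted through Python's root logger (as the `ERROR:root:` prefix suggests) and follow the message format shown above. `NoFaceCollector` and `collect_no_face_errors` are hypothetical helper names and are not part of this repository.

```python
import logging
import re
from contextlib import contextmanager

# Pattern inferred from the sample messages above (an assumption);
# adjust it if the preprocessor words its errors differently.
NO_FACE = re.compile(r"No face detected on (?P<video>.+) frame (?P<frame>\d+)")


class NoFaceCollector(logging.Handler):
    """Hypothetical helper: stores (video, frame) pairs from matching records."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.misses = []

    def emit(self, record):
        match = NO_FACE.search(record.getMessage())
        if match:
            self.misses.append((match["video"], int(match["frame"])))


@contextmanager
def collect_no_face_errors():
    """Attach the collector to the root logger for the duration of the block."""
    collector = NoFaceCollector()
    root = logging.getLogger()
    root.addHandler(collector)
    try:
        yield collector
    finally:
        root.removeHandler(collector)


with collect_no_face_errors() as collector:
    # Stand-in for a real run; in the notebook this would be prep.preprocess(...).
    logging.error(
        "No face detected on ./videos-organized/Combined/FAKE/acdkfksyev.mp4 frame 233"
    )

# Unique videos that had at least one frame without a detected face.
videos_to_retry = sorted({video for video, _ in collector.misses})
print(videos_to_retry)
```

The handler only observes log records. It does not change how the preprocessor behaves, so it can be removed without side effects.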

In [1]:
# Run __init__.py before the rest of the notebook
%run __init__.py

### With MTCNN

In [2]:
from src.preprocessing.mtcnn_preprocessor import MTCNNPreprocessor

In [3]:
prep = MTCNNPreprocessor(
    dataset_path="../data/raw/Sample",
    classes=["REAL", "FAKE"]
)

prep.preprocess(
    save_to="../data/preprocessed/MTCNN-Sample",
    n_frame=2,
    cut_amount=0.15,
    batch_size=10,
    seed=42,
)

Extracting ../data/raw/Sample/REAL to ../data/preprocessed/MTCNN-Sample/REAL done in 22.46s
Extracting ../data/raw/Sample/FAKE to ../data/preprocessed/MTCNN-Sample/FAKE done in 21.85s


### With Dlib

In [4]:
from src.preprocessing.dlib_preprocessor import DlibPreprocessor

In [5]:
prep = DlibPreprocessor(
    dataset_path="../data/raw/Sample",
    classes=["REAL", "FAKE"]
)

prep.preprocess(
    save_to="../data/preprocessed/Dlib-Sample",
    n_frame=2,
    cut_amount=0.15,
    batch_size=10,
    seed=42,
)

Extracting ../data/raw/Sample/REAL to ../data/preprocessed/Dlib-Sample/REAL done in 36.63s
Extracting ../data/raw/Sample/FAKE to ../data/preprocessed/Dlib-Sample/FAKE done in 37.57s
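
After a run, a quick sanity check is to count how many files each class folder received, since frames with no detected face may be missing. The sketch below assumes the output is written as files under one subfolder per class, as the `Extracting ... to ...` lines suggest. The file format is not specified here, so it counts every file. `count_files_per_class` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def count_files_per_class(root, classes=("REAL", "FAKE")):
    """Return {class_name: number of files found under root/class_name}."""
    counts = {}
    for name in classes:
        class_dir = Path(root) / name
        if class_dir.is_dir():
            # Count files at any depth, in case frames are grouped per video.
            counts[name] = sum(1 for p in class_dir.rglob("*") if p.is_file())
        else:
            # A missing folder is reported as zero rather than raising.
            counts[name] = 0
    return counts


for out_dir in (
    "../data/preprocessed/MTCNN-Sample",
    "../data/preprocessed/Dlib-Sample",
):
    print(out_dir, count_files_per_class(out_dir))
```

A large gap between the two classes, or between the two preprocessors, is a hint to check the logged errors and re-run.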
