Synthetic and Authentic Forensic Lab (SAFL)

This repository contains the datasets and materials associated with the article:

Sergio A. Falcón-López, et al.
Forensic Analysis of Manipulated Images and Videos
Submitted to Applied Sciences (MDPI), 2025.

Relation to the Article

The SAFL dataset was created specifically for the experiments described in the article.
It includes:

Authentic and synthetic (Deepfake) images and videos.
The exact subsets used for the evaluation of conventional forensic tools and modern Deepfake-detection models.
File structures and naming conventions referenced in the manuscript tables and figures.

This repository enables full reproducibility of the experiments and verification of the reported metrics.

Author and Ownership

The dataset and repository are authored by:

Sergio A. Falcón-López
ORCID: https://orcid.org/0009-0002-7106-6691
Email: sfalcon@scc.uned.es

GitHub username: oigres5

Repository Structure

The repository is organized as follows:

audios/: Contains both real and deepfake-generated audio samples.
- real/: Original, non-manipulated audio files.
- fake/: Deepfake-generated audio files.
images/: Contains real and deepfake-generated images.
- real/: Original images used in the experiments.
- fake/: Deepfake-generated images.
videos/: Contains real and deepfake-generated videos.
- real/: Original source videos.
- fake/: Deepfake-generated videos.
src/: Contains modified or adapted scripts based on external tools.
README.md: This documentation file.
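The layout above can be checked with a short script. The following is a minimal sketch, not part of the repository: the function name count_samples is hypothetical, and it assumes every regular file under a real/ or fake/ folder is one sample (so hidden or auxiliary files, if any exist, would be counted too).

```python
from collections import Counter
from pathlib import Path

# Directory names taken from the repository structure described above.
MODALITIES = ("audios", "images", "videos")
LABELS = ("real", "fake")


def count_samples(root):
    """Count regular files under <root>/<modality>/<label>/, recursively.

    Assumption: each regular file is one sample. Missing folders count as 0.
    """
    counts = Counter()
    for modality in MODALITIES:
        for label in LABELS:
            folder = Path(root) / modality / label
            if folder.is_dir():
                counts[(modality, label)] = sum(
                    1 for p in folder.rglob("*") if p.is_file()
                )
            else:
                counts[(modality, label)] = 0
    return counts


# Example usage from the repository root:
#   for (modality, label), n in sorted(count_samples(".").items()):
#       print(f"{modality}/{label}: {n}")
```

Comparing these counts with the figures in the Dataset Description is a quick way to confirm that a clone or download is complete.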

Dataset Description

Images

-Real: 2102 samples

These samples were selected from the CelebA dataset.

-Deepfake: 2095 samples (+300 added in the 06/2025 update)

1009 samples generated with https://thispersondoesnotexist.com
613 samples generated with FaceApp
453 samples taken from videos generated with DeepFaceLab
20 samples generated with DALL-E 2
300 samples generated with DALL-E 3 (added in the 06/2025 update)

Videos

-Real: 212 samples

These samples were selected from the Celeb-DF dataset.

-Deepfake: 204 samples

100 generated with Avatarify
104 generated with DeepFaceLab

Audios

-Real: 2000 samples in Spanish, extracted from LibriVox project audiobooks

-Deepfake: 2000 samples generated with a text-to-speech (TTS) method
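The counts listed in this section can be tallied to check the per-source breakdown and the class balance of each modality. The sketch below only restates the numbers given above. The dictionary and function names are hypothetical, and treating the image Deepfake total as 2095 + 300 is one reading of the "(+300)" note rather than a figure stated explicitly.

```python
# Sample counts copied from the Dataset Description.
# Assumption: the 300 DALL-E 3 images are added on top of the 2095 originals.
DOCUMENTED_COUNTS = {
    "images": {"real": 2102, "fake": 2095 + 300},
    "videos": {"real": 212, "fake": 204},
    "audios": {"real": 2000, "fake": 2000},
}

# Per-source breakdown of the original 2095 Deepfake images.
IMAGE_FAKE_SOURCES = {
    "thispersondoesnotexist.com": 1009,
    "FaceApp": 613,
    "DeepFaceLab": 453,
    "DALL-E 2": 20,
}

# Per-source breakdown of the 204 Deepfake videos.
VIDEO_FAKE_SOURCES = {
    "Avatarify": 100,
    "DeepFaceLab": 104,
}


def fake_fraction(counts):
    """Return the share of Deepfake samples in a {'real': n, 'fake': m} dict."""
    total = counts["real"] + counts["fake"]
    return counts["fake"] / total


# The per-source figures add up to the stated Deepfake totals.
assert sum(IMAGE_FAKE_SOURCES.values()) == 2095
assert sum(VIDEO_FAKE_SOURCES.values()) == DOCUMENTED_COUNTS["videos"]["fake"]

for modality, counts in DOCUMENTED_COUNTS.items():
    print(f"{modality}: {fake_fraction(counts):.3f} of samples are Deepfake")
```

All three modalities come out close to balanced between real and Deepfake samples, which matters when interpreting accuracy figures computed on the full sets.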

System Requirements

The src/ directory contains helper scripts that wrap or slightly adapt existing forensic and Deepfake-related tools from external projects.
The complete and up-to-date software requirements for each tool are documented in their original repositories, for example:

ManTraNet – https://github.com/ISICV/ManTraNet
Image Forgery Detection with CNN – https://github.com/kPsarakis/Image-Forgery-Detection-CNN
MesoNet – https://github.com/DariusAf/MesoNet

In our experiments, all scripts in src/ were executed under the following minimum environment:

Software

Operating system: Linux (Ubuntu 20.04+) or Windows 10/11
Python: 3.8 or higher
Git: 2.25 or higher
CUDA 11+ and NVIDIA drivers (only required for GPU-accelerated training/inference)

Hardware

CPU: Quad-core processor or better
RAM: at least 8 GB (16 GB recommended for video processing)
Disk space: at least 20 GB free for datasets and intermediate results
GPU (optional but recommended): NVIDIA GPU with ≥ 4 GB VRAM for DeepFaceLab-based workflows
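Some of the minimums above can be checked before running anything in src/. The following is a minimal sketch using only the Python standard library. It is not shipped with the repository, the function names are hypothetical, and it covers only the Python version, the Git version, and free disk space (taking 1 GB as 10^9 bytes); CUDA, driver, and RAM checks are left out.

```python
import re
import shutil
import subprocess
import sys

# Thresholds copied from the Software and Hardware lists above.
MIN_PYTHON = (3, 8)
MIN_GIT = (2, 25)
MIN_FREE_GB = 20


def parse_git_version(text):
    """Extract (major, minor) from output such as 'git version 2.34.1'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def check_environment(path="."):
    """Return a dict mapping each checked requirement to True or False."""
    report = {"python": sys.version_info[:2] >= MIN_PYTHON}

    git = shutil.which("git")
    if git is None:
        report["git"] = False
    else:
        out = subprocess.run(
            [git, "--version"], capture_output=True, text=True
        ).stdout
        version = parse_git_version(out)
        report["git"] = version is not None and version >= MIN_GIT

    # Free space on the filesystem that holds `path`, in units of 10**9 bytes.
    free_gb = shutil.disk_usage(path).free / 10**9
    report["disk"] = free_gb >= MIN_FREE_GB
    return report


# Example usage:
#   for name, ok in check_environment().items():
#       print(f"{name}: {'OK' if ok else 'below the documented minimum'}")
```

Each external tool still has its own dependency list, so a passing report here does not replace the requirements documented in the original repositories.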

Versioning and Commit Transparency

Past changes to the dataset have been reviewed and documented in this repository. All future commits will include:

Clear descriptions of the changes performed.
Exact counts of affected files.
Justification whenever aggregate statistics or percentages change.

Dataset Version History

2025-06-07: Initial public release of the SAFL dataset (v1.0). Git commit 3f6f20088360acfdcfbdde5186d4c7f67c775f89 on branch main.

Data Protection and Consent

The “real” samples in SAFL are not recordings collected directly by the authors:

images/real/: contains a subset of face images taken from the CelebA dataset, released by its authors for non-commercial research.
videos/real/: contains a subset of face videos taken from the Celeb-DF dataset, released by its authors as a public benchmark for Deepfake forensics.
audios/real/: contains original audio files extracted from LibriVox project audiobooks (https://librivox.org/).

No additional personal identifiers (e.g., names, social media accounts, or textual labels of individuals) are included in SAFL beyond the anonymous file naming used for the experiments. The dataset does not contain images or recordings collected from private individuals specifically for this work.

Users of SAFL are responsible for ensuring that their use of the data complies with the original dataset licenses (e.g., CelebA, Celeb-DF) and with any applicable data protection regulation (such as GDPR when processing biometric data).

License

This dataset is released under the following license:

Creative Commons Attribution 4.0 International (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/

This license permits sharing and adaptation of the dataset, provided that appropriate credit is given.

How to Cite

If you use this dataset, please cite:

Sergio A. Falcón-López, Synthetic and Authentic Forensic Lab (SAFL) Dataset. https://github.com/oigres5/SAFL
