saraferreirascf/Photos-Videos-Manipulations-Dataset

Authors

  • Sara Ferreira - Department of Computer Science; Faculty of Sciences; University of Porto, Porto, Portugal; sara.ferreira@fc.up.pt
  • Mário Antunes - Computer Science and Communication Research Centre (CIIC), School of Technology and Management, Polytechnic of Leiria; Leiria; Portugal; mario.antunes@ipleiria.pt
    INESC TEC, CRACS; Porto; Portugal
  • Manuel E. Correia - Department of Computer Science; Faculty of Sciences; University of Porto, Porto, Portugal; mdcorrei@fc.up.pt
    INESC TEC, CRACS; Porto; Portugal

Objects and faces manipulations dataset

This dataset is a compilation of several datasets that contain real photos and video frames as well as forged ones. The forged photos and video frames contain several types of manipulation, such as copy-move, splicing and deepfake.

Name                               Fake     Real
CelebA-HQ dataset                  -        10000
Flickr-Faces-HQ dataset            -        10000
100K Faces Project                 10000    -
This person does not exist         10000    -
COVERAGE dataset                   97       97
Columbia Image Splicing Dataset    180      183
Dataset created by us              14       14
Celeb-DFv1*                        795      158

*This dataset contains only videos. Between 3 and 4 frames per second were extracted from each video and added to the final dataset.
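As a rough illustration of sampling a video at about 3 to 4 frames per second, the sketch below computes which frame indices to keep. The function name `sampled_frame_indices` and its parameters are hypothetical and are not taken from the repository; the actual extraction tool used by the authors is not stated here.

```python
def sampled_frame_indices(total_frames: int, video_fps: float,
                          target_fps: float = 3.0) -> list[int]:
    """Return the indices of the frames to keep so that roughly
    `target_fps` frames survive per second of video (illustrative only)."""
    if total_frames <= 0 or video_fps <= 0 or target_fps <= 0:
        return []
    # Keeping every `step`-th frame yields about `target_fps` frames per second.
    step = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, step))


# A 3-second clip recorded at 30 fps, sampled at about 3 fps.
indices = sampled_frame_indices(total_frames=90, video_fps=30.0, target_fps=3.0)
print(len(indices), indices[:3])
```

The resulting indices could then be passed to any video reader to save only the selected frames.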

The final, already labeled dataset is available here

Features

  • Feature extraction based on the Discrete Fourier Transform (DFT)
  • Image and video classification with an SVM-based model
  • Combines both objects and faces.
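A minimal sketch of DFT-based feature extraction follows. It assumes the features are a 1D azimuthal average of the log-magnitude spectrum, which is how the method cited under Pipeline is commonly described; the repository's own script may differ. The names `azimuthal_average` and `dft_features` are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np


def azimuthal_average(spectrum_2d: np.ndarray) -> np.ndarray:
    """Average a centred 2D spectrum over rings of equal integer radius."""
    h, w = spectrum_2d.shape
    y, x = np.indices((h, w))
    # Integer distance of every pixel from the centre of the shifted spectrum.
    radius = np.hypot(y - h // 2, x - w // 2).astype(int)
    ring_sum = np.bincount(radius.ravel(), weights=spectrum_2d.ravel())
    ring_count = np.bincount(radius.ravel())
    return ring_sum / ring_count


def dft_features(gray: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Turn a 2D grayscale image into a 1D frequency profile (assumed features)."""
    # 2D DFT with the zero-frequency component moved to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    # Log magnitude; eps avoids log(0) for empty frequency bins.
    log_magnitude = 20 * np.log(np.abs(spectrum) + eps)
    return azimuthal_average(log_magnitude)


# Example on a synthetic 16x16 "image"; real use would load a grayscale photo.
rng = np.random.default_rng(0)
features = dft_features(rng.random((16, 16)))
print(features.shape)
```

The resulting vector has one value per radial frequency band, so images of the same size yield feature vectors of the same length, which is convenient as input to an SVM.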

Experimental setup

Pipeline

Turning the raw dataset into a labeled one requires some pre-processing. The goal is to use the photos and videos in this dataset to classify other photos and videos. The first step is to extract features from each file. These features can then be compared with the features of the multimedia content under investigation to infer whether that content has been manipulated. The features are extracted using the method described in "Unmasking DeepFakes with simple Features".

A Python script was created to automate the feature extraction. The script takes the folder containing the files to process and the number of files to analyze (normally the size of the smaller of the two classes). After the features of a photo or video frame are extracted, the file is labeled according to the folder it is in: every file in the folder "fake" is labeled 0 and every file in the folder "real" is labeled 1. Once all files have been processed, with features extracted and labels assigned, the result is a fully labeled dataset.
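The labeling step described above can be sketched as follows. This is an illustration under assumptions, not the repository's script: the function `build_labeled_dataset`, its parameters, and the placeholder feature extractor are hypothetical. Only the folder names "fake" and "real" and their labels (0 and 1) come from the text.

```python
from pathlib import Path
from typing import Callable, Sequence

# Folder name -> class label, as stated in the text.
LABELS = {"fake": 0, "real": 1}


def build_labeled_dataset(
    root: str,
    extract_features: Callable[[Path], Sequence[float]],
    files_per_class: int,
) -> list[tuple[list[float], int]]:
    """Walk root/fake and root/real and return (features, label) pairs."""
    rows = []
    for folder, label in LABELS.items():
        files = sorted(p for p in (Path(root) / folder).iterdir() if p.is_file())
        # Use the same number of files per class to keep the classes balanced.
        for path in files[:files_per_class]:
            rows.append((list(extract_features(path)), label))
    return rows


def file_size_feature(path: Path) -> list[float]:
    """Placeholder extractor (hypothetical); a real one would compute
    DFT-based features from the image content."""
    return [float(path.stat().st_size)]
```

A call such as `build_labeled_dataset("dataset", file_size_feature, files_per_class=100)` would return a list of feature vectors paired with 0 or 1, ready to be written to disk or passed to a classifier.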

Publications

  • Ferreira, S., Antunes, M., & Correia, M. E. "Forensic analysis of tampered digital photos"; 25th Iberoamerican Congress on Pattern Recognition (CIARP); May 2021; Porto; Portugal; to be published in Springer Lecture Notes in Computer Science.
  • Ferreira, S., Antunes, M., & Correia, M. E. (2021). Exposing Manipulated Photos and Videos in Digital Forensics Analysis. Journal of Imaging, 7(7), 102. doi:10.3390/jimaging7070102
  • Ferreira, S.; Antunes, M.; Correia, M.E. A Dataset of Photos and Videos for Digital Forensics Analysis Using Machine Learning Processing. Data 2021, 6, 87. https://doi.org/10.3390/data6080087

About

Dataset for multimedia manipulation detection
