pprindeville/truepic

TruePic image analysis micro-service

We implement a very basic service which attempts to detect whether a submitted image has been Photoshopped.

This service listens on port 8080 and is meant to receive requests forwarded by a full-featured server such as Apache with mod_proxy configured.

Because we only support a single, simple service, we don't implement a full RESTful API (and indeed, there's no server-side state, so there would be no point in doing so).

Dependencies

We require the following packages installed:

  • g++ compiler
  • GNU make
  • Poco libraries and headers (Net, JSON, Util, and Foundation modules)
  • exempi libraries and headers

Building

Just run make.

Running

On the server side, run ./picserver or gdb -ex run ./picserver if you're feeling uncertain.

To test it from the command line, run:

$ curl --data-binary @my-path-here http://localhost:8080/$(basename my-path-here)

for instance:

$ curl --data-binary @$HOME/Downloads/truepic.jpg http://localhost:8080/truepic.jpg
{
  "is_valid" : true,
  "name" : "truepic.jpg",
  "tests" : {
    "creator_tool_is_photoshop" : true,
    "create_modify_mismatch" : true
  }
}
$ 

A JSON result is returned, containing:

  1. is_valid: whether the image was accepted for analysis and no resource shortages occurred;
  2. name: the name that the image was submitted as (for confirmation);
  3. tests: a hash of the tests that were performed and their results.

For now, only two tests are performed; they are listed under Caveats below.
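As a rough illustration of the response shape shown in the example, the following sketch formats the same fields using only the standard library. It is not the service's actual code, which depends on the Poco JSON module; the AnalysisResult struct and the to_json function are hypothetical names introduced here.

```cpp
#include <sstream>
#include <string>

// Hypothetical container for one analysis outcome (not from picserver).
struct AnalysisResult {
    bool is_valid;                   // image accepted, no resource shortage
    std::string name;                // name the image was submitted as
    bool creator_tool_is_photoshop;  // result of test 1
    bool create_modify_mismatch;     // result of test 2
};

// Format the result in the same layout as the curl example.
// Assumption: `name` has already passed the filename checks (alphanumerics,
// dot, hyphen, underscore), so it needs no JSON escaping here.
std::string to_json(const AnalysisResult& r)
{
    std::ostringstream out;
    out << std::boolalpha  // print bools as true/false
        << "{\n"
        << "  \"is_valid\" : " << r.is_valid << ",\n"
        << "  \"name\" : \"" << r.name << "\",\n"
        << "  \"tests\" : {\n"
        << "    \"creator_tool_is_photoshop\" : "
        << r.creator_tool_is_photoshop << ",\n"
        << "    \"create_modify_mismatch\" : "
        << r.create_modify_mismatch << "\n"
        << "  }\n"
        << "}\n";
    return out.str();
}
```

Keeping the per-test results under a nested "tests" object means further tests can be added later without changing the top-level keys that clients already read.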

Caveats

We only handle JPEG files, and of those only ones smaller than 128MB. We perform only a very limited number of tests, to wit:

  1. look for a CreatorTool annotation starting with "Adobe Photoshop ...";
  2. check for a mismatch between the CreateDate and the ModifyDate, ignoring the nanoseconds on the former (which seem to be absent on the latter);
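The two checks can be sketched on plain strings as follows. This is a minimal illustration, not the service's code: it assumes the CreatorTool, CreateDate, and ModifyDate values have already been extracted from the XMP packet, and that the dates are ISO 8601-style strings such as "2017-03-01T12:34:56.123-08:00". All function names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Test 1: does the CreatorTool value start with "Adobe Photoshop"?
bool creator_tool_is_photoshop(const std::string& creator_tool)
{
    static const std::string prefix = "Adobe Photoshop";
    // compare() on a too-short string yields non-zero, so no length check
    // is needed before the call.
    return creator_tool.compare(0, prefix.size(), prefix) == 0;
}

// Drop a fractional-seconds component (".123") from a timestamp while
// keeping any time-zone suffix that follows it.
std::string strip_fractional_seconds(const std::string& stamp)
{
    const std::size_t dot = stamp.find('.');
    if (dot == std::string::npos)
        return stamp;  // nothing to strip

    std::size_t end = dot + 1;
    while (end < stamp.size() &&
           std::isdigit(static_cast<unsigned char>(stamp[end])))
        ++end;

    return stamp.substr(0, dot) + stamp.substr(end);
}

// Test 2: do the two dates differ once fractional seconds are ignored?
// Stripping both sides is harmless when only one carries a fraction.
bool create_modify_mismatch(const std::string& create_date,
                            const std::string& modify_date)
{
    return strip_fractional_seconds(create_date) !=
           strip_fractional_seconds(modify_date);
}
```

Because the comparison is textual, two timestamps that denote the same instant in different time zones would be reported as a mismatch; parsing both into a common representation would avoid that.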

Security

We reject filenames that are longer than 64 characters or that contain anything other than alphanumeric characters, dot, hyphen, or underscore.
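A filename check along these lines might look like the sketch below. The function name is hypothetical, and rejecting an empty name is an added assumption that the rule above does not state.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from picserver): accept names of 1..64
// characters drawn from letters, digits, '.', '-', and '_'.
bool is_acceptable_filename(const std::string& name)
{
    constexpr std::size_t kMaxLength = 64;  // limit stated in this README

    // Assumption: an empty name is rejected as well.
    if (name.empty() || name.size() > kMaxLength)
        return false;

    for (unsigned char c : name) {
        // std::isalnum needs a value representable as unsigned char.
        if (!std::isalnum(c) && c != '.' && c != '-' && c != '_')
            return false;
    }
    return true;
}
```

Since '/' is not in the allowed set, a name such as "../etc/passwd" is rejected, which rules out path traversal through the request path.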

We don't accept files of more than 128MB of data.

We require the Content-Length: header to be present.
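The size and header requirements can be combined into a single check on the Content-Length value, sketched below. The function name is hypothetical, and the sketch assumes that "128MB" means 128 * 1024 * 1024 bytes and that a body of exactly that size is still accepted.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not from picserver). Returns true only if the
// header value is a plain decimal number within the size limit.
// Pass an empty string when the header is absent.
bool content_length_ok(const std::string& header_value)
{
    // Assumption: "128MB" is read as 128 MiB.
    constexpr std::uint64_t kMaxBytes = 128ull * 1024 * 1024;

    if (header_value.empty())
        return false;  // header missing or empty

    std::uint64_t length = 0;
    for (char c : header_value) {
        if (c < '0' || c > '9')
            return false;  // reject signs, spaces, and other junk
        length = length * 10 + static_cast<std::uint64_t>(c - '0');
        if (length > kMaxBytes)
            return false;  // too large; bailing out here also avoids overflow
    }
    return true;
}
```

Checking the declared length before reading the body lets the server refuse an oversized upload without buffering any of it.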

Limitations

The list of checks above in Caveats is quite short. The exempi library is poorly written and documented, and exceedingly badly architected. Image parsing is closely tied to the filesystem, which precludes processing images that are already present in memory. That's a serious shortcoming, and a performance limitation for a web service that might otherwise be able to do all processing in memory.

Had I known that exempi was as awful as it was, I would have spent more time looking for another library, even if it needed to be built from source.

Iterating over arrays is not well documented, and given the time available I was not able to figure out how to detect more than one instance of the xmpMM:History attribute.

Other than requiring disk I/O, the XMP annotation analysis is a fairly simple linear parse of the image and should scale well, even for large images.

Image processing itself (as described below in Further improvements) would be of exponential complexity, and would not scale as well as image size increases.

Further improvements

Additional tests that could be done:

  1. look for comments in the headers, as comments usually indicate that an editor has rewritten the header (most devices don't insert comments);
  2. if a thumbnail (usually generated by the device itself) is present, regenerate the thumbnail from the full image and compare the two--a discrepancy might indicate the image was modified but the thumbnail wasn't updated to reflect this;
  3. apply increasingly sophisticated image-processing tests for discrepancies in color temperature, edge gradients, and sharpness, for artifacts of scaling, etc.;
  4. train a neural network on sets of unmodified and edited images and use it to spot Photoshopped images;

etc.
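To make the thumbnail idea concrete, here is a minimal sketch that regenerates a thumbnail by box-filter downscaling and measures how far it is from the embedded one. It is an illustration only: it assumes both images have already been decoded to 8-bit grayscale, and the GrayImage type and both function names are hypothetical. Box filtering and mean absolute difference are simply convenient choices for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical decoded image: 8-bit grayscale, row-major.
struct GrayImage {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> pixels;  // width * height entries
};

// Box-filter downscale: each output pixel is the mean of the source block
// that maps onto it. Assumes 0 < out_w <= src.width and
// 0 < out_h <= src.height, so every block holds at least one pixel.
GrayImage downscale(const GrayImage& src, std::size_t out_w, std::size_t out_h)
{
    GrayImage out{out_w, out_h, std::vector<std::uint8_t>(out_w * out_h)};
    for (std::size_t oy = 0; oy < out_h; ++oy) {
        const std::size_t y0 = oy * src.height / out_h;
        const std::size_t y1 = (oy + 1) * src.height / out_h;
        for (std::size_t ox = 0; ox < out_w; ++ox) {
            const std::size_t x0 = ox * src.width / out_w;
            const std::size_t x1 = (ox + 1) * src.width / out_w;
            std::uint64_t sum = 0;
            for (std::size_t y = y0; y < y1; ++y)
                for (std::size_t x = x0; x < x1; ++x)
                    sum += src.pixels[y * src.width + x];
            const std::uint64_t count = (y1 - y0) * (x1 - x0);
            out.pixels[oy * out_w + ox] =
                static_cast<std::uint8_t>(sum / count);
        }
    }
    return out;
}

// Mean absolute per-pixel difference, in the range 0..255.
// Images of different dimensions are treated as maximally different.
double mean_abs_difference(const GrayImage& a, const GrayImage& b)
{
    if (a.width != b.width || a.height != b.height || a.pixels.empty())
        return 255.0;
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < a.pixels.size(); ++i)
        total += static_cast<std::uint64_t>(
            std::abs(int(a.pixels[i]) - int(b.pixels[i])));
    return static_cast<double>(total) / static_cast<double>(a.pixels.size());
}
```

In practice the embedded thumbnail will have been produced by a different scaler and recompressed, so the difference would not be exactly zero even for an untouched image; a real test would need a tolerance chosen from sample data.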

Use of a better XMP library would allow all of the data to be handled in memory, and hence not require temporary file creation, disk access, etc.
