
MAT integration for binary metadata removal #151

Closed
wants to merge 1 commit into from

2 participants

@gnusosa
gnusosa commented Nov 11, 2013

MAT is a toolbox composed of a GUI application, a CLI application, and a library.
https://mat.boum.org/

Based on issue #119

Part of the San Francisco Hackathon.

@gnusosa
gnusosa commented Nov 11, 2013

This implementation utilizes MAT's full capabilities. Since drops are usually binary files, we need an implementation that can probe against the most common file formats.

@gnusosa
gnusosa commented Nov 11, 2013

Currently, MAT's library can only handle files on disk, not file streams or in-memory files.
I'm hacking on MAT's library right now to make it work with file streams and in-memory files.
It seems easy enough to add that feature to MAT. I will keep you posted.
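
For comparison, a stopgap that avoids patching MAT is to spool the upload to a temporary file, run a path-based cleaner on it, and read the result back. This is only a minimal sketch: `clean_stream` and the `clean_path` callable are hypothetical names, and `clean_path` stands in for whatever path-based entry point MAT exposes. It is not MAT's actual API.

```python
import os
import tempfile


def clean_stream(stream, clean_path):
    """Return the cleaned bytes of `stream` using a cleaner that only accepts paths.

    `clean_path` is a hypothetical callable that takes a filesystem path and
    strips metadata from that file in place.
    """
    # Spool the stream to disk so the path-only cleaner can open it.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(stream.read())
        clean_path(path)
        # Read back whatever the cleaner left on disk.
        with open(path, "rb") as cleaned:
            return cleaned.read()
    finally:
        # Always remove the temporary copy, even if cleaning fails.
        os.remove(path)
```

The drawback is that the uncleaned upload briefly exists on disk, which is one reason native stream support in MAT would be preferable.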

@gnusosa
gnusosa commented Nov 11, 2013

In the future, the capability of alerting that a drop is not clean in source.py would be needed.
We should also add guidelines like the Technological Protection of GlobaLeaks document:
https://docs.google.com/document/d/1ZrndvBj9eTg-ooIRfKbXxX18Ie-ODlcjnHjKXSY78Ew

This would make the users dropping information aware of dangerous metadata and show them how to use MAT if they need to inspect the specific metadata of each drop. Let them choose what to remove.

@garrettr

> In the future, the capability of alerting that a drop is not clean in source.py would be needed.

This is covered by #122. The current proposal there uses JavaScript, but that is not a requirement (the issue is focused on the idea of warning or preventing sources from uploading documents with identifying metadata, not on any specific implementation thereof).

@garrettr

This looks good! Please add tests, and squash all the commits into one so it's easier for me to review and merge.

@garrettr

You will also need to provide a way for MAT to be automatically installed as part of the setup process for both developers and deployments. We currently use pip to install Python dependencies from deaddrop/requirements.txt. Since MAT is not available on PyPI, it would probably be best to install it from the Git repository. You should be able to do this with pip.

@gnusosa gnusosa Added MAT to source.py as a metadata purge option.
- MAT consists of a Python library that makes
use of other metadata tools dedicated to specific
file formats.

Always nice tidy and neat.

- Added gitignore expression
for Emacs buffers.

Added checkbox for metadata purge.

- Also added more validation on the
file selection for the cleanup.

Better validation on what file to write to.

Added MAT to requirements.txt

- Note: MAT is not available on
  PyPI; therefore, we clone it
  from the Tor repository.

Added MAT's dependencies to requirements.txt

Added exiftool dependency to Debian/Ubuntu script.

Added tests for binary file upload.
8ca5809
@gnusosa
gnusosa commented Nov 18, 2013

@garrettr I squashed the commits and added tests, including binary files for the tests to use.
Please review.

@garrettr

Can you test that the uploaded file actually has the intended metadata removed?
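
For JPEG uploads, one way such a check could look is to walk the segment headers of the stored file and assert that no Exif APP1 segment remains. This is only a sketch under assumptions: `has_exif_segment` is a hypothetical helper, it covers Exif in JPEG only, and other formats and metadata types would need their own checks.

```python
import struct


def has_exif_segment(data):
    """Return True if the JPEG byte string `data` contains an Exif APP1 segment."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        raise ValueError("not a JPEG stream")
    pos = 2
    while pos + 4 <= len(data):
        marker = data[pos:pos + 2]
        if marker[:1] != b"\xff" or marker == b"\xff\xda":
            # Not a segment header, or start of scan: no more metadata segments.
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == b"\xff\xe1" and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

A unit test could then upload a fixture known to contain Exif data, read the stored file back, and assert that `has_exif_segment` returns `False` for it.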

@garrettr

@dolanjs what do you think about this dependency? (the capabilities it provides are quite awesome)

@garrettr

Now that we are using Vagrant for the dev setup, we don't have to worry about supporting platforms other than Debian/Ubuntu, so the .deb requirement is fine. I'm including this, or some variation thereof, in the 0.3 roadmap. To merge, we'll need to:

  1. Rebase this on top of develop
  2. Finish writing those unit tests (see previous comment)
@gnusosa
gnusosa commented Feb 26, 2014

Will do. Sorry, I've been running other errands.

@gnusosa
gnusosa commented Mar 3, 2014

Rebased onto the develop branch; the result can be found in #326.
Closing this pull request.

@gnusosa gnusosa closed this Mar 3, 2014