
This is a prototype of a semi-automatic data anonymization app for German documents.


openredact/openredact-app


OpenRedact

Semi-automatic data anonymization for German documents.



⚠️ Disclaimer ⚠️: This is a prototype. Do not use for anything critical.

⚠️ Note ⚠️: This tool focuses on the text content. Metadata will not be anonymized.

Description

This repository is home to the OpenRedact app, a webapp for semi-automatic anonymization of German-language documents. OpenRedact is a Prototype Fund project, funded by the Federal Ministry of Education and Research. A detailed description of the project and prototype is available here.

Using OpenRedact to anonymize documents

CLI

You can use the CLI script backend/cli/redact.py to anonymize a directory of documents in an unsupervised manner.

./redact.py --input_dir "path/to/documents/" --output_dir "out/directory/"

Call ./redact.py --help for usage instructions and important notes.
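For batch jobs it can be convenient to drive the script from Python. The following is a minimal sketch, assuming the script is started from backend/cli and that only the two flags shown above are needed. The helper names build_redact_command and redact_directory are hypothetical and not part of OpenRedact.

```python
import subprocess


def build_redact_command(input_dir, output_dir, script="./redact.py"):
    """Return the argument list for one redact.py run.

    Only the --input_dir and --output_dir flags shown above are used;
    see ./redact.py --help for anything else.
    """
    return [
        script,
        "--input_dir", str(input_dir),
        "--output_dir", str(output_dir),
    ]


def redact_directory(input_dir, output_dir, script="./redact.py", cwd=None):
    """Run the CLI once; raises CalledProcessError on a non-zero exit code.

    cwd is the directory to run from (assumed to be backend/cli).
    """
    command = build_redact_command(input_dir, output_dir, script)
    subprocess.run(command, check=True, cwd=cwd)
```

Calling redact_directory("path/to/documents/", "out/directory/", cwd="backend/cli") would then be equivalent to the shell command above.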

Webapp

OpenRedact works with document file formats

This screencast walks you through the anonymization of a document, from upload to download of the anonymized file.

OpenRedact supports different anonymization methods

This screencast demonstrates the different anonymization methods that OpenRedact supports. The modifications on the left are immediately previewed on the right.

OpenRedact comes with an annotation tool

The automatically detected and proposed personal data can be corrected and extended by the user using our annotation tool.

Annotate personal data inside a text

OpenRedact tells you how good its automatic personal data detection is

Based on the manual corrections and extensions, we can assess the mechanism for automatic detection of personal data.

Show scores and metrics for the automatic detection of personal data
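One common way to score such a detector is precision, recall and F1 over exact-match annotations, treating the user-corrected annotations as ground truth. The sketch below illustrates that idea only; it is not OpenRedact's actual scoring code, and the (start, end, tag) tuple layout and the function name detection_scores are assumptions made for the example.

```python
def detection_scores(predicted, corrected):
    """Compare automatic annotations against the user-corrected ones.

    Both arguments are iterables of hypothetical (start, end, tag) tuples.
    An automatic annotation counts as correct only on an exact match.
    """
    predicted, corrected = set(predicted), set(corrected)
    true_positives = len(predicted & corrected)

    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(corrected) if corrected else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Three automatic annotations, two of which survive the user's review;
# the user also added two annotations the detector missed.
automatic = [(0, 5, "PER"), (10, 16, "LOC"), (20, 24, "ORG")]
reviewed = [(0, 5, "PER"), (10, 16, "LOC"), (30, 40, "DATE"), (50, 55, "PER")]
scores = detection_scores(automatic, reviewed)
```

Here two of three automatic annotations are correct (precision 2/3) and two of four reviewed annotations were found (recall 1/2), which gives an F1 of 4/7.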

Deployment

The app is best deployed using Docker.

Run the full stack using Docker-Compose

We have pre-built Docker images available at https://hub.docker.com/u/openredact.

Pull and start the containers by running:

# Clone the repo
git clone https://github.com/openredact/openredact-app.git
cd openredact-app

# Pull images & start containers
docker-compose pull
docker-compose up

This serves the backend on port 8000 (also reachable at http://localhost/api) and the frontend on port 80. Once the containers are running, you can access the webapp at http://localhost/.
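Because the containers can take a moment to start, a small readiness check may help in scripts. This is a minimal sketch using only the Python standard library; the URLs come from this README (the /docs path is the Swagger UI endpoint of the backend), while the function names is_up and wait_for and the retry settings are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request

# URLs taken from this README.
FRONTEND_URL = "http://localhost/"
BACKEND_URL = "http://localhost:8000/docs"


def is_up(url, timeout=2.0):
    """Return True if the URL answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure or a broken response.
        return False


def wait_for(urls, attempts=30, delay=1.0):
    """Poll until every URL responds or the attempts run out."""
    for _ in range(attempts):
        if all(is_up(url) for url in urls):
            return True
        time.sleep(delay)
    return False
```

Calling wait_for([FRONTEND_URL, BACKEND_URL]) after docker-compose up returns True as soon as both services respond and False if they do not come up within the given number of attempts.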

Run the frontend using Docker

cd frontend
docker build -t openredact/frontend .
docker run -p 80:80 openredact/frontend

This builds the frontend inside a node Docker container and deploys the result in an nginx container. For more details about this procedure, see React in Docker with Nginx, built with multi-stage Docker builds, including testing.

Run the backend using Docker

cd backend
docker build -t openredact/backend .
docker run -p 8000:8000 openredact/backend

API Documentation

Documentation of the API is available at the endpoints /docs (Swagger UI) and /redoc (ReDoc), e.g. http://127.0.0.1:8000/redoc. The OpenAPI specification can be found here.

Development

First, follow the instructions in the backend or frontend readme. Then, continue with the instructions below.

Developing using Docker

If you want to use our Docker setup for development, run:

docker-compose -f docker-compose.dev.yml up

Don't forget to add the project's directory to the list of allowed file sharing resources in the Docker Desktop preferences.

Install the pre-commit hooks

pre-commit is a Python tool to manage git pre-commit hooks. Running the following code requires the backend dev requirements to be set up as explained here. We have pre-commit hooks for formatting and linting Python and JavaScript code (black, flake8, prettier and eslint). The tests are slower than the formatters and linters, so they are run by CI rather than by the hooks; remember to run them manually before committing.

pre-commit install
git config --bool flake8.strict true  # Makes the commit fail if flake8 reports an error

To run the hooks:

pre-commit run --all-files

How to contact us

For usage questions, bugs, or suggestions, please file a GitHub issue. If you would like to contribute or have other questions, please email hello@openredact.org.

License

MIT License
