HMA (Hasher Matcher Actioner) is a tool from Meta to detect content that's been copied (or slightly modified) from sources already identified.
This repository provides a Matrix-specific extensions to HMA for (primarily) the Matrix ecosystem to benefit from in a familiar "Matrix way" of deploying applications. Use of this repository is not required to set up an HMA instance - it just wraps up some of the functionality to be familiar to Matrix developers/server operators.
This repo provides a Docker image which is layered on top of HMA's image. Configuration is done via environment variables rather than Python (if you'd like to use/override the Python config, HMA directly is probably a better choice).
The following environment variables can be specified:
-
HMA_DB_HOST(defaultlocalhost) - The hostname (with port if required) for your PostgreSQL database running HMA. -
HMA_DB_NAME(defaulthma) - The name of the database on the PostgreSQL server to use. -
HMA_DB_USER(defaulthma) - The username to access the PostgreSQL database. -
HMA_DB_PASS(no default - required) - The password for the above user. -
HMA_API_KEY(no default - required ifHMA_API_KEY_REQUIREDis notfalse) - The API key to require on requests. The UI may not function with an API key set - use HMA's Docker Compose/development setup for experimentation with the UI. -
HMA_API_KEY_REQUIRED(defaulttrue) - Set this tofalseto disable the API key requirement. This can allow you to use the UI properly when there's no API key. Ignored when an API key is specified. -
HMA_WORKER_ROLE(no default - required) - The type of functionality to enable on this particular instance. This allows for load balancing some/all aspects of HMA's operations. Reverse proxying and routing traffic to workers is left as an exercise for the reader 😇.The role must be one of the following:
HASHER- Services the/h/*API endpoints for hashing. Note that hashing can be resource intensive.MATCHER- Services the/m/*API endpoints for matching.CURATOR- Services the/c/*API endpoints for managing content, banks, exchanges, etc.CRON- Runs scheduled tasks and builds the internal index for the matcher(s). There should be no more than one of these running at a time.UI- Services the/ui/*endpoints. Note that the UI might not function if an API key is set. The UI worker is the only worker that's not required to run a complete HMA instance.
The above can then be provided to the hma-matrix Docker image to run a worker:
docker run -d -e HMA_WORKER_ROLE=UI [...] -p 127.0.0.1:5100:5100 ghcr.io/matrix-org/hma-matrix:[version]See the GHCR repo for available tags/versions.
See the HMA API docs for more information on the API endpoints themselves.
Note
It's possible with this setup to specify different API keys for different functions. All of the same type of worker will need to be using the same API key, but the hashers can use a different API key from the curators for an amount of role-based access control.
Note
If there's a config option you'd like to set (or override) from HMA directly, you can do so by prefixing the option with OMM_. For example, OMM_MAX_REMOTE_FILE_SIZE=1073741824 will set the max remote file size to 1gb.
To quickly set up a local HMA stack for developing applications which use HMA, the Docker Compose file from this repo can be used.
git clone https://github.com/matrix-org/hma-matrix.git && cd hma-matrixdocker compose up -d- Visit
http://localhost:5100/ui
Caution
This stack is set up by default without an API key to allow use of the UI. It's recommended to set an API key in production deployments.
When API authentication is enabled, supply the API key as a Bearer token in the Authorization header:
curl -h "Authorization: Bearer ${HMA_API_KEY}" ...TODO: Not yet written.
Ideas:
- Synapse quarantined media exchange/import
- The same but for MMR
- Maybe policy list support if we can figure out how to make that work safely?
- "flagged as spam" from policyserv/mjolnir/draupnir/meowlnir/etc
This repository puts out new versions when HMA does and when there's functionality worth releasing, such as changes to the config.py file.
Structure: [hma-version]-matrix.[increment] where [increment] is on a per-[hma-version] basis.
Example: 1.0.21-matrix.2 is HMA v1.0.21, second release by this repo in the v1.0.21 series.