Extract website information for the cleanURI service.
Configuration is done using environment variables:
- PORT: Port for the HTTP endpoint (default 8080, only change when running locally!)
- AMQP_HOST: RabbitMQ host
- AMQP_USER: RabbitMQ user
- AMQP_PASS: RabbitMQ password
- AMQP_VHOST: RabbitMQ virtual host, defaults to '/'
- EXTRACTION_TASK_QUEUE: AMQP queue (inbound) for receiving extraction tasks from the Canonizer
This handler uses the reply-to header for result message binding and therefore has no outbound routing key in its configuration.
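For orientation, a .env file covering the variables above might look like the following sketch. All values are placeholders chosen for illustration (host name, credentials and queue name are assumptions, not defaults of the service). Comments are kept on their own lines and values are unquoted, since Docker's --env-file format takes everything after the `=` literally.

```shell
# Example .env (all values are hypothetical placeholders)

# HTTP port; 8080 is the documented default
PORT=8080

# RabbitMQ connection (placeholder host and credentials)
AMQP_HOST=rabbitmq.example.org
AMQP_USER=cleanuri
AMQP_PASS=change-me

# Documented default virtual host
AMQP_VHOST=/

# Inbound queue for extraction tasks (hypothetical queue name)
EXTRACTION_TASK_QUEUE=extraction-tasks
```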
With the configuration stored in a file .env, the service can be run as follows:
docker run --rm \
--env-file .env \
mrtux/cleanuri-extractor
The service does not store any state and therefore needs no mount points or other persistence.
Please make sure to pin the container to a specific version in a production environment.
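One way to enforce pinning is a small wrapper that refuses to start the container without an explicit tag. This is only a sketch: the function name `run_extractor` is hypothetical, and the tag has to be one that actually exists in the image registry.

```shell
# Hypothetical helper: start the extractor with an explicitly pinned image tag.
run_extractor() {
  # Refuse to run without a tag so that an unpinned image is never pulled by accident.
  if [ -z "${1:-}" ]; then
    echo "usage: run_extractor <tag>" >&2
    return 1
  fi
  docker run --rm --env-file .env "mrtux/cleanuri-extractor:$1"
}
```

Call it as `run_extractor <tag>`, replacing `<tag>` with the version you want to deploy.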
This project uses the Micronaut Framework.
Version numbers are determined with jgitver.
Please check your IDE settings to avoid problems, as there are still some unresolved issues.
If you encounter a project version 0, there is an issue with the jgitver generator.
For local execution, the configuration can be provided in a .env file and made available using dotenv:
dotenv ./mvnw mn:run
Note that .env is part of the .gitignore and can be safely stored in the local working copy.
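Before starting the service locally, it can help to check that the .env file defines every variable without a documented default. The sketch below assumes that AMQP_HOST, AMQP_USER, AMQP_PASS and EXTRACTION_TASK_QUEUE are mandatory (PORT and AMQP_VHOST have defaults); the function name `missing_vars` is hypothetical.

```shell
# Hypothetical helper: print the names of expected variables that the
# given env file does not define, one per line.
missing_vars() {
  env_file="$1"
  for name in AMQP_HOST AMQP_USER AMQP_PASS EXTRACTION_TASK_QUEUE; do
    # A variable counts as defined if a line starts with "NAME=".
    grep -q "^${name}=" "$env_file" 2>/dev/null || echo "$name"
  done
}
```

Running `missing_vars .env` prints nothing when all four variables are present.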
The project depends on cleanURI-common with a Maven artifact that is currently hosted as a GitHub package. Please refer to the README from cleanURI-common on how to resolve the dependency locally.
The cleanURI-common dependency is resolved by jitpack. Since these dependencies are built on demand, it may take a moment to download them.
The build is split into two stages:
- Packaging with Maven
- Building the Docker container
This means that the Dockerfile expects one (and only one) JAR file in the target directory. Build as follows:
mvn --batch-mode --update-snapshots clean package
docker build .
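Because the Dockerfile relies on exactly one JAR being present, a stale artifact in the target directory can break the image build. The following sketch guards against that before calling Docker; the helper names `count_jars` and `build_image` are hypothetical and not part of the project.

```shell
# Hypothetical helper: count the JAR files directly inside a directory.
count_jars() {
  find "$1" -maxdepth 1 -type f -name '*.jar' 2>/dev/null | wc -l | tr -d ' '
}

# Hypothetical helper: build the image only if target/ holds exactly one JAR.
build_image() {
  jars="$(count_jars target)"
  if [ "$jars" -ne 1 ]; then
    echo "expected exactly one JAR in target/, found $jars" >&2
    return 1
  fi
  docker build .
}
```

After the Maven packaging step, `build_image` either runs the Docker build or reports how many JAR files it found instead.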
Why not do everything with Maven and JIB? So far I have not been able to integrate JIB with the mechanism that determines which tags should be built (e.g. only build latest when on the main branch). After 5h of trying I settled on this solution:
- Maven is sufficiently reliable to create reproducible builds, and we can make use of the build cache.
- The Dockerfile allows for the usual integration into image build and push.
The whole process is coded in the docker-publish workflow and only needs to be executed manually for local builds.
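To illustrate the kind of tag selection mentioned above, here is a minimal sketch of a rule that always emits a version tag and adds latest only on the main branch. It is an assumption about how such a rule could look, not a copy of the docker-publish workflow; the function name `image_tags` and its arguments are hypothetical.

```shell
# Hypothetical helper: print the image tags to build, one per line.
#   $1: current branch name
#   $2: project version
image_tags() {
  branch="$1"
  version="$2"
  # The version tag is always built.
  echo "$version"
  # "latest" is only built from the main branch.
  if [ "$branch" = "main" ]; then
    echo "latest"
  fi
}
```

The output can be consumed line by line, for example to pass one `--tag` option per line to `docker build`.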
- Stefan Haun (@penguineer)
PRs are welcome!
If possible, please stick to the following guidelines:
- Keep PRs reasonably small and their scope limited to a feature or module within the code.
- If a large change is planned, it is best to open a feature request issue first, then link subsequent PRs to this issue, so that the PRs move the code towards the intended feature.
🚧 Please note that the Canonizer/Extractor APIs are still a work in progress. Any contributions to these should be coordinated to avoid going in different directions.
MIT © 2022 Stefan Haun and contributors