All-in-one Docker image of ClamAV with Celery worker, REST API and clamd

This repository provides a Docker image that bundles the ClamAV engine together with several different ways to access it. It is intended to be deployed with Kubernetes but can also be used with plain Docker.

Modes of operation

When instantiating the image as a container, the mode the container should run in needs to be specified. There are four possible modes:

  • freshclam: In this mode the container runs the freshclam daemon. It updates the anti-virus databases in the /var/lib/clamav directory.

  • celery-worker: Celery is a distributed task queue framework for Python. In this mode a Celery worker is started which registers a single task with the following signature:

    scan(fs: str, file: str, timeout: int = 3600, clamscan_options: Dict[str, str] = None, unlink: bool = False)

    The parameters for scan are:

    • fs: A PyFilesystem URL identifying the filesystem that contains the file
    • file: Name of the file to be scanned within that filesystem
    • timeout: Timeout for the clamscan call in seconds
    • clamscan_options: A dictionary of options that are passed directly to clamscan. The key of each dictionary item is the option name (without the leading dash or dashes), and the value is the argument of the respective option. If an option takes no argument, the value should be set to None.
    • unlink: If this boolean value is set to True, the file is unlinked (deleted) after being scanned.

    The task returns a tuple consisting of a boolean indicating whether a virus was found (True) or not (False) and a multi-line string containing the output of clamscan.

    Resources are accessed via PyFilesystem; support for accessing S3 object stores via fs-s3fs is included.

    All resources are scanned with clamscan to circumvent the 4GB limit of clamd and of the REST API (which also connects to clamd). This has the disadvantage that the whole anti-virus pattern database needs to be loaded by each invocation of clamscan, which takes about 20 seconds (on my hardware). Furthermore, S3 objects have to be downloaded to the local filesystem in full before they can be scanned.

    To configure the Celery workers to connect to the Celery backends, the Celery configuration needs to be mounted as /celery-worker/config/celeryconfig.py inside the container. It contains configuration variable assignments as per the Celery documentation; a sketch of such a file follows after this list. To retrieve the results of the scans a result backend is required.

    The task needs to be called by name, either via send_task or by defining a signature; an invocation example follows after this list.

  • clamd: This mode starts the clamd daemon inside the container. It listens on TCP port 3310 and on the Unix domain socket /var/run/clamav/clamd.sock. The TCP port can be exposed to the outside world if desired. The Unix domain socket is currently not used. This mode is untested apart from observing a successful startup of clamd.

  • rest: In this mode Solita's ClamAV REST proxy is started. It connects to clamd via TCP on localhost, port 3310, so a companion clamd container in the same network namespace is needed. This mode is untested apart from observing a successful startup of the proxy.
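As a concrete illustration of the celery-worker configuration mentioned above, a minimal celeryconfig.py might look like the following. This is only a sketch: the broker and result backend URLs are hypothetical placeholders that depend on your environment.

# /celery-worker/config/celeryconfig.py -- minimal sketch
# Both URLs are hypothetical examples; adjust them to your environment.
broker_url = 'redis://redis.example.com:6379/0'
# A result backend is required to retrieve the scan results.
result_backend = 'redis://redis.example.com:6379/1'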
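Calling the scan task by name with send_task could then look like the sketch below. The task name chowder.scan, the broker/backend URLs, and the file names are assumptions; check the task name actually registered by the worker in your deployment.

from celery import Celery

# Configure the client with the same broker and result backend as the
# worker (hypothetical URLs, see the celeryconfig.py sketch above).
app = Celery('chowder',
             broker='redis://redis.example.com:6379/0',
             backend='redis://redis.example.com:6379/1')

# 'chowder.scan' is an assumed task name -- verify it for your deployment.
result = app.send_task(
    'chowder.scan',
    kwargs={
        'fs': 's3://my-bucket/',  # hypothetical PyFilesystem URL
        'file': 'upload.bin',     # hypothetical file inside that filesystem
        'timeout': 600,
        # Passed to clamscan as --max-filesize=2000M (keys carry no dashes).
        'clamscan_options': {'max-filesize': '2000M'},
        'unlink': False,
    },
)

# The task returns (virus_found, clamscan_output).
infected, output = result.get(timeout=900)
if infected:
    print('Virus found:\n' + output)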

The mode needs to be supplied as a single argument to the container's entry point. With Kubernetes this is done via the args option in the container specification; when using docker-compose or Docker Swarm the equivalent option is command.

Usage with Kubernetes

To deploy Chowder with Kubernetes it is best to use the provided Helm chart, which can be found in charts/chowder. If you're not using Helm, the manifest templates in charts/chowder/templates are still a good starting point for building your own manifests.

The Helm chart comes with a few configuration options:

First of all, each of the (at most) four containers that comprise each pod of the deployment can be activated or deactivated individually. The freshclam container should normally always be present: if it is absent, the anti-virus pattern databases that were baked into the image at build time are used and never updated. The other options correspond to the modes of operation listed above.

The configuration for the Celery worker needs to be supplied under the key containers.celeryWorker.config. It is injected into the container via a ConfigMap.

containers:
  clamd:
    enabled: false
  freshclam:
    enabled: true
  celeryWorker:
    enabled: true
    config: |
      [... Celery Worker configuration ...]
  rest:
    enabled: false

To use the REST API or to talk to clamd directly, the corresponding services can be activated. The port each service listens on can also be configured.

services:
  rest:
    enabled: false
    type: ClusterIP
    port: 8080
  clamd:
    enabled: false
    type: ClusterIP
    port: 3310

By default the deployment consists of five pods. clamd and the REST API each have an internal scaling mechanism, so one pod can handle a number of connections simultaneously. The Celery worker, however, is started with just one worker process per pod, so it has to be scaled by increasing the number of replicas. This can be done automatically by enabling the horizontal pod autoscaler below.

replicaCount: 5 

With the standard settings the Helm chart will use the latest image. For production deployments it is recommended to specify a release version instead of latest; in that case the pullPolicy can be set to IfNotPresent.

image:
  repository: elementalnet/chowder
  tag: latest
  pullPolicy: Always

For scanning files directly a data volume can be mounted into the Celery worker container:

containers:
  celeryWorker:
    dataVolume:
      enabled: false
      # Mount path inside the Celery worker container
      mountPath: /data
      reference:
        persistentVolumeClaim:
          claimName: your-pvc
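With the data volume enabled and mounted at /data, a scan request can point the task at the local filesystem through a PyFilesystem URL. Again a sketch: the task name chowder.scan and the file name are assumptions, and app is set up as in the Celery example above.

# Scan a file on the data volume mounted at /data inside the worker.
# 'osfs:///data' is PyFilesystem's local-filesystem URL; a plain '/data'
# path is usually accepted as well. 'chowder.scan' is an assumed task name.
result = app.send_task('chowder.scan',
                       kwargs={'fs': 'osfs:///data', 'file': 'upload.bin'})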

It is possible to specify resources for the containers. Currently all containers get the same resource allocation; this might turn out to be suboptimal, and separate resource specifications might be needed in the future. A horizontal pod autoscaler can be enabled to adjust the number of replicas automatically.

resources: {}
  # limits:
  #  cpu: 100m
  #  memory: 128Mi
  # requests:
  #  cpu: 100m
  #  memory: 128Mi

horizontalPodAutoscaler:
  # Remember to set resources above if you enable this
  enabled: false
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

The last three options relate to pod placement:

nodeSelector: {}

tolerations: []

affinity: {}  

Usage with Docker

Currently there are no examples of how to use this image with Docker or docker-compose, or of how to deploy it inside Docker Swarm. Contributions are welcome.

Available images

A pre-built Docker image is available on Docker Hub at https://hub.docker.com/r/elementalnet/chowder. The current master branch is available under the tags latest and master; releases are available with their respective version as the tag. All images are built automatically via Travis CI.

Credits

This work is in part based on https://github.com/UKHomeOffice/docker-clamav. Thank you!
