malwarETL-prefect

Settings used to install Prefect in a mixed docker-compose/k8s cluster setup.

The Prefect core server is the only component (though, I grant, an important one) that is not installed primarily in k8s. The reason for that is that my k8s cluster setup (lxc/lxd/juju/charmed-kubernetes) seems to have a bug where pods cannot talk to services when the pod selected to answer the request is on the same node (hairpinned traffic). This could, in theory, be handled by setting a bunch of pod anti-affinity rules, or by doing the yak-shaving to find the bug in lxd, charmed-kubernetes's implementation of flannel, or some other layer, but frankly, I've got better things to do with my time.

So, instead, the system is set up as follows:

  • A VM on my NAS runs the core of Prefect (the UI, storage, Hasura, GraphQL, etc.) with docker-compose
  • That VM has postgresql installed on it as the persistent storage for the system (so postgres is not in docker, but is persistent)
  • I installed prefect code on the VM in a venv with the [kubernetes] option
  • I used the prefect agent install command to generate the kubernetes deploy for the agent, and pointed it to a custom URL for job templates (see the sketch after this list)
  • I modified the stock prefect job template to include environment variables for connection info for ElasticSearch, and added PersistentVolume mounts for the malware repository and the results directories. This became my custom prefect job template.
  • I added a simple nginx docker container to serve the custom job template
  • I added a docker container registry to hold the job images, and set aside a directory on the system to back its storage.
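For reference, the venv and agent install steps on the VM looked roughly like the following. This is a sketch for the Prefect 1.x CLI; the --job-template flag and the template URL are assumptions (check prefect agent kubernetes install --help on your version for the exact spelling):

# install Prefect with the kubernetes extra into the venv
pip install "prefect[kubernetes]"

# tell the CLI to talk to a self-hosted Prefect Server, not Prefect Cloud
prefect backend server

# generate the agent's kubernetes manifest, pointed at the graphql API;
# the job template URL below would be served by the nginx container
prefect agent kubernetes install \
    --api http://prefect-ui.g-clef.net:4200 \
    --rbac \
    --job-template http://prefect-ui.g-clef.net/job_template.yaml > agent.yaml

kubectl apply -f agent.yaml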

Postgres prefect setup

The postgres setup was fairly vanilla (create a prefect user, give it a password, create a prefect database owned by that user). The trick is exposing that database to the docker system. I ran into two problems:

  1. I had to add

      - "host.docker.internal:host-gateway"

     to each container config in the docker-compose that needed to talk to the database, and make the database hostname host.docker.internal.

  2. I had to modify pg_hba.conf to allow connections to the postgres database from the docker IP address ranges.
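Concretely, the first fix is an extra_hosts entry on each docker-compose service that talks to the database. A minimal sketch, using Hasura as the example service (the image tag and credentials are illustrative):

services:
  hasura:
    image: hasura/graphql-engine:v1.3.3   # tag is illustrative
    extra_hosts:
      # resolve host.docker.internal to the docker host's gateway IP
      - "host.docker.internal:host-gateway"
    environment:
      # hasura's postgres connection, using the host alias above
      HASURA_GRAPHQL_DATABASE_URL: postgresql://prefect:<password>@host.docker.internal:5432/prefect

For the second fix, the pg_hba.conf line is something like host prefect prefect 172.16.0.0/12 md5 — the exact CIDR depends on which address ranges your docker networks use.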

Docker storage for jobs

The easiest way for me to push jobs up to a central place and have k8s pull them back down was to run a docker registry on the ui-server. That's easy enough to set up in docker-compose, but adds some complications.

First, since I didn't want to expose that server to the world, I couldn't get a cheap certificate for it. That means you either have to configure your docker client to treat the registry as insecure, or make a self-signed certificate. I went the latter route (following the instructions here: https://docs.docker.com/registry/insecure/). The problem with a self-signed certificate is that it won't be recognized by your local docker instance, so you won't be able to push to it. I had to follow the instructions here: https://blog.container-solutions.com/adding-self-signed-registry-certs-docker-mac to add it to my local system's trust store. Once that was done, I could push to the registry locally.
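For context, the registry service in the docker-compose file looks roughly like this — a sketch, with the cert paths following the docker registry docs' self-signed setup rather than this repo's exact layout:

services:
  registry:
    image: registry:2
    ports:
      - "5000:5000"
    environment:
      # point the registry at the self-signed cert and key
      REGISTRY_HTTP_TLS_CERTIFICATE: /certs/domain.crt
      REGISTRY_HTTP_TLS_KEY: /certs/domain.key
    volumes:
      - ./certs:/certs
      # back image storage with a directory on the VM
      - ./registry-data:/var/lib/registry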

Using a self-signed cert also means that the docker instances in the cluster won't be able to pull from it. I could, in theory, configure the container runtime on every cluster node to mark the registry as insecure, but that seemed like a mess too. Instead, I install the certificate on each node, which is done with the daemonset in the kubernetes folder. If you're not running this in a home lab, and have access to a docker registry with a real certificate, you probably won't need the daemonset (daemonset from http://hypernephelist.com/2021/03/23/kubernetes-containerd-certificate.html). One note about that daemonset: it's set up specifically for k8s 1.21+. It turns out the shift from dockershim to containerd had some impacts, and this specific use-case was one of them.
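The manifest in the kubernetes folder is the authoritative version; as a rough sketch of the shape of the approach (the names and the containerd cert path here are assumptions, and the details vary with your containerd registry config), it copies the registry's CA cert from a configmap onto each node via a hostPath mount:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: registry-cert-installer
spec:
  selector:
    matchLabels:
      app: registry-cert-installer
  template:
    metadata:
      labels:
        app: registry-cert-installer
    spec:
      containers:
        - name: installer
          image: busybox
          # copy the CA onto the node, then idle so the pod stays scheduled
          command: ["/bin/sh", "-c"]
          args:
            - cp /certs/ca.crt /host-certs/ca.crt && while true; do sleep 3600; done
          volumeMounts:
            - name: cert
              mountPath: /certs
            - name: host-certs
              mountPath: /host-certs
      volumes:
        - name: cert
          configMap:
            name: registry-ca   # configmap holding the self-signed CA (hypothetical name)
        - name: host-certs
          hostPath:
            # a directory containerd is configured to read registry CAs from
            path: /etc/containerd/certs.d/prefect-ui.g-clef.net:5000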

Unfortunately, since the certificate is tied to the domain name of the registry server, the daemonset configuration is now very specific to my environment. I'll try to generalize that in the future.

Prefect flow config

To register a flow in this system with Prefect, you do something like:

from prefect import Client, Flow, task
from prefect.storage import Docker
from prefect.run_configs import KubernetesRun


@task
def say_hello():
    print("Hello world")


with Flow("hello world flow",
          # build the flow into an image and push it to the local registry
          storage=Docker(registry_url="prefect-ui.g-clef.net:5000"),
          # run the flow as a kubernetes job via the agent
          run_config=KubernetesRun()) as flow:
    say_hello()

# point the client at the graphql API served by the docker-compose setup
client = Client(api_server="http://prefect-ui.g-clef.net:4200/graphql")

client.register(flow=flow, project_name="test-project")

# optionally execute the flow locally as a smoke test
flow.run()

Deploying an updated job template

If you make changes to the prefect job template, you will need to re-deploy the template container. To do that, rebuild the image from the Dockerfile in the template_container directory, push it up to your docker repo, update the docker-compose file to use the new image, and restart the docker-compose-driven setup that runs prefect. Roughly:
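The image name and tag here are illustrative, not the repo's actual ones:

# rebuild the template container from its Dockerfile
docker build -t prefect-ui.g-clef.net:5000/template-container:v2 template_container/
docker push prefect-ui.g-clef.net:5000/template-container:v2

# after pointing docker-compose.yml at the new tag, recreate the services
docker-compose up -d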
