Since these services are usually going to be low-traffic and do not require HA, they should be able to share compute/storage resources to minimize cost. However, they should be able to scale if necessary without being completely rewritten.
The infrastructure is not expected to be vendor agnostic. However, as much as possible it should be developed using Infrastructure-as-Code principles. This is because the code serves as a form of documentation and "backup" for the services.
Services are expected to be stateless and to have their interface with the environment defined as explicitly as possible via Docker. State is to be stored exclusively in:
- A Postgresql database in a shared database cluster.
- A resilient object store (e.g. AWS S3).
Because of the cost efficiency and simplicity of its offerings, I have chosen DigitalOcean as the hosting provider for these services. However, they should eventually be self-hosted, and will be designed as much as possible with self-hosting in mind.
Whether hosted on DO or self-hosted, the services will be managed with Kubernetes.
For simplicity, in the future the shared Postgresql database cluster will simply be called "Postgresql" or "the database". The resilient object store will be called "Spaces" or "the object store".
Part of the intent of this system is to make it easy to deploy individual projects. Therefore, this same model is also used to deploy this project itself.
To set up the project, you first bootstrap the CI/CD pipeline. Then you can deploy this repository with the pipeline, thereby creating everything else automatically.
Gitea is an open-source, self-hosted Git service.
Gitea was chosen because it has low operational overhead and resource usage.
Concourse is an open-source, self-hosted "task runner" that we can use for CI/CD.
Concourse has both a Web and a Worker component, but the `concourse/concourse` Docker image bundles them together. Since CI/CD is not considered an HA service, only a single Concourse container is needed.
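As a sketch of this single-container setup (the `quickstart` subcommand and flags here are recalled from the `concourse/concourse` image and should be checked against the current Concourse docs):

```shell
# Run web + worker together in one container (non-HA CI/CD).
# --privileged is needed because the bundled worker runs containers itself.
docker run -d --name concourse --privileged \
  -p 8080:8080 \
  concourse/concourse quickstart
```

The web UI should then be reachable on port 8080 of the host.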
- Be self-sufficient: able to survive without Internet.
Running without Internet requires hosting local copies of a few cloud services that are normally taken for granted.
- Docker Registry.
- Any package repositories.
Internet is still required for initial installation and setup of most services. This system isn't targeting "air-gapped" clusters that can't connect to the Internet. Instead, the intent is to build a system which is Internet-connected, but will continue to work if the Internet goes down for an extended period.
- Shared, air-gappable block storage
- VPN
- HTTP application proxy
The `personal-infrastructure` project provides scripts and configurations for deploying personal projects using Kubernetes.
Goals:
- Cost Effective: Projects share compute and storage resources, so each additional deployment adds little or no hosting cost.
- Low Friction: Deploying new projects should be easy and "nearly free."
- Security First: Services are private by default; only services that must accept traffic from the Internet are exposed publicly.
This repository is intended for individuals who:
- Have multiple (3+) projects/services they want to deploy.
- Want to use containerization to manage project deployments.
- Want to allow projects to share compute and storage resources to minimize costs.
- Are concerned about security, but not enough to spend significant extra money.
- Want to have complete ownership over their projects' deployment platform.
Hosting a basic Kubernetes cluster plus persistence in the cloud costs at least $25/month.
Alternatively, the system could be run on one or more computers that you own and manage in your home.
- Constructing "digest emails" from Reddit, Hacker News, and other sites.
- Providing push notifications via webhooks for defined events.
- Hosting Git repositories (public and private).
- Hosting static websites / blogs / wikis.
- Providing shared folder sync across devices (Syncthing).
- Providing an alternative to Google Analytics (Piwik).
- Hosting an RSS feed reader (Miniflux).
- Running timed jobs at regular intervals.
There should also be support services, which are used to manage the overall cluster:
- Metrics / Monitoring service
Personal infrastructure services fall into one of three categories:
- Shared / utility services. Not intended to be used directly, but to be referenced by other deployments.
- Private services. Tools for personal use, e.g. Fathom Analytics.
- Public services. Either websites or webhooks that need to accept requests from the Internet.
Utility services are exposed as ClusterIP Services in k8s, which are internal to the cluster.
Private services are also exposed with ClusterIPs, which means dashboards, etc. are not available on the public Internet. They require a VPN connection.
- Syncthing service? (Something to sync to/from Spaces)
- Personal Wiki / PIM? (Requires Postgresql + Spaces?)
- Analytics service, e.g. Piwik (Requires Postgresql)
- Huginn (Requires Postgresql)
- Personal websites / projects (May require Postgresql and/or Spaces)
The following common tools are deliberately excluded from the base `personal-infrastructure` distribution:
- K8s Dashboard
- Helm
- Istio
- Potential security risk if improperly configured.
- Encourages ad-hoc changes, which are damaging to IaC effectiveness.
- Benefits of Helm:
  - All Kubernetes resources and configuration needed for each individual project can be encapsulated into a unit called a "chart."
  - Charts provide a higher-level abstraction than Docker images but attempt to offer a similar vision -- a package that can be automatically deployed.
  - Charts can include other charts as dependencies.
  - There is a public registry of charts.
  - Charts impose a consistent pattern for configuring k8s resources.
- Costs of Helm:
  - Encourages deployment of charts without understanding the resources being created.
  - Requires the `tiller` server to run in your cluster.
Note -- the following VPN setup is temporary; a different place should be used for the certs.
Set the `role=vpn` label on the single node that you want to be the VPN. This node will have a folder created in `/mount` that contains the VPN certificates.
kubectl label nodes my-vpn-node role=vpn
kubectl apply -f ./k8s/shared/vpn
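To confirm the label landed before (or after) applying the manifests, the nodes can be listed with the label shown as a column:

```shell
# Exactly one node should show "vpn" in the ROLE column.
kubectl get nodes -L role
```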
- Set up your Kubernetes cluster. (Minikube, DO k8s, GCE, etc.)
  - Tested with k8s versions: v1.14.1.
  - Required extensions:
    - CoreDNS
  - Not Recommended Features/Extensions:
    - Kubernetes Dashboard
    - Helm
      - Helm Installation:
        - Install the CLI tool `helm` on your machine.
        - Deploy the Helm server `tiller` by running `helm init`.
    - Istio
- Ensure `kubectl` is configured properly. You should be able to run `kubectl cluster-info` and see both the cluster master and the CoreDNS extension.
- For DigitalOcean managed Kubernetes:
  - Follow the DO setup guide.
  - Download and install `doctl` per DO instructions.
  - Download and install `kubectl`. Ensure the kubectl minor version matches the cluster minor version.
  - Run: `doctl kubernetes cluster kubeconfig save my-cluster-name`
- Testing and diagnostic tips
  - Creating a "scratch" container (a bash session running as a temporary pod, which cleans itself up automatically):
    - `kubectl run scratch -i -t --rm --image=ubuntu --generator=run-pod/v1 -- /bin/bash`
  - Connecting to an internal service from localhost:
    - Assume service `foo` exposes traffic on port `3000`.
    - Run: `kubectl port-forward service/foo 3000`
    - Now you can access the `foo` service at `localhost:3000` in your browser.
    - Also works for pods, deployments, etc.
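These tips combine well: from inside the scratch pod you can check that cluster DNS resolves an internal service (the service name `foo` and port `3000` are the hypothetical examples used above):

```shell
# The ubuntu image is minimal, so install the tools first.
apt-get update && apt-get install -y dnsutils curl

# Resolve the service's cluster DNS name, then hit it over HTTP.
nslookup foo.default.svc.cluster.local
curl -s http://foo.default.svc.cluster.local:3000/
```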
- Set up and deploy shared resources
  - VPN.
    - Rationale. A VPN is used to provide another layer of security around accessing personal/internal HTTP dashboards.
    - Technologies. OpenVPN is used via the `kylemanna/openvpn` Docker image.
    - Requirements.
    - Installation. `kubectl apply -f ./k8s/vpn`
  - HTTP Proxy / API Gateway.
    - Rationale. A self-managed HTTP proxy is used to allow public routes to be mapped to different services based on the request origin and path.
      - This reduces the coupling between how services expose their HTTP interfaces and the public network interface exposed by your cluster.
      - Consider Huginn and Fathom. Using an HTTP proxy, we can allow incoming webhooks without allowing access to the dashboards outside of the VPN.
    - Technologies. Ambassador was chosen for ease of use.
      - Ambassador is a version of Envoy that is fully configurable with Kubernetes annotations, and is intended for use as an API gateway.
  - Private Docker Registry.
    - Rationale. Docker images are required to deploy projects onto the cluster. Open-source images can be published to Docker Hub, but sensitive images must be published to a private registry.
      - This system includes a private registry so it can continue working without Internet service. If that is not a concern, cloud alternatives like AWS ECR can be used.
    - Technologies. Docker provides an official `registry` image.
      - Repository images will be persisted to the shared object store.
    - Requirements.
      - Shared object store must be installed.
      - `registry-s3` secret must be configured.
    - Installation. `kubectl apply -f ./k8s/shared/registry`
  - Postgresql Database Cluster.
    - Rationale. Provides a place for services to store state. Required for numerous services to function.
      - Shared database cluster is used to reduce costs.
    - Technologies. Postgresql is used. MySQL would also work.
    - Requirements.
      - These instructions assume that your database is being hosted in an external RDBMS, for example AWS RDS.
      - Should be updated to describe self-hosted Postgresql for offline usage.
    - Installation. `kubectl apply -f ./k8s/shared/postgresql`
  - S3-Compatible Object Store.
    - Rationale. Alternative persistence backend for storing data inappropriate for relational databases.
      - Allows services to store binary data and files without using a data volume per service.
      - S3 compatibility is valuable because it allows open-source tools like Docker Registry to use it easily.
    - Technologies. At present, an actual cloud-based solution like AWS S3 or DO Spaces is recommended.
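Once the private registry is running, pushing an image to it looks roughly like this. The registry host shown is a guess at the ClusterIP Service name; check `./k8s/shared/registry` for the actual name and port, and note that you must be able to reach that address (e.g. over the VPN):

```shell
# Tag a locally built image for the in-cluster registry, then push it.
docker tag my-project:latest registry.default.svc.cluster.local:5000/my-project:latest
docker push registry.default.svc.cluster.local:5000/my-project:latest
```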
For each service, create the user+database for that service with:
CREATE USER fathom WITH PASSWORD 'password';
GRANT fathom TO doadmin;
CREATE DATABASE fathom WITH OWNER fathom;
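Since every service needs the same three statements, a small helper can generate them; this is a hypothetical convenience script, not part of the repository:

```shell
# Print the bootstrap SQL for one service's user + database.
# Usage: make_service_sql <service> <password>
# The GRANT to doadmin matches DO managed databases, where the admin role
# must be granted the new role before it can create a database owned by it.
make_service_sql() {
  local svc="$1" pass="$2"
  printf "CREATE USER %s WITH PASSWORD '%s';\n" "$svc" "$pass"
  printf "GRANT %s TO doadmin;\n" "$svc"
  printf "CREATE DATABASE %s WITH OWNER %s;\n" "$svc" "$svc"
}

# Example: emit the SQL for a "miniflux" service. Pipe it into psql, e.g.:
#   make_service_sql miniflux 's3cret' | psql "$ADMIN_DATABASE_URL"
make_service_sql miniflux s3cret
```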
The Postgresql database is abstracted behind a service. Configure the service for your database, then:
kubectl apply -f k8s/postgresql
With that, our deployments can connect to Postgresql at `postgresql.default.svc.cluster.local` on port `25060`.
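A quick way to verify the service wiring end-to-end is a throwaway client pod, using the same `--generator=run-pod/v1` style as the scratch-container tip. The `fathom` credentials follow the CREATE USER example above:

```shell
# Throwaway postgres client pod; prompts for the user's password and
# cleans itself up on exit.
kubectl run psql-check -i -t --rm --image=postgres:11 --generator=run-pod/v1 -- \
  psql -h postgresql.default.svc.cluster.local -p 25060 -U fathom -d fathom
```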
Deployed via the `huginn-server` deployment in k8s. Connects to Postgresql via environment variables.
kubectl create secret generic huginn-database --from-literal=username=huginn --from-literal=password=PASSWORD
kubectl create secret generic huginn-app --from-literal=token=TOKEN
kubectl create secret generic huginn-invite-code --from-literal=token=TOKEN
kubectl apply -R -f ./k8s/huginn
Some Huginn integrations depend on having the Huginn HTTP server available on the public Internet to receive webhook events. Therefore, Huginn is a public service.
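After applying the manifests, a quick sanity check that the `huginn-server` deployment rolled out:

```shell
# Block until the rollout finishes (or fails), then tail the logs.
kubectl rollout status deployment/huginn-server
kubectl logs deployment/huginn-server --tail=50
```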
See: https://github.com/usefathom/fathom
In database:
CREATE USER fathom WITH PASSWORD 'mypassword';
GRANT fathom TO doadmin;
CREATE DATABASE fathom WITH OWNER fathom;
In terminal:
kubectl create secret generic fathom-database --from-literal=url=postgresql://fathom:PASSWORD@DB:PORT/fathom
kubectl apply -R -f ./k8s/fathom
Then, you can connect to the container with:
kubectl port-forward deployment/fathom 8080
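With the port-forward running in another terminal, the dashboard should answer locally:

```shell
# Expect an HTTP response from Fathom on the forwarded port.
curl -I http://localhost:8080/
```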