
Dev Log

2023-11-07

Adding a web-client

The server exposes a CRUD API at /api/weights. I built a yew-powered front-end application (doing client-side rendering) that talks to the actix_web server.
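
Roughly, the API looks like this from the outside. The host, port, and payload fields (weight_kg, measured_at) here are illustrative guesses, not necessarily the real schema or deployment details:

# Create a weight entry
curl -X POST http://localhost:8080/api/weights \
  -H "Content-Type: application/json" \
  -d '{"weight_kg": 80.5, "measured_at": 1699300000}'

# List weight entries
curl http://localhost:8080/api/weights

# Update entry 1
curl -X PUT http://localhost:8080/api/weights/1 \
  -H "Content-Type: application/json" \
  -d '{"weight_kg": 80.0}'

# Delete entry 1
curl -X DELETE http://localhost:8080/api/weights/1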

I like to keep the front-end codebase in a (mostly) separate project from the backend, mainly because the backend API feels like a different “thing” than the front end. In particular, I might have multiple clients for the same backend: the web client, a mobile client, and a command-line client might all talk to the same backend application (which handles access control to the underlying postgres database).

So, I added another service to the docker compose config so that I can do development on the client locally. Then, when I was satisfied with a basic prototype, I wrote GitHub workflows that build the client with trunk and upload a gzipped tarball (.tar.gz) to a GCS bucket. When I’m ready to deploy the built client, I run a workflow that copies the static files over to the staging or production servers. Finally, I added actix-files to my server with routes to serve the (unzipped) static files.
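
The build-and-deploy steps boil down to something like the sketch below. The bucket name, paths, and tarball naming are made up; only the general shape (trunk build, tar up dist/, copy to GCS, unpack on the server) reflects what the workflows do.

# In the build workflow:
trunk build --release                            # emits static assets into dist/
tar -czf client-$GITHUB_SHA.tar.gz -C dist .
gsutil cp client-$GITHUB_SHA.tar.gz gs://track-client-builds/

# In the deploy workflow, on the staging or production server:
BUILD=client-abc123.tar.gz                       # whichever build we decided to ship
gsutil cp gs://track-client-builds/$BUILD /tmp/
mkdir -p ~/track/static
tar -xzf /tmp/$BUILD -C ~/track/static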

It took a lot of fiddling with Dockerfiles, workflows, CORS configs, and environment variables, but overall it wasn’t bad and didn’t take very long.

AI in my workflow

I’ve been using ChatGPT as I build things for some time now, and in general I find it very useful. Yesterday I added Copilot to Emacs via https://github.com/zerolfx/copilot.el. It’s like a supercharged version of autocomplete. I’d tried it a little with VS Code on a Mac, but since I rarely develop on a Mac, I also rarely used VS Code or Copilot… until now.

In the 10 years I’ve been programming professionally, these tools are probably the most noticeable jump in productivity I’ve felt. In particular, the ease with which I can start working in unfamiliar fields is unparalleled. In domains where I already know what I’m doing, they maybe haven’t helped with comprehension all that much and have only saved me some typing time. But in areas where I only roughly understand how the libraries or tools work, ChatGPT has enabled me to get things working MUCH faster than I could without it.

I think the biggest reason for this is that when I’m on my own, it’s hard to follow the “narrow path” of learning just what I need in order to accomplish my task. ChatGPT not only makes this possible, it lets me do it fearlessly. I’m not going to forget what path I took to get where I am, because my chat history leaves a trail. If I need to carve out more knowledge before proceeding, I can easily do so. I don’t have to spend as much time and effort deciding how and when to learn prerequisite material versus going straight at the task. I don’t need to wade through documentation that is not-quite-what-I’m-looking-for.

All of these small time savings add up to an experience that is dramatically more streamlined than I’m used to. It’s not quite as streamlined as the natural-language-to-GPT-app flow I saw in OpenAI’s keynote demo yesterday, but it’s still a major, MAJOR improvement to my quality of life.

2023-11-04

TIL Processes can consume standard input

# Today I learned that processes can consume standard input.

# Show that "docker compose exec" is cursed
ssh gcp1 << EOF
  docker compose -f track/base.yml -f track/prod.yml exec db echo "Hello, cursed world."
  echo "By default, docker compose exec consume the rest of the script as stdin."
  echo "So none of these lines will be executed."
  ls
  echo "RIP world."
EOF

# Fix #1: Disable reading from stdin with --interactive=false
ssh gcp1 << EOF
  docker compose -f track/base.yml -f track/prod.yml exec --interactive=false db echo "Hello, world."
  echo "This line and the rest of the script will run."
  ls
  echo "Goodbye, world"
EOF

# Fix #2: Redirect /dev/null to stdin
ssh gcp1 << EOF
  docker compose -f track/base.yml -f track/prod.yml exec db echo "Hello, world." < /dev/null
  echo "This line and the rest of the script will run."
  ls
  echo "Goodbye, world"
EOF

# Also, this problem isn't unique to docker:

# Let's forget to pass a file to cat
ssh gcp1 << EOF
  cat
  echo "The rest of the script is now input to cat."
  ls
  echo "Oh no!"
EOF

# Is this what "rip grep" means?
ssh gcp1 << EOF
  echo "This line will be displayed"
  grep "search-pattern"  # grep is waiting for input from stdin
  echo "This line won't be executed because grep is consuming the stdin."

EOF

# These are fixed with the same "</dev/null" trick
ssh gcp1 << EOF
  cat < /dev/null
  echo "This displays normally"
  ls
  echo "Hooray!"
EOF

# Though we probably meant to pass something to cat other than /dev/null

2023-11-01

Setting up ops

I’ve settled on a (relatively) simple workflow for deploying updates to a staging environment and then to a production environment. The workflow is captured by the new “Ops.org” file.

A bunch of small changes and a few new workflows are needed to finish this part.

2023-10-31

Automating migrations

Migrations are inherently riskier than regular deploys. They can alter the database. What can be done to make them less of a headache?

First, I can automate their deployment such that I’m just clicking a “go” button in GitHub Actions and watching the change roll out to my servers. No ssh’ing in and running the migration manually. That’s a minor change when I’m dealing with just one database, but it should be especially helpful when dealing with multiple servers.
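
The command that “go” button ultimately runs would look something like this, reusing the migration-runner image and compose network from the 2023-10-27 entry. The exact invocation (and the assumption that the workflow reaches the server over ssh) is a guess:

ssh gcp1 'docker run --rm -v ~/track:/app --network docker_default -e DATABASE_URL=postgres://postgres:redacted@db:5432/postgres johnshaughnessy/migration-runner bash -c "cd /app/server && diesel migration run"'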

Also, I only really have two environments set up at the moment: a development environment running in docker containers on my local machine, and a production environment running on a GCP VM. I’d like something in between: an environment that is as close to production as I can get. This will require setting up another VM that I can auto-deploy to.

Finally, database migrations are scary because I can permanently delete data. Let’s change that. I’d like to set up a simple system where I create a backup before performing the migration. That way, if something goes terribly wrong, I will at least be able to restore from backup.
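
A minimal sketch of that backup step, assuming the same compose files and db service as elsewhere in this log (the backup filename and the use of -T are my choices here):

# Before migrating, dump the database to a file on the host:
docker compose -f track/base.yml -f track/prod.yml exec -T db \
  pg_dump -U postgres postgres > backup-$(date +%F).sql

# If the migration goes badly, restore from that dump
# (this assumes the database has been recreated/emptied first):
docker compose -f track/base.yml -f track/prod.yml exec -T db \
  psql -U postgres postgres < backup-$(date +%F).sql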

Of course, this is probably not the most robust solution out there. I imagine production-grade services have much stronger guarantees about never losing transactions. But I don’t know how they set that up, and I want to start with a simple system that gives me (the sole user) some assurance that I probably won’t delete all my data by accident.

Considering Blue-Green deployments

OK. Let’s write a simple, high-level plan for what releasing a change to production will look like.

There are three types of production changes we are considering:

  1. A client change
  2. A server app change
  3. A server app change WITH a migration
  1. A client change. The client is built with rust/yew and bundled into a set of static assets that the server application serves to browser clients. Pushing to production simply means:
  • Pushing the code to a github feature branch
  • Automatically running a “build” step that creates the static asset bundle (and potentially runs automated tests)
  • Uploading the static asset bundle somewhere (like a simple file store on gcp. Whatever the equivalent to an s3 bucket is).
  • To test this new code against the “dev” server, I pass a query string parameter to the dev server which tells it to fetch and serve the static assets from the file store (instead of the latest production bundle).
  • From here, I might run another series of automated tests (this time, incorporating the live dev server).
  • When I am ready to deploy to “prod”, I merge to main. This automatically adds a “tag” of some sort to the static asset or to a shared key/value store somewhere in the infrastructure so that all the app servers know which static assets are the “prod client”.
  2. A server app change. The server is built with actix_web, diesel, and rust. When I push to a feature branch, a new docker image is built and uploaded to a docker registry, and automated tests run. Separately, I can deploy this application to a development server (probably whichever blue/green server is not currently in use) and run whatever tests I want in that environment. When I am ready to “push to prod”, either the prod server’s container is restarted (and the new image is fetched), or the dev server BECOMES the prod server (as in blue/green deployments). A sketch of the restart-in-place option follows this list.
  3. A server change WITH a migration. I can’t really test this change lightly. When I push the change, the server application image is built, and I can optionally run a migration job. The migration job will run the migration(s) on whichever of the blue/green servers is currently the dev server. I will then deploy the application image into a container running on that instance. When I am satisfied that the migration and the application are running correctly, that instance will become the live one (the blue/green flip will occur).
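
For the restart-in-place variant of a server app change, the deploy amounts to something like this. The compose files and host match earlier entries; the service name “app” is a guess:

ssh gcp1 'docker compose -f track/base.yml -f track/prod.yml pull app && docker compose -f track/base.yml -f track/prod.yml up -d app'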

Important unanswered question(s):

  • The live production environment is only ever talking to one database. How do changes get “synced” during a blue/green deployment? What precautions do I need to take? What simplifications can I make?
  • Is this a reasonable deployment strategy? Are there any major oversights or misunderstandings?
  • Can this be made simpler without being made less safe or less redundant?

2023-10-28

Speeding up automated builds

I noticed that my automated workflow to build the application took ~4 minutes to run. Most of this time was waiting for cargo to download and compile dependencies that don’t change between commits. This step only needs to happen if the dependencies change. That is, if Cargo.lock changes.

So, to speed this up, I created a new base-builder image where the dependencies have been precompiled. I use this image as my starting point to build the application. This docker image will only be updated when Cargo.lock changes.
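
Roughly how this fits together in the workflow. The image names and Dockerfile paths are made up, and the diff-against-the-previous-commit check is just one way to detect a Cargo.lock change:

# Rebuild and push the base-builder only when Cargo.lock changed:
if git diff --name-only HEAD~1 HEAD | grep -q '^server/Cargo.lock$'; then
  docker build -f server/Dockerfile.base -t johnshaughnessy/base-builder server/
  docker push johnshaughnessy/base-builder
fi

# The application image always starts FROM the prebuilt base-builder:
docker build -f server/Dockerfile -t johnshaughnessy/track-server server/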

My builds got ~4x faster in the common case, from 4 minutes down to 1 minute.

Encountering timezone conflicts

A migration that came with diesel automatically manages updated_at fields on records: it sets a new value for the field any time I update anything in the record.

This is good. I want this.

The problem is that it was written for Postgres’s TIMESTAMP datatype, whereas I previously decided to use unix timestamps for everything. It was easier to change the migration than to change my data, but I’ve added a TODO list item to change from unix timestamps to PostgreSQL TIMESTAMP and chrono::NaiveDateTime so that I can take advantage of the many time-related functions in PostgreSQL and chrono/rust.

Migrations

Since I changed a migration, it’s probably time to automate the process of tearing down my database and starting fresh. Later, I’ll want to keep backups and manage changes in a more robust way, but since there’s no real data in the database yet, it’s faster and easier to tear it all down.

For now, I just did it “manually”:

# Start a throw-away container that has diesel_cli installed (see 2023-10-27):
docker run --rm -it -v ~/track:/app --network docker_default johnshaughnessy/migration-runner /bin/bash

# The rest is run inside that container's shell:
cd /app/server/

# Revert each applied migration (one revert per migration)...
DATABASE_URL=postgres://postgres:redacted@db:5432/postgres diesel migration revert
DATABASE_URL=postgres://postgres:redacted@db:5432/postgres diesel migration revert
DATABASE_URL=postgres://postgres:redacted@db:5432/postgres diesel migration revert

# ...then re-run them all from scratch.
DATABASE_URL=postgres://postgres:redacted@db:5432/postgres diesel migration run

CRUD

I simplified the API for weights to follow a very standard “create, read, update, delete” form.

Testing

I moved unit tests into the model file where the database operations were written.

I’m not sure yet how to test the API functions specifically. I haven’t thought about it much, but given how many minor mistakes I caught in the unit tests for the model (and how much of a pain it is to test these things manually), I would like to write some automated tests for the API functions too.

It might be the case that as I get better, these functions effectively never have bugs in them, and testing is just bloating the codebase and wasting time. But, I’m certainly not at that point yet. There were some tests in the elixir codebase I worked on that were like that. I couldn’t think of a situation where some of the tests would catch something regressing. But then again I didn’t do all that much development in that codebase, and it was significantly more stable than our game client.

2023-10-27

I migrated from sqlite to postgres, and started using diesel.

I set up GitHub workflows to automate building and deploying the application. I have not automated database migrations yet.

Instead, to run database migrations in production I followed these steps:

  • I sshed into the prod vm.
  • I spun up a (throw-away) docker container using a rust image in the same docker compose environment/network:
docker run --rm -it -v ~/track:/app --network docker_default rust:1.73-slim-buster /bin/bash
  • I installed some prerequisites in the container, namely build-essential and libpq-dev:
apt-get update && apt-get install -y build-essential libpq-dev
  • I installed diesel_cli in the container:
cargo install diesel_cli --no-default-features --features postgres
  • I saved a snapshot of the container as an image I can reuse:
docker commit a84c48dd9ca7 johnshaughnessy/migration-runner

# sha256:a482b7941204854c84b2d3b1e067ca058a1cca4f8aee777643ef499a84fb8d3b
  • I made sure the DATABASE_URL environment variable was configured and ran the migrations.
DATABASE_URL=postgres://postgres:redacted@db:5432/postgres diesel migration run
  • I made sure that the migrations had run successfully. Since I don’t have automated tests yet, I just manually created some entries with curl and listed them.
  • I shut down the throw-away container.