Search for royalty-free music by sonic similarity
Earworm is a search engine for music licensed for royalty-free commercial use, where you search by sonic similarity to something you already know. Either upload some mp3s or connect your Spotify account, then use clips from songs you like to search for similar songs that are licensed for you to use in your next great video game or YouTube video (always verify the license terms first; many licenses still require attribution!). You can search for the overall closest fit, or focus the match on genre, mood, or instrumentation. All the audio processing is done in your browser; your music never leaves your device.
*A 2D projection of some embeddings, in this case colored by mood*
This is a Work in Progress! Some features are actively being developed, and I'm currently evaluating multiple models, so search results may change (hopefully improve!) over time. Check the current live version at https://earworm.dev.
The modeling and webapp environments are independent. To set up the modeling environment, you'll need Conda.
```bash
git clone https://github.com/reppertj/earworm.git
cd earworm/modeling
conda env create -f environment.yml
```
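Then activate the new environment. The environment's name is defined in `environment.yml`; `earworm` below is an assumption, so check the file if activation fails:

```bash
conda activate earworm  # name assumed; see environment.yml
```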
To run the webapp locally in development mode, you'll need Docker and Docker Compose.
```bash
git clone https://github.com/reppertj/earworm.git
cd earworm
```
In the `earworm` directory, create a `.env` file modeled after `.env.example`. To start the backend, run:

```bash
docker-compose up
```
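For reference, the `.env` file is a list of `KEY=value` lines consumed by Docker Compose. A rough sketch of its shape is below; the authoritative key list is in `.env.example`, and the S3 key names here are hypothetical:

```bash
# Hypothetical sketch only; see .env.example for the real keys
DOMAIN=localhost
STACK_NAME=earworm-app
TAG=dev
# Credentials for the S3-compatible preview bucket (names illustrative)
S3_ACCESS_KEY=changeme
S3_SECRET_KEY=changeme
S3_BUCKET=earworm-previews
```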
The first `docker-compose up` will take several minutes, as the images need to be built. After everything is running, you can visit `localhost/docs` to see the backend Swagger UI. In development mode, both the backend and the worker containers will live reload after changes to `*.py` files in the `backend` directory.
The frontend container only serves the compiled version of the app via nginx. To work on the frontend in development with live reloading, devtools, etc:
```bash
cd frontend
npm install
npm start
```
You can now visit the frontend at `localhost:3000`. The frontend proxies requests to the backend mapped via Docker to `localhost:80/api`, so you can take advantage of local live reloading while working on the full stack at once.
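A quick smoke test for both halves of the stack (the `/docs` path comes from above; the flags are just one way to check):

```bash
curl -I http://localhost/docs   # backend Swagger UI served by the Docker stack
curl -I http://localhost:3000   # frontend dev server
```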
In development mode, both the backend and celeryworker containers install an instance of JupyterLab, so you can drop into a container to interactively investigate how things are working. To start the notebook server on the worker, for instance:
```bash
docker-compose exec celeryworker bash
$JUPYTER
```
This will print a link with an access token that you can visit on your local machine (outside the container).
Here is one way to manually deploy the site, using a combination of an S3-compatible object storage service and VPS instances running Ubuntu with attached block storage to persist (a) the database and (b) HTTPS certificates. This can also serve as the basis for automated deployments. You can deploy the project on a "cluster" of a single node or across multiple nodes.
- Ensure that Docker on your build machine is logged into a private container registry, and that the following environment variables point to where the images should be pushed in the registry (without tag suffixes): `DOCKER_IMAGE_BACKEND`, `DOCKER_IMAGE_CELERYWORKER`, `DOCKER_IMAGE_FRONTEND`. See the sketch below.
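  For example (the registry URIs here are illustrative only):

  ```bash
  docker login ghcr.io  # or your registry of choice
  export DOCKER_IMAGE_BACKEND=ghcr.io/youruser/earworm-backend
  export DOCKER_IMAGE_CELERYWORKER=ghcr.io/youruser/earworm-celeryworker
  export DOCKER_IMAGE_FRONTEND=ghcr.io/youruser/earworm-frontend
  ```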
- Set the `TAG` environment variable as appropriate, e.g., `dev`, `stag`, or `latest`.
- Run `scripts/build-push.sh`.
- You'll need at least one server running Docker and Docker Compose. These instructions were tested on Docker 19.03.13 and Docker Compose 1.27.4, running on Ubuntu 20.04 LTS. Follow the recommended instructions here to install Docker and Compose (one possible sequence is sketched below). On a fresh VPS, you'll probably also want to `apt-get update && apt-get upgrade`.
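  A sketch of one install path on Ubuntu, assuming you want the same Compose version these instructions were tested with; check Docker's documentation for the currently recommended method:

  ```bash
  curl -fsSL https://get.docker.com | sh
  sudo curl -L "https://github.com/docker/compose/releases/download/1.27.4/docker-compose-$(uname -s)-$(uname -m)" \
    -o /usr/local/bin/docker-compose
  sudo chmod +x /usr/local/bin/docker-compose
  ```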
- For previews, you'll need an S3-compatible bucket (it does not need to be on AWS) and a user with permissions scoped to read-write access to this bucket. The bucket does not need to be public. Add these credentials to the `.env` file.
- This project benefits hugely from a CDN to cache and serve the TensorFlow.js models from the edge, since the web worker begins downloading these after the first interaction. Cloudflare's free tier is one good option for this (a quick cache check is sketched below).
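  One way to confirm that a model file is actually being served from the edge; the URL is illustrative, and the `cf-cache-status` header is Cloudflare-specific:

  ```bash
  curl -sI https://example.com/models/model.json | grep -i cf-cache-status
  # "HIT" means the file came from Cloudflare's edge cache
  ```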
- For persistent storage for the database, one option is to use a block storage provider supported by REX-Ray. Alternatively, you can use label constraints in the Docker Compose file to ensure that the database is always deployed to the same node and attached to the same volume (see the comments in `docker-compose.yml` and `traefik.yml`). For REX-Ray, follow the linked instructions to create a properly scoped credential for your cloud provider. You'll need to do this on any node that might need to access an attached volume. For example, on DigitalOcean (change the region as needed):

  ```bash
  read -e -s -p "DigitalOcean access token?" DOBS_TOKEN
  # (paste your token at the prompt)
  docker plugin install rexray/dobs DOBS_TOKEN=${DOBS_TOKEN} DOBS_REGION=nyc1 DOBS_CONVERTUNDERSCORES=true
  ```

  (`DOBS_CONVERTUNDERSCORES` is needed because DigitalOcean doesn't accept underscores in volume names, but `docker stack deploy` automatically prefixes volume names with the stack name and an underscore.)
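  As an optional sanity check that the plugin can provision volumes (the volume name and size here are illustrative):

  ```bash
  docker volume create --driver rexray/dobs --opt size=10 rexray-test
  docker volume ls | grep rexray-test
  docker volume rm rexray-test
  ```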
- (Optional) For simplicity, point a separate public A or CNAME record to each node's IP address using a domain that you own; if all nodes are on a private network, you can use private addresses to avoid opening the ports needed for swarm mode on your public IP.
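  For reference, the ports Docker uses for swarm mode are well documented; a `ufw` sketch, if you do need to open them (restrict sources to your own nodes where possible):

  ```bash
  sudo ufw allow 2377/tcp  # cluster management traffic
  sudo ufw allow 7946/tcp  # node-to-node communication
  sudo ufw allow 7946/udp
  sudo ufw allow 4789/udp  # overlay network (VXLAN) traffic
  ```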
- At least one node must be designated as a manager. Set the server `hostname` to match the DNS record you set up in the previous step:

  ```bash
  export USE_HOSTNAME=leader.example.com
  echo $USE_HOSTNAME > /etc/hostname
  hostname -F /etc/hostname
  ```

  If you are using multiple nodes, repeat the previous two steps for each node.
- Put Docker into swarm mode on the first manager, specifying the public or private IP address configured above if applicable. If the server has only one address, you can just run `docker swarm init`. If you see an error like:

  ```
  Error response from daemon: could not choose an IP address to advertise since this system has multiple addresses on interface eth0 (138.68.58.48 and 10.19.0.5) - specify one with --advertise-addr
  ```

  do not assume that these are the only IPs available. Check `ip addr show` and your private network configuration. You may need to configure your firewall, especially if you use a public IP.

  ```bash
  docker swarm init --advertise-addr <Node IP Address>
  ```
- (Optional) Set up additional manager nodes. On the first manager node, run:

  ```bash
  docker swarm join-token manager
  ```

  and follow the instructions.
- (Optional) Set up worker nodes. On the first manager node, run:

  ```bash
  docker swarm join-token worker
  ```

  and follow the instructions.
- Check that the cluster has all the nodes connected:

  ```bash
  docker node ls
  ```
In production, this project runs two instances of Traefik. The first acts as the main load balancer/proxy, covering the entire cluster and handling HTTPS, generating certificates, etc. Then, inside the cluster, another instance of Traefik directs traffic to services inside the app (nginx serving the frontend, FastAPI, management tools, etc.). This allows you to run additional services behind other paths, without them interfering with each other, all on the same cluster. First, set up the main proxy:
- On a manager node in your cluster, create a network to be used by Traefik; you'll make this network accessible from the internet.

  ```bash
  docker network create --driver=overlay traefik-public
  ```
- Copy `traefik.yml` to the manager. This is a Docker Compose file for the main proxy.

  ```bash
  curl -L https://raw.githubusercontent.com/reppertj/earworm/master/traefik.yml -o traefik.yml
  ```
- Create environment variables for the Traefik setup (for experimentation/staging, use `https://acme-staging-v02.api.letsencrypt.org/directory` as the certificate authority; otherwise, you'll risk hitting the Let's Encrypt rate limits if you need to debug):

  ```bash
  export CA_SERVER=https://acme-v02.api.letsencrypt.org/directory
  export TRAEFIK_EMAIL=admin@example.com
  export TRAEFIK_USERNAME=admin
  export TRAEFIK_DOMAIN=traefik.internals.example.com
  ```
- Set up a password to be used for authenticating into the Traefik web interface and console UI:

  ```bash
  export HASHED_PASSWORD=$(openssl passwd -apr1)
  # (type in your password and verify)
  ```
- Get the swarm node ID and create a label for the node on which you want the proxy to run, so Traefik always runs on the same node and attaches to the named volume storing the certificates.

  ```bash
  export NODE_ID=$(docker info -f '{{.Swarm.NodeID}}')
  docker node update --label-add traefik-public.traefik-public-certificates=true $NODE_ID
  ```
- Deploy the stack:

  ```bash
  docker stack deploy -c traefik.yml traefik
  ```

  Make sure everything is running smoothly:

  ```bash
  docker stack ps traefik
  docker service logs traefik_traefik
  ```
  You should now also be able to visit `$TRAEFIK_DOMAIN` and log in to the Traefik dashboard over HTTP basic auth. If you can't, this is the time to troubleshoot SSL certificates, firewalls, DNS, etc. (a few quick checks are sketched below).
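  These checks are illustrative, not exhaustive:

  ```bash
  dig +short $TRAEFIK_DOMAIN      # does DNS resolve to this node?
  curl -vk "https://$TRAEFIK_DOMAIN" 2>&1 | grep -i issuer   # staging or production CA?
  docker service logs traefik_traefik --tail 50
  ```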
- Create your `.env` file on a manager node:

  ```bash
  cd earworm
  vim .env
  # (copy and paste the contents of .env with production settings)
  ```
- On each node, log in to your container registry. For example, using the GitHub Container Registry and a personal access token scoped to read:

  ```bash
  export CR_PAT=secret
  echo $CR_PAT | docker login ghcr.io -u USERNAME --password-stdin
  ```
- Add the label for the DB to one of the nodes. `STACK_NAME` is the name of the Docker stack you want to use, something like `earworm-app`. This must align with the label constraint in your `.env` file to ensure that Postgres is always deployed alongside the block storage volume for the database. Without this, the DB container will fail to launch, as the label constraint won't be satisfiable.

  ```bash
  docker node update --label-add $STACK_NAME.app-db-data=true $NODE_ID
  ```
- Finally, deploy the stack from a manager node. Set environment variables for the name of the Docker stack, the container image URIs, and the registry tag:

  ```bash
  export TAG=stag
  export STACK_NAME=earworm-app
  export DOCKER_IMAGE_BACKEND="<uri/of/backend-container/in/registry>"
  export DOCKER_IMAGE_CELERYWORKER="<uri/of/celery-container/in/registry>"
  export DOCKER_IMAGE_FRONTEND="<uri/of/frontend-container/in/registry>"
  bash scripts/deploy.sh
  ```
- Check the service status and logs:

  ```bash
  docker stack ls
  docker stack ps $STACK_NAME
  docker service logs <service-name> -f
  ```

  If all went well, your site should be live at `$DOMAIN` in a few minutes.
On your build machine:

```bash
export TAG=stag
bash scripts/build-push.sh
```

On a manager node:

```bash
export TAG=stag
export STACK_NAME=earworm-app
export DOCKER_IMAGE_BACKEND="<uri/of/backend-container/in/registry>"
export DOCKER_IMAGE_CELERYWORKER="<uri/of/celery-container/in/registry>"
export DOCKER_IMAGE_FRONTEND="<uri/of/frontend-container/in/registry>"
bash scripts/deploy.sh
docker service update --force <backend_service_name> -d
docker service update --force <celeryworker_service_name> -d
docker service update --force <frontend_service_name> -d
```
Your site should be back up after a few seconds, with the message queues and database uninterrupted, and with new migrations run.
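Optionally, confirm that the rollout picked up the new images (the service name placeholder matches the commands above):

```bash
docker service ps <backend_service_name> --no-trunc | head
```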
MIT license. See `LICENSE` for more information.