A self-hostable analytics service with a straightforward API to collect events from any source.
I was bored and felt like writing my own analytics service over the weekend.
- Standard website analytics collection
- Custom metrics collection
- UTM query collection
- Optional anonymized location collection
- Customizable UI
- Date range selection and comparison
- Public URL sharing
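As an illustration of what UTM query collection involves, here is a minimal sketch that pulls the standard UTM parameters out of a page URL using only the Python standard library. The function name `extract_utm` and the `UTM_KEYS` tuple are hypothetical and are not taken from this project's code; how the service actually stores these values is not shown here.

```python
from urllib.parse import parse_qs, urlsplit

# The five conventional UTM parameter names.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def extract_utm(url: str) -> dict[str, str]:
    """Return the UTM parameters present in a URL, ignoring everything else."""
    # parse_qs drops parameters with blank values by default.
    query = parse_qs(urlsplit(url).query)
    # Keep only the first value if a parameter is repeated.
    return {key: query[key][0] for key in UTM_KEYS if key in query}


if __name__ == "__main__":
    print(extract_utm("https://example.com/?utm_source=newsletter&utm_medium=email&page=2"))
```

Non-UTM parameters such as `page` are dropped, so only campaign attribution is recorded.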
You need Docker and docker-compose installed for a quick production start, or you
can work out how we install and run things from the
Dockerfile and set it up yourself.
If you want to install things without docker then you'll need the following dependencies:
You can also check the
Dockerfile for an exact list of dependencies and adjust
package names for your desired platform.
This is a standard Django project. If you know how to run Django, or are willing to follow any Django tutorial, you shouldn't have a problem getting this project running on almost anything.
If you have all of the above dependencies installed, you can use my Makefile to
install and run the Python and Node dependencies locally. Running
make will check
that you have the proper dependencies installed and, if not, will try to
install them for you. It will then create a fresh database for you and run
Checking outdated dependencies
This can be done in both yarn and pipenv with the following two commands:
pipenv update --outdated
yarn outdated
You can then upgrade the outdated dependencies with the following two commands:
pipenv update
yarn upgrade
I recommend testing everything after this to make sure it's all working.
Optimizing images with webp
My development system runs Ubuntu, so I installed the official webp utils with
apt install webp.
cwebp -q 90 -m 6 -o output.webp input.png
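To apply the same command to a whole folder, a small wrapper can help. The following is a sketch only, assuming `cwebp` is on your PATH; the helper names `cwebp_command` and `convert_all` are made up for this example and are not part of the project.

```python
import subprocess
from pathlib import Path


def cwebp_command(source: Path, quality: int = 90, method: int = 6) -> list[str]:
    """Build the same cwebp invocation as above for a single source image."""
    output = source.with_suffix(".webp")
    return ["cwebp", "-q", str(quality), "-m", str(method), "-o", str(output), str(source)]


def convert_all(directory: Path) -> None:
    """Convert every PNG in a directory, writing the .webp next to the original."""
    for source in sorted(directory.glob("*.png")):
        # check=True raises CalledProcessError if cwebp exits with an error.
        subprocess.run(cwebp_command(source), check=True)


if __name__ == "__main__":
    convert_all(Path("."))
```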
The easiest way to run this project is to run it using
docker-compose up --build -d if you have
installed. This will start the server and have you running at port 8000. The
first time you do this make sure you run migrations with
docker-compose run web python manage.py migrate. Make sure you set up the
.env file before running; you can copy the sample from
samplefiles/env.sample into the root of the project as
.env and change the
The default user is
admin with the password
admin. We also create an example
property so you can see how the analytics look and a property to collect metrics
User location data
I'm unsure how I want to handle user location data at the moment. I'm not really interested in someone's personal location but I do like to know where people are coming from region wise. This helps me know if I need to add translations to my projects or if I need to add maybe a CDN/caching/server to a new region.
For that reason I've added a simple way to enable or disable location data. I don't want to store user IPs, so location data isn't retroactive: visits recorded before you enable lookups will never get a location. If you want to enable IP address lookups you can download a free or paid database from MaxMind on maxmind.com.
Once you get a database, drop it into the
data directory on your server and name it
db.mmdb. Note that we are only using the binary database, not the
Once it's added, we'll automatically start recording location data, leaving out the IP address and any directly identifiable information.
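The idea of recording a region without the IP can be sketched as follows. This is an assumption-laden illustration, not the project's implementation: it assumes the `geoip2` Python package, whose `geoip2.database.Reader(path).city(ip)` call returns a response with `country` and `subdivisions` attributes. The helper names `location_enabled` and `coarse_location` are hypothetical, and the demo uses a stand-in object so it runs without a database file.

```python
from pathlib import Path
from types import SimpleNamespace


def location_enabled(db_path) -> bool:
    """Location lookups are on only when the database file is present."""
    return Path(db_path).is_file()


def coarse_location(response) -> dict:
    """Reduce a City lookup response to coarse fields; the IP is never included."""
    return {
        "country": response.country.iso_code,
        "region": response.subdivisions.most_specific.name,
    }


if __name__ == "__main__":
    # Stand-in for a real lookup response, shaped like the fields used above.
    fake_response = SimpleNamespace(
        country=SimpleNamespace(iso_code="US"),
        subdivisions=SimpleNamespace(most_specific=SimpleNamespace(name="Ohio")),
    )
    if not location_enabled("data/db.mmdb"):
        print("no database found, location collection is off")
    print(coarse_location(fake_response))
```

Because only the reduced dictionary is kept, nothing stored afterwards can be traced back to a single address.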
You can configure the database path in settings.
All data is stored in
/srv/data/analytics/ and your repo is stored in
/srv/git/analytics.git/. You can back up both of these folders and you'll have
a 100% backup of everything except changes you may have made to the
.env file, which should be easy enough to recreate, but you can back
that up too!
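A backup of those two folders could look like the sketch below. It assumes you are happy with a dated tar.gz archive; the `backup` function and the `/srv/backups` destination are hypothetical choices for this example. Copying a live SQLite file can capture a half-written state, so you may prefer to stop the containers first.

```python
import tarfile
from datetime import date
from pathlib import Path


def backup(sources: list[Path], destination: Path) -> Path:
    """Pack each source folder into one dated tar.gz archive and return its path."""
    destination.mkdir(parents=True, exist_ok=True)
    archive = destination / f"analytics-{date.today().isoformat()}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for source in sources:
            # Store each folder under its own name rather than the full path.
            tar.add(source, arcname=source.name)
    return archive


if __name__ == "__main__":
    created = backup(
        [Path("/srv/data/analytics"), Path("/srv/git/analytics.git")],
        Path("/srv/backups"),  # hypothetical destination
    )
    print(f"wrote {created}")
```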
This quickstart requires that you have an Alpine Linux server running with a domain name pointed to it. I'm currently using Linode as my host since they support Alpine Linux nicely. If you don't want to use Linode or Alpine Linux, you can still use these instructions; just change the apk commands at the start to the equivalents for whatever Linux distro you're using.
IMPORTANT NOTE: Change
analytics.bythewood.me to your domain name where
relevant in these instructions.
TIP: During the ufw portion to enable the firewall, I recommend allowing SSH only
from your IP address or your ISP's IP address range, which you can find at the
top of a whois lookup. For example, replace
188.8.131.52/20 in the command below with your IP or your
ISP's IP range.
ufw allow from 188.8.131.52/20 proto tcp to any port 22
I allow my local ISP's range because I have a DHCP lease from them and I get tired of logging into my server from my hosting provider's UI to update it. It's good enough security and much better than nothing!
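Before locking SSH down to a range, it can be worth checking that your current address really falls inside it. The sketch below uses Python's standard `ipaddress` module; the helper names `normalized_range` and `is_allowed` and the test address are illustrative only.

```python
import ipaddress


def normalized_range(cidr: str) -> str:
    """Return the network address form of a CIDR range, clearing any host bits."""
    return str(ipaddress.ip_network(cidr, strict=False))


def is_allowed(address: str, cidr: str) -> bool:
    """Check whether a single IP address falls inside the given range."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr, strict=False)


if __name__ == "__main__":
    print(normalized_range("188.8.131.52/20"))  # 188.8.128.0/20
    print(is_allowed("188.8.140.7", "188.8.131.52/20"))  # True
    print(is_allowed("188.9.0.1", "188.8.131.52/20"))  # False
```

If `is_allowed` returns False for your own address, the ufw rule would lock you out of SSH.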
apk update && apk upgrade && apk add docker docker-compose caddy git iptables ip6tables ufw
ufw allow 22/tcp && ufw allow 80/tcp && ufw allow 443/tcp && ufw --force enable
echo -e "#!/bin/sh\napk upgrade --update | sed \"s/^/[\`date\`] /\" >> /var/log/apk-autoupgrade.log" > /etc/periodic/daily/apk-autoupgrade && chmod 700 /etc/periodic/daily/apk-autoupgrade
rc-update add docker boot && service docker start
mkdir -p /srv/git/analytics.git && cd /srv/git/analytics.git && git init --bare
git clone firstname.lastname@example.org:overshard/analytics.git && cd analytics
git remote remove origin && git remote add origin email@example.com:/srv/git/analytics.git
git push --set-upstream origin master
mkdir -p /srv/docker && cd /srv/docker && git clone /srv/git/analytics.git analytics && cd /srv/docker/analytics
cp samplefiles/Caddyfile.sample /etc/caddy/Caddyfile && sed -i 's/analytics.example.com/analytics.bythewood.me/g' /etc/caddy/Caddyfile
cp samplefiles/env.sample .env && sed -i 's/analytics.example.com/analytics.bythewood.me/g' .env
cp samplefiles/post-receive.sample /srv/git/analytics.git/hooks/post-receive
mkdir -p /srv/data/analytics/db && chown -R 1000:1000 /srv/data/analytics
docker-compose up --build --detach && docker-compose run web python3 manage.py migrate --noinput && docker-compose run web sqlite3 db.sqlite3 "PRAGMA journal_mode=WAL;" ".exit"
rc-update add caddy boot && service caddy start
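The last docker-compose step switches the SQLite database to write-ahead logging. If you are running without Docker, roughly the same thing can be done from Python's standard `sqlite3` module. This is a sketch; the `enable_wal` helper is hypothetical and the db.sqlite3 path is taken from the command above.

```python
import sqlite3


def enable_wal(db_path) -> str:
    """Switch a SQLite database file to WAL mode and return the resulting mode."""
    connection = sqlite3.connect(db_path)
    try:
        # The pragma returns the journal mode now in effect as a single row.
        mode = connection.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    finally:
        connection.close()
    return mode


if __name__ == "__main__":
    print(enable_wal("db.sqlite3"))
```

The WAL setting is stored in the database file itself, so it only needs to be applied once rather than on every connection.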
I chose to use an sqlite3 database since that handles all my use cases just fine. My first recommendation for scaling this project would be to use a PostgreSQL database. If you want to get fancy then a time-series database like Timescale would make a lot of sense. The foundation of this project is pure Django so it shouldn't be hard to swap in a different database.
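Swapping the database in a Django project mostly comes down to the `DATABASES` setting. The following is a rough sketch of how that switch might be driven from environment variables. The variable names (`DATABASE_ENGINE`, `DATABASE_NAME`, and so on) and the `database_config` helper are assumptions for illustration and do not come from this project's settings. Django's PostgreSQL backend also needs a driver such as psycopg installed.

```python
import os


def database_config(env=None) -> dict:
    """Build a Django DATABASES dict, defaulting to SQLite."""
    env = os.environ if env is None else env
    if env.get("DATABASE_ENGINE") == "postgresql":
        return {
            "default": {
                "ENGINE": "django.db.backends.postgresql",
                "NAME": env.get("DATABASE_NAME", "analytics"),
                "USER": env.get("DATABASE_USER", "analytics"),
                "PASSWORD": env.get("DATABASE_PASSWORD", ""),
                "HOST": env.get("DATABASE_HOST", "localhost"),
                "PORT": env.get("DATABASE_PORT", "5432"),
            }
        }
    # Fall back to a local SQLite file when nothing else is configured.
    return {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": env.get("SQLITE_PATH", "db.sqlite3"),
        }
    }


# In a settings module this would be assigned directly:
DATABASES = database_config()
```

After changing the engine you would run the migrations again against the new, empty database; moving existing data across is a separate step not covered here.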
I won't be providing any user support for this project. I'm more than happy to accept good pull requests and fix bugs, but I don't have the time to help people run or use this project. I apologize in advance for this. Maintaining multiple OSS projects has taught me that I need to step back from trying to provide support to avoid burnout.