I developed this service as a side project with the main goal to learn Go lang. The name 'curto' in Neapolitan meas short.
Before running the application you need to set these envs: REDIS_ENDPOINT, REDIS_USERNAME, REDIS_PASSWORD, MACHINE_ID
For example if you want to connect to a local Redis
export REDIS_ENDPOINT=localhost:6379
export REDIS_USERNAME="username"
export REDIS_PASSWORD="password"
export MACHINE_ID=0
you can then start application with
./run.sh
curl -X GET "http://localhost:8080/shorten?url=https://www.mfvitale.me"
the response will be the shortened URL
http://localhost:8080/eBIGnfbLW
Note that the URL returned uses the config 'domain' inside config.yml file. You can also specify it through 'DOMAIN' env
curl -X GET "http://localhost:8080/eBIGnfbLW"
the response will be a '303 See Other' HTTP status
<a href="https://www.mfvitale.me">See Other</a>.
If you want to deploy it on k8s you need to create:
- A secret with name
redis-secrets
and addREDIS_ENDPOINT
. Optionally you can addREDIS_USERNAME
andREDIS_PASSWORD
- A config map with name
app-config
where you can specify thedomain
with your custom domain. - Build the image locally naming it
mfvitale/curto:latest
You can just run kubectl apply -f .deploy/
I have deployed the service on Render so you can reach the service here
The service has been designed to be fast, scalable, and cloud-native.
So I chose to use:
- Snowflake algorithm to have a fast and distributed id generation instead using Database auto-increment feature.
- Base62 conversion to avoid reusing know (but long resulting) hash function (SHA-1, CRC32) that requires managing collision
- Use Redis to store <shortURL, longURL> mapping
The service has been implemented as an application and not an API, so it also manages the redirection for a short URL.
The easiest way to implement an URLShortner service is to keep the pair <shortURL, longURL>
A short URL is like http://shortServiceUrl{hashValue} where 'hashValue' is the result of a hash function.
A first approach is to use SHA-1 hash function that generates something like 'da39a3ee5e6b4b0d3255bfef95601890afd80709' but the problem is that is too long (eventually much longer than the longURL) so to cope with that we need to cut it to a defined size. This method can lead to hash collisons. I don't want to manage it.
Another approach is to use base 62 conversions. With this approach, we convert a base 10 number to its base 62 representation. Where we can take this number? The number can be a generated ID.
Also for the numeric ID generation, there are different approaches:
- use the Database auto-increment feature
- use Snowflake
I discarded the first choice because:
- Hard to scale with multiple data centers
- Don't want to set up a database only to generate IDs
- Single point of failure in case of not replicated Database
- It does not scale well when a server is added or removed ( in case of a multi-master replication)
In the first instance, I deployed the service with a K8s Deployment and as MachineId for the Snowflake algorithm, I used the PID of the process running in the pod. The problem with this approach is that:
- different pods can have the same PID
- PID can change when the pod restarts
So I decided to switch from Deployment to StatefulSet to have a persistent id for the pods. I used this id as MachineId.
- introduce a rate-limiter to prevent resource starvation
- use helm to deploy