In a service that relies heavily on user-generated content, scaling globally poses the challenge of seeding content for a new region until the platform there becomes self-sufficient. One common growth hack to work around this is to use machine translation to display content that is only available in other languages in a user's native language.
This can require using local APIs in markets with restricted access to mainstream services like Google Cloud Translation or Microsoft's Cognitive Platform. Therefore, I propose a service that abstracts other translation APIs behind a common HTTP API. The backend is implemented in Go and allows for failover between any number of configured services. A new upstream service connector is implemented with a single method. Additionally, the service optionally connects to a cache backend to store translation results. This accommodates upstream services that impose restrictive rate limits or total translation limits.
Currently, there are only two backend implementations. Furthermore, there is no backend for external cache services yet, and no expiry timer for cached items.
Future improvements planned are:
- Integration tests for upstreams
- Concurrently calling multiple upstreams
- Enabling load-balancing to multiple upstreams / expanding failover options
- Controlling the cache backend / settings at runtime
- Limiting simultaneous upstream requests
- Backfilling / cache warming for likely future translations
- Implementing a streaming RPC endpoint
- Rate-limiting by different user groups
- Front-end to manually perform translations
- API for injecting company-specific vocabulary
- Allowing for templated translations / placeholders
- Adding support for more translation services, and cache backends
As a language, Go was a good fit, because it provides both concurrency primitives and a well-tested HTTP implementation.
An added benefit is the speedup over Python / Java. This is a simple, low-level abstraction, so the faster the better.
For the external interface, I decided to implement a single HTTP endpoint that uses the `Content-Language` and `Accept-Language` headers. Both are standard HTTP headers with a published specification, so this is easier to support.
The caching backend is implemented as a simple `Put` / `Get` / `Has` interface, allowing for straightforward expansion to external key-value stores. Translation services are called asynchronously, which enables a future change toward calling multiple upstream services concurrently and selecting the first one to answer. Currently, one service is called at a time. Failover is implemented by pushing a failing service to the end of the queue of services that requests run on.
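The cache interface and the queue-based failover described above could look roughly like the following. This is a minimal sketch, not the repository's code: the names `Translator`, `Cache`, `memCache`, `pool`, `failing`, and `echo`, as well as all method signatures and the cache key layout, are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Translator is the single method an upstream connector implements
// (hypothetical name and signature).
type Translator interface {
	Translate(text, from, to string) (string, error)
}

// Cache mirrors the Put / Get / Has interface; the signatures are assumed.
type Cache interface {
	Put(key, value string)
	Get(key string) (string, bool)
	Has(key string) bool
}

// memCache is an in-memory Cache without expiry.
type memCache struct {
	mu sync.RWMutex
	m  map[string]string
}

func newMemCache() *memCache { return &memCache{m: make(map[string]string)} }

func (c *memCache) Put(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] = value
}

func (c *memCache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[key]
	return v, ok
}

func (c *memCache) Has(key string) bool {
	_, ok := c.Get(key)
	return ok
}

// pool calls the upstream at the head of the queue and moves it to the
// back when it fails. The lock is held during the call, so this sketch
// handles one upstream request at a time.
type pool struct {
	mu        sync.Mutex
	upstreams []Translator
	cache     Cache
}

func (p *pool) Translate(text, from, to string) (string, error) {
	key := from + "\x00" + to + "\x00" + text
	if v, ok := p.cache.Get(key); ok {
		return v, nil // cache hit, no upstream call
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	for i, n := 0, len(p.upstreams); i < n; i++ {
		head := p.upstreams[0]
		out, err := head.Translate(text, from, to)
		if err == nil {
			p.cache.Put(key, out)
			return out, nil
		}
		// Failover: push the failing upstream to the end of the queue.
		p.upstreams = append(p.upstreams[1:], head)
	}
	return "", errors.New("all upstreams failed")
}

// failing always errors, similar in spirit to a mock backend.
type failing struct{}

func (failing) Translate(string, string, string) (string, error) {
	return "", errors.New("upstream unavailable")
}

// echo is a fake upstream that tags the text and counts its calls.
type echo struct{ calls int }

func (e *echo) Translate(text, _, to string) (string, error) {
	e.calls++
	return "[" + to + "] " + text, nil
}

func main() {
	e := &echo{}
	p := &pool{upstreams: []Translator{failing{}, e}, cache: newMemCache()}
	for i := 0; i < 2; i++ {
		out, err := p.Translate("Also wirklich!", "de", "en")
		fmt.Println(out, err, e.calls) // [en] Also wirklich! <nil> 1
	}
}
```

The second call is answered from the cache, so the fake upstream is only called once. An external key-value store would only need to satisfy the same three cache methods.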
Upstream credentials are passed as environment variables. Currently, two variables are processed:
- `GOOGLE_API_KEY`: If specified, enables the Google Cloud Translation backend with the given key.
- `ENABLE_MOCK`: If specified, enables the mock backend. This is used for testing upstream failure handling.
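As an illustration of how these two variables might select backends at startup, here is a small sketch. The function `enabledBackends` and the backend names `"google"` and `"mock"` are hypothetical, and treating an empty `GOOGLE_API_KEY` as "not specified" is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// enabledBackends decides which backends to start. The lookup function
// is injected so the logic can be tested without touching the real
// environment; os.LookupEnv has the matching signature.
func enabledBackends(lookup func(string) (string, bool)) []string {
	var backends []string
	// Assumption: an empty key counts as "not specified".
	if key, ok := lookup("GOOGLE_API_KEY"); ok && key != "" {
		backends = append(backends, "google")
	}
	// Assumption: any value, even an empty one, enables the mock backend.
	if _, ok := lookup("ENABLE_MOCK"); ok {
		backends = append(backends, "mock")
	}
	return backends
}

func main() {
	fmt.Println("enabled backends:", enabledBackends(os.LookupEnv))
}
```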
- The cache package is fully unit tested. It plays a part in every request and is critical to reducing upstream load.
- `handler.go` has full coverage as well. It handles every request, making it a critical piece of code.
- `sanity_test.go` contains tests running `gofmt` and `go vet`. Code is read more often than it is written, so this was a no-brainer.
My name is Leonhard Kuboschek. This code uses libraries written by many people. Thank you. All code contained within this repository was authored solely by me. I'm currently working full-time at a startup (50 hrs / week), and we have a launch coming up. Thus I've been working on this after hours, and on the weekends only. Additionally, this is one of my earlier projects utilizing Go as a language.
This service authenticates all requests by checking for the presence of a valid JSON Web Token (JWT). There is currently no further authorization strategy; all authenticated requests are allowed.
JWTs are checked with the following parameters:
- Algorithm: `HS256`
- Secret Key: Set using the environment variable `SECRET_KEY`
At present, no claims are required on the tokens.
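To make the check concrete, the sketch below signs and verifies a compact HS256 token using only the Go standard library. It is an illustration under stated assumptions, not the service's implementation, which may well rely on a JWT library. The helpers `sign` and `verify` are hypothetical, and no claims (such as an expiry) are inspected, matching the statement above.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// sign builds a compact HS256 token around the given JSON payload.
// It exists here only to produce test tokens.
func sign(payload string, secret []byte) string {
	enc := base64.RawURLEncoding
	head := enc.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`))
	body := enc.EncodeToString([]byte(payload))
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(head + "." + body))
	return head + "." + body + "." + enc.EncodeToString(mac.Sum(nil))
}

// verify checks the token structure, the alg header, and the
// HMAC-SHA256 signature. Claims are deliberately not inspected.
func verify(token string, secret []byte) error {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return errors.New("token must have three segments")
	}
	enc := base64.RawURLEncoding
	rawHead, err := enc.DecodeString(parts[0])
	if err != nil {
		return fmt.Errorf("decode header: %w", err)
	}
	var head struct {
		Alg string `json:"alg"`
	}
	if err := json.Unmarshal(rawHead, &head); err != nil {
		return fmt.Errorf("parse header: %w", err)
	}
	if head.Alg != "HS256" {
		return fmt.Errorf("unexpected algorithm %q", head.Alg)
	}
	sig, err := enc.DecodeString(parts[2])
	if err != nil {
		return fmt.Errorf("decode signature: %w", err)
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(parts[0] + "." + parts[1]))
	// hmac.Equal compares in constant time.
	if !hmac.Equal(sig, mac.Sum(nil)) {
		return errors.New("signature mismatch")
	}
	return nil
}

func main() {
	secret := []byte("yoursecretkey")
	token := sign(`{}`, secret)
	fmt.Println(verify(token, secret))          // <nil>
	fmt.Println(verify(token, []byte("wrong"))) // signature mismatch
}
```

Rejecting any algorithm other than `HS256` before checking the signature avoids accepting tokens that declare `none` or an asymmetric algorithm.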
The server accepts HTTP `POST` requests on port 8080; the path is `/`.
The headers required are:
- `Content-Language`, specifying the language that the request content is assumed to be in.
- `Accept-Language`, specifying the target language.
- `Authorization`, containing the string `Bearer` followed by a JSON Web Token (see authentication section).
The text to be translated is sent in the request body; the content type shall be `text/plain`.
The response will contain the `Content-Language` header for the target language, as well as the translated text in the response body. In case of an error, standard HTTP status codes are used for signaling.
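The request and response contract above can be sketched as a single handler. This is a simplified illustration, not the contents of `handler.go`: the names `translateFunc`, `handler`, and `demo` are hypothetical, the specific status codes are assumptions (the text only promises standard codes), the JWT check is left out, and `Accept-Language` is treated as a single language tag rather than a weighted list.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// translateFunc stands in for the cached upstream call (hypothetical).
type translateFunc func(text, from, to string) (string, error)

// handler sketches the single endpoint: POST /, languages in headers.
func handler(translate translateFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			w.Header().Set("Allow", http.MethodPost)
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		from := r.Header.Get("Content-Language")
		to := r.Header.Get("Accept-Language")
		if from == "" || to == "" {
			http.Error(w, "Content-Language and Accept-Language are required", http.StatusBadRequest)
			return
		}
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		out, err := translate(string(body), from, to)
		if err != nil {
			// Assumed mapping: upstream failure becomes 502.
			http.Error(w, "upstream translation failed", http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Language", to)
		w.Header().Set("Content-Type", "text/plain; charset=utf-8")
		io.WriteString(w, out)
	}
}

// demo sends one in-process request against a fake translator and
// returns the status code, the Content-Language header, and the body.
func demo(method, from, to, text string) (int, string, string) {
	fake := func(text, _, to string) (string, error) {
		return "[" + to + "] " + text, nil
	}
	req := httptest.NewRequest(method, "/", strings.NewReader(text))
	req.Header.Set("Content-Language", from)
	req.Header.Set("Accept-Language", to)
	rec := httptest.NewRecorder()
	handler(fake)(rec, req)
	return rec.Code, rec.Header().Get("Content-Language"), rec.Body.String()
}

func main() {
	fmt.Println(demo(http.MethodPost, "de", "en", "Also wirklich!"))
	// 200 en [en] Also wirklich!
}
```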
There is a sample deployment running at translate dot leo dot codes. It authenticates requests by JSON Web Token. If you would like to receive testing credentials, please send me an email (see homepage links below).
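# TOKEN is assumed to hold a JWT signed with SECRET_KEY (HS256)
curl -X POST \
  'http://localhost:8080/' \
  -H "authorization: Bearer $TOKEN" \
  -H 'accept-language: en' \
  -H 'content-language: de' \
  -H 'content-type: text/plain' \
  -d 'Also wirklich!'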
`docker run -p 8080:8080 -e GOOGLE_API_KEY=$GOOGLE_API_KEY -e SECRET_KEY=yoursecretkey kuboschek/translate-server`