
A caching translation server with pluggable backends.



kuboschek/translate-server



Translate Server

The Problem

For a service that relies heavily on user-generated content, expanding globally raises the challenge of seeding content for a new region so that the platform can initially become self-sufficient. One common growth hack to get around this is to use machine translation to show users content that exists only in other languages in their native language.

This can require using local APIs in markets with restricted access to mainstream services like Google Cloud Translation or Microsoft Cognitive Services. Therefore, I propose a service that abstracts other translation APIs behind a common HTTP API. The backend is implemented in Go and supports failover between any number of configured services. A new upstream service connector only needs to implement a single method. Additionally, the service can optionally connect to a cache backend to store translation results. This accommodates upstream services that impose restrictive rate limits or caps on total translations.

Scope

Currently, there are only two backend implementations. Furthermore, there is no backend for external cache services yet, and cached items never expire.

Future improvements planned are:

  • Integration tests for upstreams
  • Concurrently calling multiple upstreams
  • Enabling load-balancing to multiple upstreams / expanding failover options
  • Controlling the cache backend / settings at runtime
  • Limiting simultaneous upstream requests
  • Backfilling / cache warming for likely future translations
  • Implementing a streaming RPC endpoint
  • Rate-limiting by different user groups
  • Front-end to manually perform translations
  • API for injecting company-specific vocabulary
  • Allowing for templated translations / placeholders
  • Adding support for more translation services, and cache backends

Technical Choices

Go was a good fit as a language because it provides both concurrency primitives and a well-tested HTTP implementation. An added benefit is its speed compared to Python or Java. Since this service is a thin, low-level layer in front of other APIs, the less overhead it adds, the better. For the external interface, I decided to implement a single HTTP endpoint that uses the Content-Language and Accept-Language headers. Both headers are defined in the HTTP/1.1 specification (RFC 7231), which makes the interface easy for clients to support.

The caching backend is implemented as a simple Put / Get / Has interface, which makes it straightforward to add external key-value stores. Translation services are called asynchronously. This enables future changes toward calling multiple upstream services concurrently and using the first one to answer. Currently, only one service is called at a time. Failover is implemented by moving a failing service to the end of the queue of upstreams that requests are run against.

Upstream credentials are passed as environment variables. Currently, there are two variables processed:

  • GOOGLE_API_KEY: If specified, enables the Google Cloud Translation backend with the given key.
  • ENABLE_MOCK: If specified, enables the mock backend. This is used for testing upstream failure handling.

Testing Strategy

  • The cache package is fully unit tested. It plays a part in every request and is critical to reducing upstream load.
  • handler.go has full coverage as well. It handles every request, making it a critical piece of code.
  • sanity_test.go contains tests that run gofmt and go vet. Code is read more often than it is written, so enforcing consistent formatting and catching suspicious constructs early was an easy choice.

About The Author

My name is Leonhard Kuboschek. This code uses libraries written by many people; thank you. All code in this repository was written solely by me. I currently work full-time at a startup (50 hours per week), and we have a launch coming up, so I have worked on this only after hours and on weekends. Additionally, this is one of my earlier projects using Go.

HTTP Interface

Authentication

This service authenticates every request by checking for a valid JSON Web Token (JWT). There is currently no further authorization strategy: every authenticated request is allowed.

JWTs are checked with the following parameters:

  • Algorithm: HS256
  • Secret Key: Set using environment variable SECRET_KEY

At present, no claims are required on the tokens.

Request Format

The server accepts HTTP POST requests on port 8080 at the path /.

The headers required are:

  • Content-Language specifying the language that the request content is assumed to be in.
  • Accept-Language specifying the target language.
  • Authorization containing the string Bearer, followed by a space and a JSON Web Token (see the Authentication section)

The text to be translated is sent in the request body, with the content type text/plain.

Response Format

The response contains the Content-Language header set to the target language, and the translated text in the response body. Errors are signaled with standard HTTP status codes.

Sample Deployment

There is a sample deployment running at translate dot leo dot codes, which authenticates requests by JSON Web Token. If you would like to receive testing credentials, please send me an email.

Sample request using cURL

curl -X POST \
'http://localhost:8080/' \
-H 'accept-language: en' \
-H 'content-language: de' \
-H 'content-type: text/plain' \
-H 'authorization: Bearer <token>' \
-d 'Also wirklich!'

Docker Image

`docker run -p 8080:8080 -e GOOGLE_API_KEY=$GOOGLE_API_KEY -e SECRET_KEY=yoursecretkey kuboschek/translate-server`
