A streaming version of Unbabel's BEC using GCP Pub/Sub and Apache Beam.

pedrodeoliveira/unbabel-bec

Backend Engineering Challenge 2.0

This project contains a solution to Unbabel's Backend Engineering Challenge, reformulated for a more realistic context in which translation events arrive in real time.

Idea

The goal of this project is to solve the original challenge under the assumption that translation events occur in real time. The reformulated challenge also assumes that metrics are needed per client.
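As a rough illustration of what a per-client metric could look like, the sketch below groups events by client and averages a numeric field. The field names `client_name` and `duration`, and the choice of an average as the metric, are assumptions made for this example; the README does not specify the event schema or the metric. It uses plain Python rather than Apache Beam so that it runs without extra dependencies.

```python
from collections import defaultdict


def average_duration_by_client(events):
    """Return {client_name: mean duration} for an iterable of event dicts.

    The keys "client_name" and "duration" are hypothetical field names.
    """
    totals = defaultdict(lambda: [0, 0])  # client -> [sum of durations, count]
    for event in events:
        acc = totals[event["client_name"]]
        acc[0] += event["duration"]
        acc[1] += 1
    return {client: total / count for client, (total, count) in totals.items()}


# Made-up sample events, for illustration only.
events = [
    {"client_name": "client_a", "duration": 20},
    {"client_name": "client_b", "duration": 31},
    {"client_name": "client_a", "duration": 40},
]
result = average_duration_by_client(events)
print(result)
```

In a Beam pipeline the same grouping would typically be expressed as a keyed combine, but that mapping is not shown here.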

Apache Beam was chosen to solve this problem because it supports both batch and streaming processing. The pipeline can also run on GCP's Dataflow runner, which provides a fully managed, scalable execution environment.

In summary, each folder contains:

  • batch: the solution to the batch problem, which processes an input JSON file and writes an output JSON file.
  • streaming: the solution to the streaming problem, which consists of multiple publishers (publisher.py) simulating the clients' translation events (Unbabel's translation API), the streaming pipeline (streaming_pipeline.py) that processes the events, and a subscriber (subscriber.py) that reads the pipeline output and prints it.
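The publisher, pipeline, and subscriber roles in the streaming folder can be sketched with in-memory queues standing in for Pub/Sub topics. This is only a minimal illustration of the data flow, not the code in publisher.py, streaming_pipeline.py, or subscriber.py. The event fields, the running-average metric, and the sentinel used to end the simulated stream are all assumptions.

```python
import queue
import threading
from collections import defaultdict

SENTINEL = None  # marks the end of the simulated stream (not a Pub/Sub concept)


def publisher(client_name, durations, topic):
    """Simulate one client publishing translation events to a topic."""
    for duration in durations:
        topic.put({"client_name": client_name, "duration": duration})


def pipeline(topic, output):
    """Consume events and emit a running average duration per client."""
    totals = defaultdict(lambda: [0, 0])  # client -> [sum, count]
    while True:
        event = topic.get()
        if event is SENTINEL:
            output.put(SENTINEL)
            return
        acc = totals[event["client_name"]]
        acc[0] += event["duration"]
        acc[1] += 1
        output.put((event["client_name"], acc[0] / acc[1]))


def subscriber(output):
    """Read pipeline results, print them, and return them as a list."""
    received = []
    while True:
        item = output.get()
        if item is SENTINEL:
            return received
        print(item)
        received.append(item)


topic, output = queue.Queue(), queue.Queue()
clients = [("client_a", [20, 40]), ("client_b", [30])]  # made-up data
publishers = [
    threading.Thread(target=publisher, args=(name, durations, topic))
    for name, durations in clients
]
worker = threading.Thread(target=pipeline, args=(topic, output))

worker.start()
for p in publishers:
    p.start()
for p in publishers:
    p.join()
topic.put(SENTINEL)  # all publishers are done, so close the stream
worker.join()
results = subscriber(output)
```

Because the publishers run concurrently, the interleaving of results across clients can vary between runs. Only the order within a single client is fixed.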

Note: Cloud Pub/Sub is used for messaging in the streaming problem.
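Pub/Sub message payloads are byte strings, so events have to be serialized before publishing and deserialized on receipt. A minimal sketch of that round trip, assuming JSON encoding and hypothetical field names (the README does not state which format the publishers use):

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event dict to UTF-8 JSON bytes for a message payload."""
    return json.dumps(event).encode("utf-8")


def decode_event(data: bytes) -> dict:
    """Parse a message payload back into an event dict."""
    return json.loads(data.decode("utf-8"))


# Hypothetical event, for illustration only.
payload = encode_event({"client_name": "client_a", "duration": 20})
event = decode_event(payload)
```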

More details will be provided here in the following days.

To Do List

  • Formulate problem to solve, choose technologies to use and define architecture.
  • Solution to batch pipeline problem (batch_pipeline.py).
  • Solution to streaming pipeline problem (streaming_pipeline.py).
  • Containerize project code (psoliveira/unbabel-batch and psoliveira/unbabel-streaming).
  • Set up CI using GitLab CI (Pipelines).
  • Write README.md.
  • Add Tests.
  • Add test stage to CI pipeline.
  • Add .yaml files (streaming/k8s) for deploying the architecture in a Kubernetes cluster.
  • Finalize README.md.
