
A data engineering pipeline that allows recording millions of Amharic and Swahili speakers reading digital texts on app and web platforms.


luelhagos/STT-data-collection


STT-data-collection

The purpose of this challenge is to build a data engineering pipeline that allows recording millions of Amharic speakers reading digital texts on app and web platforms.

Table of contents

  • Introduction
  • Installation
  • Folders
  • Technologies
  • Contributors

Introduction

There are many text corpora for Amharic and Swahili. Our client, 10 Academy, wants to gather a vast amount of high-quality audio data from different applications by displaying text from these corpora and recording users as they read the displayed text. The project also aims to build a robust, large-scale, fault-tolerant, highly available Kafka cluster that can be used to post a sentence and receive an audio file.
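To make the "post a sentence, receive an audio file" flow concrete, here is a minimal Python sketch of the messages that could travel through Kafka. It is an illustration under stated assumptions, not this repository's implementation: the topic names, message fields, and helpers (`split_sentences`, `to_messages`, `to_record`, `decode_audio`) are all hypothetical. The sketch splits corpus text on the Amharic full stop `።` and question mark `፧` as well as the Latin marks `. ? !` used in Swahili, wraps each sentence in a JSON payload keyed by a generated `sentence_id`, and decodes an audio reply that refers back to the same id. No Kafka client is used; `to_record` only builds the `(topic, key, value)` triple a producer would be given.

```python
import json
import re
import uuid
from dataclasses import asdict, dataclass

# Hypothetical topic names; the repository does not document its real ones.
SENTENCE_TOPIC = "text-corpus"
AUDIO_TOPIC = "audio-recordings"

# Amharic ends sentences with "።" (U+1362) and questions with "፧" (U+1367);
# Swahili text uses the Latin marks ". ? !".
_SENTENCE = re.compile(r"[^።፧.?!]+[።፧.?!]*")


@dataclass
class SentenceMessage:
    sentence_id: str
    language: str  # hypothetical code, e.g. "am" (Amharic) or "sw" (Swahili)
    text: str


@dataclass
class AudioMessage:
    sentence_id: str  # links the recording back to the displayed sentence
    audio_path: str   # hypothetical: where the uploaded audio file was stored


def split_sentences(text: str) -> list[str]:
    """Split raw corpus text into sentences, keeping each end mark."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def to_messages(text: str, language: str) -> list[SentenceMessage]:
    """Wrap each sentence in a message with a freshly generated unique id."""
    return [SentenceMessage(str(uuid.uuid4()), language, s)
            for s in split_sentences(text)]


def encode(message) -> bytes:
    """Serialize a SentenceMessage or AudioMessage dataclass to UTF-8 JSON."""
    # ensure_ascii=False keeps Ethiopic script readable in the payload.
    return json.dumps(asdict(message), ensure_ascii=False).encode("utf-8")


def to_record(message: SentenceMessage) -> tuple[str, bytes, bytes]:
    """(topic, key, value) a producer could send. With Kafka's default
    partitioner, records sharing a key land on the same partition
    (as long as the topic's partition count does not change)."""
    return SENTENCE_TOPIC, message.sentence_id.encode("utf-8"), encode(message)


def decode_audio(payload: bytes) -> AudioMessage:
    """Parse a reply that a consumer read from AUDIO_TOPIC."""
    return AudioMessage(**json.loads(payload.decode("utf-8")))
```

For example, `to_messages("ሰላም ነው። እንዴት ነህ፧", "am")` yields two messages, one per sentence. Keying records by `sentence_id` is one possible design choice; it lets the app or web client match each returned recording to the sentence it displayed.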

Installation

  • Kafka installation
  • Airflow installation
  • Spark installation
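The introduction calls for a fault-tolerant, highly available Kafka cluster. Beyond installation, much of that depends on replication settings. The sketch below assumes a cluster of at least three brokers (the README does not say how many it uses) and shows one commonly used combination: `acks=all` and `enable.idempotence` on the producer, and a replication factor of 3 with `min.insync.replicas=2` on the topic. The configuration keys are real Kafka settings, but the values are assumptions, and the helper `writes_survive_failures` is hypothetical; it only does the arithmetic of how many replica brokers can be lost while `acks=all` writes keep being accepted.

```python
# Real Kafka configuration keys; the values are an assumed example, not
# settings taken from this repository.
PRODUCER_CONFIG = {
    "acks": "all",               # leader waits for all in-sync replicas
    "enable.idempotence": True,  # producer retries do not duplicate records
}
TOPIC_CONFIG = {
    "min.insync.replicas": 2,    # replicas that must hold an acks=all write
}
REPLICATION_FACTOR = 3           # chosen when the topic is created


def writes_survive_failures(replication_factor: int,
                            min_insync_replicas: int,
                            acks: str) -> int:
    """Hypothetical helper: how many brokers hosting a partition can be
    down while acks=all writes to that partition are still accepted."""
    if acks != "all":
        raise ValueError("this helper only models acks=all producers")
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return replication_factor - min_insync_replicas
```

With these values, one replica broker can be down without blocking writes, and every acknowledged record is stored on at least two brokers. Raising `min.insync.replicas` to 3 strengthens durability, but then a single broker outage stops `acks=all` writes to the affected partitions.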

Folders

  • data :
  • notebooks :
  • scripts :
  • tests :

Technologies

Contributors

Languages

  • Jupyter Notebook 84.5%
  • Python 9.4%
  • JavaScript 3.8%
  • CSS 1.4%
  • HTML 0.9%