pashtodao

Motivation

Around 60 million people worldwide speak Pashto. Despite this large number of speakers, it is still considered a low-resource language because so little digital content is available online.

Goal

Our goal is to build a community that advances the adoption of the Pashto language in digital products (automatic speech recognition, transcription, digital dictionaries, grammar correction, text-to-speech systems, etc.).

Process

We aim to create open-source (publicly available) projects with the help of volunteers. We will set a timeline with specific goals to achieve in each quarter of the year.

Challenges

We must tackle several challenges to lift Pashto from a low-resource language to a web-rich one. One of the biggest challenges for content creation in Pashto is typing grammatically correct sentences using the available keyboards.

Our first goal is to build an automatic speech recognition (ASR) system that transcribes spoken Pashto into written Pashto. Building such a system requires training data in the Pashto language. This kind of training data is usually collected through another open-source project, Mozilla Common Voice. Unfortunately, Pashto is one of the few languages with no data in the Common Voice project.

Our top challenges, in order of priority, are as follows:

  1. Complete the translation of the Common Voice portal into Pashto
  2. Create sentences in the Pashto language for Common Voice
  3. Collect recorded utterances for the sentences gathered for Common Voice
  4. Train/fine-tune the ASR AI model
  5. Devise a unified approach to creating a Pashto language corpus
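
As a rough illustration of item 2, candidate sentences could be pre-screened before they are submitted. The sketch below is an assumption for illustration only: it is not part of this project or of Common Voice's tooling. The function name `is_candidate_sentence`, the 14-word limit, and the allowed character set are all hypothetical and should be replaced by whatever the current Common Voice sentence guidelines require.

```python
import unicodedata

# Hypothetical word limit for illustration; check the current
# Common Voice sentence guidelines for the real requirement.
MAX_WORDS = 14

# Assumption: besides Arabic-script characters, only these ASCII
# punctuation marks are accepted.
EXTRA_PUNCT = set(".!")

# Zero-width non-joiner, sometimes used in Arabic-script text.
ZWNJ = "\u200c"


def is_candidate_sentence(text: str) -> bool:
    """Return True if `text` passes a simple, illustrative pre-screen."""
    text = unicodedata.normalize("NFC", text).strip()
    if not text:
        return False
    if len(text.split()) > MAX_WORDS:
        return False
    for ch in text:
        if ch.isspace() or ch in EXTRA_PUNCT or ch == ZWNJ:
            continue
        # Reject digits of any script (ASCII, Arabic-Indic, ...),
        # on the assumption that numbers must be spelled out.
        if ch.isdigit():
            return False
        # Accept only the Arabic Unicode block (U+0600 to U+06FF),
        # which contains the Pashto letters and Arabic punctuation.
        if not ("\u0600" <= ch <= "\u06FF"):
            return False
    return True
```

A check like this only filters out obviously unsuitable input (Latin text, digits, overly long sentences); it does not verify grammar or spelling, which would still need review by Pashto speakers.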

Contribution

Coming soon...
