
Spoken to Signed

Amit Moryossef edited this page May 8, 2024 · 27 revisions
```mermaid
flowchart TD
    A0[Spoken Language Audio] --> A1(Spoken Language Text)
    A1 --> B[<a href='https://github.com/sign/translate/issues/10'>Language Identification</a>]
    A1 --> C(<a href='https://github.com/sign/translate/tree/master/functions/src/text-normalization'>Normalized Text</a>)
    B --> C
    C & B --> Q(<a href='https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter'>Sentence Splitter</a>)
    Q & B --> D(<a href='https://github.com/sign-language-processing/signbank-plus'>SignWriting</a>)
    C -.-> M(<a href='https://github.com/ZurichNLP/spoken-to-signed-translation' title='We would like to move away from glosses'>Glosses</a>)
    M -.-> E
    D --> E(<a href='https://github.com/sign-language-processing/signwriting-animation'>Pose Sequence</a>)
    D -.-> I(<a href='https://github.com/sign-language-processing/signwriting-illustration'>Illustration</a>)
    N --> H(<a href='https://github.com/sign/translate/issues/68'>3D Avatar</a>)
    N --> G(<a href='https://github.com/sign-language-processing/pose'>Skeleton Viewer</a>)
    N --> F(<a href='https://github.com/sign-language-processing/pose-to-video' title='Help wanted!'>Human GAN</a>)
    H & G & F --> J(Video)
    J --> K(Share Translation)
    D -.-> L(<a href='https://github.com/sign-language-processing/signwriting-description' title='Poor performance. Help wanted!'>Description</a>)
    O --> N(<a href='https://github.com/sign-language-processing/fluent-pose-synthesis' title='Currently skipped. Help Wanted!'>Fluent Pose Sequence</a>)
    E --> O(<a href='https://github.com/sign-language-processing/pose-anonymization'>Pose Appearance Transfer</a>)

linkStyle default stroke:green;
linkStyle 3,5,7 stroke:lightgreen;
linkStyle 10,11,12,15 stroke:red;
linkStyle 6,8,9,14,19,20 stroke:orange;
```
  • Spoken Language Audio → Spoken Language Text:

    • Full support for local speech-to-text (and text-to-speech) in all locally supported languages (no Firefox support)
  • Spoken Language Text → Language Identification:

    • Supports manual user language selection (107 languages)
    • Automatic identification using Google's cld3 supporting 107 languages #10
    • Automatic identification using Google's MediaPipe Solutions #116
    • Uses the initial browser language, and remembers language pair preference (#52)
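The selection order above (manual choice, then automatic detection, then the initial browser language) can be sketched as a pure fallback function; the name and signature are illustrative, not the app's actual code:

```typescript
// Hypothetical helper showing the fallback order for the source language:
// manual user selection → automatic detection (cld3 / MediaPipe) →
// initial browser language (e.g. navigator.language).
function pickSourceLanguage(
  manual: string | null,
  detected: string | null,
  browserLanguage: string
): string {
  // 'en-US' → 'en': keep only the primary language subtag.
  return manual ?? detected ?? browserLanguage.split('-')[0];
}

console.log(pickSourceLanguage(null, 'de', 'en-US')); // 'de'
console.log(pickSourceLanguage(null, null, 'en-US')); // 'en'
```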
  • Spoken Language Text → Normalized Text:

    • LLM Server side multilingual text normalization model
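Since normalization runs server side, the client only needs to build and send a request. A sketch assuming a hypothetical GET endpoint — the real function lives under `functions/src/text-normalization`, and its URL and parameters may differ:

```typescript
// Build the request URL for a hypothetical server-side text-normalization
// endpoint (base URL and parameter names are assumptions for illustration).
function normalizationUrl(base: string, lang: string, text: string): string {
  const url = new URL(base);
  url.searchParams.set('lang', lang);
  url.searchParams.set('text', text);
  return url.toString();
}

const requestUrl = normalizationUrl('https://example.com/normalize', 'en', 'helo wrld');
console.log(requestUrl); // https://example.com/normalize?lang=en&text=helo+wrld
// The client would then fetch(requestUrl) and read the normalized text.
```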
  • Spoken Language Text → SignWriting:

    • Server side multilingual machine translation model (low quality)
    • Client/Server side translation implementation with Bergamot (#46)
    • Serve translation models (#57)
  • SignWriting → Pose Sequence:

    • Server side implementation of sign-stitching (low quality), using OpenPose poses and reliant on spoken language text
    • New server side implementation, animating directly from SignWriting (/HamNoSys) sequences (work by Rotem, #15)
    • Offline client-side inference support for the animation model
  • Pose Sequence →

    • Skeleton Viewer: Barebones viewer using our in-house Pose Viewer (fast, low power, and helpful for debugging)
    • Human GAN: Using a client side machine learning model to skin the pose like a human. Relying on a (heavy) model to generate low-resolution images (256x256), and a (fast) model to upscale the images (768x768). (#25, #58)
    • 3D Avatar: Animates a 3D human-looking avatar using machine learning (#16), including AR support.
    • Additional Features:
      • Pose sequences are transformed into videos, saving device power and memory (#45)
      • Once videos are ready, they support Copy, Download, and Share operations
  • Internationalization:

    • Supports 104 languages, and both LTR and RTL layouts.
    • Uses the user's browser/phone language, and supports selecting a different language via a URL parameter.
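The language-resolution order described above can be sketched as follows; the `lang` parameter name is an assumption for illustration:

```typescript
// Resolve the UI language: an explicit URL parameter overrides the
// browser/phone language (parameter name assumed for illustration).
function resolveUiLanguage(href: string, browserLanguage: string): string {
  const override = new URL(href).searchParams.get('lang');
  // 'en-US' → 'en': keep only the primary language subtag.
  return override ?? browserLanguage.split('-')[0];
}

console.log(resolveUiLanguage('https://sign.mt/?lang=he', 'en-US')); // 'he'
console.log(resolveUiLanguage('https://sign.mt/', 'en-US'));         // 'en'
```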