Option Analysis: Current Language Translators & Video Conferencing Software
kristiperreault edited this page Aug 25, 2019
- Skype Translator: SaaS offering for real-time language translation built into Skype
- Speech-to-speech translation for 10 languages
- Text translation for over 60 languages
- Built on machine learning
- Available on Windows 7 and later and on OS X, across desktop, mobile, and wearable devices
- Pipeline: automatic speech recognition -> speech correction -> Microsoft Translator -> text to speech
- Translates with statistical machine translation (SMT) and neural machine translation (NMT), using models trained on millions of sentences, words, and speech samples
- https://www.skype.com/en/features/skype-translator/
- https://blogs.skype.com/news/2014/12/15/skype-translator-how-it-works/
- https://www.microsoft.com/en-us/translator/business/machine-translation/
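The "speech correction" stage in the Skype pipeline cleans up the raw recognizer output before it is translated. One simple kind of cleanup is dropping filler words and stutters. The sketch below is a minimal illustration under assumptions: the filler list is hypothetical and hand-picked, and Skype's actual correction step is model-based rather than rule-based.

```python
import re

# Hypothetical filler-word list (assumption); a real system would learn
# disfluencies from data rather than hard-code them.
FILLERS = {"um", "uh", "er", "ah", "hmm"}


def clean_transcript(raw: str) -> str:
    """Drop filler words and immediate word repetitions from an ASR transcript.

    Punctuation is discarded because only word tokens are kept.
    """
    words = re.findall(r"[A-Za-z']+", raw)
    cleaned: list[str] = []
    for word in words:
        lower = word.lower()
        if lower in FILLERS:
            continue  # skip disfluencies such as "um" or "uh"
        if cleaned and cleaned[-1].lower() == lower:
            continue  # collapse stutters such as "I I think"
        cleaned.append(word)
    return " ".join(cleaned)


print(clean_transcript("um so I I think uh we should start"))
```

Collapsing every repeated word is deliberately crude: it would also remove legitimate repetitions such as "had had", which is one reason production systems use trained models for this step.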
- Google Assistant's bilingual capability works only when speaking to the Assistant itself, and only in the US
- Google Pixel Buds translate in real time for in-person conversations; they are not a video conferencing app
- AutoML Translation and the Cloud Translation API: train a custom model on phrases in the desired languages, evaluate the model, and repeat
- Google Translate API: https://cloud.google.com/translate/
- How Pixel Buds Work: https://techxplore.com/news/2017-11-google-pixel-buds-earphones-languages.html
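To show what a call to the Google Translate API linked above looks like, here is a minimal sketch that builds (but does not send) a request to the Cloud Translation v2 REST endpoint using only the standard library. The helper names `build_translate_request` and `parse_translations` are hypothetical; the API key is a placeholder, and sending the request requires a valid key and network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

TRANSLATE_V2_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(
    texts: list[str], target: str, api_key: str, source: Optional[str] = None
) -> urllib.request.Request:
    """Build a Cloud Translation v2 POST request without sending it."""
    body = {"q": list(texts), "target": target, "format": "text"}
    if source is not None:
        body["source"] = source  # omit to let the API detect the source language
    url = f"{TRANSLATE_V2_URL}?{urllib.parse.urlencode({'key': api_key})}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_translations(response_body: bytes) -> list[str]:
    """Extract translated strings from a v2 JSON response body."""
    payload = json.loads(response_body)
    return [t["translatedText"] for t in payload["data"]["translations"]]


# Sending would look like:
#   with urllib.request.urlopen(build_translate_request(["Hello"], "es", KEY)) as resp:
#       print(parse_translations(resp.read()))
```

Separating request construction from parsing keeps the translation step easy to swap for another provider in a larger pipeline.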
- IBM Watson Language Translator: supports text translation; document translation is in beta
- Roughly 25 languages supported
- Currently no bilingual video conferencing support
- 2 cents per character after the first 250,000; 10 cents for custom models
- Documentation: https://www.ibm.com/cloud/blog/announcements/document-translation-made-easy-with-watson-language-translator
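The pricing note above can be turned into a quick cost estimate. This is a minimal sketch that takes the rates as written in these notes and assumes the free allowance is 250,000 characters and applies to custom models too; both the rates and that assumption should be confirmed against IBM's current pricing page. Amounts are kept in integer cents to avoid floating-point rounding.

```python
# Figures as recorded in the notes above (assumption: not verified
# against IBM's live pricing).
FREE_CHARS = 250_000
STANDARD_CENTS_PER_CHAR = 2
CUSTOM_CENTS_PER_CHAR = 10


def estimate_cost_cents(chars: int, custom_model: bool = False) -> int:
    """Estimated charge in cents for translating `chars` characters."""
    billable = max(0, chars - FREE_CHARS)  # nothing billed inside the free allowance
    rate = CUSTOM_CENTS_PER_CHAR if custom_model else STANDARD_CENTS_PER_CHAR
    return billable * rate


print(estimate_cost_cents(250_010))  # 10 billable characters at the standard rate
```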
- Amazon Translate works with unstructured text; roughly 20 languages supported (https://docs.aws.amazon.com/translate/latest/dg/how-it-works.html)
- Also uses neural networks: an encoder reads the source text one word at a time and builds a semantic representation, and a decoder uses that representation to generate the target text one word at a time
- Amazon Comprehend performs automatic language detection using neural networks. It recognizes key phrases, words, language, sentiment, and syntax. It uses deep learning, supports asynchronous and synchronous processing, integrates with other AWS services, and supports customization and clustering (https://docs.aws.amazon.com/comprehend/latest/dg/what-is.html)
- Amazon Polly converts text to "life-like" speech: multiple voice options, low latency, pay only for what you convert, logging available. Only available in 3 regions, with throttle limits (https://docs.aws.amazon.com/polly/latest/dg/what-is.html)
- Pricing is simple and "pay for what you need", but combining multiple services can add up quickly
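The three AWS services above can be chained into a text-in, speech-out step: Comprehend detects the language, Translate translates, and Polly speaks the result. Below is a minimal sketch; the function `speak_translation` is hypothetical, and the clients are passed in so the logic can be exercised without AWS credentials. The `"Lucia"` voice mentioned in the usage comment is only an example.

```python
from typing import Any


def speak_translation(
    text: str, target_lang: str, voice_id: str,
    comprehend: Any, translate: Any, polly: Any,
) -> bytes:
    """Detect language, translate, and synthesize speech; returns MP3 bytes."""
    # 1. Amazon Comprehend: pick the highest-scoring detected language.
    langs = comprehend.detect_dominant_language(Text=text)["Languages"]
    source_lang = max(langs, key=lambda lang: lang["Score"])["LanguageCode"]

    # 2. Amazon Translate: translate from the detected language.
    translated = translate.translate_text(
        Text=text, SourceLanguageCode=source_lang, TargetLanguageCode=target_lang
    )["TranslatedText"]

    # 3. Amazon Polly: convert the translated text to speech.
    speech = polly.synthesize_speech(
        Text=translated, OutputFormat="mp3", VoiceId=voice_id
    )
    return speech["AudioStream"].read()


# With real clients (requires boto3 and AWS credentials):
#   import boto3
#   audio = speak_translation("Good morning", "es", "Lucia",
#                             boto3.client("comprehend"),
#                             boto3.client("translate"),
#                             boto3.client("polly"))
```

Amazon Translate also accepts `SourceLanguageCode="auto"`, which performs detection through Comprehend internally; calling Comprehend explicitly, as here, makes the detected language available to the rest of the application. Each call is billed separately, which is how multiple services add up.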
- Most popular video conferencing applications in industry (https://www.turbinehq.com/blog/5-video-conferencing-apps, https://zapier.com/blog/best-video-conferencing-apps/)
- Currently, none of these platforms support bilingual video conferencing
- Basic flow of real-time bilingual video conferencing: input conditioning, language identification, automatic speech recognition (speech to text), text cleanup, natural language processing, text to speech
- Basic system for a bilingual video conferencing application: frontend display -> video API/Lambda -> speech to text (ASR) -> translation API (neural network) -> text to speech -> output to service -> output to user
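The flow above is a chain of stages, each consuming the previous stage's output. A minimal sketch of that structure follows; the `Utterance` type and all `stub_*` stages are hypothetical placeholders (the "translation" just tags the text), standing in for calls to real services such as the ASR, translation, and text-to-speech APIs listed on this page.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Utterance:
    """One chunk of speech moving through the pipeline."""
    audio: bytes
    source_lang: Optional[str] = None
    text: Optional[str] = None
    translated: Optional[str] = None
    speech: Optional[bytes] = None


Stage = Callable[[Utterance], Utterance]


def run_pipeline(utt: Utterance, stages: list[Stage]) -> Utterance:
    """Apply each stage in order, passing the result along."""
    for stage in stages:
        utt = stage(utt)
    return utt


# Hypothetical stand-in stages; each would wrap a real service.
def stub_identify_language(u: Utterance) -> Utterance:
    return replace(u, source_lang="en")

def stub_speech_to_text(u: Utterance) -> Utterance:
    return replace(u, text=u.audio.decode("utf-8"))  # pretend audio is text

def stub_cleanup(u: Utterance) -> Utterance:
    return replace(u, text=" ".join(u.text.split()))  # normalize whitespace

def stub_translate(u: Utterance) -> Utterance:
    return replace(u, translated=f"<es>{u.text}")  # placeholder translation

def stub_text_to_speech(u: Utterance) -> Utterance:
    return replace(u, speech=u.translated.encode("utf-8"))


result = run_pipeline(
    Utterance(audio=b"hello   everyone"),
    [stub_identify_language, stub_speech_to_text, stub_cleanup,
     stub_translate, stub_text_to_speech],
)
print(result.speech)
```

Keeping each stage a plain function makes it straightforward to swap providers (for example, Google for translation and Amazon Polly for speech) without touching the rest of the chain. For real-time use, short audio chunks would be streamed through the stages one after another to keep latency low.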