I've enjoyed and used the great translation service from DeepL a lot recently but could not find a simple yet deeply integrated mobile app for it. Hence this project was born:
- provide a straightforward mobile client to the DeepL translation service
- use multiple text input methods of recent iOS SDKs (clipboard, OCR via camera, voice via microphone)
- serve translated text via various outputs (clipboard, text sharing, voice)
You need Xcode 11 and iOS 13 to build and run this app. Furthermore, a paid DeepL Pro account with API access is needed for the translation - it's worth it, but I earn no money from that promotion!
For quite some time, Apple has been extending Foundation and UIKit with powerful siblings - like Vision for higher-level image processing, Speech for voice recognition, and the fancy ML and AR frameworks. With TransL8 I want to explore these new APIs alongside a deep system integration:
- system extensions
- OCR via VisionKit
- Speech Recognition
- Diffable Data Source
- Context Menus
- using SF Symbols - me likes IconFonts anyway
- Property Wrapper
- Drag & Drop support on iPads
- Catalyst = open
- x-url-mechanics = open
- iPad and iOS support
The UI is basically driven by functional needs (and I am no designer anyway), but let me explain some UI and UX considerations:
- text is the main content of this app, hence there are two large text views alongside smaller action items
- although comparing translations is sometimes valuable, I do not consider it the main use case (especially not on mobile); hence the two languages of a text (source and destination) overlap, featuring a "layered" approach in the UI
- this layered UI gives a chance to focus/defocus 1st level and 2nd level action icons as well
- sadly the global "translate action button" moved into the lower-left corner
- accessibility should work but is untested (same goes for RTL)
- dark mode support (due to system components only atm)
General client server observations
Thanks to Moya most REST APIs are simple to use nowadays. The only stumbling blocks for a mobile client are authentication and pricing for the service. So let's discuss pricing first:
- this project is mostly for fun, but putting it into the store with my API key hardcoded would put my pockets under pressure. Having no idea about the long-run success of the app, an upfront price would not work either. So as a first step I "hand over the costs" to the user by letting them create a DeepL Pro account and use their API key. Maybe I will experiment with consumables or a subscription in the future, but that distracts me too much from the fun part...
- sadly DeepL has no OAuth, so the UX is as ugly as manually copying the API key into the app from the website (sorry). There is an onboarding feature in TransL8 to log in and help grab the API key, but I wish it were not needed.
OCR via VisionKit
With VisionKit you can take photos and OCR the text inside for free - no need to integrate Tesseract or other proprietary vendors' SDKs anymore.
- VisionKit provides a powerful default VNDocumentCameraViewController which you create, assign yourself to as delegate, and present modally to the user
- the user can take pictures and select (skewed) rectangular parts to be scanned; it even supports multi-page documents
- providing a custom user flow or different OCR mechanics is possible but obviously takes way more effort to implement
- sadly, neither the UI of this powerful component nor its behaviour can really be customized (and I'm missing the loupe, as I miss it everywhere in iOS 13)
This feature is based on Apple's sample code.
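The scan-then-recognize flow can be sketched roughly like this (a minimal sketch assuming a plain UIViewController host and a camera usage description in Info.plist; error handling omitted):

```swift
import UIKit
import Vision
import VisionKit

final class ScanViewController: UIViewController, VNDocumentCameraViewControllerDelegate {

    // Present the system document scanner
    func startScanning() {
        let scanner = VNDocumentCameraViewController()
        scanner.delegate = self
        present(scanner, animated: true)
    }

    // Called once the user finishes scanning one or more pages
    func documentCameraViewController(_ controller: VNDocumentCameraViewController,
                                      didFinishWith scan: VNDocumentCameraScan) {
        controller.dismiss(animated: true)
        for pageIndex in 0..<scan.pageCount {
            recognizeText(in: scan.imageOfPage(at: pageIndex))
        }
    }

    // OCR a single page image via the Vision framework
    private func recognizeText(in image: UIImage) {
        guard let cgImage = image.cgImage else { return }
        let request = VNRecognizeTextRequest { request, _ in
            let lines = (request.results as? [VNRecognizedTextObservation])?
                .compactMap { $0.topCandidates(1).first?.string } ?? []
            print(lines.joined(separator: "\n"))
        }
        request.recognitionLevel = .accurate
        try? VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    }
}
```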
Speech Recognition
- SFSpeechRecognizer is the top-level manager to handle authorization
- SFSpeechRecognitionTasks have to be connected for the voice data to be transcribed
- AVAudioPCMBuffer is probably the least discoverable part of that API flow for me ("happy sample code")
- thanks to SF Symbols implementing a basic voice recording screen was easy but I need to spend more time on the UI and understand the lower level APIs
With my linguistic university background I was both enthusiastic about and scared of the amount of work needed to fulfill realtime speech recognition - but technology has come a long way and this feature (even if you enable on-device-only) is impressive: real time speech-to-text for free!
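The pieces mentioned above fit together roughly like this (a sketch assuming microphone and speech permissions were already granted via SFSpeechRecognizer.requestAuthorization; the locale and buffer size are arbitrary choices):

```swift
import AVFoundation
import Speech

// Sketch: stream microphone audio into a live transcription.
final class SpeechTranscriber {
    private let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))
    private let audioEngine = AVAudioEngine()
    private var task: SFSpeechRecognitionTask?

    func start(onText: @escaping (String) -> Void) throws {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.requiresOnDeviceRecognition = true  // iOS 13: keep audio local if supported

        // AVAudioPCMBuffer is what connects the audio engine to the request
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: format) { (buffer: AVAudioPCMBuffer, _) in
            request.append(buffer)
        }
        audioEngine.prepare()
        try audioEngine.start()

        // Partial results arrive continuously until the task ends
        task = recognizer?.recognitionTask(with: request) { result, _ in
            if let result = result {
                onText(result.bestTranscription.formattedString)
            }
        }
    }

    func stop() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        task?.cancel()
    }
}
```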
Diffable Data Source
As the poster child of the delegation pattern, UITableViewDataSource is known to all iOS developers. With iOS 13, UIKit provides a fresh approach thanks to a standard implementation called UITableViewDiffableDataSource. Of special note, and solving a major failure point with custom data sources, multiple updates and deletes on the data are handled automatically:
- a UITableViewDiffableDataSource is fed by NSDiffableDataSourceSnapshots, each containing different data collections at different states of your app
- as these NSDiffableDataSourceSnapshots leverage the Hashable protocol, they can identify each entry and calculate the transition from a previous state to another state -> app developers can mostly concentrate on the data (collection) only
- surprisingly, enabling the user to delete entries requires UITableViewDiffableDataSource to be subclassed (as commit editingStyle still needs to be present at runtime); I was hoping this venerable behaviour would be replaced by a modern (callback) mechanism
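A minimal sketch of that setup, including the subclass needed for swipe-to-delete (the Translation model and cell identifier are made up for illustration):

```swift
import UIKit

// Hashable lets the snapshot diff identify each entry
struct Translation: Hashable {
    let source: String
    let target: String
}

enum Section { case history }

// Swipe-to-delete still requires subclassing and the legacy data source methods
final class EditableDataSource: UITableViewDiffableDataSource<Section, Translation> {
    override func tableView(_ tableView: UITableView, canEditRowAt indexPath: IndexPath) -> Bool {
        true
    }
    override func tableView(_ tableView: UITableView,
                            commit editingStyle: UITableViewCell.EditingStyle,
                            forRowAt indexPath: IndexPath) {
        if editingStyle == .delete, let item = itemIdentifier(for: indexPath) {
            var snapshot = self.snapshot()
            snapshot.deleteItems([item])
            apply(snapshot, animatingDifferences: true)
        }
    }
}

final class HistoryViewController: UITableViewController {
    private var dataSource: EditableDataSource!

    override func viewDidLoad() {
        super.viewDidLoad()
        tableView.register(UITableViewCell.self, forCellReuseIdentifier: "cell")
        dataSource = EditableDataSource(tableView: tableView) { tableView, indexPath, item in
            let cell = tableView.dequeueReusableCell(withIdentifier: "cell", for: indexPath)
            cell.textLabel?.text = item.target
            return cell
        }
    }

    // Apply a snapshot; the diff (inserts, moves, deletes) is animated for free
    func show(_ translations: [Translation]) {
        var snapshot = NSDiffableDataSourceSnapshot<Section, Translation>()
        snapshot.appendSections([.history])
        snapshot.appendItems(translations)
        dataSource.apply(snapshot, animatingDifferences: true)
    }
}
```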
Context Menus
As iOS 13 provides more context menus (and favours them over gestures - I'll miss the wobbling app icons), I wanted to explore the driving API UIContextMenuInteraction. The user can select the target language via a context menu on the translation button:
- the delegate provides a UIContextMenuConfiguration which contains the list of menu entries
- each entry provides a text-icon-state pair alongside a callback
- menus can be nested, provide a preview and offer some fine tuning with some more delegates
- there is no HIG guidance on indicating the source of a context menu (it shares the same problem as most sophisticated gestures), so I have outlined the translation button, as this mimics/supports the animation once a context menu opens
Overall this is a clean and powerful API, it's simple to start with and works well for a lot of cases.
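A sketch of the language picker along these lines (the language list and SF Symbol name are placeholders, not the app's actual values):

```swift
import UIKit

final class TranslateButtonController: NSObject, UIContextMenuInteractionDelegate {
    let languages = ["DE", "EN", "FR"]
    var selected = "EN"

    // Interactions attach to views like gesture recognizers
    func attach(to button: UIButton) {
        button.addInteraction(UIContextMenuInteraction(delegate: self))
    }

    // The delegate returns a configuration describing the menu entries
    func contextMenuInteraction(_ interaction: UIContextMenuInteraction,
                                configurationForMenuAtLocation location: CGPoint) -> UIContextMenuConfiguration? {
        UIContextMenuConfiguration(identifier: nil, previewProvider: nil) { _ in
            // Each entry is a text-icon-state triple with a callback
            let actions = self.languages.map { lang in
                UIAction(title: lang,
                         image: UIImage(systemName: "globe"),
                         state: lang == self.selected ? .on : .off) { _ in
                    self.selected = lang
                }
            }
            return UIMenu(title: "Target language", children: actions)
        }
    }
}
```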
Property Wrapper
Being a big fan of property getters and setters for blackboxing the internals of property handling, I was intrigued by @propertyWrapper. After a nice introduction around the mechanics and syntax, it helped to make the preference handling way more readable, compact and generic at the same time (Codable to the rescue). As a result, TransL8 now has a property wrapper around UserDefaults and the keychain by means of KeychainSwift - even with default values.
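The UserDefaults side can be sketched like this (the wrapper and key names are illustrative, not the app's actual code):

```swift
import Foundation

// A generic wrapper persisting any Codable value to UserDefaults, with a default.
// Values are wrapped in an array so top-level fragments (e.g. a bare String)
// encode safely on every runtime.
@propertyWrapper
struct UserDefault<Value: Codable> {
    let key: String
    let defaultValue: Value
    var storage: UserDefaults = .standard

    var wrappedValue: Value {
        get {
            guard let data = storage.data(forKey: key),
                  let value = try? JSONDecoder().decode([Value].self, from: data).first
            else { return defaultValue }
            return value
        }
        set {
            storage.set(try? JSONEncoder().encode([newValue]), forKey: key)
        }
    }
}

// Usage: preferences become one-liners with defaults built in
struct Preferences {
    @UserDefault(key: "targetLang", defaultValue: "EN")
    static var targetLang: String
}
```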
Drag and Drop
On iPad, dragging and dropping text from other apps now switches to the input text view being topmost. The UIDropInteractionDelegate is easy to use for this one - but apart from the smaller screen, it's beyond my knowledge why it is not supported on iPhone.
Intrigued by this "one switch"? Me too, so I've enabled it - but the UX is very alien on macOS, as expected. Long journey ahead...
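The iPad text drop can be wired up roughly like this (a sketch; the text view setup is simplified):

```swift
import UIKit

final class InputTextViewController: UIViewController, UIDropInteractionDelegate {
    let inputTextView = UITextView()

    override func viewDidLoad() {
        super.viewDidLoad()
        view.addSubview(inputTextView)
        // The interaction is attached to the view like a gesture recognizer
        inputTextView.addInteraction(UIDropInteraction(delegate: self))
    }

    // Accept plain-text drags only
    func dropInteraction(_ interaction: UIDropInteraction, canHandle session: UIDropSession) -> Bool {
        session.canLoadObjects(ofClass: NSString.self)
    }

    func dropInteraction(_ interaction: UIDropInteraction, sessionDidUpdate session: UIDropSession) -> UIDropProposal {
        UIDropProposal(operation: .copy)
    }

    // Drop the dragged string into the source text view
    func dropInteraction(_ interaction: UIDropInteraction, performDrop session: UIDropSession) {
        session.loadObjects(ofClass: NSString.self) { items in
            if let text = items.first as? String {
                self.inputTextView.text = text
            }
        }
    }
}
```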
Integrating a feature into other apps' life cycles and user flows has been possible via system extensions for quite some time now. Sadly they are sometimes limited beyond the usual app sandbox, but I wanted to dig deeper: how far can these extensions go, has the API evolved over the years, and ultimately what value do they bring to the user:
- Keyboard extension = nope
- Action extension = done, simplified translation interface
- Share extension = done, direct and fast flow
- Today Extension/Widget = nope
- Document Provider Extension = done, leveraging
- Siri intents (open)
Keyboard Extension
Providing a novel text input in the context of a translator seems natural at first, but does not easily apply to a "translation service" with its source -> translate -> destination flow. But what about a simple "Translate" button for converting the "current text"?
Sadly the available API for keyboard extensions is severely limited and broken (as of iOS 13.2, look into the feature/keyboard-extension branch): with the UITextDocumentProxy you can access the "current text" in two ways:
- by means of the before/after strings around the cursor, but that is limited to the current paragraph only. Furthermore, changing the text with that approach leads to multiple weird calls to let UIKit update the text view in between.
- by means of the selected text only. That is probably the right user interaction and simplifies the action calls, but has a severe flaw: once the selection spans three or more paragraphs, the inner paragraphs are dropped from the selection text and cannot be retrieved - the selected text contains the start cursor part from the first paragraph and the end cursor part from the last paragraph (most likely a bug).
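For reference, the two access paths look like this in a keyboard extension (a sketch against the UITextDocumentProxy API; the replace helper is illustrative):

```swift
import UIKit

final class TranslateKeyboardViewController: UIInputViewController {

    // Path 1: stitch together the text around the cursor - paragraph-bound
    var currentParagraph: String {
        let before = textDocumentProxy.documentContextBeforeInput ?? ""
        let after = textDocumentProxy.documentContextAfterInput ?? ""
        return before + after
    }

    // Path 2: the selection - loses inner paragraphs once the selection
    // spans three or more paragraphs (as of iOS 13.2)
    var selection: String? {
        textDocumentProxy.selectedText
    }

    // Naive replacement: delete back to the paragraph start, then insert
    func replaceCurrentParagraph(with translation: String) {
        let before = textDocumentProxy.documentContextBeforeInput ?? ""
        for _ in before {
            textDocumentProxy.deleteBackward()  // one Character per call
        }
        textDocumentProxy.insertText(translation)
    }
}
```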
Sad but true, no Keyboard extension...
Action Extension
Per definition, Actions should transform/convert the given content - which makes perfect sense for a translator. So a meaningful user flow is to translate a given text and serve the result back, which is what the TransL8 Action extension does.
Surprisingly, sending back the result is ignored by most originating apps, although the iOS SDK has a callback for this (via UIActivityViewController). As a first tweak, TransL8 copies the translation to the system clipboard automatically; second, it extends its internal clipboard feature to a history of translations.
This extension was fun and is probably the most meaningful extension for a translator.
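Handing the result back looks roughly like this (a sketch; many host apps ignore the returned items, hence the clipboard fallback):

```swift
import UIKit
import MobileCoreServices

// Return the translation to the host app and keep a clipboard fallback
func finish(with translation: String, in context: NSExtensionContext?) {
    UIPasteboard.general.string = translation  // fallback: always available

    // Wrap the result so hosts that do listen can pick it up
    let provider = NSItemProvider(item: translation as NSString,
                                  typeIdentifier: kUTTypePlainText as String)
    let item = NSExtensionItem()
    item.attachments = [provider]
    context?.completeRequest(returningItems: [item], completionHandler: nil)
}
```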
Share Extension
Although sharing is considered more of a one-way flow to send content to other services - and hence the flow and intention are very different compared to an Action extension - both extensions share the same API! So one could reuse exactly the same UI and UX, but that would be a poor design choice.
Given the default SLComposeServiceViewController design and sharing being considered a fast, one-directional path, TransL8 uses the Share extension as a means to be the fastest translation roundtrip: it first translates the input text, second it stores this pair to its internal history, and lastly copies it to the clipboard.
It took quite some consideration to streamline the flow this far, but I think it's worth it.
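That roundtrip can be sketched like this (translate and TranslationHistory are hypothetical stand-ins for the DeepL call and the history store):

```swift
import UIKit
import Social

final class ShareViewController: SLComposeServiceViewController {

    // Post = translate, remember, copy - then dismiss as fast as possible
    override func didSelectPost() {
        let text = contentText ?? ""
        translate(text) { translated in
            TranslationHistory.append(source: text, target: translated)  // hypothetical store
            UIPasteboard.general.string = translated                     // fastest roundtrip
            self.extensionContext?.completeRequest(returningItems: [], completionHandler: nil)
        }
    }

    // Placeholder for the DeepL REST call (e.g. via Moya)
    private func translate(_ text: String, completion: @escaping (String) -> Void) {
        completion(text)  // echo; the real call goes to the DeepL API
    }
}

enum TranslationHistory {
    static func append(source: String, target: String) { /* persist the pair */ }
}
```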
Today Extension/Widget
Given the API limitations I can hardly imagine any useful feature set for a TransL8 widget. It could serve as a fast entry point into the app (originally forbidden but commonly used nowadays), offer single-line translations (not sure whether it can accept keyboard input) or access to the history - not convincing. Then I thought about camera or speech translation, but even Shazam deep links into the main app to do so - API limitations fight back hard! I guess I'll drop it for now...
File Provider Extension
There could be some value in offering translations as texts to other apps - up to 100 translations are stored in the app history already, so why not open them up for easier text import. This extension creates ad-hoc files if the UIDocumentBrowser requests them. At the moment, any changes to these files are ignored. The content is provided as formatted text with the source (language and text) first and the destination (language and text) last.
Implementation is straightforward once you understand the connected parts. The simplicity mostly derives from using standard FileProvider components, a non-network-based approach, supporting simple read operations only - and good sample code!
As a developer I could go further:
- adding context menu actions to the given file icons
- a custom Document Picker View Controller, which in the context of TransL8 could very much reuse the given history view controller
- putting these text files into iCloud via
After exporting the translations, there should be an import into TransL8 as well: the UIDocumentPickerViewController is dead simple to use and imports the selected file by means of a delegate call, which then feeds the data back into the source text view. This currently works on plain text files only...
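The import side can be sketched like this (the notification hook is a made-up stand-in for feeding the source text view):

```swift
import UIKit

final class ImportController: NSObject, UIDocumentPickerDelegate {

    // Plain text files only, matching the app's current behaviour
    func presentPicker(from presenter: UIViewController) {
        let picker = UIDocumentPickerViewController(documentTypes: ["public.plain-text"], in: .import)
        picker.delegate = self
        presenter.present(picker, animated: true)
    }

    // Delegate call delivers the picked file URL(s)
    func documentPicker(_ controller: UIDocumentPickerViewController, didPickDocumentsAt urls: [URL]) {
        guard let url = urls.first,
              let text = try? String(contentsOf: url, encoding: .utf8) else { return }
        // Hand the imported text to the source text view (hypothetical hook)
        NotificationCenter.default.post(name: Notification.Name("importText"), object: text)
    }
}
```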
TransL8 is available under the MIT license.