OCR support in FairScan
OCR is a commonly requested feature for FairScan, at least among advanced users.
OCR can be quite valuable, but it is also a complex feature to implement well, especially on-device, without compromising UX, performance or reliability.
This issue is mainly about doing OCR right, not just adding an OCR checkbox.
This is not a short-term promise.
What FairScan aims to provide (ideal outcome)
The goal is to generate searchable PDFs with an invisible text layer:
- The scanned page remains a raster image.
- OCR text is embedded invisibly on top of the image.
- Users can:
  - select and copy text,
  - search text inside the PDF,
  - see search highlights at the correct locations.
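For reference, this is roughly how an invisible text layer works inside a PDF content stream. The Kotlin sketch below is an illustration only, not FairScan code: `PdfWord` and `invisibleTextLayer` are hypothetical names, and it assumes a font resource named `/F1` is already declared in the page resources. Text rendering mode 3 (`3 Tr`) means glyphs are neither filled nor stroked, so the text is not painted over the image but can still be selected and searched.

```kotlin
import java.util.Locale

/** Hypothetical: one recognized word, already in PDF user space (points, origin bottom-left). */
data class PdfWord(
    val text: String,
    val x: Double,         // left edge of the word, in points
    val baselineY: Double, // baseline position, in points
    val fontSize: Double,  // font size, in points
)

/** Escapes backslashes and parentheses for a PDF literal string "( ... )". */
fun escapePdfLiteral(s: String): String = buildString {
    for (c in s) {
        if (c == '\\' || c == '(' || c == ')') append('\\')
        append(c)
    }
}

/** Formats a number with a dot decimal separator, whatever the device locale. */
private fun fmt(v: Double): String = "%.2f".format(Locale.ROOT, v)

/**
 * Builds content-stream operators for an invisible text layer.
 * "3 Tr" selects rendering mode 3 (invisible); assumes a font named /F1
 * is declared in the page resources (hypothetical setup).
 */
fun invisibleTextLayer(words: List<PdfWord>): String = buildString {
    appendLine("BT")
    appendLine("3 Tr")
    for (w in words) {
        appendLine("/F1 ${fmt(w.fontSize)} Tf")                  // font and size
        appendLine("1 0 0 1 ${fmt(w.x)} ${fmt(w.baselineY)} Tm") // absolute position
        appendLine("(${escapePdfLiteral(w.text)}) Tj")           // show the text
    }
    appendLine("ET")
}
```

A real implementation would also need a font with a `ToUnicode` mapping so that non-ASCII text is copied and searched correctly; a bare literal string only works for simple single-byte encodings.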
In the UX, OCR should feel transparent to the user when scanning a document:
- No dedicated OCR screen.
- OCR is triggered automatically, with no user interaction.
- OCR could be mentioned in the export screen, for example to confirm that detected text was added to the document (e.g. "Text: 42k characters").
OCR will probably be opt-in, since many users don't know what it is and might be confused by longer processing times. It would be enabled in the application settings.
What is explicitly out of scope (at least initially)
To keep UX simple and reliable, the following are out of scope for a first OCR version:
- Real-time OCR during capture.
- Handwritten text recognition.
- Perfect character-level placement.
- Automatic language detection.
Non-goals:
- manual OCR tuning per page
- bounding boxes shown to the user
- editable text layer
Challenges with OCR on Android
Key challenges include:
- OCR engines are very sensitive to image quality.
- Correctly mapping OCR bounding boxes from image space to PDF coordinates.
- Preserving baselines, font sizes and alignment well enough for good text selection.
- Performance on mobile devices.
Poor placement or slow execution may lead to a worse user experience than having no OCR at all.
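To illustrate the coordinate mapping challenge: OCR engines typically report boxes in image pixels with the origin at the top-left and y pointing down, while PDF user space uses points (1/72 inch) with the origin at the bottom-left and y pointing up. The sketch below assumes the page image is drawn to fill the whole page at a known resolution; `PixelBox`, `PdfBox` and `toPdfSpace` are hypothetical names.

```kotlin
/** Hypothetical OCR word box, in image pixels (origin top-left, y down). */
data class PixelBox(val left: Int, val top: Int, val right: Int, val bottom: Int)

/** The same box in PDF user space (points, origin bottom-left, y up). */
data class PdfBox(val x: Double, val baselineY: Double, val width: Double, val height: Double)

/**
 * Maps an OCR box to PDF coordinates, assuming the image fills the page at [dpi]
 * pixels per inch (1 inch = 72 points). The box bottom is used as an approximate
 * baseline; descenders (g, p, y) push it slightly too low.
 */
fun toPdfSpace(box: PixelBox, imageHeightPx: Int, dpi: Double): PdfBox {
    require(dpi > 0) { "dpi must be positive" }
    val scale = 72.0 / dpi                  // points per pixel
    val pageHeightPt = imageHeightPx * scale
    return PdfBox(
        x = box.left * scale,
        baselineY = pageHeightPt - box.bottom * scale, // flip the y axis
        width = (box.right - box.left) * scale,
        height = (box.bottom - box.top) * scale,
    )
}
```

The box height is only a rough starting point for the font size: ascenders and descenders mean it rarely matches the real glyph size, which is where most of the placement work lies.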
Dependency on image post-processing
OCR results will depend on FairScan's image processing, including:
- document detection and cropping,
- contrast / brightness adjustment,
- deskewing,
- sharpening?
- denoising?
- OCR-friendly resolution (300 dpi)
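The 300 dpi target translates directly into pixel dimensions: for an A4 page (210 × 297 mm) it means about 2480 × 3508 pixels, roughly 8.7 megapixels after cropping. A small sketch of the arithmetic (function names are hypothetical):

```kotlin
import kotlin.math.roundToInt

private const val MM_PER_INCH = 25.4

/** Effective resolution of a cropped page: pixels across divided by the width in inches. */
fun effectiveDpi(widthPx: Int, pageWidthMm: Double): Double =
    widthPx / (pageWidthMm / MM_PER_INCH)

/** Pixels needed along one side of [sideMm] millimetres to reach [targetDpi]. */
fun pixelsFor(sideMm: Double, targetDpi: Int = 300): Int =
    (sideMm / MM_PER_INCH * targetDpi).roundToInt()
```

For example, `pixelsFor(210.0)` gives 2480 and `pixelsFor(297.0)` gives 3508, which is why keeping captured images at their original resolution matters.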
As of today, FairScan has no deskewing, sharpening, or denoising.
Automatic contrast and brightness adjustment should be improved to be more reliable (see #80).
Management of captured images should be revamped to keep images at their original resolution; this could be done as part of #70.
Improving post-processing of captured images benefits all users, even without OCR.
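As an illustration of what automatic contrast adjustment can look like, one common technique is a percentile-based linear stretch. This is not necessarily what FairScan does or should do; it's a self-contained sketch on 8-bit grayscale values, and `stretchContrast` is a hypothetical name.

```kotlin
import kotlin.math.roundToInt

/**
 * Percentile-based linear stretch on grayscale pixels (0..255). The darkest [clip]
 * fraction maps to 0 and the brightest [clip] fraction to 255, so a few outliers
 * (shadows, glare) don't dominate the result.
 */
fun stretchContrast(pixels: IntArray, clip: Double = 0.01): IntArray {
    require(clip in 0.0..0.5) { "clip must be in 0.0..0.5" }
    if (pixels.isEmpty()) return pixels.copyOf()
    val histogram = IntArray(256)
    for (p in pixels) histogram[p.coerceIn(0, 255)]++
    val cutoff = (pixels.size * clip).toLong()
    val low = percentileLevel(histogram, cutoff)
    // Same search from the bright end, then converted back to a gray level.
    val high = 255 - percentileLevel(histogram.reversedArray(), cutoff)
    if (high <= low) return pixels.copyOf() // flat image: nothing to stretch
    val range = (high - low).toDouble()
    return IntArray(pixels.size) { i ->
        ((pixels[i] - low) * 255.0 / range).roundToInt().coerceIn(0, 255)
    }
}

/** First histogram index at which the cumulative count exceeds [cutoff]. */
private fun percentileLevel(histogram: IntArray, cutoff: Long): Int {
    var cumulative = 0L
    for (level in histogram.indices) {
        cumulative += histogram[level]
        if (cumulative > cutoff) return level
    }
    return histogram.lastIndex
}
```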
OCR engines
Requirements for an OCR engine:
- open source
- offline
- can run on Android without a massive integration effort
- can run in a reasonable amount of time on a mobile device, e.g. < 1 second per page on recent devices
- doesn't have a huge impact on the size of the APK
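To check the "< 1 second per page" requirement consistently across candidate engines, a small timing helper could be used. This sketch is hypothetical: `recognize` stands in for whichever engine is being evaluated, and the median is used so that a single slow page (or warm-up cost) doesn't dominate.

```kotlin
import kotlin.system.measureNanoTime

/** Runs [recognize] on each page and returns the median time per page, in milliseconds. */
fun <P> medianMillisPerPage(pages: List<P>, recognize: (P) -> Unit): Double {
    require(pages.isNotEmpty()) { "need at least one page" }
    val times = pages.map { page -> measureNanoTime { recognize(page) } / 1_000_000.0 }.sorted()
    val mid = times.size / 2
    return if (times.size % 2 == 1) times[mid] else (times[mid - 1] + times[mid]) / 2.0
}

/** True when the median time is under the budget (1 second by default). */
fun meetsBudget(medianMs: Double, budgetMs: Double = 1_000.0): Boolean = medianMs < budgetMs
```

Numbers only mean something when measured on real devices with realistic page images, since desktop JVM timings can differ a lot from mobile ones.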
Known options:
Managing languages
OCR engines like Tesseract have separate models per language, and those models can be heavy, e.g. several megabytes per language for Tesseract "fast" models.
A possible way to manage languages:
- APKs don't include any model
- To activate OCR in the app settings, the user has to trigger the download of one or more language models
- When a document is scanned, all installed language models are used (that might be refined later)
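Tesseract accepts several languages at once, joined with `+` (e.g. `eng+fra`), which fits the "use all installed models" approach. The sketch below assumes each downloaded model is stored as `<lang>.traineddata` in a single directory; the function name and the directory layout are assumptions, not existing FairScan code.

```kotlin
import java.io.File

/**
 * Builds a Tesseract language spec such as "eng+fra" from the models found in
 * [tessdataDir]. Languages are sorted only to make the result deterministic.
 * Returns null when no model is installed, i.e. OCR cannot be enabled yet.
 */
fun installedLanguageSpec(tessdataDir: File): String? {
    val languages = tessdataDir.listFiles()
        .orEmpty()
        .filter { it.isFile && it.name.endsWith(".traineddata") }
        .map { it.name.removeSuffix(".traineddata") }
        .sorted()
    return if (languages.isEmpty()) null else languages.joinToString("+")
}
```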
FairScan would then require the "internet" permission, and the privacy policy would have to be updated accordingly.