Inside a pdf viewer (acrobat reader, or pdf.js in the browser), you cannot search for a phrase of multiple words. The phrase matches nothing even when it is in the document.
Bug report / Feature request
Expected Behavior
If the document contains, for example, "Breakfast menu", clicking the search icon (magnifying glass) and entering "breakfast menu" should find that text.
Current Behavior
It only matches one word at a time. For example, it matches "breakfast" or "menu". A search for two words finds nothing, even when the two words are clearly adjacent on the same line in the document!
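One plausible explanation, not confirmed anywhere in this thread, is that the OCR text layer stores each word as a separate text object with no space character between neighbours, so the text the viewer extracts reads "Breakfastmenu" rather than "Breakfast menu". The stdlib Python sketch below uses hypothetical strings (not real output from ocrmypdf or tesseract) to show why an exact phrase search would then fail, while a search that tolerates missing whitespace would still match. The function names are illustrative only.

```python
import re

def naive_find(text_layer: str, phrase: str) -> bool:
    # Plain case-insensitive substring search, roughly what a viewer's find box does.
    return phrase.lower() in text_layer.lower()

def tolerant_find(text_layer: str, phrase: str) -> bool:
    # Allow any run of whitespace (including none) between the phrase's words.
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\s*".join(words)
    return re.search(pattern, text_layer, flags=re.IGNORECASE) is not None

# Hypothetical extracted text layers: one kept the space, one did not.
with_space = "Breakfast menu"
without_space = "Breakfastmenu"

print(naive_find(with_space, "breakfast menu"))        # True
print(naive_find(without_space, "breakfast menu"))     # False
print(tolerant_find(without_space, "breakfast menu"))  # True
```

A viewer's search box can't be changed from here, so the real fix would have to be a text layer that contains the spaces in the first place.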
Possible Solution
Look at the parameters or settings for tesseract-ocr to see whether it can be made to join words on the same line into one continuous line of text.
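To illustrate what "connecting words on the same line" would mean, here is a minimal post-processing sketch in stdlib Python. It does not use any real tesseract or ocrmypdf setting. It assumes hypothetical word boxes (text, left edge `x`, vertical position `y`, with `y` growing down the page as in image coordinates), groups words whose `y` values lie within a tolerance of a line's first word, and joins each line left to right with single spaces. `Word`, `words_to_lines` and `y_tolerance` are all illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    x: float  # left edge of the word box
    y: float  # vertical position; assumed to grow down the page (image coordinates)

def words_to_lines(words, y_tolerance=3.0):
    """Group words whose y is within y_tolerance of a line's first word,
    then join each line's words left to right with single spaces."""
    lines = []  # each entry: (reference_y, words_on_that_line)
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(w.y - lines[-1][0]) <= y_tolerance:
            lines[-1][1].append(w)
        else:
            lines.append((w.y, [w]))
    return [" ".join(w.text for w in sorted(ws, key=lambda w: w.x)) for _, ws in lines]

# Hypothetical word boxes, deliberately given out of order.
sample = [
    Word("menu", 60, 100.5),
    Word("Breakfast", 10, 100),
    Word("Coffee", 10, 120),
    Word("included", 50, 121),
]

lines = words_to_lines(sample)
print(lines)                                         # ['Breakfast menu', 'Coffee included']
print("breakfast menu" in "\n".join(lines).lower())  # True
```

A fixed pixel tolerance is the simplest choice. A real layout engine would more likely compare against the line height, since a fixed value breaks down on skewed scans or very large fonts.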
Steps to Reproduce (for bugs)
1. Upload a scanned page of text in PDF format.
2. Run OCR on it.
3. Open the _OCR.pdf version of the file, which contains the recognized text.
4. Click the magnifying glass and enter two adjacent words from the same line. The search fails to find them; it only finds one word at a time.
Context
Searching for only one word at a time is awkward and time-consuming.
Your Environment
OCR version used: Latest
Browser name and version: Latest Firefox
Operating System and version (desktop or mobile): Windows 10, Debian 8 (Linux)
ownCloud/Nextcloud version: Latest Nextcloud
PHP version: 7.0
Database version: MySQL/MariaDB 5.6
Are you using encryption: No
Log File Content (nextcloud/owncloud.log of the "data"-directory)
Actually, I hadn't noticed this before. The problem is that ocrmypdf works this way, and I can't change that behavior. You could open an issue on the ocrmypdf GitHub repository and ask whether there is another solution. I assume it won't be possible, though, as long as ocrmypdf doesn't put the text elements together into one text box behind the image in the PDF.
Since this isn't a bug and it's simply how ocrmypdf behaves, I'm closing this issue.