Permission error when running from browser #201

Closed
dev-code-davis opened this issue Nov 21, 2017 · 5 comments

dev-code-davis commented Nov 21, 2017

Hi, basically I have created a script that launches Ocrmypdf.

$c = 'ocrmypdf -l lav --rotate-pages --pdf-renderer tesseract --output-type pdf --sidecar output.txt input.pdf output.pdf';
exec($c, $output);
print_r($output);

When I call the PHP script from the command line on the server itself (php ocr.php), I get the intended result.

However, when I open the script in a browser, I get the following permission error:

Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 7, in <module>
    from ocrmypdf.__main__ import run_pipeline
  File "/usr/lib/python3.6/site-packages/ocrmypdf/__init__.py", line 3, in <module>
    import pkg_resources
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3019, in <module>
    @_call_aside
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3003, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3032, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 646, in _build_master
    ws = cls()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 639, in __init__
    self.add_entry(entry)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 695, in add_entry
    for dist in find_distributions(entry, True):
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2019, in find_on_path
    path_item, entry, metadata, precedence=DEVELOP_DIST
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2432, in from_location
    py_version=py_version, platform=platform, **kw
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2772, in _reload_version
    md_version = _version_from_file(self._get_metadata(self.PKG_INFO))
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2397, in _version_from_file
    line = next(iter(version_lines), '')
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2565, in _get_metadata
    for line in self.get_metadata_lines(name):
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1467, in get_metadata_lines
    return yield_lines(self.get_metadata(name))
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1463, in get_metadata
    value = self._get(self._fn(self.egg_info, name))
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1572, in _get
    with open(path, 'rb') as stream:
PermissionError: [Errno 13] Permission denied: '/usr/lib/python3.6/site-packages/ruffus-2.6.3-py3.6.egg-info/PKG-INFO'
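
The traceback ends in a plain `open()` failure on a metadata file, so one way to narrow the problem down is to compare what the file's mode bits allow with what actually happens when the web server's user opens it. The following is a minimal sketch, assuming it is run as that same user (for example through the same PHP `exec` call); `probe_read` is a hypothetical helper name, not part of OCRmyPDF or setuptools.

```python
import os
import stat


def probe_read(path):
    """Report the mode bits of `path` and whether this process can open it."""
    report = {"path": path, "uid": os.geteuid()}
    try:
        st = os.stat(path)
    except OSError as exc:
        # stat itself failed: missing file or an untraversable parent directory.
        report["stat_error"] = exc.strerror
        return report
    report["mode"] = stat.filemode(st.st_mode)
    report["owner_uid"] = st.st_uid
    report["world_readable"] = bool(st.st_mode & stat.S_IROTH)
    try:
        # Attempt the same kind of open() that fails in the traceback.
        with open(path, "rb"):
            report["open_ok"] = True
    except OSError as exc:
        report["open_ok"] = False
        report["open_error"] = exc.strerror
    return report


if __name__ == "__main__":
    import sys

    for p in sys.argv[1:]:
        print(probe_read(p))
```

If `world_readable` is true and the parent directories are traversable, yet `open_ok` is false, the denial is probably coming from something other than classic file permissions, such as a mandatory access control layer.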

/usr/local/bin/ocrmypdf

cat /usr/local/bin/ocrmypdf

#!/usr/bin/python3.6

import re
import sys

from ocrmypdf.__main__ import run_pipeline

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    sys.exit(run_pipeline())

OS: CentOS 7.

I'm aware that this may not strictly be an OCRmyPDF issue. But it is quite strange that I continue to get this error even after I opened up the permissions on the whole Python directory with chmod/chown (for testing purposes).
My initial impression is that some of those packages require higher user access?

@dev-code-davis
Author

OK, after two days of relentless searching, one of our team's DevOps engineers suggested running:
setenforce 0
which seems to have worked. The cause was SELinux, a CentOS/Red Hat security feature; this command switches it to permissive mode.
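
For context, `setenforce 0` puts SELinux into permissive mode for the whole machine until the next reboot or until `setenforce 1` is run. A minimal sketch for confirming which mode a host is in, assuming the standard selinuxfs mount point `/sys/fs/selinux` (the `selinux_mode` helper is illustrative, not an existing API):

```python
from pathlib import Path


def selinux_mode(enforce_file="/sys/fs/selinux/enforce"):
    """Return 'enforcing', 'permissive', or 'disabled' based on the selinuxfs flag."""
    try:
        flag = Path(enforce_file).read_text().strip()
    except FileNotFoundError:
        # selinuxfs is not mounted, so SELinux is disabled or not available.
        return "disabled"
    # The kernel exposes "1" for enforcing and "0" for permissive.
    return "enforcing" if flag == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```

Permissive mode stops enforcement everywhere, not just for the web server, so it is better treated as a diagnostic step than as a fix. Denials are still logged in permissive mode (typically in the audit log), which can help identify the specific rule that blocked the web server process.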

@jbarlow83
Collaborator

jbarlow83 commented Nov 21, 2017 via email

@dev-code-davis
Author

dev-code-davis commented Nov 21, 2017

@jbarlow83 It would be used on an intranet where just a few selected editors will be able to upload those scanned PDFs.
What alternative approach would you suggest? As for resource usage, we could add an additional server just for the OCR task. Basically, we have a Drupal site which uses Solr to index content. We have tackled the task of getting PDF metadata, but scanned documents are still an issue (they need to be OCRed and indexed for search purposes).
I have tested a lot of OCR libraries and, to be honest, only OCRmyPDF seemed like a solid, capable solution.
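
The setup described here (editors upload scanned PDFs, a separate machine runs OCR, Solr indexes the text) could be decoupled from the web request with a small batch worker. The following is a rough sketch under stated assumptions: the `inbox` and `outbox` directories and both function names are hypothetical, and the `ocrmypdf` flags are simply the ones from the original command.

```python
import subprocess
from pathlib import Path


def build_command(src, dst, sidecar, language="lav"):
    """Build the ocrmypdf argument list using the flags from the original post."""
    return [
        "ocrmypdf", "-l", language, "--rotate-pages",
        "--pdf-renderer", "tesseract", "--output-type", "pdf",
        "--sidecar", str(sidecar), str(src), str(dst),
    ]


def process_inbox(inbox, outbox, run=subprocess.run):
    """OCR every PDF in `inbox` that has no result in `outbox` yet."""
    inbox, outbox = Path(inbox), Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    results = {}
    for src in sorted(inbox.glob("*.pdf")):
        dst = outbox / src.name
        sidecar = outbox / (src.stem + ".txt")
        if dst.exists():
            continue  # already processed on an earlier pass
        proc = run(build_command(src, dst, sidecar),
                   capture_output=True, text=True)
        # Keep the exit code and stderr so failures are not silently lost.
        results[src.name] = (proc.returncode, proc.stderr)
    return results
```

Run periodically under a dedicated user, something like this keeps OCR out of the web server process, and the sidecar `.txt` files are what would then be handed to the indexer. Passing the command as a list also avoids shell quoting problems with uploaded file names.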

@jbarlow83
Collaborator

jbarlow83 commented Nov 21, 2017 via email

@jbarlow83
Collaborator

I'll close the issue now since the main concern seemed to be a platform configuration issue. If you have further related questions feel free to reopen it.
