Handle file not found errors on PDFSegmentation #6827

Closed
txau opened this issue May 27, 2024 · 1 comment

Comments

Collaborator

txau commented May 27, 2024

When, for whatever reason, a file is missing from the filesystem but still present in the files database, PDFSegmentation fails without handling the error.

  • Handle the error properly and report the instance and the file that was not found, so we can track the problem
  • Avoid an endless loop in PDFSegmentation so it doesn't retry the same file over and over (maybe mark the segmentation as errored?)
  • Make sure the rest of the services can handle the segmentation not being available (e.g. IX, table of contents extractor, etc.)
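
A minimal sketch of how the first two points could fit together, written only to illustrate the idea. Every name here (`segmentOnePdfSafely`, `Deps`, `SegmentationRecord`, the `'failed'` status) is hypothetical and not taken from the Uwazi codebase. It assumes the storage layer rejects with an error whose `name` or `Code` is `NoSuchKey` (as in the trace below) or whose `code` is `ENOENT` for a local filesystem.

```typescript
// Hypothetical types: none of these names come from the Uwazi codebase.
type SegmentationStatus = 'processing' | 'failed';

interface SegmentationRecord {
  fileId: string;
  filename: string;
  status: SegmentationStatus;
  error?: string;
}

interface Deps {
  // Reads the file contents; expected to reject when the file is missing.
  readFile: (filename: string) => Promise<Uint8Array>;
  // Hands the PDF over to the segmentation service.
  sendToService: (filename: string, contents: Uint8Array) => Promise<void>;
  // Persists the segmentation state so the file is not picked up again.
  saveSegmentation: (record: SegmentationRecord) => Promise<void>;
  // Reports the problem with enough context to trace it.
  logError: (message: string) => void;
}

// True for the S3 "NoSuchKey" error seen in the trace and for Node's ENOENT.
function isFileNotFound(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const e = error as { name?: unknown; code?: unknown; Code?: unknown };
  return e.name === 'NoSuchKey' || e.Code === 'NoSuchKey' || e.code === 'ENOENT';
}

async function segmentOnePdfSafely(
  tenant: string,
  file: { id: string; filename: string },
  deps: Deps
): Promise<SegmentationStatus> {
  // Turn "file not found" into null; any other error is rethrown unchanged.
  const contents = await deps.readFile(file.filename).catch((error: unknown) => {
    if (isFileNotFound(error)) return null;
    throw error;
  });

  if (contents === null) {
    // Report which instance and which file are affected.
    deps.logError(
      `PDF segmentation: file "${file.filename}" not found in storage for instance "${tenant}"`
    );
    // Record the failure so the next run skips this file instead of retrying forever.
    await deps.saveSegmentation({
      fileId: file.id,
      filename: file.filename,
      status: 'failed',
      error: 'file not found',
    });
    return 'failed';
  }

  await deps.sendToService(file.filename, contents);
  await deps.saveSegmentation({
    fileId: file.id,
    filename: file.filename,
    status: 'processing',
  });
  return 'processing';
}
```

Under these assumptions, persisting a `'failed'` record is what breaks the loop: the query that selects files to segment can exclude files that already have a record, and downstream consumers (IX, table of contents extractor) can check the status before relying on the segmentation.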

cc @gabriel-piles @fnocetti

2:38 PM: 2024-05-27T14:38:30.951Z [localhost] NoSuchKey: UnknownError
    at de_NoSuchKeyRes (/opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:4809:21)
    at de_CommandError (/opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/client-s3/dist-cjs/index.js:4747:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@smithy/middleware-serde/dist-cjs/index.js:35:20
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-signing/dist-cjs/index.js:225:18
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-flexible-checksums/dist-cjs/index.js:173:18
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:97:20
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-sdk-s3/dist-cjs/index.js:120:14
    at async /opt/uwazi/cores/core-1.169.0-rc1/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:33:22
    at async S3Storage.get (/opt/uwazi/cores/core-1.169.0-rc1/app/api/files/S3Storage.js:37:22)
    at async readFromS3 (/opt/uwazi/cores/core-1.169.0-rc1/app/api/files/storage.js:58:20)
    at async Object.fileContents (/opt/uwazi/cores/core-1.169.0-rc1/app/api/files/storage.js:70:27)
    at async PDFSegmentation.segmentOnePdf (/opt/uwazi/cores/core-1.169.0-rc1/app/api/services/pdfsegmentation/PDFSegmentation.js:43:29)
    at async /opt/uwazi/cores/core-1.169.0-rc1/app/api/services/pdfsegmentation/PDFSegmentation.js:114:17
    at async /opt/uwazi/cores/core-1.169.0-rc1/app/api/services/pdfsegmentation/PDFSegmentation.js:103:13
original error: {
  "name": "NoSuchKey",
  "$fault": "client",
  "$metadata": {
    "httpStatusCode": 404,
    "requestId": "tx00000e285947cd59c5bf9-0066549ae6-5957e6d-default",
    "attempts": 1,
    "totalRetryDelay": 0
  },
  "Code": "NoSuchKey",
  "BucketName": "uwazi-staging",
  "RequestId": "tx00000e285947cd59c5bf9-0066549ae6-5957e6d-default",
  "HostId": "5957e6d-default-default",
  "message": "UnknownError"
}

Collaborator Author

txau commented May 31, 2024

fixed

txau closed this as completed May 31, 2024