
Foreign key constraint violation (attachment_source_file_id_fkey) #530

Closed · adam-stanek opened this issue Jan 2, 2021 · 10 comments · Fixed by #532
Labels: bug (Something isn't working or in unexpected ways)

Comments

@adam-stanek commented Jan 2, 2021

Hello guys,

I wanted to try out docspell but got stuck, unable to process any files (I tried several).

Here is a snippet from the log:

docspell-joex | [ioapp-compute-7] ERROR d.j.s.LogSink - >>> 2021-01-02T14:44:27.209885Z Error 9Yf4jffqr.../gordon/process-item/High: Job 9Yf4jffqr.../gordon/process-item/High execution failed. Retrying later. 
docspell-joex | org.postgresql.util.PSQLException: ERROR: update or delete on table "filemeta" violates foreign key constraint "attachment_source_file_id_fkey" on table "attachment_source"
docspell-joex |   Detail: Key (id)=(FdC2nXDErBW-kyk8NQteHFo-6DduLgNxq5y-ruCZnDRdCpy) is still referenced from table "attachment_source".
docspell-joex |         at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
docspell-joex |         at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)
docspell-joex |         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
docspell-joex |         at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:473)
docspell-joex |         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:393)
docspell-joex |         at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
docspell-joex |         at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:130)
docspell-joex |         at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
docspell-joex |         at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
docspell-joex |         at doobie.free.KleisliInterpreter$PreparedStatementInterpreter.$anonfun$executeUpdate$5(kleisliinterpreter.scala:958)
docspell-joex |         at doobie.free.KleisliInterpreter$PreparedStatementInterpreter.$anonfun$executeUpdate$5$adapted(kleisliinterpreter.scala:958)
docspell-joex |         at doobie.free.KleisliInterpreter.$anonfun$primitive$2(kleisliinterpreter.scala:112)
docspell-joex |         at evalOn @ doobie.util.transactor$Transactor$fromDataSource$FromDataSourceUnapplied.$anonfun$apply$13(transactor.scala:280)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:163)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:163)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:163)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex | [ioapp-compute-0] DEBUG d.j.s.SchedulerImpl - Job 9Yf4jffqr.../gordon/process-item/High done Stuck. Releasing resources. 

I used the docker-compose setup from the repo with an unmodified configuration. The version is 0.17.1 (joex-LATEST / 3e4973159aff).
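
For context on what the error itself means: the duplicate-file cleanup deletes a row from the filemeta table while a row in attachment_source still points at it through the attachment_source_file_id_fkey constraint, which PostgreSQL refuses. The snippet below is only an illustrative sketch (not docspell's actual code) of the general guard-before-delete pattern with doobie; the table names come from the error message, while the column name file_id is an assumption inferred from the constraint name.

import cats.effect.IO
import cats.implicits._
import doobie._
import doobie.implicits._

// Hypothetical helper: delete a filemeta row only when no attachment_source row
// references it any more; otherwise leave it in place instead of violating the FK.
def deleteFileMetaIfUnreferenced(xa: Transactor[IO], fileId: String): IO[Int] =
  (for {
    refs    <- sql"select count(*) from attachment_source where file_id = $fileId"
                 .query[Int].unique
    deleted <- if (refs == 0)
                 sql"delete from filemeta where id = $fileId".update.run
               else
                 0.pure[ConnectionIO] // still referenced: skip the delete
  } yield deleted).transact(xa)

Whether the actual fix checks the reference first or removes the referencing rows before the file is a detail of #532; the sketch only shows why the order of deletes matters here.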

@eikek (Owner) commented Jan 2, 2021

Hi, and thanks for reporting! This is bad… Can you give more details on how you triggered this error? Can you process new files, or is nothing working right now?

@eikek (Owner) commented Jan 2, 2021

Could you give some more log lines around the error, too? What type of file was it (zip, eml, pdf)?

Sorry for all the questions :-) I'm trying to reproduce it.

eikek added this to the Docspell 0.18.0 milestone Jan 2, 2021
eikek added the bug label Jan 2, 2021
@adam-stanek (Author) commented Jan 2, 2021
Nothing is working right now. This is a log from a different file, but with the same result. I tried it with some JPEGs and some single- and multi-page PDFs, some scanned, some with proper text; the result is the same every time. It seems like the application recognizes the files as duplicates of something, but this is a completely fresh install and I double-checked that I haven't tried to import the same file twice.

2021-01-02T14:42:15: Checking for duplicate files
2021-01-02T14:42:15: Creating new item with 1 attachment(s)
2021-01-02T14:42:15: Creating item finished in 40 ms
2021-01-02T14:42:15: Not an archive: application/pdf
2021-01-02T14:42:15: Converting file Some(tulesicka-vyuctovani (1).pdf) (application/pdf) into a PDF
2021-01-02T14:42:15: Storing input to file /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/infile for running ocrmypdf
2021-01-02T14:42:15: Running external command: ocrmypdf -l deu --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/infile /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/out.pdf
2021-01-02T14:44:16: Command `ocrmypdf -l deu --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/infile /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/out.pdf` finished: 0
2021-01-02T14:44:16: ocrmypdf stdout:
2021-01-02T14:44:16: ocrmypdf stderr: WARNING - 9: [tesseract] lots of diacritics - possibly poor OCR INFO - Optimize ratio: 1.00 savings: -0.0% INFO - Image optimization did not improve the file - discarded INFO - Output file is a PDF/A-2B (as expected)
2021-01-02T14:44:16: Conversion to pdf successful. Saving file.
2021-01-02T14:44:17: Closing process: `ocrmypdf -l deu --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/infile /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/out.pdf`
2021-01-02T14:44:17: Starting text extraction for 1 files
2021-01-02T14:44:17: Extracting text for attachment tulesicka-vyuctovani (1).converted
2021-01-02T14:44:17: Trying to strip text from pdf using pdfbox.
2021-01-02T14:44:17: Extracting text for attachment tulesicka-vyuctovani (1).converted finished in 449 ms
2021-01-02T14:44:17: Storing extracted texts …
2021-01-02T14:44:17: Extracted text stored.
2021-01-02T14:44:17: Updating SOLR index
2021-01-02T14:44:17: Text extraction finished in 535 ms
2021-01-02T14:44:17: Creating preview images for 1 files…
2021-01-02T14:44:18: Preview generated, saving to database…
2021-01-02T14:44:18: Retrieving page count for 1 files…
2021-01-02T14:44:18: Found number of pages: 9
2021-01-02T14:44:18: Update attachment AGPQqzpGRHd-HQAwLT3cpuH-Qak6pZyVYVs-ySESK4ivSHL with page count Some(9)
2021-01-02T14:44:18: Stored page count (1).
2021-01-02T14:44:18: Starting text analysis
2021-01-02T14:44:18: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.
2021-01-02T14:44:27: Checking for duplicate files
2021-01-02T14:44:27: Deleting duplicate file Some(tulesicka-vyuctovani (1).pdf)!
2021-01-02T14:44:27: Job 9Yf4jffqr.../gordon/process-item/High execution failed. Retrying later.: ERROR: update or delete on table "filemeta" violates foreign key constraint "attachment_source_file_id_fkey" on table "attachment_source" Detail: Key (id)=(FdC2nXDErBW-kyk8NQteHFo-6DduLgNxq5y-ruCZnDRdCpy) is still referenced from table "attachment_source".
2021-01-02T15:14:26: Found 1 existing item with these files.
2021-01-02T15:14:26: Found 1 attachments. Use only those from task args: Set(Ident(FdC2nXDErBW-kyk8NQteHFo-6DduLgNxq5y-ruCZnDRdCpy))
2021-01-02T15:14:26: Not an archive: application/pdf
2021-01-02T15:14:26: Conversion to pdf already done for attachment Some(tulesicka-vyuctovani (1).converted.pdf).
2021-01-02T15:14:26: Starting text extraction for 1 files
2021-01-02T15:14:26: TextExtraction skipped, since text is already available.
2021-01-02T15:14:26: Storing extracted texts …
2021-01-02T15:14:26: Extracted text stored.
2021-01-02T15:14:26: Updating SOLR index
2021-01-02T15:14:26: Text extraction finished in 96 ms
2021-01-02T15:14:26: Creating preview images for 1 files…
2021-01-02T15:14:27: Preview generated, saving to database…
2021-01-02T15:14:27: Retrieving page count for 1 files…
2021-01-02T15:14:27: Found number of pages: 9
2021-01-02T15:14:27: Update attachment AGPQqzpGRHd-HQAwLT3cpuH-Qak6pZyVYVs-ySESK4ivSHL with page count Some(9)
2021-01-02T15:14:27: Stored page count (1).
2021-01-02T15:14:27: Starting text analysis
2021-01-02T15:14:27: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.
2021-01-02T15:14:36: Found 1 existing item with these files.
2021-01-02T15:14:36: Found 1 attachments. Use only those from task args: Set(Ident(FdC2nXDErBW-kyk8NQteHFo-6DduLgNxq5y-ruCZnDRdCpy))
2021-01-02T15:14:36: Not an archive: application/pdf
2021-01-02T15:14:36: Conversion to pdf already done for attachment Some(tulesicka-vyuctovani (1).converted.pdf).
2021-01-02T15:14:36: Starting text extraction for 1 files
2021-01-02T15:14:36: TextExtraction skipped, since text is already available.
2021-01-02T15:14:36: Storing extracted texts …
2021-01-02T15:14:36: Extracted text stored.
2021-01-02T15:14:36: Updating SOLR index
2021-01-02T15:14:36: Text extraction finished in 77 ms
2021-01-02T15:14:36: Creating preview images for 1 files…
2021-01-02T15:14:37: Preview generated, saving to database…
2021-01-02T15:14:37: Retrieving page count for 1 files…
2021-01-02T15:14:37: Found number of pages: 9
2021-01-02T15:14:37: Update attachment AGPQqzpGRHd-HQAwLT3cpuH-Qak6pZyVYVs-ySESK4ivSHL with page count Some(9)
2021-01-02T15:14:37: Stored page count (1).
2021-01-02T15:14:37: Starting text analysis
2021-01-02T15:14:37: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.

The processing is now stuck in an infinite loop of text analysis. I am not even able to cancel the job. I hope this helps.

@adam-stanek (Author) commented Jan 2, 2021
Ok, the infinite looping of the text analysis is unrelated. I tried a fresh install again. I wanted to upload the files without the "Skip files already present in docspell" option, because I suspected it might be the culprit. Now it gets past the point where it was complaining about the foreign key, but I am stuck in the infinite text analysis :(

@eikek (Owner) commented Jan 2, 2021

Is it possible that docspell was restarted in the middle of processing? These lines look like it was stopped:

2021-01-02T14:44:18: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.
2021-01-02T14:44:27: Checking for duplicate files
...
2021-01-02T15:14:27: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.
2021-01-02T15:14:36: Found 1 existing item with these files.

There are 9 seconds in between, and it then starts again with the very first step in processing (checking for existing files). I think that, on restart, it wants to check for duplicates (which it shouldn't), and that results in these errors. Did you restart it manually?

I have never observed infinite loops in the text analysis, but it may take a while, especially on slower machines. Did the analysis never return? Is this some "special" machine, like a Raspberry Pi or anything like that?

@adam-stanek (Author) commented Jan 2, 2021
I don't think so. I have definitely not restarted the process manually, so unless it died for some reason and got restarted by Docker automatically, then no. Nothing in the log gives me any sign of the process dying. I can double-check, though, if you think that is the case.

I was trying it out on my dev machine, which is rather beefy. So no Pi.

The job never succeeded for me, so I am not exactly sure what the analysis should return. It just repeats over and over (the same log entries you quoted).

@eikek (Owner) commented Jan 2, 2021

That is strange; I cannot explain it. The lines I quoted indicate that the process stops and gets restarted. I can reproduce the foreign key problem by killing joex in the middle of processing and starting it up again (which is a bug that I'm fixing now).

I currently suspect that the "endless loop" is caused by the process being restarted, so it can never finish. The line "Found 1 existing item with these files" comes from the very first step in processing. That's why I'm currently thinking that something kills the process… it could be an out-of-memory problem, but your machine is strong, so I doubt that, too. Are there maybe some Docker logs that give more insight at the process level? Could you try with a small text file that only has a few words in it? (Your logs above show a document with rather a lot of text.)
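
In case it helps others hitting this: Docker itself records whether a container's process was killed by the kernel OOM killer. Assuming the compose service name shown in the logs above (docspell-joex; adjust it if your setup names the container differently), something like the following can confirm it:

# Ask Docker whether the joex container was OOM-killed and how it last exited.
docker inspect --format '{{.State.OOMKilled}} exit={{.State.ExitCode}}' docspell-joex

# A non-zero restart count also hints at the daemon silently restarting it.
docker inspect --format '{{.RestartCount}}' docspell-joex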

@adam-stanek (Author) commented Jan 2, 2021
Ok, your message got me to double-check what was happening. You were right! The joex container was indeed being killed by the OOM killer.

I recently ran a Docker update on my machine and it seems to have set the memory limit for the Docker VM to 1G for some reason 🤔 After raising that crazy limit I was able to finish processing without a problem 👍

Thank you for your prompt reaction, which put me on the right path. Shall I dig deeper into the original foreign key issue, or do you already have everything you need?

@adam-stanek (Author) commented Jan 2, 2021
Btw, it might be a good idea to log something at process start so that this kind of problem is a bit easier to debug. I suspect your project has the potential to be run on Pis and similar low-powered devices in home labs.

@eikek (Owner) commented Jan 2, 2021

Ah thanks, nice to hear you found it! Yes, joex currently needs quite a bit of memory: 1.5G is ok, but 1G is just not enough. I will try to bring that down eventually; it is currently needed by the NLP algorithms. And you're right, I'll add a more prominent log line that marks the start of processing, so this is easier to spot.

I think I have all I need to fix the foreign key problem. Thanks a lot!

eikek added commits that referenced this issue Jan 2, 2021
eikek mentioned this issue Jan 2, 2021
mergify bot closed this as completed in #532 Jan 2, 2021