
Foreign key constraint violation (attachment_source_file_id_fkey) #530

Closed · adam-stanek opened this issue Jan 2, 2021 · 10 comments · Fixed by #532
Labels: bug (Something isn't working or in unexpected ways)

Comments

@adam-stanek commented Jan 2, 2021

Hello guys,

I wanted to try out docspell but got stuck, unable to process any files (I tried several).

Here is a snippet from the log:

docspell-joex | [ioapp-compute-7] ERROR d.j.s.LogSink - >>> 2021-01-02T14:44:27.209885Z Error 9Yf4jffqr.../gordon/process-item/High: Job 9Yf4jffqr.../gordon/process-item/High execution failed. Retrying later. 
docspell-joex | org.postgresql.util.PSQLException: ERROR: update or delete on table "filemeta" violates foreign key constraint "attachment_source_file_id_fkey" on table "attachment_source"
docspell-joex |   Detail: Key (id)=(FdC2nXDErBW-kyk8NQteHFo-6DduLgNxq5y-ruCZnDRdCpy) is still referenced from table "attachment_source".
docspell-joex |         at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
docspell-joex |         at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)
docspell-joex |         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
docspell-joex |         at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:473)
docspell-joex |         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:393)
docspell-joex |         at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
docspell-joex |         at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:130)
docspell-joex |         at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
docspell-joex |         at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
docspell-joex |         at doobie.free.KleisliInterpreter$PreparedStatementInterpreter.$anonfun$executeUpdate$5(kleisliinterpreter.scala:958)
docspell-joex |         at doobie.free.KleisliInterpreter$PreparedStatementInterpreter.$anonfun$executeUpdate$5$adapted(kleisliinterpreter.scala:958)
docspell-joex |         at doobie.free.KleisliInterpreter.$anonfun$primitive$2(kleisliinterpreter.scala:112)
docspell-joex |         at evalOn @ doobie.util.transactor$Transactor$fromDataSource$FromDataSourceUnapplied.$anonfun$apply$13(transactor.scala:280)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:163)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:163)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.util.transactor$Transactor$$anon$4.$anonfun$apply$4(transactor.scala:163)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex |         at $anonfun$tailRecM$1 @ doobie.free.KleisliInterpreter$ConnectionInterpreter.$anonfun$bracketCase$28(kleisliinterpreter.scala:753)
docspell-joex | [ioapp-compute-0] DEBUG d.j.s.SchedulerImpl - Job 9Yf4jffqr.../gordon/process-item/High done Stuck. Releasing resources. 

I used the docker-compose setup from the repo with an unmodified configuration. The version is 0.17.1 (joex-LATEST / 3e4973159aff).
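
For context on what the error itself means: the duplicate-file cleanup deletes a row from the filemeta table while a row in attachment_source still points at it through the attachment_source_file_id_fkey constraint, which PostgreSQL refuses. The snippet below is only an illustrative sketch (not docspell's actual code) of the general guard-before-delete pattern with doobie; the table names come from the error message, while the column name file_id is an assumption inferred from the constraint name.

import cats.effect.IO
import cats.implicits._
import doobie._
import doobie.implicits._

// Hypothetical helper: delete a filemeta row only when no attachment_source row
// references it any more; otherwise leave it in place instead of violating the FK.
def deleteFileMetaIfUnreferenced(xa: Transactor[IO], fileId: String): IO[Int] =
  (for {
    refs    <- sql"select count(*) from attachment_source where file_id = $fileId"
                 .query[Int].unique
    deleted <- if (refs == 0)
                 sql"delete from filemeta where id = $fileId".update.run
               else
                 0.pure[ConnectionIO] // still referenced: skip the delete
  } yield deleted).transact(xa)

Whether the actual fix checks the reference first or removes the referencing rows before the file is a detail of #532; the sketch only shows why the order of deletes matters here.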

@eikek (Owner) commented Jan 2, 2021

Hi, and thanks for reporting! This is bad… Can you give more details on how you triggered this error? Can you process new files, or is nothing working right now?

@eikek (Owner) commented Jan 2, 2021

Could you give some more log lines around the error, too? What type of file was it (zip, eml, pdf)?

Sorry for all the questions :-) I'm trying to reproduce it.

eikek added this to the Docspell 0.18.0 milestone Jan 2, 2021
eikek added the bug label Jan 2, 2021
@adam-stanek (Author) commented Jan 2, 2021
Nothing is working right now. This is a log from a different file, but with the same result. I tried it with some JPEGs and some single- and multi-page PDFs, some scanned, some with proper text; the result is the same every time. It seems like the application recognizes the files as duplicates of something, but this is a completely fresh install and I double-checked that I haven't tried to import the same file twice.

2021-01-02T14:42:15: Checking for duplicate files
2021-01-02T14:42:15: Creating new item with 1 attachment(s)
2021-01-02T14:42:15: Creating item finished in 40 ms
2021-01-02T14:42:15: Not an archive: application/pdf
2021-01-02T14:42:15: Converting file Some(tulesicka-vyuctovani (1).pdf) (application/pdf) into a PDF
2021-01-02T14:42:15: Storing input to file /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/infile for running ocrmypdf
2021-01-02T14:42:15: Running external command: ocrmypdf -l deu --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/infile /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/out.pdf
2021-01-02T14:44:16: Command `ocrmypdf -l deu --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/infile /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/out.pdf` finished: 0
2021-01-02T14:44:16: ocrmypdf stdout:
2021-01-02T14:44:16: ocrmypdf stderr: WARNING - 9: [tesseract] lots of diacritics - possibly poor OCR INFO - Optimize ratio: 1.00 savings: -0.0% INFO - Image optimization did not improve the file - discarded INFO - Output file is a PDF/A-2B (as expected)
2021-01-02T14:44:16: Conversion to pdf successful. Saving file.
2021-01-02T14:44:17: Closing process: `ocrmypdf -l deu --skip-text --deskew -j 1 /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/infile /tmp/docspell-convert/docspell-ocrmypdf8073473742302975516/out.pdf`
2021-01-02T14:44:17: Starting text extraction for 1 files
2021-01-02T14:44:17: Extracting text for attachment tulesicka-vyuctovani (1).converted
2021-01-02T14:44:17: Trying to strip text from pdf using pdfbox.
2021-01-02T14:44:17: Extracting text for attachment tulesicka-vyuctovani (1).converted finished in 449 ms
2021-01-02T14:44:17: Storing extracted texts …
2021-01-02T14:44:17: Extracted text stored.
2021-01-02T14:44:17: Updating SOLR index
2021-01-02T14:44:17: Text extraction finished in 535 ms
2021-01-02T14:44:17: Creating preview images for 1 files…
2021-01-02T14:44:18: Preview generated, saving to database…
2021-01-02T14:44:18: Retrieving page count for 1 files…
2021-01-02T14:44:18: Found number of pages: 9
2021-01-02T14:44:18: Update attachment AGPQqzpGRHd-HQAwLT3cpuH-Qak6pZyVYVs-ySESK4ivSHL with page count Some(9)
2021-01-02T14:44:18: Stored page count (1).
2021-01-02T14:44:18: Starting text analysis
2021-01-02T14:44:18: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.
2021-01-02T14:44:27: Checking for duplicate files
2021-01-02T14:44:27: Deleting duplicate file Some(tulesicka-vyuctovani (1).pdf)!
2021-01-02T14:44:27: Job 9Yf4jffqr.../gordon/process-item/High execution failed. Retrying later.: ERROR: update or delete on table "filemeta" violates foreign key constraint "attachment_source_file_id_fkey" on table "attachment_source" Detail: Key (id)=(FdC2nXDErBW-kyk8NQteHFo-6DduLgNxq5y-ruCZnDRdCpy) is still referenced from table "attachment_source".
2021-01-02T15:14:26: Found 1 existing item with these files.
2021-01-02T15:14:26: Found 1 attachments. Use only those from task args: Set(Ident(FdC2nXDErBW-kyk8NQteHFo-6DduLgNxq5y-ruCZnDRdCpy))
2021-01-02T15:14:26: Not an archive: application/pdf
2021-01-02T15:14:26: Conversion to pdf already done for attachment Some(tulesicka-vyuctovani (1).converted.pdf).
2021-01-02T15:14:26: Starting text extraction for 1 files
2021-01-02T15:14:26: TextExtraction skipped, since text is already available.
2021-01-02T15:14:26: Storing extracted texts …
2021-01-02T15:14:26: Extracted text stored.
2021-01-02T15:14:26: Updating SOLR index
2021-01-02T15:14:26: Text extraction finished in 96 ms
2021-01-02T15:14:26: Creating preview images for 1 files…
2021-01-02T15:14:27: Preview generated, saving to database…
2021-01-02T15:14:27: Retrieving page count for 1 files…
2021-01-02T15:14:27: Found number of pages: 9
2021-01-02T15:14:27: Update attachment AGPQqzpGRHd-HQAwLT3cpuH-Qak6pZyVYVs-ySESK4ivSHL with page count Some(9)
2021-01-02T15:14:27: Stored page count (1).
2021-01-02T15:14:27: Starting text analysis
2021-01-02T15:14:27: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.
2021-01-02T15:14:36: Found 1 existing item with these files.
2021-01-02T15:14:36: Found 1 attachments. Use only those from task args: Set(Ident(FdC2nXDErBW-kyk8NQteHFo-6DduLgNxq5y-ruCZnDRdCpy))
2021-01-02T15:14:36: Not an archive: application/pdf
2021-01-02T15:14:36: Conversion to pdf already done for attachment Some(tulesicka-vyuctovani (1).converted.pdf).
2021-01-02T15:14:36: Starting text extraction for 1 files
2021-01-02T15:14:36: TextExtraction skipped, since text is already available.
2021-01-02T15:14:36: Storing extracted texts …
2021-01-02T15:14:36: Extracted text stored.
2021-01-02T15:14:36: Updating SOLR index
2021-01-02T15:14:36: Text extraction finished in 77 ms
2021-01-02T15:14:36: Creating preview images for 1 files…
2021-01-02T15:14:37: Preview generated, saving to database…
2021-01-02T15:14:37: Retrieving page count for 1 files…
2021-01-02T15:14:37: Found number of pages: 9
2021-01-02T15:14:37: Update attachment AGPQqzpGRHd-HQAwLT3cpuH-Qak6pZyVYVs-ySESK4ivSHL with page count Some(9)
2021-01-02T15:14:37: Stored page count (1).
2021-01-02T15:14:37: Starting text analysis
2021-01-02T15:14:37: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.

The processing is now stuck in an infinite loop of text analysis. I am not even able to cancel the job. I hope this helps.

@adam-stanek (Author) commented Jan 2, 2021
Ok, the infinite looping of the text analysis is unrelated. I tried a fresh install again. I wanted to upload the files without the "Skip files already present in docspell" option, because I suspected it might be the culprit. Now it gets past the point where it was complaining about the foreign key, but I am stuck in the infinite text analysis :(

@eikek (Owner) commented Jan 2, 2021

Is it possible that docspell was restarted in the middle of processing? These lines look like it was stopped:

2021-01-02T14:44:18: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.
2021-01-02T14:44:27: Checking for duplicate files
...
2021-01-02T15:14:27: The text to analyse is larger than limit (22018 > 10000). Analysing only first 10000 characters.
2021-01-02T15:14:36: Found 1 existing item with these files.

There are 9 seconds in between, and it then starts again with the very first step in processing (checking for existing files). I think that, on restart, it wants to check for duplicates (which it shouldn't), and that results in these errors. Did you restart it manually?

I have never observed infinite loops in the text analysis, but it may take a while, especially on slower machines. Did the analysis never return? Is this some "special" machine, like a Raspberry Pi or anything like that?

@adam-stanek (Author) commented Jan 2, 2021
I don't think so. I have definitely not restarted the process manually, so unless it died for some reason and got restarted by Docker automatically, then no. Nothing in the log gives me any sign of the process dying. I can double-check, though, if you think that is the case.

I was trying it out on my dev machine, which is rather beefy. So no Pi.

The job never succeeded for me, so I am not exactly sure what the analysis should return. It just repeats over and over (the same log entries you quoted).

@eikek (Owner) commented Jan 2, 2021

That is strange; I cannot explain it. The lines I quoted indicate that the process stops and gets restarted. I can reproduce the foreign key problem by killing joex in the middle of processing and starting it up again (which is a bug that I'm fixing now).

I currently suspect that the "endless loop" is caused by the process being restarted, so it can never finish. The line "Found 1 existing item with these files" comes from the very first step in processing. That's why I'm currently thinking that something kills the process… it could be an out-of-memory problem, but your machine is strong, so I doubt that, too. Are there maybe some Docker logs that give more insight at the process level? Could you try with a small text file that only has a few words in it? (Your logs above show a document with rather a lot of text.)
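
In case it helps others hitting this: Docker itself records whether a container's process was killed by the kernel OOM killer. Assuming the compose service name shown in the logs above (docspell-joex; adjust it if your setup names the container differently), something like the following can confirm it:

# Ask Docker whether the joex container was OOM-killed and how it last exited.
docker inspect --format '{{.State.OOMKilled}} exit={{.State.ExitCode}}' docspell-joex

# A non-zero restart count also hints at the daemon silently restarting it.
docker inspect --format '{{.RestartCount}}' docspell-joex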

@adam-stanek (Author) commented Jan 2, 2021
Ok, your message got me to double-check what was happening. You were right! The joex container was indeed being killed by the OOM killer.

I recently ran a Docker update on my machine and it seems to have set the memory limit for the Docker VM to 1G for some reason 🤔 After raising that crazy limit I was able to finish processing without a problem 👍

Thank you for your prompt reaction, which put me on the right path. Shall I dig deeper into the original foreign key issue, or do you already have everything you need?

@adam-stanek (Author) commented Jan 2, 2021
Btw, it might be a good idea to log something at process start so that this kind of problem is a bit easier to debug. I suspect your project has the potential to be run on Pis and similar low-powered devices in home labs.

@eikek (Owner) commented Jan 2, 2021

Ah thanks, nice to hear you found it! Yes, joex currently needs quite a bit of memory: 1.5G is ok, but 1G is just not enough. I will try to bring that down eventually; it is currently needed by the NLP algorithms. And you're right, I'll add a more prominent log line that marks the start of processing, so this is easier to spot.

I think I have all I need to fix the foreign key problem. Thanks a lot!

eikek added commits that referenced this issue Jan 2, 2021
eikek mentioned this issue Jan 2, 2021
mergify bot closed this as completed in #532 Jan 2, 2021