500 error in FCREPO when attempting to add extracted_text to an existing object #11

Open
ghost opened this issue Jul 5, 2017 · 5 comments

ghost commented Jul 5, 2017

When updating the extracted text, I'm seeing the following error. It happens when the original file is TXT. I have also observed it when adding to a fileset whose original_file is a PDF, but that case works more often than it fails.

There is no problem when I add the file directly into the 'files' container via the GUI or via curl (as expected).

This is the code:

local_file = Hydra::Derivatives::IoDecorator.new(File.open(path, "rb"))
local_file.original_name = path.split('/').last
local_file.mime_type = Hydra::Works::DetermineMimeType.call(local_file, local_file.original_name)
Hydra::Works::AddFileToFileSet.call(fileset,
                                    local_file,
                                    type.to_sym,
                                    versioning: false)

It works via this very manual method:

  • fileset.association(:extracted_text).build
  • PUT the file to the generated uri with curl using the FCR REST API
  • add the ExtractedText rdf:type via api or gui
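The last two manual steps can be sketched in Ruby with only the standard library. This is a rough illustration, not the code that was actually run: the helper names, the `localhost:8984` URI, and the default filename are all hypothetical, and the exact subject form accepted in the SPARQL update may differ between Fedora versions. The requests are only built here, not sent.

```ruby
require "net/http"
require "uri"

# URI of the pcdm "extracted text" use type.
EXTRACTED_TEXT_TYPE = "http://pcdm.org/use#ExtractedText"

# Build (but do not send) the PUT that stores the binary at the URI
# generated by fileset.association(:extracted_text).build.
# Hypothetical helper, for illustration only.
def build_binary_put(file_uri, content, mime_type: "text/plain", filename: "extracted_text.txt")
  request = Net::HTTP::Put.new(URI.parse(file_uri))
  request["Content-Type"] = mime_type
  request["Content-Disposition"] = %(attachment; filename="#{filename}")
  request.body = content
  request
end

# Build (but do not send) the PATCH that adds the rdf:type to the
# binary's description at <file_uri>/fcr:metadata.
# Hypothetical helper, for illustration only.
def build_type_patch(file_uri, rdf_type: EXTRACTED_TEXT_TYPE)
  request = Net::HTTP::Patch.new(URI.parse("#{file_uri}/fcr:metadata"))
  request["Content-Type"] = "application/sparql-update"
  request.body = "INSERT { <> a <#{rdf_type}> } WHERE { }"
  request
end

# Assumed URI; substitute the one generated for the new file.
file_uri = "http://localhost:8984/rest/dev/example-fileset/files/example-file"
put = build_binary_put(file_uri, "some extracted text")
patch = build_type_patch(file_uri)
puts "#{put.method} #{put.path}"
puts "#{patch.method} #{patch.path}"
```

To actually send them, each request could be passed to `Net::HTTP.start(host, port) { |http| http.request(put) }` with the host and port of the Fedora instance.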

I've ruled out the file itself as the cause, and also that it's a problem in Fedora proper.

I can add this file to one fileset but not the other, which hints at something to do with the original_file in the fileset, which is a plain text file. It seems to be something that AF / LDP are trying to do, but I'm struggling to figure out WHAT AF / LDP are doing ... I get nothing in the logs for any of this.

In FCRepo logs:

INFO 15:06:37.142 (FedoraLdp) PUT resource 'dev/6d/9c/fd/d2/6d9cfdd2-a775-4eb7-9575-f53f816f33f0/files/24eee720-d0a4-4ab3-a345-4d32bcdba79d'
DEBUG 15:06:37.142 (FedoraBinaryImpl) Created content node at path: /dev/6d/9c/fd/d2/6d9cfdd2-a775-4eb7-9575-f53f816f33f0/files/24eee720-d0a4-4ab3-a345-4d32bcdba79d/jcr:content
ERROR 15:06:57.163 (RepositoryExceptionMapper) Caught a repository exception: java.net.SocketTimeoutException: Read timed out
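
The gap between the PUT and the ERROR in these lines can be measured with a small sketch like the one below. The helper names are hypothetical. The result is only a measurement of the log timestamps; reading it as a read timeout of roughly that length somewhere between the client and Fedora is an inference, not something the logs state.

```ruby
# Convert an "HH:MM:SS.mmm" log timestamp to milliseconds since midnight.
# Assumes a three-digit millisecond field, as in the log lines above.
def stamp_to_millis(stamp)
  hours, minutes, rest = stamp.split(":")
  seconds, millis = rest.split(".")
  ((hours.to_i * 60 + minutes.to_i) * 60 + seconds.to_i) * 1000 + millis.to_i
end

# Milliseconds elapsed between two timestamps on the same day.
def gap_millis(first, second)
  stamp_to_millis(second) - stamp_to_millis(first)
end

put_logged   = "15:06:37.142" # PUT resource line
error_logged = "15:06:57.163" # RepositoryExceptionMapper line
gap = gap_millis(put_logged, error_logged)
puts "#{gap} ms" # prints "20021 ms"
```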

In Hyku logs:

Ldp::HttpError: STATUS: 500 org.modeshape.jcr.value.binary.BinaryStoreException: java.net.SocketTimeoutException: Read timed out
	at org.modeshape.jcr.value.binary.FileSystemBinaryStore.storeValue(FileSystemBinaryStore.java:128)
	at org.modeshape.jcr.value.binary.AbstractBinaryStore.storeValue(AbstractBinaryStore.java:251)
	at org.modeshape.jcr.value.binary.BinaryStoreValueFactory.create(BinaryStoreValueFactory.java:257)
	at org.modeshape.jcr.value.binary.BinaryStoreValueFactory.create(BinaryStoreValueFactory.java:49)
	at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:149)
	at org.modeshape.jcr.JcrValueFactory.createBinary(JcrValueFactory.java:41)
	at org.fcrepo.kernel.modeshape.FedoraBinaryImpl.setContent(FedoraBinaryImpl.java:178)
	at org.fcrepo.http.api.ContentExposingResource.replaceResourceBinaryWithStream(ContentExposingResource.java:612)
	at org.fcrepo.http.api.FedoraLdp.createOrReplaceObjectRdf(FedoraLdp.java:361)

See samvera-tech post

@ghost ghost modified the milestone: 1.0.0.beta.kf Jul 25, 2017
@ghost ghost added the ready label Jul 25, 2017
@ghost ghost added assigned and removed ready labels Sep 9, 2017
@ghost ghost removed this from the 1.0.0.beta.kf milestone Sep 15, 2017

ghost commented Dec 3, 2017

In the end I decided to create separate files rather than add my own 'extracted_text', but this problem was never solved.

@ghost ghost removed the assigned label Jan 19, 2018

whikloj commented Apr 26, 2019

Hey @geekscruff,

I'm looking at this issue from the Fedora side and (trying) to set up a Hyku box to test. Can you give me any details about the size and structure of the object, and whether your system was under load when this happened? Also, did it happen consistently with the above-mentioned object?

ghost commented Apr 29, 2019

Hello @whikloj ... this was all done on a dev instance. I was running a migration which added a PDF and a TXT file to a Hyrax/Hyku work, and then added an 'extracted text' to the TXT, i.e. another file into the same 'file set'. It failed consistently on that, but would add the extracted text to the (larger) PDF.

whikloj commented Apr 29, 2019

So, just so I've got this clear in my mind: the PDF is your pcdm:Object with a pcdm:FileSet containing the TXT (which is provided), but then you extract the text from the PDF and add that to the same pcdm:FileSet?

My understanding of the Samvera content model is weak, so correct me where I am wrong.

pcdm:Object (the PDF) -> pcdm:FileSet -> pcdm:File (PDF)
                       ↳ pcdm:FileSet -> pcdm:File (TXT)
                                       ↳ pcdm:File (extracted text) [ this causes the boom ]

ghost commented Apr 30, 2019

Yes, and ...

pcdm:Object (the PDF) -> pcdm:FileSet -> pcdm:File (PDF)
                                       ↳ pcdm:File (extracted text) [ no boom ]
                       ↳ pcdm:FileSet -> pcdm:File (TXT)
                                      

In the end I dispensed with the extra TXT and added extracted text to the PDF FileSet.
