Zero byte files from Pure Deposit API #379
@aalbinclark does this relate to the file size in any way? The example is a thesis, which could be a large document. [Edit: the logs provided include 'sizelong=64496457', which looks like about 64 MB. Not that big in the grand scheme of things, but possibly big enough to cause timeouts.]
@aalbinclark EPrints Apache error log messages would be really useful. (If you have separate error logs for HTTP and HTTPS, be sure to check both.) It would also help if you could confirm whether you have only just set up Pure to deposit to EPrints, or whether the issue has only occurred since upgrading either Pure or EPrints. If it has only occurred since an upgrade to one of the systems, please let us know which version you were running before the upgrade.
EPrints has been running for over fifteen years and has been updated via the Pure Deposit API for many years. We installed a new instance of EPrints (3.4.5) mid-November 2023.
I've compiled a fuller set of logs for another eprint, https://eprints.lancs.ac.uk/id/eprint/217096/. The file size is only 6 MB. Logs: Pure audit logs, EPrints Apache logs.
@aalbinclark, for the Pure connector I would normally expect to see a 401 (unauthorised) followed by a 200 response. This could mean:
The POST on 7th April did have the follow-up request, which got a 200 response and looks like it was successful. The GET on the 9th resulted in a 6 MB response. What changed between the 6th and 7th, either in EPrints, in Pure, or in the channels between them?

One thing that stands out in the logs is the timing of the deposit: shortly before midnight to shortly after midnight. Do any jobs start on the EPrints server at midnight that might affect things, e.g. database or storage backups?

In case it's useful, I've got a modified version of the PDA modules that lets you keep the raw files as sent across from Pure, both the EPrints XML and the document payloads. Search this gist: It might be useful to see exactly what's being sent. If that all looks normal, the files can be used to put together some tests using e.g. Postman. Happy to describe the modified PDA code in more detail if you want; drop me a line at J.Salter@leeds.ac.uk.
@aalbinclark It is important to appreciate that there are three components here, not just two: Pure, EPrints, and the PDA Handler that sits between them.
The PDA Handler was developed by Pure rather than built on the EPrints REST API (which I think @jesusbagpuss's implementation uses). This means it is not officially supported by EPrints. Only quite recently did I agree with Pure that we could make a reference version of the PDA Handler available on the EPrints GitHub organisation.

As @jesusbagpuss has already said, you would expect 401s followed by 200s, so at the HTTP interface level everything seems to be working as expected. The issue may lie in how the PDA Handler calls whatever functions it uses to store a file to disk. I am not aware offhand of any changes that would have affected calls to the EPrints functions that store files to disk, especially none that would cause an intermittent issue like this. As @jesusbagpuss previously queried, I could believe an issue where smaller files upload successfully but larger ones always fail, but the same file uploading sometimes and not others seems unlikely to be a coding issue and more likely an environmental one.

I would advise comparing the reference version of the PDA Handler against what you are currently running on your repository to see if there are any differences. This will at least allow us to rule out any bug that may since have been fixed. Unfortunately, the most recent EPrints version I know of running this reference version of the PDA Handler is only 3.4.3. To my knowledge there have not been any intermittent issues with zero-byte files on that version.

It would also be useful if you could confirm whether the EPrints version is the only thing that changed between your tests with EPrints 3.4.2 and EPrints 3.4.5. Did your tests run on the same operating system version with a similar configuration (e.g. SELinux state)? Were the underlying applications (e.g. Apache, mod_perl, MySQL) the same versions and running much the same configuration?
Finally, although the access logs look as expected, some warnings may still be recorded in the Apache error logs; please can you check whether you can find anything there.

P.S. Looking at the access log more closely, it seems that PUT requests always get a 401 followed by a 200, but the POST requests to upload a file almost always get a 401 with no subsequent 200, except at 07/Apr/2024:23:57:36. In my experience of how Pure uses the PDA Handler, it never sends credentials until it is explicitly prompted by a 401 response. So what is odd here is what changed Pure's behaviour at 07/Apr/2024:23:57:36. I cannot think of any reason for this, unless something changed about eprint record 217096. It may be worth checking the History tab for this eprint for any changes between 07/Apr/2024:00:02:36 and 07/Apr/2024:23:57:36. Alternatively, something about the equivalent record in Pure may have changed, making it willing to retry the file upload with credentials when before it was not.
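To check the P.S. observation across a whole access log rather than by eye, a small script can pair each 401 with a later successful (2xx) request. This is a minimal sketch, not part of EPrints or the PDA Handler: it assumes the Apache Common or Combined Log Format, and it assumes a credentialed retry comes from the same client IP with the same method and path. The function name `unanswered_challenges` is hypothetical.

```python
import re
import sys
from collections import defaultdict

# Assumed Apache Common/Combined Log Format, e.g.
# 1.2.3.4 - - [07/Apr/2024:23:57:36 +0100] "POST /some/path HTTP/1.1" 401 381 ...
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3})\b'
)


def unanswered_challenges(lines, methods=("POST", "PUT")):
    """Return (time, method, path) for every 401 that was never followed by
    a 2xx response for the same client, method and path."""
    pending = defaultdict(list)  # (host, method, path) -> times of open 401s
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["method"] not in methods:
            continue
        key = (m["host"], m["method"], m["path"])
        status = int(m["status"])
        if status == 401:
            pending[key].append(m["time"])
        elif 200 <= status < 300 and pending[key]:
            # A successful retry answers the most recent open challenge only,
            # so earlier unanswered 401s stay in the report.
            pending[key].pop()
    return [
        (t, method, path)
        for (_host, method, path), times in pending.items()
        for t in times
    ]


if __name__ == "__main__":
    # Usage: python check_401.py /path/to/access_log   (script name hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for t, method, path in unanswered_challenges(fh):
            print(f"{t}  {method} {path}: 401 with no successful retry")
```

Each output line is a challenge that Pure apparently never answered with credentials. Popping only the most recent open 401 on success means one eventual successful upload does not hide the earlier failed attempts.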
Closing as this is not a development task or a bug directly related to the EPrints codebase (i.e. it involves third-party applications/plugins).
There is an issue with Pure (5.28.2) depositing files in EPrints (3.4.5). The file storage repeatedly fails for an eprint before eventually working. There is no obvious pattern.
EPrints shows the failed uploads as duplicate documents with zero-byte files, eventually followed by the successfully uploaded file:
https://eprints.lancs.ac.uk/id/eprint/216341/
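To see how widespread the zero-byte documents are, rather than checking eprints one at a time, something like the following could be run against the repository's document storage. This is a minimal sketch: the storage location (assumed to be somewhere under the repository's `archives/<repoid>/documents/` directory on a default EPrints 3 install) should be verified on your own server, and `find_zero_byte_files` is a hypothetical helper, not an EPrints API.

```python
import os
import stat
import sys


def find_zero_byte_files(root):
    """Return a sorted list of regular files under root that are 0 bytes long."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    # Usage: python find_empty.py <document-storage-directory>  (name hypothetical)
    for p in find_zero_byte_files(sys.argv[1]):
        print(p)
```

Comparing the modification times of the reported files with the Pure audit log and the Apache access log may help show whether the empty files line up with the unanswered 401s.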
Pure audit log (reverse chronological order)