New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
delete Solr permissions docs when a dataverse, dataset or data file a… #7756
Conversation
It looks like the XML required for deleting in the Solr admin console was filtered out of my description. Testers/reviewers can contact me for clarification Can also use JSON for this |
FWIW - I think you just need a backslash before any < or > char to have them show (as I did here). |
<delete><query>id:datafile_39_draft_permission</query></delete> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks good. A couple of general comments:
• the way we are dealing with the solr docs seems inconsistent (this was true before this new code), which makes this generally less maintainable, i.e. more work to review / QA - it moght be nice to introduce some consistency here, but I could also see that being out of scope
• another tech debt thing, but which I feel is more in scope - these commands have existed for a long time, before we added an "onSuccess" mechanism for non transactional things. All these solr deletes (the new ones and existing) should really be in onsuccess, so we don't get in a weird state where the delete transaction fails, but the solr delete succeeds. This could lead a user to assuming it deleted (since no longer in search results), but where the object is still available with the direct link.
…ect delete transactions have failed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving to the onSuccess methods looks great - the one other thing that would be worth adding (I think) is try clauses around the solr stuff, that can be caught to return true or false, based on success.
See for example:
https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java
and then also the logic it does to log the failures:
LoggingUtil.writeOnSuccessFailureLog(this, failureLogText, dataset);
Steve worked on the above if you need guidance on how that log works.
I changed some code in DestroyDatasetCommand to conform with the way Steve is handling exceptions to make the onSuccess method return false when an exception occurs. Some of the other solr methods however, are not currently throwing exceptions out to the calling methods. I looked at the feasability of changing these methods, for instance deleteMultipleSolrIds, to throw exceptions to the caller, rather than wrapping up the exception message in an IndexResponse. The problem is that it would involve making changes to a lot of other code that has not a lot to do with solr permissions docs. I think such a change should be articulated in a separate issue |
…methods handle exceptions. It would probably be better for indexing methods to throw exceptions to calling methods that need to coordinate datbase changes and such with indexing and thus need to be aware when indexing fails
This works for most cases and all described by the ticket it fixes but destroying a published dataset does not remove the permissions from the index. The destroy use case was not covered under original issue so out of scope. Details on how to reproduce are below. To reproduce:
|
What this PR does / why we need it: Currently, when a dataverse or dataset is deleted, the Solr documents for the objects are deleted, but not the permission documents. This does not directly affect the application, but leaves orphan documents in the Solr index. The change adds code to delete the Solr permissions documents from the index
Which issue(s) this PR closes:
Closes #7254
Special notes for your reviewer:
The documents in question have ids in the form of datafile_54_permission, dataset_38_draft_permission, dataverse_35_permission e.g. with a suffix of "_permission" which is defined in the static final String IndexServiceBean.discoverabilityPermissionSuffix in the code. So the document dataverse_35_permission is separate but linked to the dataverse_35 document.
Suggestions on how to test this:
As is pointed out in the original issue #7254 the following call will produce output of Solr documents that exist in Solr but are not linked to records in Postgres.
curl -s http://localhost:8080/api/admin/index/status | jq .
with the relevant output in the payara server log looking like this (from my local dev environment )
permissionsInIndexButNotDatabase: {"permissions":["datafile_39_draft_permission","datafile_41_draft_permission","datafile_43_draft_permission","datafile_54_permission","dataset_38_draft_permission","dataset_40_draft_permission","dataset_42_draft_permission","dataset_44_draft_permission","dataset_46_draft_permission","dataverse_14_permission","dataverse_15_permission","dataverse_16_permission","dataverse_17_permission","dataverse_18_permission","dataverse_19_permission","dataverse_20_permission","dataverse_21_permission","dataverse_22_permission","dataverse_23_permission","dataverse_24_permission","dataverse_25_permission","dataverse_26_permission","dataverse_27_permission","dataverse_28_permission","dataverse_29_permission","dataverse_2_permission","dataverse_30_permission","dataverse_31_permission","dataverse_32_permission","dataverse_33_permission","dataverse_34_permission","dataverse_35_permission","dataverse_3_permission"]}|#]
The code cahnge prevents further proliferation of orphan documents, it does not remove all the existing orphans.
I demonstrated the proper working of the code changes by clearing out the bad solr documents by running updates of the form
id:datafile_39_draft_permission
through the Solr admin console ( there are other ways this could be done, I'm sure), and then creating a new dataverse, adding a dataset with a datafile and then deleting them and observing by running the above curl command again that the deletes in the dataverse application did not produce new orphan Solr "*_permission" documents
Alternately one might simply capture the "permissionsInIndexButNotDatabase: {"permissions": ... " output before and after creating and deleting dataverses and dataset and running a diff on the two outputs
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
no
Is there a release notes update needed for this change?:
no
Additional documentation: