Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delete Solr permissions docs when a dataverse, dataset or data file a… #7756

Merged
merged 5 commits into from May 13, 2021

Conversation

rtreacy
Copy link
Contributor

@rtreacy rtreacy commented Apr 2, 2021

What this PR does / why we need it: Currently, when a dataverse or dataset is deleted, the Solr documents for the objects are deleted, but not the permission documents. This does not directly affect the application, but leaves orphan documents in the Solr index. The change adds code to delete the Solr permissions documents from the index

Which issue(s) this PR closes:

Closes #7254

Special notes for your reviewer:
The documents in question have ids in the form of datafile_54_permission, dataset_38_draft_permission, dataverse_35_permission e.g. with a suffix of "_permission" which is defined in the static final String IndexServiceBean.discoverabilityPermissionSuffix in the code. So the document dataverse_35_permission is separate but linked to the dataverse_35 document.

Suggestions on how to test this:
As is pointed out in the original issue #7254 the following call will produce output of Solr documents that exist in Solr but are not linked to records in Postgres.

curl -s http://localhost:8080/api/admin/index/status | jq .

with the relevant output in the payara server log looking like this (from my local dev environment )

permissionsInIndexButNotDatabase: {"permissions":["datafile_39_draft_permission","datafile_41_draft_permission","datafile_43_draft_permission","datafile_54_permission","dataset_38_draft_permission","dataset_40_draft_permission","dataset_42_draft_permission","dataset_44_draft_permission","dataset_46_draft_permission","dataverse_14_permission","dataverse_15_permission","dataverse_16_permission","dataverse_17_permission","dataverse_18_permission","dataverse_19_permission","dataverse_20_permission","dataverse_21_permission","dataverse_22_permission","dataverse_23_permission","dataverse_24_permission","dataverse_25_permission","dataverse_26_permission","dataverse_27_permission","dataverse_28_permission","dataverse_29_permission","dataverse_2_permission","dataverse_30_permission","dataverse_31_permission","dataverse_32_permission","dataverse_33_permission","dataverse_34_permission","dataverse_35_permission","dataverse_3_permission"]}|#]

The code cahnge prevents further proliferation of orphan documents, it does not remove all the existing orphans.

I demonstrated the proper working of the code changes by clearing out the bad solr documents by running updates of the form
id:datafile_39_draft_permission
through the Solr admin console ( there are other ways this could be done, I'm sure), and then creating a new dataverse, adding a dataset with a datafile and then deleting them and observing by running the above curl command again that the deletes in the dataverse application did not produce new orphan Solr "*_permission" documents

Alternately one might simply capture the "permissionsInIndexButNotDatabase: {"permissions": ... " output before and after creating and deleting dataverses and dataset and running a diff on the two outputs

Does this PR introduce a user interface change? If mockups are available, please link/include them here:
no
Is there a release notes update needed for this change?:
no

Additional documentation:

@rtreacy
Copy link
Contributor Author

rtreacy commented Apr 2, 2021

It looks like the XML required for deleting in the Solr admin console was filtered out of my description. Testers/reviewers can contact me for clarification Can also use JSON for this

@qqmyers
Copy link
Member

qqmyers commented Apr 2, 2021

FWIW - I think you just need a backslash before any < or > char to have them show (as I did here).

@rtreacy
Copy link
Contributor Author

rtreacy commented Apr 2, 2021

<delete><query>id:datafile_39_draft_permission</query></delete>
is the form of the delete XML I was referring to
Thanks Jim

Copy link
Contributor

@scolapasta scolapasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks good. A couple of general comments:
• the way we are dealing with the solr docs seems inconsistent (this was true before this new code), which makes this generally less maintainable, i.e. more work to review / QA - it moght be nice to introduce some consistency here, but I could also see that being out of scope
• another tech debt thing, but which I feel is more in scope - these commands have existed for a long time, before we added an "onSuccess" mechanism for non transactional things. All these solr deletes (the new ones and existing) should really be in onsuccess, so we don't get in a weird state where the delete transaction fails, but the solr delete succeeds. This could lead a user to assuming it deleted (since no longer in search results), but where the object is still available with the direct link.

@djbrooke djbrooke moved this from Review 🦁 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) Apr 7, 2021
@rtreacy rtreacy moved this from IQSS Team - In Progress 💻 to Review 🦁 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 6, 2021
Copy link
Contributor

@scolapasta scolapasta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving to the onSuccess methods looks great - the one other thing that would be worth adding (I think) is try clauses around the solr stuff, that can be caught to return true or false, based on success.

See for example:
https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/engine/command/impl/UpdateDatasetVersionCommand.java

and then also the logic it does to log the failures:
LoggingUtil.writeOnSuccessFailureLog(this, failureLogText, dataset);

Steve worked on the above if you need guidance on how that log works.

@djbrooke djbrooke moved this from Review 🦁 to IQSS Team - In Progress 💻 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 10, 2021
@rtreacy
Copy link
Contributor Author

rtreacy commented May 11, 2021

I changed some code in DestroyDatasetCommand to conform with the way Steve is handling exceptions to make the onSuccess method return false when an exception occurs. Some of the other solr methods however, are not currently throwing exceptions out to the calling methods. I looked at the feasability of changing these methods, for instance deleteMultipleSolrIds, to throw exceptions to the caller, rather than wrapping up the exception message in an IndexResponse. The problem is that it would involve making changes to a lot of other code that has not a lot to do with solr permissions docs. I think such a change should be articulated in a separate issue

@rtreacy rtreacy moved this from IQSS Team - In Progress 💻 to Review 🦁 in IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) May 11, 2021
@rtreacy rtreacy assigned scolapasta and unassigned rtreacy May 11, 2021
…methods handle exceptions.

It would probably be better for indexing methods to throw exceptions to calling methods that need to
coordinate datbase changes and such with indexing and thus need to be aware when indexing fails
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from Review 🦁 to QA 🔎✅ May 11, 2021
@kcondon kcondon self-assigned this May 13, 2021
@kcondon
Copy link
Contributor

kcondon commented May 13, 2021

This works for most cases and all described by the ticket it fixes but destroying a published dataset does not remove the permissions from the index. The destroy use case was not covered under original issue so out of scope. Details on how to reproduce are below.

To reproduce:

  1. create and a publish a dataset with a file
  2. query the db to get the ids for dataset and file (select id, dtype, identifier from dvobject where identifier like '%xyz%'; )
  3. use the destroy dataset endpoint to remove the published dataset: https://guides.dataverse.org/en/5.4.1/api/native-api.html?highlight=destroy (curl -H "X-Dataverse-key: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE https://demo.dataverse.org/api/datasets/:persistentId/destroy/?persistentId=doi:10.5072/FK2/AAA000)
  4. run the index status command
  5. check log file for permission with name that has id of destroyed dataset and file.

@kcondon kcondon merged commit d172025 into develop May 13, 2021
IQSS/dataverse (TO BE RETIRED / DELETED in favor of project 34) automation moved this from QA 🔎✅ to Done 🚀 May 13, 2021
@kcondon kcondon deleted the 7254-solr-permissions-docs-not-deleted branch May 13, 2021 21:10
@djbrooke djbrooke added this to the 5.5 milestone May 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Solr permission document not deleted when a dataverse or dataset is deleted
5 participants