
Unable to Delete File DID via Undertaker #5154

Open
bjwhite-fnal opened this issue Jan 18, 2022 · 5 comments
@bjwhite-fnal (Contributor) commented Jan 18, 2022
The ICARUS Rucio deployment has been attempting to delete a DID for an individual file. However, the Undertaker fails to remove the DID, since (as per Martin) it was originally designed only to be used on datasets. It would be useful to allow the Undertaker to remove file DIDs also.
Thanks!

@bari12 (Member) commented Nov 3, 2022

@bjwhite-fnal with which Rucio version did you test this?

@bjwhite-fnal (Contributor, Author) replied
This would have been 1.26, though I'm not sure which minor version I was using back in January.

@bari12 (Member) commented Nov 3, 2022

I have to test this a bit more, but it looks to me like this should be working in 1.29. I will add a unit test to confirm, but as far as I can see, the parts where we excluded files are no longer there.

ThePhisch added a commit to ThePhisch/rucio that referenced this issue Feb 2, 2023
Added tests to cover more of the undertaker functionality. This was
triggered by issue rucio#5154.

`test_file_did_deletion` tests how the undertaker handles a file DID
that is not attached to any dataset. It asserts that the epoch tombstone
is set on the file replica, regardless of whether rules exist on the
file DID (parametrised via a pytest fixture). Furthermore, the file DID
must not be deleted, as this is the Reaper's job.

`test_file_dids_in_dataset` tests how the undertaker handles an expired
file DID that belongs to a non-expired dataset along with other files.
It asserts that the expired file is detached and its replicas receive
the epoch tombstone. Furthermore, the other files are unchanged.
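The expected behaviour of these two tests can be illustrated with a minimal in-memory sketch. This is not Rucio code: the function, data structures, and the epoch-tombstone value are hypothetical stand-ins for what the undertaker does against the database.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for the epoch tombstone the undertaker sets;
# real Rucio stores DIDs, attachments and replicas in database tables.
EPOCH_TOMBSTONE = datetime(1970, 1, 1)

def undertake_expired_files(files, attachments, replicas, now):
    """Detach expired file DIDs from their dataset and epoch-tombstone
    their replicas; non-expired files are left untouched. The file DIDs
    themselves are kept, since deleting replicas is the Reaper's job."""
    for name, expiry in files.items():
        if expiry is not None and expiry <= now:
            attachments.discard(name)  # detach from the parent dataset
            for rep in replicas[name]:
                rep["tombstone"] = EPOCH_TOMBSTONE

now = datetime(2023, 2, 2)
files = {"expired_file": now - timedelta(days=1), "live_file": None}
attachments = {"expired_file", "live_file"}  # members of a live dataset
replicas = {"expired_file": [{"tombstone": None}],
            "live_file": [{"tombstone": None}]}

undertake_expired_files(files, attachments, replicas, now)
```

After the call, the expired file is detached and its replica carries the epoch tombstone, while the other file and its replica are unchanged.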

`test_file_protected_by_rule` sets up a system in which two files belong
to an expired dataset. At the same time, one of these files is protected
by a rule. The expected outcome is that the unprotected file DID is
detached from the dataset and its replica is tombstoned. On the other
hand, the protected file DID is detached from the dataset, and its
replicas protected by the dataset rule are epoch-tombstoned while its
replicas protected by the external rule remain unchanged. Finally, it is
asserted that the file DIDs remain whereas the dataset DID is deleted.

`test_file_protected_by_dataset` has a similar setup to
`test_file_protected_by_rule`, but instead of a rule acting directly on
the protected file, the protected file is shared between an expired DID
and a non-expired DID. We expect the unprotected file DID to be detached
and its replica to receive the epoch tombstone. Furthermore, we expect
the protected file DID to be detached from the expired dataset but
remain attached to the non-expired dataset; with the protected replica
remaining untouched and the unprotected replica being epoch-tombstoned.

As with the pre-existing `test_removal_all_replicas2`, the tests have
been parametrised to test both `use_temp_tables: True` and `False`.

The pre-existing test `TestUndertaker::test_undertaker` was changed to
include an assertion that the expired dataset DIDs were deleted by the
undertaker. In addition, we assert that the replicas have the
epoch-tombstone (marked by `rucio.db.sqla.constants.OBSOLETE`), instead
of asserting `is not None` (which is more generic and could also include
non-epoch-tombstones).
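The distinction between the strict and the generic assertion can be sketched as follows. The `OBSOLETE` value below is a stand-in for `rucio.db.sqla.constants.OBSOLETE`; its exact value is an assumption for illustration.

```python
from datetime import datetime

# Stand-in for rucio.db.sqla.constants.OBSOLETE, the epoch tombstone;
# the exact sentinel value here is an assumption for illustration.
OBSOLETE = datetime(1970, 1, 1)

def is_epoch_tombstoned(replica):
    # Strict check: an ordinary tombstone (e.g. set to "now") would pass
    # a mere `is not None` check, but is not what the undertaker sets.
    return replica["tombstone"] == OBSOLETE

replica_epoch = {"tombstone": OBSOLETE}
replica_plain = {"tombstone": datetime(2023, 2, 8)}  # non-epoch tombstone
```

The plain replica would satisfy `tombstone is not None` yet fail the stricter equality check, which is why the test was tightened.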
ThePhisch added a commit to ThePhisch/rucio that referenced this issue Feb 2, 2023
In `core/did.py`, the process for DID deletion was extended (both for
`temp_tables` and without). This was necessary because DIDs without
rules still require their replicas to be epoch-tombstoned, and there is
no `delete_rule` process to handle that for them.

Took extra care not to add any SQL calls into the loop over all DIDs.
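The pattern behind that care can be sketched generically: collect the affected keys inside the loop, then issue a single bulk statement afterwards. `bulk_tombstone` below is a hypothetical stand-in for one `UPDATE ... WHERE (scope, name) IN (...)` round-trip, not a Rucio API.

```python
calls = []  # counts simulated database round-trips, for illustration

def bulk_tombstone(store, keys):
    """Hypothetical stand-in for a single bulk UPDATE statement."""
    calls.append(len(keys))
    for k in keys:
        store[k]["tombstone"] = "EPOCH"

store = {("scope", f"file{i}"): {"tombstone": None} for i in range(5)}

to_tombstone = []
for key, did in store.items():  # pure-Python loop, no SQL inside it
    if did["tombstone"] is None:
        to_tombstone.append(key)

bulk_tombstone(store, to_tombstone)  # exactly one round-trip
```

Five DIDs are processed, but the (simulated) database is contacted once.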

===

In `core/rule.py`, the case that no locks exist for a given rule was
handled. Since this would mean that the function
`__delete_lock_and_update_replica` (which sets the epoch tombstone)
would not be called, the epoch tombstone had to be set manually by
updating the Replica.

The Replica is updated by joining the ReplicationRule table to the
DataIdentifierAssociation table to the RSEFileAssociation table via
scope and name. We select the RSEFileAssociation rows for which there
exists a rule that indirectly protects them (a rule assigned to a
dataset which is the parent of a file which has a replica). We
constrain this by requiring that the rule matches a given rule_id, that
the replica is stored on an RSE matched by the rule's RSE expression,
and that the replica has a lock_cnt of 0.
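The relational logic of that join can be sketched in plain Python. The dataclasses below are hypothetical stand-ins for the Rucio ORM models, and the RSE-expression matcher is simplified to string equality; the real code is an SQLAlchemy UPDATE over these three tables.

```python
from dataclasses import dataclass

@dataclass
class Rule:            # stand-in for ReplicationRule (protects a dataset)
    id: str
    scope: str
    name: str          # dataset the rule is attached to
    rse_expression: str

@dataclass
class Attachment:      # stand-in for DataIdentifierAssociation
    scope: str
    name: str          # parent dataset
    child_scope: str
    child_name: str    # file

@dataclass
class Replica:         # stand-in for RSEFileAssociation
    scope: str
    name: str
    rse: str
    lock_cnt: int
    tombstone: object = None

def epoch_tombstone_unlocked(rule_id, rules, attachments, replicas, matches_rse):
    """Epoch-tombstone replicas indirectly protected by rule_id
    (rule -> dataset -> file -> replica), but only where no locks remain."""
    for rule in rules:
        if rule.id != rule_id:
            continue
        children = {(a.child_scope, a.child_name) for a in attachments
                    if (a.scope, a.name) == (rule.scope, rule.name)}
        for rep in replicas:
            if ((rep.scope, rep.name) in children
                    and matches_rse(rule.rse_expression, rep.rse)
                    and rep.lock_cnt == 0):
                rep.tombstone = "EPOCH"

rules = [Rule("r1", "user", "dataset_a", "SITE_A")]
attachments = [Attachment("user", "dataset_a", "user", "file_1")]
replicas = [Replica("user", "file_1", "SITE_A", 0),
            Replica("user", "file_1", "SITE_B", 0),   # RSE not matched
            Replica("user", "file_1", "SITE_A", 1)]   # still locked

epoch_tombstone_unlocked("r1", rules, attachments, replicas,
                         lambda expr, rse: expr == rse)
```

Only the unlocked replica on the matching RSE receives the epoch tombstone; the mismatched and the still-locked replicas are left untouched.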

The structure of the SQLAlchemy Query has been taken from other places
in the code. Choices such as using
`execution_options(synchronize_session=False)` stem from reusing
pre-existing code.
ThePhisch added a commit to ThePhisch/rucio that referenced this issue Feb 8, 2023
(commit message identical to the corresponding Feb 2 commit above)
ThePhisch added a commit to ThePhisch/rucio that referenced this issue Feb 8, 2023
(commit message identical to the corresponding Feb 2 commit above)
@rdimaio rdimaio self-assigned this Jan 11, 2024
@bari12 bari12 removed their assignment Jan 12, 2024
@rdimaio (Contributor) commented Jan 18, 2024

Related old PR for reference: #6063

@rdimaio (Contributor) commented Jan 29, 2024