Transactional Store/ Delete #20

Closed
Richyl2 wants to merge 16 commits from personal/richyl2/dicom-transactional

Conversation


@Richyl2 Richyl2 commented Sep 13, 2019

Description

  • Initial work to synchronize the multiple Azure data stores

return await cloudBlob.CatchStorageExceptionAndThrowDataStoreException(
    async (blockBlob) =>
    {
        // Will throw if the provided resource identifier already exists.

Contributor:

Is this comment right? Exists should not throw if the identifier already exists, should it?

}

/// <inheritdoc />
public async Task<Uri> AddInstanceAsStreamAsync(DicomInstance dicomInstance, Stream buffer, bool overwriteIfExists = false, CancellationToken cancellationToken = default)

Contributor:

nit: AddInstanceFromStreamAsync?

{
    // Already a lease on this blob, delay before attempting to acquire the lease.
    await Task.Delay(TimeSpan.FromSeconds(RetryLeaseDelaySeconds), cancellationToken);
    return await AcquireLeaseAndCheckIfInvalidStateAsync(cancellationToken);

Contributor:

If I am reading the code correctly, this code will basically retry until it can acquire a lease on the blob.

Let's say a request comes in to store a new DICOM file. While it's processing, the lease will be refreshed in the background.
Now suppose another request comes in with the same DICOM file as in the previous step; that request will basically wait until the previous one has either succeeded or failed.

Did I read the logic right?
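
For reference, the pattern I think I'm looking at is roughly the following. This is only a minimal sketch, with a hypothetical tryAcquire delegate and delay constant rather than the PR's actual types:

using System;
using System.Threading;
using System.Threading.Tasks;

// Minimal sketch only; tryAcquire and RetryLeaseDelaySeconds are hypothetical stand-ins.
public static class LeaseRetrySketch
{
    private const int RetryLeaseDelaySeconds = 1;

    // tryAcquire returns a lease id, or null while another request still holds the lease.
    public static async Task<string> AcquireLeaseWithRetryAsync(
        Func<CancellationToken, Task<string>> tryAcquire,
        CancellationToken cancellationToken)
    {
        while (true)
        {
            string leaseId = await tryAcquire(cancellationToken);
            if (leaseId != null)
            {
                // Lease acquired; the caller stores the blob while the lease is refreshed in the background.
                return leaseId;
            }

            // Another request (e.g. storing the same DICOM file) holds the lease, so wait and retry.
            // This effectively blocks until the other request succeeds or fails and the lease goes away.
            await Task.Delay(TimeSpan.FromSeconds(RetryLeaseDelaySeconds), cancellationToken);
        }
    }
}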

    _queueClient = queueClient;
}

public ITransaction CreateDicomQueueTransaction()

Contributor:

I am assuming the queue is used to retry cleanup if the delete fails? A queue might not be sufficient, though, since inserting the item into the queue can itself fail. We probably still need some sort of background service to clean up orphaned items.
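
For example, something like a periodic sweep that runs regardless of the queue; a rough sketch, where IOrphanCleaner and the scan interval are hypothetical:

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Rough sketch of a background sweep; IOrphanCleaner is a hypothetical abstraction that
// finds and removes orphaned blobs/index entries left behind by failed deletes.
public interface IOrphanCleaner
{
    Task CleanupOrphanedItemsAsync(CancellationToken cancellationToken);
}

public class OrphanCleanupService : BackgroundService
{
    private static readonly TimeSpan ScanInterval = TimeSpan.FromMinutes(5); // assumed interval

    private readonly IOrphanCleaner _cleaner;

    public OrphanCleanupService(IOrphanCleaner cleaner)
    {
        _cleaner = cleaner;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            // Even if a queue message is lost or never enqueued, this periodic scan
            // still catches the orphaned data.
            await _cleaner.CleanupOrphanedItemsAsync(stoppingToken);
            await Task.Delay(ScanInterval, stoppingToken);
        }
    }
}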


if (instances.Length > 0)
{
    // Attempt to delete the instance indexes and instance metadata.

Contributor:

One thing I can see being challenging is how to clean up orphaned data when contents are partially deleted. The task would have to scan both blob storage and the cosmos metadata, determine the state of each, and then decide what action to take.

Have you considered using the cosmos metadata as a way to keep track of the "state" of the record and drive the API behavior?

Essentially, what we have is 1 cosmos metadata document (study + series), 1 blob per instance (study + series + instance), and 1 blob per metadata (for the study).

Adding DICOM data:

  1. Add an empty cosmos metadata document with state "Creating" and "touch" the document in the background to refresh the state expiration timestamp.
  2. Store the blob.
  3. Store the metadata.
  4. Update the cosmos index with the content and update the state to "Created."
  • If any step before 4 fails, we treat the API call as failed and return the error to the user.
  • We will need to clean up any incomplete data, so as part of the rollback we will try to delete anything that was created. The cosmos metadata document will be the last item to be deleted.
  • In case the rollback fails, we will have a background job that scans through the cosmos metadata and deletes any item in the "Creating" state older than X minutes.

However, with step #1 we could get a conflict back if the cosmos index already exists, in which case we can check the state of the existing data.

If the state is "Created", then we will return Conflict since that means the data already exists.
If the state is "Creating", we will apply logic similar to yours, e.g. by waiting.

Deleting DICOM data:

  1. Update the existing cosmos metadata document with state "Deleting" and "touch" the document in the background to refresh the state expiration timestamp.
  2. Delete the metadata.
  3. Delete the blob.
  4. Delete the cosmos index.
  • Because we "soft-delete" the cosmos metadata document, the API can trigger the cleanup in the background and return right away (if we want to).
  • In case of failures, we will have a background job that scans through the cosmos metadata and deletes any item in the "Deleting" state.

The retrieve and query operations will respect the cosmos metadata state and only return items that are in the "Created" state.
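
And the delete side, reusing the hypothetical IMetadataDocumentStore and IndexState from the sketch above, with the actual blob/metadata deletion abstracted into a delegate:

using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the delete flow, reusing the hypothetical IMetadataDocumentStore and IndexState above.
public class DeleteFlowSketch
{
    private readonly IMetadataDocumentStore _index;

    public DeleteFlowSketch(IMetadataDocumentStore index) => _index = index;

    public async Task DeleteAsync(
        string studyUid,
        Func<CancellationToken, Task> deleteBlobsAndMetadata,
        CancellationToken ct)
    {
        // 1. Soft-delete: mark the document "Deleting" so retrieve/query stop returning it.
        await _index.SetStateAsync(studyUid, IndexState.Deleting, ct);

        // 2 & 3. Delete the metadata blob and the instance blob(s). This could also run in the
        // background, since the document is already soft-deleted.
        await deleteBlobsAndMetadata(ct);

        // 4. Finally remove the cosmos index document itself.
        await _index.DeleteAsync(studyUid, ct);
    }
}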

Cleaning up:

  • The background job can fetch any cosmos metadata document in the "Creating" state (older than X minutes) or the "Deleting" state and delete the associated blobs.

It's similar to your approach, but we have a single list to go through for what needs to be cleaned up, so that logic could be a little simpler.
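
The eligibility check for that background job is basically just the following (again using the hypothetical IndexState, with an assumed threshold standing in for "X minutes"):

using System;

// Which index documents the background cleanup job should pick up.
public static class CleanupEligibility
{
    private static readonly TimeSpan StaleCreatingThreshold = TimeSpan.FromMinutes(10); // "X minutes"

    public static bool NeedsCleanup(IndexState state, DateTimeOffset lastStateUpdate, DateTimeOffset now)
    {
        // "Deleting" documents are always eligible; "Creating" documents only once they look
        // abandoned, i.e. the background "touch" has stopped refreshing the timestamp.
        return state == IndexState.Deleting
            || (state == IndexState.Creating && now - lastStateUpdate > StaleCreatingThreshold);
    }
}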

Of course, we will have to be able to handle the case where the cosmos metadata document is in the "Creating" or "Deleting" state and another store call comes in for the same study.

I haven't fully thought through the problem, so there might be holes in the logic. Since you have been thinking about this, can you see what might not work?

@jackliums jackliums commented Sep 27, 2019

Hmm, after writing this I realized that it doesn't actually work, because the cosmos metadata document contains information about all of the instances belonging to that study. So if someone is deleting a single instance, for example, we still need to be able to serve retrieves for the rest of the instances in there; it's not simply a matter of deleting the document, it also needs to support removing items from the document.

I need to think about this a bit more.

@smithago smithago closed this Mar 20, 2020
@StevenBorg StevenBorg deleted the personal/richyl2/dicom-transactional branch September 22, 2020 03:41
@StevenBorg StevenBorg restored the personal/richyl2/dicom-transactional branch September 22, 2020 03:41
@smithago smithago deleted the personal/richyl2/dicom-transactional branch October 20, 2021 18:15