Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete runs in daterange in batches #433

Merged
merged 3 commits into from Sep 9, 2021

Conversation

Roemer
Copy link
Contributor

@Roemer Roemer commented Sep 7, 2021

This PR changes the behavior of the deletion of runs in a specified date range.
Previously, it used a very slow aggregate to find the instances for the runs that should be deleted and tried to delete them all at once. This timeouted at various ends in larger setups.
With this change, it will now process the deletion in batches:

  1. Get a batch of runIds
  2. Delete all associated instances for those runIds
  3. Delete the runs for those runIds
  4. Repeat until no more runs are found for the defined date-range

To make it even faster, a new index for runId is added to the instance collection.

Copy link
Collaborator

@agoldis agoldis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @Roemer! Thanks a lot for fixing that code I am happy to see that people use Sorry Cypress to run as many test that we are facing performance issues 😬

Just a few minor comments:

  • batched iteration on results is great and simple! I think the same result could be achieved with native mongoDB cursor, but your implementation is totally find!
  • if we are really looking to optimize it, we can limit the fields we get from getRunsInDateRange by explicitly fetching only runIds. If I am not mistaken, mongoDB won't even read data from the disk and send the results back straight from the runId index on runs collection.

Comment on lines 139 to 146
async deleteRunsInDateRange(
startDate: Date,
endDate: Date,
limit: number = 0
) {
const getResult = await this.getRunsInDateRange(startDate, endDate, limit);
return this.deleteRunsByIds(getResult.runIds);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is not used, please delete


const runIds = response.map((x) => x.run[0].runId) as string[];
return await this.deleteInstancesByRunIds(runIds);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

break;
}
// Delete the runs
const runsDeleteResponse = await resolvers.Mutation.deleteRuns(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice reuse of mutation 👍🏻

@agoldis
Copy link
Collaborator

agoldis commented Sep 8, 2021

P.S. I've merged #427 into master and updated this branch, don't be scared

@Roemer
Copy link
Contributor Author

Roemer commented Sep 8, 2021

I rebased it back to master instead of merged and removed the unneeded method and added a projection when getting the runs.
Good point about the cursor. I had a look at it and it would not really fit well into the current architecture. We would need to add the cursor to datasources/run.ts and actually return it so that we can handle the deletion of the instances, runs and runtimeouts in the mutation itself. But then we would have a datasource related object (the cursor) in the mutation itself which seems wrong. Or we would pass a function to the run datasource which would handle that all, but that also seems not so nice. So I think for a first version, we can go with this.
Testing is still pending on my side. I will probably need to do that tomorrow.

@Roemer
Copy link
Contributor Author

Roemer commented Sep 9, 2021

I now tested it and I could delete 2 months worth of data in a few seconds ;) So from my side, this PR can be merged.

@agoldis agoldis merged commit c700d89 into sorry-cypress:master Sep 9, 2021
@Roemer Roemer deleted the feature/CleanupInBatches branch September 14, 2021 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants