Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Periodic database cleaning #144

Merged
merged 1 commit into from
Dec 7, 2023
Merged

Periodic database cleaning #144

merged 1 commit into from
Dec 7, 2023

Conversation

olehkorkh-planeks
Copy link
Collaborator

@olehkorkh-planeks olehkorkh-planeks commented Dec 4, 2023

Fix for #82

@antoinefalisse
Copy link
Collaborator

@suhlrich and @AlbertoCasasOrtiz, tagging you both on this one (I don't want us to start deleting data). Can you both review? We can then do some testing on the dev side.

@antoinefalisse antoinefalisse merged commit 8a6150e into dev Dec 7, 2023
@antoinefalisse
Copy link
Collaborator

@olehkorkh-planeks two remarks regarding this PR:

  1. This line selects 500 sessions, the problem is that if, after some cleaning maybe, the 500 first selected sessions are good then there will be no more cleaning even if there are still bad sessions in there. That's why I had to bump to 500 to clean the entire dev db. Is there a way to adjust the task to make sure we go through the entire db? The prod db has over 40k sessions. I can see that it is not practical to filter the entire db every time, maybe we can find a way to filter it once and then we only select part of it (as is now). Let me know your thoughts.

  2. We need to use a more conservative criterion here. We noticed that we have sessions with data we want to keep but with no neutral trial. FYI it is possible to directly navigate to step5 and record data even if there is no neutral. Let's use the following:

    • delete session if empty (no trials) AND older than 7 days
    • delete session if all the existing trials are named calibration AND all the trials have status error AND older than 7 days

FYI, we also noticed that we might not have a proper backup of the DB and S3 in place, so we might want to implement that before merging this into main. Regardless, it would be good to address the two points above (then maybe we comment for now). Thanks.

FYI @suhlrich and @AlbertoCasasOrtiz

@olehkorkh-planeks
Copy link
Collaborator Author

@antoinefalisse I prepared a fixed implementation. Let me know if it works for you - #144

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants