-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URGENT: Fix util #1324
URGENT: Fix util #1324
Conversation
move_files=True): | ||
r"""Inserts `filepaths` in the database. | ||
|
||
Since the files live outside the database, the directory in which the files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mvoes -> moves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Minor comments. Now, what about extending the idea of transactions so we only execute code once we know the transaction is done. For example, if something fails and you are using purge_filepaths do not move them until the full transaction is done. This could also be helpful for patches. Also, the codes looks fine but I think the biggest worry is not about the changes you made but about the possible missing lines that needed changing. Is there a way to verify that things that calls to any function that has SQL are actually being run in transaction? |
Thanks @antgonza. Once everything is changed, a quick search in the code base for SQLConnectionHandler will tell us if there is anything executed outside a transaction. I agree with you re executing some code once the transaction is done (e.g. moving files). This is a tricky question and it really depends on the operation performed. For example, in purge_filepaths, if the transaction successfully completes and there is a problem removing files, it is not a big deal, as the only "bad" thing is that the file system is dirty (which is easy to write a function for cleaning it up in a cron job). However, in move_filepaths_to_uploads_folder, the function is usually executed when unlinking files from raw data. If there is a problem moving files but the transaction is committed, those files are "lost" from the user point of view: they live under the raw data folder, but we've lost the filename and the user cannot access them. In this case, you want to rollback the operation. The only idea that comes to my mind is this: What do you think @antgonza, is this a good solution? |
# alrady exists on the DB | ||
db_path = partial(join, base_fp) | ||
new_filepaths = [ | ||
(db_path("%s_%s" % (obj_id, basename(path))), id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually avoid id
as a var name, but I see this was like this in the original version of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, done.
Looks good @josenavas, just a few tiny comments and a question. |
Thanks @ElDeveloper and @antgonza I've added some functionality to the Transaction object so we can make sure that functions like purge_filepaths do not break the DB. |
self._post_commit_funcs = [] | ||
self._post_rollback_funcs = [] | ||
if error_msg: | ||
raise RuntimeError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find a test that checks for a case where 💩 hits the fan, can you add one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
Thanks @ElDeveloper |
@@ -1012,6 +1014,22 @@ def execute_fetchindex(self, idx=-1): | |||
""" | |||
return self.execute()[idx] | |||
|
|||
def _funcs_executor(self, cmds, func_str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think you still missed changing cmds here and in the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I think I've fixed them all now...
The |
Those calls are inside the commit and rollback functions |
Ah, ok. I get the centralization logic here. |
error_msg = [] | ||
for cmd in cmds: | ||
try: | ||
cmd() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will these function calls ever need parameters passed? If so, we should have a way to do this (kwargs or whatever is easiest).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please read the documentation. There are ways or making this viable always, check the purge_filepath function.
I think this looks ready ... 👍 |
Ready for another review. Given that @squirrelo felt strong about it, I end up fixing #1325. However, I did not implment @squirrelo proposal. His proposal was hackish and not standard, so I've actually implemented a standard way of doing this. |
👍 |
This PR fixes the util.py file to use transactions.
Since some function signatures changed, I had to fix other parts of the code to make the tests to pass.
Any future PR will be based on this one, hence that is why it is urgent (the other ones will be in parallel, but they'll rely on the changes in util.py).
Some comments:
purge_filepaths
is tricky. It is not used anywhere but it has the potential of leaving the DB out of sync with the FS if it is executed inside a bigger transaction. I haven't put any check on it, but one option will be to make it use it's own transaction. Since each Transaction object has it's own connection, this will not be an issue. However, I'd like to hear what others think.get_preprocessed_params_tables
was missing tests and it was incorrect.find_repeated
was not used anywhere, I removed it as we can use skbio'sfind_duplicates