
URGENT: Fix util #1324

Merged · 14 commits · Jul 4, 2015


josenavas
Contributor

This PR changes util.py to use transactions.
Since some function signatures changed, I had to fix other parts of the code to make the tests pass.
Any future PR will be based on this one, which is why it is urgent (the other PRs will proceed in parallel, but they'll rely on the changes in util.py).

Some comments:

  • purge_filepaths is tricky. It is not used anywhere, but it has the potential to leave the DB out of sync with the FS if it is executed inside a bigger transaction. I haven't added any check for this, but one option would be to make it use its own transaction. Since each Transaction object has its own connection, this would not be an issue. However, I'd like to hear what others think.
  • get_preprocessed_params_tables was missing tests and was incorrect.
  • find_repeated was not used anywhere; I removed it, since we can use skbio's find_duplicates.

josenavas added this to the Alpha 0.2 milestone on Jul 3, 2015
josenavas mentioned this pull request on Jul 3, 2015
move_files=True):
r"""Inserts `filepaths` in the database.

Since the files live outside the database, the directory in which the files
Member

mvoes -> moves

Contributor Author

Done

@antgonza
Member

antgonza commented Jul 3, 2015

Minor comments. Now, what about extending the idea of transactions so that we only execute some code once we know the transaction is done? For example, if something fails while you are using purge_filepaths, do not move the files until the full transaction is done. This could also be helpful for patches. Also, the code looks fine, but I think the biggest worry is not the changes you made but the lines that may still need changing. Is there a way to verify that all calls to functions that execute SQL are actually run inside a transaction?

@josenavas
Contributor Author

Thanks @antgonza. Once everything is changed, a quick search in the code base for SQLConnectionHandler will tell us if there is anything executed outside a transaction.
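A text search like this only shows where the handler is referenced; it cannot prove that each call runs inside a transaction, but it narrows down what needs reviewing. A minimal sketch of that search in Python, assuming it is pointed at the package directory (the function name `find_handler_uses` is hypothetical, not part of Qiita):

```python
import os


def find_handler_uses(root, needle="SQLConnectionHandler"):
    """Return (path, line_number, line) for every line of a .py file
    under `root` that mentions `needle`."""
    hits = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue  # only scan Python sources
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on odd encodings
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        hits.append((path, lineno, line.rstrip()))
    return hits
```

An empty result once the migration is finished would indicate that no code path still opens a bare handler.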

I agree with you regarding executing some code only once the transaction is done (e.g. moving files). This is a tricky question, and the answer really depends on the operation performed.

For example, in purge_filepaths, if the transaction completes successfully and there is a problem removing files, it is not a big deal: the only "bad" outcome is that the file system is dirty, and it is easy to write a function that cleans it up in a cron job.

However, move_filepaths_to_uploads_folder is usually executed when unlinking files from raw data. If there is a problem moving files but the transaction is committed, those files are "lost" from the user's point of view: they live under the raw data folder, but we've lost the filename and the user cannot access them. In this case, you want to roll back the operation.

The only idea that comes to my mind is this:
We add a list of tuples to the Transaction object, where each tuple is of the form (src_path, dest_path). As the code runs, we move the files as usual, but we also record each move in this list. If the transaction rolls back at any point, it is in charge of moving those files back, so the DB and the filesystem stay in sync.
We also add two lists of strings: "paths to remove on commit" and "paths to remove on rollback". The transaction removes the files in the former if it commits (e.g. for purge_filepaths) and the files in the latter if it rolls back (e.g. when creating a qiime_mapping_file). This way, we can make sure that every filepath in the DB is present in the filesystem, while allowing the filesystem to contain files that are not in the DB (which I think is not a big deal).

What do you think @antgonza, is this a good solution?
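A minimal sketch of the proposed bookkeeping, kept separate from the database so it can run on its own. The class name `FileTransactionLog` and its attributes are hypothetical; in the real Transaction object, `commit` and `rollback` would first commit or roll back the DB connection and then run these filesystem steps.

```python
import os
import shutil


class FileTransactionLog:
    """Hypothetical helper: track filesystem changes so they can be
    reconciled with the DB when the transaction commits or rolls back."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.moved = []               # (src_path, dest_path) moves already done
        self.remove_on_commit = []    # e.g. files purged by purge_filepaths
        self.remove_on_rollback = []  # e.g. a freshly created mapping file

    def move(self, src, dest):
        # Move right away, but remember how to undo it.
        shutil.move(src, dest)
        self.moved.append((src, dest))

    def commit(self):
        # DB changes are permanent: drop files the DB no longer references.
        for path in self.remove_on_commit:
            if os.path.exists(path):
                os.remove(path)
        self._reset()

    def rollback(self):
        # Undo moves in reverse order so chained moves revert correctly.
        for src, dest in reversed(self.moved):
            shutil.move(dest, src)
        for path in self.remove_on_rollback:
            if os.path.exists(path):
                os.remove(path)
        self._reset()
```

Undoing the moves in reverse order matters when one file is moved twice within the same transaction.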

# already exists in the DB
db_path = partial(join, base_fp)
new_filepaths = [
(db_path("%s_%s" % (obj_id, basename(path))), id)
Member

We usually avoid id as a var name, but I see this was like this in the original version of the code.

Contributor Author

Good point, done.
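For reference, the `partial(join, base_fp)` idiom above builds a one-argument path constructor rooted at `base_fp`. A self-contained sketch of the same list construction with the variable renamed, assuming the second tuple element is a filepath type id (both `build_db_filepaths` and `fp_type_id` are hypothetical names, not taken from the PR):

```python
from functools import partial
from os.path import basename, join


def build_db_filepaths(base_fp, obj_id, filepaths):
    """Place each file under `base_fp`, prefixing its basename with `obj_id`.

    `filepaths` is a list of (path, fp_type_id) tuples; `fp_type_id` is used
    instead of `id` so the builtin is not shadowed.
    """
    db_path = partial(join, base_fp)  # db_path(name) == join(base_fp, name)
    return [(db_path("%s_%s" % (obj_id, basename(path))), fp_type_id)
            for path, fp_type_id in filepaths]
```

On a POSIX system, `build_db_filepaths("/data/raw_data", 1, [("/tmp/seqs.fastq", 7)])` yields `[("/data/raw_data/1_seqs.fastq", 7)]`.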

@ElDeveloper
Member

Looks good @josenavas, just a few tiny comments and a question.

@josenavas
Contributor Author

Thanks @ElDeveloper and @antgonza

I've added some functionality to the Transaction object so that functions like purge_filepaths cannot leave the DB and the filesystem out of sync.

self._post_commit_funcs = []
self._post_rollback_funcs = []
if error_msg:
raise RuntimeError(
Member

I can't find a test that checks for a case where 💩 hits the fan, can you add one?

Contributor Author

Sure!

@josenavas
Contributor Author

Thanks @ElDeveloper

@@ -1012,6 +1014,22 @@ def execute_fetchindex(self, idx=-1):
"""
return self.execute()[idx]

def _funcs_executor(self, cmds, func_str):
Member

Thanks, I think you still missed changing cmds here and in the tests.

Contributor Author

Good call. I think I've fixed them all now...

@squirrelo
Contributor

The _clean_up function seems to be missing the func calls for post-rollback and post-commit (line 750).

@josenavas
Contributor Author

Those calls are inside the commit and rollback functions

@squirrelo
Contributor

Ah, ok. I get the centralization logic here.
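The centralization discussed above can be sketched as follows. This is a minimal stand-in, not Qiita's actual Transaction class: `add_post_commit_func` and `add_post_rollback_func` are assumed names. The executor's argument is named `funcs` here, following the review request to rename `cmds`. It mirrors the diff excerpts: both queues are reset before a `RuntimeError` is raised, and every queued function runs even if an earlier one fails.

```python
class Transaction:
    """Stand-in showing how post-commit/post-rollback functions are run."""

    def __init__(self):
        self._post_commit_funcs = []
        self._post_rollback_funcs = []

    def add_post_commit_func(self, func):
        self._post_commit_funcs.append(func)

    def add_post_rollback_func(self, func):
        self._post_rollback_funcs.append(func)

    def _funcs_executor(self, funcs, func_str):
        # Run every queued function, collecting failures instead of stopping.
        error_msg = []
        for func in funcs:
            try:
                func()
            except Exception as e:
                error_msg.append(str(e))
        # The queues are single-use: clear both once one of them has run.
        self._post_commit_funcs = []
        self._post_rollback_funcs = []
        if error_msg:
            raise RuntimeError(
                "An error occurred during the post %s commands:\n%s"
                % (func_str, "\n".join(error_msg)))

    def commit(self):
        # A real implementation would commit the DB connection first.
        self._funcs_executor(self._post_commit_funcs, "commit")

    def rollback(self):
        # A real implementation would roll back the DB connection first.
        self._funcs_executor(self._post_rollback_funcs, "rollback")
```

Because `_clean_up` only ends up in `commit` or `rollback`, placing the executor calls there covers both paths.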

error_msg = []
for cmd in cmds:
try:
cmd()
Contributor

Will these function calls ever need parameters passed? If so, we should have a way to do this (kwargs or whatever is easiest).

Contributor Author

Please read the documentation. There are always ways of making this work; check the purge_filepaths function.
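One common way to queue a function that needs arguments behind an executor that calls each entry with no arguments is to bind the arguments up front with `functools.partial`. This is offered as an assumption, not necessarily what purge_filepaths does. The sketch uses a plain list in place of a Transaction's queue and a hypothetical `remove_file` that records calls instead of deleting anything. An alternative design stores `(func, args, kwargs)` tuples and calls `func(*args, **kwargs)` in the executor.

```python
from functools import partial

post_commit_funcs = []  # stand-in for a Transaction's post-commit queue
removed = []


def remove_file(path, force=False):
    # Hypothetical cleanup that needs arguments; records instead of deleting.
    removed.append((path, force))


# Bind the arguments now; the executor later calls each entry as func().
post_commit_funcs.append(partial(remove_file, "/data/old.fastq", force=True))
post_commit_funcs.append(partial(remove_file, "/data/old.qual"))

for func in post_commit_funcs:
    func()
```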

@antgonza
Member

antgonza commented Jul 4, 2015

I think this looks ready ... 👍

@josenavas
Contributor Author

Ready for another review.
I've created the function execute_fetchflatten to centralize in a single function the loop that we were repeating in multiple places.

Given that @squirrelo felt strongly about it, I ended up fixing #1325. However, I did not implement @squirrelo's proposal: it was hackish and non-standard, so I implemented a standard way of doing this instead.
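The flattening that execute_fetchflatten centralizes can be sketched with `itertools.chain.from_iterable`, which turns a list of result rows into a single flat list. This is a minimal sketch with an in-memory `sqlite3` database standing in for Qiita's PostgreSQL connection; the function signature here is an assumption and differs from the method on the Transaction object.

```python
import sqlite3
from itertools import chain


def execute_fetchflatten(conn, sql, params=()):
    """Run `sql` and return every value of every result row as one flat list,
    replacing the repeated `[x[0] for x in cursor.fetchall()]` pattern."""
    cur = conn.execute(sql, params)
    return list(chain.from_iterable(cur.fetchall()))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE study (study_id INTEGER)")
conn.executemany("INSERT INTO study VALUES (?)", [(1,), (2,), (3,)])
print(execute_fetchflatten(conn, "SELECT study_id FROM study ORDER BY study_id"))
# → [1, 2, 3]
```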

@ElDeveloper
Member

👍

ElDeveloper added a commit that referenced this pull request on Jul 4, 2015
ElDeveloper merged commit e4764b2 into qiita-spots:transaction on Jul 4, 2015
4 participants