Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Transaction object #1276

Merged

Conversation

josenavas
Copy link
Contributor

This PR is the first of a series of PR to be able to modify Qiita DB to make use of transactions everywhere.

As discussed in the last meeting, instead of modifying the current queue system it will be better to add a Transaction object that encapsulates the transaction.

A few changes from the original design:

  • The placeholders, instead of being of the form {#:#} -> {query_idx:result_idx} are of the form {#:#:#} -> {query_idx:row_id:value_idx}. This allow for more flexibility and it is easier to reference an actual result. It also removes the need of flattening the results.
  • The execute method has a boolean parameter commit, optional and default True. This allows to execute all the current queries in the transaction without committing it, which is useful if you need an intermediate result of the transaction to execute some python code (e.g. instantiate an object to retrieve some other information needed for adding more queries to the DB)
  • The execute methods returns always all the results. You can retrieve then a specific result using the index, or using [-1] to retrieve the result of the last query. This last part was impossible before unless you knew how many results where returned by the last query.

The tests are expected to fail, since the rest of the code has not been modified. Those will be further PR.

Parameters
----------
sql : str
The sql query
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor sql -> SQL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@josenavas
Copy link
Contributor Author

Thanks @ElDeveloper ! Comments addressed

@ElDeveloper
Copy link
Member

Awesome, thanks! 👍

On (Jun-19-15|14:23), josenavas wrote:

Thanks @ElDeveloper ! Comments addressed


Reply to this email directly or view it on GitHub:
#1276 (comment)

self.assertEqual(obs._index, 0)
self.assertTrue(isinstance(obs._conn_handler, SQLConnectionHandler))

def test_replace_placeholders(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test here for when there are a few inserts or updates followed by another select, so for example res5a would be next in the results list, as query 5.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically we need a test for when you have a select, followed by inserts that use the select, then anther select, then inserts that use the second select.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@josenavas
Copy link
Contributor Author

@squirrelo, exactly, but sometimes to generate those files you need the data in the DB. Transactions allow you to execute some queries, try to generate the files, and then commit them. There is no other way of doing this consistency that executing one (or many) SQL queries w/o committing and then commit after everything is in place...

@squirrelo
Copy link
Contributor

Is there a way of querying inside the transaction cursor, as opposed to the database itself? That way you are querying as if the items are already committed, but they actually aren't. I know sqlalchemy can do this.

Alternately, if you need the query information, you can make the query info list available through a function, e.g. get_result(index) so you can use the SQL results without having to commit.

@josenavas
Copy link
Contributor Author

@squirrelo What you're proposing is just executing the query and not commit it. You're saying this:

with connect(params) as con:
    with con.get_cursor() as cur:
        try:
            cur.execute("DO STUFF")
        except:
            con.rollback()
     if checks_pass:
         con.commit()
     else:
         con.rollback()

This the exact same thing that I'm proposing with the commit=False parameter, but the transaction object allows you to do this like this:

with Transaction('t') as t:
    t.add("DO STUFF")
    t.execute(commit=False)
    if not check_pass:
        t.rollback()

Don't let the terminology confuse you, you don't do anything "inside" a cursor, you do it directly to the DB, the difference is that you commit it or not.

@squirrelo
Copy link
Contributor

Actually what I'm proposing is more like:

with Transaction('t') as t:
    t.add("DO STUFF")
    t.add("QUERY NEEDED STUFF")
    needed_info = t.get_results(t.index-1)
    # use needed info to make files

@josenavas
Copy link
Contributor Author

@squirrelo

And how can I return the result if I don't execute the query w/o committing?
Your proposal, will be doing this internally:

def get_results(index):
    res = self.execute(commit=False)
    return res[-1]

@squirrelo
Copy link
Contributor

I've found a thing that may fix all our problems: Tornado sessions object This locks a transaction thread until explicitly freed. If this works the way I think it does, we really should be using this as our backend instead of a straight connection like we do now.

@josenavas
Copy link
Contributor Author

You're proposing 2 things:

  1. Removing psycopg2 as a dependency
  2. Coupling qiita_db with tornado.

I don't agree with this, we have already seen that is a bad idea coupling qiita_pet and qiita_db...

@josenavas
Copy link
Contributor Author

Also, I'm not seeing anything about commit/rollback on that documentation... I don't even know how this would work...

@squirrelo
Copy link
Contributor

That's fair. ISOLATION_LEVEL_SERIALIZABLE might be the best in this case then, which if I'm reading correctly would make the cursor essentially stall the database while running the execution. This is the only way I can see not having the issue of running execute(commit=False) and not worrying about the database changing before the actual database execute with commit.

@josenavas
Copy link
Contributor Author

No, that would not solve the issue.

Let me resume the issue that we have in one sentence:
We are doing commits that we shouldn't do

That's the only problem that we're having, I don't know how else I can explain this.

@squirrelo
Copy link
Contributor

I get that, but my worry is something else, like another user, starts a transaction and adds something we needed between the commit=false and the actual commit. If this is not a worry and I'm just being paranoid, then OK, but just want to make sure.

@josenavas
Copy link
Contributor Author

The question is, how another user is going to add something that we need? If we need it, we shouldn't be relying in another user to add it.

Also, with commit = False, we are locking the tables, so no other user can modify those tables, which is the idea of SQL transaction.

@squirrelo
Copy link
Contributor

OK, cool. I'm paranoid then. Carry on.

@ElDeveloper
Copy link
Member

We can make clear in the documentation that it will commit automatically, ...

Seems like a fine solution to me!

@josenavas
Copy link
Contributor Author

@ElDeveloper @squirrelo This is ready for another review.

elif self._queries:
# There are still queries to be executed, execute them
# It is safe to use the execute method here, as internally is
# wrapped in a tr/except and rollbacks in case of failure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tr -> try

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ElDeveloper
Copy link
Member

BTW, 👍.

def __exit__(self, exc_type, exc_value, traceback):
# We need to wrap the entire function in a try/finally because
# at the end of the function we need to set _is_inside_context to false
try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This try/finally can be removed by just moving self._is_inside_context = False tothe first line of this function. Regardless of what happens after, you are always going to be outside the context when exit is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, I can't. If I move the self._is_inside_context = False to the beginning of the function, any call to the Transaction methods (rollback, execute and commit) done in this method will raise a RuntimeError

@josenavas
Copy link
Contributor Author

Ready for another review!

squirrelo added a commit that referenced this pull request Jun 24, 2015
@squirrelo squirrelo merged commit 5f1c1b1 into qiita-spots:improve-sql-queues Jun 24, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants