Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLAlchemy: Fix session scope / management #980

Merged
merged 18 commits into from Mar 14, 2019
Merged

Conversation

@dbczumar
Copy link
Collaborator

dbczumar commented Mar 11, 2019

The SQLAlchemyStore currently uses a single session per class instance (except for the create_experiment method). This session is never explicitly closed, and it's only committed after write operations. Because sessions represent long-lived database connections in certain framework-specific implementations like MySQL, failing to commit / roll back / close sessions can result in leaked connections and produce hanging behavior when sessions are left in intermediate states.

In contrast to our implementation, the SQLAlchemy session basics documentation recommends (requires?) that a session be committed or rolled back after any database interaction (read, write, update, etc); note that this does not mean that a commit or roll back must happen immediately - it just needs to occur eventually. This documentation also recommends that all sessions be closed via session.close() when they are no longer needed.

It is also recommended that sessions be constructed independently of the functions that use them to read or modify data. Finally, sessions should be exception-safe; unexpected errors in execution should never prevent a session from being committed / rolled back and closed. To achieve this safety, SQLALchemy suggests an approach that uses the context manager:

### another way (but again *not the only way*) to do it ###

from contextlib import contextmanager

@contextmanager
def session_scope():
    """Provide a transactional scope around a series of operations."""
    session = Session()
    try:
        yield session
        session.commit()
    except:
        session.rollback()
        raise
    finally:
        session.close()


def run_my_program():
    with session_scope() as session:
        ThingOne().go(session)
        ThingTwo().go(session)

Accordingly, this PR modifies the management of sessions in the SQLAlchemyStore to conform to these guidelines.

@dbczumar dbczumar requested a review from mparkhe Mar 11, 2019

def get_metric_history(self, run_uuid, metric_key):
metrics = self.session.query(SqlMetric).filter_by(run_uuid=run_uuid, key=metric_key).all()
return [metric.to_mlflow_entity() for metric in metrics]
with self.ManagedSessionMaker() as session:

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 11, 2019

Author Collaborator

Technically, this method does not strictly adhere to the principle that session construction should happen independently of the function that uses the session to communicate with the database (the ManagedSessionMaker is used to create a session in the same method that later performs a DB lookup using the session).

We can define an internal _get_metric_history() method to achieve this separation, or we can recognize that the separation is a recommended SQLAlchemy pattern for avoiding errors that we may not always follow if we're confident that a certain function is implemented correctly (as I believe is the case here).

What are your thoughts, @mparkhe?

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 12, 2019

Contributor

IMO, this is fine. I am not sure if this principle "session construction should happen independently of the function that uses the session to communicate with the database" makes too much sense from the design perspective. I suppose most of the other functions fit the spec that you might have multiple APIs sharing functions so create the session in the public API for SqlAlchemyStore since that is the first entry point.

Ideally, in fact, I might be tempted to have principle of creating a session closets to DB access. However, that might be a bit of an overuse if you have a single call stack needing to make a couple of DB calls to respond to an API request. Also they way you have entrypoint APIs creating sessions is awesome in case we want to share sessions etc. in future. If the same function can make DB calls, so be it.

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Makes sense to me!

Copy link
Contributor

mparkhe left a comment

Couple of concerns about sessions and exceptions log_metrics and log_params.


def _get_or_create(self, model, **kwargs):
instance = self.session.query(model).filter_by(**kwargs).first()
def _get_or_create(self, model, session, **kwargs):

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 12, 2019

Contributor

super nit

To be conformant to other queries that need session, recommend putting session as first param after self.

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Done! I've moved session to be the first argument for all functions that require it.


self._save_to_db([run])
self._save_to_db(objs=run, session=session)

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 12, 2019

Contributor

ooh! nice catch--micro optimization there.

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Thanks!

# ToDo: Consider prior checks for null, type, metric name validations, ... etc.
self._get_or_create(model=SqlMetric, run_uuid=run_uuid, key=metric.key,
value=metric.value, timestamp=metric.timestamp, session=session)
except sqlalchemy.exc.IntegrityError as ie:

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 12, 2019

Contributor

Is this logic gonna break down? If you share the session here -- _get_or_create will throw and then for the remainder of the function you will use the same session. If I am not mistaken, run.metrics looks like an object access but is a lazy SQL lookup due to the way data model is set up as a relationship between "runs" and "metrics" tables. After an error, this SQL query is invalid because SQLAlchemy requires that you rollback a session before performing new operations. I suspect, you might have to use a new session.

Also here -- the logic might conflate with multiple recorded metrics with different values for same timestamp.

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Discussed offline - this logic would indeed break down. Solved the problem by adding an explicit commit() statement before the except statement and an explicit rollback() after the except statement with substantial explanatory comments.

# ToDo: Consider prior checks for null, type, param name validations, ... etc.
self._get_or_create(model=SqlParam, session=session, run_uuid=run_uuid,
key=param.key, value=param.value)
except sqlalchemy.exc.IntegrityError as ie:

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 12, 2019

Contributor

Same here -- does it make sense to use a new session?

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Discussed offline - it seems to make sense to use a single session and avoid attempting to transact in an invalid state by adding an explicit commit() statement before the except statement and an explicit rollback() after the except statement. I've made these changes and added substantial explanatory comments, as above.

@@ -43,7 +41,7 @@ def _experiment_factory(self, names):

return self.store.create_experiment(name=names)

def test_default_experiment(self):
def test_default_experiment_7(self):

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 12, 2019

Contributor

_7?? typo?

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Fixed!

Copy link
Collaborator Author

dbczumar left a comment

@mparkhe I've addressed your comments and updated the tests. They should now pass! Please review them.


def _get_or_create(self, model, **kwargs):
instance = self.session.query(model).filter_by(**kwargs).first()
def _get_or_create(self, model, session, **kwargs):

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Done! I've moved session to be the first argument for all functions that require it.


self._save_to_db([run])
self._save_to_db(objs=run, session=session)

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Thanks!

# ToDo: Consider prior checks for null, type, metric name validations, ... etc.
self._get_or_create(model=SqlMetric, run_uuid=run_uuid, key=metric.key,
value=metric.value, timestamp=metric.timestamp, session=session)
except sqlalchemy.exc.IntegrityError as ie:

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Discussed offline - this logic would indeed break down. Solved the problem by adding an explicit commit() statement before the except statement and an explicit rollback() after the except statement with substantial explanatory comments.


def get_metric_history(self, run_uuid, metric_key):
metrics = self.session.query(SqlMetric).filter_by(run_uuid=run_uuid, key=metric_key).all()
return [metric.to_mlflow_entity() for metric in metrics]
with self.ManagedSessionMaker() as session:

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Makes sense to me!

# ToDo: Consider prior checks for null, type, param name validations, ... etc.
self._get_or_create(model=SqlParam, session=session, run_uuid=run_uuid,
key=param.key, value=param.value)
except sqlalchemy.exc.IntegrityError as ie:

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Discussed offline - it seems to make sense to use a single session and avoid attempting to transact in an invalid state by adding an explicit commit() statement before the except statement and an explicit rollback() after the except statement. I've made these changes and added substantial explanatory comments, as above.

@@ -43,7 +41,7 @@ def _experiment_factory(self, names):

return self.store.create_experiment(name=names)

def test_default_experiment(self):
def test_default_experiment_7(self):

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

Fixed!


return experiments[0]

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

@mparkhe Looks like the public get_experiment_by_name() function was previously returning a raw SqlExperiment type. The test_get_experiment method caught this because the returned SqlExperiment lost its session scope after get_experiment_by_name() returned. This provides some validation for our reasoning that scoped sessions help avoid erroneous behavior :D!

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 13, 2019

Contributor

Whoa!!


actual = self.session.query(models.SqlMetric).filter_by(key=tkey, value=tval)

self.assertIsNotNone(actual)

This comment has been minimized.

Copy link
@dbczumar

dbczumar Mar 13, 2019

Author Collaborator

This statement, which was the only reference to actual is vacuously true, regardless of whether or not the contents of the database match the expected values, because actual is an unexecuted query of type sqlalchemy.orm.query.Query.

Similar logic existed in several test cases. I've removed it because these cases include logic that validates the contents of the store using its public APIs, which seems to be a more appropriate mechanism of verifying that CRUD operations produce expected results.

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 13, 2019

Contributor

nice catch!

Copy link
Contributor

mparkhe left a comment

LGTM. Great comment around session.commit() and session.rollback() in those critical functions to explain what is going on!


return experiments[0]

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 13, 2019

Contributor

Whoa!!


actual = self.session.query(models.SqlMetric).filter_by(key=tkey, value=tval)

self.assertIsNotNone(actual)

This comment has been minimized.

Copy link
@mparkhe

mparkhe Mar 13, 2019

Contributor

nice catch!

@dbczumar dbczumar merged commit a7272b7 into mlflow:master Mar 14, 2019
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
eedeleon added a commit to eedeleon/mlflow that referenced this pull request Mar 20, 2019
* Initial fix implementation

* Test progress

* Update tests

* URI cleanup

* Docs

* Remove instances of 'self\.session'

* More test fixes

* Explanatory comments

* Test fixes

* Move session to first arg

* More test fixes, changes to exception handling

* More test fixes and exception handling

* Wrap db exceptions as MLflow internal errors

* Lint

* Revert file formatting

* Revert

* Remove unused import
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.