Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace sqlalchemy-utilities with local implementation #11696

Merged
merged 12 commits into from
Mar 24, 2021

Conversation

jdavcs
Copy link
Member

@jdavcs jdavcs commented Mar 22, 2021

What did you do?

Replaced dependency on sqlalchemy_utils with a local implementation of the functionality we actually need.

Why did you make this change?

The main goal of this is to get this dependency out of the way and move on with migrating Galaxy to SA 1.4. It is designed as a temporary or permanent replacement for sqlalchemy-utils for reasons outlined in #11459. This may be replaced with sqlalchemy-utils once that library supports SA 1.4 (effort is underway).

To do: (lower priority; after this is merged)

  1. Postgresql and MySQL tests have been run locally (they passed); however they will be skipped within the unit test workflow, since it doens't include postgres and mysql services. Since these (and unit/data/test_galaxy_mapping) are the only unit tests that require a database, I'm not sure what's the best way to handle this: add the serivces or setup a separate workflow.

  2. Handling of views can be improved: we have setup a DIY mechanism for installing them in the database; however, SQLAlchemy provides a simple way to do it via after_create/before_drop hooks: we would just need to use event.listen(metadata, 'after_create', [our DDLElement]). This is the approach taken by sqlalchemy-utils; however, we don't use it and pass an empty metadata object instead (see comment). I did not modify this as this to keep the PR focused on one thing.

Implementation Notes

  1. I considered using mocks for tests (sqlalchemy-utils does that), but decided that for testing the creation of a database or a database object such as a view it makes sense to use a real database for increased test fidelity (because this is outside the context of what SA already does and what's covered by SA's internal tests). So, technically, this makes them integration tests. However, in the context of Galaxy's testing frameworks integration tests run Galaxy in different configurations, which is not what these tests are. So I left them under unit tests (they are also fast, so that's fitting).

  2. I've dropped the things we don't use (creating materialized views, cascade on view drop, using after_create/before_drop hooks). Any of this can be added when needed. Until then, yagni.

  3. I've changed the singature for create_database and database_exists to 2 parameters: the connection URL and an optional database to manipulate. This addresses the issue of some installations not having access to the default postgres database and is similar to the solution proposed here.

  4. This code is not SA 1.4-compatible. It was first implemented as such, but other parts of Galaxy broke under SA 1.4, so this was downgraded to 1.3. It will be migrated to run under 1.4 with the rest of Galaxy.

How to test the changes?

Manually:

  • Your database_connection should be unset. Delete local sqlite database, run Galaxy; database should be created.
  • Your database_connection should be using postgresql; point it to a nonexistant database
  • To run the new unit tests:
    GALAXY_TEST_CONNECT_POSTGRES_URI='postgresql://postgres@localhost:5432/postgres' pytest test/unit/model/
  • Repeat the postgresql steps for mysql:
    GALAXY_TEST_CONNECT_MYSQL_URI='mysql+mysqldb://root@localhost/mysql' pytest test/unit/model/
    (you'll need to python -m pip install mysqlclient; you'll also need a running instance of mysql)

@mvdbeek
Copy link
Member

mvdbeek commented Mar 22, 2021

Can you also verify this creates the tables if the database exists but is empty ?

@jdavcs
Copy link
Member Author

jdavcs commented Mar 22, 2021

Can you also verify this creates the tables if the database exists but is empty ?

This does not create tables - it only creates the database and checks for its existence. If the database exists, this code won't modify it - it's not supposed to (except for creating the view - but that happens after the database has been populated, and yes, I'll be adding a test for the view part). So the state of the database (postgres or sqlite) has no effect on how this executes.

Correct database initialization under various scenarios (does not exist, exists but is empty, exists but has no migration, etc.) will be tested here #11060.

Manual testing: running this branch on an empty database results in the database initialized - as expected.
Sorry if I misunderstood what you meant - please let me know if that's the case!

@mvdbeek
Copy link
Member

mvdbeek commented Mar 22, 2021

Can you also verify this creates the tables if the database exists but is empty ?

What I meant is that previously you could start with createdb newdb. If you start Galaxy with that connection string it will create all tables. That is still happening now ?

@jdavcs
Copy link
Member Author

jdavcs commented Mar 22, 2021

Of course! (Just verified as sanity check).

Copy link
Member

@nsoranzo nsoranzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you are aware, I've been working hard on adding 1.4 support to SQLAlchemy-Utils on my branch https://github.com/nsoranzo/sqlalchemy-utils/tree/sqlalchemy14 , so I was expecting we will just use that in requirements.txt , but if you prefer removing the dependency I am not -1 on that.
I'd still prefer to not reinvent the wheel and copy all relevant parts from sqlalchemy_utils/functions/database.py as in my fork, see other comments for a couple of reasons.

lib/galaxy/model/database_utils.py Show resolved Hide resolved
lib/galaxy/model/database_utils.py Show resolved Hide resolved
@jdavcs
Copy link
Member Author

jdavcs commented Mar 22, 2021

@nsoranzo I'll be joining you on that effort (I've looked through the updates on your branch this weekend) - I just had to get this out of the way before iimproving sqlalchemy-utils. I'm very sorry, I didn't know you intended this to be used as a patched fork for galaxy - I didn't mean to override your work! I do apologize if it appeared so!
I do have a preference towards using our own implementation for this functionality versus relying on a dependency. I've outlined some of the reasoning in #11459. I'll prepare a more coherent, better structured argument when we may need to choose. The core reasoning is that we use a tiny fraction of the functionality provided by sqlalchemy-utils, and we don't need it to be compatible across mutliple databases, drivers, and versions of SQLAlchemy. I think this is faster, simpler, more flexible, and more sustainable over time.

@jdavcs jdavcs marked this pull request as ready for review March 24, 2021 07:26
@github-actions github-actions bot added this to the 21.05 milestone Mar 24, 2021
@jdavcs jdavcs merged commit bb9547a into galaxyproject:dev Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants