Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLAlchemy 2.0 #17778

Merged
merged 76 commits into from
Apr 3, 2024
Merged

SQLAlchemy 2.0 #17778

merged 76 commits into from
Apr 3, 2024

Conversation

jdavcs
Copy link
Member

@jdavcs jdavcs commented Mar 18, 2024

Ref #12541.

This PR enables SQLAlchemy 2.0 without changing any behavior. It contains bug fixes, SA 2.0 fixes, typing (not mypy-specific) error fixes, mypy error fixes, and relevant refactoring. Mypy error fixes are edits that help mypy or type-ignores that are (a) false positives, (b) require DatasetInstance models to be mapped declaratively (which will allow us to add appropriate type hints to their mapped attributes), or (c) require nontrivial refactoring or analysis that go beyond the scope of this PR, which is narrowly focused on enabling SA 2.0. To the best of my knowledge, none of the type-ignores silence mypy errors caused by incorrect usage of SQLAlchemy-related code.

There's more SA2.0-specific typing and refactoring that can be applied to the model definition. However, I think it's best to leave it for follow-up PRs because (a) much is dependent on remapping the DatasetInstance and TS RepositoryMetadata models declaratively (for which we need to rename the metadata attribute in those classes, which is difficult), and (b) there's so much more new/better SA2.0 syntax, patterns, abstractions to consider and apply to our code base than could fit in one PR.

The commits have been cleaned-up. The names are self-explanatory; more details are provided in commit descriptions were needed.

The PR is very large - sorry about that! Please let me know if I can reorganize it in any way to simplify review.

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@jdavcs jdavcs added kind/enhancement kind/refactoring cleanup or refactoring of existing code, no functional changes area/database Galaxy's database or data access layer highlight Included in user-facing release notes at the top highlight/dev Included in admin/dev release notes labels Mar 18, 2024
@jdavcs jdavcs added this to the 24.1 milestone Mar 18, 2024
@jdavcs jdavcs force-pushed the dev_sa20 branch 2 times, most recently from 1b2b7ad to ab299a3 Compare March 25, 2024 15:39
@jdavcs jdavcs requested a review from a team April 2, 2024 04:10
Copy link
Contributor

@davelopez davelopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jdavcs for this titanic effort! 🎉

Most of the changes look good to me. Some of the bigger refactorings on complex queries escape my limited knowledge of SQLAlchemy and those particular models/tables, so someone else might have a closer look.

I wish we had better typing in place so we don't need that many type: ignore workarounds, but that seems like something we can try to iron out in follow-ups.

lib/galaxy/managers/export_tracker.py Show resolved Hide resolved
lib/galaxy/managers/notification.py Show resolved Hide resolved
lib/galaxy/model/__init__.py Outdated Show resolved Hide resolved
@jdavcs
Copy link
Member Author

jdavcs commented Apr 2, 2024

I wish we had better typing in place so we don't need that many type: ignore workarounds, but that seems like something we can try to iron out in follow-ups.

Yes - I completely agree! @jmchilton expressed the same concern last month in Austin. Adding type-ignores was my last resort - in each case I made sure I tried all other options to fix the type without making major changes to the code base. In some cases I simply didn't find a way to do that: it would either require to add proper types to the 2 DatasetInstance class definitions (which requires metadata to be renamed, which is nontrivial), or required making significant changes to the code. I wanted to keep this PR as narrowly focused as possible (for a manageable size and scope of changes), so I made changes other than enabling SA2.0 only when it was a bug or something trivial. I'll go over every single added type-ignore as a follow-up (or follow-ups).

jdavcs added 18 commits April 2, 2024 10:08
This conflicts with dependency requirements for sqlalchemy-graphene
(used only in toolshed, new WIP client)
This does not exist in SQLAlchemy 2.0
Remove unused import

For context: https://github.com/galaxyproject/galaxy/pull/14717/files#r1486979280

Also, remove model attr type hints that conflict with SA2.0
Included models: galaxy, tool shed, tool shed install
Column types:
DateTime
Integer
Boolan
Unicode
String (Text/TEXT/TrimmedString/VARCHAR)
UUID
Numeric

NOTE on typing of nullability: db schema != python app

- Mapped[datetime] specifies correct type for the python app;
- nullable=True specifies correct mapping to the db schema (that's what
  the CREATE TABLE sql statement will reflect).

mapped_column.nullable takes precedence over typing annotation of
Mapped. So, if we have:

foo: Mapped[str] = mapped_column(String, nullable=True)

- that means that the foo db field will allow NULL, but the python app
  will not allow foo = None. And vice-versa:

bar: Mapped[Optional[str]] = mapped_column(String, nullable=False)

- the bar db field is NOT NULL, but bar = None is OK.

This might need to be applied to other column definitions, but for now
this addresses specific mypy errors.

Ref: https://docs.sqlalchemy.org/en/20/orm/declarative_tables.html#mapped-column-derives-the-datatype-and-nullability-from-the-mapped-annotation
Columns:
MutableJSONType
JSONType
DoubleEncodedJsonType

TODO: I think we need a type alias for json-typed columns: bytes understand
iteration, but not access by key.
SA 1.4: str(url) renders connection string with password
SA 2.0: str(url) renders connection string WITHOUT password
Solution: Use render_as_string(hide_password=False)
Replaces attribute_mapped_collection (SA20)
jdavcs added 22 commits April 2, 2024 10:08
Need to map declaratively to remove this
1. In 2.0, when the statement contains "returning", the result type is
   ChunkedIteratorResult, which does not have the rowcount attr,
   becuase:
2. result.rowcount should not be used for statements containting the returning clause

Ref: https://docs.sqlalchemy.org/en/20/core/connections.html#sqlalchemy.engine.CursorResult.rowcount
Otherwise there's an idle transaction left in the database (+locks)
Same as prev. commit: otherwise db locks are left
This restores the behavior under SQLAlchemy 1.4
(Note that we set the pool for sqlite only if it's not an in-memory db
Copy link
Member

@jmchilton jmchilton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linting and unit test errors seem legitimate - are they not? - I'd fix those before merging but otherwise I've reviewed this a few times and it looks good. I am sure there will be errors - but I don't know how we would find them without merging and getting this deployed.

@jdavcs
Copy link
Member Author

jdavcs commented Apr 2, 2024

The linting and unit test errors seem legitimate - are they not? - I'd fix those before merging but otherwise I've reviewed this a few times and it looks good. I am sure there will be errors - but I don't know how we would find them without merging and getting this deployed.

@jmchilton They were legit - surfaced after I rebased; they were introduced in 77ef4a3 . I've fixed them. They did not show up in dev because one was a warning under SA 1.4 and the other was only checked by mypy after I added the SA 2.0 types to the model definitions.

Thank you for reviewing!!

@jdavcs jdavcs merged commit af53d03 into galaxyproject:dev Apr 3, 2024
51 of 54 checks passed
@jdavcs jdavcs mentioned this pull request Apr 12, 2024
4 tasks
@galaxyproject-sentryintegration

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ OperationalError: (psycopg2.OperationalError) SSL connection has been closed unexpectedly sqlalchemy.orm.session in get View Issue
  • ‼️ OperationalError: (psycopg2.OperationalError) SSL connection has been closed unexpectedly sqlalchemy.orm.session in get View Issue

Did you find this useful? React with a 👍 or 👎

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/database Galaxy's database or data access layer highlight/dev Included in admin/dev release notes highlight Included in user-facing release notes at the top kind/enhancement kind/refactoring cleanup or refactoring of existing code, no functional changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants