Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Dependency conflict: fastapi-utils prevents upgrade to sqlachemy 2.0 #17185

Closed
jdavcs opened this issue Dec 13, 2023 · 10 comments
Closed

RFC: Dependency conflict: fastapi-utils prevents upgrade to sqlachemy 2.0 #17185

jdavcs opened this issue Dec 13, 2023 · 10 comments
Labels
dependencies Pull requests that update a dependency file

Comments

@jdavcs
Copy link
Member

jdavcs commented Dec 13, 2023

We need to upgrade to SQLAlchemy 2.0 (#12541). However, there is a dependency conflict: Galaxy uses fastapi-utils, which requires SQLAlchemy < 2 (sqlalchemy = "^1.4,<2.0").

The only functionality that we use from fastapi-utils is class-based views. This is a small but useful convenience feature which reduces duplication by organizing related route operation functions into classes (which can be done in fastapi) and defining common dependencies on that class (which cannot be done in fastapi) instead of specifying them for each route. Usually there are 1-2 common dependencies per class (service, manager, config, serializer, etc.), although usually it's only the manager or the service that are utilized in every route function.

It is unlikely that the fastapi-utls project will add support for SQLAlchemy 2.0 in the near future. There is an issue and a PR; however, the project appears to be inactive (see this discussion in a fork: yuval9313/FastApi-RESTful#227).

The class-based-views feature has a relatively straightforward implementation, but it's not a one-liner. Its test support is minimal.

I've double-checked FastAPI; there's nothing that can substitute class-based views, I think.

I see the following options:

  1. Simplest option. Switch to an existing fork: FastApi-RESTful. It has a minimal install which does not include SQLAlchemy, so it will give us only 2 additional dependencies: annotated-types and pydantic-core (fastapi-utils doesn't require anything that's not installed for fastapi).

  2. Painful option. Fork fastapi-utils and drop all but the feature(s) we need or might need. And maintain it.

  3. Dependency-free option. Reimplement the feature as a our own native module (crediting the author, of course). This is not ideal, since this would be a fastapi utility with no Galaxy-specific functionality, and as such it really doesn't belong in our core code base.

  4. Most lightweight/nuclear option. Forget about convenience and just specify the common dependencies for each route. There are not a lot, so that might be not that bad.

Unless there are other/better options, and given that cbv is a relatively minor improvement and has to do with code readability, but not performance or added functionality, I'm leaning towards option 4 as the most practical.

What do others think?

@jdavcs jdavcs added area/database Galaxy's database or data access layer dependencies Pull requests that update a dependency file and removed area/database Galaxy's database or data access layer labels Dec 13, 2023
@mvdbeek
Copy link
Member

mvdbeek commented Dec 14, 2023

Try the fork, I don't think the rest is really an option.

@jdavcs
Copy link
Member Author

jdavcs commented Dec 14, 2023

Unfortunately that doesn't work. The implementation of cbv in the fork is significantly different from the original (fastapi-utils). Switching to the fork will raise this error on startup: https://github.com/yuval9313/FastApi-RESTful/blob/master/fastapi_restful/cbv.py#L119. Here's why: the fork checks whether the routes belonging to the router are all unique (see line 118); however it uses only the router path and the http method to disambiguate between the routes. But there're many more route attributes, and among them - include_in_schema, which we use while adding almost identical routes (here). Furthermore, even if we were to convince the maintainers of the fork to alter their deduplication method by adding an extra attribute to the route "signature" (which I doubt), even then, down the road we might decide to add some other attribute to our own FrameworkRouter(APIRouter), which again would cause the duplicate route error to kick in. Finally, this is just one of the things the fork does differently; I have not checked the rest of the code - there may be other surprises.

In my opinion, given our very heavy use of FastAPI, we should stick to the core library. Furthermore, given that the cbv feature is really non-essential (whereas its implementation adds quite a bit of complexity and increases the chance of more bugs), I suggest we go with option 4: just specify the common dependencies for each route. It's clean, it's straightforward, and it's trivial to fix. To update a module with cbv, we simply add the dependency as an argument to the methods that actually use it, remove the class and convert its methods to functions (kill the self). That's all.
Here's what it would look like: https://github.com/galaxyproject/galaxy/compare/dev...jdavcs:galaxy:dev_no_cbv?expand=1

@jdavcs
Copy link
Member Author

jdavcs commented Dec 17, 2023

As per our discussion on the backend channel, I've looked at how class-based views are implemented. I've found that there's no performance gain by using class-based views: the number of calls to foo= depends(Foo) is not reduced. In fact, the number of calls increases with class-based views. (I tested one test function in the test_tours module: 13 calls to depends() for class-based views, 10 calls for pure functions). If we look at the implementation, we'll see that the cbv decorator modifies the __init__ method of the class it's decorating; it collects the class attributes that have type annotations and sets them as instance variables on the instance of the class. But the class is still instantiated once per endpoint in most cases.

@jdavcs jdavcs mentioned this issue Dec 17, 2023
1 task
@mvdbeek
Copy link
Member

mvdbeek commented Dec 18, 2023

Thanks for checking, this however was a minute detail of the discussion and does not change my view that we should not make these sweeping changes to cut down on one tiny dependency.

@jdavcs
Copy link
Member Author

jdavcs commented Dec 18, 2023

Summarizing my point here.

  1. In our api modules we rely on class-based views, a minor convenience feature provided by an external package, fastapi-utls.
  2. fastapi-utils is incompatible with SQLAlchemy 2.0, thus causing a dependency conflict that prevents us from upgrading to SQLAlchemy 2.0.
  3. fastapi-utils is not actively maintained, so it is unlikely that it will be upgraded.
  4. The fastapi-restful fork of fastapi-utils is compatible with SQLAlchemy 2.0. However, that fork has a different implementation of class-based views which is incompatible with our code base. It may or may not be solved with a modification to the fork's implementation.
  5. We can submit a request to the fastapi-restful fork maintainers and hope that (1) they will modify their library's internal logic to accommodate our use case; and (2) this will be the only incompatibility between our code and their implementation, now and in the future. We can also maintain our own fork where we simply remove the SQLAlchemy requirement constraint, which will allow us to upgrade SQLAlchemy to 2.0, but will leave us with a half-broken fork of fastapi-utils, as well as the responsibility to maintain yet more code that we didn't even write.

Or we can just drop the dependency. Here's why I think it's a good solution:

  1. It is the simplest way to solve a major blocking issue without depending on a half-functioning package (our own fork without the 2.0 constraint) or relying on a different project to alter their library's internal logic to accommodate our (much more advanced) use case.

  2. I don't think we need class-based views: it is only a minor convenience feature. We can only factor out the common depends() arguments into class scope, but a typical endpoint has multiple arguments passed to it; so by using class-based views you'll simply have all those multiple arguments minus one. Besides, sometimes we factor out an argument that is only used in a subset of methods, and in other cases we don't - so we don't use it consistently either. Finally, I think leaving those arguments in the endpoint's function definition makes the code much more straightforward. Our code is simpler without this feature.

  3. Most importantly, dropping a non-essential dependency reduces the number of external libraries we depend upon. It is not a large dependency, but it does things we don't need, like session handling, or the now-redundant InferringRouter. It also it requires a particular version of SQLAclhemy, which is a core dependency for us; this leads to the classic "diamond" problem in "dependency hell": we depend on foo and bar, we require foo of version A, but bar requires foo of version B, and versions A and B are incompatible. With "foo" (SQLAlchemy) being at the core of our own system makes this particular example different from most other examples of some datatype dependency pulling in many others.

Dependency management is a hard problem and there's been a ton written on it. Of course we should rely on external software, and of course such reliance comes with risks. What's clear to me (from both literature and experience) is that a decision on including a dependency should not be made lightly.

In this case, there is no good reason to rely on fastapi-utils other than "[removing it is] a can of worms where we'll likely break things" (as per channel discussion). But I don't think removing it will break things: it is a lengthy but straightforward modification that is not at all complex compared to lots of other changes we make all the time.

@jmchilton
Copy link
Member

I really don't think of Galaxy as a FastAPI app - Galaxy has its own opinion about what the structure of web controllers looks like and we've sculpted it and documented it and refined it over many years. Dropping an external dependency and structuring the code more tightly toward FastAPI dependency management... actual ties us to FastAPI more strongly. External dependencies aren't just about the things you import - it is also about structuring your code to shield yourself from the external dependencies. When Marius took the big swing of bringing in FastAPI he was very cognizant of my concerns about the existing structure of the code and brought in class based views to make it feel more Galaxy and in addition to that kindness was kind and let me replace the dependency management features with a standalone library that was more shielded from FastAPI internals and was useful throughout the application and in non-FastAPI contexts. He used FastAPI to implement a Galaxy web framework - he did not make Galaxy a FastAPI application. We moved the code in direction I thought there was full consensus around and balanced everyone's concerns. This feels like another big shift but we're letting our dependencies dictate our code structure and move it in directions there isn't consensus around.

I understand the performance might be better as the result of this change, but it isn't in a portion of code we know to be critical and I do not think it is wise to take such a big swing on changes that your peers aren't a fan of.

@jmchilton
Copy link
Member

I know that you're tasked with replacing SQLAlchemy and this is an approach to accomplish a piece of that task - if Marius and I think it is not the right approach I think it is fair to say... "I've done what I think is the right approach" and put the ball in our courts to find some replacement for CBV. I don't mind doing the work to support the structure that I think is important - if you don't agree it is important. The people doing the work should have a dispropionate say in how it is done - but I think you should give us sometime to propose a less drastic alternative code structure.

@jdavcs
Copy link
Member Author

jdavcs commented Dec 18, 2023

@jmchilton thank you for taking the time to read into the issue. I do appreciate the very detailed and constructive points that you've made. If we want to structure our endpoints in classes with dependencies defined at the method level factored out, #17205 is one way to do that: we only take the one module+test that we're actually using from the original external library and maintain it from now on. That way we don't have to rely on an incompatible external dependency most of which we don't need.

@jdavcs
Copy link
Member Author

jdavcs commented Dec 18, 2023

@jmchilton On a related topic, the only other dependency conflict is graphene-sqlalchemy which is part of the new TS code, added in #15639. We are using version "3.0.0b3", which is pre-SQLAlchemy 2.0. So is "3.0.0b4". The current version is "3.0.0rc1", which is compatible with SQLAlchemy 2.0, however, it appears to come with breaking changes, so the previous beta releases don't work (e.g. ImportError: cannot import name 'convert_sqlalchemy_hybrid_property_type' from 'graphene_sqlalchemy.converter'). I haven't looked at the errors in detail. Do you have a quick solution to this? If not, I'll try to move that code to "3.0.0.rc1".

@jdavcs
Copy link
Member Author

jdavcs commented Dec 19, 2023

☝️ I'm working on it. Breaking changes affecting conversion of hybrid properties introduced in graphql-python/graphene-sqlalchemy#371. @jmchilton

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file
Projects
None yet
Development

No branches or pull requests

3 participants