Choosing an SQL Tookit for accessing a database is an important choice. In April 2023, we took some time to review the possible alternatives, based on our current needs and took the decision to use SQLAlchemy in all new projects (and migrate older ones to this stack). This page stores traces of the decision process which led us to this choice. ## Requirements We have identified some requirements: - asyncio: the database stack should be **capable** to leverage as much as possible the asyncio model, since this is what is used by FastAPI, our API framework ; nevertheless, it shouldn't force us to use an async programming pattern when we do not need it - database schema migration: the tool should be capable to handle database schema migration, or have a nice cooperation with another tool - ORM: we would like to have an ORM to simplify query writing / understanding in simple case ; but we need as well a tool which is capable to let us write our own complex queries when needed, with as litlle plain text SQL as possible - integration with dialects / DBAPI: is the stack capable to interact nicely with the DBAPI advanced features (e.g. multiple inserts with psycopg2, specific dialect features) object state management: is the stack handling the object states automatically (which objects have to be created, which are dirty)? Some other topics have also been identified while looking after existing solutions: - caching: is the tool capable to handle object caching - recent PEPs: the database stack should embrace “recent” PEPs, among which PEP 484 typing practices, PEP 557 dataclasses - weight: is the stack lightweight or should we expect significant resources impact (memory, CPU) ? documentation level (including tutorial, courses, …) - number of active committers entreprise support: is there any large company supporting the stack ? - created since - current release - license - former experience: what is the team former experience with the stack ## Stacks identified The Python stacks to interact with a database we have identified are: - Django ORM: the ORM of Django, which can be used even without Django - SQLAlchemy: the ORM used by OpenStack, Yelp, Reddit - Peewee - SQLObject - Tortoise ORM - Pony ORM - GINO - Databases Django ORM, SQLAlchemy and Peewee are the leaders, all of them being used in many significant projects / companies. Other stacks might be promising in the coming years but they won’t be considered any further for now since they are more niche players, tackling one topic very well but missing either documentation, long term support or API stability. Plus our organization needs a stack allowing easy onboarding of new contributors, and using a popular one means that contributors might probably already have some experience with it. Django ORM won’t be considered since it is tightly coupled to Django, or at least there is no clear documentation / support for using it outside a Django project. ## Pros and Cons of SQLAlchemy vs Peewee ### SQLAlchemy Pros - **Native support of asyncio** - Database migration is supported via Alembic companion tool - Caching is supported via Dogpile companion project - **Code has been adapted to many “recent” PEPs with 2.0** - SQL can be tweaked if necessary - Documentation is extensive - **Main contributor is hired at OpenStack, dedicated to support SQLAlchemy** - Many other people are contributing few changes in the code base - Project has been in place since 2006 - Strong orientation towards writing SQL in a Pythonic way - Easy to preview SQL which will be executed while coding / debugging - Clear separation between ORM and the Core - Strong support of advanced SQL operations - Strong support of PostgreSQL specific dialect ### SQLAlchemy Cons - Library is significantly big - Learning curve is significant to use all functionalities - One contributor is the author of most code changes - **Many ways to do the same thing** ### Peewee Pros - SQL can be tweaked if necessary - Library is lightweight - Documentation is extensive - **Former experience at Kiwix with the project was good** - Project has been in place since more than 10 years - Good support of advanced SQL operations - Strong support of PostgreSQL specific dialect ### Peewee Cons - **Only Alpha support for asyncio** - **Database migration does not offer native support for versioning or rollback** - **No support for “recent” PEPs** - **Only one active maintainer, limited activity, very few other contributors** - Limited transaction management ## Final decision The topics that have highlighted in bold above drived us towards the choice of SQLAlchemy. There are too many significant Cons for Peewee and there are many significant Pros for SQLAlchemy. First usage of SQLAlchemy on Zimfarm transition from Mongo to Postgresql confirmed this choice, since we had to use many advanced functionalities (custom Postgresql datatypes including JSON, upserts) and SQLAlchemy was easy and clean to tackle these.