When using a multiple database setup, alembic does not detect added foreign keys #1443
-
I have alembic set up with multiple postgres databases. I am by no means an expert, so I followed this guide: https://medium.com/pythonistas/managing-multiple-databases-migrations-with-alembic-10025a4b3ab3. They are tracked by a single env.py file.

I am experiencing some strange issues. When adding a foreign key column to a table, the new column is detected, but the foreign key constraint is not; I've had to manually add the foreign key constraint to the migration file every time. This only happens when adding the foreign key to a preexisting table. If creating a new table with a foreign key, the new foreign key is correctly detected and added.

I've tried to debug what's happening in the autogenerate step. I've tried explicitly naming my foreign key constraints, and also auto naming, according to this guide: https://alembic.sqlalchemy.org/en/latest/naming.html. Neither seemed to make alembic pick it up. I'm assuming this issue is a peculiarity of my multi-DB setup, but I just can't see where exactly it's being caused. Any insights would be welcome!

Here is my env.py file:

```python
import asyncio
from logging.config import fileConfig
import alembic_postgresql_enum # noqa: F401
from sqlalchemy import pool
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import async_engine_from_config
from alembic import context
from app.database.models import Base
import os
from dotenv import load_dotenv
load_dotenv()
# this is the Alembic Config object, which provides
# access to the values within the .ini file in use.
config = context.config
section = config.config_ini_section
config.set_section_option(section, "DB_USER", os.environ.get("DB_USER"))
config.set_section_option(section, "DB_PASSWORD", os.environ.get("DB_PASS_ENCODED"))
config.set_section_option(section, "DB_HOST", os.environ.get("DB_HOST"))
config.set_section_option(section, "DEV_DATABASE_NAME", "dev")
config.set_section_option(section, "LIVE_DATABASE_NAME", "live")
# Interpret the config file for Python logging.
# This line sets up loggers basically.
if config.config_file_name is not None:
    fileConfig(config.config_file_name)
# add your model's MetaData object here
# for 'autogenerate' support
# from myapp import mymodel
# target_metadata = mymodel.Base.metadata
target_metadata = Base.metadata
# other values from the config, defined by the needs of env.py,
# can be acquired:
# my_important_option = config.get_main_option("my_important_option")
# ... etc.
#
def get_url():
    return (
        f"postgresql+asyncpg://{os.environ.get('DB_USER')}:"
        f"{os.environ.get('DB_PASS_ENCODED')}@{os.environ.get('DB_HOST')}"
    )


db_url = get_url()
# the active config ini section is the db name that we have chosen
db_name = config.config_ini_section
config.set_main_option("sqlalchemy.url", f"{db_url}/{db_name}")
def run_migrations_offline() -> None:
    """Run migrations in 'offline' mode.

    This configures the context with just a URL
    and not an Engine, though an Engine is acceptable
    here as well. By skipping the Engine creation
    we don't even need a DBAPI to be available.

    Calls to context.execute() here emit the given string to the
    script output.
    """
    url = config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        dialect_opts={"paramstyle": "named"},
    )

    with context.begin_transaction():
        context.run_migrations()
def do_run_migrations(connection: Connection) -> None:
    def include_object(object, name, type_, reflected, compare_to):
        # skip foreign keys whose reflected and metadata targets differ
        # only by the "<db_name>." schema prefix, i.e. treat them as equal
        if (
            type_ == "foreign_key_constraint"
            and compare_to
            and (
                compare_to.elements[0].target_fullname
                == db_name + "." + object.elements[0].target_fullname
                or db_name + "." + compare_to.elements[0].target_fullname
                == object.elements[0].target_fullname
            )
        ):
            return False

        # Make sure we don't drop the spatial_ref_sys table
        if type_ == "table" and name == "spatial_ref_sys":
            return False

        if type_ == "table":
            if object.schema == db_name or object.schema is None:
                return True
            elif object.table.schema == db_name or object.table.schema is None:
                return True
            else:
                return False

        # include brand-new columns
        if type_ == "column" and compare_to is None:
            return True

    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        include_object=include_object,
    )

    with context.begin_transaction():
        context.run_migrations()
async def run_async_migrations() -> None:
    """In this scenario we need to create an Engine
    and associate a connection with the context.
    """
    connectable = async_engine_from_config(
        config.get_section(config.config_ini_section, {}),
        prefix="sqlalchemy.",
        poolclass=pool.NullPool,
    )

    async with connectable.connect() as connection:
        await connection.run_sync(do_run_migrations)

    await connectable.dispose()


def run_migrations_online() -> None:
    """Run migrations in 'online' mode."""
    asyncio.run(run_async_migrations())
if context.is_offline_mode():
    run_migrations_offline()
else:
    run_migrations_online()
```
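For context, the explicit constraint naming I tried follows the convention pattern from the alembic naming guide linked above, roughly like this (a simplified sketch of my Base, assuming the standard convention from the docs rather than my exact models file):

```python
from sqlalchemy import MetaData
from sqlalchemy.orm import DeclarativeBase

# naming convention taken from the alembic naming guide; the exact
# patterns in my project may differ slightly
convention = {
    "ix": "ix_%(column_0_label)s",
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s",
}


class Base(DeclarativeBase):
    metadata = MetaData(naming_convention=convention)
```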
-
what kind of "foreign keys" are these? how do these foreign keys relate to "multiple databases"? what does "multiple databases" mean here?
-
Thanks for the comments. I'm pretty sure it is related to my custom include_object, but I'm not sure exactly what's causing it. The issue is present on both databases, and it occurs when I add a foreign key constraint to a preexisting column. For example, if I start with models like this:

```python
class Table1(Base):
    __tablename__ = "table_1"

    id: Mapped[Optional[int]] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column()
    field_1: Mapped[int] = mapped_column()


class Table2(Base):
    __tablename__ = "table_2"

    id: Mapped[Optional[int]] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column()
    field_2: Mapped[int] = mapped_column()
```

and change the models to this:

```python
class Table1(Base):
    __tablename__ = "table_1"

    id: Mapped[Optional[int]] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(ForeignKey("table_2.name"))
    field_1: Mapped[int] = mapped_column()


class Table2(Base):
    __tablename__ = "table_2"

    id: Mapped[Optional[int]] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column()
    field_2: Mapped[int] = mapped_column()
```

This change is not detected by alembic and produces an empty migration file. If I create a new table from scratch with a foreign key, it is correctly detected.

@CaselIT I've tried commenting out the part relating to the foreign keys in my custom include_object:

```python
def do_run_migrations(connection: Connection) -> None:
    def include_object(object, name, type_, reflected, compare_to):
        # if (
        #     type_ == "foreign_key_constraint"
        #     and compare_to
        #     and (
        #         compare_to.elements[0].target_fullname
        #         == db_name + "." + object.elements[0].target_fullname
        #         or db_name + "." + compare_to.elements[0].target_fullname
        #         == object.elements[0].target_fullname
        #     )
        # ):
        #     return False

        # Make sure we don't drop the spatial_ref_sys table
        if type_ == "table" and name == "spatial_ref_sys":
            return False

        if type_ == "table":
            if object.schema == db_name or object.schema is None:
                return True
            elif object.table.schema == db_name or object.table.schema is None:
                return True
            else:
                return False

        if type_ == "column" and compare_to is None:
            return True

    context.configure(
        connection=connection,
        target_metadata=target_metadata,
        include_object=include_object,
    )

    with context.begin_transaction():
        context.run_migrations()
```

So I guess I might need to specifically include the foreign keys in my include_object, but I'm not sure how to go about that. I did struggle a bit to understand the documentation for include_object.
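For reference, the kind of thing I keep adding to the otherwise-empty migration by hand is roughly this (the constraint name here is just illustrative, not what alembic would generate):

```python
from alembic import op


def upgrade() -> None:
    # added by hand because autogenerate produced an empty migration
    op.create_foreign_key(
        "fk_table_1_name_table_2",  # hypothetical constraint name
        "table_1",
        "table_2",
        ["name"],
        ["name"],
    )
```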
Note that every time you return a falsy value from include_object it will be treated as "skip", same as returning False.

So you probably need a `return True` for the new objects (those that have `compare_to=None`) after your initial check, otherwise you exclude all of the new ones. Alternatively you may also consider a `return True` at the end of your include_object, otherwise you are excluding everything that's not matched by your conditions; for example all indexes, unique and other constraints seem to be skipped by your include function.
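Putting that advice together, a corrected include_object might look roughly like this. This is an untested sketch: it keeps the cross-database foreign key exclusion and the spatial_ref_sys guard from the original, simplifies the table branch, and ends with the suggested blanket return True so that nothing falls through to an implicit None:

```python
def include_object(object, name, type_, reflected, compare_to):
    # never create or drop the PostGIS spatial_ref_sys table
    if type_ == "table" and name == "spatial_ref_sys":
        return False

    # the cross-database FK exclusion only makes sense when there is an
    # existing counterpart to compare against (compare_to is not None)
    if (
        type_ == "foreign_key_constraint"
        and compare_to
        and (
            compare_to.elements[0].target_fullname
            == db_name + "." + object.elements[0].target_fullname
            or db_name + "." + compare_to.elements[0].target_fullname
            == object.elements[0].target_fullname
        )
    ):
        return False

    # only emit tables that belong to the active database/schema
    if type_ == "table":
        return object.schema == db_name or object.schema is None

    # everything else -- including brand-new foreign key constraints,
    # indexes and unique constraints (which have compare_to=None) -- is
    # included explicitly; falling through would return None, which
    # autogenerate treats as "skip"
    return True
```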