Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a reverse migration to distribution change #495

Merged
merged 1 commit into from Apr 16, 2021

Conversation

mdellweg
Copy link
Member

@mdellweg mdellweg commented Apr 6, 2021

No description provided.

@mdellweg
Copy link
Member Author

mdellweg commented Apr 6, 2021

It turns out, that this operation fails to perform backwards. I have no idea yet, how to solve this.

        migrations.AlterField(
            model_name='filedistribution',
            name='distribution_ptr',
            field=models.OneToOneField(auto_created=True,
                                       on_delete=django.db.models.deletion.CASCADE,
                                       parent_link=True, primary_key=True,
                                       related_name='file_filedistribution', serialize=False,
                                       to='core.Distribution'),
            preserve_default=False,
        ),

I raises an error:
django.db.utils.ProgrammingError: column "distribution_ptr_id" is in a primary key

@pulpbot
Copy link
Member

pulpbot commented Apr 6, 2021

WARNING!!! This PR is not attached to an issue. In most cases this is not advisable. Please see our PR docs for more information about how to attach this PR to an issue.

@mdellweg
Copy link
Member Author

mdellweg commented Apr 7, 2021

It turns out, that the following statements in the resulting sql-backward migration (pulpcore-manager sqlmigrate file 0009 --backwards) are in the wrong order:

ALTER TABLE "file_filedistribution" ALTER COLUMN "distribution_ptr_id" DROP NOT NULL;
ALTER TABLE "file_filedistribution" DROP CONSTRAINT "file_filedistribution_distribution_ptr_id_5ce5564b_pk";

I think, this is a django bug.

edit: Filed an issue: https://code.djangoproject.com/ticket/32634

@daviddavis
Copy link
Contributor

This looks good to me.

@mdellweg
Copy link
Member Author

This looks good to me.

It does not work. Once loaded with data, you get:
django.db.utils.InternalError: could not find trigger 86175

This is actually a complete redesign of the migration. It not only makes
it possible to have the backward migration, but it makes the process
much more symmetric and easier.

[noissue]
@mdellweg mdellweg changed the title WIP: Add a reverse migration to distribution change Add a reverse migration to distribution change Apr 14, 2021
@mdellweg mdellweg marked this pull request as ready for review April 14, 2021 15:17
@mdellweg
Copy link
Member Author

This should work now. I tested forward and backward several times with the following data:

#!/bin/bash

set -eux

pulp file distribution destroy --name "test2" || true
pulp file distribution destroy --name "test1" || true
pulp file repository destroy --name "test1" || true
pulp file remote destroy --name "test1" || true

pulp file remote create --name "test1" --url "https://fixtures.pulpproject.org/file/PULP_MANIFEST"
pulp file repository create --name "test1" --remote "test1"
pulp file repository sync --name "test1"
PUBLICATION_HREF=$(pulp file publication create --repository "test1" | jq -r '.pulp_href')
pulp file distribution create --name "test1" --base-path "test1"
pulp file distribution create --name "test2" --base-path "test2" --publication $PUBLICATION_HREF

@daviddavis
Copy link
Contributor

I created some distributions against 1c96911 then checked out this PR and ran the migration. Everything worked. So then I ran pulpcore-manager migrate file 0008 followed by pulpcore-manager migrate file 0009 and confirmed everything worked again. 👍

@daviddavis
Copy link
Contributor

I guess we need to update this @pulp/rpm migration too?

https://github.com/pulp/pulp_rpm/blob/master/pulp_rpm/app/migrations/0032_new_distribution_model.py

@dralley
Copy link
Collaborator

dralley commented Apr 14, 2021

@daviddavis Yup

FileDistribution = apps.get_model('file', 'FileDistribution')
CoreDistribution = apps.get_model('core', 'Distribution')
for old_file_distribution in FileDistribution.objects.all():
NewFileDistribution = apps.get_model('file', 'NewFileDistribution')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just an attempt at making it more generic and proper or is it strictly necessary for functional reasons for the file plugin?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, what you mean. The apps.get_model is Django's way to provide you the historical model with a schema that matches the current state of the migration.
In this case, it provides you the transient NewFileDistribution that only pops into existance to later be renamed to FileDistribution. Its evanescent nature manifests in its complete lack of presence in the actual code.

Copy link
Collaborator

@dralley dralley Apr 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that NewFileDistribution and core.Distribution are effectively the same synce the former adds no new fields over the latter (except for _ptr). I agree that it should be used since it's more specific, I'm just trying to make sure my mental model is correct.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewFileDistribution here is more than core.Distribution, as it's a detail table. So i am copying objects from one detail table to the other.
IMHO you should never add objects to the core.Distribution table directly.

new_master_model_entry.save()
old_file_distribution.distribution_ptr = new_master_model_entry
old_file_distribution.save()
pks_to_delete.append(old_file_distribution.pulp_id)
Copy link
Collaborator

@dralley dralley Apr 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm glad we can get rid of this, I'm just concerned that perhaps there was some necessary reason it was done in such a convoluted fashion originally? I know Brian ran into a lot of problems throughout the process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the problem of being unable to delete the old base objects was, that the basedistribution_base_ptr had a foreign key constraint that was only lifted by removing the field from the table.
In the new approach there is no such ambiguity of inheritance that prevents the immediate deletion of old objects.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1



class Migration(migrations.Migration):
atomic = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?

Copy link
Member Author

@mdellweg mdellweg Apr 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This solves the aforementioned trigger error.
My theory: When inserting new items into the new table, Django or Postgres add a trigger to check for constraints upon leaving the transaction. Renaming the table however breaks the relation and the trigger cannot be found anymore. Ending the transaction earlier fixes the problem.
BTW: Django suggests making data migrations non atomic:
https://docs.djangoproject.com/en/2.2/howto/writing-migrations/#non-atomic-migrations

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that constraints are checked last, I don't know the specific implementation details. But this sounds plausible. It also sounds potentially like a bug - maybe we should bring it up in #django and what people say.

Are there portions of the migration that we can make atomic or non-atomic selectively rather than making the entire migration non-atomic in a blanket way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not seen a way to make several steps of the migration atomic.
What we could do is move the last two steps (Deleting and renaming Tables) in a separate migration. But i am unsure whether this would gain much.

migrations.RemoveField(
model_name='filedistribution',
name='basedistribution_ptr',
migrations.RunPython(
Copy link
Collaborator

@dralley dralley Apr 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is, I believe we can put atomic=True here rather than having our own transactions inside. Maybe for other operations also.

Sidenote: There's no such thing as "nested transactions", so if the step is already atomic (run in a transaction) by default, it means that the "transaction" in the migration was actually not a "real transaction", it's a "savepoint". I don't remember precisely how the semantics differ, but it might potentially be part of the issue.

I remember one of the things the SqlAlchemy author wrote is that they hate this particular part of the Django ORM because the semantics can bite you sometimes. I don't know that this is one of those times I just want to point it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transactions in postgres can have multiple savepoints, and rollback can go back to a specific savepoint (rather than rolling back the entire transaction). You have to specifically do a 'ROLLBACK TO savepoint', iirc, to take advantage of this, and I don't know if/how Django does so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had atomic=true with and without the inner transactions and it produced the same error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still interested in figuring out of this is an upstream bug, but LGTM

@dralley
Copy link
Collaborator

dralley commented Apr 15, 2021

@mdellweg I'm planning on swapping the distribution models today on many of the plugins. If you can copy this PR to the RPM plugin and fix the template to use the new model, then I'll handle all of the rest.

@mdellweg
Copy link
Member Author

@mdellweg I'm planning on swapping the distribution models today on many of the plugins. If you can copy this PR to the RPM plugin and fix the template to use the new model, then I'll handle all of the rest.

What template do you mean?

@dralley
Copy link
Collaborator

dralley commented Apr 15, 2021

@daviddavis
Copy link
Contributor

Oh yea, I totally forgot about the plugin template. I can take care of it unless @mdellweg you want to.

@mdellweg
Copy link
Member Author

@dralley pulp/pulp_rpm#1971

@daviddavis that would be great!

@mdellweg mdellweg merged commit cb798e2 into pulp:master Apr 16, 2021
@mdellweg mdellweg deleted the reverse_migration_0009 branch April 16, 2021 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants