
Imperative Migrations with a TypeScript DSL #4688

Open
flybayer opened this issue Jul 1, 2020 · 70 comments
Labels
kind/feature A request for a new feature. team/schema Issue for team Schema. topic: migrate sdk topic: migrate

Comments

@flybayer

flybayer commented Jul 1, 2020

Problem

Not all migrations can be declarative. For example, we may need to add a custom index for performance reasons. We also need this as an escape hatch for anything the Prisma schema doesn't support. Lastly, we also need this for data migrations.

This is very important for any serious app using Prisma Migrate.

Suggested solution

A new command: prisma migrate create

This will create the necessary migration files, similar to prisma migrate save, but the files are just stubbed out.

Then the user will open the stubbed migration file and add their custom migration code.

Perhaps the imperative migration file can look something like this:

import { migrate } from '@prisma/migrate'

migrate({
  async up(prisma) {
    // SQL migration
    await prisma.executeRaw(`some SQL stuff`)

    // Data migration
    await prisma.someModel.create(/*..*/)
  },

  async down(prisma) {
    await prisma.someModel.delete(/*..*/)

    await prisma.executeRaw(`undo some SQL stuff`)
  }
})
@flybayer
Author

flybayer commented Jul 1, 2020

cc @peterp you'll probably be interested in this for Redwood too?

@karanpratapsingh

@flybayer +1 really like this idea

@ryanking1809

ryanking1809 commented Jul 1, 2020

100% - This is the only thing stopping me from being able to fully utilise migrate.

I even think steps.json could be added to this file, so things would work more like SQLAlchemy or Rails' Active Record: Prisma would generate a file containing all the steps, but allow us to insert additional JavaScript and even additional database modifications in between each step, allowing for more complex migrations.

migrate({
  async up(prisma) {
    // Prisma migration steps for initial setup
    // a lot of the time we need to migrate the data before we can drop columns etc
    // or setup defaults for non-nullable fields
    await prisma.db.addColumn(...)

    // Data migration
    await prisma.someModel.update(/*..*/)

    // Prisma migration for db cleanup
    // only once we know the data is safe in another location and defaults setup
    // can we drop columns and make columns non-nullable
    await prisma.db.alterColumn(...)
    await prisma.db.dropColumn(...)

    // Raw when needed
    await prisma.executeRaw(`some SQL stuff`)
  },

  async down(prisma) {
    await prisma.db.addColumn(...)
    //...
    await prisma.db.dropColumn(...)
  }
})

You would need to verify that the resulting DB matches the schema, but it would allow for much more sophisticated migrations.

@flybayer
Author

flybayer commented Jul 2, 2020

@ryanking1809 yes!! I definitely thought about that — that would be super cool.

That also opens up the possibility to have a single change() hook that automatically generates up and down migrations, same as ActiveRecord.

migrate({
  // Automatically generates both `up` and `down` migrations
  async change(prisma) {
    await prisma.db.addColumn(...)
    await prisma.db.addIndex(...)
  }
})

@Jolg42
Member

Jolg42 commented Jul 2, 2020

Thanks for the suggestion! We're currently working on Migrate so any feedback is welcome 😄

@flybayer So what I see is that the main driver for imperative migrations is to provide an escape hatch for unsupported features and data migrations. We really care about these two cases and want them covered.

If this were covered not by imperative migrations but by before/after hooks, would that work for you?
For example: before.ts / after.ts, or even before.sql / after.sql, called before or after a migration.

Let me know what you think

@flybayer
Author

flybayer commented Jul 2, 2020

I don't think that would be sufficient. Let's say I want to create an SQL function to use for default values. This is an imperative migration that's not tied to anything else.

There's no existing migration for which I can add a before or after hook since this is a standalone thing.

Also, for complex migrations requiring data migrations, having only before/after hooks would be very cumbersome. Example:

  1. Make first change in schema file
  2. Save that as a migration
  3. Write the after hook for that
  4. Make another change to schema file
  5. Save that as a migration
  6. Write the after hook for that second migration

@ryanking1809

ryanking1809 commented Jul 2, 2020

Yes, to extend what @flybayer is saying with some examples: say I had a project with tasks. Initially I might have stored these in a Json column because I was still designing what tasks are.

model Project {
  id   Int @id
  tasks Json 
}

After a bit of use, my customers want more features like task completion and due dates, and I decide to move tasks to its own model. Ideally, I'd simply do this in one migration.

model Project {
  id   Int @id
  tasks Task[]
}
model Task {
  id   Int @id
  project Project 
  completed   Boolean
  dueDate DateTime
}

And in the migration file I would be able to:

  1. Create the task table
  2. Migrate the task data from the project model to the task model
  3. Drop the task column from projects
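Under the DSL proposed earlier in the thread, those three steps could be sketched as a single migration. Everything here is hypothetical (the `prisma.db.*` calls, the `up()` shape, and the `jsonToTaskRows` helper are made up for illustration); only the pure helper that flattens the old Json column into Task rows has real, testable logic:

```typescript
// Hypothetical single-migration sketch of the three steps above.
type LegacyTask = { completed?: boolean; dueDate?: string };
type TaskRow = { projectId: number; completed: boolean; dueDate: Date | null };

// Pure helper: flatten the old Json tasks column into Task rows.
function jsonToTaskRows(projectId: number, tasks: LegacyTask[]): TaskRow[] {
  return tasks.map((t) => ({
    projectId,
    completed: t.completed ?? false,
    dueDate: t.dueDate ? new Date(t.dueDate) : null,
  }));
}

// Sketch of the up() body using the hypothetical imperative API:
//
//   async up(prisma) {
//     await prisma.db.createTable('Task', /* columns */)   // 1. create the table
//     for (const p of await prisma.project.findMany()) {   // 2. move the data
//       await prisma.task.createMany({ data: jsonToTaskRows(p.id, p.tasks) })
//     }
//     await prisma.db.dropColumn('Project', 'tasks')       // 3. drop the column
//   }
```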

A before and after hook would force this to be executed in two migrations. And there's a little confusion about how to define the relationships in the schema.prisma file.

Migration Schema 1

model Project {
  id   Int @id
  // this has to stay intact so we can migrate the data
  tasks Json 
}
model Task {
  id   Int @id
  // project relation not possible due to the definition being on one side
  // but the foreign key is required in the data so we just use the projectId Column for now
  projectId Int
  completed   Boolean
  dueDate DateTime
}

We can then migrate the data with an after hook, and only after that can we make the schema look the way we want.

model Project {
  id   Int @id
  tasks Task[]
}
model Task {
  id   Int @id
  project Project 
  completed   Boolean
  dueDate DateTime
}

It works but is a lot more cumbersome.


I've also had situations in the past where I've been required to migrate data and make no changes to the database. An example: maybe we previously decided to store the length of a task in integer days, but now need more detail, so the length has to be stored in integer minutes. The length column remains an integer, but we need to multiply out the data within the database.

Whilst the data can be migrated with a script, it's very useful to have it integrated into your deploy process, especially if you're dealing with multiple application instances. So being able to have a migration that isn't directly tied to the schema.prisma file would be amazing.
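For the days-to-minutes example above, such a schema-free data migration might look like this. The `up`/`down` object shape follows the DSL proposed earlier in the thread and is hypothetical, as are the table and column names; the conversion factor itself is just 24 × 60:

```typescript
// Pure conversion helpers, so the rescaling logic is testable on its own.
const DAY_IN_MINUTES = 24 * 60; // 1440

const daysToMinutes = (days: number): number => days * DAY_IN_MINUTES;
const minutesToDays = (minutes: number): number => minutes / DAY_IN_MINUTES;

// Hypothetical migration body: the schema is untouched, only values change.
const migration = {
  async up(prisma: { $executeRawUnsafe: (sql: string) => Promise<number> }) {
    await prisma.$executeRawUnsafe(
      `UPDATE "Task" SET "length" = "length" * ${DAY_IN_MINUTES}`,
    );
  },
  async down(prisma: { $executeRawUnsafe: (sql: string) => Promise<number> }) {
    await prisma.$executeRawUnsafe(
      `UPDATE "Task" SET "length" = "length" / ${DAY_IN_MINUTES}`,
    );
  },
};
```

Note that the down migration only round-trips cleanly because multiplying and dividing by the same integer factor is lossless for whole-day values.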

@davidmcnamee

davidmcnamee commented Jul 23, 2020

I'm definitely hyped about this feature -- lack of data migrations is the main thing holding me back from using prisma in all my projects -- but I disagree with the suggested solution in terms of API.

For example, the suggested solution would produce code like this:

// migration.ts

import { migrate } from '@prisma/migrate'

migrate({
  async up(prisma) {

    // update some fields
    await prisma.updateField({ tag: "UpdateField", model: "Following", field: "id", type: "Int" })

    // create some new directives, arguments, fields, or models
    await prisma.createDirective({ tag: "CreateDirective", location: { path: { tag: "Field", model: "Following", field: "id" }, directive: "id" } })
    await prisma.createDirective({ tag: "CreateDirective", location: { path: { tag: "Field", model: "Following", field: "id" }, directive: "default" } })
    await prisma.createArgument({ tag: "CreateArgument", location: { tag: "Directive", path: { tag: "Field", model: "Following", field: "id" }, directive: "default" }, argument: "", value: "autoincrement()" })

    // Data migration
    await prisma.someModel.rows.forEach(r => {
      // move data around with typescript
    });
    await prisma.executeRaw(`some SQL stuff`) // ...or just use SQL

    // delete the old directives
    await prisma.deleteDirective({ tag: "DeleteDirective", location: { path: { tag: "Field", model: "Following", field: "id" }, directive: "unique" } })
  },

  async down(prisma) {
    // repeat EVERYTHING from above, but in the reverse order
  }
})

This seems like way too much generated/tedious code in my opinion, especially once you scale up. I think it would be a good idea to leverage the existing steps.json and simply insert a new type of step that has a tag like "ExternalMigration" or something, that comes with a path to the migration file. Since most migrations can be boiled down to Create new columns/tables -> Move the data around -> Update columns/tables -> Drop columns/tables, this would allow you to seamlessly insert a "Data Migration" step somewhere in your generated migration. Here's an example of what I mean:

migrations/20200717142452-my-migration/steps.json

{
  "version": "0.3.14-fixed",
  "steps": [
    {
      "tag": "UpdateField",
      "model": "Following",
      "field": "id",
      "type": "Int"
    },
    {
      "tag": "CreateDirective",
      "location": {
        "path": {
          "tag": "Field",
          "model": "Following",
          "field": "id"
        },
        "directive": "id"
      }
    },
    {
      "tag": "CreateDirective",
      "location": {
        "path": {
          "tag": "Field",
          "model": "Following",
          "field": "id"
        },
        "directive": "default"
      }
    },
    {
      "tag": "CreateArgument",
      "location": {
        "tag": "Directive",
        "path": {
          "tag": "Field",
          "model": "Following",
          "field": "id"
        },
        "directive": "default"
      },
      "argument": "",
      "value": "autoincrement()"
    },
    {
      "tag": "ExternalMigration", <--------- there it is!!!
      "path": "./data-migration.ts"
    },
    {
      "tag": "DeleteDirective",
      "location": {
        "path": {
          "tag": "Field",
          "model": "Following",
          "field": "id"
        },
        "directive": "unique"
      }
    }
  ]
}
// migrations/20200717142452-my-migration/data-migration.ts
import { migrate } from '@prisma/migrate'

migrate({
  async up(prisma) {
    await prisma.someModel.rows.forEach(r => {
      // move data around with typescript
    });
    await prisma.executeRaw(`some SQL stuff`) // ...or just use SQL
  },

  async down(prisma) {
    await prisma.executeRaw('reverse the previous changes')
  }
})

It could even come with a json stub like {"tag": "ExternalMigration", ...} so that you can then directly copy/paste it into steps.json at whichever point you want it to execute. But primarily, I think it would be great if prisma could have this kind of intermediary data-migration step rather than the before.sh and after.sh that are currently in the spec.

As for custom indexing and an escape hatch for things Prisma doesn't support -- I think those belong in the declarative schema.prisma rather than in migrations, but to be honest I haven't given much thought to that since it's not something I use in my day-to-day projects.

@cannikin

We've got a PR for adding Data Migrations natively to RedwoodJS (here are the docs): https://deploy-preview-256--redwoodjs.netlify.app/docs/data-migrations

It was inspired by the data-migrate gem for Rails.

We now have three tasks that run during a deploy (db up is our wrapper around prisma migrate up --experimental):

yarn redwood db up && yarn redwood dataMigrate up && yarn redwood build

It never occurred to me that Prisma could add this functionality natively, so I just started building our own. 😬 But ideally these would tie into the migrate workflow so they can run in the proper order. Right now our process does have the downside that Prisma migrations and these data migrations run completely separately, so you need to be careful if you're migrating data in a way that could result in data loss, like removing a column. You need to deploy in two steps:

  1. Add new table/column and migrate data
  2. Remove old table/column

@AviBueno

I have similar needs; one example (similar to @ryanking1809's) is detailed in #3470
Looking forward to updates on the subject.

@Mautriz

Mautriz commented Sep 18, 2020

Same here: the possibility to use imperative SQL alongside declarative migrations would solve all my problems, and I would use Prisma for every project. As of now I'm kind of confused about how to run custom migrations in order. Would be awesome.

@erawk

erawk commented Oct 6, 2020

This is a blocker for us. We're rolling our own post-deploy data migration scripts. Having something similar to Rails' ActiveRecord and data-migrate would be fantastic. In the meantime, we're doing something similar to Redwood above.

@AlexMenor

I'm interested in this too! So I can declare views/indexes/on cascade statements and have my db ready with one command.

@albertoperdomo
Contributor

albertoperdomo commented Nov 19, 2020

Hello,

I'm a product manager at Prisma, currently working on schema migrations.

As previously mentioned in this issue, we are working hard on taking Prisma Migrate to General Availability. As part of this work, we are making some important changes to how it works.
One aspect that's changing is that migrations will now be auto-generated but imperative, and there will be an option to create an empty migration file and implement your own logic inside of it.

We have an Early Access program for Prisma Migrate with this set of changes and are looking for users who can test it and provide feedback. You can find more info and join the conversation in our #product-feedback channel on our public Slack community. You can join the Slack community here.

@tomhoule
Contributor

As of 2.13.0, the new Migrate with imperative migrations (as SQL, currently) is in Preview. Feedback is very much wanted :)

@AlexMenor

I'm using it already and I'll report as needed

@Jolg42 Jolg42 changed the title Imperative Migrations Imperative Migrations with a TypeScript DSL Dec 15, 2020
@tomhoule tomhoule added the kind/feature A request for a new feature. label Dec 17, 2020
@tomhoule tomhoule transferred this issue from prisma/migrate Dec 17, 2020
@tomhoule tomhoule added the team/schema Issue for team Schema. label Dec 17, 2020
@machendos

@albertoperdomo are you currently working on migrations using a TypeScript DSL? For me, the absence of this possibility is the only thing stopping us from introducing Prisma to large projects, since without it we can't apply TypeScript-based changes to the DB, for example inserting values from dictionary enums automatically. Or was the work on migrations finished after the introduction of SQL-based migrations?

@florianmartens

florianmartens commented Aug 22, 2021

I'd also love some info on this. I really like working with Prisma so far, but the fact that this seems to be a limitation of the migration system could be a real concern / show-stopper for me. While .sql-only migrations might be sufficient for small projects, they introduce pretty drastic constraints for larger projects.

It would be great to know whether the Prisma team considers .sql migrations sufficient or wants to move to a solution that allows for scripting.

@tomhoule
Contributor

It's a complex question; there is no written stance at Prisma at the moment, and the discussion is definitely still open. This issue was opened at the time of the experimental Migrate version, where migrations were .json files: declarative, non-deterministic, and with very little customization possible. The Migrate we have now, with SQL migrations, gives you a lot more freedom while still being deterministic. It was also the easiest to implement reliably, without spending time and effort designing a DSL (like any other software project, we have to prioritize between many features, and we have limited time).

My individual, personal view is that since the SQL migrations are generated, you only need to review them and maybe customize them, which is easier than writing them yourself from scratch, so I think they are fine.

These are a few questions that come to mind when discussing a TypeScript DSL for migrations on SQL databases:

  • What is the execution model of the DSL? Is it a script that results in an object that will be interpreted to SQL, or a script that can interact directly with the database, do API calls, use dependencies, etc.? The first case is a lot more restricted, but would give Prisma more context on the content of the migration, and can be made deterministic. The second case introduces much more complexity, and lets you write migrations that will easily break when you upgrade your TS dependencies. Migrations can't easily be refactored, they have to keep working the same for as long as your project keeps them around.

  • Migrate fundamentally relies on migrations being deterministic. Applying the same sequence of migrations twice should always result in exactly the same database schema; otherwise we can't reliably generate new migrations or even diagnose what the database schema actually is/should be. This means authors of TS migrations would have to be very careful with conditional logic and non-deterministic things like API calls.

@florianmartens

florianmartens commented Aug 23, 2021

Thanks @tomhoule, what you say definitely makes sense and is very interesting to think about. My current use case is as follows: I'm working with a simple text-based field named description. We've now decided to let users interact with this field using a WYSIWYG editor, and need to transform existing descriptions into stringified versions of a WYSIWYG-compatible object. The transformation requires the use of a library.

This is a relatively simple use case, but around 80% of my migration scripting use cases are in that ballpark. Both migration hooks and DSL migrations would potentially solve this (although I'd prefer DSL migrations).
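A sketch of that use case, with a made-up `toEditorState` transform standing in for the real WYSIWYG library, and a minimal hand-written type for the client parameter (not a real Prisma API surface):

```typescript
// Stand-in for the editor library's transform: wrap plain text in a
// stringified editor-state object. The real shape depends on the library.
const toEditorState = (plainText: string): string =>
  JSON.stringify({ blocks: [{ type: "paragraph", text: plainText }] });

// Hypothetical data-migration body: rewrite every description in place.
async function up(prisma: {
  project: {
    findMany: () => Promise<{ id: number; description: string }[]>;
    update: (args: {
      where: { id: number };
      data: { description: string };
    }) => Promise<unknown>;
  };
}) {
  for (const row of await prisma.project.findMany()) {
    await prisma.project.update({
      where: { id: row.id },
      data: { description: toEditorState(row.description) },
    });
  }
}
```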

I understand the need for deterministic migrations, but don't think this necessarily needs to be enforced programmatically if it leads to such limitations. What is wrong with advising users not to apply DB schema migrations conditionally? I think many of us have experience with other migration systems that wouldn't stop us from doing this, yet I'm assuming very few of us have ever mutated our DB schema conditionally, as it screams bad practice.

I also understand that your resources are limited and am super thankful for all the work you've put in already. I just think Prisma would become so much more useful if the migration system supported:

  • Down migrations
  • Scripting
  • Migrating to an arbitrary point in the migration history

@tomhoule
Contributor

Thanks for the context @florianmartens, that definitely makes a lot of sense. You raised the idea that migration hooks would also be OK for that, and I think we're more aligned than it looks at first. If I understand your use case correctly, it's about using a JS/TS library to migrate data, rather than the database schema.

There is room for improvement in Migrate to support data migrations, and there a DSL would totally be the way to go; the question is whether that DSL should also be used for schema migrations. There is a lot to be said for keeping them separate (see this classic blog post from the Rails world). If you have input for designs in that direction, it is super welcome and will definitely be taken into account; there is a lot of feedback in GitHub issues and a lot of different projects running at the same time, but we always read and discuss all of it. This problem could also benefit from its own issue.

@vladinator1000

Hi y'all, what about generating imperative .sql migrations with code? This way only the generated .sql needs to be correct and it can use the existing migration history system.

@janpio
Member

janpio commented Aug 5, 2022

Does this mean you leave this up to (hacky) user space?

No, it means a user asked if someone was interested and I replied for all the people in this thread that there indeed are.
If we considered this done with that, we would have closed the issue.

@JulianAtPave

Tossing in my hat to say that my org is fairly deep into Prisma in production and I'm having to hand-build a data migrations pipeline because it's missing in Prisma. I'm going to go with a solution similar to @marek-hanzal's, using umzug for managing which migrations have run, but it seems a heck of a feature to be missing.

@justindasilva

I was in need of this feature today and was surprised to find out that I couldn't do this in Prisma, as I had been using it back in the day with Drupal (using hook updates) for the longest time.

I would love to see this officially supported by Prisma.

@leohaarmann

The easiest way would be to differentiate between schema migrations and data migrations.

Some may argue that other ORMs like TypeORM do not differentiate between different types of migrations, but that would always lead to type inconsistency in Prisma's context. In addition, Redwood also took a similar approach (https://redwoodjs.com/docs/data-migrations).
When adding a column imperatively (or doing anything else that causes schema changes), the client has to be regenerated to sustain type safety. Here is a fictional code snippet to demonstrate the problem:

prisma.user.addColumn('company');
prisma.user.create({
  data: {
    company: 'example company name', // <-- this entry does not exist yet!
    // .. other entries
  },
});

The new data migration part

To create a new data migration I would suggest extending the migration command with a new --data flag like
npx prisma migrate dev --data --name exampleDataMigration

This command creates a folder in the default migrations folder containing one TypeScript file following the naming convention (Unix timestamp + migration name in snake case).

The generated TypeScript file exports a class implementing the PrismaDataMigration interface, which provides two functions that receive the Prisma client as a parameter. Here is an example:

export class TimestampMigrationName implements PrismaDataMigration {

  // called when migration gets applied
  async up(prisma: PrismaClient) {
    await prisma.user.create({
      data: {
        company: 'example company name',
        // .. other entries
      },
    });
    // more data migration logic here..
  }

  // called when migration gets reverted
  async down(prisma: PrismaClient) {
    await prisma.user.deleteMany({
      where: {
        company: 'example company name',
      },
    });
    // more revert logic here..
  }

}

All data migrations should be tracked in the migration history (_prisma_migrations table) too and executed in the default order (oldest first, newest last).
Before deploying the data migration it is very important to generate the client.

What are your thoughts on this concept?

@elderapo

@leohaarmann migrations should not depend on PrismaClient because if they do you will need to keep all the previous versions of the PrismaClient (with different schemas) each for every migration.

@leohaarmann

@leohaarmann migrations should not depend on PrismaClient because if they do you will need to keep all the previous versions of the PrismaClient (with different schemas) each for every migration.

Not necessarily. Take a look at the long-term maintainability section from Redwood (https://redwoodjs.com/docs/data-migrations#long-term-maintainability).

Sure, it is a trade-off, but using the PrismaClient ensures type safety in all migrations, which is more important in my opinion. Especially because one of Prisma's biggest claims is being type-safe.

@elderapo

elderapo commented Dec 20, 2022

@leohaarmann What if:

  • First migration creates a sample user with username, password and email columns.
  • Second migration either deletes the email column, adds a new column age, or renames username to name.
  • First migration is no longer compilable by typescript.

It simply won't work unless you keep multiple versions of the PrismaClient.

@leohaarmann

leohaarmann commented Dec 20, 2022

@leohaarmann What if:

  • First migration creates a sample user with username, password and email columns.
  • Second migration either deletes the email column, adds a new column age, or renames username to name.
  • First migration is no longer compilable by typescript.

It simply won't work unless you keep multiple versions of the PrismaClient.

This problem has already been discussed here, and yes, you definitely have to write some extra code to avoid TypeScript errors. That is what I meant by trade-off.

I think @franky47 made it quite clear here:

@LMaxence I was more referring to "data migrations": once the schema is in a given state (changed either via the current prisma migrate SQL-based approach or by a code-driven approach as you propose), running code at runtime to modify records based on some logic.

This is often used in expand-contract migrations, where for example you would want to add a new column which contains data that can be computed from a mix of the existing record and external data:

  1. Add the new column, marked as optional
  2. Run data migrations to iterate over records and fill up the new column with computed data
  3. Mark the new column as required and remove old columns (for normalisation)

If step 2 was to use the Prisma Client to run those data migrations, then it would be hard to keep it in sync across schema versions. But because this is a TypeScript matter only, and those data migrations should only run once like their SQL counterparts, a simple // @ts-nocheck at the top of the data migration file would work to suppress those errors.

By the way, another trade-off would be that you cannot squash data and schema migrations (https://www.prisma.io/docs/guides/database/developing-with-prisma-migrate/squashing-migrations).

So coming to your example, you would have to modify the first migration:
As @franky47 mentioned, you could simply add // @ts-nocheck at the top of the data migration file.
Alternatively, you could just use type assertions like

import { Prisma } from '@prisma/client';

prisma.user.create({
  data: {
    laterDeletedColumn: 'value123',
  } as Prisma.UserCreateInput & { laterDeletedColumn: string }, // or just as any
});

So just to be super clear: You definitely have to modify your TypeScript code in a rather ugly way (with type assertions) or just ignore it with // @ts-nocheck to work with the prisma client.

But since migrations usually only run once and almost no one will ever rewrite old (already executed) migrations, it is a conscious trade-off.

I would definitely prefer having type safety while implementing a new migration and adding a few // @ts-nocheck comments to old migrations over not having this feature and writing complex SQL on my own :)

@elderapo besides this concept, how would you design a solution for this issue?

@JulianAtPave

This same discussion came up at my organization and we landed on a similar conclusion to @leohaarmann. You can't easily make it so that every data migration will always work forever, but who cares?

I want the migration to work now, because data migrations are (IMO) by definition run once. If you need a data migration in order for your tables to be consistent, that's not a data migration, that's a schema migration, and transactional guarantees become more important than things like keeping the tables from locking or type safety.

Similarly, as @leohaarmann pointed out, you actually can maintain those old migrations if you really care, but just commenting them out or adding some ignores does the trick too.

@elderapo

elderapo commented Dec 20, 2022

@leohaarmann Migrations are kind of "write and forget", so I think manually writing/generating (TypeORM-style) raw SQL queries or using an API like:

await prisma.db.alterColumn(...);
await prisma.db.dropColumn(...);

then making sure they work, and "forgetting about them" is the way to go. Editing old migrations sounds like a really bad idea.

If you want type safety when writing migrations, there is the possibility of writing migrations using PrismaClient and then, after you're happy with the migration you wrote (after you've made sure it's correct and works as expected), replacing calls like:

prisma.user.create({
  data: {
    username: "Tom",
    age: 123
  }
});

with

prisma.$queryRaw`INSERT INTO user ...`;

either manually or maybe using some kind of codegen.


@edit

Another problem with using PrismaClient in previous database migrations is that it could break middlewares, extensions etc...

Example (I wouldn't do this in the real world, but you get the point):

  1. Create a middleware that changes the username of the inserted user to be always uppercase.
  2. Create a migration that inserts sample user tom.
  3. Create environment 1 and you end up with the expected TOM user in the DB.
  4. Remove said middleware or alter it to transform username to lowercase.
  5. Run exact same migrations on the new environment.
  6. You end up with TOM on env1 and tom on env2, but at the time of writing the migration you anticipated always ending up with the TOM username for the sample user.

So by changing the PrismaClient configuration you basically alter the behavior of already-written migrations. This wouldn't have happened with raw queries, because the migration inserting the sample user would have the uppercased TOM in the raw SQL.

@JulianAtPave

I think manually writing/generating (typeorm style) raw SQL queries or using API like:

I don't see the major benefits of this versus just doing data migrations through the existing Prisma schema migration tool. You get TS and query building, but if you're already in raw-query land, why not just write raw queries and have them be nice and centralized in your migrations folder?

My org also ran into this exact problem in that nobody could nail down a sufficiently precise definition of "data migration". I interviewed a number of devs, and the problem I kept running into was that people kept placing whatever their particular immediate or non-immediate need was on top of the idea of a "data migration".

For example, does a data migration need to have:

  1. Transactionality?
  2. The ability to be run from any version of the schema?
  3. The ability to be run from any version of the underlying app?
  4. The ability to be rolled back?
  5. The ability to be run more than once?
  6. Any sort of built in type safety?
  7. Plenty more I can't think of since I gave up on the project

People having different answers to those sorts of questions leads to wildly different opinions on what is obviously going to work or not. To me a data migration is:

  1. Transactional by default, but can be overridden
  2. Written under the assumption that there is a single target schema for the migration to run against. All previous and future schemas are irrelevant to any individual data migration because:
  3. Data migrations are run once to fix a problem. Either they fix the problem or they don't, but if you need to run it multiple times with varying inputs, it's not a data migration.

To me, that makes using the PrismaClient in an umzug-style up/down function a no-brainer. That being said, I work in an org where there is one live version of the schema at any given time. If you were designing to handle many different schemas all considered equally valid across the org, this would require additional coding to make things backwards compatible, but I don't think there's any way to get backwards compatibility for "free" other than to ignore type safety relative to the DB schema.

@elderapo

elderapo commented Dec 21, 2022

I don't see the major benefits to this versus just doing data migrations through the existing Prisma schema migration tool?

I don't think the existing Prisma schema migration tool works for all migration cases; that's why this issue was created. How else would you write a migration for adding a non-nullable column, assuming there are already production environments in the wild? You need to:

  1. Add a temporary nullable column.
  2. Fix existing records so all previously inserted records (with old schema) don't have null in the newly created column anymore.
  3. Alter the newly created table column to be non-nullable.

The above migration is possible to create using raw SQL queries, assuming you have somewhere to automatically pull the data from (for point 2).
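The three steps above can be sketched as a code-based `up()` function built on raw SQL only. This is a hypothetical illustration: the `RawClient` type, the `User` table, and the `locale` column are made up, and the backfill uses a constant where a real migration would pull computed data.

```typescript
// Hypothetical raw-SQL-only client surface; a real client would expose
// something with a compatible signature for executing raw statements.
type RawClient = { $executeRawUnsafe(sql: string): Promise<void> };

async function up(db: RawClient): Promise<void> {
  // 1. Add a temporary nullable column.
  await db.$executeRawUnsafe(`ALTER TABLE "User" ADD COLUMN "locale" TEXT`);
  // 2. Backfill existing records so no NULLs remain (a real migration
  //    would compute per-row values here instead of using a constant).
  await db.$executeRawUnsafe(`UPDATE "User" SET "locale" = 'en' WHERE "locale" IS NULL`);
  // 3. Enforce the NOT NULL constraint now that every row has a value.
  await db.$executeRawUnsafe(`ALTER TABLE "User" ALTER COLUMN "locale" SET NOT NULL`);
}

// Stub client that records statements instead of hitting a database,
// just to show the order the steps run in.
const executed: string[] = [];
await up({ async $executeRawUnsafe(sql) { executed.push(sql); } });
```

Running the three statements in one transaction (where the engine supports transactional DDL) keeps the expand/backfill/contract sequence atomic.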


Using PrismaClient for migrations is a bad idea because PrismaClient essentially changes with every schema change (and with other changes like middleware, extensions, etc.). That in turn changes the behavior of previously written/executed migrations, which should be immutable (after they have been deployed to some important environment). For the same reason, migrations should not import "reusable" code, because this "reusable" code can change in the future, which breaks the immutability of previous migrations.

I'd add immutable and self-contained points to your migration definition.

@franky47

To data-migrate non-nullable new columns based on computed data (whether from the database or from outside sources), while keeping production downtime to a minimum (ideally zero), there's quite a little dance to set up.

In a previous comment, I linked to the expand-and-contract pattern, which involves not only database operations but also adapting the application code for correct reads and writes based on which phase we're in. It's not something that can be automated away with a tool.

Now, coming back to doing data migrations with Prisma Client, there are a few roadblocks in place, aside from type safety:

  1. There is no easy way to iterate over every record in a table. There are pagination utilities, but they add a lot of boilerplate around iteration. Streams could help here.
  2. It's likely that row migration code (computing values based on other data) will need to run in a transaction. Interactive transactions are great for this, but since they are experimental, they must be enabled first. Not really an issue, as I found them to work well, and they'll probably stabilise soon.
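The iteration boilerplate mentioned in point 1 can be folded into a small helper. A sketch, not a real Prisma API: `fetchPage` stands in for a cursor-paginated query (e.g. a `findMany` with `cursor`/`take`), and the in-memory table below exists only to demonstrate the traversal.

```typescript
// Wrap any "fetch a page after this cursor" function into an async
// iterator over every record, hiding the cursor bookkeeping.
type FetchPage<T> = (cursor: string | null, take: number) => Promise<T[]>;

async function* iterateAll<T extends { id: string }>(
  fetchPage: FetchPage<T>,
  batchSize = 2,
): AsyncGenerator<T> {
  let cursor: string | null = null;
  for (;;) {
    const page = await fetchPage(cursor, batchSize);
    if (page.length === 0) return; // no rows left
    yield* page;
    cursor = page[page.length - 1].id; // resume after the last row seen
  }
}

// In-memory stand-in for a table, to show the traversal order.
const rows = [{ id: "a" }, { id: "b" }, { id: "c" }];
const fetchPage: FetchPage<{ id: string }> = async (cursor, take) => {
  const start = cursor === null ? 0 : rows.findIndex((r) => r.id === cursor) + 1;
  return rows.slice(start, start + take);
};

const seen: string[] = [];
for await (const row of iterateAll(fetchPage)) seen.push(row.id);
```

A streams-based API would make the helper unnecessary, but until then, an async generator keeps the migration body down to a single `for await` loop.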

@leohaarmann

No perfect technical solution

I think the goal of many here is to create a solution with built-in mechanisms which prevent developers from writing "bad" code. But as we discussed before, there is no single definition of bad code; it heavily depends on the developer you ask and the current environment (dev team size, ...).

@JulianAtPave made it quite clear with his statement here:

My org also ran into this exact problem: nobody could nail down a sufficiently precise definition of "data migration". I interviewed a number of devs, and that was the recurring problem: people kept projecting whatever their particular immediate or non-immediate need was onto the idea of a "data migration".

I do not think we will all get on the same page when discussing the question of good/bad code.
Here is a compromise which does not pursue any mechanisms concerning code quality/choices:

Code migrations

A code migration can be created by using the following CLI command
npx prisma migrate dev --code --name exampleCodeMigration

It creates a folder containing a TypeScript file, which implements the PrismaCodeMigration interface and follows the given naming conventions (Unix timestamp + migration name in snake case).
In contrast to my previous suggestion, the implemented interface does not pass any parameters to the up/down functions.

It would look like this:

export class TimestampMigrationName implements PrismaCodeMigration {

  // called when migration gets applied
  async up() {
    // do whatever you want to do here
  }

  // called when migration gets reverted
  async down() {
    // do whatever you want to do here    
  }

}

No built-in mechanisms or anything like that.
You can implement whatever you want; it is up to you as a developer to decide whether it makes sense.
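To make that concrete, a minimal runner for such an interface could look like the sketch below. It is purely hypothetical: a real runner would load the migration files from the folder and persist the set of applied names in a database table.

```typescript
interface PrismaCodeMigration {
  up(): Promise<void>;
  down(): Promise<void>;
}

// Apply every migration whose name is not yet in `applied`, in
// lexicographic order (which the timestamp prefix makes chronological).
async function applyPending(
  migrations: Map<string, PrismaCodeMigration>,
  applied: Set<string>,
): Promise<string[]> {
  const ran: string[] = [];
  for (const name of [...migrations.keys()].sort()) {
    if (applied.has(name)) continue;
    await migrations.get(name)!.up();
    applied.add(name); // a real runner would persist this record
    ran.push(name);
  }
  return ran;
}

// Two stub migrations; the first is already applied.
const log: string[] = [];
const mig = (n: string): PrismaCodeMigration => ({
  async up() { log.push(`up:${n}`); },
  async down() { log.push(`down:${n}`); },
});
const ran = await applyPending(
  new Map([["1_init", mig("1_init")], ["2_seed", mig("2_seed")]]),
  new Set(["1_init"]),
);
```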

In order to do data migrations like in my previous example, I would just have to import the Prisma Client and run the generate command upfront (which is another issue, mentioned here: #4703).

By the way, right now nothing really prevents you from writing SQL code which breaks the schema and type safety when running npx prisma migrate dev --create-only, so why shouldn't the same behavior be possible in TypeScript?

co-existence

At the end, both ways (the new code migrations and raw SQL only) would just co-exist. The raw SQL approach already exists with npx prisma migrate dev --create-only. Devs who prefer @elderapo's approach could still go with raw SQL. Devs like me would go with the new code migrations.

Easy maintenance and competitors

In addition, this feature would be super easy to maintain since it does not implement any constraints.
Other ORMs like TypeORM, Sequelize, Django's ORM and ActiveRecord offer running code in migrations. As mentioned in a few comments before, like @BenStirrup's comment, it is crucial for many dev teams, especially in production environments.

Any update on this feature, that is, running migrations from JavaScript/TypeScript files? Django's ORM has supported this for years; it is a huge letdown regarding data migrations that Prisma cannot. As far as I know, TypeORM does support this feature.

Edit: we eventually decided to use TypeORM instead of Prisma, and the lack of this feature was a huge point in deciding to do so...

What are your thoughts on the code migration approach?

@janpio
Member

janpio commented Dec 21, 2022

Quick note from the Prisma side (which is obviously watching this conversation): the most important part of this issue is all of you leaving a precise description of your use case. As you highlighted, "data migrations" means all kinds of things to different people, so having as extensive a list of these as possible is a good start for us. The design suggestions are of course also welcome, but the use case descriptions are essential for us to be able to figure this out later. Thanks.

@kokokenada

My use case for this is that I use Postgres JSONB field types to sweep up complex data structures. There are TypeScript interfaces that describe the format (and often helper/utility functions). From time to time I need to change the format of the JSON and want to run a migration. This is possible at the database level with Postgres JSON support, but much more painful compared to writing something in TypeScript where I can leverage the types and other code that I have. So as part of running the migration, I want to be able to import code from my app.
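As an illustration of that use case (with made-up types and fields), the per-row transform is just a typed TypeScript function, which is exactly what is hard to express with database-level JSON operators:

```typescript
// Hypothetical v1/v2 formats for a JSONB column.
type SettingsV1 = { theme: string; enabledFlags: string[] };
type SettingsV2 = { theme: string; flags: Record<string, boolean> };

// The migration's per-row logic: the compiler checks the reshaping,
// and any shared helpers from the app could be imported here.
function migrateSettings(v1: SettingsV1): SettingsV2 {
  return {
    theme: v1.theme,
    flags: Object.fromEntries(v1.enabledFlags.map((f) => [f, true])),
  };
}

// A code migration would read each row's JSONB value, run it through
// migrateSettings, and write the result back.
const v2 = migrateSettings({ theme: "dark", enabledFlags: ["beta"] });
```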

@villesau

villesau commented Feb 4, 2023

I have a very simple use case: I need to be able to generate CUIDs while splitting a table and populating data from the old table into the new one. I don't want to copy the id from one table to another.

I think the default and first-class method could still very well be an SQL file, as it definitely covers most cases and is usually safer, but the possibility to run raw SQL mixed with JS would be helpful in many more complex cases.

To keep it simple, only forward migrations should be supported, and there could be a special restricted Prisma Client that would only expose a $raw method. This would probably solve most of the issues.

So I would treat code-based migrations as an escape hatch only.
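TypeScript can express that restriction directly by picking only the raw-query surface off the full client type. A sketch with a stand-in for the generated client (not the real PrismaClient type):

```typescript
// Stand-in shape for a generated client: raw methods plus model delegates.
type FullClient = {
  $executeRawUnsafe(sql: string): Promise<number>;
  user: { create(args: unknown): Promise<unknown> };
};

// The restricted migration client exposes only the raw surface, so
// migrations can't depend on generated (and therefore changing) delegates.
type MigrationClient = Pick<FullClient, "$executeRawUnsafe">;

function restrict(client: FullClient): MigrationClient {
  return { $executeRawUnsafe: (sql) => client.$executeRawUnsafe(sql) };
  // db.user does not exist on MigrationClient: using it is a compile error.
}

// Stub full client to show the restriction in action.
const calls: string[] = [];
const full: FullClient = {
  async $executeRawUnsafe(sql) { calls.push(sql); return 0; },
  user: { async create() { return {}; } },
};
const db = restrict(full);
await db.$executeRawUnsafe("SELECT 1");
```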

@antmarot

antmarot commented Feb 6, 2023

My use case is a column containing some kind of instructions (in a given DSL) as strings.
We want to drop the support of that DSL in favor of JSON blobs, which will make it easier to integrate with for consumers.
I would need the ability to run code in order to transform the former format into the new one in a new column.

@cyrilchapon

I basically don't understand any argument here, nor am I able to find a legitimate one against improving migrations with a DSL for this.

Prisma is a fantastic piece of software, such a wonderful ORM. But every time I ask myself "which ORM should I choose this time?" (on every project I start), every. single. time. the migration topic is in the "cons" column for Prisma.

Almost 400 issues with a title containing the word "migration" should be a strong hint that it's time to face reality: it was a nice try, but the migration system is bad. As simple as that.

The above comment states that what the Prisma team needs is a set of precise use cases; well, here it is, as a user story:

As a developer, I want to generate migrations for my database, with an understandable CLI, then tweak the migrations to fit my needs, in a language which is not from the 90s

  1. That distinction between "structure" and "data" migrations...

...is total nonsense. In the real world, when you migrate the structure, 50% of the time you'll also migrate the data.

The only correct way to do this is in a transaction. Oh wait... not every DB engine can run structure migrations in a transaction? Guess what: you just have to tell people to switch to a non-crappy DB engine!

Examples:

  • split a table in 3 (create / select / insert / drop)
  • change the id column type (alter / update / alter)

  2. PrismaClient cannot be used here because it changes when generated

Well, refactor it. The underlying low-level proxy methods should just not be part of the generated surface, and should not change when generated.

  3. We don't want to encourage people to write bad code

Yeah, for sure. Let's rather encourage them to write SQL by hand, and to bypass the migration flow when migrating data outside of migration transactions.

Total nonsense.


This might sound spicy, and I'm sorry about that. But please note that I LOVE this ORM and deeply respect the hard work done on it. All of this is just, IMHO, a set of legitimate arguments from someone who (just like most backend developers) has written hundreds of migrations over the years, has worked with tens of ORMs in tens of environments, and is dying to finally be able to use this fantastically promising ORM without thinking "ah, I had forgotten about that strange migration system"...

@nickluger

Yeah, for sure. Let's rather encourage them to write SQL by hand, and to bypass the migration flow when migrating data outside of migration transactions.

With Prisma I started living the thug life and runnin ma run-manual-migration.ts script whenever I want 😎. CI can't stop me!

@emmanuelbuah

I have kept a close watch on this topic and am aware the Prisma community cares a lot about it. DB migration is a difficult problem to solve, but the cleanest approach I have seen is EdgeDB's: https://www.edgedb.com/docs/intro/migrations. It's not perfect, but it's the best I've seen so far.

I use Prisma and hope it gets inspired by EdgeDB in improving its migration strategy.

@cyrilchapon

cyrilchapon commented Apr 6, 2023

@emmanuelbuah yes, and the same goes for umzug when used programmatically, which can become a solid building block for simple and powerful migration systems too.

Here is an example from a personal repository:

README.md

(notice how things are simple-and-stupid, without any magic in it)

yarn mig --help

# Create a migration
yarn mig create <name>

# Execute all pending migrations
yarn mig up

# Execute next migration
yarn mig up --one

# Revert all migrations
yarn mig down

# Revert last migration
yarn mig down --one

# Get migrations status
# Success if up-to-date
# Errors if pending found
yarn mig status

migrator.ts

import { Command } from '@commander-js/extra-typings'
import { appEnv } from '../lib/env/app-env.js'
import { createWhateverDbConnector } from '../lib/whatever-db-connector.js'
import { createUmzug } from './umzug.js'

const whateverDbConnector = createWhateverDbConnector(appEnv.DATABASE_URL)
const umzug = createUmzug(whateverDbConnector, 'migration')

const program = new Command()
program.name('mig').description('Perform database migrations powered by Umzug')

program
  .command('create')
  .description('Create a new migration')
  .argument('<name>', 'name of migration to create')
  .action(async (name) => {
    await umzug.create({
      name: name,
      allowExtension: '.ts',
    })
  })

program
  .command('up')
  .description('Execute pending migrations')
  .option('--one', 'just execute next migration')
  .action(async (options) => {
    await umzug.up(!!options.one ? { step: 1 } : undefined)
  })

program
  .command('down')
  .description('Revert executed migrations')
  .option('--one', 'just revert previous migration')
  .action(async (options) => {
    await umzug.down(!!options.one ? undefined : { to: 0 })
  })

program
  .command('status')
  .description('Get migrations status')
  .action(async () => {
    const pendingMigrations = await umzug.pending()
    if (pendingMigrations.length > 0) {
      program.error(
        `Found pending migrations, execute \`mig up\` to run them.\n${JSON.stringify(
          pendingMigrations.map((mig) => mig.name),
        )}`,
        {
          exitCode: 1,
          code: 'pending.migrations',
        },
      )
    }
  })

const go = async () => {
  try {
    await program.parseAsync()
  } finally {
    await whateverDbConnector.disconnect()
  }
}

void go()

migration-xxx.ts

import type { Migration } from '../umzug.js'

export const up: Migration = async ({ context: { whateverDbConnector } }) => {
  await whateverDbConnector.transaction(async (tx) => {
    await tx.createTable('stuff')
    // await some data migration
    await tx.dropTable('other-stuff')
  })
}

export const down: Migration = async ({ context: { whateverDbConnector } }) => {
  await whateverDbConnector.transaction(async (tx) => {
    await tx.createTable('other-stuff')
    // await revert data migration
    await tx.dropTable('stuff')
  })
}

Things are so simple here: understandable, flexible. What is the point of missing out on that?

@Daidalos117

I was trying one of the suggested approaches with umzug. After using it, the Prisma migration tool detects drift (when I want to create a new SQL migration). So basically it's not possible to use umzug and Prisma migrations together, right?
Anyway, it's really, really bad that Prisma doesn't solve this itself. I honestly cannot comprehend how any bigger application can work without script migrations.

@janpio
Member

janpio commented Jun 20, 2023

I was trying one of the ways with umzug. After using it, the Prisma migration tool (when I want to create new SQL migration) detects drift. So basically it's not possible to use umzug and prisma migration together right?

You just need to tell Prisma about the change you did with the external tool. That would mean creating an SQL migration file for anything schema-related, and marking it as already applied to the database with migrate resolve --applied: https://www.prisma.io/docs/reference/api-reference/command-reference#migrate-resolve

Anyway, it's really really bad that Prisma itself doesn't solve this by itself.

We at Prisma agree, which is why this issue exists. We will work on this when we have capacity.

@cyrilchapon

cyrilchapon commented Jun 20, 2023

@janpio , that's so great to hear ❤️

If I could add just one thing here, is I also tried myself to just-don't-use-prisma-migration-tool and go with another one like Umzug; and faced the same issue.

It would be a great addition, maybe, to allow Prisma to just report the DB "state" versus the schema. Just that, and print it to the console, maybe with an option to process.exit(1) so it can be run in CIs.

@janpio
Member

janpio commented Jun 20, 2023

We got you: https://www.prisma.io/docs/reference/api-reference/command-reference#migrate-diff (If you have questions on how exactly to do that, it's best to ask in a discussion so we do not spam all the people subscribed here with notifications. Thanks.)
