Skip to content
This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Improve database reloader #2

Closed
steve-chavez opened this issue Apr 12, 2017 · 6 comments
Closed

Improve database reloader #2

steve-chavez opened this issue Apr 12, 2017 · 6 comments

Comments

@steve-chavez
Copy link
Contributor

Right now the sub0_devtools tries to run the user sql scripts and recreates the db in case of failure, this cycle is slower on later stages of development and in projects with a considerable size of master data.

I have a modest dataset related to some local GIS app that takes 12 seconds to recreate.

For now I propose using apgdiff, with this migrations are easier, say we have:

-- v1.sql
CREATE TABLE user (
    email TEXT,
    pass TEXT NOT NULL CHECK (length(pass) < 512),
    first_name TEXT
);
-- v2.sql
CREATE TABLE user (
    email TEXT,
    pass TEXT NOT NULL CHECK (length(pass) < 512),
    first_name TEXT,
    last_name TEXT
);

To migrate from v1.sql to v2.sql we can do this:

java -jar apgdiff-2.4.jar v1.sql v2.sql                                                                                                   

This outputs:

ALTER TABLE "user" ADD COLUMN last_name TEXT;

To reverse from v2.sql to v1.sql:

java -jar apgdiff-2.4.jar v2.sql v1.sql                                                                         

This outputs:

ALTER TABLE "user" DROP COLUMN last_name;

I think this tool approach for migrations is better than the ones that rely on convention and discipline like sqitch or pgrebase.

The thing is apgdiff only operates on sql files, so I could do a pg_dump -s on the user database and just run apgdiff with this and the user current db defined in his folder structure then apply the diff.

@ruslantalpa
Copy link
Member

yes, i was looking into using it but for migration from dev to production, but the usecase here is also good.

If you can get this approach working then perfect. What i am not sure will work is the fact that in the project structure, in sql files, i use \ir, can this tool handle that or does it need a flat sql file with no psql instructions?

Anyway, you know the thing we are trying to accomplish, "any change in the sql files is immediately reflected in the db, then the containers are refreshed to reload the new schema".
Feel free to try other approaches to make the process faster. It would be a gain if the approach you suggested works in 90% of the time and when it does not, we fall back to resetting the db.

@steve-chavez
Copy link
Contributor Author

I've been trying working the sub0_kickstart example with apgdiff, and It's giving lots of errors:

  • The psql slash commands like \{set,echo} only get ignored if they end with ; if they do not end with this seems the next statement that ends with a ; is ignored.

      -- v1.sql
      \set QUIET on
      create schema data;
      -- v2.sql
      \set QUIET on
      create schema data;
      create table data.items(id int);
      -- diff.sql
      Exception in thread "main" java.lang.RuntimeException: Cannot find schema 'data' for statement 'create table data.items(id int);'. 
      Missing CREATE SCHEMA statement?
  • When adding a new check constraint apgdiff thinks is a new column and it outputs an invalid syntax in the form of ALTER COLUMN "check".

    -- v1.sql
    create schema data;
    create table data.users (  
      firstname            text not null,
      check (length(firstname)>2)
    );
    -- v2.sql
    create schema data;
    create table data.users (
      firstname            text not null,
      lastname             text not null,
      check (length(firstname)>2),
      check (length(lastname)>2)
    );
    -- diff.sql
    SET search_path = data, pg_catalog;
    
    ALTER TABLE users
      ADD COLUMN lastname text NOT NULL,
      ALTER COLUMN "check" TYPE (length(lastname)>2) /* TYPE change - table: users original: (length(firstname)>2) new: (length(lastname)>2) */;
  • Outputs invalid syntax instead of adding a CONSTRAINT when adding REFERENCES to a column.

    -- v1.sql
    create schema data;
    create table data.tasks (
      id           serial primary key,
      project_id   int not null
    );
    -- v2.sql
    create schema data;
    create table data.projects( id serial primary key );
    create table data.tasks (
      id           serial primary key,
      project_id   int not null references data.projects(id)
    );
    -- diff.sql
    SET search_path = data, pg_catalog;
    CREATE TABLE projects (
            id serial primary key
    );
    ALTER TABLE tasks
            ALTER COLUMN project_id TYPE int not null references data.projects(id) /* TYPE change - table: tasks original: int new: int not null references data.projects(id) */,
            ALTER COLUMN project_id DROP NOT NULL;
    -- ERROR:  42601: syntax error at or near "not"
    -- ALTER COLUMN project_id TYPE int not null referenc...
  • Cannot deduce a SERIAL implicit SEQUENCE, failing to do a GRANT.

    -- v1.sql
    create schema data;
    create table data.tasks (
      id           serial primary key,
    );
    -- v2.sql
    create schema data;
    create table data.tasks (
      id           serial primary key,
    );
    create role webuser;
    grant usage on sequence data.tasks_id_seq to webuser;
    -- diff.sql
    Exception in thread "main" java.lang.RuntimeException: Cannot find sequence 'data.tasks_id_seq' for statement 'grant usage on sequence data.tasks_id_seq to webuser;'. 
    Missing CREATE SEQUENCE?

The list is not comprehensive I bet there'll be more errors. I think apgdiff has been thought just for handling simple ddl's. This is not counting the \ir that is maybe a worst issue, at first I thought just doing a cat but I forgot the order matters.

@steve-chavez
Copy link
Contributor Author

Patching apgdiff java parser may be too much of an effort, I think there's a better solution for this.

Instead of a custom parser for pg sql I thought of reusing pg parser, there's is an official guide for this
and it seems is already done for select statements, using this would guarantee syntax correctness and sync with newer releases.

For our use case I'd have to include ddl and dcl statements, these are also included in pg parser.c, maybe I could patch libpg_query or see if I can reuse pg code in another way.

The other issue about psql slash commands, I've seen pg code and it handles in a more simpler way separate from the parser.c, I think just the \{i,ir} could be supported.

This pgdiffer could be another project under the subZero organization, separate from the devtools, if it's done maybe later can be open sourced for having more eyes for the bugs.

It's more of a long term project though but I think it's feasible and it would not only benefit the devtools but solve more complex migration issues, if you think is worth it I could research a bit more and then do a small proof of concept and see where it goes.

@ruslantalpa
Copy link
Member

a good pgdiff that understands all the commands is good and probably will be easyer to write in haskell where you have custom types and easey parsing but it's a complex project so let's put tht on hold, let's get the devtools working with the curent approach of resetting (which could be made faster probably with some tricks)

@steve-chavez
Copy link
Contributor Author

Seeing the amount of errors apgdiff throws with the kickstart I think that it'll be better to not include it in the devtools console, what other tricks could make the resetting faster?

@ruslantalpa
Copy link
Member

ruslantalpa commented Apr 13, 2017 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants