
Provide a schema validator #2476

Open
ben-manes opened this issue May 23, 2013 · 12 comments

@ben-manes
Contributor

It would be handy if jOOQ provided a schema validator so that an application can ensure that the generated code matches the runtime database schema. This is particularly useful when the code is generated from one database (such as H2 for compile and test) and run against another (MySQL in production).

Due to the differences in the SQL dialects, we maintain two sets of Flyway migration scripts to minimize confusion. However, that could lead to diverging schemas if a developer forgets to update both sets of scripts. Automated integration testing should most likely catch this, but to better guard against surprises caused by differing side effects, a validation phase at startup seems appropriate.

I currently have a naive validator, but I think a more intelligent one would be a useful part of jOOQ's tool chain.

@lukaseder
Member

I like the idea. Essentially, the task wouldn't be about validating schemas, but about comparing them (clients could then throw exceptions when a comparison shows differences).

But things are a bit more complex than they seem. There are a couple of problems:

  • The grants used for code generation might not match those of the runtime connection
  • Generated code might have been mapped through codegen-time schema / table mapping
  • Runtime artefacts might have been mapped through runtime schema / table mapping
  • Generated data types might have been changed via <forcedType/>
  • Generated data types might have been changed via other flags
  • Some databases poorly implement DatabaseMetaData
  • org.jooq.Meta is not completely implemented yet. It doesn't introspect primary / unique / foreign keys, etc.
  • DatabaseMetaData does not support advanced data types very well (UDTs, arrays, etc.)

So, I feel that a reliable implementation might not be easy... If this is really about validation, you could re-generate the sources and compare the new generation with the "old" one. The ".java" files should be exactly the same.
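The "re-generate and compare" idea above can be automated by regenerating the jOOQ sources into a second directory and diffing the two trees file by file. A minimal, self-contained sketch in plain Java (the directory paths and class name are hypothetical; how you invoke the code generator is up to your build):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GeneratedSourceDiff {

    /**
     * Compares two directory trees of generated ".java" files and returns
     * the relative paths that are missing on either side or differ in content.
     */
    public static Set<String> diff(Path oldGen, Path newGen) throws IOException {
        Map<String, byte[]> oldFiles = readJavaFiles(oldGen);
        Map<String, byte[]> newFiles = readJavaFiles(newGen);

        Set<String> differences = new TreeSet<>();
        for (String path : oldFiles.keySet())
            // Arrays.equals is false when the file is absent on the other side
            if (!Arrays.equals(oldFiles.get(path), newFiles.get(path)))
                differences.add(path);
        for (String path : newFiles.keySet())
            if (!oldFiles.containsKey(path))
                differences.add(path);
        return differences;
    }

    private static Map<String, byte[]> readJavaFiles(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            return files
                .filter(p -> p.toString().endsWith(".java"))
                .collect(Collectors.toMap(
                    p -> root.relativize(p).toString(),
                    p -> {
                        try {
                            return Files.readAllBytes(p);
                        }
                        catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    }));
        }
    }
}
```

An empty result set would mean the generated code still matches the schema it was generated from.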

@ben-manes
Contributor Author

I was thinking of something more best-effort and not absolutely correct, so a deep comparison may be too exact. I thought Hibernate's Configuration validateSchema() might be an inspiration, though it's been a long time since I've used it. That may require keeping extra metadata about how the tables were generated.

@lukaseder
Member

From a quick glance, Hibernate's validateSchema() seems to do roughly what your gist does. Well, I can see how this is already useful, even if it's not complete. However, I still think that "comparing" is more generally useful than "validating". So let's think about comparing.

When comparing schema A with schema B, you'll get:

  1. Tables in A but not in B
  2. Tables in B but not in A
  3. Tables in both A and B that are different (A.T and B.T)
  4. Tables in both A and B that are the "same"

For 3), there's more interesting information:

3.1. Columns in A.T but not in B.T
3.2. Columns in B.T but not in A.T
3.3. Columns that switched order between A.T and B.T (we probably want to recognise this, explicitly)
3.4. Columns in both A.T and B.T that are different (data type)
3.5. Columns in both A.T and B.T that are the "same"

In 3.4), data types can be:

3.4.1. Incompatible, e.g. VARCHAR and DATE
3.4.2. Comparable, e.g. INT and BIGINT, or VARCHAR(3) and VARCHAR(4)
3.4.3. Identical

Once the comparison result has been established, a validator can still assess which changes are acceptable and which aren't. Separating "comparison" from "validation" keeps things a bit simpler on the comparison side... What do you think? Did I miss anything?
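The taxonomy above can be made concrete with a small sketch. This is not jOOQ API; the class, the "type family" table, and the representation of a table as an ordered column list are illustrative assumptions (the ordered list means a column-order change from 3.3 shows up as "different"):

```java
import java.util.*;

public class SchemaComparison {

    // 3.4.1 - 3.4.3: the three outcomes of comparing two data types.
    public enum TypeMatch { IDENTICAL, COMPARABLE, INCOMPATIBLE }

    // Illustrative "type families": types in the same family are comparable.
    private static final Map<String, String> FAMILY = Map.of(
        "INT", "numeric", "BIGINT", "numeric",
        "VARCHAR(3)", "string", "VARCHAR(4)", "string",
        "DATE", "temporal");

    /** Classifies a pair of data types as in 3.4.1 - 3.4.3. */
    public static TypeMatch compareTypes(String a, String b) {
        if (a.equals(b))
            return TypeMatch.IDENTICAL;
        if (FAMILY.get(a) != null && Objects.equals(FAMILY.get(a), FAMILY.get(b)))
            return TypeMatch.COMPARABLE;
        return TypeMatch.INCOMPATIBLE;
    }

    /**
     * Buckets table names into categories 1. - 4. A schema is modelled,
     * simplistically, as a map from table name to its ordered column list.
     */
    public static Map<String, Set<String>> compareTables(
            Map<String, List<String>> a, Map<String, List<String>> b) {
        Map<String, Set<String>> result = new HashMap<>();
        result.put("onlyInA", new TreeSet<>());
        result.put("onlyInB", new TreeSet<>());
        result.put("different", new TreeSet<>());
        result.put("same", new TreeSet<>());

        for (String t : a.keySet()) {
            if (!b.containsKey(t))
                result.get("onlyInA").add(t);      // 1.
            else if (a.get(t).equals(b.get(t)))
                result.get("same").add(t);         // 4.
            else
                result.get("different").add(t);    // 3.
        }
        for (String t : b.keySet())
            if (!a.containsKey(t))
                result.get("onlyInB").add(t);      // 2.
        return result;
    }
}
```

A real implementation would of course recurse into the "different" bucket to produce the 3.1 - 3.5 column-level details.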

@Chrriis
Contributor

Chrriis commented May 25, 2013

I have implemented such a database structure check in our product. We don't use jOOQ, but some concepts might be applicable.

Some background info first.
We use conceptual data types, for example "ENUM", "STRING255", "ID", etc. We have a generated conceptual model which can target any of our supported databases (SQL Server, Oracle, MySQL), and our connection provider tells us which dialect to use and allows us to acquire new connections, which this check leverages.

I don't think there is the concept of a conceptual model in jOOQ, so if you want to target multiple databases, you have to generate against one particular database and hope that it works for the other kinds simply by switching the dialect. Am I right?

However, I still think that "comparing" is more generally useful than "validating". So let's think about comparing.

This is exactly what we do: comparing. And we do so without using our generated model, except for the fact that we know everything came from there. Our conceptual types are a point of reference, for example, as well as the way we created our keys and indexes with regard to multi-database support.

An important aspect is that we perform the comparison on multiple databases, not just two. Indeed, we have upgrade scripts, and we want to validate that two databases with different dialects get upgraded the same way; we also want to make sure that a newly created database has the same structure as a database of the same dialect that got upgraded. In fact, to cover all cases, we compare each structure against all the others.
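Comparing each structure against all the others amounts to iterating over all distinct pairs. A trivial sketch (the class name and callback are placeholders for whatever comparison routine is used):

```java
import java.util.*;
import java.util.function.BiConsumer;

public class PairwiseComparison {

    /**
     * Runs a comparison callback for every distinct pair of database
     * structures, so that N databases yield N * (N - 1) / 2 comparisons.
     */
    public static <T> int compareAllPairs(List<T> structures, BiConsumer<T, T> compare) {
        int comparisons = 0;
        for (int i = 0; i < structures.size(); i++) {
            for (int j = i + 1; j < structures.size(); j++) {
                compare.accept(structures.get(i), structures.get(j));
                comparisons++;
            }
        }
        return comparisons;
    }
}
```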

What we compare are:

  • Table Z in X is missing from Y
  • Foreign Key (FK) Z in X is missing from Y
  • FK column in X is different from Y
  • FK's PK table/column in X is different from Y
  • Primary Key (PK) Z in X is missing from Y
  • PK column in X is missing from Y
  • Unique Constraint (UC) Z in X is missing from Y
  • UC column in X is missing from Y
  • Index Z in X is missing from Y

In our case, we happen to have the same indexes whatever the target database is. In any case, we could always add whatever exceptions we want, because we have a constrained world. Maybe we could check more, like stored procedures or triggers, but this is already a very good base for us.

So, here is what I did:
The tool creates a conceptual representation of the database, into which I load all sorts of information with database-specific queries. To simplify:

DBStructure
  FKDefinition[]
  IndexDefinition[]
  PKDefinition[]
  TableDefinition[]
  UCDefinition[]
FKDefinition
  String fkName
  String fkColumn
  String pkTable
IndexDefinition
  String indexName
PKDefinition
  String pkName
  String pkTable
  String[] columnNames
UCDefinition
  String ucName
  String ucTable
  String[] columnNames
TableDefinition
  String tableName
  ColumnDefinition[]
ColumnDefinition
  String columnName
  String type
  boolean isNullable

After having loaded all the structure, I run the comparison process.

The ColumnDefinition type includes the precision, something that I forge in the queries, like "NUMERIC(10)", "VARCHAR(255)", "DATETIME(23)". When comparing the structures, I ask the column definition for comparable types, which is a set of corresponding conceptual types. The reason is that several conceptual types may map to the same database type for certain dialects. For example, our mapping in SQL Server of both INT and LONG results in "INT(10)", so when I ask the structure for corresponding conceptual types, I get 2 results. If 2 columns that I compare have a common conceptual type, I consider them the same.
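This "common conceptual type" check can be sketched as a set intersection over per-dialect reverse mappings. The mappings below are illustrative assumptions extrapolated from the SQL Server example above, not the actual product's tables:

```java
import java.util.*;

public class ConceptualTypes {

    // Illustrative reverse mappings, one per dialect: a concrete database
    // type may correspond to several conceptual types. In the SQL Server
    // example above, both INT and LONG map to "INT(10)".
    static final Map<String, Set<String>> SQL_SERVER = Map.of(
        "INT(10)", Set.of("INT", "LONG"),
        "VARCHAR(255)", Set.of("STRING255"));

    static final Map<String, Set<String>> MYSQL = Map.of(
        "INT(10)", Set.of("INT"),
        "BIGINT(19)", Set.of("LONG"),
        "VARCHAR(255)", Set.of("STRING255"));

    /**
     * Two columns from (possibly different) dialects are considered the same
     * if their sets of corresponding conceptual types intersect.
     */
    public static boolean sameConceptualType(
            Map<String, Set<String>> dialectA, String typeA,
            Map<String, Set<String>> dialectB, String typeB) {
        Set<String> common = new HashSet<>(dialectA.getOrDefault(typeA, Set.of()));
        common.retainAll(dialectB.getOrDefault(typeB, Set.of()));
        return !common.isEmpty();
    }
}
```

So a SQL Server "INT(10)" column, which could be a conceptual INT or LONG, would match a MySQL "BIGINT(19)" column via their common conceptual type LONG.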

Another aspect which I haven't covered is case sensitivity, which is of importance as you can imagine.

My experience is that when I introduced those checks, which have since been part of our nightly tests, we found several problems right away. Our product had more than 10 years of upgrade history, and a few discrepancies were identified, so this was definitely a good exercise. Moreover, it occasionally detects issues in new scripts the night after the commit, which allows us to fix the script rather than having to add more scripting to correct issues from past scripts that may have been shipped to clients.

@lukaseder
Member

Thanks, Chris. That's an interesting insight. One thing that strikes me right away is that - once more - I feel there is a need for a more globally available API describing databases, i.e. one that is better than JDBC's DatabaseMetaData. I feel that after all these years, the time is ripe for an object-oriented database description API specified by a JSR (see #2371).

Conceptual types are not so orthogonal to jOOQ's "forcedTypes", where user-defined Java types can be mapped to database types through converters...

@shorn

shorn commented Apr 14, 2020

Sharing a use-case:
I've been given a task to create an integration process that reads/updates an "integration table" in a database/schema we do not own. The table structure will be defined and implemented by someone else. I will be connecting to the DB containing the integration table with jOOQ.

My dev/testing will be done with a version of the table maintained by myself via Flyway scripts. But when the process runs in integration-testing/production environments, I'm going to want to "sanity test" that integration table before kicking off the process.

So, in this case, I think the proposal from Lukas for "a comparison that shows differences" suits what I want. It's OK if the integration table is a little different from what I defined in development, as long as the SQL jOOQ executes can run. So, I'm thinking I'd just want to see a "diff" report and decide for myself if the process should run or not.

Idea: it'd be pretty cool if I could give jOOQ my query and a DataSource/Connection and ask: "hey, do you reckon this query will run on this DB?"

@lukaseder
Member

Thanks for the ping, @shorn. Meanwhile, 7 years after issue creation and 1.5 years after my last comment, we actually have the infrastructure for such a feature: DDL interpretation and an actual diff API, which you can already use today:
https://www.jooq.org/doc/latest/manual/sql-building/schema-diff

I'm not convinced that we should implement a feature that checks whether a given query would run on a given database. A much better approach to solving that problem is to implement an actual integration test on the given database version. It should be possible to re-create the production database version using your database change management tooling (e.g. Flyway), or via test containers, etc.

In the future, however, we'll be investing more time in this area via our new set of migration features, of which the above diff is a part.

@tomdcc

tomdcc commented Nov 16, 2020

Another use-case that isn't supported by integration testing: we would like to support connecting our app to a database managed by customers, and it would be good to be able to verify that the schema is in an expected state.

Our migrations are defined in SQL rather than a more structured form such as a Liquibase definition, so our generated jOOQ classes are the closest thing that we have to an inspectable expected schema. We can (and probably will) do a comparison manually by comparing database metadata to our jOOQ classes, but such a tool might be useful for others in similar scenarios.

@lukaseder
Member

Another use-case that isn't supported by integration testing: we would like to support connecting our app to a database managed by customers, and it would be good to be able to verify that the schema is in an expected state.

These kinds of things are being worked on. I'm currently fixing a lot of issues in the current implementation of DSLContext.meta() to make this work. There's tooling around Meta.migrateTo(Meta) to compare two versions of schemas (e.g. from an actual connection to your DDL scripts), and it already works decently. So, if you'd like to run some tests and report some issues, that would be helpful!

Our migrations are defined in SQL rather than a more structured form such as a liquibase definition, so our generated jOOQ classes are the closest thing that we have to an inspectable expected schema.

Or, just interpret the SQL using DSLContext.meta(Source...) and similar overloads?
