Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(entity-generator): detect more ManyToMany relations #4974

Merged
merged 2 commits into from
Jan 4, 2024

Conversation

boenrobot
Copy link
Collaborator

@boenrobot boenrobot commented Nov 28, 2023

Auto increment columns in pivot entities are now detected.

In addition, pivot tables are allowed to contain additional props, including relations, but only if those props would not hinder the ORM's ability to freely insert and remove records from the collection.

In other words, those additional props need to be optional, and have defaults that are either null or non-unique.


I wonder if maybe the additional props thing should be behind an option (let's say called "allowAdditionalPropertiesInPivotTables")? Surely detecting the auto increment for the sake of fixed ordering should be unconditional though.

P.S. I'll add some additional tests for coverage's sake after getting some feedback on the above point...

Copy link

codecov bot commented Nov 28, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (b523ae4) 99.78% compared to head (d7158a3) 99.79%.
Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4974      +/-   ##
==========================================
+ Coverage   99.78%   99.79%   +0.01%     
==========================================
  Files         221      221              
  Lines       16345    16424      +79     
  Branches     3933     3970      +37     
==========================================
+ Hits        16310    16391      +81     
+ Misses         35       33       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@boenrobot boenrobot force-pushed the pivotMore branch 2 times, most recently from a8741d1 to eb232aa Compare November 28, 2023 11:02
@B4nan
Copy link
Member

B4nan commented Nov 28, 2023

I'd actually say this whole thing should be behind an option (could be opt-out, but it should be configurable).

I'd keep it disabled by default and when allowed, I'd include all columns regardless of their type. Ignoring some columns will most probably break the schema diffing.


Maybe it could be enabled by default too, but still, nothing should be ignored, the main use case is to generate once and start using migrations, which needs to work.

@boenrobot
Copy link
Collaborator Author

boenrobot commented Nov 28, 2023

I'd actually say this whole thing should be behind an option

👍

nothing should be ignored

It's not about ignoring, it's about what qualifies an entity as pivot entity in a M:N relation vs a regular entity that happens to have two ManyToOne relations over the PKs.

For MikroORM to successfully manage a collection of a M:N relation, it needs to be able to freely insert and remove rows in the pivot entity table. Currently (without this PR), this is ensured by forcing the pivot table to not have any other columns. With this PR, I'm aiming to allow extra columns, while still ensuring rows can be added and removed freely when manipulating the collection.

With or without this option being toggled, I'm thinking a pivot entity with additional columns should be included in the output, whereas a pivot entity without any can be skipped in the output, as it is today.

@B4nan
Copy link
Member

B4nan commented Nov 28, 2023

For MikroORM to successfully manage a collection of a M:N relation, it needs to be able to freely insert and remove rows in the pivot entity table

You can define a pivot entity in any way you want, if the values don't have defaults, you need to create the records explicitly (through the pivot entity), this itself is not a problem really. I still don't think we need to distinguish that.

@boenrobot
Copy link
Collaborator Author

boenrobot commented Nov 28, 2023

if the values don't have defaults, you need to create the records explicitly (through the pivot entity)

But... that's my point... if you need to create the records explicitly in the pivot entity, you can't really have a collection property... at least, not a writeable one. So the generator should not generate collections for these.

Or are you saying that with this option turned on, it should generate collections even for pivot entities where records can only be added in the pivot entity, rather than in the collection? (and with it off, it should only generate collections when the collection can be manipulated by itself, rather than only via the pivot entity directly) If so, is there a readonly variant of the collection that can be used for the cases that require direct access to the pivot entity?

@B4nan
Copy link
Member

B4nan commented Nov 28, 2023

But... that's my point... if you need to create the records explicitly in the pivot entity, you can't really have a collection property... at least, not a writeable one. So the generator should not generate collections for these.

You can, for reading, which is 90% of use cases most probably, and that's my point :]

Or are you saying that with this option turned on, it should generate collections even for pivot entities where records can only be added in the pivot entity, rather than in the collection?

Exactly. But then there is the question of the defaults. I'd rather not have another super long option with multiple possible values, just because it's ugly to document those :] Maybe we could have two booleans instead.

I think it's safe to generate the collections by default for things that are safe to work with (so with defaults), configurable but enabled by default, and we could have another bool flag (disabled by default) to allow generating the readonly pivot collections for the rest. But maybe it's not worth the hustle.

If so, is there a readonly variant of the collection that can be used for the cases that require direct access to the pivot entity?

Yeah, the collection can be marked as readonly, it's currently used for the matching method.

@boenrobot
Copy link
Collaborator Author

boenrobot commented Nov 29, 2023

Exactly. But then there is the question of the defaults. I'd rather not have another super long option with multiple possible values, just because it's ugly to document those :] Maybe we could have two booleans instead.

IMO, a boolean which only makes sense if another boolean is set to a certain value is more confusing, but you're the boss 😄

I've added two options called onlyPurePivotTables and readOnlyPivotTables, along with docs for both.

The alternative (which I personally would like better...) is one option with I think a short name, though perhaps requiring long doc for the different values... that being manyToManyRelations: 'none' | 'pure' | 'safe' | 'all' ("none" being to not generate any M:N relations; "pure" being the effective current default, "safe" being the new default of pure + additional columns with non-unique default values; "all" being safe + read only collections).

Yeah, the collection can be marked as readonly, it's currently used for the matching method.

Right, but I meant on a property option / entity schema / typescript level... Anything that stops the user from writing to that collection, or at least indicates to them they shouldn't... I've currently added persist: false to those. Is that right?

Long term, it might be better if there's a dedicated collection type that lacks mutation methods, but that's perhaps for another time.


Side note, auto increment in non-PK field doesn't seem to be properly detected right now, so another TODO before this goes in I guess 😕
Fixed

@boenrobot boenrobot force-pushed the pivotMore branch 2 times, most recently from d7eba19 to 1d51c97 Compare November 30, 2023 07:05
@boenrobot
Copy link
Collaborator Author

boenrobot commented Nov 30, 2023

The example of the read only collection also made me think...

If the ORM is this permissive... What about tables where instead of a composite PK, you have a non-nullable unique composite key? I've added a test case in which the table is called "completed_orders"... We have a PK that is an auto increment, and a single composite unique key, made up from exactly two relations.

Currently, the first relation is mis-detected as a OneToOne relation, but even if that's fixed though that's now fixed, the ManyToMany relation would currently not be detected, since the PK is a non-composite one, and the two relations are not part of a PK... Should it be detected (yet another boolean option that does not include them by default?)? Should such an entity be marked as pivot table? Should be omitted from output if there's no extra columns other than the PK AI (that would be featured for "fixed ordering") plus the unique key columns? Or should omission be kept solely for composite PK pivot tables? If it should be omitted... What about the AI being "int" vs "unsighed int" vs "bigint" vs "unsigned bigint"? Should it be omitted only in one of those cases?


I'm starting to think a whole dedicated prop in the config may be warranted... like

interface GenerateOptions {
manyToManyRelations: {
    // Settings for ManyToMany relations where the pivot entity has a composite PK
    // with exactly two OneToMany relations spanning it.
    // Set to false to not generate ManyToMany relations with such pivot entities.
    primaryKeyPivotTables: false | {
        // Whether to output pivot tables with no additional columns, including fixed ordering one.
        // Such entities are implicitly recognized by the ORM based on the two relations involved.
        // Default is false.
        // Set to true to also output those pivot entities.
        outputPurePivotTables: boolean;
        // Whether to allow for additional optional props with non-unique defaults on the detected pivot entities.
        // Default is true.
        allowOptionalProps: boolean; 
        // Whether to allow for additional required props or optional unique props with defaults on the detected pivot entities.
        // Such relations will also be marked with "persist: false", to denote their readonly status.
        // Default is false.
        allowAdditionalRequiredProps: boolean;
        // Whether to use the table's AI column as the fixed order column in the generated ManyToMany relations.
        // Default is true.
        // AI column needs to be outside of the PK.
        // AI column doesn't count towards "additional optional props" when used for fixed ordering,
        // but counts as one when not used for fixed ordering.
        useAutoIncrementAsFixedOrder: boolean;
    },
    // Settings for ManyToMany relations where the pivot entity has a non-nullable unique index
    // with exactly two OneToMany relations spanning it.
    // Set to false to not generate ManyToMany relations with such pivot entities.
    uniqueKeyPivotTables: false | {
        // Whether to allow for additional optional props with non-unique defaults on the detected pivot entities.
        // Default is true.
        allowOptionalProps: boolean; 
        // Whether to allow for additional required props or optional unique props with defaults on the detected pivot entities.
        // Such relations will also be marked with "persist: false", to denote their readonly status.
        // Default is false.
        allowAdditionalRequiredProps: boolean;
        // Whether to use the table's AI column as the fixed order column in the generated ManyToMany relations.
        // Default is true.
        // AI column doesn't count towards "additional optional props",
        // regardless of whether it is used for fixed ordering or not.
        useAutoIncrementAsFixedOrder: boolean;
}
...
}

For each of these two top level options, maybe also allow for a callback that takes in the pivot table name, and the two FKs involved, and must return the options for this relation (or return falsy value to skip the relation).

@boenrobot boenrobot force-pushed the pivotMore branch 5 times, most recently from a56ed1e to 07c8ab5 Compare December 6, 2023 11:25
@boenrobot boenrobot force-pushed the pivotMore branch 6 times, most recently from 416aade to a1eb5ce Compare December 8, 2023 14:14
Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the silence, I've been off for a week. Will try to get back to this over the weekend.

I'd appreciate it if you don't try to do too many things in a single PR as it will be harder to review (and therefore to merge). Let's do one thing at a time, it will be also easier to understand from the changelog what has changed. So let's focus on the fixed order collections first, and we can allow more things later on.

If the ORM is this permissive... What about tables where instead of a composite PK, you have a non-nullable unique composite key? I've added a test case in which the table is called "completed_orders"... We have a PK that is an auto increment, and a single composite unique key, made up from exactly two relations.

You cannot have a relation like that, relations can be built only based on FKs/PKs, not some random unique constraint. We would have to first support that in the ORM, adding support to the entity generator is the very last step to be done. I already have some ideas on how to support this but will need proper testing to see if it's feasible. The population mechanism depends on the fact that entities are always referred by a PK.

I surely won't have time for that before v6 is out in stable, that is the priority now. But I will be more than happy to improve the entity generator over time, so I appreciate the suggestions.

Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few comments on the tests for now, haven't checked the implementation much

i'd also think about better names for the config options, i dont like that very much (but its a minor issue so keeping that for later)

@@ -54,7 +54,7 @@ export class ProductColors {
product!: Products;

@Index({ name: 'fk_product_colors_products1_idx' })
@Property({ persist: false })
@Property({ autoincrement: true, persist: false })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems wrong, autoincrement option is for PKs, not FKs

Copy link
Collaborator Author

@boenrobot boenrobot Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

autoincrement option can be applied on any numeric column in place of a default value.

An autoincrement column doesn't have to be PK, and it may be a FK.

In this case, the column is all three - it's part of a composite PK within the product_colors table, the product_id column has a foreign key to the products table, and it has autoincrement.

See

CREATE TABLE IF NOT EXISTS `product_colors` (
  `color_id` INT UNSIGNED NOT NULL,
  `product_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`color_id`, `product_id`),
  INDEX `fk_product_colors_products1_idx` (`product_id` ASC) VISIBLE,
  CONSTRAINT `fk_product_colors_colors1`
    FOREIGN KEY (`color_id`)
    REFERENCES `colors` (`color_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION,
  CONSTRAINT `fk_product_colors_products1`
    FOREIGN KEY (`product_id`)
    REFERENCES `products` (`product_id`)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
ENGINE = InnoDB;

And I imagine someone may intentionally do such a design if they want to f.e. pre-make the referenced column rows, and then ensure only the first N ones get populated in the other table... not a typical scenario, I know, but valid, and thus something that should be handled.

Having said all that... The above autoincrement was added by mistake 😅
If anything is wrong here, it's the SQL, not the generator.


I also just realized with this comment that my current implementation won't detect the fixed ordering if a PK column is also the autoincrement... I'll see if I can fix that too. Now fixed... Assuming non-pure pivot tables are allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

autoincrement option can be applied on any numeric column in place of a default value.

in fact this only works in postgres, but yes

An autoincrement column doesn't have to be PK, and it may be a FK.

nope, FK is not autoincrement, FK is poiting to a PK which is autoincrement. the flag shouldn't be there.

Copy link
Collaborator Author

@boenrobot boenrobot Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact this only works in postgres, but yes

idk about the entity manager or knex level, but on a raw SQL level, the following works fine in MySQL 8.0.35, and probably some earlier MySQL versions too (though I haven't checked such):

INSERT INTO colors (`name`) VALUES ('red'), ('white'), ('blue');

INSERT INTO products (`name`) VALUES ('Shirt'), ('Pants'), ('Phone');

INSERT INTO product_colors (color_id) VALUES (3), (1), (2);

The AI is part of the PK, and that column is also a FK. It's giving exactly what you'd expect in product_colors:

color_id	product_id
3		1
1		2
2		3

and trying to insert a 4th item, as expected, fails due to the FK constraint.

In the many_to_many_variants schema, this also works fine on a raw SQL level on MySQL 8.0.35:

INSERT INTO users (`name`) VALUES ('a'), ('b'), ('c');

INSERT INTO orders (`name`) VALUES ('shirt'), ('pants'), ('phone');

INSERT INTO user_orders (user_id, order_id) VALUES (2,3), (1, 1), (1, 2), (3, 1);

giving, as you'd expect in user_orders

user_id	product_id	priority
2	3		1
1	1		2
1	2		3
3	1		4

and the "priority" column there is AI, but is not part of the PK, and is not a FK.

@@ -0,0 +1,242 @@
import { MikroORM, MikroORMOptions } from '@mikro-orm/mysql';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer sqlite since this does not contain anything mysql specific, and mysql is the slowest driver out of all when it comes to creating the db schema (as well as querying it, but the main difference is in the info schema queries)

or how long does it take to execute this one? maybe its fine, since the discovery should be instant (we dont discover anything), but i still expect the info schema queries to be slower than necessary

Copy link
Collaborator Author

@boenrobot boenrobot Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MySQL has Workbench, which lets me very quickly and visually design schemas with complex relations or other exotic options to tables and columns, that I can then forward engineer into SQL create statements.

There's no equivalent to that for SQLite and Postgresql. There are table managers where you can define those things, sure, but you can't draw these things out first and forward engineer them once you're ready.

I've seen maybe one or two Postgresql editors that allegedly support this functionality, but are paid (and I'm not paying for that...), while Workbench is free, and I'm not aware of any equivalent (even a paid one) for SQLite.

If the schema, in its current form, can be switched to sqlite with no edits, fine, I can do that... but I refuse to design SQLite schemas from scratch, as that's too much torture.

Copy link
Member

@B4nan B4nan Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are talking about library tests, you are talking about designing schema for complex apps through UI...

but I refuse to design SQLite schemas from scratch, as that's too much torture.

well, and i refuse to merge slow tests (and as said, mysql is the slowest driver we support, especially when it comes to working with information schema) :]

i'd prefer if you change this mindset. working on libraries requires some sacrifices, your (or mine) own comfort (as a contributor) is not the most important part here - it's the produced code (and its quality, performance, and whatnot). imagine i say i dont like mysql because its slow (or any other reason), so i decide not to test anything with it... and believe me that i struggled a lot at some point with mysql, since it took ages for them to provide ARM (I was on M1 pretty early) docker image that would be usable locally - it took seconds to do anything (a single query took 100ms+ without doing anything)... the test suddenly took several minutes just because of mysql.

the difference here is can be really huge, easily 10x, as creating schema with mysql on its own takes ages (lets say hundreds of milliseconds), while with sqlite you only use memory and its close to instant (as you dont write anything to the disk).

anyway, back to my question, how long does the test take? as that's the important bit here, if its a couple hundreds of milliseconds, we can probably keep things with mysql just fine, but i dont want to merge things that take seconds to test (we already have several schema related tests that do, guess what, all of them are in mysql :])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw tried to run the test locally and its around 250ms with my m1 pro, so that's good enough

ENGINE = InnoDB;
`;

beforeAll(async () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why so many init calls, one should be enough? the state we store is the entity discovery, but since there are no discovered entities, i dont think it matters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At one point I found out that the entity generator also adds the generated entities' metadata to the ORM's state... Not sure if that's a bug, but it appears to be the current behaviour.

So if you generate with one set of options, and then regenerate with a different set of options that results in a different meta data (say, bidirectionalRelations: true vs bidirectionalRelations: false), you'll actually get... ehh... I'm not sure if a merge, or just the first or just last set... but certainly not the same result as generating the first time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if it was already there, but you did some changes that alter the metadata yourself, e.g. here:

https://github.com/mikro-orm/mikro-orm/pull/4892/files?w=1#diff-41bc1ce63697a8a667a96cd32696603b53981ac0b10b598d91ddd22ef5ff663aR151

and the same PR also makes the schema inference dependent on the entity generator ORM config, so that's another problem. i guess lest keep it this way.

Copy link
Collaborator Author

@boenrobot boenrobot Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, you do have a point that the names of the schemas are not random, so... I've now changed the test files with many cases to not drop the schemas in afterAll(), and check for existence on beforeAll().

While this won't make much difference in CI (since containers are recreated anyway; we're just shaving off the drop database call), it will make a bigger difference in local development - as long as the container is still running, you'll get a boost during those tests, as they'll skip the schema creation.

@boenrobot
Copy link
Collaborator Author

i'd also think about better names for the config options, i dont like that very much (but its a minor issue so keeping that for later)

I'm not particularly attached to the names, so let me know what you'd like. Either exact ones, or at least point me what exactly irks you about the current ones, so that I can come up with some alternatives.

@boenrobot boenrobot force-pushed the pivotMore branch 2 times, most recently from db16813 to 3492244 Compare December 28, 2023 13:41
Auto increment columns in pivot entities are now detected.
They are also now properly emitted.

Corrected FK index selection to prefer indexes that match the columns exactly,
and never associate the index with more columns to the property itself
(instead using it at the entity level).
This in turn ensures correctly identifying more ManyToOne relations,
which in turn ensures correctly identifying more ManyToMany opportunities.

In addition, by default, pivot tables are allowed to contain additional props,
including relations, but only if those props would not hinder the ORM's ability
to freely insert and remove records from the collection.

In other words, those additional props need to be optional,
and have defaults that are either null or non-unique.

This can be adjusted with the two new settings
onlyPurePivotTables and readOnlyPivotTables.

Also optimized some of the tests a bit, to never drop the schema,
and create it and its tables only if it doesn't exist already.
The tests trust that if the schema exists, it will be in the expected state.
The DB containers can be re-created if those schemas are changed.
Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I kinda like the option names in the end, let's keep them.

Let's merge, found just a few formatting issues.

packages/core/src/typings.ts Outdated Show resolved Hide resolved
packages/core/src/utils/Configuration.ts Outdated Show resolved Hide resolved
packages/core/src/utils/Configuration.ts Outdated Show resolved Hide resolved
packages/entity-generator/src/EntityGenerator.ts Outdated Show resolved Hide resolved
@B4nan B4nan merged commit d0e3ac9 into mikro-orm:master Jan 4, 2024
11 checks passed
@B4nan
Copy link
Member

B4nan commented Jan 4, 2024

Thanks!

@@ -169,8 +169,11 @@ beforeAll(async () => {
extensions: [EntityGenerator],
multipleStatements: true,
});
await orm.schema.ensureDatabase();
await orm.schema.execute(schema);
const driver = orm.config.getDriver();
Copy link
Member

@B4nan B4nan Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI ensureDatabase returns true if the database was created, so this can be simplified:

if (await orm.schema.ensureDatabase()) {
  await orm.schema.createSchema();
  await orm.schema.execute(schema);
}

I guess it would make sense to add createSchema: boolean option there too, so it would be just:

if (await orm.schema.ensureDatabase(true)) {
  await orm.schema.execute(schema);
}

edit: also you are not supposed to be passing the database name in there, the schema option is for postgres which supports multiple schemas in one database, the database name itself is always implicit (taken from config) in those methods

edit2: one problem with this approach is that there is a cache for this, and the ensureDatabase is called automatically during MikroORM.init, so for this to work it needs to be disabled

@boenrobot boenrobot deleted the pivotMore branch February 10, 2024 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants