feat(data-modeling): implement automatic relationship inference algorithm COMPASS-9776 #7275

gribnoysup · 2025-09-03T10:36:37Z

This patch adds the logic for automatically discovering relationships between collections, following the algorithm we outlined in the "Infer relationships" scope. The behavior is behind a feature flag until we add a toggle to the schema generation UI that disables discovery; that toggle is the next chunk of work for the project.

I added some collections to the sample_airbnb database in our team's "Cluster 0" cluster if you want to test it. If you create a data model for this namespace, it should identify relationships between the listingsAndReviews, reviewers, and reviews collections.
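The PR does not spell out the algorithm from the "Infer relationships" scope, so the following is only a minimal sketch of the general idea under stated assumptions: each collection has already been sampled into a hypothetical `SampledCollection` summary, a field becomes a candidate reference when its type matches an indexed field in another collection, and a candidate is kept only when every non-nullish sampled value is also found on the foreign side. All names (`SampledCollection`, `findCandidates`, `inferRelationships`, the `reviewer_id` field) are illustrative, not Compass APIs.

```typescript
// Hypothetical, simplified summary of one sampled collection.
// These names are illustrative and do not come from the Compass codebase.
interface SampledCollection {
  name: string;
  // Dotted field path -> BSON type observed in the sample.
  fieldTypes: Map<string, string>;
  // Field paths covered by an index; only these are treated as foreign keys.
  indexedFields: Set<string>;
  // Dotted field path -> sampled values (may include null/undefined).
  values: Map<string, unknown[]>;
}

interface Relationship {
  from: { collection: string; field: string };
  to: { collection: string; field: string };
}

// Step 1 (assumed): propose a relationship whenever a field in one collection
// has the same type as an indexed field in another collection.
// This sketch skips references within the same collection.
function findCandidates(collections: SampledCollection[]): Relationship[] {
  const candidates: Relationship[] = [];
  for (const local of collections) {
    for (const foreign of collections) {
      if (local === foreign) continue;
      for (const [field, type] of local.fieldTypes) {
        for (const foreignField of foreign.indexedFields) {
          if (foreign.fieldTypes.get(foreignField) === type) {
            candidates.push({
              from: { collection: local.name, field },
              to: { collection: foreign.name, field: foreignField },
            });
          }
        }
      }
    }
  }
  return candidates;
}

// Step 2 (assumed): keep a candidate only if every non-nullish sampled local
// value also appears on the foreign side. This in-memory check stands in for
// querying the foreign collection; Set lookups compare primitives by value only.
function isConfirmed(
  rel: Relationship,
  byName: Map<string, SampledCollection>
): boolean {
  const localValues = (
    byName.get(rel.from.collection)?.values.get(rel.from.field) ?? []
  ).filter((v) => v !== null && v !== undefined);
  const foreignValues = new Set(
    byName.get(rel.to.collection)?.values.get(rel.to.field) ?? []
  );
  return localValues.length > 0 && localValues.every((v) => foreignValues.has(v));
}

function inferRelationships(collections: SampledCollection[]): Relationship[] {
  const byName = new Map(collections.map((c) => [c.name, c] as const));
  return findCandidates(collections).filter((rel) => isConfirmed(rel, byName));
}
```

Splitting cheap type matching from the value check keeps the expensive step limited to fields that could plausibly be references.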


Anemy

Looking good! I left a couple of comments, no blockers.

There are a few places where we could check the abort signal to fail a bit quicker if we want, but I don't think we need to go into that now. Similarly, if we already know a collection is a view, we don't need to do index checks. We aren't special-casing views in data modeling yet, though.
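A rough sketch of the two shortcuts mentioned above, assuming a hypothetical `fetchIndexes` callback and collection metadata whose `type` mirrors the `'collection'`/`'view'` values that `listCollections` reports: the abort signal is checked before each round trip, and views are skipped because they have no indexes of their own. The signal is typed structurally, so a real `AbortSignal` can be passed in; none of these names come from the PR.

```typescript
// Hypothetical collection metadata; `type` mirrors listCollections output.
interface CollectionMeta {
  name: string;
  type: 'collection' | 'view';
}

// Hypothetical stand-in for fetching index key paths from the server.
type FetchIndexes = (collectionName: string) => Promise<string[]>;

// Structural type so that a real AbortSignal satisfies it.
interface AbortSignalLike {
  readonly aborted: boolean;
}

function throwIfAborted(signal: AbortSignalLike): void {
  if (signal.aborted) {
    throw new Error('Relationship analysis was aborted');
  }
}

async function collectIndexedFields(
  collections: CollectionMeta[],
  fetchIndexes: FetchIndexes,
  signal: AbortSignalLike
): Promise<Map<string, string[]>> {
  const result = new Map<string, string[]>();
  for (const coll of collections) {
    // Fail fast between network round trips instead of only at the end.
    throwIfAborted(signal);
    // Views have no indexes of their own, so skip the round trip entirely.
    if (coll.type === 'view') {
      result.set(coll.name, []);
      continue;
    }
    result.set(coll.name, await fetchIndexes(coll.name));
  }
  return result;
}
```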

packages/compass-data-modeling/src/store/analysis-process.ts

packages/compass-data-modeling/src/store/relationships.ts

packages/compass-preferences-model/src/feature-flags.ts

packages/data-service/src/data-service.ts

packages/compass-data-modeling/src/store/analysis-process.ts

addaleax · 2025-09-09T23:22:32Z

packages/compass-data-modeling/src/store/relationships.ts

+    schema: MongoDBJSONSchema,
+    path: string[],
+    isArrayItem: boolean
+  ) => void,


I don't know if it's just me, but my first thought reading this function is "this callback could have been a generator". It's not really all that different in the end, but I'm mentioning it in case you also feel that would be more idiomatic.

That's a good idea and worth doing; let me adjust this.

Refactored in 811d84e
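For illustration, a schema traversal written as a generator might look like the sketch below. It uses a hypothetical, heavily reduced `SchemaNode` type instead of the real `MongoDBJSONSchema`, and yields the same `(schema, path, isArrayItem)` triple that the callback received in the diff; it is not the code from 811d84e.

```typescript
// Minimal, hypothetical subset of a MongoDB JSON Schema node.
interface SchemaNode {
  bsonType?: string | string[];
  properties?: Record<string, SchemaNode>;
  items?: SchemaNode | SchemaNode[];
  anyOf?: SchemaNode[];
}

interface TraversedNode {
  schema: SchemaNode;
  path: string[];
  isArrayItem: boolean;
}

// Yields every node (including the root) with its field path.
// Array items keep the path of the array field and are flagged with isArrayItem.
function* traverse(
  schema: SchemaNode,
  path: string[] = [],
  isArrayItem = false
): Generator<TraversedNode> {
  yield { schema, path, isArrayItem };
  for (const [key, child] of Object.entries(schema.properties ?? {})) {
    yield* traverse(child, [...path, key], false);
  }
  const items =
    schema.items === undefined
      ? []
      : Array.isArray(schema.items)
        ? schema.items
        : [schema.items];
  for (const item of items) {
    yield* traverse(item, path, true);
  }
  // Alternative shapes of the same field inherit the current position.
  for (const variant of schema.anyOf ?? []) {
    yield* traverse(variant, path, isArrayItem);
  }
}
```

Compared with a callback, callers can use a plain `for...of` loop, `break` out early, or collect everything with `Array.from(traverse(schema))`.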

packages/compass-data-modeling/src/store/relationships.ts


gribnoysup · 2025-09-10T12:38:25Z

@addaleax @Anemy Thanks for the reviews! I'm rerunning some flaky tests, but otherwise I think I've addressed all the feedback. Do you want to take another look when you have a moment?

addaleax

Looks great! A few comments but basically LGTM 🙂

addaleax · 2025-09-10T14:03:04Z

packages/data-service/src/data-service.ts

-    });
+    return indexes
+      .filter((index): index is IndexDescriptionInfo & { name: string } => {
+        return !!index.name;


Just for my own understanding, in what type of situation would index.name be false-y?

I don't really know; that's the type we get directly from the driver, so I assumed it might be possible in some corner cases and decided to account for that. I can do a bit more digging 🙂
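The filter in the diff relies on a TypeScript type predicate to narrow the element type. A standalone sketch, using a simplified `IndexInfo` stand-in (hypothetical; the driver's real type has more fields) in which `name` is optional, as discussed above:

```typescript
// Simplified, hypothetical stand-in for the driver's index description type.
// Only the fields used here are modelled; `name` is optional.
interface IndexInfo {
  name?: string;
  key: Record<string, number | string>;
}

type NamedIndexInfo = IndexInfo & { name: string };

// A type predicate lets Array.prototype.filter narrow the element type,
// so callers can treat `index.name` as a plain string afterwards.
// Note that `!!index.name` also drops an empty-string name.
function hasName(index: IndexInfo): index is NamedIndexInfo {
  return !!index.name;
}

function namedIndexes(indexes: IndexInfo[]): NamedIndexInfo[] {
  return indexes.filter(hasName);
}
```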

packages/data-service/src/data-service.ts

addaleax · 2025-09-10T14:32:42Z

packages/compass-data-modeling/src/store/relationships.ts

+      ).filter((value) => {
+        // They will be matching in a lot of cases, but the chances of both
+        // local and foreign field in a relationship being _id are very slim, so
+        // skipping
+        return value[0] !== '_id';


Suggested change

-      ).filter((value) => {
-        // They will be matching in a lot of cases, but the chances of both
-        // local and foreign field in a relationship being _id are very slim, so
-        // skipping
-        return value[0] !== '_id';
+      ).filter((propPath) => {
+        // The types of _id will be matching in a lot of cases, but the chances of both
+        // local and foreign field in a relationship being _id are very slim, so
+        // skipping
+        return propPath[0] !== '_id';

or at least that's what we're trying to communicate here, right?

Unfortunately, I think this is a valid pattern that we do care about, where the _ids of multiple collections reference each other as a way of reducing a single document's size (e.g., you have a busy collection with smaller docs and a larger collection with rarely accessed and potentially large docs that can be pulled in via $lookup if necessary).

I think it's fine to skip this for now, but if we do, we probably want a TODO ticket to think a bit more about this situation.

Hmm, I was really just assuming there would never be a case like that and was trying to optimise a bit. If you think it's a possible scenario, I can just drop this filter; I don't think it makes a big difference performance-wise.

Removed the logic in c2c98ff, thanks for bringing this up!
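The `_id`-to-`_id` pattern from the discussion above can be sketched as an aggregation pipeline. The collection names `listings` and `listingDetails` and the helper `listingWithDetailsPipeline` are hypothetical: the small, frequently read document and the large, rarely read one share the same `_id`, and `$lookup` joins them only when the details are actually needed.

```typescript
// A pipeline stage as a plain object; no driver types are needed to build it.
type Stage = Record<string, unknown>;

// Hypothetical helper: fetch one listing and, on demand, its large companion
// document stored in a separate collection under the same _id.
function listingWithDetailsPipeline(listingId: string): Stage[] {
  return [
    // Read the small, frequently accessed document first.
    { $match: { _id: listingId } },
    // Join the large companion document that shares the same _id.
    {
      $lookup: {
        from: 'listingDetails',
        localField: '_id',
        foreignField: '_id',
        as: 'details',
      },
    },
    // At most one companion exists, so flatten the array while keeping
    // listings that have no details document yet.
    { $unwind: { path: '$details', preserveNullAndEmptyArrays: true } },
  ];
}
```

With the Node.js driver, such a pipeline would be passed to `aggregate()` on the `listings` collection. A relationship inference pass that skipped `_id` pairs would miss exactly this kind of link.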


feat(data-modeling): implement automatic relationship inference algorithm

87b1b54

github-actions bot added the feat label Sep 3, 2025

gribnoysup changed the title ~~feat(data-modeling): implement automatic relationship inference algorithm~~ feat(data-modeling): implement automatic relationship inference algorithm COMPASS-9776 Sep 3, 2025

gribnoysup added 2 commits September 5, 2025 09:52

Merge remote-tracking branch 'origin/main' into COMPASS-9776

909a424

chore(data-service, data-modeling): add proper support for indexes full option; gate automatic inference with a feature flag

83bbfff

gribnoysup added the feature flagged PRs labeled with this label will not be included in the release notes of the next release label Sep 9, 2025

gribnoysup added 6 commits September 9, 2025 11:02

Merge branch 'main' into COMPASS-9776

b34c3d8

chore(data-modeling): add method description; add unit tests

1510814

chore(data-modeling): fix type in test

17fef4a

chore(data-modeling): filter out nullish values from the sample; fix logid

022bd0e

chore(data-modeling): more comments

508b96e

Merge branch 'main' into COMPASS-9776

768447d

gribnoysup marked this pull request as ready for review September 9, 2025 13:35

gribnoysup requested a review from a team as a code owner September 9, 2025 13:35

Anemy reviewed Sep 9, 2025

addaleax reviewed Sep 9, 2025

gribnoysup and others added 6 commits September 10, 2025 10:20

chore(data-modeling): adjust wording

e40bd32

Co-authored-by: Anna Henningsen <anna.henningsen@mongodb.com> Co-authored-by: Rhys <Anemy@users.noreply.github.com>

Merge remote-tracking branch 'origin/main' into COMPASS-9776

d66be99

chore(data-modeling): do not allow multiple analysis to run at the same time

d6286a0

fix(data-service): make sure that _getOptionsWithFallbackReadPreference works with no options provided

1578994

chore(data-modeling): convert traverse to a generator function

811d84e

chore(data-modeling): better comment

2e953fa

gribnoysup requested review from Anemy and addaleax September 10, 2025 12:38

addaleax approved these changes Sep 10, 2025

Anemy approved these changes Sep 10, 2025

gribnoysup and others added 2 commits September 11, 2025 10:37

chore(data-service): improve _getOptionsWithFallbackReadPreference types

3da4089

Co-authored-by: Anna Henningsen <anna.henningsen@mongodb.com>

chore(data-modeling): do not filter out id fields with matching types during relationship discovery

c2c98ff

gribnoysup merged commit f33ac5e into main Sep 11, 2025
56 of 58 checks passed

gribnoysup deleted the COMPASS-9776 branch September 11, 2025 10:55
