feat(data-modeling): implement automatic relationship inference algorithm COMPASS-9776 #7275
…ll option; gate automatic inference with a feature flag
Looking good, left a couple comments, no blockers.
There are a few places where we could check the abort signal to fail a bit quicker if we want, but I don't think we need to go into that now. Similarly, if we already know it's a view, we don't need to do index checks. We aren't special-casing views in data modeling yet, though.
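As a rough illustration of the abort-signal point, here is a minimal sketch of checking the signal between units of work. The names `analyzeAll`, `analyze`, and `AbortSignalLike` are hypothetical and not taken from this PR; the actual code may structure the loop differently.

```typescript
// Minimal structural type so the sketch does not depend on DOM or Node typings;
// a real AbortSignal satisfies it.
type AbortSignalLike = { readonly aborted: boolean };

// Hypothetical helper: runs `analyze` for each collection name in turn and
// stops as soon as the signal is observed to be aborted.
async function analyzeAll<T>(
  names: readonly string[],
  analyze: (name: string) => Promise<T>,
  signal?: AbortSignalLike
): Promise<T[]> {
  const results: T[] = [];
  for (const name of names) {
    // Checking before each unit of work lets a cancelled run fail quickly
    // instead of finishing the remaining collections first.
    if (signal?.aborted) {
      throw new Error('Operation aborted');
    }
    results.push(await analyze(name));
  }
  return results;
}
```

The same kind of check could sit in front of any expensive step (for example, before fetching indexes), at the cost of a little extra noise in the code.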
    schema: MongoDBJSONSchema,
    path: string[],
    isArrayItem: boolean
  ) => void,
I don't know if it's just me, but my first thought reading this function is "this callback could have been a generator". Obviously not really all that different in the end, but mentioning it in case you also feel that that would be more idiomatic
That's a good idea and worth doing, let me adjust this
Refactored in 811d84e
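For readers who have not seen the refactor, the callback-to-generator idea discussed in this thread might look roughly like the sketch below. `SimpleSchema`, `SchemaVisit`, `traverseSchema`, and `exampleSchema` are hypothetical stand-ins for illustration (the real code works with `MongoDBJSONSchema`), so treat this as an assumption about the shape, not the code in 811d84e.

```typescript
// Simplified stand-in for MongoDBJSONSchema (assumption for illustration only).
type SimpleSchema = {
  bsonType?: string;
  properties?: Record<string, SimpleSchema>;
  items?: SimpleSchema;
};

type SchemaVisit = { schema: SimpleSchema; path: string[]; isArrayItem: boolean };

// Instead of invoking a callback for each node, yield each node to the caller.
function* traverseSchema(
  schema: SimpleSchema,
  path: string[] = [],
  isArrayItem = false
): Generator<SchemaVisit> {
  yield { schema, path, isArrayItem };
  const properties = schema.properties ?? {};
  for (const key of Object.keys(properties)) {
    yield* traverseSchema(properties[key], [...path, key], false);
  }
  if (schema.items) {
    // Array items share the path of the array field itself.
    yield* traverseSchema(schema.items, path, true);
  }
}

// Hypothetical sample schema used to exercise the traversal.
const exampleSchema: SimpleSchema = {
  bsonType: 'object',
  properties: {
    _id: { bsonType: 'objectId' },
    tags: { bsonType: 'array', items: { bsonType: 'string' } },
  },
};

for (const { path, isArrayItem } of traverseSchema(exampleSchema)) {
  console.log(path.join('.'), isArrayItem);
}
```

One practical difference from the callback form is that the caller controls iteration, so it can `break` out of the loop early without extra plumbing.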
Co-authored-by: Anna Henningsen <anna.henningsen@mongodb.com> Co-authored-by: Rhys <Anemy@users.noreply.github.com>
…ce works with no options provided
Looks great! A few comments but basically LGTM 🙂
  });
  return indexes
    .filter((index): index is IndexDescriptionInfo & { name: string } => {
      return !!index.name;
Just for my own understanding, in what type of situation would index.name be false-y?
I don't really know. That's the type we're getting directly from the driver, so I just assumed it might be possible in some corner cases and decided to account for that. I can do a bit more digging 🙂
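To make the question in this thread concrete, here is a small sketch of how a type-predicate filter narrows an optional `name`. `IndexInfo` and `withNames` are hypothetical; the sketch assumes only that `name` is optional in the type, as the reply above suggests, and says nothing about when the server would actually omit it.

```typescript
// Hypothetical minimal shape; the real driver type has more fields.
type IndexInfo = { name?: string; key: Record<string, unknown> };

// The type predicate tells the compiler that every surviving element has a
// string `name`, so later code can use it without further checks.
function withNames(indexes: IndexInfo[]): Array<IndexInfo & { name: string }> {
  return indexes.filter((index): index is IndexInfo & { name: string } => {
    // `!!` drops both a missing name and an empty string.
    return !!index.name;
  });
}

const named = withNames([
  { name: '_id_', key: { _id: 1 } },
  { key: { email: 1 } },
]);
console.log(named.map((index) => index.name.toUpperCase()));
```

Without the predicate, `filter` would still return `IndexInfo[]`, and `index.name.toUpperCase()` would not type-check under strict null checks.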
  ).filter((value) => {
    // They will be matching in a lot of cases, but the chances of both
    // local and foreign field in a relationship being _id are very slim, so
    // skipping
    return value[0] !== '_id';
- ).filter((value) => {
-   // They will be matching in a lot of cases, but the chances of both
-   // local and foreign field in a relationship being _id are very slim, so
-   // skipping
-   return value[0] !== '_id';
+ ).filter((propPath) => {
+   // The types of _id will be matching in a lot of cases, but the chances of both
+   // local and foreign field in a relationship being _id are very slim, so
+   // skipping
+   return propPath[0] !== '_id';
or at least that's what we're trying to communicate here, right?
Unfortunately, I think this would be a valid pattern that we do care about, where the _ids of multiple collections are references to each other as a way of reducing a single document's size (e.g. you have a busy collection with smaller docs and a larger collection with rarely-accessed and potentially large docs that could be referenced via $lookup if necessary).
I think it's fine to skip this for now but if we do, we probably want a TODO ticket to think a bit more about this situation
Hmmm, actually I was just assuming there would never be a case like that and was trying to optimise a bit. If you think it's a possible scenario, I can just drop this filter; I don't think it's a very big change performance-wise.
Removed the logic in c2c98ff, thanks for bringing this up!
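The pattern raised in this thread, where two collections share `_id` values so that large, rarely-read data lives apart from small hot documents, can be sketched as follows. The collection names `listings` and `listingDetails`, the documents, and `joinById` are all made up for illustration; only the `$lookup` stage shape is standard MongoDB syntax.

```typescript
type Doc = { _id: number; [field: string]: unknown };

// Hypothetical "busy" collection with small documents.
const listings: Doc[] = [
  { _id: 1, name: 'Loft' },
  { _id: 2, name: 'Cabin' },
];

// Hypothetical companion collection with large, rarely-read documents that
// reuse the same _id values.
const listingDetails: Doc[] = [{ _id: 1, description: 'long, rarely read text' }];

// The aggregation stage that would join them on the server: both the local
// and the foreign field are _id.
const lookupStage = {
  $lookup: { from: 'listingDetails', localField: '_id', foreignField: '_id', as: 'details' },
};

// In-memory equivalent of that stage, for illustration only.
function joinById(local: Doc[], foreign: Doc[]) {
  const byId = new Map<number, Doc>();
  for (const doc of foreign) byId.set(doc._id, doc);
  return local.map((doc) => {
    const match = byId.get(doc._id);
    return { ...doc, details: match ? [match] : [] };
  });
}

console.log(lookupStage, joinById(listings, listingDetails));
```

A filter that skips `_id` as a candidate field would never surface this kind of relationship, which is the concern behind dropping it.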
Co-authored-by: Anna Henningsen <anna.henningsen@mongodb.com>
… during relationship discovery
This patch adds the logic for automatically discovering relationships between collections, following the algorithm we outlined in the "Infer relationships" scope. This behavior is behind a feature flag until we add a toggle for disabling discovery to the schema generation UI, which is the next chunk of work for the project.
I added some collections to the sample_airbnb database in our team's "Cluster 0" cluster if you want to test it. If you create a data model for this namespace, it should identify relationships between the listingsAndReview, reviewers, and reviews collections: