The MongoDB connector should support the `prisma introspect` command #3529

Open
nikolasburk opened this Issue Nov 15, 2018 · 8 comments

Member

nikolasburk commented Nov 15, 2018

When using the MongoDB connector with an existing database, I currently need to model my data by hand. It would be great if the `prisma introspect` command could help with this by sampling a number of documents from the collections in my database and suggesting a datamodel based on them.

Collaborator

ejoebstl commented Nov 21, 2018

Proposed design goals

The introspection process ...

  • should not be interactive
  • should create a datamodel file that's easy to tweak, e.g. by inserting useful comments
  • should yield an equivalent datamodel when run against a database that Prisma created from an existing datamodel.

Proposed tasks

All mentioned Todos only affect the CLI component.

First, some cleanup should be done. This is optional but recommended, to avoid duplication.

  • Define an abstract internal format for representing database datamodels, as opposed to the current relational-focused model. This format should also support representing error states, e.g. invalid schemas, which can occur with NoSQL databases.
  • Refactor the SDL inferrer to support the new format and to abstract over different database types.
  • Refactor (or wrap) the existing Postgres connector to support the new datamodel.
  • Look for duplicate definitions and utility methods throughout the code, especially those related to pluralization and capitalization. Move those to a central utility module.

After this, we can implement the MongoDB connector:

  • Create a base class for fetching schemas from any NoSQL database. This class should already provide interfaces for abstract sampling strategies and intersection strategies, as well as an interface for resolving relations.
  • Create sampling strategies: First, Random-N (see note 1), All.
  • Create relation resolver strategy: Index-Lookup (see note 2).
  • Implement the actual MongoDB connector

Notes on Sampling:

1: Samples N random documents from each collection and tries to find a useful intersection.

Multiple samples are merged. Fields that are found in all samples are made required, fields that are found in some samples are made optional.
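
The merge rule above can be sketched as follows. This is only an illustration under stated assumptions: `SampleDoc`, `FieldInfo` and `mergeSamples` are hypothetical names, not taken from the Prisma code base, and only top-level fields are considered.

```typescript
// Hypothetical types; documents are modelled as plain key/value objects.
type SampleDoc = Record<string, unknown>;

interface FieldInfo {
  name: string;
  required: boolean; // true only if the field occurs in every sample
}

function mergeSamples(samples: SampleDoc[]): FieldInfo[] {
  // Count in how many sampled documents each field name occurs.
  const counts = new Map<string, number>();
  for (const doc of samples) {
    for (const key of Object.keys(doc)) {
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }

  // A field found in all samples is required, otherwise optional.
  const fields: FieldInfo[] = [];
  counts.forEach((count, name) => {
    fields.push({ name, required: count === samples.length });
  });
  return fields;
}

const merged = mergeSamples([
  { _id: "1", title: "A", published: true },
  { _id: "2", title: "B" },
]);
```

Here `_id` and `title` occur in both samples and would be marked required, while `published` occurs in only one and would be marked optional.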

Notes on Resolving Relations

2: For fields of type UUID or ObjectID, performs an index lookup on all collections to guess if a relation exists. Alternatives/possible additions would be: Guessing relations by name, or to include other types to the lookup as well.

Relations for embedded documents are recursively resolved.
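
A rough sketch of the index-lookup idea, under several assumptions: the database is replaced by an in-memory object, IDs are plain strings standing in for UUID/ObjectID values, and all names (`StoredDoc`, `Database`, `guessRelations`) are hypothetical. The recursive handling of embedded documents is left out to keep the example short.

```typescript
// In-memory stand-in for a database: collection name -> documents.
type StoredDoc = { _id: string; [field: string]: unknown };
type Database = Record<string, StoredDoc[]>;

interface GuessedRelation {
  fromCollection: string;
  field: string;
  toCollection: string;
}

function guessRelations(db: Database): GuessedRelation[] {
  // Build the "index": the set of known _id values per collection.
  const idIndex: Record<string, Set<string>> = {};
  for (const name of Object.keys(db)) {
    idIndex[name] = new Set(db[name].map((doc) => doc._id));
  }

  const found: GuessedRelation[] = [];
  const seen = new Set<string>();
  for (const from of Object.keys(db)) {
    for (const doc of db[from]) {
      for (const field of Object.keys(doc)) {
        if (field === "_id") continue;
        const value = doc[field];
        // Treat a single ID and an array of IDs the same way.
        const candidates = Array.isArray(value) ? value : [value];
        for (const candidate of candidates) {
          if (typeof candidate !== "string") continue;
          for (const to of Object.keys(idIndex)) {
            const key = `${from}.${field}->${to}`;
            // A value that matches an existing _id suggests a relation.
            if (idIndex[to].has(candidate) && !seen.has(key)) {
              seen.add(key);
              found.push({ fromCollection: from, field, toCollection: to });
            }
          }
        }
      }
    }
  }
  return found;
}

const relations = guessRelations({
  users: [{ _id: "u1", name: "Alice", posts: ["p1"] }],
  posts: [{ _id: "p1", title: "Hello" }],
});
```

In this toy data set, only `users.posts` holds a value that matches an `_id` in another collection, so a single relation from `users` to `posts` is guessed.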

Handled corner cases

  • When documents have the same field with different primitive types, we simply generate a comment that indicates the conflict.
  • MongoDB allows '.' and '$' in variable names. We sanitize the type name.
  • When we find embedded types with equal schemas, we merge them into a single embedded type.
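
Two of these cases could look roughly like this. The replacement character `_`, the wording of the conflict comment, and the function names `sanitizeName` and `renderField` are all assumptions made for illustration; the thread does not specify them.

```typescript
// Replace characters that are not valid in a GraphQL name with "_".
function sanitizeName(raw: string): string {
  const cleaned = raw.replace(/[^_0-9A-Za-z]/g, "_");
  // GraphQL names must not start with a digit.
  return /^[0-9]/.test(cleaned) ? "_" + cleaned : cleaned;
}

// Render one field line. If the samples disagree on the primitive type,
// emit a comment describing the conflict instead of a field definition.
function renderField(name: string, observedTypes: string[]): string {
  const unique = observedTypes.filter((t, i) => observedTypes.indexOf(t) === i);
  if (unique.length === 1) {
    return `${sanitizeName(name)}: ${unique[0]}`;
  }
  return `# ${sanitizeName(name)}: type conflict between ${unique.join(", ")}`;
}

const cleanName = sanitizeName("address.zip$code");
const okField = renderField("title", ["String", "String"]);
const conflictField = renderField("price", ["Int", "String", "Int"]);
```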

Unhandled corner cases

If we encounter such a case, we abort.

Dependencies

Existing databases might have embedded types without any _id field. Related to #3575.

Collaborator

ejoebstl commented Nov 30, 2018

This PR implements an alpha version of Mongo introspection. I've abstracted the concept of document databases, so it should be super easy (150 LOC) to add other document databases in the future. Schema rendering is now a completely independent module in prisma-datamodel.

Resolving Behavior

The default behavior is: Sample one element from each collection to infer a flat schema, then do a lookup of all fields of 50 randomly selected items to find relations.

Let's see how well this works with real-world data.

For now, we try to infer relations on all ObjectID and string fields. I'll test that with real-world data.
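
For illustration, picking N random items from an already loaded list can be done with a partial Fisher-Yates shuffle, as sketched below. `sampleRandom` is a hypothetical helper, not part of the PR; against a real MongoDB server, the `$sample` aggregation stage would likely be the more natural choice.

```typescript
// Return up to n distinct items chosen uniformly at random.
// The random source is injectable so the behaviour can be tested.
function sampleRandom<T>(items: T[], n: number, random: () => number = Math.random): T[] {
  const pool = items.slice(); // do not mutate the caller's array
  const count = Math.min(n, pool.length);
  for (let i = 0; i < count; i++) {
    // Pick an index from the not-yet-chosen tail [i, pool.length).
    const j = i + Math.floor(random() * (pool.length - i));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}

const picked = sampleRandom(["a", "b", "c", "d", "e"], 3);
```

If a collection holds fewer than N items, all of them are returned.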

Open Todos

  • Type naming: It might be desirable to singularize type names when inferred from an array type field (Example: embedded type for field orders should be called Order).
  • Refactor primitive type inference to a separate class or module.
  • More tests, especially with complicated or messy datasets.

Collaborator

ejoebstl commented Dec 4, 2018

There are currently the following open questions for this feature. I suggest we wait for input from users who have tried the new beta release before answering these questions:

  • It might make sense to use random sampling for inferring the flat model as well.
  • Is it desirable to infer required/not required? E.g. when a field was set on every sampled document, we could mark it as required.
  • Bi-directional relations are not considered right now. I'm not sure if it's a good idea to infer that from the data model.
  • Type naming, as mentioned above. Do we have any reference implementation for regularizing stuff?
  • I am not sure whether all primitive types supported by Mongo/BSON are currently mapped to the best possible Prisma type.

Member

nikolasburk commented Dec 13, 2018

I just tested the introspection with this data that was structured according to this datamodel:

type User @db(name: "users") {
  id: ID! @id
  email: String @unique
  name: String!
  posts: [Post!]! @relation(link: INLINE)
}

type Post @db(name: "posts") {
  id: ID! @id
  wasCreated: DateTime! @createdAt
  wasUpdated: DateTime! @updatedAt
  title: String!
  published: Boolean @default(value: false)
  author: User
  comments: [Comment!]!
}

type Comment @embedded {
  text: String!
  writtenBy: User!
}

This was the output that was generated:

type posts {
  _id: ID! @id
  published: Boolean
  title: String
  wasCreated: postsWasCreated
  wasUpdated: postsWasUpdated
}

# type postsWasCreated @embedded {

# }

# type postsWasUpdated @embedded {

# }

# type User {

# }

type users {
  _id: ID! @id
  email: String
  name: String
  posts: [ID!]!
}

EDIT: Note that the dataset I used was extremely small:

  • 2 documents in users
  • 3 documents in posts
    • 2 of the 3 documents had 1 subdocument each in comments

Data:

[two images, not reproduced]

Member

nikolasburk commented Dec 13, 2018

One general consideration: we could generate model names that follow the Prisma conventions, i.e. start with an uppercase letter and use the singular form, and use the @db directive to map to the underlying collection.

I opened an issue for this: #3702

Collaborator

ejoebstl commented Dec 13, 2018

Thank you for the input. I will look into the relation issue immediately. Can you PM me the data as JSON or PM me credentials for the database?

Regarding the naming: Great idea! To respect Prisma conventions, we should singularize type names. Is there any reference for this in Prisma so far? Otherwise I can just do something trivial, like trimming a trailing "s".
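
A minimal sketch of the trivial approach, assuming a collection name is singularized by dropping one trailing "s" and then capitalized. `toModelName` and `renderTypeHeader` are hypothetical names; irregular plurals, and singular words that happen to end in "s" (e.g. "address"), are not handled.

```typescript
// Naive conversion of a collection name to a model name.
function toModelName(collection: string): string {
  // Drop a single trailing "s"; this is deliberately simplistic.
  const singular =
    collection.length > 1 && /s$/.test(collection)
      ? collection.slice(0, -1)
      : collection;
  return singular.charAt(0).toUpperCase() + singular.slice(1);
}

// Render the opening line of a type, mapping back to the collection
// with @db whenever the model name differs from the collection name.
function renderTypeHeader(collection: string): string {
  const model = toModelName(collection);
  return model === collection
    ? `type ${model} {`
    : `type ${model} @db(name: "${collection}") {`;
}

const userHeader = renderTypeHeader("users");
const postHeader = renderTypeHeader("posts");
```

For the `users` and `posts` collections from the test above, this would yield headers of the same shape as the hand-written datamodel, i.e. `type User @db(name: "users") {` and `type Post @db(name: "posts") {`.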

Member

nikolasburk commented Dec 13, 2018

We have some sparse docs on naming conventions here (they don't actually mention the uppercasing of models). The data was produced using the following three mutations:

Create two new users

mutation {
  user1: createUser(data: {
    email: "alice@prisma.io"
    name: "Alice"
    posts: {
      create: {
        title: "Join us for GraphQL Conf 2019 in Berlin"
        published: true
      }
    }
  }) {
    id
  }

  user2: createUser(data: {
    email: "bob@prisma.io"
    name: "Bob"
    posts: {
      create: [{
        title: "Subscribe to GraphQL Weekly for community news"
        published: true
      } {
        title: "Follow Prisma on Twitter"
      }]
    }
  }) {
    id
  }
}

Add comments to two posts from Bob (send twice)

mutation {
  updatePost(
    where: {
      id: "__ID_FROM_BOBS_POST__"
    }
    data: {
      comments: {
        create: [{
          text: "Love it 👏"
          writtenBy: {
            connect: {
              email: "alice@prisma.io"
            }
          }
        }]
      }
    }
  ) {
    id
  }
}

Collaborator

ejoebstl commented Dec 13, 2018

The problem can be split into the following issues:

  • By default, only the first item in a collection is sampled for model resolution; therefore the comments embedded type is missing completely. I will change that default to random sampling.
  • The DateTime scalar type is not handled correctly. I will rework the scalar handling.
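
One plausible way such a symptom can arise (an assumption, not confirmed in this thread) is that a date value is classified by the generic "object" check before any date check runs, so it ends up as an empty embedded type. The sketch below shows scalar inference with the date check placed first. It assumes values arrive as plain JavaScript values with dates as `Date` objects; `inferType` and `InferredType` are hypothetical names.

```typescript
type InferredType =
  | "String" | "Boolean" | "Int" | "Float"
  | "DateTime" | "List" | "Embedded" | "Unknown";

function inferType(value: unknown): InferredType {
  if (typeof value === "string") return "String";
  if (typeof value === "boolean") return "Boolean";
  if (typeof value === "number") {
    return Number.isInteger(value) ? "Int" : "Float";
  }
  // Dates must be tested before the generic object case,
  // because typeof new Date() is also "object".
  if (value instanceof Date) return "DateTime";
  if (Array.isArray(value)) return "List";
  if (typeof value === "object" && value !== null) return "Embedded";
  return "Unknown";
}

const createdType = inferType(new Date(0));
const embeddedType = inferType({ text: "Love it" });
```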