-
Notifications
You must be signed in to change notification settings - Fork 867
Improving performance with Indexes #1300
Comments
@sorenbs An excellent and thorough analysis! Before I go any further, you might have a typo in the definition of the 'Post table': type Post @model {
id: ID! @isUnique
title: String
creator: Post! @relation(name: "Posts") // <-- should read 'creator: User! @relation(name: "Posts")?
} I think the syntax also needs to cover simple non-join tables as per follows: type Country @model {
id: ID!
countryName: String! @isUnique
regions: [Region!] @relation(name: "CountryRegions")
// Regions are States in the US, but 'Provinces' in other countries
}
type Region @model {
id: ID!
country: Country! @IsUnique(groups: [{name: "UK_CountryRegions", position: 0}])
stateName: String! @isUnique(groups: [{name: "UK_CountryRegions", position: 1}]
} // UK_ here means 'Unique Key' Also, I do not think that generated M2M Join tables are a good idea. One thing I learned the hard way over the years, even after having been given this piece of advice, was that if a M2M join table is created solely for the purpose of joining, without any additional info in it, then the database analysis is probably missing some important feature, and needs to be looked at again. Some M2M join tables also sometimes join 3 or 4 tables. In principle, a M2M join table needs to include some other qualifying columns, above and beyond the standard 'CreatedAt, CreatedByUser' etc. For eg: Real-Estate Agents (who often work in pairs, and can work with agents from another Agency, under a deal arrangement), is mapped to a specific property, sometimes as the 'Lead' agent, sometimes as the 'Assistant'. So the M2M join table might look like: type MapAgent2Property @model {
id: ID! @isUnique
agentID: ID!
propertyID: ID!
agentRole: RoleEnum! // 'Primary', 'Assistant', 'CoAgent' etc.
dealAgreementID: ID // The details of who gets what % of the sale
} So what I am saying is that generated M2M tables are not really feasible. The DB analyst should be able to define the table properly, and define the necessary (multi-column) Unique keys to ensure consistency. So, your proposed syntax is good, but does not need to be as complex as you are making it. I really like the 'sort_order' option as well. Good job on the thorough analysis! |
BTW, what MarkDown tag did you use to get the cool Markup on your types? |
|
awesome, thanks! @kbrandwijk |
Thanks @Trellian - some good thoughts! First - I have corrected the typo. About the M2M join table: A core design goal of Graphcool is to enable pain-free schema migrations preserving existing data. One of the design decisions this has led to is that we always use a M2M-style join table when creating relationships. This way, transforming a 1-m relationship into a M2M is literally a matter of changing the type of a single field to a list. As we don't have to move data around, we can perform this operation quickly without downtime and without loosing data. This also allows us to introduce Kims proposal https://github.com/graphcool/framework/issues/746 in a way that does not require changing the underlying datastructure. We should carefully consider if this level of flexibility is really required and if the tradeoff in performance is worth it, so thank you for bringing it up. An alternative could be that relations by default never use a relation table, and M2M relations are not supported. If a relation table is required (either to add extra fields on the relation or to create a M2M relation), then the Do you think that would be a better approach? My main concern is that this introduces extra complexity that makes it more difficult to get started with Graphcool. |
@sorenbs I like the @edge syntax. It's effectively defining a join table. But I don't want to get off topic here, though. We're talking about having multi-column unique indexes/keys. Once you have those in place, a whole new world opens up. Without them, one can't produce a real-world database, except using the Can I suggest that we open a new topic for discussing the impact of Join tables? Please also note, that a Join table is just like any other, no need to abstract it out, really. If there is extra information on the 'edge' with a join table, then that info is useful. Even for the simplest case of just order-by. |
@sorenbs I don't see https://github.com/graphcool/framework/issues/746 marked as 1.0 😄 |
I agree with @Trellian that multi-column uniqueness constraints are far more important than having a join table for being able to easily convert a one to many 1-M relationship into a many to many M2M relationship. I need many to many relationships occasionally, so don't want to get rid of that ability and realize that would still need a join table, but IMO it's going to be pretty rare to want to transition between those join situations, and I'd be fine with that being manual and cumbersome (need to create new tables, migrate data, rename) if it meant I got multi-column constraints (and indexes) which I need all the time. |
The proposal mentions To specify a unique constraint covering multiple fields, simply use the @unique directive instead of @Index: type User @model {
id: ID! @isUnique
name: String @unique(groups: [{name: "combined_name_and_age", position: 0}])
age: String @unique(group: [{name: "combined_name_and_age", position: 1}])
}
|
@marktani in answer to The necessary syntax is simpler than in type User @model {
id: ID! @Unique
name: String
age: String
Posts: [Post!] @relation(name: "user_posts")
}
type Post @model {
id: ID! @Unique
author: User! @relation(name: "user_posts) @unique(groups: [{name: "post_title_per_user", position: 0}])
title: String! @unique(groups: [{name: "post_title_per_user", position: 1}])
} or maybe this would be simpler and easier to implement?: type Post @model {
id: ID! @Unique
author: User! @relation(name: "user_posts)
title: String!
@unique({name: "UK_user_posts", columns: {"author", "title"})
// or even simpler, no need for a unique key name, most RBMS don't need them, or they `autogenerate`:
// @unique(columns: {"author", "title"})
}
Unique keys are Table constraints, and are never allowed to span tables directly in RDBMS theory. For |
How will multi-field unique constraints be reflected in Example schema: type Post {
id: ID! @unique
title: String @unique(groups: [{name: "combined_title_and_slug", position: 0}])
slug: String @unique(group: [{name: "combined_title_and_slug", position: 1}])
} Query by query {
post(where: {
title: "My biggest adventure"
}) {
id
title
published
}
} Query by slug: query {
post(where: {
id: "my-biggest-adventure"
}) {
id
title
published
}
} Is there a way to fetch a post by a unique query {
post(where: {
combined_title_and_slug: {
slug: "my-biggest-adventure"
title: "My biggest adventure"
}
}) {
id
title
published
}
} |
I'd just like to mention that I'm really hurting for a compound primary key at the moment to provide an upsert on these fields |
Yes, echoing the need for a multi-column index to enforce uniqueness. Right now we are manually enforcing the combined uniqueness by performing a "where" at all places a creation can take place. |
what about |
+1 for multi-column index and unique indexes! |
Suggestion looks good! Is anything happening here? How can we help? About multi column unique indexes, which i'm in dire need of at the moment: Why have 2 decorators |
Generally agree Foreign Keys, multi column index, and constraints are super important in a robust for production RBDMS. I dont understand the design decision for defaulting to join tables, seems optimizing for the wrong thing, 80-20 rule, relational dbs are ALL about relationships, native on those entities, FKs are important, if u need an entire new table every time u need a simple FK that is a huge overhead, huge performance impact adding joins to SQL, and just generally hard to work with engineering wise, especially coming into a project with a big graph/schema. |
Adding to @bjm88 I really don't understand, how this feature could be not in the list of top-priorities. Running a production DB without keys and constraints and any kind of load more than nothing is absurd. And without compound indexes in lot cases you just can't guarantee data consistency. |
@terion-name That's the argument I have been trying to make for months, and have been shot down repeatedly. It is such a fundamental feature that I seriously wonder if the guys behind graphcool have any real-world experience at all. I'm not trying to be nasty here. It's just an observation. I wouldn't even consider releasing a product like this without those basic features. It will be dead in the water on the first even slightly non-trivial database. You wouldn't even be able to get the basic old-favourite 'video-store' DB to work properly. @bjm88 I agree totally with you too. The solution as it stands is totally unworkable in the real world. |
Hi Prisma team, I think you are seeing that building an ORM it is crucial to get relationships and performance right working with databases. This project has such potential, I really hope you consider doing organic foreign keys on tables themselves and taking index setup and management seriously. Have you considered using GitCoin to have people be able to pay for features to help support them. I suspect this is actually not hard to implement, many other ORMs do it, but I simply cannot use this library without these basic design principles for working with a db, would be happy to support with $ if you enable it. |
Adding in my vote for this |
For those that run into performance problems using Postgres, check out Hasura as an option. |
Hey all, thanks for sharing your concerns with us! 🙏 However indexes are a just one facet of performance. During the last weeks we created an extensive benchmarking suite which we used to drive our decisions on which areas must be improved first performance wise. In our tests just adding indexes did not help performance as much as we would like it to. Instead we identified our SQL queries and application code as the biggest problems right now. Therefore we spent a lot of time during the last weeks to refactor our SQL queries to be more performant. In addition we profiled the Prisma application code a lot and were able to implement significant improvements for CPU and memory consumption. The first big batch of those improvements is landing in our release If you are eager to give it a spin just use the tag/version And there is one last thing around indexes: If you think that indexes are absolutely crucial you always have the possibility to just connect to your database and create them manually. I agree that this is not the best experience yet, but we really want to get indexes right before we ship anything half baked to our users. PS: In case you hit any performance problems with |
@mavilein performant queries is a good thing, but saying that indexes are not efficient — is not correct. Now we simply can't make compound indexes (like |
@mavilein, if I create the unique constraints manually as you suggest, what are the chances that I will have problems later with e.g. data migrations? Thanks |
Did you guys ever have a meeting(s) about this and pick a direction? These are very good questions that will dictate a ton of code you guys write, as well as how successful Prisma is. fwiw, the "M2M everything!" approach you're currently drifting along with was the reason we jumped ship.
The complexity would be worth it. (I'd trade the complexity of developing/supporting 50 connectors to just supporting Postgres.) |
I 100% agree with you. Guys, IMHO, use Certainly I would never consider using Prisma, even in a demo project, without it. JM2C. |
Chiming in about the topic of supporting only M2M relations - once we support introspection for MySQL and unify "passive" and "active" connectors, more relation modelling approaches will be supported. For Postgres, we already support different relation modelling approaches, here is an overview. @Trellian, did you check that out already? It would be great to hear your perspective on this. |
Agree. The original topic of this issue is about index, while since prisma and db run on our own server, we can always manage indexes by ourselves (graphcool cannot do that). But the db structure & table structure created by For a one-to-many relation it uses 3 tables (A - AB- B) instead of 2; for a M2M relation with extra fields, 5 tables (A - A(AB) - AB - B(AB) -B) instead of 3. Also see #3060. That harms query performance badly. |
@marktani Does introspection for Postgresql only work one way -- from db to prisma schema, not from prisma schema to db? I tried to use the introspected schema (which has lots of @pgRelation's) to deploy a new db, but the deployed db structure is not the same as the original one. |
That's an interesting question, I am not entirely sure! Let's create a separate discussion about this in a new issue. Can you share your initial schema and the introspected datamodel there? This will give me a better understanding of what's happening. |
I have just published a spec that touches on this topic. It contains a new syntax proposal for (multi field) indexes. We would love to hear your feedback on this one. |
Is anyone able to share an update on this topic? The lack of composite key/index support forced me to have to migrate over to hasura, but I'd love to have the option to come back. I really think prisma has a great deal of potential if this issue in particular is executed correctly, in a way that provides a high degree of flexibility for advanced schemas. In my opinion, this issue should be a top priority or many potential users will simply not take the library into serious consideration for use in production environments. |
@willm78 In Russia, many people have already abandoned the Prisma on this issue. |
@willm78 can we just add the indexes we wanted to our database directly bypassing prisma? |
@beeplin : Yes you can do that. |
One of the most valuable and important features for production-ready apps is in hold for a year. Have no words |
@willm78 - would the proposal in this spec satisfy your requirements? prisma/prisma#3405 |
+1 |
Waiting for this feature to be implemented ... |
Prisma2 looks promising Guess we will have to wait few more months for some more stable release? |
@melounek Are you suggesting that this feature is confirmed for rollout in Prisma 2? |
I'm also waiting for this to be added for Prisma 1. We cannot upgrade to Prisma 2 yet because it doesn't support subscriptions and we use that heavily.😔 |
In order to be able to optimise performance for the actual workload, it should be possible for the development team to specify the exact indexes they need.
Indexes currently being created
primary key
All models are backed by a single database table with a
id
column containing the node id. This column is indexed as the primary key.unique fields
Uniqueness is guaranteed by creating a unique index on the column.
Example
This model generates a table with two indexes:
relations
All relations are implemented with a intermediate relation table:
generates three database tables:
User
Post
Posts
Simplifying indexes on relation tables
Remove the id coulmn
There is no need for a dedicated id column on relation tables
Instead, use the compound index A,B as the primary key to guarantee that a node can only be inserted once into the relation
Two indexes is enough to provide fast traversal
To provide fast traversal in both directions we need two indexes: one on A and one on B.
As we already have a compound index with A as prefix, we just need an index on B
We end up with this structure:
Complexity Analysis
Single Model
non unique field
two fields
range query
range query and equality
[Sorting same as range query]
range query on two fields
Multiple Models
simple relation
relation with single equality
relation with range query
Mirror fields to relation table
Many of the Best Case and Realistic Case scenarios above depend on one or more columns from a model table to be mirrored in the relation table. Mirroring a column into the relation table enables us to create a compound index across relations. In the relation example above, we might introduce a mirrored field like this:
generates three database tables:
User
Post
Posts
The new compound index allows us to speed up queries like this:
Mirrored fields must be kept in sync by the graphcool backend
Specifying indexes in
types.graphql
Design Goals
Single Model
one field
multiple fields
Across relations
one field
Improves performance of queries like:
multiple fields
Improves performance of queries like:
Unique Index
To specify a unique constraint covering multiple fields, simply use the
@unique
directive instead of@index
:The text was updated successfully, but these errors were encountered: