Performance issues with large data sets #268

Open
mwilliamson-healx opened this Issue Sep 2, 2016 · 37 comments

For our use case, we send a few thousand objects to the client. We're currently using a normal JSON API, but are considering using GraphQL instead. However, when returning a few thousand objects, the overhead of resolving values makes it impractical to use. For instance, the example below returns 10000 objects with an ID field, and that takes around ten seconds to run.

Is there a recommended way to improve the performance? The approach I've used successfully so far is to use the existing parser to parse the query, and then generate the response by creating dictionaries directly, which avoids the overhead of resolving/completing on every single value.

import graphene

class UserQuery(graphene.ObjectType):
    id = graphene.Int()

class Query(graphene.ObjectType):
    users = graphene.Field(UserQuery.List())

    def resolve_users(self, args, info):
        return users

class User(object):
    def __init__(self, id):
        self.id = id

users = [User(index) for index in range(0, 10000)]

schema = graphene.Schema(query=Query)

print(schema.execute('{ users { id } }').data)
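
For comparison, the hand-built response mentioned above amounts to roughly the following (a rough sketch; the real proof of concept still parses the query to decide which fields to emit rather than hard-coding them):

# Hypothetical baseline: build the response dict directly and skip GraphQL
# execution entirely. `users` is the list defined in the example above.
data = {'users': [{'id': user.id} for user in users]}
print(data)
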
ekampf (Contributor) commented Sep 2, 2016

@mwilliamson-healx I've had the same problem.
Fortunately the next version of Graphene fixes the issue (and also adds performance tests to make sure it doesn't regress).
Though it's a risk to use the library's bleeding edge, I've been running the next version (pip install "graphene>=1.0.dev") for a couple of weeks now in production without problems.

So you should give it a try and see if it solves your problem (and if not, maybe there are some new performance test cases to add to Graphene's performance tests).

syrusakbary (Member) commented Sep 2, 2016

@mwilliamson-healx as Eran pointed out, the next version has been rewritten with a special focus on performance.

We also added a benchmark for a case similar to the one you describe (retrieving about 100k elements instead of 10k).
https://github.com/graphql-python/graphene/blob/next/graphene/types/tests/test_query.py#L129

Retrieving 10k elements should be about 10-20 times faster in the next branch (50-100ms?).
https://travis-ci.org/graphql-python/graphene/jobs/156652274#L373

It would be great if you could test this case on the next branch and report any non-performant case you run into; I will happily work on it :).

mwilliamson-healx commented Sep 5, 2016

Thanks for the suggestion! I gave Graphene 1.0.dev0 a go, and while it's certainly faster, it still takes around a second to run the example above. Admittedly, I didn't try it out on the speediest of machines, but it suggests that this would still be the dominant factor in response time for our real data.

syrusakbary (Member) commented Sep 5, 2016

@mwilliamson-healx some of the performance bottleneck was also in the OrderedDict generation.
For that reason graphql-core uses cyordereddict when available (an implementation of OrderedDict in Cython that runs about 2-6x faster).

Could you try installing cyordereddict with pip install cyordereddict and running the tests again? (There is no need to modify anything in the code.)

Thanks!
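
If it helps, a rough way to compare the two implementations locally is something like the following (a sketch, assuming cyordereddict is installed; exact numbers will vary by machine):

import timeit

# Build a small OrderedDict many times with each implementation.
stmt = "OrderedDict((str(i), i) for i in range(10))"
print("stdlib:", timeit.timeit(stmt, setup="from collections import OrderedDict", number=100000))
print("cython:", timeit.timeit(stmt, setup="from cyordereddict import OrderedDict", number=100000))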

syrusakbary (Member) commented Sep 5, 2016

PS: There are plans to port some code to Cython (while still preserving the pure-Python implementation) to make graphene/graphql-core even more performant; any other suggestions are always welcome! :)

mwilliamson-healx commented Sep 6, 2016

Thanks again for the suggestion! Using cyordereddict shaves about 200ms off the time (from 1s to 0.8s), so it's an improvement, but still not ideal. I had a look around the code, but nothing stuck out to me as an easy way of improving performance. The problem (from my extremely quick and poorly informed glance!) is that you end up resolving every single value, which includes going through any middleware and having to coordinate promises. Keeping that functionality while being competitive with just spitting out dicts directly seems rather tricky.

The proof of concept I've got sidesteps the issue somewhat by parsing the GraphQL query and then relying on the object types being able to generate the requested data directly, without having to further resolve values. It's very much a proof of concept (so it doesn't support fragments, and isn't really GraphQL compliant yet), but feel free to have a look. Even assuming the approach is sane, it's hard to see how to reconcile it with the normal GraphQL resolve approach.

syrusakbary (Member) commented Sep 6, 2016

Hi @mwilliamson-healx,
First of all, congrats on your great proof of concept!

I've been thinking for a while about how we can improve GraphQL performance. This repository, graphene, uses graphql-core under the hood, which is a close port of the GraphQL-js reference implementation.

The problem we are seeing in both graphql-core and graphql-js is that each type/value is checked at runtime (that is, the resolution + serialization function is "discovered" at runtime each time a value is completed). In JS the performance difference is not as big because it has a great JIT that optimizes each of the type/value completion calls. However, as Python has no JIT by default, this becomes a quite expensive operation.

In the current graphql-js and graphql-core implementations, executing a GraphQL query looks like this:

Parse AST from string (==> validate the AST against the given schema) ==> Execute the AST given a root type.

However, we can create a "Query Builder" as an intermediate step before execution that knows exactly which fields are being requested, and therefore their associated types and resolvers, so we don't need to "search" for them each time a value is completed.
This way, the process will be something like:

Parse AST from string (==> validate the AST against the given schema) ==> Build the query resolver based on the AST ==> Execute the built query resolver given a root type.

Your proof of concept does the latter, so the performance difference compared with the current graphql-core implementation is considerable.

I think it's completely reasonable to introduce this extra query-resolver build step before execution to avoid the performance bottleneck of doing the lookups at runtime. In fact, I would love to have it in graphql-core.

I also think this would be super valuable to have in the graphql-js implementation, as it would improve performance and push other language implementations forward ( @leebyron ).
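
A minimal sketch of the idea (purely illustrative; the names below are not the actual graphql-core API): look up each requested field's resolver and serializer once when the plan is built, then reuse the plan for every object in the list.

def build_plan(fields, selection):
    # fields maps field name -> (resolver, serializer); built once per query.
    return [(name,) + fields[name] for name in selection]

def execute_plan(plan, objects):
    # Per object, only cheap calls remain; no per-value type lookup.
    return [
        {name: serialize(resolve(obj)) for name, resolve, serialize in plan}
        for obj in objects
    ]

# Example usage with plain Python callables standing in for GraphQL types:
fields = {'id': (lambda obj: obj['id'], int)}
plan = build_plan(fields, ['id'])
print(execute_plan(plan, [{'id': 1}, {'id': 2}]))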

mwilliamson (Contributor) commented Sep 6, 2016

Thanks for the kind words. One question I had was how much you'd imagine trusting the query builder? For my implementation, I was planning on putting the responsibility of correctness onto the queries (rather than having the GraphQL implementation check). The result is that, unlike the normal implementations of GraphQL, it's possible to implement something that doesn't conform to the GraphQL spec.

syrusakbary (Member) commented Sep 7, 2016

I'm working on the query builder concept. As of right now the benchmarks show about a 4x improvement when returning large datasets.

Related PR in graphql-core: graphql-python/graphql-core#74

syrusakbary (Member) commented Sep 8, 2016

Some updates!
I've been working non-stop on improving performance with the Query Builder.

Benchmarks

Retrieving 10k ObjectTypes

Doing something similar to the following query, where allContainers is of type [ObjectType] and x is an Integer:

{
  allContainers {
    x
  }
}

Retrieving a List with 10k Ints

Doing something similar to the following query, where allInts is of type [Integer]:

{
  allInts
}

NOTE: Just serializing a plain list using GraphQLInt.serialize takes about 8ms, so the gains look even better once this amount is subtracted from the totals: 4ms vs 22ms.
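
For reference, a schema exercising these two queries could look roughly like the following in Graphene 1.x syntax (a sketch, not the actual benchmark code; the Container/Query names and the 1.x resolver signature are assumptions for illustration):

import graphene

class Container(graphene.ObjectType):
    x = graphene.Int()

class Query(graphene.ObjectType):
    all_containers = graphene.List(Container)   # exposed as allContainers
    all_ints = graphene.List(graphene.Int)      # exposed as allInts

    def resolve_all_containers(self, args, context, info):
        return [Container(x=i) for i in range(10000)]

    def resolve_all_ints(self, args, context, info):
        return list(range(10000))

schema = graphene.Schema(query=Query)
print(len(schema.execute('{ allInts }').data['allInts']))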

Conclusion

The work I'm doing so far demonstrates that there is still room to improve performance while preserving full compatibility with GraphQL syntax.

The proof-of-concept speedup is between 5x and 15x while maintaining the syntax and features GraphQL has. There is still a lot of work to do, but it's a first approach that will let us discover new paths for speed improvement.

Extra

I think that by using Cython for some critical code paths we can gain about another 10-20x in speed.

Transport

Apart from using Cython, I'm thinking about how we can plug multiple kinds of transports into GraphQL.
Instead of creating Python objects each time we access a field and then transforming the result to JSON, another approach could be to transform the values directly into JSON or whatever transport we are using.

That way the result would be created directly in the output format, and we could plug in other transports such as binary formats (Cap'n Proto/FlatBuffers/Thrift/others), msgpack, or anything else we can think of.

mwilliamson (Contributor) commented Sep 22, 2016

Thanks for working on this. I've taken a look at the proof of concept you wrote, but it's not clear to me exactly how it behaves, and how it's saving time versus the existing implementation. It seems like it's still resolving all fields of objects in the response, but I could easily have misread.

I adjusted my proof of concept to (optionally) integrate with GraphQL properly. This means that you can generate the schema, introspect, and do all the other stuff that GraphQL does, but it also means you hit the performance penalty again. It seems to me that the easiest way of fixing this for my use case would be a way to prevent resolution from descending into the object that my proof of concept produces -- a way of returning a value from resolve functions that doesn't trigger resolution on any fields (since they're already resolved).

Perhaps something like:

def resolve_users(...):
    ...
    return FullyResolvedValue(users)

where users is already fully resolved by inspecting the AST or whatever. Alternatively, a decorator on the function itself might be clearer.

This shifts more responsibility onto the calling code to make sure that the returned value has the correct shape (so that the result is still a valid GraphQL implementation), but that's definitely a good trade-off for me.
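
To make the idea concrete, a wrapper plus an executor-side check could look something like this (purely hypothetical; FullyResolvedValue is not an existing graphql-core API, and the completion hook is only sketched):

class FullyResolvedValue(object):
    """Marks a value that is already shaped exactly like the query result."""
    def __init__(self, value):
        self.value = value

def complete_value(field_type, value):
    # An executor honouring the wrapper would short-circuit here instead of
    # recursing into the value's fields and running their resolvers.
    if isinstance(value, FullyResolvedValue):
        return value.value
    raise NotImplementedError('normal per-field completion would go here')
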

qubitron commented Mar 1, 2017

@syrusakbary any update on this thread? I am using graphene in production, and unfortunately it simply doesn't scale even for the moderate data sets being returned by my API. I'm slowly rewriting my API calls as normal HTTP calls and seeing 10x RPS increases (and therefore a 10x reduction in server costs), but it means I'm losing the flexibility of the GraphQL approach. It seems like the solution discussed in this thread would save me from this headache!

mwilliamson-healx commented Mar 1, 2017

In case it's useful, I've been using the project I mentioned above in production, and performance has been good enough. In particular, it avoids having to run a (potentially asynchronous) resolver for every field. I'm still tweaking the API, but it should be reasonably stable (and better documented!) soon.

https://github.com/healx/python-graphjoiner

syrusakbary (Member) commented Mar 1, 2017

Hi @qubitron,

If you use the experimental branch features/next-query-builder in graphql-core, you will be able to use a new execution system that significantly improves speed: graphql-python/graphql-core#74.

It should give you a ~3-5x speed improvement for both big and small datasets.

How to use it

  1. Install it with pip install https://github.com/graphql-python/graphql-core/archive/features/next-query-builder.zip

  2. Enable the new executor (execute this code before any query)

from graphql.execution import executor

executor.use_experimental_executor = True
  3. Execute the query

If you can try it and post your results here, that would be great!

Extra questions

To help us optimize for your use case:

  • Are you in a CPython environment (not PyPy or Google App Engine)? (To see whether we can optimize easily with Cython.)
  • How many fields are resolved? (What is the "size" of the GraphQL output?)
  • Do you use any GraphQL middleware?
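
Putting the steps together, a minimal end-to-end check might look like this (the executor flag comes from the steps above; the toy schema and the Graphene 1.x resolver signature are assumptions for illustration):

import graphene
from graphql.execution import executor

# Must be set before executing any query.
executor.use_experimental_executor = True

class Query(graphene.ObjectType):
    ids = graphene.List(graphene.Int)

    def resolve_ids(self, args, context, info):
        return list(range(10000))

schema = graphene.Schema(query=Query)
print(len(schema.execute('{ ids }').data['ids']))
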
qubitron commented Mar 5, 2017

@syrusakbary it took me a bit of time to get to a place where I had a good test for this. The package you provided seems to make a big improvement, cutting total execution time for my request roughly in half, with the graphene portion reduced by a factor of 3x.

Initially it wasn't working because I already had graphql-core installed; running "pip uninstall graphql-core" before your command above finally yielded the performance improvements.

More about my workload: I'm using a Flask web server with graphene_sqlalchemy and returning objects that inherit from SQLAlchemyObjectType (not sure if that counts as middleware, but I get similar results when I return a plain graphene.ObjectType).

For this particular example, I have ~300 items being returned and I'm resolving 5 fields on each. The SQL query takes about 18ms to return results, and the full HTTP response takes 78ms.

After installing your package the request takes about 18ms and the full HTTP response takes 37ms. This is much more reasonable, but there might still be some opportunities for improvement.

I ran the CPython profiler for the duration of the request; here is the breakdown of time spent in the graphql libraries with the experimental executor:

   ncalls  cumtime    filename:lineno(function)
        1    0.165    flask/app.py:1605(dispatch_request)
        1    0.165    flask/views.py:82(view)
        1    0.165    flask_graphql/graphqlview.py:58(dispatch_request)
        1    0.162    flask_graphql/graphqlview.py:149(execute_graphql_request)
        1    0.159    flask_graphql/graphqlview.py:146(execute)
        1    0.159    graphql/execution/executor.py:32(execute)
        1    0.159    graphql/execution/experimental/executor.py:14(execute)
        3    0.159    promise/promise.py:42(__init__)
        1    0.159    promise/promise.py:73(do_resolve)
        1    0.159    graphql/execution/experimental/executor.py:42(executor)
        1    0.159    graphql/execution/experimental/executor.py:59(execute_operation)
    323/1    0.159    graphql/execution/experimental/fragment.py:98(resolve)
   2255/1    0.155    graphql/execution/experimental/resolver.py:25(on_complete_resolver)

I'm using a CPython runtime on AWS; do you think your experimental executor is complete/stable enough for me to use it in production (obviously I will test it)?
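
For anyone wanting to reproduce a breakdown like the one above, the standard library's cProfile can be pointed at a single execution (a sketch; the toy schema and Graphene 1.x resolver signature are assumptions, and in a Flask app you would wrap the view call instead):

import cProfile
import pstats

import graphene

class Query(graphene.ObjectType):
    ids = graphene.List(graphene.Int)

    def resolve_ids(self, args, context, info):
        return list(range(10000))

schema = graphene.Schema(query=Query)

profiler = cProfile.Profile()
profiler.enable()
schema.execute('{ ids }')
profiler.disable()

# Restrict the report to frames from the graphql/graphene libraries.
pstats.Stats(profiler).sort_stats('cumulative').print_stats('graphql')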

syrusakbary (Member) commented Mar 8, 2017

Hi @qubitron, thanks for the info and the profiling data!

I've fixed a few issues in the experimental executor, and it is now as stable as the master branch.
For extra verification, I've run all the master tests with the experimental executor and they all pass ☺️

So yes, as stable as master! :)

mwilliamson-healx commented Mar 8, 2017

Unfortunately, this is still probably too slow for my use-case -- GraphJoiner is around four times faster. When profiling, it seems like most of the time is spent in (potentially asynchronous) field resolution.

Having said that, I'm not sure that the approach I'm using is really compatible with the way Graphene works. I suspect my comments aren't particularly helpful, so I'll be quiet!

qubitron commented Mar 11, 2017

@mwilliamson-healx I agree it would be nice if this could be faster; for me these changes make it usable, but further performance improvements would be welcome. I took a cursory look at GraphJoiner; I haven't had time to fully internalize how it works, and although it seems like a promising alternative, I'd prefer that the graphene approach be made faster or that some sort of hybrid approach be used.

One thing that would be interesting for me is if we could somehow select from SQL only the columns requested by the user's query, to further improve database performance.

syrusakbary (Member) commented Mar 11, 2017

I'm still working on improving performance.
The first step, which is quite close to ready, is a new (and ultra-performant) promise implementation.

I'm going to drop some numbers here so it's easier to see the advantage of just using the faster promise implementation:

Non-optimized GraphQL resolution

Old promise

------------------------------------------------------------------------------------------ benchmark: 5 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                      Min                 Max                Mean             StdDev              Median                IQR            Outliers(*)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                     2.4519 (1.0)        4.8950 (1.0)        2.8593 (1.0)       0.4961 (1.0)        2.6586 (1.0)       0.4846 (1.0)            48;21     380           1
test_big_list_of_ints                              61.0509 (24.90)     73.8399 (15.08)     66.3891 (23.22)     3.7764 (7.61)      66.2786 (24.93)     6.3930 (13.19)            6;0      16           1
test_big_list_objecttypes_with_one_int_field      231.4451 (94.39)    274.0550 (55.99)    253.6332 (88.70)    17.2165 (34.70)    257.7021 (96.93)    27.6580 (57.08)            2;0       5           1
test_big_list_objecttypes_with_two_int_fields     373.6482 (152.39)   407.3970 (83.23)    391.4426 (136.90)   14.5990 (29.43)    391.9201 (147.42)   26.1913 (54.05)            2;0       5           1
test_fragment_resolver_abstract                   233.4590 (95.22)    283.4949 (57.92)    259.2367 (90.66)    21.3765 (43.09)    263.5479 (99.13)    37.4374 (77.26)            2;0       5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

New promise implementation syrusakbary/promise#23

------------------------------------------------------------------------------------------ benchmark: 5 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                      Min                 Max                Mean             StdDev              Median                IQR            Outliers(*)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                     2.4672 (1.0)        7.0231 (1.0)        2.9814 (1.0)       0.5989 (1.0)        2.7701 (1.0)       0.4563 (1.0)            40;31     378           1
test_big_list_of_ints                              23.3240 (9.45)      31.2262 (4.45)      26.8308 (9.00)      1.9695 (3.29)      26.7700 (9.66)      3.2494 (7.12)            14;0      36           1
test_big_list_objecttypes_with_one_int_field      165.3101 (67.00)    201.4430 (28.68)    181.6540 (60.93)    15.7699 (26.33)    181.4460 (65.50)    29.1352 (63.85)            3;0       6           1
test_big_list_objecttypes_with_two_int_fields     248.4190 (100.69)   291.1139 (41.45)    267.6542 (89.77)    17.9228 (29.93)    259.4721 (93.67)    28.7293 (62.96)            2;0       5           1
test_fragment_resolver_abstract                   112.4361 (45.57)    160.6219 (22.87)    139.5578 (46.81)    20.4794 (34.19)    149.4532 (53.95)    35.4158 (77.61)            2;0       7           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Optimized GraphQL resolution graphql-python/graphql-core#74

Old Promise

------------------------------------------------------------------------------------------ benchmark: 5 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                      Min                 Max                Mean             StdDev              Median                IQR            Outliers(*)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                     2.4519 (1.0)        5.0600 (1.0)        2.8100 (1.0)       0.4778 (1.0)        2.6290 (1.0)       0.3346 (1.0)            40;35     361           1
test_big_list_of_ints                              48.6422 (19.84)     61.3708 (12.13)     55.8666 (19.88)     2.9545 (6.18)      55.4373 (21.09)     2.9249 (8.74)             6;1      20           1
test_big_list_objecttypes_with_one_int_field      148.5479 (60.58)    192.1201 (37.97)    164.5386 (58.55)    18.2469 (38.19)    153.1000 (58.23)    30.8557 (92.23)            2;0       7           1
test_big_list_objecttypes_with_two_int_fields     214.3099 (87.41)    252.1060 (49.82)    237.2049 (84.41)    16.0745 (33.64)    241.0800 (91.70)    26.6772 (79.74)            1;0       5           1
test_fragment_resolver_abstract                   263.5369 (107.48)   294.0340 (58.11)    275.1848 (97.93)    13.9760 (29.25)    268.7261 (102.21)   24.3396 (72.75)            1;0       5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

New Promise implementation

------------------------------------------------------------------------------------------ benchmark: 5 tests -----------------------------------------------------------------------------------------
Name (time in ms)                                      Min                 Max                Mean             StdDev              Median                IQR            Outliers(*)  Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                     2.4509 (1.0)        4.5359 (1.0)        2.9296 (1.0)       0.4356 (1.0)        2.7819 (1.0)       0.4752 (1.0)            54;25     351           1
test_big_list_of_ints                              14.3750 (5.87)      20.3481 (4.49)      16.1198 (5.50)      1.0453 (2.40)      15.9812 (5.74)      0.8274 (1.74)            15;6      65           1
test_big_list_objecttypes_with_one_int_field       73.8251 (30.12)    115.9289 (25.56)     92.0637 (31.43)    15.2907 (35.10)     82.6714 (29.72)    27.2505 (57.35)            4;0      12           1
test_big_list_objecttypes_with_two_int_fields      98.5930 (40.23)    149.9560 (33.06)    123.6130 (42.19)    19.3822 (44.50)    128.8331 (46.31)    35.7828 (75.31)            4;0       9           1
test_fragment_resolver_abstract                   115.6740 (47.20)    156.7039 (34.55)    138.5075 (47.28)    16.4670 (37.80)    146.8499 (52.79)    28.6682 (60.33)            3;0       7           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
syrusakbary (Member) commented Mar 11, 2017

When used with PyPy the difference is even bigger, and this is just the beginning.
The improvement is also quite significant when a single ObjectType has multiple fields.

After finishing this promise implementation, I will work on separating out the serializer, which I expect will give another ~2x gain by using a plain dict instead of OrderedDict for serialization, and maybe even more if we serialize directly to JSON. This will also open up the possibility of using other serializers like msgpack :)

And after that, optimizations with Cython will help to crush all benchmarks! 😊

And all of this while preserving 100% compatibility with the GraphQL spec and the current Graphene implementation, with no changes required from the developer other than updating the package once the new version is published.

syrusakbary (Member) commented Mar 11, 2017

PS: Meanwhile I'm also working on a DataLoader implementation for Python that will solve the N+1 problem in GraphQL.

qubitron commented Mar 11, 2017

Amazing work, @syrusakbary! Looking forward to the improvements, let me know if I can help test any changes.

qubitron commented Mar 11, 2017

@syrusakbary I am a bit hesitant to use PyPy; I ran into some bugs/compatibility issues with Cython libraries (unrelated to graphene) and was getting mixed performance results with sqlalchemy. That being said, if the wins are there then it's always good to have that option.

syrusakbary (Member) commented Mar 13, 2017

I've been able to improve type resolution a little bit more, giving an extra ~35% speed gain: graphql-python/graphql-core@81bcf8c.

New benchmarks (new promise and better type resolution with experimental executor)

--------------------------------------------------------------------------------------- benchmark: 5 tests ---------------------------------------------------------------------------------------
Name (time in ms)                                     Min                 Max               Mean            StdDev             Median               IQR            Outliers(*)  Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_big_list_of_ints_serialize                    2.6469 (1.0)        5.0581 (1.0)       2.9428 (1.0)      0.4469 (1.0)       2.7812 (1.0)      0.2511 (1.0)            47;53     401           1
test_big_list_of_ints                             13.6490 (5.16)      21.1191 (4.18)     15.1494 (5.15)     1.7030 (3.81)     14.3925 (5.18)     1.9491 (7.76)            12;2      62           1
test_big_list_objecttypes_with_one_int_field      60.2801 (22.77)     90.2431 (17.84)    67.1742 (22.83)    9.6505 (21.60)    63.0350 (22.67)    5.5089 (21.94)            2;2      15           1
test_big_list_objecttypes_with_two_int_fields     82.4349 (31.14)    110.2500 (21.80)    90.0414 (30.60)    7.7319 (17.30)    88.1380 (31.69)    9.3712 (37.32)            1;1      12           1
test_fragment_resolver_abstract                   92.1650 (34.82)    107.6009 (21.27)    98.8749 (33.60)    4.5259 (10.13)    97.8079 (35.17)    4.3540 (17.34)            2;0       8           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
syrusakbary (Member) commented Mar 13, 2017

(All these benchmarks are without PyPy, just plain Python on the standard CPython interpreter.)

syrusakbary (Member) commented Mar 13, 2017

The latest next-query-builder branch now includes the ultra-performant version of promise.

Just by running pip install https://github.com/graphql-python/graphql-core/archive/features/next-query-builder.zip it should upgrade promise to promise>=2.0.dev.

(you will also need to do: executor.use_experimental_executor = True)

@qubitron I'm keen to hear what extra performance improvement you see!

qubitron commented Mar 14, 2017

@syrusakbary I gave this a test and I'm seeing similar performance numbers to the previous version, wish I had a different answer! It still seems to be spending about 20ms resolving in the graphql layer.

Do you still have the previous archive available? My code has changed somewhat, and it will be easier for me to compare results if I can switch back and forth.
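For anyone trying to pin down where that time goes, here is a minimal profiling sketch using only the standard library (it assumes a graphene schema like the one earlier in this thread):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
result = schema.execute('{ users { id } }')
profiler.disable()

stats = pstats.Stats(profiler).sort_stats('cumulative')
stats.print_stats('graphql')  # restrict the report to frames from the graphql package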

@syrusakbary

Member

syrusakbary commented Mar 14, 2017

No idea why there isn't a performance improvement in your case.
Could it be that the latest packages were not installed properly?

Here are the new promise 2.0+query-builder requirements:

# Installing Next query builder
pip install https://github.com/graphql-python/graphql-core/archive/features/next-query-builder.zip
# Installing promise 2.0
pip install "promise>=2.0.dev"

Here are the previous requirements:

# Installing Next query builder (working with old promise)
pip install https://github.com/graphql-python/graphql-core/archive/features/next-query-builder-prev.zip
# Installing promise 1.x
pip install "promise==1.0.1"

To verify that the installed versions correspond with the ones listed, you can do:

pip freeze | grep "graphql"
pip freeze | grep "promise"
@jameswyse

jameswyse May 7, 2017

I've been trying out the experimental executor (with all libs on latest stable versions) but it seems to be slower with it enabled.

I did some basic comparison benchmarks by recording the total request time of my two largest / most complex queries with it enabled and disabled, and the average results were:

  • Query 1 went from 1738ms to 2142ms (404ms slower)
  • Query 2 went from 1453ms to 1749ms (296ms slower)
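For reference, a rough sketch of the kind of wall-clock comparison described above; the helper name is hypothetical, and you would run it once with the experimental executor enabled and once with it disabled.

import time

def time_query(schema, query, repeats=5):
    # Average wall-clock time of executing `query`, in seconds.
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = schema.execute(query)
        assert not result.errors
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)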
@syrusakbary

Member

syrusakbary commented May 7, 2017

Hi @jameswyse,
The experimental executor is especially suited to returning big lists of scalars or ObjectTypes with few fields. However, it should beat the normal executor in almost all benchmarks.

Would it be possible to share a repo that lets me reproduce this, so I can analyze it better? :)

@jameswyse

jameswyse May 9, 2017

@syrusakbary that makes sense. Most of our slower queries are lists of ObjectTypes from Django models with lots of relations / deep nesting, and most of the request time seems to be spent in Python instantiating and resolving all these types.

It's kinda tricky to extract but I'll try to put a repo together soon 👍

@sirmarlo

sirmarlo commented Aug 3, 2017

Has anyone made any progress on this challenge (i.e. slow response times due to field resolution)? We are running into this issue as well, with even smaller data sets (returning 50-100 items). Our main problem is that our data types are large and fairly nested, so many fields need to be resolved whenever a client sends a complex query.

We've tried adding a layer of caching in the field resolution, but haven't been able to get anything feasible, even when we cache the resolve functions and/or the execute call within a custom executor (inherited from SyncExecutor).
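As an illustration of what resolver-level caching can look like, here is a minimal sketch using functools.lru_cache. The names are hypothetical and this is not sirmarlo's implementation, just one way to memoize the expensive part of a resolver; it only helps when the same entity is resolved repeatedly, and it does not reduce graphene's per-field overhead.

from functools import lru_cache

import graphene


@lru_cache(maxsize=1024)
def load_profile(user_id):
    # Stand-in for an expensive lookup (database call, remote API, ...)
    return {"id": user_id, "bio": "..."}


class ProfileType(graphene.ObjectType):
    id = graphene.Int()
    bio = graphene.String()


class UserType(graphene.ObjectType):
    id = graphene.Int()
    profile = graphene.Field(ProfileType)

    # graphene 2-style resolver signature; older graphene versions differ
    def resolve_profile(self, info):
        # Repeated requests for the same user id hit the cache, not the backend.
        return ProfileType(**load_profile(self.id))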

@syrusakbary

Member

syrusakbary commented Oct 28, 2017

Hi all!

I'm working on Quiver, the next-generation GraphQL engine.
This engine works in a similar way to a high-performance template engine.

Queries are compiled directly to Python functions, which removes the overhead of the GraphQL framework, so queries are as performant as calling all the resolution functions by hand. With it we see a 5-10x improvement over the default GraphQL engine.

Right now it is closed-source and aimed specifically at medium-to-large companies.
So please, if you need to speed up GraphQL by an order of magnitude, contact me.

http://graphql-quiver.com

@syrusakbary

Member

syrusakbary commented Jun 14, 2018

Hi everyone,

Quiver is now ready to be used by the public!
I've released a new article analyzing how it works:
https://medium.com/@syrusakbary/quiver-graphql-on-steroids-13612ea1ea77

You can register here:
https://graphql-quiver.com/signup/

Please let me know if you would like to start using it or have any questions :)

@wxkin

wxkin commented Jul 31, 2018

Is there any update on this? Consider the following example, inspired by @mwilliamson-healx, where 100K Users need to be returned. Using graphene.Scalar the response is 8x faster.

  • Is there any way to drop the type checking/casting? Or any way to speed up the resolvers?
    Returning this many objects is not really a realistic use case, but returning a few thousand nested objects is.
import graphene
import cProfile
import StringIO
import pstats
from contextlib import contextmanager
from graphene.test import Client

@contextmanager
def profile(show_calls=None, message=None):
  print("\n============== Profiler start ==============" )
  if message:
    print(" " + message + " ... ")

  pr = cProfile.Profile()
  pr.enable()
  yield
  pr.disable()
  s = StringIO.StringIO()
  sortby = 'cumulative'
  ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
  if show_calls:
    ps.print_stats()
    print(s.getvalue())
  else:
    print('Execution time %f seconds' % ps.total_tt)
  print("--------------------------------------------")


class UserQuery(graphene.ObjectType):
    id = graphene.Int()

class UserAbstract(graphene.Scalar):

  @staticmethod
  def serialize(dt):
    return dt

class Query(graphene.ObjectType):
    users = graphene.Field(graphene.List(UserQuery))
    users_abstract = graphene.Field(graphene.List(UserAbstract))

    def resolve_users(self, context):
      return users

    def resolve_users_abstract(self, context):
      return resolved_users


class User(object):
    def __init__(self, id):
        self.id = id

nof_users = 100000
users = [User(index) for index in range(nof_users)]
resolved_users = [user.__dict__ for user in users]

schema = graphene.Schema(query=Query)

with profile(message="Fetch using ObjectType", show_calls=False):
  response = Client(schema).execute('{users{id}}')
  assert (len(response['data']['users']) == nof_users)


with profile(message="Fetch using Scalar", show_calls=False):
  response = Client(schema).execute('{usersAbstract}')
  assert (len(response['data']['usersAbstract']) == nof_users)
============== Profiler start ==============
 Fetch using ObjectType ... 
Execution time 8.595509 seconds
--------------------------------------------

============== Profiler start ==============
 Fetch using Scalar ... 
Execution time 1.008313 seconds
--------------------------------------------
@syrusakbary

Member

syrusakbary commented Aug 1, 2018

With the current execution model it is almost impossible to achieve more speedup.

PS: I just tested your code using Quiver, and the performance gains are considerable (10x).

Without Quiver

============== Profiler start ==============
 Fetch using ObjectType ...
Execution time 3.200170 seconds
--------------------------------------------

============== Profiler start ==============
 Fetch using Scalar ...
Execution time 0.540605 seconds
--------------------------------------------

With Quiver

About 10x speedup.

============== Profiler start ==============
 Fetch using ObjectType ...
Execution time 0.333384 seconds
--------------------------------------------

============== Profiler start ==============
 Fetch using Scalar ...
Execution time 0.048430 seconds
--------------------------------------------

Here is the code I used for testing:

import graphene
import cProfile
from io import StringIO
import pstats
from contextlib import contextmanager

from graphql import GraphQLDeciderBackend, GraphQLCachedBackend, GraphQLCoreBackend
from graphql.backend.quiver_cloud import GraphQLQuiverCloudBackend


@contextmanager
def profile(show_calls=None, message=None):
    print("\n============== Profiler start ==============")
    if message:
        print(" " + message + " ... ")

    pr = cProfile.Profile()
    pr.enable()
    yield
    pr.disable()
    s = StringIO()
    sortby = "cumulative"
    ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
    if show_calls:
        ps.print_stats()
        print(s.getvalue())
    else:
        print("Execution time %f seconds" % ps.total_tt)
    print("--------------------------------------------")


class UserQuery(graphene.ObjectType):
    id = graphene.Int()


class UserAbstract(graphene.Scalar):
    @staticmethod
    def serialize(dt):
        return dt


class Query(graphene.ObjectType):
    users = graphene.Field(graphene.List(UserQuery))
    users_abstract = graphene.Field(graphene.List(UserAbstract))

    def resolve_users(self, context):
        return users

    def resolve_users_abstract(self, context):
        return resolved_users


class User(object):
    def __init__(self, id):
        self.id = id


# For using Quiver
QUIVER_DSN = "https://YOUR_DSN@api.graphql-quiver.com/"  # You get this DSN when registering in GraphQL-quiver.com website and creating a Project, no cost for trying it
backend = GraphQLQuiverCloudBackend(QUIVER_DSN, {"asyncFramework": None})

# For use the normal backend
# backend = GraphQLCoreBackend()


nof_users = 100000
users = [User(index) for index in range(nof_users)]
resolved_users = [user.__dict__ for user in users]

schema = graphene.Schema(query=Query)

document1 = backend.document_from_string(schema, "{users{id}}")
document2 = backend.document_from_string(schema, "{usersAbstract}")

with profile(message="Fetch using ObjectType", show_calls=False):
    response = document1.execute()
    assert len(response.data["users"]) == nof_users


with profile(message="Fetch using Scalar", show_calls=False):
    response = document2.execute()
    assert len(response.data["usersAbstract"]) == nof_users
@samlll42-github

samlll42-github Aug 8, 2018

We are also having a lot of performance problems with large sets (a few hundred or a few thousand items) being excruciatingly slow to resolve.

Besides non-open-source options (i.e. Quiver), will graphene/graphql-core ever have decent performance on large sets? Is anybody working on that, or does anyone have ideas on how we could optimize? We are happy to help if there is any such initiative in progress.
