Query batching and/or dynamic queries. #17

Open
metalmatze opened this Issue Jul 20, 2017 · 9 comments


metalmatze commented Jul 20, 2017

I want to query multiple repositories at the same time. Rather than writing the query for a fixed number of repositories, I want to build it at runtime.
I currently don't see a way to do that.

{
  justwatchcom_gopass: repository(owner: "justwatchcom", name: "gopass") {
    id
    name
  }
  prometheus_prometheus: repository(owner: "prometheus", name: "prometheus") {
    id
    name
  }
  # ... a lot of other dynamic repositories added at runtime
}

As a workaround I'm doing these queries one after another, but since querying multiple resources at once is one of the benefits of GraphQL, we should try to support this.


Member

dmitshur commented Jul 20, 2017

This is a valid use case that githubql should support, but currently doesn't (at least I can't think of a way). Thanks for reporting.

I'll need to think about how to best handle this.

@dmitshur dmitshur changed the title from Dynamically add multiple aliases to a query to Dynamic queries. Jul 20, 2017


Member

dmitshur commented Jul 31, 2017

One idea I have (that I want to jot down) about how to potentially tackle this is to support map types for dynamic queries such as this one.

So, your query could look something like this:

var q = map[string]interface{}{
	`repository(owner: "octocat", name: "Hello-World")`: struct {
		Description githubql.String
	}{},
	`repository(owner: "justwatchcom", name: "gopass")`: struct {
		ID   githubql.ID
		Name githubql.String
	}{},
	`repository(owner: "prometheus", name: "prometheus")`: struct {
		ID   githubql.ID
		Name githubql.String
	}{},
}

However, it needs to be carefully thought out. It introduces mixing query information within types and values, which I've seen work poorly when I attempted it in the past.

(Just wanted to write it down so I don't forget. But it's possible another solution will be better.)


Member

dmitshur commented Jul 31, 2017

Another solution might be to add a new, different method that lets you pass a GraphQL query as a string (which you still have to construct yourself) and returns the results either as JSON you have to parse yourself or already parsed into a map[string]interface{}. But this would be a very different API for making queries, so I see it as a last resort, if no better solution can be found.
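For reference, a minimal sketch of what that string-based fallback could look like on the caller's side. The names `repoRef`, `buildQuery`, and `decodeData` are hypothetical and not part of githubql; quoting with `%q` assumes plain ASCII owner and repository names; and the response is a canned placeholder, so no HTTP request is made here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// repoRef identifies one repository to include in the query (hypothetical type).
type repoRef struct{ Owner, Name string }

// buildQuery writes one aliased repository field per entry, so that the
// number of repositories can be decided at runtime.
func buildQuery(repos []repoRef) string {
	var b strings.Builder
	b.WriteString("{\n")
	for i, r := range repos {
		// %q is assumed to be adequate quoting for ASCII owner/name values.
		fmt.Fprintf(&b, "  repo%d: repository(owner: %q, name: %q) { id name }\n", i, r.Owner, r.Name)
	}
	b.WriteString("}")
	return b.String()
}

// decodeData parses a GraphQL JSON response and returns its "data" object,
// or an error built from the first entry of "errors" if there is one.
func decodeData(body []byte) (map[string]interface{}, error) {
	var resp struct {
		Data   map[string]interface{}
		Errors []struct{ Message string }
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if len(resp.Errors) > 0 {
		return nil, fmt.Errorf("graphql: %s", resp.Errors[0].Message)
	}
	return resp.Data, nil
}

func main() {
	query := buildQuery([]repoRef{
		{Owner: "justwatchcom", Name: "gopass"},
		{Owner: "prometheus", Name: "prometheus"},
	})
	fmt.Println(query)

	// Placeholder response standing in for the server's reply.
	body := []byte(`{"data":{"repo0":{"id":"placeholder-0","name":"gopass"},"repo1":{"id":"placeholder-1","name":"prometheus"}}}`)
	data, err := decodeData(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(data["repo1"].(map[string]interface{})["name"])
}
```

The cost of this approach is visible in the last line: every result has to be reached through type assertions on `map[string]interface{}`, which is the loss of type safety that makes it a last resort.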


Member

dmitshur commented Sep 1, 2017

I wanted to post an update here. I also ran into this need recently at:

https://github.com/shurcooL/notifications/blob/b2920e64fbc3c388d5191433dd5493c75f302c14/githubapi/githubapi.go#L92-L96

I will continue to think about the best possible resolution to this, and post updates here if I have any. The above just means it'll be slightly easier for me to evaluate an idea, if I get a new one.


Member

dmitshur commented Mar 1, 2018

There's been yet another place I would've found this handy:

// fetchCommit fetches the specified commit.
func (s *service) fetchCommit(ctx context.Context, repoID int64, sha string) (*commit, error) {
	// TODO: It'd be better to be able to batch and fetch all commits at once (in fetchEvents loop),
	//       rather than making an individual query for each.
	//       See https://github.com/shurcooL/githubql/issues/17.

	// ...
	err := s.clV4.Query(ctx, &q, variables) // Fetch a single commit.
	// ...
}

I'm starting to think that a good way of thinking about this issue might be as "query batching" rather than fully dynamic queries. The idea would be that you provide a single query and an array of different variables, and you get a result for that query for each element in the variables array.

This line of thinking might help arrive at a reasonable API that works for most use cases.

@dmitshur dmitshur changed the title from Dynamic queries. to Query batching and/or dynamic queries. Mar 1, 2018


osela commented Sep 3, 2018

@dmitshur Any progress on this? I'd love to help. I'm currently skipping this library entirely in these scenarios and using simple POST requests with string queries I construct myself.

It seems to me like many scenarios could be represented as a field of type map[string]SomeType. So the question is about making it easy and clear to provide aliases and parameters? Or is it more fundamental?


Member

dmitshur commented Oct 28, 2018

@osela There's no progress on this issue from me, because I haven't had any free time left over to think about this. If you're looking to solve this, I'd recommend prototyping a solution on your own branch and sharing your updates here.

dmitshur added a commit to shurcooL/notifications that referenced this issue Nov 11, 2018

githubapi: Batch GraphQL queries in List.
One of the advantages of using GraphQL is that it's possible to fetch
all the required information in a single query. We were making many
GraphQL queries, one for each notification. This can become very slow
and inefficient when there are many notifications.

Ideally, the entire List endpoint should be implemented with a single
GraphQL query. However, it's not possible because GitHub GraphQL API v4
still doesn't offer access to notifications the way GitHub API v3 does.

So, we do the best we can for now, and batch all GraphQL queries into
a single query. Use top-level aliases to combine multiple queries into
one. Use reflect.StructOf to construct the query struct type at runtime.
This is functional, although perhaps there are opportunities to make it
more user friendly in the graphql/githubv4 libraries. That will be
investigated in the future.

The performance of List endpoint when listing 145 GitHub notifications
improves from 15~ seconds to 3~ seconds after this change.

Updates shurcooL/githubv4#17.

Member

dmitshur commented Nov 11, 2018

I've made significant progress on this issue this weekend. It turns out it has been possible to perform query batching and/or dynamic queries all along, without any API changes to this package. Read on for details.

Consider the following GraphQL query to fetch multiple GitHub repositories:

{
  go: repository(owner: "golang", name: "go") {
    nameWithOwner
    createdAt
    description
  }
  graphql: repository(owner: "shurcooL", name: "githubv4") {
    nameWithOwner
    createdAt
    description
  }
}

If executed against GitHub GraphQL API v4, it returns a JSON response like:

{
  "data": {
    "go": {
      "nameWithOwner": "golang/go",
      "createdAt": "2014-08-19T04:33:40Z",
      "description": "The Go programming language"
    },
    "graphql": {
      "nameWithOwner": "shurcooL/githubv4",
      "createdAt": "2017-05-27T05:05:31Z",
      "description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
    }
  }
}

It's possible to perform that exact query using the githubv4 package like so:

var q struct {
	Go struct {
		NameWithOwner string
		CreatedAt     time.Time
		Description   string
	} `graphql:"go: repository(owner: \"golang\", name: \"go\")"`
	GitHubV4 struct {
		NameWithOwner string
		CreatedAt     time.Time
		Description   string
	} `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"`
}
err := client.Query(context.Background(), &q, nil)
if err != nil {
	return err
}

enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", "\t")
enc.Encode(q)

// Output:
// {
// 	"Go": {
// 		"NameWithOwner": "golang/go",
// 		"CreatedAt": "2014-08-19T04:33:40Z",
// 		"Description": "The Go programming language"
// 	},
// 	"GitHubV4": {
// 		"NameWithOwner": "shurcooL/githubv4",
// 		"CreatedAt": "2017-05-27T05:05:31Z",
// 		"Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
// 	}
// }

Of course, the list of repositories can only be adjusted at compile time, since it's a part of the query struct type.

However, I got an idea: it's possible to use the reflect package and its reflect.StructOf function to dynamically construct a query struct. For example, here is the same query as above, created at runtime:

q := reflect.New(reflect.StructOf([]reflect.StructField{
	{
		Name: "Go", Type: reflect.TypeOf(struct {
			NameWithOwner string
			CreatedAt     time.Time
			Description   string
		}{}), Tag: `graphql:"go: repository(owner: \"golang\", name: \"go\")"`,
	},
	{
		Name: "GitHubV4", Type: reflect.TypeOf(struct {
			NameWithOwner string
			CreatedAt     time.Time
			Description   string
		}{}), Tag: `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"`,
	},
})).Elem()
err := client.Query(context.Background(), q.Addr().Interface(), nil)
if err != nil {
	return err
}

enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", "\t")
enc.Encode(q.Interface())

// Output:
// {
// 	"Go": {
// 		"NameWithOwner": "golang/go",
// 		"CreatedAt": "2014-08-19T04:33:40Z",
// 		"Description": "The Go programming language"
// 	},
// 	"GitHubV4": {
// 		"NameWithOwner": "shurcooL/githubv4",
// 		"CreatedAt": "2017-05-27T05:05:31Z",
// 		"Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
// 	}
// }

As you can see, it works, and produces the same results. Unlike the case above, the struct is generated dynamically, so it's possible to add arbitrary repositories to query at runtime.

It's important to note I used the word "possible" at the beginning. The syntax for using reflect is more cumbersome compared to declaring a Go type using normal Go code, and this is not necessarily the final solution. But it's a good step forward. I have some ideas for how to wrap this same functionality in a nicer API, to be explored later.

I've prototyped this approach in a real codebase where I wanted to perform GraphQL query batching, and it seems to work well. See shurcooL/notifications@9264031.
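To make the runtime case concrete, here is a minimal sketch of building such a struct from a slice of repositories. The names `repoFields`, `repoRef`, and `newBatchQuery` are hypothetical helpers, not part of githubv4, and quoting owner and name with `strconv.Quote` assumes plain ASCII values. The sketch only constructs the value and prints the generated tags; it does not contact the API.

```go
package main

import (
	"fmt"
	"reflect"
	"strconv"
	"time"
)

// repoFields is the selection requested for every repository in the batch.
type repoFields struct {
	NameWithOwner string
	CreatedAt     time.Time
	Description   string
}

// repoRef identifies one repository to include (hypothetical helper type).
type repoRef struct{ Owner, Name string }

// newBatchQuery returns a pointer (as a reflect.Value) to a new struct with
// one field per repository. Field i is named RepoI and carries a graphql tag
// of the form `repoI: repository(owner: "...", name: "...")`.
func newBatchQuery(repos []repoRef) reflect.Value {
	fields := make([]reflect.StructField, 0, len(repos))
	for i, r := range repos {
		n := strconv.Itoa(i)
		selector := fmt.Sprintf("repo%s: repository(owner: %s, name: %s)",
			n, strconv.Quote(r.Owner), strconv.Quote(r.Name))
		fields = append(fields, reflect.StructField{
			Name: "Repo" + n, // must be exported, or reflect.StructOf panics
			Type: reflect.TypeOf(repoFields{}),
			// Struct tag values are Go-quoted strings, hence the second Quote.
			Tag: reflect.StructTag("graphql:" + strconv.Quote(selector)),
		})
	}
	return reflect.New(reflect.StructOf(fields))
}

func main() {
	q := newBatchQuery([]repoRef{
		{Owner: "golang", Name: "go"},
		{Owner: "shurcooL", Name: "githubv4"},
	})
	typ := q.Elem().Type()
	for i := 0; i < typ.NumField(); i++ {
		fmt.Println(typ.Field(i).Name, "=>", typ.Field(i).Tag.Get("graphql"))
	}
}
```

Under these assumptions, `q.Interface()` is a pointer to the generated struct and would be passed to `client.Query` in place of `&q`. After a successful query, result `i` could be read back with `q.Elem().Field(i).Interface().(repoFields)`.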


osela commented Nov 14, 2018

@dmitshur That's a nice approach, and it allows great flexibility. I don't know what nicer API you had in mind, but I'm wondering whether the common use case of query batching (rather than fully dynamic queries) deserves its own API.

The major drawback I see in this approach (apart from the cumbersome use of reflect) is that it doesn't allow reuse of simpler query structs.

Ideally, I would like to take a simple struct that I already use for querying a single repo

type q struct {
	Repository struct {
		Description string
	} `graphql:"repository(owner: $owner, name: $name)"`
}

and use it in a batch query, in a way that closely resembles the single query API. Something like

var batch []*q
variables := []map[string]interface{}{
	{
		"owner": "golang",
		"name":  "go",
	},
	{
		"owner": "shurcooL",
		"name":  "githubv4",
	},
}
err := client.BatchQuery(context.Background(), &batch, variables)

The implementation would probably involve some ugly regex work to rename the arguments and the aliases so they are unique, but it's mostly hidden from the user. I'm still thinking about the best way to handle errors though.

What do you think?
