Query batching and/or dynamic queries. #17

metalmatze · 2017-07-20T17:53:50Z

I want to query multiple repositories at the same time. But I don't want to write the query for a specific number of repositories, but rather create the query at runtime.
I currently don't see a way to do that.

{
  justwatchcom_gopass: repository(owner: "justwatchcom", name: "gopass") {
    id
    name
  }
  prometheus_prometheus: repository(owner: "prometheus", name: "prometheus") {
    id
    name
  }
   ... a lot of other dynamic repositories added at runtime
}

As a work around I'm doing these queries one after another, but as it's one of the benefits of doing multiple resource queries at ones we should try to support this.

dmitshur · 2017-07-20T18:11:35Z

This is a valid use case that githubql should support, but currently doesn't (at least I can't think of a way). Thanks for reporting.

I'll need to think about how to best handle this.

dmitshur · 2017-07-31T20:33:14Z

One idea I have (that I want to jot down) about how to potentially tackle this is to support map types for dynamic queries such as this one.

So, your query could look something like this:

var q = map[string]interface{}{
	`repository(owner: "octocat", name: "Hello-World")`: struct {
		Description githubql.String
	}{},
	`repository(owner: "justwatchcom", name: "gopass")`: struct {
		ID   githubql.ID
		Name githubql.String
	}{},
	`repository(owner: "prometheus", name: "prometheus")`: struct {
		ID   githubql.ID
		Name githubql.String
	}{},
}

However, it needs to be carefully thought out. It introduces mixing query information within types and values, which I've seen work poorly when I attempted it in the past.

(Just wanted to write it down so I don't forget. But it's possible another solution will be better.)

dmitshur · 2017-07-31T20:36:59Z

Another solution might be to add a new different method that lets you pass a GraphQL query as a string (which you still have to construct yourself), and returns results as JSON you have to parse yourself, or already parsed into a map[string]interface{}. But this would be a very different API to make queries, so I see it as a last resort, if no better solution can be found.

dmitshur · 2017-09-01T00:14:07Z

I wanted to post an update here. I also ran into this need recently at:

https://github.com/shurcooL/notifications/blob/b2920e64fbc3c388d5191433dd5493c75f302c14/githubapi/githubapi.go#L92-L96

I will continue to think about the best possible resolution to this, and post updates here if I have any. The above just means it'll be slightly easier for me to evaluate an idea, if I get a new one.

dmitshur · 2018-03-01T05:25:13Z

There's been yet another place I would've found this handy:

// fetchCommit fetches the specified commit.
func (s *service) fetchCommit(ctx context.Context, repoID int64, sha string) (*commit, error) {
	// TODO: It'd be better to be able to batch and fetch all commits at once (in fetchEvents loop),
	//       rather than making an individual query for each.
	//       See https://github.com/shurcooL/githubql/issues/17.

	// ...
	err := s.clV4.Query(ctx, &q, variables) // Fetch a single commit.
	// ...
}

I'm starting to think that a good way of thinking about this issue might be as "query batching" rather than fully dynamic queries. The idea would be you provide a single query and a array of different variables, and you get a result for that query for each element in the variables array.

This line of thinking might help arrive at a reasonable API that works for most usescases.

osela · 2018-09-03T23:24:46Z

@dmitshur Any progress on this? I'd love to help. I'm currently skipping this library entirely in these scenarios and using simple POST requests with string queries I construct myself.

It seems to me like many scenarios could be represented as a field of type map[string]SomeType. So the question is about making it easy and clear to provide aliases and parameters? Or is it more fundamental?

dmitshur · 2018-10-28T22:06:24Z

@osela There's no progress on this issue from me, because I haven't had any free time left over to think about this. If you're looking to solve this, I'd recommend prototyping a solution on your own branch and sharing your updates here.

One of the advantages of using GraphQL is that it's possible to fetch all the required information in a single query. We were making many GraphQL queries, one for each notification. This can become very slow and inefficient when there are many notifications. Ideally, the entire List endpoint should be implemented with a single GraphQL query. However, it's not possible because GitHub GraphQL API v4 still doesn't offer access to notifications the way GitHub API v3 does. So, we do the best we can for now, and batch all GraphQL queries into a single query. Use top-level aliases to combine multiple queries into one. Use reflect.StructOf to construct the query struct type at runtime. This is functional, although perhaps there are opportunities to make it more user friendly in the graphql/githubv4 libraries. That will be investigated in the future. The performance of List endpoint when listing 145 GitHub notifications improves from 15~ seconds to 3~ seconds after this change. Updates shurcooL/githubv4#17.

dmitshur · 2018-11-11T06:22:19Z

I've made significant progress on this issue this weekend. It turns out it has been possible to perform query batching and/or dynamic queries all along, without any API changes to this package. Read on for details.

Consider the following GraphQL query to fetch multiple GitHub repositories:

{
  go: repository(owner: "golang", name: "go") {
    nameWithOwner
    createdAt
    description
  }
  graphql: repository(owner: "shurcooL", name: "githubv4") {
    nameWithOwner
    createdAt
    description
  }
}

If executed against GitHub GraphQL API v4 It returns a JSON response like:

GraphQL JSON Response

{
  "data": {
    "go": {
      "nameWithOwner": "golang/go",
      "createdAt": "2014-08-19T04:33:40Z",
      "description": "The Go programming language"
    },
    "graphql": {
      "nameWithOwner": "shurcooL/githubv4",
      "createdAt": "2017-05-27T05:05:31Z",
      "description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
    }
  }
}

It's possible to perform that exact query using githubv4 package like so:

var q struct {
	Go struct {
		NameWithOwner string
		CreatedAt     time.Time
		Description   string
	} `graphql:"go: repository(owner: \"golang\", name: \"go\")"`
	GitHubV4 struct {
		NameWithOwner string
		CreatedAt     time.Time
		Description   string
	} `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"`
}
err := client.Query(context.Background(), &q, nil)
if err != nil {
	return err
}

enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", "\t")
enc.Encode(q)

// Output:
// {
// 	"Go": {
// 		"NameWithOwner": "golang/go",
// 		"CreatedAt": "2014-08-19T04:33:40Z",
// 		"Description": "The Go programming language"
// 	},
// 	"GitHubV4": {
// 		"NameWithOwner": "shurcooL/githubv4",
// 		"CreatedAt": "2017-05-27T05:05:31Z",
// 		"Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
// 	}
// }

Of course, the list of repositories can only be adjusted at compile time, since it's a part of the query struct type.

However, I got an idea: it's possible to use reflect package and its reflect.StructOf function to dynamically construct a query struct. For example, the same query as above, created at runtime:

q := reflect.New(reflect.StructOf([]reflect.StructField{
	{
		Name: "Go", Type: reflect.TypeOf(struct {
			NameWithOwner string
			CreatedAt     time.Time
			Description   string
		}{}), Tag: `graphql:"go: repository(owner: \"golang\", name: \"go\")"`,
	},
	{
		Name: "GitHubV4", Type: reflect.TypeOf(struct {
			NameWithOwner string
			CreatedAt     time.Time
			Description   string
		}{}), Tag: `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"`,
	},
})).Elem()
err := client.Query(context.Background(), q.Addr().Interface(), nil)
if err != nil {
	return err
}

enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", "\t")
enc.Encode(q.Interface())

// Output:
// {
// 	"Go": {
// 		"NameWithOwner": "golang/go",
// 		"CreatedAt": "2014-08-19T04:33:40Z",
// 		"Description": "The Go programming language"
// 	},
// 	"GitHubV4": {
// 		"NameWithOwner": "shurcooL/githubv4",
// 		"CreatedAt": "2017-05-27T05:05:31Z",
// 		"Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
// 	}
// }

As you can see, it works, and produces the same results. Unlike the case above, the struct is generated dynamically, so it's possible to add arbitrary repositories to query at runtime.

It's important to note I used the word "possible" at the beginning. The syntax for using reflect is more cumbersome compared to declaring a Go type using normal Go code, and this is not necessarily the final solution. But it's a good step forward. I have some ideas for how to wrap this same functionality in a nicer API, to be explored later.

I've prototyped this approach in a real codebase where I wanted to perform GraphQL query batching, and it seems to work well. See shurcooL/notifications@9264031.

osela · 2018-11-14T20:46:26Z

@dmitshur That's a nice approach, and it allows great flexibility. I don't know what is the nicer API you had in mind, but I'm wondering whether the common use case of query batching (rather than fully dynamic queries) doesn't deserve it's own API.

The major drawback I see in this approach (apart from the cumbersome use of reflect) is that it doesn't allow reuse of simpler queries structs.

Ideally, I would like to take a simple struct that I already use for querying a single repo

type q struct {
	Repository struct {
		Description string
	} `graphql:"repository(owner: $owner, name: $name)"`
}

and use it in a batch query, in a way that closely resembles the single query API. Something like

var batch []*q
variables := []map[string]interface{}{
	{
		"owner": "golang",
		"name":  "go",
	},
	{
		"owner": "shurcooL",
		"name":  "githubv4",
	},
}
err := client.BatchQuery(context.Background(), &batch, variables)

The implementation would probably involve some ugly regex work to rename the arguments and the aliases so they are unique, but it's mostly hidden from the user. I'm still thinking about the best way to handle errors though.

What do you think?

cheshire137 · 2019-08-19T19:06:18Z

Unlike the case above, the struct is generated dynamically, so it's possible to add arbitrary repositories to query at runtime.

Can you have a dynamic alias for each query, though? Your example hard-codes the tag like Tag: graphql:"go: repository(owner: "golang", name: "go")"``, and I notice I get an error if I try to generate a tag:

paultyng · 2019-08-19T19:50:51Z

@cheshire137 I think in that case you'd need to just cast to reflect.StructTag, ie: reflect.StructTag(fmt.Sprintf(...))

dmitshur · 2019-08-20T02:53:45Z

@cheshire137 Yes, that should work if you use the suggestion @paultyng posted above. Please feel free to let me know if there's more to it.

micimize · 2021-03-16T15:59:00Z

Another solution might be to add a new different method that lets you pass a GraphQL query as a string (which you still have to construct yourself), and returns results as JSON you have to parse yourself, or already parsed into a map[string]interface{}. But this would be a very different API to make queries, so I see it as a last resort, if no better solution can be found.

This still seems like a decent solution to me, and IMO it is not an API minus to expose the internal "untyped" layer.

The "map with tags as fields" approach is very pretty – looks difficult to make work, but it could replace
helloworld: repository(owner: "octocat", name: "Hello-World") with "helloworld" in the end result.

rhcarvalho · 2021-04-05T10:52:59Z

I tried out the idea of using reflect to build dynamic queries from #17 (comment) for mutations.

Documenting the experience here for posteriority & sharing. TL;DR: don't do it.

I wanted to add several issues to a project at once. With the code below the request is well formed, but if the batch is too large the request times out with some mutations applied, some not.

I also noticed an inconsistency in that issues will be associated with a project (as seen when looking at the issue page on GitHub), but then the search API misses that association. It was not a case of eventual consistency, data was still inconsistent after 12+h.

Code

	type AddProjectCard struct {
		CardEdge struct {
			Node struct {
				URL githubv4.URI
			}
		}
	}
	var fields []reflect.StructField
	vars := make(map[string]interface{})
	for i, contentID := range contentIDs {
		fields = append(fields, reflect.StructField{
			Name: fmt.Sprintf("AddProjectCard%d", i),
			Type: reflect.TypeOf(AddProjectCard{}),
			Tag:  reflect.StructTag(fmt.Sprintf(`graphql:"addProjectCard%d:addProjectCard(input:$input%[1]d)"`, i)),
		})
		vars[fmt.Sprintf("input%d", i)] = githubv4.AddProjectCardInput{
			ProjectColumnID: projectColumnID,
			ContentID:       githubv4.NewID(contentID),
		}
	}

	// Work around githubv4.Client.Mutate requiring a variable named "input".
	fields[0].Tag = reflect.StructTag(strings.Replace(string(fields[0].Tag), "$input0", "$input", 1))
	vars["input"] = vars["input0"]
	delete(vars, "input0")

	m := reflect.New(reflect.StructOf(fields)).Elem()
	err := pm.client.Mutate(context.Background(), m.Addr().Interface(), vars["input"], vars)

Looking for documentation around batching mutations, I found https://docs.github.com/en/graphql/overview/resource-limitations which talks about the cost of queries, but not about the cost of mutations. I couldn't find anywhere in the GitHub GraphQL API docs that we should always do only one mutation per request, but that seems to be the safest approach. In https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-abuse-rate-limits, for the REST API, there's a guideline:

If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.

That's the closest to "take it slow" I could find. So a time.Ticker to limit throughput and a single mutation per request it is.

patrickdevivo · 2022-04-18T15:03:14Z

I've been encountering this issue as well (a need for "dynamic" queries where fields are only known at runtime). I recently started this project as a way to help address it, as an alternative to the reflect-based approach. It's still an early project but wanted to share it here in case folks find it useful.

dmitshur added enhancement thinking labels Jul 20, 2017

dmitshur changed the title ~~Dynamically add multiple aliases to a query~~ Dynamic queries. Jul 20, 2017

dmitshur added the API decision label Mar 1, 2018

dmitshur changed the title ~~Dynamic queries.~~ Query batching and/or dynamic queries. Mar 1, 2018

dmitshur mentioned this issue Nov 11, 2018

githubapi: Optimize List performance when there are many notifications. shurcooL/notifications#4

Merged

dmitshur mentioned this issue Jun 16, 2019

Help: Can you dynamically construct a graphql query? shurcooL/graphql#43

Closed

StevenACoffman mentioned this issue Sep 2, 2021

Support Apollo-style batching Khan/genqlient#46

Open

jmguzik mentioned this issue Dec 5, 2021

Optimize APIv4 rate limit usage in blunderbuss plugin kubernetes/test-infra#24532

Closed

jmguzik mentioned this issue Dec 13, 2021

Is it possible to dynamically include graphql fields? #95

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Query batching and/or dynamic queries. #17

Query batching and/or dynamic queries. #17

metalmatze commented Jul 20, 2017 •

edited

Loading

dmitshur commented Jul 20, 2017

dmitshur commented Jul 31, 2017 •

edited

Loading

dmitshur commented Jul 31, 2017

dmitshur commented Sep 1, 2017 •

edited

Loading

dmitshur commented Mar 1, 2018 •

edited

Loading

osela commented Sep 3, 2018 •

edited

Loading

dmitshur commented Oct 28, 2018

dmitshur commented Nov 11, 2018

osela commented Nov 14, 2018

cheshire137 commented Aug 19, 2019

paultyng commented Aug 19, 2019

dmitshur commented Aug 20, 2019

micimize commented Mar 16, 2021

rhcarvalho commented Apr 5, 2021

patrickdevivo commented Apr 18, 2022

Query batching and/or dynamic queries. #17

Query batching and/or dynamic queries. #17

Comments

metalmatze commented Jul 20, 2017 • edited Loading

dmitshur commented Jul 20, 2017

dmitshur commented Jul 31, 2017 • edited Loading

dmitshur commented Jul 31, 2017

dmitshur commented Sep 1, 2017 • edited Loading

dmitshur commented Mar 1, 2018 • edited Loading

osela commented Sep 3, 2018 • edited Loading

dmitshur commented Oct 28, 2018

dmitshur commented Nov 11, 2018

osela commented Nov 14, 2018

cheshire137 commented Aug 19, 2019

paultyng commented Aug 19, 2019

dmitshur commented Aug 20, 2019

micimize commented Mar 16, 2021

rhcarvalho commented Apr 5, 2021

patrickdevivo commented Apr 18, 2022

metalmatze commented Jul 20, 2017 •

edited

Loading

dmitshur commented Jul 31, 2017 •

edited

Loading

dmitshur commented Sep 1, 2017 •

edited

Loading

dmitshur commented Mar 1, 2018 •

edited

Loading

osela commented Sep 3, 2018 •

edited

Loading