Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query batching and/or dynamic queries. #17

Open
metalmatze opened this issue Jul 20, 2017 · 15 comments
Open

Query batching and/or dynamic queries. #17

metalmatze opened this issue Jul 20, 2017 · 15 comments

Comments

@metalmatze
Copy link

metalmatze commented Jul 20, 2017

I want to query multiple repositories at the same time. But I don't want to write the query for a specific number of repositories, but rather create the query at runtime.
I currently don't see a way to do that.

{
  justwatchcom_gopass: repository(owner: "justwatchcom", name: "gopass") {
    id
    name
  }
  prometheus_prometheus: repository(owner: "prometheus", name: "prometheus") {
    id
    name
  }
   ... a lot of other dynamic repositories added at runtime
}

As a work around I'm doing these queries one after another, but as it's one of the benefits of doing multiple resource queries at ones we should try to support this.

@dmitshur
Copy link
Member

This is a valid use case that githubql should support, but currently doesn't (at least I can't think of a way). Thanks for reporting.

I'll need to think about how to best handle this.

@dmitshur dmitshur changed the title Dynamically add multiple aliases to a query Dynamic queries. Jul 20, 2017
@dmitshur
Copy link
Member

dmitshur commented Jul 31, 2017

One idea I have (that I want to jot down) about how to potentially tackle this is to support map types for dynamic queries such as this one.

So, your query could look something like this:

var q = map[string]interface{}{
	`repository(owner: "octocat", name: "Hello-World")`: struct {
		Description githubql.String
	}{},
	`repository(owner: "justwatchcom", name: "gopass")`: struct {
		ID   githubql.ID
		Name githubql.String
	}{},
	`repository(owner: "prometheus", name: "prometheus")`: struct {
		ID   githubql.ID
		Name githubql.String
	}{},
}

However, it needs to be carefully thought out. It introduces mixing query information within types and values, which I've seen work poorly when I attempted it in the past.

(Just wanted to write it down so I don't forget. But it's possible another solution will be better.)

@dmitshur
Copy link
Member

Another solution might be to add a new different method that lets you pass a GraphQL query as a string (which you still have to construct yourself), and returns results as JSON you have to parse yourself, or already parsed into a map[string]interface{}. But this would be a very different API to make queries, so I see it as a last resort, if no better solution can be found.

@dmitshur
Copy link
Member

dmitshur commented Sep 1, 2017

I wanted to post an update here. I also ran into this need recently at:

https://github.com/shurcooL/notifications/blob/b2920e64fbc3c388d5191433dd5493c75f302c14/githubapi/githubapi.go#L92-L96

I will continue to think about the best possible resolution to this, and post updates here if I have any. The above just means it'll be slightly easier for me to evaluate an idea, if I get a new one.

@dmitshur
Copy link
Member

dmitshur commented Mar 1, 2018

There's been yet another place I would've found this handy:

// fetchCommit fetches the specified commit.
func (s *service) fetchCommit(ctx context.Context, repoID int64, sha string) (*commit, error) {
	// TODO: It'd be better to be able to batch and fetch all commits at once (in fetchEvents loop),
	//       rather than making an individual query for each.
	//       See https://github.com/shurcooL/githubql/issues/17.

	// ...
	err := s.clV4.Query(ctx, &q, variables) // Fetch a single commit.
	// ...
}

I'm starting to think that a good way of thinking about this issue might be as "query batching" rather than fully dynamic queries. The idea would be you provide a single query and a array of different variables, and you get a result for that query for each element in the variables array.

This line of thinking might help arrive at a reasonable API that works for most usescases.

@dmitshur dmitshur changed the title Dynamic queries. Query batching and/or dynamic queries. Mar 1, 2018
@osela
Copy link

osela commented Sep 3, 2018

@dmitshur Any progress on this? I'd love to help. I'm currently skipping this library entirely in these scenarios and using simple POST requests with string queries I construct myself.

It seems to me like many scenarios could be represented as a field of type map[string]SomeType. So the question is about making it easy and clear to provide aliases and parameters? Or is it more fundamental?

@dmitshur
Copy link
Member

@osela There's no progress on this issue from me, because I haven't had any free time left over to think about this. If you're looking to solve this, I'd recommend prototyping a solution on your own branch and sharing your updates here.

dmitshur added a commit to shurcooL/notifications that referenced this issue Nov 11, 2018
One of the advantages of using GraphQL is that it's possible to fetch
all the required information in a single query. We were making many
GraphQL queries, one for each notification. This can become very slow
and inefficient when there are many notifications.

Ideally, the entire List endpoint should be implemented with a single
GraphQL query. However, it's not possible because GitHub GraphQL API v4
still doesn't offer access to notifications the way GitHub API v3 does.

So, we do the best we can for now, and batch all GraphQL queries into
a single query. Use top-level aliases to combine multiple queries into
one. Use reflect.StructOf to construct the query struct type at runtime.
This is functional, although perhaps there are opportunities to make it
more user friendly in the graphql/githubv4 libraries. That will be
investigated in the future.

The performance of List endpoint when listing 145 GitHub notifications
improves from 15~ seconds to 3~ seconds after this change.

Updates shurcooL/githubv4#17.
@dmitshur
Copy link
Member

I've made significant progress on this issue this weekend. It turns out it has been possible to perform query batching and/or dynamic queries all along, without any API changes to this package. Read on for details.

Consider the following GraphQL query to fetch multiple GitHub repositories:

{
  go: repository(owner: "golang", name: "go") {
    nameWithOwner
    createdAt
    description
  }
  graphql: repository(owner: "shurcooL", name: "githubv4") {
    nameWithOwner
    createdAt
    description
  }
}

If executed against GitHub GraphQL API v4 It returns a JSON response like:

GraphQL JSON Response
{
  "data": {
    "go": {
      "nameWithOwner": "golang/go",
      "createdAt": "2014-08-19T04:33:40Z",
      "description": "The Go programming language"
    },
    "graphql": {
      "nameWithOwner": "shurcooL/githubv4",
      "createdAt": "2017-05-27T05:05:31Z",
      "description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
    }
  }
}

It's possible to perform that exact query using githubv4 package like so:

var q struct {
	Go struct {
		NameWithOwner string
		CreatedAt     time.Time
		Description   string
	} `graphql:"go: repository(owner: \"golang\", name: \"go\")"`
	GitHubV4 struct {
		NameWithOwner string
		CreatedAt     time.Time
		Description   string
	} `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"`
}
err := client.Query(context.Background(), &q, nil)
if err != nil {
	return err
}

enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", "\t")
enc.Encode(q)

// Output:
// {
// 	"Go": {
// 		"NameWithOwner": "golang/go",
// 		"CreatedAt": "2014-08-19T04:33:40Z",
// 		"Description": "The Go programming language"
// 	},
// 	"GitHubV4": {
// 		"NameWithOwner": "shurcooL/githubv4",
// 		"CreatedAt": "2017-05-27T05:05:31Z",
// 		"Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
// 	}
// }

Of course, the list of repositories can only be adjusted at compile time, since it's a part of the query struct type.

However, I got an idea: it's possible to use reflect package and its reflect.StructOf function to dynamically construct a query struct. For example, the same query as above, created at runtime:

q := reflect.New(reflect.StructOf([]reflect.StructField{
	{
		Name: "Go", Type: reflect.TypeOf(struct {
			NameWithOwner string
			CreatedAt     time.Time
			Description   string
		}{}), Tag: `graphql:"go: repository(owner: \"golang\", name: \"go\")"`,
	},
	{
		Name: "GitHubV4", Type: reflect.TypeOf(struct {
			NameWithOwner string
			CreatedAt     time.Time
			Description   string
		}{}), Tag: `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"`,
	},
})).Elem()
err := client.Query(context.Background(), q.Addr().Interface(), nil)
if err != nil {
	return err
}

enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", "\t")
enc.Encode(q.Interface())

// Output:
// {
// 	"Go": {
// 		"NameWithOwner": "golang/go",
// 		"CreatedAt": "2014-08-19T04:33:40Z",
// 		"Description": "The Go programming language"
// 	},
// 	"GitHubV4": {
// 		"NameWithOwner": "shurcooL/githubv4",
// 		"CreatedAt": "2017-05-27T05:05:31Z",
// 		"Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
// 	}
// }

As you can see, it works, and produces the same results. Unlike the case above, the struct is generated dynamically, so it's possible to add arbitrary repositories to query at runtime.

It's important to note I used the word "possible" at the beginning. The syntax for using reflect is more cumbersome compared to declaring a Go type using normal Go code, and this is not necessarily the final solution. But it's a good step forward. I have some ideas for how to wrap this same functionality in a nicer API, to be explored later.

I've prototyped this approach in a real codebase where I wanted to perform GraphQL query batching, and it seems to work well. See shurcooL/notifications@9264031.

@osela
Copy link

osela commented Nov 14, 2018

@dmitshur That's a nice approach, and it allows great flexibility. I don't know what is the nicer API you had in mind, but I'm wondering whether the common use case of query batching (rather than fully dynamic queries) doesn't deserve it's own API.

The major drawback I see in this approach (apart from the cumbersome use of reflect) is that it doesn't allow reuse of simpler queries structs.

Ideally, I would like to take a simple struct that I already use for querying a single repo

type q struct {
	Repository struct {
		Description string
	} `graphql:"repository(owner: $owner, name: $name)"`
}

and use it in a batch query, in a way that closely resembles the single query API. Something like

var batch []*q
variables := []map[string]interface{}{
	{
		"owner": "golang",
		"name":  "go",
	},
	{
		"owner": "shurcooL",
		"name":  "githubv4",
	},
}
err := client.BatchQuery(context.Background(), &batch, variables)

The implementation would probably involve some ugly regex work to rename the arguments and the aliases so they are unique, but it's mostly hidden from the user. I'm still thinking about the best way to handle errors though.

What do you think?

@cheshire137
Copy link

Unlike the case above, the struct is generated dynamically, so it's possible to add arbitrary repositories to query at runtime.

Can you have a dynamic alias for each query, though? Your example hard-codes the tag like Tag: graphql:"go: repository(owner: "golang", name: "go")"``, and I notice I get an error if I try to generate a tag:

image

@paultyng
Copy link

@cheshire137 I think in that case you'd need to just cast to reflect.StructTag, ie: reflect.StructTag(fmt.Sprintf(...))

@dmitshur
Copy link
Member

@cheshire137 Yes, that should work if you use the suggestion @paultyng posted above. Please feel free to let me know if there's more to it.

@micimize
Copy link

Another solution might be to add a new different method that lets you pass a GraphQL query as a string (which you still have to construct yourself), and returns results as JSON you have to parse yourself, or already parsed into a map[string]interface{}. But this would be a very different API to make queries, so I see it as a last resort, if no better solution can be found.

This still seems like a decent solution to me, and IMO it is not an API minus to expose the internal "untyped" layer.

The "map with tags as fields" approach is very pretty – looks difficult to make work, but it could replace
helloworld: repository(owner: "octocat", name: "Hello-World") with "helloworld" in the end result.

@rhcarvalho
Copy link

I tried out the idea of using reflect to build dynamic queries from #17 (comment) for mutations.

Documenting the experience here for posteriority & sharing. TL;DR: don't do it.


I wanted to add several issues to a project at once. With the code below the request is well formed, but if the batch is too large the request times out with some mutations applied, some not.

I also noticed an inconsistency in that issues will be associated with a project (as seen when looking at the issue page on GitHub), but then the search API misses that association. It was not a case of eventual consistency, data was still inconsistent after 12+h.

Code
	type AddProjectCard struct {
		CardEdge struct {
			Node struct {
				URL githubv4.URI
			}
		}
	}
	var fields []reflect.StructField
	vars := make(map[string]interface{})
	for i, contentID := range contentIDs {
		fields = append(fields, reflect.StructField{
			Name: fmt.Sprintf("AddProjectCard%d", i),
			Type: reflect.TypeOf(AddProjectCard{}),
			Tag:  reflect.StructTag(fmt.Sprintf(`graphql:"addProjectCard%d:addProjectCard(input:$input%[1]d)"`, i)),
		})
		vars[fmt.Sprintf("input%d", i)] = githubv4.AddProjectCardInput{
			ProjectColumnID: projectColumnID,
			ContentID:       githubv4.NewID(contentID),
		}
	}

	// Work around githubv4.Client.Mutate requiring a variable named "input".
	fields[0].Tag = reflect.StructTag(strings.Replace(string(fields[0].Tag), "$input0", "$input", 1))
	vars["input"] = vars["input0"]
	delete(vars, "input0")

	m := reflect.New(reflect.StructOf(fields)).Elem()
	err := pm.client.Mutate(context.Background(), m.Addr().Interface(), vars["input"], vars)

Looking for documentation around batching mutations, I found https://docs.github.com/en/graphql/overview/resource-limitations which talks about the cost of queries, but not about the cost of mutations. I couldn't find anywhere in the GitHub GraphQL API docs that we should always do only one mutation per request, but that seems to be the safest approach. In https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-abuse-rate-limits, for the REST API, there's a guideline:

If you're making a large number of POST, PATCH, PUT, or DELETE requests for a single user or client ID, wait at least one second between each request.

That's the closest to "take it slow" I could find. So a time.Ticker to limit throughput and a single mutation per request it is.

@patrickdevivo
Copy link

I've been encountering this issue as well (a need for "dynamic" queries where fields are only known at runtime). I recently started this project as a way to help address it, as an alternative to the reflect-based approach. It's still an early project but wanted to share it here in case folks find it useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

9 participants