Query batching and/or dynamic queries. #17
This is a valid use case; I'll need to think about how best to handle it.
One idea I have (that I want to jot down) about how to potentially tackle this is to support map-based queries, where the key is the query string and the value is the type to decode into. So, your query could look something like this:

```go
var q = map[string]interface{}{
	`repository(owner: "octocat", name: "Hello-World")`: struct {
		Description githubql.String
	}{},
	`repository(owner: "justwatchcom", name: "gopass")`: struct {
		ID   githubql.ID
		Name githubql.String
	}{},
	`repository(owner: "prometheus", name: "prometheus")`: struct {
		ID   githubql.ID
		Name githubql.String
	}{},
}
```

However, it needs to be carefully thought out. It introduces mixing query information within types and values, which I've seen work poorly when I attempted it in the past. (Just wanted to write it down so I don't forget. But it's possible another solution will be better.)
Another solution might be to add a different method that lets you pass a GraphQL query as a string (which you still have to construct yourself), and returns results as JSON you have to parse yourself, or already parsed into a generic value.
I wanted to post an update here. I also ran into this need recently. I will continue to think about the best possible resolution to this, and post updates here if I have any. The above just means it'll be slightly easier for me to evaluate an idea, if I get a new one.
There's been yet another place I would've found this handy:

```go
// fetchCommit fetches the specified commit.
func (s *service) fetchCommit(ctx context.Context, repoID int64, sha string) (*commit, error) {
	// TODO: It'd be better to be able to batch and fetch all commits at once (in fetchEvents loop),
	// rather than making an individual query for each.
	// See https://github.com/shurcooL/githubql/issues/17.
	// ...
	err := s.clV4.Query(ctx, &q, variables) // Fetch a single commit.
	// ...
}
```

I'm starting to think that a good way of thinking about this issue might be as "query batching" rather than fully dynamic queries. The idea would be that you provide a single query and an array of different variables, and you get a result for that query for each element in the variables array. This line of thinking might help arrive at a reasonable API that works for most use cases.
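Under the hood, such a batched call would presumably be rewritten into one GraphQL query with a unique top-level alias per element of the variables array (the aliases and repository names below are illustrative only):

```graphql
{
  q0: repository(owner: "golang", name: "go") {
    description
  }
  q1: repository(owner: "shurcooL", name: "githubv4") {
    description
  }
}
```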
@dmitshur Any progress on this? I'd love to help. I'm currently skipping this library entirely in these scenarios and using simple POST requests with string queries I construct myself. It seems to me like many scenarios could be represented as a field of type `map[string]SomeType`. So the question is about making it easy and clear to provide aliases and parameters? Or is it more fundamental?

@osela There's no progress on this issue from me, because I haven't had any free time left over to think about it. If you're looking to solve this, I'd recommend prototyping a solution on your own branch and sharing your updates here.
One of the advantages of using GraphQL is that it's possible to fetch all the required information in a single query. We were making many GraphQL queries, one for each notification, which can become very slow and inefficient when there are many notifications. Ideally, the entire List endpoint would be implemented with a single GraphQL query. However, that's not possible because GitHub GraphQL API v4 still doesn't offer access to notifications the way GitHub API v3 does. So, we do the best we can for now, and batch all GraphQL queries into a single query:

- Use top-level aliases to combine multiple queries into one.
- Use reflect.StructOf to construct the query struct type at runtime.

This is functional, although perhaps there are opportunities to make it more user friendly in the graphql/githubv4 libraries. That will be investigated in the future. The performance of the List endpoint when listing 145 GitHub notifications improves from ~15 seconds to ~3 seconds after this change. Updates shurcooL/githubv4#17.
I've made significant progress on this issue this weekend. It turns out it has been possible to perform query batching and/or dynamic queries all along, without any API changes to this package. Read on for details.

Consider the following GraphQL query to fetch multiple GitHub repositories:

```graphql
{
  go: repository(owner: "golang", name: "go") {
    nameWithOwner
    createdAt
    description
  }
  graphql: repository(owner: "shurcooL", name: "githubv4") {
    nameWithOwner
    createdAt
    description
  }
}
```

If executed against GitHub GraphQL API v4, it returns a JSON response with one result per alias.
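That JSON response (elided in the extraction above; reconstructed here from the expected output shown later in this comment) would look like:

```json
{
  "data": {
    "go": {
      "nameWithOwner": "golang/go",
      "createdAt": "2014-08-19T04:33:40Z",
      "description": "The Go programming language"
    },
    "graphql": {
      "nameWithOwner": "shurcooL/githubv4",
      "createdAt": "2017-05-27T05:05:31Z",
      "description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
    }
  }
}
```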
It's possible to perform that exact query using this package, like so:

```go
var q struct {
	Go struct {
		NameWithOwner string
		CreatedAt     time.Time
		Description   string
	} `graphql:"go: repository(owner: \"golang\", name: \"go\")"`
	GitHubV4 struct {
		NameWithOwner string
		CreatedAt     time.Time
		Description   string
	} `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"`
}
err := client.Query(context.Background(), &q, nil)
if err != nil {
	return err
}
enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", "\t")
enc.Encode(q)
// Output:
// {
// 	"Go": {
// 		"NameWithOwner": "golang/go",
// 		"CreatedAt": "2014-08-19T04:33:40Z",
// 		"Description": "The Go programming language"
// 	},
// 	"GitHubV4": {
// 		"NameWithOwner": "shurcooL/githubv4",
// 		"CreatedAt": "2017-05-27T05:05:31Z",
// 		"Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
// 	}
// }
```

Of course, the list of repositories can only be adjusted at compile time, since it's a part of the query struct type. However, I got an idea: it's possible to use reflect.StructOf to construct that same query struct type at runtime:

```go
q := reflect.New(reflect.StructOf([]reflect.StructField{
	{
		Name: "Go", Type: reflect.TypeOf(struct {
			NameWithOwner string
			CreatedAt     time.Time
			Description   string
		}{}), Tag: `graphql:"go: repository(owner: \"golang\", name: \"go\")"`,
	},
	{
		Name: "GitHubV4", Type: reflect.TypeOf(struct {
			NameWithOwner string
			CreatedAt     time.Time
			Description   string
		}{}), Tag: `graphql:"graphql: repository(owner: \"shurcooL\", name: \"githubv4\")"`,
	},
})).Elem()
err := client.Query(context.Background(), q.Addr().Interface(), nil)
if err != nil {
	return err
}
enc := json.NewEncoder(os.Stdout)
enc.SetIndent("", "\t")
enc.Encode(q.Interface())
// Output:
// {
// 	"Go": {
// 		"NameWithOwner": "golang/go",
// 		"CreatedAt": "2014-08-19T04:33:40Z",
// 		"Description": "The Go programming language"
// 	},
// 	"GitHubV4": {
// 		"NameWithOwner": "shurcooL/githubv4",
// 		"CreatedAt": "2017-05-27T05:05:31Z",
// 		"Description": "Package githubv4 is a client library for accessing GitHub GraphQL API v4 (https://developer.github.com/v4/)."
// 	}
// }
```

As you can see, it works, and produces the same results. Unlike the case above, the struct is generated dynamically, so it's possible to add arbitrary repositories to query at runtime.

It's important to note I used the word "possible" at the beginning. The syntax for using reflect.StructOf this way isn't the most user-friendly, so there may be room for a nicer API. I've prototyped this approach in a real codebase where I wanted to perform GraphQL query batching, and it seems to work well. See shurcooL/notifications@9264031.
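The approach above can be generalized to an arbitrary repository list known only at runtime. Below is a minimal sketch under stated assumptions: `batchQueryType`, `repo`, and `repoFields` are hypothetical names, and the alias/tag format follows the examples earlier in this thread.

```go
package main

import (
	"fmt"
	"reflect"
)

// repo identifies one repository to include in the batched query.
type repo struct{ Owner, Name string }

// repoFields is the inner selection requested for each repository.
type repoFields struct {
	NameWithOwner string
	Description   string
}

// batchQueryType builds a query struct type with one aliased
// repository(...) field per requested repository.
func batchQueryType(repos []repo) reflect.Type {
	var fields []reflect.StructField
	for i, r := range repos {
		fields = append(fields, reflect.StructField{
			// Field names must be exported for reflect.StructOf.
			Name: fmt.Sprintf("Repo%d", i),
			Type: reflect.TypeOf(repoFields{}),
			Tag:  reflect.StructTag(fmt.Sprintf(`graphql:"repo%d: repository(owner: \"%s\", name: \"%s\")"`, i, r.Owner, r.Name)),
		})
	}
	return reflect.StructOf(fields)
}
```

A value of this type can then be instantiated with `reflect.New(batchQueryType(repos)).Elem()` and passed to `client.Query` via `.Addr().Interface()`, exactly as in the example above.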
@dmitshur That's a nice approach, and it allows great flexibility. I don't know what the nicer API you had in mind is, but I'm wondering whether the common use case of query batching (rather than fully dynamic queries) doesn't deserve its own API. The major drawback I see in this approach is the cumbersome use of reflect. Ideally, I would like to take a simple struct that I already use for querying a single repo:

```go
type q struct {
	Repository struct {
		Description string
	} `graphql:"repository(owner: $owner, name: $name)"`
}
```

and use it in a batch query, in a way that closely resembles the single-query API. Something like:

```go
var batch []*q
variables := []map[string]interface{}{
	{
		"owner": "golang",
		"name":  "go",
	},
	{
		"owner": "shurcooL",
		"name":  "githubv4",
	},
}
err := client.BatchQuery(context.Background(), &batch, variables)
```

The implementation would probably involve some ugly regex work to rename the arguments and the aliases so they are unique, but it's mostly hidden from the user. I'm still thinking about the best way to handle errors, though. What do you think?
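The rewriting step that a hypothetical `BatchQuery` would need (neither `BatchQuery` nor `uniquify` exists in this package) can be sketched as follows: give each repetition of the tag a unique alias, and suffix every `$variable` with the element's index so the variable names don't collide.

```go
package main

import (
	"fmt"
	"regexp"
)

// varRe matches GraphQL variable references like $owner.
var varRe = regexp.MustCompile(`\$[A-Za-z0-9_]+`)

// uniquify rewrites a graphql struct tag so it can be repeated within a
// single batched query: it prepends a unique alias and suffixes every
// variable name with the element's index.
func uniquify(tag string, i int) string {
	renamed := varRe.ReplaceAllStringFunc(tag, func(v string) string {
		return fmt.Sprintf("%s%d", v, i)
	})
	return fmt.Sprintf("q%d: %s", i, renamed)
}
```

For example, element 1 of the batch would query `q1: repository(owner: $owner1, name: $name1)`, with the variables map for that element renamed to match.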
Can you have a dynamic alias for each query, though? Your example hard-codes the tag.
@cheshire137 I think in that case you'd need to just cast to `reflect.StructTag` after building the tag string dynamically.
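A minimal sketch of that cast (`dynamicTag` is a hypothetical helper; the tag format matches the examples earlier in this thread):

```go
package main

import (
	"fmt"
	"reflect"
)

// dynamicTag builds a graphql struct tag with a runtime-chosen alias,
// owner, and name, by formatting a string and casting it to StructTag.
func dynamicTag(alias, owner, name string) reflect.StructTag {
	return reflect.StructTag(fmt.Sprintf(
		`graphql:"%s: repository(owner: \"%s\", name: \"%s\")"`,
		alias, owner, name))
}
```

The resulting value can be assigned directly to the `Tag` field of a `reflect.StructField` passed to `reflect.StructOf`.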
@cheshire137 Yes, that should work if you use the suggestion @paultyng posted above. Please feel free to let me know if there's more to it.
This still seems like a decent solution to me, and IMO it is not an API minus to expose the internal "untyped" layer. The "map with tags as fields" approach is very pretty, though it looks difficult to make work.
I tried out the idea above for batching mutations. Documenting the experience here for posterity and sharing. TL;DR: don't do it.

I wanted to add several issues to a project at once. With the code below, the request is well formed, but if the batch is too large, the request times out with some mutations applied and some not. I also noticed an inconsistency: issues will be associated with a project (as seen when looking at the issue page on GitHub), but the search API misses that association. It was not a case of eventual consistency; data was still inconsistent after 12+ hours.

Code:

```go
type AddProjectCard struct {
	CardEdge struct {
		Node struct {
			URL githubv4.URI
		}
	}
}

var fields []reflect.StructField
vars := make(map[string]interface{})
for i, contentID := range contentIDs {
	fields = append(fields, reflect.StructField{
		Name: fmt.Sprintf("AddProjectCard%d", i),
		Type: reflect.TypeOf(AddProjectCard{}),
		Tag:  reflect.StructTag(fmt.Sprintf(`graphql:"addProjectCard%d:addProjectCard(input:$input%[1]d)"`, i)),
	})
	vars[fmt.Sprintf("input%d", i)] = githubv4.AddProjectCardInput{
		ProjectColumnID: projectColumnID,
		ContentID:       githubv4.NewID(contentID),
	}
}
// Work around githubv4.Client.Mutate requiring a variable named "input".
fields[0].Tag = reflect.StructTag(strings.Replace(string(fields[0].Tag), "$input0", "$input", 1))
vars["input"] = vars["input0"]
delete(vars, "input0")
m := reflect.New(reflect.StructOf(fields)).Elem()
err := pm.client.Mutate(context.Background(), m.Addr().Interface(), vars["input"], vars)
```

Looking for documentation around batching mutations, I found https://docs.github.com/en/graphql/overview/resource-limitations, which talks about the cost of queries, but not about the cost of mutations. I couldn't find anywhere in the GitHub GraphQL API docs saying that we should always do only one mutation per request, but that seems to be the safest approach. In https://docs.github.com/en/rest/guides/best-practices-for-integrators#dealing-with-abuse-rate-limits, for the REST API, there's a guideline:
That's the closest to "take it slow" I could find.
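Given the findings above, a conservative pattern is one mutation per request with a pause between requests, rather than batching mutations into a single query. A minimal sketch (`runSequentially` is a hypothetical helper, not part of this package):

```go
package main

// runSequentially applies one mutation per request, invoking pause
// between requests, instead of batching mutations into one query.
// pause might be, e.g., func() { time.Sleep(time.Second) }.
func runSequentially(mutations []func() error, pause func()) error {
	for i, m := range mutations {
		if err := m(); err != nil {
			return err // Stop at the first failed mutation.
		}
		if i < len(mutations)-1 {
			pause()
		}
	}
	return nil
}
```

This trades latency for predictability: each mutation either succeeds or fails individually, avoiding the partially applied batches described above.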
I've been encountering this issue as well (a need for "dynamic" queries where fields are only known at runtime). I recently started this project as a way to help address it, as an alternative to the reflect-based approach. It's still an early project, but I wanted to share it here in case folks find it useful.
I want to query multiple repositories at the same time. But I don't want to write the query for a specific number of repositories; rather, I want to create the query at runtime. I currently don't see a way to do that. As a workaround I'm doing these queries one after another, but since fetching multiple resources at once is one of the benefits of GraphQL, we should try to support this.