
Limit introspection capability programmatically #113

Open
KyleAMathews opened this issue Aug 8, 2015 · 43 comments

@KyleAMathews

commented Aug 8, 2015

Many of my types and many fields on some of my types shouldn't be exposed to some subsets of my users. Is there a way to limit visibility on introspection programmatically?

@rdooh


commented Aug 8, 2015

I have the same general question: how should one introduce authorization (different fields available to different users/roles)? I admit I haven't gone through the spec with this in mind, though, so I don't know if there's some consideration there. If you have just a few well-defined user-role topologies, then I could imagine actually having multiple type systems and swapping based on who is making the request. Feels awkward and hacky though. Probably one would need to extend the schema to have some sort of authorization function.

@rmosolgo


commented Aug 9, 2015

Here's my understanding of it: http://rmosolgo.github.io/blog/2015/08/04/authorization-in-graphql/

That's tough for introspection though, since it's mostly built-in!

@devknoll


commented Aug 9, 2015

@rmosolgo's approach is how I am securing particular queries/mutations/fields. In our case, we don't care that we might expose some authorized-user-only fields or mutations via introspection. Even if you can see it, you can't run it, so it doesn't really matter.

If I did want to hide that information, I would probably just wrap the object/interface constructors, add some metadata to each field, and generate multiple schemas per user or per authentication level as necessary. I'm not really convinced this is an issue that the library or spec should try to solve, though.

That being said, I wouldn't be opposed to some hooks in the GraphQLSchema to make something like this a little easier, like being able to preprocess types, instead of wrapping them manually.
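For what it's worth, here's a rough sketch of what wrapping constructors and tagging fields could look like. All names (`restricted`, `visibleFields`, `allowedRoles`) are made up for illustration and are not part of graphql-js:

```javascript
// Illustrative sketch only: tag plain field-config objects with the roles
// allowed to see them, then derive a per-role field map before handing the
// result to a GraphQLObjectType.
function restricted(roles, fieldConfig) {
  return Object.assign({}, fieldConfig, { allowedRoles: roles });
}

function visibleFields(fields, role) {
  const out = {};
  for (const name of Object.keys(fields)) {
    const config = fields[name];
    // A field with no allowedRoles tag is visible to everyone.
    if (!config.allowedRoles || config.allowedRoles.indexOf(role) !== -1) {
      out[name] = config;
    }
  }
  return out;
}

// Example field map, shaped like the `fields` option of a GraphQLObjectType.
const userFields = {
  name: { type: 'String' },
  email: restricted(['admin'], { type: 'String' }),
};

console.log(Object.keys(visibleFields(userFields, 'admin')));  // ['name', 'email']
console.log(Object.keys(visibleFields(userFields, 'public'))); // ['name']
```

A preprocessing hook on GraphQLSchema could then call something like `visibleFields` once per authentication level and cache each resulting schema.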

@rdooh


commented Aug 9, 2015

Point taken re: scope of library/spec. Might try dynamically generating the type schema.

@KyleAMathews

Author

commented Aug 9, 2015

I don't see why this wouldn't be something the library handles. Limiting introspection will be a very common need. Also, I'll be using scopes for authentication, so I could have literally hundreds of thousands of possible schema combinations.

I really like how Hapi.js handles this. You can set a string or array of strings of scopes on routes that are matched against the user object. Very easy to set up and maintain.
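For anyone unfamiliar, the Hapi-style check is roughly this. This is a sketch under assumed names (`scopeAllows`, the scope strings), not Hapi's actual implementation:

```javascript
// Sketch of Hapi-style scope matching applied to a GraphQL field: the field
// declares a required scope (string or array of strings), and the request
// passes if the authenticated user holds any one of them.
function scopeAllows(fieldScope, userScopes) {
  if (!fieldScope) return true; // unscoped fields are open to everyone
  const required = Array.isArray(fieldScope) ? fieldScope : [fieldScope];
  return required.some(function (scope) {
    return userScopes.indexOf(scope) !== -1;
  });
}

console.log(scopeAllows('analytics:read', ['analytics:read'])); // true
console.log(scopeAllows(['admin', 'ops'], ['user']));           // false
```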

@rdooh


commented Aug 9, 2015

@KyleAMathews No argument that this is a feature that a full server would need to implement (and that I would love to have in my hands right now). My rough understanding of this project though is to keep this as agnostic as possible, focusing on core GraphQL concepts while letting the community figure out some different ways to implement. The approach you suggest sounds very reasonable, but I'm thinking that deciding on a single methodology for handling authentication at the core might be quite the can of worms. Encouraging growth of an ecosystem without locking in too much is the stage we're at, I think.

Anyway, as to the scale of possible schema combinations you indicate, and assuming there's currently no baked-in solution on the near horizon, do you think that full introspection of dynamically-generated type schemas could work in lieu of dynamically-limited introspection of a full type schema? I mean, if you could configure it similarly to Hapi routes. There's probably a good opportunity here to write some preprocessors...

@KyleAMathews

Author

commented Aug 10, 2015

I'd dismissed the idea of dynamically generated schemas, as my gut said that'd be expensive to do on every query, but after thinking about it a bit more, it should actually work just fine—by far most of the work is in actually executing the query. Stringing together a few objects to create a schema should be very quick.

I noticed your repo @rdooh https://github.com/rdooh/graphql-gateway where you're working on something like this. Will keep an eye on it :)

This isn't something I need to work on right now—just using GraphQL for internal purposes—but will need to move on something in a month or two.

Perhaps I could just add the scope array onto types/fields as I suggested and the preprocessor would take a schema + scope(s) and then "prune" off types and fields that aren't supported. Would that work @leebyron?

@rmosolgo


commented Aug 10, 2015

Another possibility could be serving different schemas to different people, eg

var schema = user.isAdmin ? fullSchema : limitedSchema 

Depends on how fine-grained the permission levels are though, that would become ridiculous pretty fast!

@tristanz


commented Aug 10, 2015

I believe FB just returns null for fields that you do not have permission to access. Having multiple type systems with fine-grained permissions will make code generation, Relay, type checking, etc., from GraphQL schemas much harder.

@KyleAMathews why do you need different schemas for 100s of thousands of possible schema combinations? Is it really bad for users to see the full schema?

If you need secret admin only fields for internal use, I'd suggest exposing a different endpoint along the lines of what @rmosolgo suggests. But this seems like a much more limited use case.

@KyleAMathews

Author

commented Aug 10, 2015

@tristanz I'll be letting people define custom access tokens like

so the number of combinations grows very fast.

It might not be that bad to expose the full schema in some cases, but beyond general cleanliness, I don't want to have to think, every time I add a new field, about whether it potentially exposes sensitive information to customers or to the public at large (some types will be public).

@tristanz


commented Aug 10, 2015

Right, but this shouldn't be a schema concern. For code generation, Relay, and type checking to work, the schema can't rely on runtime attributes generating new schemas. The schema should just allow nulls. You can then have a middleware layer that uses these tokens to null specific fields and/or short-circuit the resolve logic.

Something like:

resolve: hasPermissions(['user_groups'], resolveFunc)

where hasPermissions returns a resolve function that nulls or short-circuits your real resolveFunc.
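Spelled out, hasPermissions might look something like this. The argument order follows graphql-js resolve functions, but the helper itself and the `context.permissions` shape are assumptions for illustration:

```javascript
// Sketch of the hasPermissions wrapper: returns a resolve function that
// yields null unless the requesting user (carried on context) holds every
// required permission; otherwise it delegates to the real resolver.
function hasPermissions(required, resolveFunc) {
  return function (source, args, context, info) {
    const granted = (context && context.permissions) || [];
    const allowed = required.every(function (permission) {
      return granted.indexOf(permission) !== -1;
    });
    if (!allowed) return null; // short-circuit: the field just reads as null
    return resolveFunc(source, args, context, info);
  };
}

const resolveGroups = hasPermissions(['user_groups'], function () {
  return ['admins', 'editors'];
});
console.log(resolveGroups({}, {}, { permissions: ['user_groups'] })); // ['admins', 'editors']
console.log(resolveGroups({}, {}, { permissions: [] }));              // null
```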

@rdooh


commented Aug 10, 2015

@tristanz, probably there are many cases where allowing users to see the full schema isn't a problem, even if their access will be limited to a subset upon making requests for queries/mutations. That said, sometimes projects have security requirements that are dictated by other stakeholders, and this would reduce the number of battles to fight, even if there is technically no security hole. It's probably not a core concern for this library, but I think there will be use cases for limiting introspection.

(Understand that I'm speculating here, based on imagining trying to rebuild some past projects.)

Dynamic Anyway???
If we ignore the introspection question for a moment and just think about implementing authorization in general, it seems to me there are roughly two cases: 1) content filtering (e.g. a comment list based on owner) and 2) structural filtering of terminal leaf-fields or even whole branches of the schema tree. Both could be handled by arbitrary code inside the resolve function; in fact, the first one probably has to be handled there. But the second one could be handled up front by pruning the full schema tree(s) down to some subspace. There would be some cost to this of course, but I think that a filter function could operate pretty quickly. In some cases things might balance out anyway: an over-reaching query/mutation request against the pruned schema would be rejected outright instead of running through the various (valid) field resolvers and being rejected at some particular point. One can imagine that if you have a request with a lot of sub-actions that can run in parallel, you might want to avoid kicking them all off if you can catch the problem earlier, particularly if there is the potential of having to roll back side effects of mutations.

Anyway, I'm starting to think that dynamic schema generation might make sense in some cases anyhow (provided the cost is low) - in which case the limited introspection question becomes a convenient side effect. Balance of pros/cons will depend on the application of course.

Note
Another potential (small) win for limiting introspection by pruning might be that you can automate testing required data structures for a particular user class against the schema - monitoring whether proposed changes to permissions are going to impact particular use cases.

@rdooh


commented Aug 10, 2015

Might be going out of scope here - maybe limiting introspection etc. is outside of graphql-js's core mandate, but it's a problem worth addressing elsewhere...

@KyleAMathews

Author

commented Aug 10, 2015

@tristanz here's a concrete example of why securing introspection is needed.

Say you're releasing a top-secret new feature. You need to expose new fields and types to support product dev and live testing by a few beta customers. If you can't limit introspection then an aspiring reporter or one of your competitors could easily poke around your GraphQL instance and see what's happening.

Security for data obviously matters a ton, but what we're saying is that security for metadata sometimes matters just as much.

@KyleAMathews

Author

commented Aug 10, 2015

@tristanz btw, sense.io looks awesome :)

@leebyron

Collaborator

commented Aug 10, 2015

This is an interesting discussion, thanks for raising this!

At Facebook our schema is static. More specifically we have two schema: an internal and an external schema. The internal schema includes prototype features we don't want to leak as well as site admin tools that we would rather not expose to our mobile clients.

A static schema is important as almost all of our client side tools assume the same schema at development time and runtime. That means your iOS or Relay engineer should see the same schema your users see. This gives us the ability to do code generation for model objects and fast parsers, and validate that queries written on the client are valid before the app is even run, ensuring quality.

@leebyron

Collaborator

commented Aug 10, 2015

Facebook is full of access control mechanisms though. There are user-level permissions (like whether you can admin a group), there are per-content permissions (like privacy rules), and there are site-wide limits (like access to new features that roll out slowly).

For these, we ensure fields are nullable and return null when access to certain data is not allowed for whatever reason.

This is also nice as it does not differentiate between "there is no data" and "you cannot see this data" - which itself could be a security hole.

@tristanz


commented Aug 10, 2015

@KyleAMathews Absolutely. In that case I think the solution is multiple endpoints. Have a flag for your new secret release and return a different schema if that flag is present. This should not be the recommended approach for general authorization issues though, since it breaks higher level tools that need to point at a single schema.

re Sense: thanks :)

@leebyron

Collaborator

commented Aug 10, 2015

One way we might be able to resolve the core concerns here is to enable or disable introspection on a per-request basis.

The audience for introspection is dev tool builders and users. If you want to limit who can see this metadata to only your developers, that seems like a reasonable thing to want to do.
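As a sketch of what a per-request toggle could look like: the reserved `__schema` and `__type` root fields are where introspection enters a query, so a gate can key off those. The `user.isEmployee` flag is an assumed shape, and a production version would use an AST-level validation rule rather than this naive regex:

```javascript
// Naive illustration of per-request introspection gating: reject queries
// that touch the reserved __schema/__type meta-fields unless the requester
// is privileged.
function introspectionAllowed(user) {
  return Boolean(user && user.isEmployee); // assumption: employees only
}

function guardQuery(query, user) {
  if (!introspectionAllowed(user) && /__schema\b|__type\b/.test(query)) {
    throw new Error('Introspection is not authorized for this request');
  }
  return query;
}
```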

@rdooh


commented Aug 10, 2015

Thanks @leebyron and @tristanz for the food for thought.

What I'm hearing then about the vision for how a schema grows is that this is usually an append-only situation, where some branches and leaves may be dead/null for some or all users under varying conditions. Depending on the underlying reason, there may be metadata provided about context (e.g. deprecation), or not (ambiguity re: unauthorized vs not available). I'm still trying to figure out if, for the second case, complete omission by user class/context will play nicely.

@leebyron I see the nullable point re: empty vs unauthorized data... wouldn't "I don't/won't even recognize your request" be even more ambiguous in some respects?

I can also see that receiving a partial result (populated with strategic nulls) would be a more graceful way to handle run-time requests that overreach authorizations, rather than all-out rejection by a pruned schema. On the other hand, introspection of a pruned projection of the schema for a particular user class during dev seems like a workable way to detect overreaching queries early: an introspection check against both the global schema and a particular projection would identify whether a given broken query is globally invalid, or just invalid for the current user context. I don't use React (yet), so I'm probably missing a good chunk of the mental framework here, but if I wanted to write a client that can shape itself according to a user's context, I might want to do some of this based on the shape of the available schema projection (as I'm apparently calling it) rather than a fully shaped data response with strategic but potentially ambiguous nulls.

Maybe what I'm trying to understand is whether you just consider dynamic schema generation impractical given your product and your development and iteration models, or if you also consider the impact on security to be either null or negative.

I'm probably overthinking it.

@leebyron

Collaborator

commented Aug 11, 2015

wouldn't "I don't/won't even recognize your request" be even more ambiguous in some respects?

Yeah, this is why we shy away from this strategy. Since GraphQL often requests lots of fields across many types, we don't want a permissions issue for any particular end-user to result in the whole query failing to validate and thus not executing at all.

On the other hand, introspection of a pruned projection of the schema for a particular user class during dev seems like a workable way to detect overreaching queries early... an introspection check against both the global schema and a particular projection would identify whether a given broken query is globally invalid, or just invalid for the current user context.

This can become a huge burden if your permissions system becomes at all complex. You have a 2^N problem: each additional permission doubles the number of schema variants you have to validate against. On large codebases this could quickly enter the realm of the untenable.

Another source of permissions complexity which causes this approach to fall short is conditional permission. So it's not that a certain type of user cannot access a particular field at all, but they cannot access that particular field only under certain circumstances - like if that object has some flag set or not. Then the field-level permission is too coarse to apply and you again fall back to data-level permissions.

Maybe what I'm trying to understand is whether you just consider dynamic schema generation impractical given your product and your development and iteration models, or if you also consider the impact on security to be either null or negative.

Both of these things.

Dynamic schema generation causes us to either super-exponentially increase the time required to do static build-time query validation, or removes the ability to do so at all, and dramatically reduces the value of tools which rely on a static schema like a developer IDE environment that can provide smart typeahead and inline error detection.

For security, I think the impact is negative as field-level permissions are less powerful than data-level permissions. In fact data-level permissions can accomplish everything that a field-level permission can do. For instance, in the resolve function, you could just insert:

  if (userFailsCheck(context.user)) {
    return null;
  }

Which should be operationally the same as that field being dynamically omitted from the schema.
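The conditional-permission case described above can only live at the data level, since the check depends on the object being resolved, not just on who is asking. A hypothetical example (the `salary`/`managesTeam`/`teamId` names are invented for illustration):

```javascript
// Hypothetical data-level check: whether `salary` is visible depends on the
// employee object being resolved (its team) as well as the requesting user,
// so it has to happen inside resolve rather than in the schema's shape.
function resolveSalary(employee, args, context) {
  const user = context && context.user;
  if (!user || user.managesTeam !== employee.teamId) {
    return null; // reads the same as "there is no data"
  }
  return employee.salary;
}

console.log(resolveSalary({ teamId: 7, salary: 90000 }, {}, { user: { managesTeam: 7 } })); // 90000
console.log(resolveSalary({ teamId: 7, salary: 90000 }, {}, { user: { managesTeam: 3 } })); // null
```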

Now - one point brought up here that I think is worth addressing is to protect against someone using introspection to crawl your application's schema and learning about features not yet exposed in your public product.

I think it is compelling to dynamically enable or disable introspection on a per-request basis, so that you could use request authentication information to determine this. So, for example, you could only allow introspection by admins of your application (e.g. your fellow employees), as these are the people you expect to actually derive value from this feature.

@rdooh


commented Aug 12, 2015

Thanks for taking the time to pull me out of the rabbit hole @leebyron. In addition to your many practical points, dynamic generation would also wind up duplicating some existing permissions work for underlying resources (if considering retrofitting existing brownfield projects), introducing yet another maintenance burden/minefield. I think there's some parallel unresolved discussion happening around Swagger.

In any case, I'm a re-convert to the single static schema, deferring data security concerns to resolve functions or underlying resources. Having the option to effectively make introspection developers-only and prevent snooping is simple and to the point.

@KyleAMathews

Author

commented Aug 12, 2015

Enabling/disabling introspection seems like a pragmatic next-step.

The scenario where I see this not working is if you want to open up your GraphQL endpoint for users to build on, similar to how many applications expose/document their REST API. In that case, less-privileged users would need to be able to introspect but still not see experimental/beta GraphQL "APIs".

@rdooh


commented Aug 12, 2015

I guess it depends on whether this is framed as a question of differing internal vs external needs, or different classes of external needs. If you're supporting public introspection, but need your private/internal developer one to be different/enhanced, then I think you'd handle it as a work-around by maintaining two static schemas, as FB apparently does (maintainable as a secondary branch perhaps?). Assuming it is just two-tiered, and that both need to be live/in production, then I suppose you're also looking at two endpoints, as suggested earlier. This is workable for my own foreseeable needs, but if you start needing multiple levels on the public side, then it could get crazy again.

The point was made that ensuring (or assuming) that developers are seeing the same schema as end users keeps things from getting overly complicated, enables better tooling, etc. But given that FB does in fact maintain two versions, I'm wondering how prototype tracking and release from internal to external would best be managed. Having some sort of 'prototype' meta on an internal version might have benefits similar to 'deprecated' meta, allowing tooling to help identify things that are valid internally but not externally. Here, I'm imagining that this could be a dynamic control on the schema that could even be imposed by environment settings, rather than authorization/security rules in the business domain.

So, maybe supporting this additional level of resolution between developer and public schemas could be entertained without getting into application-level security concerns.

@leebyron

Collaborator

commented Aug 12, 2015

The scenario where I see this not working is if you want to open up your GraphQL endpoint for users to build on similar to how many applications expose/document their REST api. In that case, less-privileged users would need to be able introspect but still not see experimental/beta GraphQL "APIs".

Most REST APIs that I've encountered use data-level permissions rather than field-level permissions. That is to say, when I visit a REST API's documentation (the rough equivalent of introspection), it lists out all of the available resources and parameters, but usually documents which of these require a special authentication permission in order to access.

The translation of this to an equivalent GraphQL API would be an introspection (data-driven documentation) which presents all capabilities, but also describes which require special authentication permissions in order to access. This allows you to still have a single static schema, enabling usage of client tools.
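One way to encode that in a single static schema might be to fold the permission requirement into each field's description, so introspection-driven documentation advertises the rule. `requiresScope` is a made-up helper for illustration, not a graphql-js API:

```javascript
// Made-up helper: annotate a field config's description with the permission
// it requires, so docs generated from introspection describe the access rule
// while the schema itself stays static.
function requiresScope(scope, fieldConfig) {
  const note = 'Requires the `' + scope + '` permission.';
  return Object.assign({}, fieldConfig, {
    description: fieldConfig.description
      ? fieldConfig.description + ' ' + note
      : note,
  });
}

const analyticsField = requiresScope('analytics:read', {
  type: 'AnalyticsReport',
  description: 'Daily usage report.',
});
console.log(analyticsField.description);
// 'Daily usage report. Requires the `analytics:read` permission.'
```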


If you find yourself truly in the circumstance where you are building a GraphQL server not for internal product usage, but exposed as a public API surface - and also need to actually limit visibility of certain portions of that schema based on access permissions, then I think the most tenable option you have is to develop multiple schema, each of which is used in the correct case.

I would advise against this if at all possible though, since each schema needs test coverage, and by developing multiple schema or some type of dynamic schema, you're dramatically expanding the area of your software that requires testing.

I believe that this is going to be a pretty rare case. GraphQL is really designed for end-to-end usage where you control both the server and the client - that is, it's designed for internal use for building products. GraphQL is also pretty well suited for external usage as a public-facing API, despite our not having invested much time exploring that at Facebook. However, most public-facing APIs do not contain secret features in the same way that internal products do, or if they do, they typically have a much lower resolution - such as Facebook's two schema: the schema that our iOS/Android apps are exposed to, and the schema that we can access from our internal network to perform administration tasks.

@KyleAMathews

Author

commented Aug 12, 2015

Great stuff @leebyron!

I think two schemas, public and internal, should cover everything I'm thinking, actually, as there are really only two high-level permission levels for our systems: internal and user. Customers will be able to create tokens with restricted access abilities, e.g. if they wanted to create a read-only token for pulling analytics data, but that would always be for programmatic usage. And they'd do it in the context of some sort of API browsing tool like Facebook's API explorer https://developers.facebook.com/tools/explorer so the user would of course be able to introspect everything, but the script/app wouldn't need to.

Also for new APIs/types/fields, using code names along with clearly designating them as restricted experimental APIs should be enough.

Great discussion! Thanks for everyone's input.

@smolinari


commented Oct 26, 2015

Great discussion.

Two questions.

From what I read, it was or is deemed untenable to have field-level authorization within a GraphQL type system. For the system we want to build, that would be a total no-go. Is this for sure? Nulling inaccessible fields is fine as an answer for "you can't see this". The fact the field is there shouldn't really be a problem (but only for features that are released, of course). To me, the whole two-tier system Facebook follows is the old tried-and-true "develop and master (or production)" branch scheme used in a lot of development workflows. It is cool when the API can easily follow this development scheme too. Still, we really need field-level access. If nulling is OK, would it be tenable?

And going a level higher, it is (or was) also deemed untenable to have a dynamic schema, though I am not sure what is meant by dynamic schema. Because, since GraphQL is a standard, it should be possible to build types through some sort of code-building automation, shouldn't it? I have a hard time imagining many devs hand-coding type definitions in a larger system, especially where certain rules or policies need to be abided by (like user access, edit: or Facebook's "tracked query"). What if a dev forgets to add the necessary code for a certain business rule or policy? Or does that kind of business logic have to be somewhere else, like in another layer behind the type system?

Or am I completely out in left field with my thinking?

Scott

@rdooh


commented Oct 27, 2015

Hi @smolinari - here's my definitely unauthoritative take..

Data-level vs Field-level

I think of GraphQL as a way to present underlying data in a graph-like manner regardless of how the underlying resources are actually persisted, etc. A GraphQL schema focuses on describing how they relate to one another - how they are connected conceptually. All of the responsibility for authorization, be it per type, or per field within a type, is delegated to the resolver functions and whatever lies behind them... so you could filter out/nullify restricted field data right in the resolver, or (better) pass authorization info on to the underlying service/data store and let it make final judgements about what information to reveal vs nullify. So you retain full field-level control of what data you return - even though the full schema is publicly available. @leebyron's last comment starts off by pointing out that this is essentially what you see in most REST APIs. The GraphQL schema is all about high-level organization and presentation of your data structures (types and fields), while deferring implementation details, including which data to return, to your custom code and underlying services... Sounds like this meets your needs.

Anyway, if your underlying services were secure enough before, they should continue to be so after putting GraphQL in front of them. If you're building from scratch, then you should continue to think about handling your business rules and policies as close to the data as possible.

Shaping Introspection?

Regarding a dynamic schema, my own original notion was that I didn't want to expose any more fields to a given request than needed - I wanted to do field-level authorization for presentation of the schema itself... so if 'address' wasn't available to a given user, then any introspection they tried would not reveal 'address' as a possible field. I wanted to limit people snooping around... that's where I started from. While I generally think that revealing as little information as possible is ideal, perhaps the optimal version is to 'reveal as little as is practical', given that you're also trying to support client-side developers. Trying to shape what parts of the schema are visible for a given user at a given point in time (as I was contemplating) appears to be the main ingredient in a recipe for over-engineering and probably much poorer performance. There's some discussion about strategies for when you really do need to manage a schema that varies, but I'm feeling convinced that it's not a problem that I desperately need solved.

Automation of Schema-Building

Given my understanding outlined above, I'm not sure I'd be too concerned about automation myself as I keep the logic in the schema to the barest minimum. My resolvers are little more than unsophisticated calls to existing services. I do it this way precisely because I don't want to be opening up the hood on the schema for every little bug fix or modification of business logic. As I think you allude to, it complicates the picture for devs by potentially requiring them to maintain complementary changes at both the schema level, and the underlying services - over-coupling, in other words. I think that in the 'ideal' GraphQL use case, you are only modifying the schema when you add new services/types or fields, or occasionally deprecating something... no business logic to speak of.

In this scenario, most of the code in your schema is just configuration, detailing how things are related - stuff that is largely outside of the scope of the individual underlying services (at least explicitly). I haven't thought about it much, but I suspect most automation would more or less just be some form of syntactic sugar - like Jade to HTML or CoffeeScript to JS, etc. - somewhere, you're still going to need someone to explicitly define how type X is related to type Y. This is in no way meant to suggest there isn't value here - for example, https://github.com/graphql/graphql-relay-js gives you library functions to help wire things up to be relay-ready.

Hope that helps - all my GraphQL experience so far is pretty light proof-of-concept trials right now, so I might not appreciate the problem!

@smolinari


commented Oct 27, 2015

Thanks, rdooh. I think we are on the same wavelength for the most part, and I appreciate your post above a lot.

What I am working toward is a system where the dev and even the "business ops" can create any objects they need for an application with point-and-click functionality. This work, in turn, would also entail creating the relationships you mentioned, and with that metadata we could automate the building of the types.

Don't want to take this too off-topic. So, if anyone would like to chat about these kinds of possibilities some more, I have a Gitter room, where we can discuss it further. Just let me know and I'll invite you.

Scott

@rdooh


commented Oct 27, 2015

Interesting - I have toyed with the idea of a GUI for quickly mapping out relationships... not so much for maintaining an ongoing large app as for quickly scaffolding out new ones. An invite would be appreciated.

@bogdanbiv referenced this issue Nov 11, 2015: Project Plan #2 (open, 5 of 9 tasks complete)

@abetkin referenced this issue Feb 24, 2016: graphql #13 (open)

@leebyron added the enhancement label May 7, 2016

@rmosolgo


commented Nov 3, 2016

In case it's helpful, graphql-ruby included a similar feature in v1.1.0; a description is here: http://rmosolgo.github.io/graphql-ruby/schema/limiting_visibility

@Goblinlordx


commented Dec 20, 2016

So... actually, for anyone interested: I toyed with doing this and ran into a few issues that I haven't really solved yet. Most simple cases are actually straightforward; I tried prototyping this using a JS version (linked below) that alters the schema based on a permission structure before passing the actual schema to graphql-js.

Here is the library that does this: https://github.com/Goblinlordx/authograph
Here is an example of usage: https://bitbucket.org/baldiviab/authed-graphql/src/7be7708ab996c312e02a9e82ae66aea1ab42a735?at=master
disclaimer: None of the above code is implied as being... good.

The problems really start when you think about things like interfaces and unions. In those cases, if one of the types implementing an interface has different permissions than another, what do you do? How do you resolve this difference? Do you remove the type from the interface? Does this cause other problems with the schema? Does the interface itself have permissions?

It raised a lot of questions that I didn't feel were easy to generalize. Either way, this didn't seem like something that needed to be a "part" of GraphQL in the first place. It can be done by pre-processing the schema before passing it to graphql, and the processed schema can also be cached (where safe) based on the context.

@smolinari

commented Dec 20, 2016

There are two needs or perspectives for introspection as I see it. Please do correct me if I am wrong. I am wrong a lot!

  1. The first perspective is the developer perspective. She would have permissions to see everything anyway. The only thing she might want is a "single user's perspective" view of the available types and fields that a particular user (user's access profile) can access. In fact, she might also want to know if the data is read-only or also writable for that particular user. The dev will need this, to ensure the UI she is building works properly for particular users.

  2. The system security and obfuscation perspective. If the API is available over a URL, it is open to the internet. Securing introspection is a matter of business preference.

To me, with the first perspective, special introspection is needed so any dev can know who can see what and when, and it will also (hopefully/most likely?) be driven by some sort of querying of a GraphQL API. So the overall solution is just another type of GraphQL query, and it is, in effect, a totally different graph (or branch, if need be). Instead of querying for application data, the query would request system meta and access data.

The second perspective is the question, what do I want to show to the Internet? If the (again) business concern boils down to also having user profiles govern what is visible about the schema (i.e. what you already need for the first perspective), you could allow users to use that same metadata and data access API. Right? You would only give the limited view to them, as if the dev were looking at the access and metadata for (or even as) a particular user.

So, my conclusion is, fine-grained introspection access to the schema just isn't something GraphQL introspection should be worried about. Data access is a pure business concern, and since this data access isn't standardized, the devs of GraphQL can never make everyone happy all of the time. Thus, field-level access and the introspection of it should be taken care of at the business logic level.

In other words and in short, I think we are barking up the wrong tree here. 😄

A business logic driven introspection of the type schema isn't something GraphQL should be solving.

Scott

@JeffRMoore

Contributor

commented Jan 8, 2017

I worked on an Enterprise SaaS product with both field level access control and user defined fields. We had a schema driven API with introspection.

The way we handled this was by generating a unique schema for each tenant based on a master organization level schema, then adding the UDFs for that tenant to the schema. We cached this tenant schema with permission metadata.

For each request, either main API or introspection, we would then filter the tenant schema based on current user permissions. That filtered schema was then used to power introspection responses, as well as to drive which input was allowed through to the underlying API logic and how the API output was serialized back in the response.

This is pretty much the solution suggested above. The only difference is that we applied permission filtering only to the portions of the schema relevant to the request being satisfied. Implementing that in graphql, I think, would involve changing code like this line:

https://github.com/graphql/graphql-js/blob/master/src/execution/execute.js#L1091

from

parentType.getFields()[fieldName]

to something like

parentType.getField(fieldName)

And then having the implementation of getField do the schema filtering based on permissions. It's probably more complex than that, but it's a place to start investigating and finding out.

So the approach would basically be to extend the schema implementation outside of graphql to perform the permission filtering, then pass the extended schema into graphql, while making a few minor changes inside graphql so it goes through a more defined interface to the schema. I think a case could be made for making graphql work better with extended schemas. This might also help graphql work better with large schemas by lazy loading them in fragments, for example.

NOT saying that the graphql schema implementation should do these things, just that thinking about ways of extending schemas outside of graphql and passing them to graphql functions might be worth considering.
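As a rough sketch of that idea in plain JavaScript (hypothetical names and shapes — this is not the real graphql-js API), a fields container whose lookup applies a permission check might look like this; the `requiredRole` tag and the viewer shape are invented for illustration:

```javascript
// Hypothetical sketch, not the actual graphql-js API: a container whose
// getField applies a permission check before returning a field definition.
function makePermissionedFields(fields) {
  return {
    // Stands in for parentType.getFields()[fieldName].
    getField(fieldName, viewer) {
      const field = fields[fieldName];
      if (!field) return undefined;
      // Hide the field entirely unless the viewer holds the required role.
      if (field.requiredRole && !viewer.roles.includes(field.requiredRole)) {
        return undefined;
      }
      return field;
    },
  };
}

// Usage: an admin-only field is simply absent for ordinary viewers.
const userFields = makePermissionedFields({
  name: { type: 'String' },
  salary: { type: 'Int', requiredRole: 'admin' },
});
```

With this shape, both execution and introspection can go through the same `getField`, so a viewer who cannot run a field also cannot see it.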

@rmosolgo

commented Jan 8, 2017

Thanks for sharing that insight!

from

parentType.getFields()[fieldName]

to something like

parentType.getField(fieldName)

😌 Glad that seems like a good idea to you; that's roughly how it played out in graphql-ruby! More specifically, it moved to

query.get_field(parent_type, type_name)

(but that's just a matter of implementation.)

@blevine

commented Feb 6, 2017

I also encountered this issue as I used @rmosolgo's support in graphql-ruby for limiting visibility of schema elements. I am already dynamically generating about 95% of my GraphQL schema by introspecting all available models (filtered by a blacklist) in my RoR application to generate types and standard CRUD fields. Models also have the ability to declare their own additional queries and mutations, obviating the need to ever touch the schema directly. This dynamic generation also ensures that all business logic (including authorization) is handled by the target service (the RoR code) and is not coded into the resolvers themselves. This was based on similar work I had done for our REST API.

However, in the REST API, the available operations (methods) are generated by "progressive disclosure." That is, each response contains a set of link relations representing the available resources which is calculated at request time based on the application's state plus additional state such as the roles of the authenticated user. I was trying to replicate something like this in our GraphQL API, but came to the conclusion (as many have in the preceding discussion) that maybe this wasn't really necessary.

Generating a single schema that changes only when the (models of) the underlying service changes allows us to make policy decisions (e.g. whether to return null or an error when an unauthorized user tries to access restricted fields) in the service itself.
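For a rough analogue of that generation step in JavaScript (the original is Ruby on Rails; the model shape and blacklist name here are invented for illustration), deriving type configs from model metadata might be sketched as:

```javascript
// Hedged sketch: derive GraphQL-style type configs from model metadata,
// skipping blacklisted models. The model/attribute shape is invented;
// the real implementation introspects ActiveRecord models in Ruby.
const BLACKLIST = new Set(['InternalAudit']);

function buildTypeConfigs(models) {
  return models
    .filter((model) => !BLACKLIST.has(model.name))
    .map((model) => ({
      name: model.name,
      // One field per model attribute; the type mapping is illustrative.
      fields: Object.fromEntries(
        model.attributes.map((attr) => [attr.name, { type: attr.gqlType }])
      ),
    }));
}
```

The point of the design is that the schema changes only when the models do, so authorization decisions stay in the service rather than in the generated resolvers.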

@ivosabev

commented Mar 27, 2017

The GitHub team has already solved this issue, and I think it would be of great use to the community if they were willing to show their approach and some code.

cc: @brandonblack @kdaigle

@kdaigle

commented Mar 27, 2017

The GitHub team has already solved this issue, and I think it would be of great use to the community if they were willing to show their approach and some code.

We use the warden functionality that @rmosolgo's gem provides. I could probably extract a snippet or two if it'd be valuable but the majority of the heavy lifting is done by the gem itself based on a few items passed into the query's context.

@oxyno-zeta

commented Sep 14, 2018

Hello,
Any news about this feature?
Thanks!

@yaquawa

commented Nov 27, 2018

Any update on this? I saw that graphql-ruby provides such a feature!

http://graphql-ruby.org/authorization/visibility

@sibelius

commented Dec 13, 2018

Is there any userland solution for this?

@kaqqao

commented Dec 16, 2018

graphql-java also has a solution in the form of a customizable GraphQLFieldVisibility, which is used at runtime to decide whether a field is visible or not.

@EntityB

commented Jan 2, 2019

I'm quite late to this, but I believe I built exactly the solution the OP was asking for with (in the end) elegant and effective code. In short, the solution for me is schema transforms and schema stitching:
https://www.apollographql.com/docs/graphql-tools/schema-transforms#api

Services in some project

*If you can't afford to run many services, make this a monolith. Each time you see "service", think component, module, or package instead ;)

We have multiple backend services running in Docker in our project. Let's call them:

  • public website - something anyone can visit and read without any authentication
  • user handler service - let's say this service has everything about the user profile, history, metadata, user settings, etc.
  • estimator service - this service somehow aggregates everything that is happening and takes care of stuff
  • data storage service - maybe the user wants to store some heavy data, and it wouldn't be good to maintain that within the user handler, so it is separated here

Logic in nutshell

  • Each service exposes a GraphQL API inside the Docker network, but none of these services exposes anything to the internet.

  • Instead we have one more service called the API gateway. The API gateway does schema stitching over all these services. For example, we may want some userId from the user handler service to be used in the data storage service, and we can do this with schema stitching.

  • We handle user identity in a modern way, so we use Firebase. That is right, the user handler service doesn't have user credentials; it is really more just a profile. Now this is going to be very interesting!

Little code, so

schema of user handler service has these methods (among others):

type Query {
  priv_user(firebaseUserId: String!): User
}

type Mutation {
  admin_deleteUser(firebaseUserId: String!, userId: String!): Boolean
}

Schema of public website has these methods:

type Query {
  pub_fetchTenLastArticles: [Article]!
}

type Mutation {
  priv_addComment(firebaseUserId: String!, comment: Comment!): Comment
}

Schema of data storage service has this method:

type Query {
  priv_getFiles(firebaseUserId: String!): [FilesMeta]
}

Schema of estimator service has this method:

type Query {
  admin_getWebStats(firebaseUserId: String!): WebStats!
}
  • But!!! When the API gateway does schema stitching, it also applies transformations!
  • An outsider doesn't even know the services behind it exist. This is hidden. All methods look like they come from one monolith service.
  • There are three different transformations you must understand: schema transformation, request transformation, and result transformation. You don't have to use all of them when you design a new transform object; in fact, most of the time you just need one. You can chain multiple transformations after each other.
  • Transformations are kind of like middlewares in Express. And how do you authenticate with Express? Riiiight? Wink wink

Transformations on the API gateway, in order:

  • "If the request doesn't contain a cookie named firebaseUserToken, remove from the schema all methods (root fields) that don't start with pub_." Which means a user with no authentication will only be able to call pub_fetchTenLastArticles and will not see anything else in the schema. This transforms the schema and schema only (not requests or results).

  • "If the request contains a cookie named firebaseUserToken, call Firebase and exchange the token for a firebaseUserId and role. Then, based on the role, remove all methods (root fields) with a specific prefix." For example, a common user will see all pub_ and priv_ methods, while an admin can see more. Again, this only changes the schema.

  • "If firebaseUserId and role are set, remove the firebaseUserId argument from every single method that has it in the schema. Later, if a method with the removed argument is called, exchange the firebaseUserToken for the firebaseUserId and role with Firebase, then call the underlying method (if allowed) with firebaseUserId placed back as an argument." This is the tricky one, because you need to implement a custom transform object for it. It also transforms both the schema and the request.

  • One more the OP needs somewhere: "Based on the role, take all attributes of all types and filter them based on prefix". Maybe your User looks like:

type User {
  id: ID!
  firebaseUserId: String!
  profile: WhatEverData
  admin_dataYouDerivedSomehow: DataYouDerivedSomehow
}

and you don't want to create multiple methods and output types for whatever reasons you have, so when a user fetches himself, he will not find the property admin_dataYouDerivedSomehow, but an admin does.
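A minimal stand-in for that per-role field filter (plain objects instead of a real GraphQLSchema, and a role-to-prefix table invented for illustration) could look like this:

```javascript
// Simplified stand-in: filter a type's fields by role-based prefix rules.
// Real code would walk the GraphQLSchema / AST; this only shows the rule.
const ROLE_PREFIXES = {
  anonymous: ['pub_'],
  user: ['pub_', 'priv_'],
  admin: ['pub_', 'priv_', 'admin_'],
};

function visibleFields(fields, role) {
  const allowed = ROLE_PREFIXES[role] || [];
  return Object.fromEntries(
    Object.entries(fields).filter(
      ([name]) =>
        // Unprefixed fields are visible to everyone; prefixed fields
        // are visible only if the role allows that prefix.
        !/^(pub|priv|admin)_/.test(name) ||
        allowed.some((prefix) => name.startsWith(prefix))
    )
  );
}
```

The same rule can be applied both to root fields (hiding whole methods) and to type fields like admin_dataYouDerivedSomehow above.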

Any questions?

So the general idea behind this is quite low coupling. On the level of the deep backend services, everything is possible if you know the firebaseUserId; you can call them from inside the VPN and test your methods. Without authentication!

  • You can run services separately and test them separately with different mock data. And NO AUTH!
  • New method? No problem, you just need to deploy one service and then restart the API gateway. Or better! You can even implement code in the API gateway that fetches the status of the services behind it.
  • New role? No problem, first you implement it in the API gateway and deploy it. Then implement the role prefix in any method in any service behind it and deploy that.
  • One of the side services is down? Like, why should you even care; schema stitching can easily be programmed to handle various problems. So if, for example, the data storage service is down, the rest still works. Only part of the data will become an "error", nothing that frontend components in, for example, React couldn't handle.
  • The user handling service is down? Well, the system probably doesn't work. But I mean the API gateway, the public one. The system inside your VPN, Kubernetes, Docker Swarm or whatever masterpiece can still work and can be accessible. For your sysadmins, for your devops, for whoever needs to work and fix the problems that happen somewhere. And who says you have one gateway? You can have more, with different logic and rights.

Cons

I know how it sounds, but no, I do not work for Apollo haha :) GraphQL is not the best thing that ever happened, but it was worth a shot for me.
The first big con of this solution is really the API of the Transform interface. So if you go for this, read these few lines at least 30 times in order to understand how it works. There are a few built-in transform objects you can use, but you will need more.

interface Transform {
  transformSchema?: (schema: GraphQLSchema) => GraphQLSchema;
  transformRequest?: (request: Request) => Request;
  transformResult?: (result: Result) => Result;
}

type Request = {
  document: DocumentNode;
  variables: Record<string, any>;
  extensions?: Record<string, any>;
};

type Result = ExecutionResult & {
  extensions?: Record<string, any>;
};
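As an illustration of that interface (simplified request shape, hypothetical token lookup, not real graphql-tools internals), a custom transform that re-injects the hidden firebaseUserId argument might be sketched as:

```javascript
// Illustrative custom Transform matching the shape above. The token
// lookup is a stand-in for the Firebase call; the request is the
// simplified { document, variables, extensions } object.
function makeInjectUserIdTransform(lookupUserId) {
  return {
    // This sketch leaves the schema untouched; the real transform would
    // also strip the firebaseUserId argument from every exposed method.
    transformSchema: (schema) => schema,
    // Resolve the session token to a user id and put the argument back
    // into the variables before the underlying service sees the request.
    transformRequest: (request) => ({
      ...request,
      variables: {
        ...request.variables,
        firebaseUserId: lookupUserId(request.extensions.firebaseUserToken),
      },
    }),
    transformResult: (result) => result,
  };
}
```

In a real gateway, transformRequest would also rewrite the document AST to add the argument to the field call, which is the part that requires walking AST nodes.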

The second big con is that while writing a custom transform object you will likely need to iterate through AST nodes, which was at the time outside my skill set, so I had to dig into some unknown tech and APIs. But in the end, after two days, I felt like a masterpiece had just been born.
From that day, everything else was easy peasy, and for every future question I had a solution in a second.

Code?

The project I'm working on is not out yet, but it is meant to be open source.
I'm sure it would help a lot if more built-in transforms existed, so we wouldn't need to write our own. But on the other hand, these are so powerful that maybe in the future GraphQL will be mostly about transforms, like Node.js is a lot about Express.
Anyway, I will be back to this thread with code once I extract it from our projects. In the meantime, let me know what you guys think about it.
