Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excluding Response Fields and Huge Performance Hits #168

Closed
rdegges opened this issue Mar 23, 2013 · 25 comments
Closed

Excluding Response Fields and Huge Performance Hits #168

rdegges opened this issue Mar 23, 2013 · 25 comments
Labels

Comments

@rdegges
Copy link

rdegges commented Mar 23, 2013

Hey dudes,

Love Flask-Restless, but wanted to leave an issue here as I'd love to use the library, but a showstopper in my use case (and possibly that of other people) is that Flask-Restless really has no way (currently) to properly exclude database relationship queries with the pre and postprocessor stuff.

For instance, I have an application which has several models, namely:

State and Person. Each person has a state relationship (via ForeignKey).

My API clients need to frequently grab a list of all states in the system, so they'll do something like:

GET /api/states

And will immediately get their screen flooded with results (because Flask-Restless is traversing the backref and generating JSON for the millions of people in each state).

Even if I were to explicitly write a postprocessor for the GET_MANY hook, this wouldn't really solve the problem as each request to my API endpoint /api/states would still result in Flask-Restless spending an enormous amount of time loading my people backref and eventually discarding that data and returning a response.

I think a good solution to this problem is what issue #161 is suggesting: have Flask-Restless only load foreign key relationships when they're explicitly eagerly loaded by Flask-SQLAlchemy.

I'm documenting this here for discussion purposes. Thank you.

@gpopovic
Copy link

I agree with this, maybe the better solution would be to somehow be able to decide on how deep (how many levels) should flask-restless traverse

@rdegges
Copy link
Author

rdegges commented Mar 24, 2013

I think another good idea would be to just handle URIs instead of navigating ForeignKeys.

For instance, instead of doing a ForeignKey lookup and serializing all results, restless could simply return a URI instead: "people": "/people/+18002223333". This way your client can then make an additional HTTP request to the desired resource URIs and get that data explicitly if they want it.

@jfinkels
Copy link
Owner

jfinkels commented Apr 5, 2013

Hmm, this is a difficult problem to solve. I don't know how to provide a simple way for users of this library to specify how relationships are loaded. The problem with loading only eagerly loaded relationships is that it is unexpected: if a client makes a GET request to /api/person/1, I believe it is reasonable for the client to expect that it will get a list of all the computers related to that person.

Here are some possible solutions:

  1. Add back in the exclude_columns keyword arguments which I so foolishly removed.
  2. Let the user specify a custom serialization function (replacing the default views._to_dict function), as suggested by @klinkin in Add support custom serialization function #167.
  3. Let the user specify the (global) relationship depth for serialization (and have a reasonable default [a depth of one, say]), as suggested by @goranek.
  4. Replace the actual data for each related instance with a link to the related instance, as suggested by @rdegges.

Solution 1 is simple, but inflexible, and creates somewhat complex code. Solution 2 would force the user to fix the problem and so would definitely solve the problem in that sense, but it may be complicated for the user; one of the goals of this library should be to take serialization out of the hands of the user. Solution 3 is simple, but again may be too inflexible. Solution 4 is a bit awkward: in the response's JSON object would include a mixture of some real information ("name": "Jeff") and some metainformation ("computers": [{"url": "/api/computers/1"}, {"url": "/api/computers/2"}]).

I think I am currently most interested in solution 3. I might try it out to see how well it works. Any other comments or thoughts?

@rdegges
Copy link
Author

rdegges commented Apr 5, 2013

Hey, so I think a combination of 1 + 4 would be ideal.

Here's my reasoning:

I think that having include_columns and exclude_columns options are really awesome, as they allow you to do do a lot of work really simply (for the user). For instance, not many APIs will want to expose every field, and defining your own postprocessor hooks is a lot more tedious than simply specifying which fields you'd like to use (or exclude).

I think that my suggestion (4) is really important, however, as it makes things a lot simpler (globally) for all Flask-Restless users. Since REST is all about resources and URIs, I think that having foreign key relationships loaded via a separate API URI is the right thing to do, especially since that makes it such that:

  • Users hitting the new URI will have pagination / etc. per their settings.
  • Users will instantly know that all relationships need to be clearly defined as APIs, forcing them to think about how many results they want to return, how it'll work, etc. I think this encourages 'good practices' for users by forcing them to think about what data they may be (accidentally) exposing with an API that eagerly loads related data.

I'm certainly open to all the other suggestions, and this is a really great topic. Thank you for spending some time reviewing this! <3

@jfinkels
Copy link
Owner

jfinkels commented Apr 5, 2013

That is a fairly convincing argument. I'll start by replacing exclude_columns and include_columns for now...

@bjourne
Copy link
Contributor

bjourne commented Apr 5, 2013

+1 Specifying which columns to load also solves the problem with flask-restless needlessy loading large deferred columns which is another performance drawback.

@klinkin
Copy link
Contributor

klinkin commented Apr 5, 2013

Recently i discovered an interesting project https://github.com/danielholmstrom/dictalchemy.
Maybe we should use it with flask-restless?
Then Solution 2 is not going to be very difficult for users!

@michaelfillier
Copy link

Linking to related resources seems to be the widely used approach to this type of issue. If you look at other APIs, like Instagram, for example, the User endpoint does not include a list of the user's media, rather a link to the resource.

Generally, if the related object list is more than 5, I would think that it is a prime candidate for exclusion.

@jfinkels
Copy link
Owner

@michaelfillier The Instagram example you gave is not convincing. Their sample response for GET /users/user-id/media/recent shows that much of the related information is present. Also, the URLs you see there are links to the plain-old web view of the photo, not the API URL for that resource.

However, if you can find other convincing examples, then I may be swayed into that way of thinking.

@michaelfillier
Copy link

Sorry, I wasn't clear and Instagram probably wasn't the best example and I was wrong about the href to the media endpoint.

Point being, when you hit the user endpoint you don't get the media for that user, in the same sense that the OP didn't want to include the computers when hitting the people endpoint.

Actually, Instagram's API was a really bad example, since a lot of the design decisions don't follow basic HATEOS principals, even with my little exposure to these principals I can see a lot of semantic gaps in their api.

@reubano
Copy link
Contributor

reubano commented Jul 29, 2014

I think the github api provides a better example:

https://api.github.com/users/reubano

{
  "login": "reubano",
  "id": 157864,
  "avatar_url": "https://avatars.githubusercontent.com/u/157864?",
  "gravatar_id": "869402f85dcbabcef3da1ee61b88a45a",
  "url": "https://api.github.com/users/reubano",
  "html_url": "https://github.com/reubano",
  .......
  "received_events_url": "https://api.github.com/users/reubano/received_events",
  "type": "User",
  "site_admin": false,
  .......
}

https://api.github.com/repos/reubano/amzn-search-api

{
  "id": 19201063,
  "name": "amzn-search-api",
  "full_name": "reubano/amzn-search-api",
  "owner": {
    "login": "reubano",
    "id": 157864,
    "avatar_url": "https://avatars.githubusercontent.com/u/157864?",
    "gravatar_id": "869402f85dcbabcef3da1ee61b88a45a",
    .......
    "received_events_url": "https://api.github.com/users/reubano/received_events",
    "type": "User",
    "site_admin": false
  },
  "private": false,
  "html_url": "https://github.com/reubano/amzn-search-api",
  "description": "RESTful API for searching Amazon sites",
  "fork": false,
  "url": "https://api.github.com/repos/reubano/amzn-search-api",
  .......
  "git_refs_url": "https://api.github.com/repos/reubano/amzn-search-api/git/refs{/sha}",
  .......
  "milestones_url": "https://api.github.com/repos/reubano/amzn-search-api/milestones{/number}",
  "language": "Python",
  "open_issues_count": 0,
  .......
  "subscribers_count": 1
}

@reubano
Copy link
Contributor

reubano commented Jul 29, 2014

The github api also implements key aspects of HATEOAS [#78] as well. That is, you can navigate through a users repo without knowing anything about the user aside from his/her screen name. You post to https://api.github.com/users/reubano to find the repos_url. You see it is https://api.github.com/users/reubano/repos so you post there to get a list of repos. You can then parse that response to grab the url item of the first repo in the list: https://api.github.com/repos/reubano/amzn-search-api.

It has the benefit that if github decides to change any of the urls, it won't effect the client because it is getting the url information from the subsequent api responses.

@jfinkels
Copy link
Owner

jfinkels commented Aug 4, 2014

Okay, by now I believe that replacing the content of related instances with a URL that points to the appropriate endpoint in the API is the correct way to solve this problem. However, there is still an issue with the following use case. Suppose I create an API that exposes Person instances, but not Computer instances, even though a Person instance may have a computers relation. What would happen in that situation? Flask-Restless needs to be able to know for which models the user has created an API.

@reubano
Copy link
Contributor

reubano commented Aug 5, 2014

What is the use case where someone would have a relation but not an API? I thought the point of FR was to create API endpoints from all relations.

@jfinkels
Copy link
Owner

jfinkels commented Aug 5, 2014

No, if that were the intention, then it wouldn't make you explicitly call APIManager.create_api() for each model. There may be some private information in a related table that the server shouldn't expose.

@reubano
Copy link
Contributor

reubano commented Aug 13, 2014

Gotcha, went back through an old project and now I remember. So what is preventing the code that returns related models to only do so for tables that were processed with the call for mgr.create_api()? Or can you have mgr.create_api() set a special parameter in the table? Something like hidden=False? Then FR will not display related models if hidden==True.

@jfinkels
Copy link
Owner

It's bad programming for a third-party library (Flask-Restless) to modify a model in that way.

@reubano
Copy link
Contributor

reubano commented Aug 13, 2014

Ok, well it can keep an internal cache instead of modifying the models directly.

@jfinkels
Copy link
Owner

That may indeed be a solution to this problem.

@reubano
Copy link
Contributor

reubano commented Aug 13, 2014

So whenever mgr.create_api() is called, just update a set created_models. Then do a set().intersection to compare the related models to the created_models.

@kbuilds
Copy link

kbuilds commented Jan 13, 2015

Has this issue gotten anywhere? I loved how simple this was to set up on top of SqlAlchemy, but I really can't use it if it is bringing back a lot of extra data, as described in @rdegges OP.

I don't have a ton of experiences with APIs, but even if the overhead was not a concern, I am not sure if many MV frameworks will work out-of-the-box with data formatted in this way.

@jfinkels
Copy link
Owner

jfinkels commented Feb 9, 2015

I think I'm going to implement the following. A request to GET /api/person/1 will return something like

{'id': 1,
 'computers': [1, 2, 3]
}

where [1, 2, 3] is a list of the primary key values for each of the computers in the one-to-many relation.

This follows one of the suggestions for the collection representation standards at jsonapi.org. I'd like to implement all of the suggestions there for version 1.0.0 of Flask-Restless.

This may not be the best solution, but I'd like to at least have a start so that this library can actually be usable with large databases.

@kbuilds
Copy link

kbuilds commented Feb 9, 2015

That's a good way to do it, and would definitely solve the issue outlined in this thread.

Wouldn't it be a lot simpler to just have a link in each person object?
computers: /api/person/1/computers

Which would return a list of computer objects filtered by person 1. This is good for large databases because it only ever gives you the data you asked for.

Just a thought.

@jfinkels jfinkels added the bug label Feb 11, 2015
jfinkels added a commit that referenced this issue Feb 19, 2015
Before the behavior of Flask-Restless was a bit arbitrary. Now we force
it to comply with a concrete (though still changing) specification,
which can be found at http://jsonapi.org/.

This is a (severely) backwards-incompatible change, as it changes which
API endpoints are exposed and the format of requests and responses.

This fixes (or at least makes it much easier to fix or much easier to
mark as "won't fix") several issues, including but not limited to

  - #87
  - #153
  - #168
  - #193
  - #208
  - #211
  - #213
  - #243
  - #252
  - #253
  - #258
  - #261
  - #262
  - #303
  - #394
jfinkels added a commit that referenced this issue Feb 23, 2015
Previously, the behavior of Flask-Restless was a bit arbitrary. Now we
force it to comply with a concrete (though still changing)
specification, which can be found at http://jsonapi.org/.

This is a (severely) backwards-incompatible change, as it changes which
API endpoints are exposed and the format of requests and responses.

This change also moves JSON API compliance tests to a convenient
distinct test module, `tests.test_jsonapi.py`, so that compliance with
the specification can be easily verified. These tests correspond to
version 1.0rc2 of the JSON API specification, which can be found in
commit json-api/json-api@af5dfcc.

This change fixes (or at least makes it much easier to fix or much
easier to mark as "won't fix") quite a few issues, including but not
limited to

  - #87
  - #153
  - #168
  - #193
  - #208
  - #211
  - #213
  - #243
  - #252
  - #253
  - #258
  - #261
  - #262
  - #303
  - #394
jfinkels added a commit that referenced this issue Feb 24, 2015
Previously, the behavior of Flask-Restless was a bit arbitrary. Now we
force it to comply with a concrete (though still changing)
specification, which can be found at http://jsonapi.org/.

This is a (severely) backwards-incompatible change, as it changes which
API endpoints are exposed and the format of requests and responses.

This change also moves JSON API compliance tests to a convenient
distinct test module, `tests.test_jsonapi.py`, so that compliance with
the specification can be easily verified. These tests correspond to
version 1.0rc2 of the JSON API specification, which can be found in
commit json-api/json-api@af5dfcc.

This change fixes (or at least makes it much easier to fix or much
easier to mark as "won't fix") quite a few issues, including but not
limited to

  - #87
  - #153
  - #168
  - #193
  - #208
  - #211
  - #213
  - #243
  - #252
  - #253
  - #258
  - #261
  - #262
  - #303
  - #394
jfinkels added a commit that referenced this issue Mar 3, 2015
Previously, the behavior of Flask-Restless was a bit arbitrary. Now we
force it to comply with a concrete (though still changing)
specification, which can be found at http://jsonapi.org/.

This is a (severely) backwards-incompatible change, as it changes which
API endpoints are exposed and the format of requests and responses.

This change also moves JSON API compliance tests to a convenient
distinct test module, `tests.test_jsonapi.py`, so that compliance with
the specification can be easily verified. These tests correspond to
version 1.0rc2 of the JSON API specification, which can be found in
commit json-api/json-api@af5dfcc.

This change fixes (or at least makes it much easier to fix or much
easier to mark as "won't fix") quite a few issues, including but not
limited to

  - #87
  - #153
  - #168
  - #193
  - #208
  - #211
  - #213
  - #243
  - #252
  - #253
  - #258
  - #261
  - #262
  - #303
  - #394
jfinkels added a commit that referenced this issue Mar 5, 2015
Previously, the behavior of Flask-Restless was a bit arbitrary. Now we
force it to comply with a concrete (though still changing)
specification, which can be found at http://jsonapi.org/.

This is a (severely) backwards-incompatible change, as it changes which
API endpoints are exposed and the format of requests and responses.

This change also moves JSON API compliance tests to a convenient
distinct test module, `tests.test_jsonapi.py`, so that compliance with
the specification can be easily verified. These tests correspond to
version 1.0rc2 of the JSON API specification, which can be found in
commit json-api/json-api@af5dfcc.

This change fixes (or at least makes it much easier to fix or much
easier to mark as "won't fix") quite a few issues, including but not
limited to

  - #87
  - #153
  - #168
  - #193
  - #208
  - #211
  - #213
  - #243
  - #252
  - #253
  - #258
  - #261
  - #262
  - #303
  - #394
jfinkels added a commit that referenced this issue Mar 7, 2015
Previously, the behavior of Flask-Restless was a bit arbitrary. Now we
force it to comply with a concrete (though still changing)
specification, which can be found at http://jsonapi.org/.

This is a (severely) backwards-incompatible change, as it changes which
API endpoints are exposed and the format of requests and responses.

This change also moves JSON API compliance tests to a convenient
distinct test module, `tests.test_jsonapi.py`, so that compliance with
the specification can be easily verified. These tests correspond to
version 1.0rc2 of the JSON API specification, which can be found in
commit json-api/json-api@af5dfcc.

This change fixes (or at least makes it much easier to fix or much
easier to mark as "won't fix") quite a few issues, including but not
limited to

  - #87
  - #153
  - #168
  - #193
  - #208
  - #211
  - #213
  - #243
  - #252
  - #253
  - #258
  - #261
  - #262
  - #303
  - #394
jfinkels added a commit that referenced this issue Mar 8, 2015
Previously, the behavior of Flask-Restless was a bit arbitrary. Now we
force it to comply with a concrete (though still changing)
specification, which can be found at http://jsonapi.org/.

This is a (severely) backwards-incompatible change, as it changes which
API endpoints are exposed and the format of requests and responses.

This change also moves JSON API compliance tests to a convenient
distinct test module, `tests.test_jsonapi.py`, so that compliance with
the specification can be easily verified. These tests correspond to
version 1.0rc2 of the JSON API specification, which can be found in
commit json-api/json-api@af5dfcc.

This change fixes (or at least makes it much easier to fix or much
easier to mark as "won't fix") quite a few issues, including but not
limited to

  - #87
  - #153
  - #168
  - #193
  - #208
  - #211
  - #213
  - #243
  - #252
  - #253
  - #258
  - #261
  - #262
  - #303
  - #394
@jfinkels
Copy link
Owner

On the master branch, Flask-Restless now follows the JSON API protocol. Therefore, relationships (like the residents to-many relationship from the State model to the Person model in the original post) are just referenced, not included unless explicitly requested by the client. This problem should be fixed.

@jasonrhaas
Copy link

So is there a good work around for this on the current stable release (0.17.0)? I'm not ready to move to the beta version since I am using this in production and it is still under development. Same requests as others, I would rather not have the relationship information available in my endpoint, I just want the table data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants