Skip to content
This repository has been archived by the owner on Dec 7, 2022. It is now read-only.

Problem: OpenAPI schema doesn't use URIs to identify resources #3561

Merged
merged 1 commit into from Aug 31, 2018

Conversation

dkliban
Copy link
Member

@dkliban dkliban commented Jul 24, 2018

Solution: Create a custom OpenAPI generator that refers to resources by their URI.

To see the Docs generated with this change, point the browser to http://:8000/pulp/api/v3/docs/

The JSON schema that can be used to generate bindings can be downloaded at http://localhost:8000/pulp/api/v3/docs/api?format=openapi

@pep8speaks
Copy link

pep8speaks commented Jul 24, 2018

Hello @dkliban! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 30, 2018 at 13:27 Hours UTC

def get_resource_description(self, name, example_uri):
"""Return a description for the resource associated with the viewset.
:param type view: the view class for which a description is desired.
:return: description of the
Copy link
Contributor

@dralley dralley Jul 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use google/napolean style docstrings

:param bool public: if True, all endpoints are included regardless of access through `request`
:returns: the :class:`.Paths` object and the longest common path prefix, as a 2-tuple
:rtype: tuple[openapi.Paths,str]
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use google/napolean style docstrings

@werwty
Copy link
Contributor

werwty commented Jul 24, 2018

This is a subset of the schema generated from this code

swagger: '2.0'
info:
  title: Pulp3 API
  version: v3
host: pulp3.dev:8000
schemes:
  - http
basePath: /pulp/api/v3
consumes:
  - application/json
produces:
  - application/json
securityDefinitions:
  basic:
    type: basic
security:
  - basic: []
paths:
  /{artifact_href}/:
    get:
      operationId: artifacts_read
      description: ''
      parameters: []
      responses:
        '200':
          description: ''
      consumes:
        - multipart/form-data
        - application/x-www-form-urlencoded
      tags:
        - artifacts
    delete:
      operationId: artifacts_delete
      description: Remove Artifact only if it is not associated with any Content.
      parameters: []
      responses:
        '204':
          description: ''
      consumes:
        - multipart/form-data
        - application/x-www-form-urlencoded
      tags:
        - artifacts
    parameters:
      - name: 'artifact_href'
        in: path
        description: 'URI of Artifact. e.g.: /artifacts/1/'
        required: true
        type: string
  /{file_content_href}/:
    get:
      operationId: content_file_files_read
      description: ''
      parameters: []
      responses:
        '200':
          description: ''
          schema:
            $ref: '#/definitions/FileContent'
      tags:
        - content
    parameters:
      - name: 'file_content_href'
        in: path
        description: 'URI of File Content. e.g.: /content/file/files/1/'
        required: true
        type: string
definitions:
  FileContent:
    required:
      - relative_path
      - artifact
    type: object
    properties:
      id:
        title: ID
        type: integer
        readOnly: true
      created:
        title: Created
        description: Timestamp of creation.
        type: string
        format: date-time
        readOnly: true
      type:
        title: Type
        type: string
        readOnly: true
        minLength: 1
      notes:
        title: Notes
        description: A mapping of string keys to string values, for storing notes
          on this object.
        type: object
        additionalProperties:
          type: string
          minLength: 1
      _href:
        title: ' href'
        type: string
        format: uri
        readOnly: true
      relative_path:
        title: Relative path
        description: Relative location of the file within the repository
        type: string
        minLength: 1
      artifact:
        title: Artifact
        description: Artifact file representing the physical content
        type: string
        format: uri

How would any client generated from this schema know which path is being accessed? Since /{file_content_href}/ and /{artifact_href}/ are functionally equivalent?

@bmbouter
Copy link
Member

I don't know much about openapi, and I could be totally wrong about this.

I think the human interacting with the autogenerated client would know what should go there from the description like description: 'URI of Artifact. e.g.: /artifacts/1/'. Similarly a programmer interacting with the bindings. I believe the user would get that string from an earlier call when the Artifact was created and the server returned /artifacts/1/.

@dkliban
Copy link
Member Author

dkliban commented Jul 24, 2018

The operationId provides each method in the generated bindings with a name. So the user of the bindings knows exactly what to specify for the parameter.

https://github.com/dkliban/pulp3-python-bindings/blob/href-as-identifier/swagger_client/api/remotes_api.py#L531

Is that what you were asking?

@dkliban
Copy link
Member Author

dkliban commented Jul 24, 2018

@werwty Or were you talking about validation of the response? I think that is what the schema response objects are used for. Artifact does not currently have one specified, but most of the others do. Like hrere: https://github.com/dkliban/pulp3-python-bindings/blob/href-as-identifier/swagger_client/api/remotes_api.py#L127

@werwty
Copy link
Contributor

werwty commented Jul 24, 2018

The operationId provides each method in the generated bindings with a name. So the user of the bindings knows exactly what to specify for the parameter.

Ah, I just looked into the bindings code and you are right. The bindings does not use the path to generate the swagger_client.api objects.

How would validation on the urls be done? Can I pass any url to {artifact_href}?
From the generated bindings code I suspect yes:

    def artifacts_read_with_http_info(self, artifact_href, **kwargs):  # noqa: E501

        path_params = {}
        if 'artifact_href' in params:
            path_params['artifact_href'] = params['artifact_href']  # noqa: E501

        return self.api_client.call_api(
            '/{artifact_href}/', 'GET',
            path_params,
            query_params,
            header_params,
            body=body_params,
            post_params=form_params,
            files=local_var_files,
            response_type=None,  # noqa: E501
            auth_settings=auth_settings,
            async=params.get('async'),
            _return_http_data_only=params.get('_return_http_data_only'),
            _preload_content=params.get('_preload_content', True),
            _request_timeout=params.get('_request_timeout'),
            collection_formats=collection_formats)

I also ran the generated schema code through a validator, and swagger does not like equivalent paths. There are a couple of issues open discussing this, but it doesn't appear that they're going to support it in the schema specification:
OAI/OpenAPI-Specification#576

@daviddavis
Copy link
Contributor

daviddavis commented Jul 24, 2018

How would validation on the urls be done? Can I pass any url to {artifact_href}?
From the generated bindings code I suspect yes

I agree. I found another RFE that worries me: OAI/OpenAPI-Specification#577.

Edit: I do think the bindings though would validate the response and raise an error if the server returned something other than an artifact.

@dkliban
Copy link
Member Author

dkliban commented Jul 25, 2018

@werwty Thank you for looking into the schema validation. I agree that we should try to produce a schema that is OpenAPI compliant.

I have done some reading in this area and have found that there is a desire (mostly from Google) to improve the OpenAPI 3 spec to include either an ability to have path parameters that have '/' in them[0] or add ability to specify path parameters of type 'hypermedia'[1]. Another proposal is to have OpenAPI be able to keep track of resources[2].

With all that said, these are only proposals at this time and we cannot use any of the methods from above to describe our API at this time.

In reading all these proposals it became clear that the problem with the shema this PR is generating is that it it is too obscure for generating server side code that would be matching URLs. However, this schema is not a problem for generating docs and bindings. I also think that @timburks said it best here[3].

Perhaps we should be more liberal about what WE consider valid OpenAPI schema?

[0] OAI/OpenAPI-Specification#1459
[1] OAI/OpenAPI-Specification#1553
[2] OAI/OpenAPI-Specification#1553 (comment)
[3] OAI/OpenAPI-Specification#892 (comment)

@bmbouter
Copy link
Member

Maybe we can accept ids and hrefs. I was thinking about how that would work, and there are some questions I have. Let's assume we don't merge this PR and let the openAPI v2 schema describe the API with IDs.

Here's my question. Currently we return task data like:

{
    "created_resources": [
        "http://localhost:8000/pulp/api/v3/repositories/1/versions/3/"
    ]
}

For the user to use that resulting data with an API then need to parse the url to get the two expected parameters (repo=1, version=3). We could do something like this:

{
	"created_resources": [{
		"url": "http://localhost:8000/pulp/api/v3/repositories/1/versions/3/",
		"repository_id": 1,
		"version": 3
	}]
}

As yet a third alternative, we could do what the github API does which is return the entire sub-resource serialized. That would be like:

{
	"created_resources": [{
		"url": "http://localhost:8000/pulp/api/v3/repositories/1/versions/3/",
		"repository_id": 1,
                ....  lots more things ....
		"version": 3
	}]
}

In thinking about this, my primary goals are:

  1. To provide clickable urls returned with every object we are referring to in responses.
  2. To accept urls as references like we do today, even if the openAPI schema still describes them using IDs.

What do you think we should do?

@daviddavis
Copy link
Contributor

daviddavis commented Jul 25, 2018

@bmbouter the more I think about the hrefs vs id debate, the more I think a solution like one of the ones you propose makes the best sense. 👍 from me.

@werwty
Copy link
Contributor

werwty commented Jul 25, 2018

+1 option 2. Let's return the URI and the parameters (but not the fully serialized sub-resource, just enough to identify what the sub-resource is)

To provide clickable urls returned with every object we are referring to in responses.

👍

To accept urls as references like we do today, even if the openAPI schema still describes them using IDs.

👍
As a note, openapi describes these references as strings, with a format that isn't validated anywhere.
We could change the format/description to identifier to be technically correct.

{  
    type:"object",
    properties:{  
        repository:{  
            title:"Repository",
            description:"A URI of the repository to be synchronized.",
            type:"string",
            format:"uri"
        },
        repository_version:{  
            title:"Repository Version",
            description:"A URI of the repository version to be published.",
            type:"string",
            format:"uri"
        }
    }
},

Edit:

To accept urls as references like we do today, even if the openAPI schema still describes them using IDs.

I was confused and thought you were talking about references to other objects (like publish needing a reference to a repository.) Re-reading this I see your concern is with path parameters, which has to take in the id to construct the url like in /repository/{id}. I think it's fair not to support referencing via url in this case, since bindings users will probably be using/storing the id as a reference.

I think it's ok to support full on HATEOAS style API for the REST API users, but let tools that integrate with pulp store non href identifiers.

@daviddavis
Copy link
Contributor

Correct me if I am wrong but accepting id and href would mean we don't have to define generators to alter the schema nor the bindings? We could use them as they are?

@bmbouter
Copy link
Member

@daviddavis With this PR option, either path would allow us to use the generic bindings and generic generators.

I can see there is much support for a compromise. I'm outlining this compromising position to expore it, but I don't think it's as awesome as just using hrefs and accepting this PR. I want to clearly state that. By going the ID route we're trading a simpler API (one way to represent resources) for a more complex, but compliant one. When presented w/ this choice, many Google teams decided to do something that is less compliant, but lower complexity. In software I think complexity is the real enemy, so we should do the same. Also the bindings this PR produces look extremely usable (to me).

I believe only with full support form @daviddavis and @werwty and @jlsherrill could I hope this could still be Pulp's API future. Otherwise we'll end up with a compromise. I'm trying to make a case for this API approach on its merit. It's hard though because IDs are probably how everyone has done projects before (including me). The merit of this API approach is that using hrefs produces a lower complexity API, and that is a better API. Are IDs really great or are they just familiar? Is openAPI v2's decision decision of how the APIs work really the one for Pulp? I suggest the answer is no and that we should go the way of least complexity instead.

@bmbouter
Copy link
Member

In all cases (option 1, 2, or 3) how does the user handle the multiple object types that could be returned by created_resource. It could get back a repository version or a publication? The attributes and/or href will need to be handed back to a binding, how does it know which one?

@dkliban
Copy link
Member Author

dkliban commented Jul 26, 2018

@bmbouter I think @daviddavis was talking about the generator class that I wrote in this PR.

@daviddavis I think that we still need to write a custom schema generator to expose all the right docs. I am thinking of the 'repository_pk' and 'number' path parameters that don't have docs right now. https://docs.pulpproject.org/en/3.0/nightly/integration-guide/rest-api/index.html#get--repositories-repository_pk-versions-number-

Perhaps there is another way to fix it that I am not aware of.

@dkliban
Copy link
Member Author

dkliban commented Jul 26, 2018

@bmbouter In all cases the user will need process the response and decide what to do with it.

For both options 1 and 2, I don't think we'll be able to describe using OpenAPI 2.0 a schema that returns 1 of 2 different responses. I don't think there is that flexibility in the spec.

In the case of this PR, the bindings generated from this PR don't provide any machinery for converting an href into an API client for that resource type. The user will still have to write code that determines what kind of API client needs to be used to access this resource.

jortel
jortel previously requested changes Aug 24, 2018
Copy link
Contributor

@jortel jortel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema generator creates a schema that is less openAPI compliant and is intended to support a specific API use case. Namely to support generating bindings with swagger. The custom generator is a great approach but it should not be associated to the (main) /api endpoint. We should add another endpoint that is backed by the custom generator that can be used for swagger.

@bmbouter
Copy link
Member

@jortel I'm confused by the suggestion to create two openAPI schemas. How can we describe one API two ways? The plan just sent to pulp-dev says that we're not supporting IDs and only hrefs. Assuming that's the plan, this PR is the only way openAPI can describe an API like that that I know of.

Am I thinking about this right?

@jortel
Copy link
Contributor

jortel commented Aug 24, 2018

It's my understanding that the custom schema generator proposed here generates a schema with paths look like:

paths:
  /{artifact_href}/:

instead of:

paths:
  /artifact/:

to accommodate swagger.

Is my understanding incorrect?

Perhaps a better, clearer suggestion would be: instead of a different endpoint, how about a different format. Eg: /docs/api/?format=openapi-for-swagger

@daviddavis
Copy link
Contributor

The /artifacts/ path would remain the same (assuming you're talking about the list artifacts path). But for single resource paths (e.g. read artifact, delete artifact, etc) those would be like what you mentioned:

GET /artifacts/
POST /artifacts/
GET /{artifact_href}/
PUT /{artifact_href}/
DELETE /{artifact_href}/

@jortel
Copy link
Contributor

jortel commented Aug 24, 2018

@daviddavis, gotcha. /artifacts/ was a bad example but same point.

@jortel
Copy link
Contributor

jortel commented Aug 24, 2018

After getting a better understanding of this from @bmbouter, I no longer have the concern about the proposed generated schema at the /docs/api end point.

However, I'd like to suggest an improvement in the description field for paths (and parameters) like:

paths:
  /{artifact_href}/:
    get:
      operationId: artifacts_read
      description: ''
      parameters: []
      responses:

and

    parameters:
      - name: 'artifact_href'
        in: path
        description: 'URI of Artifact. e.g.: /artifacts/1/'
        required: true
        type: string

I think it would be clearer if the description contained the format of the href instead of an example.

For example:

    parameters:
      - name: 'artifact_href'
        in: path
        description: 'The URI of an Artifact: /artifacts/{id}/'
        required: true
        type: string

@dkliban
Copy link
Member Author

dkliban commented Aug 24, 2018

@jortel I am concerned that if we add a format of the href, we will then need to once again describe what the 'id' field is. Or what the 'repository_pk' and 'version_number' are for the repo version href.

@jortel jortel dismissed their stale review August 24, 2018 16:35

I'm not qualfied to review this.

@codecov
Copy link

codecov bot commented Aug 24, 2018

Codecov Report

Merging #3561 into master will decrease coverage by 0.65%.
The diff coverage is 24%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3561      +/-   ##
==========================================
- Coverage   57.71%   57.05%   -0.66%     
==========================================
  Files          60       61       +1     
  Lines        2521     2571      +50     
==========================================
+ Hits         1455     1467      +12     
- Misses       1066     1104      +38
Impacted Files Coverage Δ
pulpcore/pulpcore/app/urls.py 91.93% <100%> (+0.13%) ⬆️
pulpcore/pulpcore/app/openapigenerator.py 22.44% <22.44%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 50605b0...2eccf45. Read the comment docs.

@dkliban dkliban force-pushed the openapi-schema-update branch 3 times, most recently from 697be76 to b3c7b82 Compare August 25, 2018 13:50
@dkliban
Copy link
Member Author

dkliban commented Aug 25, 2018

I was able to generate the bindings using this schema. I then wrote the following script that uses the python bindings to create a remote, a repository, and then sync the remote into the repository.

import swagger_client
from swagger_client.rest import ApiException
from pprint import pprint
from urllib.parse import urlparse

# Configure HTTP basic authorization: basic
configuration = swagger_client.Configuration()
configuration.username = 'admin'
configuration.password = 'admin'
configuration.safe_chars_for_path_param = '/'

client = swagger_client.ApiClient(configuration)

# create an instance of the API class
api = swagger_client.PulpApi(client)
remote_data = swagger_client.FileRemote(name='bar27',
                                        url='https://repos.fedorapeople.org/pulp/pulp/demo_repos'
                                            '/test_file_repo/PULP_MANIFEST')

try:
    api_response = api.pulp_api_v3_remotes_file_create(remote_data)
    remote_href = urlparse(api_response.href).path

    pprint(api_response)
except ApiException as e:
    print("Exception when calling RemotesApi->pulp_api_v3_remotes_file_create: %s\n" % e)

repository_data = swagger_client.Repository(name='foo27')

try:
    api_response = api.pulp_api_v3_repositories_create(repository_data)
    repository_href = urlparse(api_response.href).path
    pprint(api_response)
except ApiException as e:
    print("Exception when calling RepositoriesApi->pulp_api_v3_repositories_create: %s\n" % e)

repository_sync_data = swagger_client.RepositorySyncURL(repository=repository_href)

try:
    api_response = api.pulp_api_v3_remotes_file_sync(remote_href, repository_sync_data)
    pprint(api_response)
except ApiException as e:
    print("Exception when calling Api->pulp_api_v3_remotes_file_sync: %s\n" % e)

@codecov
Copy link

codecov bot commented Aug 25, 2018

Codecov Report

Merging #3561 into master will decrease coverage by 0.95%.
The diff coverage is 21.73%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3561      +/-   ##
==========================================
- Coverage    57.7%   56.75%   -0.96%     
==========================================
  Files          60       61       +1     
  Lines        2523     2592      +69     
==========================================
+ Hits         1456     1471      +15     
- Misses       1067     1121      +54
Impacted Files Coverage Δ
pulpcore/pulpcore/app/urls.py 91.93% <100%> (+0.13%) ⬆️
pulpcore/pulpcore/app/openapigenerator.py 20.58% <20.58%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c097a4e...b9399bc. Read the comment docs.

@daviddavis
Copy link
Contributor

@dkliban thanks for the code example. It's helpful. I see you calling this:

remote_href = urlparse(api_response.href).path

After we return relative URI, it will just be this?:

remote_href = api_response.href

@dkliban
Copy link
Member Author

dkliban commented Aug 27, 2018

@daviddavis Yes, that is exactly correct. urlparse is only needed because we are returning a full URL.

@daviddavis
Copy link
Contributor

Looks like there are problems with the pulp_ansible plugin. There's a view that returns a static dict the Galaxy CLI requires when looking at the API's versions. There's no queryset needed since the response is static and so this line raises a exception: AttributeError: 'AnsibleGalaxyVersionView' object has no attribute 'queryset'.

@dkliban dkliban force-pushed the openapi-schema-update branch 2 times, most recently from f421046 to a01445d Compare August 28, 2018 14:40
Solution: Create a custom OpenAPI generator that refers to resources by their URI.

closes pulp#3856
https://pulp.plan.io/issues/3856

closes pulp#3851
https://pulp.plan.io/issues/3851
Copy link
Contributor

@daviddavis daviddavis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worked and code LGTM 👍

@dkliban dkliban merged commit 7a063f5 into pulp:master Aug 31, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
7 participants