Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal to split api.py into different files #1405

Merged
merged 73 commits into from
Apr 5, 2024

Conversation

totycro
Copy link
Contributor

@totycro totycro commented Nov 15, 2023

Overview

This is a proposal of how api.py could be split up and is meant as a basis for discussion. The goal is to have a file structure like this:

pygeoapi/api/
├── __init__.py
└── coverages.py
└── features.py
└── processes.py
└── ...

This PR implements this for 2 endpoints of processes, but can be applied mechanically to all others.

There are also more possibilities for refactorings of the apis like extractions of common code or consolidation of error handling, but the idea here is to split up the file first, because it will necessarily be a huge diff, and smaller improvements can then be implemented in the single files at a later point.

One issue with the split is that API is a class and thus has to be in one file. This proposal changes the methods into regular functions which receive the API object as first parameter. Since we don't really use OOP features here, this is a simple change.

This commit performs the necessary adaption to transform an api function to the new style and shows that the changes are minor: 78d6a0a#diff-c9492e9767224068f9eb894b4558aea84f02c491d7a8975ae936f99ea7fc8232

Alternatives to this approach would be to either create a mixin in each file or to have classes for each api inheriting from a base API. I think both are more complicated but I'm happy to discuss

This proposal also includes an adapter from api functions to web framework endpoints (only implemented for flask now), which can replace decorators like pre_process and gzip, and which could be used in the future for a more convenient error handling or other abstractions: 78d6a0a#diff-631fe735514040ce51f87cfb0c8eeeb1b7895d42c9364cea597355062732c159R407
This proposal would also work without this adapter if we apply the decorators to all api functions as before.

Additionally to splitting up api.py, also the tests are moved to according files (f452d45) as well as the openapi defintion (b7d9201)

Once we agree on a style for this whole refactoring, I can apply it to the whole api.py file.

Contributions and Licensing

(as per https://github.com/geopython/pygeoapi/blob/master/CONTRIBUTING.md#contributions-and-licensing)

  • I'd like to contribute [feature X|bugfix Y|docs|something else] to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution.
  • I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

@totycro
Copy link
Contributor Author

totycro commented Jan 3, 2024

I just rebased the branch. The tests currently fail due to a python version issue, which i can fix if we decide to actually follow this approach.

return api_req

@classmethod
def from_flask(cls, request: flask.Request, supported_locales
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should adopt a Protocol for the incoming request from Web Frameworks and also adopt the dependency injection pattern to pass it to the APIRequest class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would a Protocol look like for this? The different request classes have some different fields, so I'm not sure how it could be combined to one Protocol

@tomkralidis
Copy link
Member

tomkralidis commented Mar 6, 2024

Strategy:

  • split pygeoapi/api.py into one file per API standard/specification, so:
    • pygeoapi/api/__init__.py
    • pygeoapi/api/processes.py
    • pygeoapi/api/features.py
    • pygeoapi/api/coverages.py
    • pygeoapi/api/records.py
    • pygeoapi/api/environmental_data_retrieval.py
    • pygeoapi/api/maps.py
    • pygeoapi/api/tiles.py
    • pygeoapi/api/stac.py
  • pygeoapi/openapi.py will be split into the above, at which point pygeoapi/openapi.py would import accordingly.

@totycro will begin with processes as the model to follow.

The above needs to work against existing tests for parity, at which point we can then refactor tests/test_api.py in the same manner.

@tomkralidis
Copy link
Member

Prior to starting this work, we will wait for the approve/merge of:

@GeoSander
Copy link
Contributor

nice work tom, agree with sander

To be fair, it was mostly not Tom's work ;)
My mistake!

@tomkralidis
Copy link
Member

nice work tom, agree with sander

To be fair, it was mostly not Tom's work ;) My mistake!

Correct, @totycro deserves the credit here :)

Copy link
Member

@ricardogsilva ricardogsilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! 🎉

I've left some comments on things that I think could be changed, but I'm not a core maintainer, so take them with a grain of salt 😉

Overall I think this would be a very good change to the code which will make it easier to keep improving things in the future.

Comment on lines +142 to +160
async def execute_from_starlette(api_function, request: Request, *args,
skip_valid_check=False) -> Response:
api_request = await APIRequest.from_starlette(request, api_.locales)
content: str | bytes
if not skip_valid_check and not api_request.is_valid():
headers, status, content = api_.get_format_exception(api_request)
else:

loop = asyncio.get_running_loop()
headers, status, content = await loop.run_in_executor(
None, call_api_threadsafe, loop, api_function,
api_, api_request, *args)
# NOTE: that gzip currently doesn't work in starlette
# https://github.com/geopython/pygeoapi/issues/1591
content = apply_gzip(headers, content)

response = _to_response(headers, status, content)

return response
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be much overlap between this function and get_response(), with a notable difference being that get_response() is not converting the incoming starlette.request.Request into the pygeoapi APIRequest - I guess this could lead to subtle bugs, as some of the starlette path operations are calling get_response() while others use execute_from_starlette() instead.

Perhaps these two functions should be unified into a single one - likely just keeping execute_from_starlette() which does indeed ensure translation between starlette.request.Request and pygeoapi APIRequest?

Additionally, unifying these two similar functions would allow inlining the code in _to_response() into the unified function, which seems like a win IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea; suggest to defer until this PR is completed/merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree, this is also what the related TODOs hint towards. We could address this soon after the merge

@@ -318,6 +318,7 @@ def _execute_handler_sync(self, p: BaseProcessor, job_id: str,
jfmt = 'application/json'

self.update_job(job_id, job_metadata)
self._send_failed_notification(subscriber)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is duplicated with the one below it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has very small and unrelated changes, it ought to probably be kept untouched, as it seems to not be relevant to this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an artifact of a manual backport (there are a few) over the course of PR development and commits to master.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has very small and unrelated changes, it ought to probably be kept untouched, as it seems to not be relevant to this PR (which makes reviewing easier)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an artifact of a manual backport (there are a few) over the course of PR development and commits to master.

pygeoapi/util.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has very small and unrelated changes, it ought to probably be kept untouched, as it seems to not be relevant to this PR - for this specific case, if you want to separate between the first line of the docstring and the rest, perhaps adding a proper one sentence description would be preferable than ending the line with a colon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor fix pushed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed/updated.

Comment on lines +63 to +69
CONFORMANCE_CLASSES = [
'http://www.opengis.net/spec/ogcapi-processes-1/1.0/conf/ogc-process-description', # noqa
'http://www.opengis.net/spec/ogcapi-processes-1/1.0/conf/core',
'http://www.opengis.net/spec/ogcapi-processes-1/1.0/conf/json',
'http://www.opengis.net/spec/ogcapi-processes-1/1.0/conf/oas30',
'http://www.opengis.net/spec/ogcapi-processes-1/1.0/conf/callback'
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that pygeoapi supports providing a custom process manager and that the manager may or may not support things like:

The list of conformance classes should be built dynamically, at runtime. IMO this means it would be more suitable to have something like a get_conformance_classes(manager: BaseManager) function rather than using a global list.

I think this approach would be useful for all api/*.py modules, as it seems more flexible to let the conformance list be generated at runtime by calling a function, as it gives us the option to check for enabled/disabled/supported/unsupported features and adapting accordingly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea; we can defer to a later PR given the scope of this PR.

@@ -68,7 +65,7 @@
THISDIR = os.path.dirname(os.path.realpath(__file__))


def get_ogc_schemas_location(server_config: dict) -> str:
def get_ogc_schemas_location(server_config) -> dict:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function's type annotation got mixed up and is now incorrect - it seemed to be correct before

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 457 to 459
sub_oas = api_module.get_oas_30(cfg, locale_)
oas['paths'].update(sub_oas['paths'])
oas['tags'].extend(sub_oas['tags'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about having each api module's get_oas_30() function return a tuple of list and dict instead? It would make it more explicit than returning a dict with tags and paths keys inside of it.

So rather than:

get_oas30() -> dict:

it would be:

get_oas30() -> tuple[list[str], dict[str, dict]]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! Done.

@@ -110,14 +118,15 @@ def schemas(path):
mimetype=get_mimetype(basename_))


# TODO: inline in execute_from_flask when all views have been refactored
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the completion of this TODO not in scope of this PR? I would think it is, but I'm not sure - I was going to ask about the reason for keeping and using get_response() in some places while using execute_from_flask() in others, making a similar point about having two similar but different functions as I did for the starlette app.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is really the same discussion as #1405 (comment)

@@ -551,6 +535,7 @@ def admin_config_resource(request: HttpRequest,
resource_id)


# TODO: remove this when all views have been refactored
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is the same type of comment as I left on both the starlette and flask modules - I'm not sure if this is in scope of this PR but it seems to me that it should be.

Leaving it like this implies that we keep some pretty similar functions around and use them in similar places, which will likely cause confusion and will introduce subtle bugs. I would be in favor of having this PR completely refactor the code so as to not use the older implementations and remove them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also related to #1405 (comment)

Copy link
Contributor

@doublebyte1 doublebyte1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@totycro +1 for this effort! 👏🏽 Hopefully, from now on, it will be easier for people to add new standards and update existing ones. I left some comments, but they are more clarifications than anything else.


def get_oas_30(cfg: dict, locale: str) -> dict:
from pygeoapi.openapi import OPENAPI_YAML, get_visible_collections

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is useful to add the signature description of this new function (in this and other files)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

CONFORMANCE_CLASSES = []


# TODO: no tests for this?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed there are no tests for STAC. Not sure if this was planned in the scope of this PR, but it would be useful anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


def get_oas_30(cfg: dict, locale: str) -> dict:
from pygeoapi.openapi import OPENAPI_YAML, get_visible_collections

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is useful to add the signature description of this new function (in this and other files)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


def get_oas_30(cfg: dict, locale: str) -> dict:
from pygeoapi.openapi import OPENAPI_YAML, get_visible_collections

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is useful to add the signature description of this new function (in this and other files)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


def get_oas_30(cfg: dict, locale_: str):
from pygeoapi.openapi import OPENAPI_YAML

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is useful to add the signature description of this new function (in this and other files)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed



def get_oas_30(cfg: dict, locale: str) -> dict:
LOGGER.debug('setting up STAC')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is useful to add the signature description of this new function (in this and other files)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


def get_oas_30(cfg: dict, locale: str) -> dict:
from pygeoapi.openapi import OPENAPI_YAML, get_visible_collections

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is useful to add the signature description of this new function (in this and other files)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

api.config['resources'][dataset]['providers'], 'feature'))
except ProviderTypeError:
try:
LOGGER.debug('Loading coverage provider')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we are loading coverages here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely this has been ported from the old version of the code. I'd suggest to not do any behavioral changes in this PR as it already changes a lot of other things, but we could change this after the merge or open an issue to keep it in mind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the result of adding .../schema support for coverages. This is now moved to pygeoapi/api/__init__.py and called accordingly from upstream.

@tomkralidis
Copy link
Member

Thanks for the feedback @ricardogsilva / @doublebyte1. I've tried to address most of the issues / comments in time for @totycro's Wednesday morning, as well as suggestion to defer other suggestions following the merge of this PR.

Note that there is an unrelated fix in eeb997c in support of passing the EDR tests based on possible data changes.

@totycro
Copy link
Contributor Author

totycro commented Apr 3, 2024

I think I addressed all of the comments by now. The way I see it, we don't have any issues really blocking the merge, and I'd prefer to merge it ASAP because many commits done on master right now will introduce new conflicts.

Copy link
Member

@ricardogsilva ricardogsilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand it is your intention to address my remaining comments after the merge, which I would be happy with.

+1 from me, thanks for the hard work!

webb-ben and others added 3 commits April 3, 2024 06:39
* Safely serialize configuration JSON

Co-Authored-By: Tom Kralidis <tomkralidis@gmail.com>

* Revert "Safely serialize configuration JSON"

This reverts commit 36feb06.

* Add test for datetime with Admin API

* Safely serialize configuration JSON

---------

Co-authored-by: Tom Kralidis <tomkralidis@gmail.com>
pygeoapi/api/__init__.py Outdated Show resolved Hide resolved
pygeoapi/openapi.py Outdated Show resolved Hide resolved
pygeoapi/openapi.py Show resolved Hide resolved
@tomkralidis tomkralidis merged commit 35bdcb6 into geopython:master Apr 5, 2024
4 checks passed
@tomkralidis tomkralidis deleted the split-api-py branch April 5, 2024 10:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants