Code development workflow

Jorge Samuel Mendes de Jesus edited this page Dec 21, 2020 · 17 revisions

TL;DR

NOT SO FAST!!! First scroll through and see if there is anything unknown to you, then check the TL;DR at the end...

Code development workflow

A code workflow is the sequence of steps and procedures necessary for code development, testing and implementation.

pygeoapi contribution guidelines and instructions on how to submit tickets are found in CONTRIBUTING.md.

1. Forking

Developers work on their own (project) forks. A fork is a personal sandbox where code is developed and implemented by its author. With time, the code on the main project and on the fork will start to diverge, since code from other branches and forks gets merged into the project. It is therefore important to keep syncing code from the project into the fork and the working branch.

Check the GitHub tutorial on how to fork and sync: fork-a-repo

pygeoapi-master ---FORK---> pygeoapi-user001-master     

2. Issues and branches

GitHub issues should be related to bugs, new feature requests, blue sky research etc. For bug reporting please follow the guideline on what to put in your bug report.

Code development should be oriented so that each piece of work solves (or deals with) one issue only. Issues tend to be associated with branches, and code commits go into that specific branch. This also facilitates the pull request review process.

pygeoapi-master ---FORK---> pygeoapi-user001-master 
                                                \----------- pygeoapi-user001-issue4456

Don't forget to sync/merge the main pygeoapi-master into your fork's master, and then merge (or rebase) master into your working branch (see: version-control-branching). This requires that you first configure a remote for the fork, indicating the upstream location of the main code:

git remote add upstream https://github.com/geopython/pygeoapi.git

3. Code development

A good programmer is the one that writes clear and easy to understand code based on well established guidelines, not the one that writes smart code.

3.1 PEP8 - Python code style

pygeoapi follows PEP 8, the Style Guide for Python Code, and the Python naming conventions. In a nutshell:

  • snake_case for variables
  • lower case for modules and packages
  • UPPER_CASE for constants
  • UpperCamelCase for classes
  • Methods can be marked protected with a leading _ or private with a leading __
  • Variable name collisions are avoided by appending an extra _, e.g. use csv_ instead of csv
  • English words only, with a proper description of functionality and/or content
  • Follow OGC standard names (see: 4.1 pygeoapi API)
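The conventions above can be sketched in a few lines. This is an illustration only: FeatureCounter, MAX_FEATURES and count_rows are hypothetical names and not part of pygeoapi.

```python
import csv
import io

MAX_FEATURES = 10  # constant: upper case


class FeatureCounter:  # class: UpperCamelCase
    """Count features up to a limit"""

    def __init__(self, limit=MAX_FEATURES):
        self.limit = limit
        self._count = 0  # protected attribute: single leading underscore

    def add_feature(self):  # method: snake_case
        """
        Register one feature, never going past the limit

        :returns: current feature count
        """
        if self._count < self.limit:
            self._count += 1
        return self._count


def count_rows(csv_):
    """
    Count the rows of CSV text

    :param csv_: CSV content as a string; the trailing underscore keeps
                 the name from shadowing the stdlib csv module
    :returns: number of rows
    """
    return sum(1 for _ in csv.reader(io.StringIO(csv_)))
```

Note how csv_ lets the function keep using the csv module inside its own body.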

3.2 Understandable code

The PEP8 style convention helps with readability, but code should also be understandable. This can be achieved with simple English variable names, good comments, and consistency.

How to write code that everyone can read

code quality

Source: https://xkcd.com/1513/ and Geo-python

3.4 Code documentation

Documentation is what makes or breaks a project. Thou shalt not say: "The code is already explanatory". If you wrote readable code it is already explanatory, BUT you still have to indicate what it does, how it can be run and, more importantly, its inputs/outputs, as Python is a dynamically typed language. **Any pull request without proper code docstrings is automatically rejected.**

3.4.1 Docstrings and reStructuredText

pygeoapi uses Python docstrings and reStructuredText.

A good introduction to pydocs can be found in the links below:

Every single method/function/class should be documented using docstrings following reStructuredText (reST) syntax, for example:

class RasterioProvider(BaseProvider):
    """Rasterio Provider"""

    def __init__(self, provider_def):
        """
        Initialize object
        :param provider_def: provider definition
        :returns: pygeoapi.provider.rasterio_.RasterioProvider
        """

Python packages should also have a basic description in their __init__.py file, e.g.:

"""OGC process package, each process is an independent module"""

Note: Type hints are not yet supported
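The same reST fields work for plain functions. The sketch below is a hypothetical helper (bbox_area is not a pygeoapi function) that documents inputs, output and the error it raises, which matters because the signature itself carries no type information:

```python
def bbox_area(minx, miny, maxx, maxy):
    """
    Compute the area of a bounding box

    :param minx: minimum x coordinate (number)
    :param miny: minimum y coordinate (number)
    :param maxx: maximum x coordinate (number)
    :param maxy: maximum y coordinate (number)
    :raises ValueError: if a maximum is smaller than its minimum
    :returns: area as float
    """
    if maxx < minx or maxy < miny:
        raise ValueError('maximum must not be smaller than minimum')
    return float((maxx - minx) * (maxy - miny))
```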

3.4.2 Read the docs

Read the Docs is a very popular documentation platform and is used for the pygeoapi documentation. If docstrings are properly written, Read the Docs (RTD) will automatically build the content; this step should be done before a pull request.

RTD code is located in the docs folder, with folder organization and file content laid out as in a generic Python RTD project. Before you proceed, please read "How to set up your python project docs for success" (https://towardsdatascience.com/how-to-set-up-your-python-project-docs-for-success-aab613f79626) to get an idea of how things work.

pygeoapi RTD content is in folder pygeoapi/docs/source. The *.rst files are the sources where documentation should be written/updated.

The Table of Contents (TOC) is defined in index.rst (https://raw.githubusercontent.com/geopython/pygeoapi/master/docs/source/index.rst)

.. _index:

.. image:: /_static/pygeoapi-logo.png
   :scale: 50%
   :alt: pygeoapi logo

pygeoapi |release| documentation
==================================

:Author: the pygeoapi team
:Contact: pygeoapi at lists.osgeo.org
:Release: |release|
:Date: |today|

.. toctree::
   :maxdepth: 4
   :caption: Table of Contents
   :name: toc

   introduction
   how-pygeoapi-works

The TOC names then point to the individual *.rst files.

3.4.2 API documentation

Remember docstrings and code comments?! RTD will automatically pull the content from the pygeoapi code and build the API documentation. **It is important that new packages and modules are added to openapi.rst.** For example, a package, module/class and then class would have the following syntax:

Provider
--------

.. automodule:: pygeoapi.provider
   :show-inheritance:
   :members:
   :private-members:
   :special-members:


Base class
^^^^^^^^^^

.. automodule:: pygeoapi.provider.base
   :show-inheritance:
   :members:
   :private-members:
   :special-members:


CSV provider
^^^^^^^^^^^^

.. automodule:: pygeoapi.provider.csv_
   :show-inheritance:
   :members:
   :private-members:

3.4.3 Building documentation (local)

On folder pygeoapi/docs:

#make help
make html
::
Running Sphinx v3.0.1
loading pickled environment... done
:: 
The HTML pages are in build/html

The documentation is then available in build/html, in Read the Docs style, and can be viewed in a browser: firefox build/html/index.html

3.4.4 Personal RTD on github (automatic build)

GitHub has very good support for RTD, and you can even use it on your personal repository for the issue that you are working on.

First, you need to create an account on Read the Docs (Sign up). You can (and should) authenticate using your GitHub account; the following steps assume that you did.

Second, connect your Read the Docs account to GitHub on admin > control panel, Connected services > Connect to Github

RTD connect services

This will allow you to choose the repository and branch from which RTD will import the documents and build them.

Confirmation of service connection

If the process was successful it should not be necessary to preconfigure the webhooks.

Third, on the dashboard (Profile drop down > My projects) click Import a project

Import a project

Then refresh to sync RTD and GitHub. You should be able to see your private pygeoapi project (<username/pygeoapi>); just add it.

By default RTD builds docs from master. You are expected to work on your fork in a specific branch (see: Issues and branches), therefore RTD should be set to use that branch.

On the project details, give a name related to the issue that you are working on, e.g. pygeoapi-532 (this will then be part of the public URL), and tick Edit advanced project options

Project details

On the advanced options, type the name of the working/default branch, and select Python as programming language

Project Extra Details

Finally, on the project page click on Build project and enjoy the automation. In a few minutes you will have your documentation online :), for this example it will be something like: https://pygeoapi-532.readthedocs.io

Note: Every time you push to the default branch RTD will update the online documentation.


4. pygeoapi code structure

pygeoapi code uses or implements:

  • an API first approach that is wrapped by a web framework (Flask or Starlette),
  • Object oriented template pattern
  • Plugins
  • EAFP (it’s easier to ask for forgiveness than permission)
  • Prefer DRY (Don't Repeat Yourself) but when necessary WET (Write Everything Twice)

4.1 pygeoapi API

The API structure is defined in the pygeoapi/api.py module, in class API; this is the project's core. The method naming in class API is no coincidence: it follows OGC API names and definitions. For example, in OGC Features we have an endpoint defined as:

GET /collections     

This REST end point describes the collections available, the associated method is:

    def describe_collections(self, headers_, format_, dataset=None):

<VERB>_<OBJECT> is the standard terminology.

4.2 Web-frameworks

Web framework libraries are responsible for:

  • HTTP requests/responses
  • URL routing
  • Configuration loading

REST end points defined by the OGC standards (see here for example) are supported by the web framework, each with its community's approaches, philosophies and perks.

Currently there are two web frameworks supported: Flask and Starlette.

The pygeoapi project tends to use Flask as the default web framework. As a guideline, the function name should be identical (or very close) to the HTTP request route, e.g.:

@BLUEPRINT.route('/openapi')
def openapi():
    """
    OpenAPI endpoint
    :returns: HTTP response
    """
    with open(os.environ.get('PYGEOAPI_OPENAPI'), encoding='utf8') as ff:
        openapi = yaml_load(ff)

4.3 Object oriented template pattern

pygeoapi code is object oriented (classes) and implements a template method pattern (Wikipedia: template method pattern). The template method pattern is normally used in code bases that implement multiple components with overlapping functionality, behavior or properties.

The provider package contains the following modules:

.
├── __init__.py
├── base.py
├── elasticsearch_.py
├── geojson.py
:

The base.py module contains the parent classes that are used by the specific data provider modules (e.g. geojson.py).

#base.py
class BaseProvider:
    """generic Provider ABC"""

    def __init__(self, provider_def):
    :
    
    def get_fields(self):
        raise NotImplementedError()
    def write(self, options={}, data=None):
        raise NotImplementedError()

Class BaseProvider is the template from which the specific classes for each data provider are created; this template contains all the necessary methods.

You can see the base class being extended on module geojson.py

import json
import os

from pygeoapi.provider.base import BaseProvider


class GeoJSONProvider(BaseProvider):
    """Provider class backed by local GeoJSON files"""
    :
    def get_fields(self):
        # field names come from the first feature; every type is reported as string
        if os.path.exists(self.data):
            with open(self.data) as src:
                data = json.loads(src.read())
            fields = {}
            for f in data['features'][0]['properties'].keys():
                fields[f] = 'string'
            return fields

Looking at class GeoJSONProvider, there is no write method. If pygeoapi tries to call write, the call ends up in the base class and raises NotImplementedError, which is then properly handled by the pygeoapi API.

This is the pygeoapi code approach: base classes define precisely what is expected and avoid duplication.
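A minimal, self-contained sketch of this base-class approach follows. All names (BaseStore, MemoryStore, try_write) are hypothetical and only mimic the shape of BaseProvider and its subclasses; how pygeoapi's API actually reacts to NotImplementedError is not shown here.

```python
class BaseStore:
    """Hypothetical base class: declares the methods every store must offer"""

    def __init__(self, name):
        self.name = name

    def get_fields(self):
        raise NotImplementedError()

    def write(self, data=None):
        raise NotImplementedError()


class MemoryStore(BaseStore):
    """Hypothetical read-only store: implements get_fields only"""

    def get_fields(self):
        return {'id': 'string'}


def try_write(store, data):
    """
    Caller side: turn a missing implementation into a clean message

    :param store: any BaseStore subclass instance
    :param data: payload to write
    :returns: status message as string
    """
    try:
        store.write(data)
        return 'written'
    except NotImplementedError:
        return 'write not supported by {}'.format(store.name)
```

Because MemoryStore does not override write, the call falls through to BaseStore.write, and the caller decides in one place how to report it.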

Doubts!? Check these links:

4.4 Plugins

Currently pygeoapi supports the following plugins:

  • provider (data provider loading)
  • formatter (export formats loading)
  • process (available processes)

Plugin functionality is called in api.py, and only the necessary plugins are loaded, on the basis of the configuration yaml file. For example:

#api.py
from pygeoapi.plugin import load_plugin
p = load_plugin('provider', get_provider_by_type(
                            collections[k]['providers'], 'feature'))

It is then up to the code in api.py to make use of the loaded plugin.

The plugin code is located in module plugin.py (of course), and it is basically a class loader for other modules.

#: formatters and processes available
PLUGINS = {
    'provider': {
        'CSV': 'pygeoapi.provider.csv_.CSVProvider',
        'Elasticsearch': 'pygeoapi.provider.elasticsearch_.ElasticsearchProvider',  # noqa
        'GeoJSON': 'pygeoapi.provider.geojson.GeoJSONProvider',
        'OGR': 'pygeoapi.provider.ogr.OGRProvider',
        :
    },
    'formatter': {
        'CSV': 'pygeoapi.formatter.csv_.CSVFormatter'
    }
}

The plugins structure is PLUGINS->(MODULE)->(TYPE=>CODE_LOCATION)

Read the pygeoapi docs on plugins for a fully detailed explanation.
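To illustrate what "class loader" means here, the sketch below resolves a dotted path from a registry to a class using importlib. It is not pygeoapi's load_plugin: the function load_class and the registry entries are hypothetical, and stdlib classes stand in for real providers so that the sketch runs anywhere.

```python
import importlib

# Hypothetical registry in the same type -> name -> dotted path shape
PLUGINS = {
    'provider': {
        'Ordered': 'collections.OrderedDict',
    },
    'formatter': {
        'Text': 'io.StringIO',
    },
}


def load_class(plugin_type, name):
    """
    Resolve a registry entry to a class

    :param plugin_type: top-level key of PLUGINS
    :param name: plugin name under that key
    :returns: the class object the dotted path points to
    """
    dotted_path = PLUGINS[plugin_type][name]
    # split 'package.module.ClassName' into module path and class name
    module_name, class_name = dotted_path.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```

Only the module named in the requested entry is imported, which is why unused plugins cost nothing at start-up.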

4.5 EAFP versus LBYL (Exceptions usage)

The Python language is oriented towards EAFP (it's Easier to Ask for Forgiveness than Permission) instead of LBYL (Look Before You Leap); this basically boils down to the use of exceptions in code.

EAFP states that you should try something and, if it fails, deal with the error:

#pygeoapi.provider.rasterio_.RasterioProvider
import rasterio
from pygeoapi.provider.base import ProviderConnectionError

try:
    self._data = rasterio.open(self.data)
    :
except Exception as err:
    LOGGER.warning(err)
    raise ProviderConnectionError(err)

In the above example the code tries to open the data source; if this raises an error, the code catches the exception, logs it and re-raises it as a ProviderConnectionError. There was a problem and we asked forgiveness in the exception handling section.

Using an LBYL approach the code would be:

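import os
if os.access(self.data, os.R_OK):
    self._data = rasterio.open(self.data)
else:
    # no exception object exists on this path, so build the message explicitly
    msg = 'Cannot read data source: {}'.format(self.data)
    LOGGER.warning(msg)
    raise ProviderConnectionError(msg)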

Python's benevolent dictator for life disagrees with the motivation (you can read it here), but pygeoapi favours EAFP in the spirit of "Explicit is better than implicit" and Don't Repeat Yourself (DRY).

For more info on EAFP versus LBYL:
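The two styles can be compared side by side in a runnable form. The sketch below uses plain text files instead of rasterio, and DataSourceError, read_eafp and read_lbyl are hypothetical names standing in for ProviderConnectionError and the provider code.

```python
import os


class DataSourceError(Exception):
    """Hypothetical stand-in for a provider connection error"""


def read_eafp(path):
    """
    EAFP: just try to open the file and translate any failure

    :param path: file path
    :returns: file content as string
    """
    try:
        with open(path, encoding='utf8') as src:
            return src.read()
    except OSError as err:
        raise DataSourceError(str(err)) from err


def read_lbyl(path):
    """
    LBYL: check first, then open

    :param path: file path
    :returns: file content as string
    """
    # the file can still disappear between this check and the open() below
    if not os.access(path, os.R_OK):
        raise DataSourceError('cannot read {}'.format(path))
    with open(path, encoding='utf8') as src:
        return src.read()
```

Both functions raise the same error for a missing file, but only the EAFP version also covers failures that happen after the check would have passed.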

4.6 DRY versus WET

The object oriented template pattern (class abstraction), plugins and EAFP are used to prevent repetition of code, aka Don't Repeat Yourself. Orthogonal to this (in the computer science sense) we have WET (Write Everything Twice), which promotes code duplication whenever necessary or efficient.

pygeoapi is required to be packaged and deployed on systems that have dependency/package distribution limitations, which also forces WET implementations in the code base. The code below is a good example of WET in practice, since we rely only on the datetime library and do not take advantage of pendulum:

#api.py
if te['begin'] is not None and datetime_begin != '..':
    if datetime_begin < te['begin']:
        datetime_invalid = True

At the end of the day, DRY and WET should be implemented side by side and are not complete opposites... but let's keep the code dry and lean.
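As an illustration of the trade-off, the sketch below does a temporal extent check with the standard library only. It is not the api.py implementation: is_before_extent is a hypothetical helper, and it assumes ISO 8601 input strings without timezone information.

```python
from datetime import datetime


def is_before_extent(datetime_begin, extent_begin):
    """
    Check whether a requested start lies before the extent start

    :param datetime_begin: ISO 8601 string, or '..' for an open start
    :param extent_begin: datetime of the extent start, or None
    :returns: True if the request starts before the extent
    """
    if extent_begin is None or datetime_begin == '..':
        return False
    # stdlib parsing only, no third-party date library required
    return datetime.fromisoformat(datetime_begin) < extent_begin
```

A dedicated date library would handle more input formats, but this version adds no dependency to package.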

More on WET/DRY reasoning:

XKCD code quality strip


5. Code/functionality testing

The pygeoapi project implements test driven development (TDD): in the development workflow, first write a unit test that fails, then write the pygeoapi code and keep going until the unit test passes. A very detailed TDD development workflow can be found here, along with extra advantages.

Code testing is done locally on the developer's computer and later also in the CI/CD pipeline. Please take it seriously, you will be surprised where code can break...

5.1 Code testing - local

pygeoapi uses pytest for unit testing, based on the pygeoapi testing documentation.

Tests are in folder /tests, and each Python module (*.py) bundles several tests based on global functionality or system. The root folder contains the pytest.ini that sets the env variables.

New code should have new unit tests and pytest should be run locally to determine that things are OK, for example:

python -m pytest tests/test_api.py

Test code is grouped into modules with the naming convention test_<SYSTEM>_*, with dummy or specific config files in /tests. Supporting the test code we have multiple datasets with share-friendly licences in subfolder tests/data. If your tests require extra data please add it, but always use small datasets.

Having tests that pass under pytest is the first step in determining whether the developed code integrates properly with pygeoapi and can be accepted in a pull request.
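For orientation, a test module can be as small as the sketch below. parse_bbox is a hypothetical function under test, not a pygeoapi function; in a real module the error case would more idiomatically use pytest.raises, which is avoided here only to keep the sketch free of imports.

```python
def parse_bbox(value):
    """
    Hypothetical function under test

    :param value: comma separated string 'minx,miny,maxx,maxy'
    :returns: list of four floats
    """
    parts = [float(v) for v in value.split(',')]
    if len(parts) != 4:
        raise ValueError('bbox needs 4 values')
    return parts


def test_parse_bbox_valid():
    assert parse_bbox('-180,-90,180,90') == [-180.0, -90.0, 180.0, 90.0]


def test_parse_bbox_wrong_length():
    try:
        parse_bbox('1,2,3')
    except ValueError:
        return
    raise AssertionError('expected ValueError')
```

pytest collects every function whose name starts with test_, so no registration is needed.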


6. Flake8 - Check code style

Flake8 is a code style checker that checks for several PEP8 requirements. You can run it file by file or in bulk:

find . -type f -name "*.py" | xargs flake8

All code in a PR has to be clean. Exceptionally, inline error ignores can be added to the code (see: inline ignoring errors).
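An inline ignore is a trailing comment on the offending line. In the sketch below DOCS_URL is a hypothetical constant with a placeholder URL; E501 is the code for "line too long".

```python
# E501 (line too long) is silenced for this one line only
DOCS_URL = 'https://example.org/some/very/long/documentation/path/that/does/not/fit/in/seventy-nine/chars'  # noqa: E501
```

Naming the error code keeps every other check active on that line, which a bare noqa comment would not.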


7. CI/CD pipeline

pygeoapi uses Travis as CI/CD. As good practice, it is recommended that your working branch uses Travis to build pygeoapi on every push you do, and later on the pull request. Building pygeoapi is a second level of test/integration after local code testing. To start running Travis:

  • Sign in or sign up on https://travis-ci.org/ with your GitHub account. Give permission to access GitHub
  • Click on your profile picture > settings
  • Sync your repositories on Left panel > sync account. You should see all your personal GitHub repositories
  • Activate pygeoapi; this will send you to the build dashboard
  • Every time you push to your working branch and/or master, Travis will build pygeoapi and hopefully you will get a green screen like this

Travis dashboard

The Travis configuration/implementation is defined in .travis.yml (at root level). It defines the Python versions to use for testing, the docker images needed for advanced data systems like ElasticSearch, code quality checks, etc. If you are just making small code changes to pygeoapi, you will likely not need to change anything in the file. If you are writing a full data provider implementation for database XPTO, then submit a new .travis.yml in your pull request.

Check the following tutorials:

https://imgs.xkcd.com/comics/automation.png


8. TL;DR

  • Join us on gitter
  • Fork pygeoapi into your private repository
  • Pick or create an issue number
  • Create a branch on your fork with a naming convention related to working issue
  • Create pytest and code
  • Check if tests pass, flake8 everything
  • Write the docs
  • Make a pull request
  • Update content and code based on review

Yes, the TL;DR is more or less the contributing guidelines in CONTRIBUTING.md