Decorators to make processing scripts easier #134

Open
NathanW2 opened this Issue Nov 26, 2018 · 29 comments

Member

NathanW2 commented Nov 26, 2018

QGIS Enhancement: Decorators to make processing scripts easier

Date 2018/11/26

Author Nathan Woodrow (@NathanW2)

Contact woodrow.nathan at gmail dot com

Maintainer @NathanW2

Version QGIS 3.6

Summary

While preparing a workshop on creating custom processing scripts, it became evident that a lot of code is needed just to get a script written, and much of it is boilerplate that could be avoided.

Most of the meat of a script is inside the initAlgorithm and processAlgorithm methods; however, a lot of code around these methods is needed just to make them work correctly, and people get lost easily.

The overall goal is to fit a full custom script on a single slide, so it's easy to demo.

Proposed Solution

New Python decorators that streamline creating a custom script, making it easier to read and understand what is going on. Decorators are also native to Python and fit this use case well.

Example(s)

from qgis.processing import alg

@alg("test", "Test script", group="workshop", group_label="Workshop")
@alg.input(type=alg.FEATURE_SOURCE, name="INLAYER", label="Input layer")
@alg.input(type=int, name="IN1", label="Distance", parent="INLAYER")
@alg.output(type=str, name="OUT2", label="Output")
def my_alg(instance, parameters, context, feedback):
    return {"OUT2": "test"}

Example with multiple outputs:

from qgis.processing import alg

@alg("test", "Test script", group="workshop", group_label="Workshop")
@alg.input(type=alg.FEATURE_SOURCE, name="INLAYER", label="Input layer")
@alg.input(type=int, name="IN1", label="Distance", parent="INLAYER")
@alg.output(type=str, name="OUT2", label="Output")
@alg.output(type=str, name="OUT3", label="Output 3")
def my_alg(instance, parameters, context, feedback):
    return {"OUT2": "test", "OUT3": 'test'}

The decorators above create a QgsProcessingAlgorithm instance ready to use. It is used in the script editor and added to the providers as normal.

@alg("test", "Test script", group="Workshop", groupid="workshop")

Defines a new QgsProcessingAlgorithm

@alg.input(type=alg.FEATURE_SOURCE, name="INLAYER", label="Input layer")

Each input takes the type as its first argument, which is translated into the matching QgsProcessingParameter* class when the decorator is called. The overall goal here is to make it easy for new users to understand, and to feel more like Python and less like C++.
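The translation from `type=` to a parameter class can be pictured as a dictionary lookup. Below is a minimal, QGIS-free sketch under stated assumptions: `ParameterString`, `ParameterNumber`, `TYPE_MAPPING` and `make_parameter` are hypothetical stand-ins, not the real QgsProcessingParameter* classes or the wrapper's actual code.

```python
from functools import partial


# Hypothetical stand-ins for QgsProcessingParameterString / ...Number,
# so the sketch runs without QGIS installed.
class ParameterString:
    def __init__(self, name, description=""):
        self.name = name
        self.description = description


class ParameterNumber:
    INTEGER = 0
    DOUBLE = 1

    def __init__(self, name, description="", type=INTEGER):
        self.name = name
        self.description = description
        self.type = type


# String constants in the style of alg.INT / alg.NUMBER / alg.STRING.
INT, NUMBER, STRING = "INT", "NUMBER", "STRING"

# Plain Python types and the constants both resolve to a factory;
# partial() pre-binds the numeric type so int and float share one class.
TYPE_MAPPING = {
    str: ParameterString,
    STRING: ParameterString,
    int: partial(ParameterNumber, type=ParameterNumber.INTEGER),
    INT: partial(ParameterNumber, type=ParameterNumber.INTEGER),
    float: partial(ParameterNumber, type=ParameterNumber.DOUBLE),
    NUMBER: partial(ParameterNumber, type=ParameterNumber.DOUBLE),
}


def make_parameter(type, name, **kwargs):
    """Look up the factory registered for `type` and build the parameter."""
    try:
        factory = TYPE_MAPPING[type]
    except KeyError:
        raise ValueError(f"Unsupported input type: {type!r}") from None
    return factory(name, **kwargs)
```

With this shape, `make_parameter(int, "IN1", description="Distance")` returns an integer `ParameterNumber`, and an unsupported type fails immediately with a readable error.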

@alg.output(type=str, name=alg.NAMES.OUT2, label="Output")

Defines an output. After discussions with @nyalldawson, we think it's best to always require at least one output, to avoid fully black-box algorithms. If no output is defined, an exception is raised.
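A minimal sketch of that rule, assuming the check runs once all decorators have been evaluated (an `end()` step, as in the ProcessingAlgFactory excerpt under Affected Files). `AlgDefinition` is a hypothetical stand-in for the wrapper; the message text is taken from the Error handling list.

```python
class ProcessingAlgFactoryException(Exception):
    """Stand-in for the exception type named in this proposal."""


class AlgDefinition:
    """Hypothetical, minimal record of one algorithm being defined."""

    def __init__(self, name):
        self.name = name
        self.outputs = {}

    def add_output(self, name, type, label):
        self.outputs[name] = (type, label)

    def end(self):
        # Called after every @alg.output has been registered.
        if not self.outputs:
            raise ProcessingAlgFactoryException(
                f"No outputs defined for '{self.name}' alg. At least one is "
                "required. Use @alg.output to set one."
            )
```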

Define parent inputs using parent="INLAYER". The parent is checked against all inputs to make sure it has already been defined.
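One way to implement that check, sketched in plain Python. Decorator arguments are evaluated top to bottom, so "already defined" means "declared on an earlier decorator line". `InputRegistry` is a hypothetical name; the error messages mirror the ones listed under Error handling.

```python
class ProcessingAlgFactoryException(Exception):
    """Stand-in for the exception type named in this proposal."""


class InputRegistry:
    """Hypothetical sketch of how add_input could validate inputs as they
    are declared; inputs are kept in declaration order."""

    def __init__(self):
        self.inputs = {}

    def add_input(self, name, parent=None, **kwargs):
        if name in self.inputs:
            raise ProcessingAlgFactoryException(f"Input {name} already defined")
        if parent is not None:
            if parent == name:
                raise ProcessingAlgFactoryException(
                    f"Input {name} can't depend on itself."
                )
            if parent not in self.inputs:
                raise ProcessingAlgFactoryException(
                    f"No input named {parent} defined"
                )
        self.inputs[name] = dict(kwargs, parent=parent)


registry = InputRegistry()
registry.add_input("INLAYER", label="Input layer")
registry.add_input("IN1", label="Distance", parent="INLAYER")
```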

Keyword arguments

All positional and keyword arguments are passed through to the matching QgsProcessingParameterDefinition subclass, meaning @alg.input accepts the same keywords as the parameter class for that type.

Note: the following arguments are renamed inside the wrapper to match the C++ arguments:

label -> description
default -> defaultValue

This is just to make them easier to use.

Example:

QgsProcessingParameterNumber (const QString &name, const QString &description=QString(), Type type=Integer, const QVariant &defaultValue=QVariant(), bool optional=false, double minValue=std::numeric_limits< double >::lowest()+1, double maxValue=std::numeric_limits< double >::max())

can be written like this:

@alg.input(type=int, name="test", label="test", default=100, optional=True, minValue=0, maxValue=999)
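A minimal sketch of the renaming step, assuming it is a plain key rename applied before the parameter class is constructed. `RENAMES` and `to_parameter_kwargs` are hypothetical names, and `type=` is assumed to have been consumed already by the type lookup, so it is left out here.

```python
# Friendlier decorator keywords -> the C++ argument names they stand for.
RENAMES = {"label": "description", "default": "defaultValue"}


def to_parameter_kwargs(**kwargs):
    """Rename the friendly keywords and pass every other keyword through."""
    return {RENAMES.get(key, key): value for key, value in kwargs.items()}


kwargs = to_parameter_kwargs(name="test", label="test", default=100,
                             optional=True, minValue=0, maxValue=999)
print(kwargs)
# {'name': 'test', 'description': 'test', 'defaultValue': 100, 'optional': True, 'minValue': 0, 'maxValue': 999}
```

The resulting keywords line up with the QgsProcessingParameterNumber signature shown above, so a helper like make_number could forward them unchanged.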

Error handling

The decorators raise exceptions for a range of problems to help users debug their scripts.

Some example exceptions:

qgis.processing.ProcessingAlgFactoryException: No input named NOOP defined
qgis.processing.ProcessingAlgFactoryException: Input INLAYER already defined
qgis.processing.ProcessingAlgFactoryException: No outputs defined for 'test' alg. At least one is required. Use @alg.output to set one.
qgis.processing.ProcessingAlgFactoryException: Input IN1 can't depend on itself. We know QGIS is smart but it's not that smart
qgis.processing.ProcessingAlgFactoryException: @alg.define() already called.

Affected Files

A new processing module in the python\qgis folder, so that we can do:

from qgis.processing import alg

and expose everything through it as a stable, public API. At the moment you have to import the processing plugin code, which really should be internal only.

New AlgWrapper and ProcessingAlgFactory classes; the factory creates and manages the instance currently being defined. These can only be created on the main thread, so hopefully there is no risk of race conditions here.

The basic shape of ProcessingAlgFactory is shown below, although this will change as the code evolves. It only illustrates how the core works:

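# AlgWrapper and ProcessingAlgFactoryException are defined elsewhere in
# this module; this excerpt only shows the factory itself.

# Registry of finished algorithm wrappers, keyed by algorithm name.
instances = {}


class ProcessingAlgFactory(object):
    # Type constants usable as type= in @alg.input / @alg.output
    STRING = str
    INT = int
    NUMBER = float
    SINK = "SINK"
    FEATURE_SOURCE = "FEATURESOURCE"

    def __init__(self):
        self._current = None

    def __call__(self, *args, **kwargs):
        # Allows @alg(...) as shorthand for @alg.define(...)
        return self.define(*args, **kwargs)

    @property
    def NAMES(self):
        return self.current.NAMES

    @property
    def current(self):
        return self._current

    @property
    def current_defined(self):
        return self._current is not None

    def _initnew(self):
        if self.current_defined:
            raise ProcessingAlgFactoryException("@alg() called twice on the same function.")

        self._current = AlgWrapper()

    def _pop(self):
        # Register the finished wrapper and reset for the next definition.
        instances[self.current.name()] = self.current
        self._current = None

    def define(self, *args, **kwargs):
        self._initnew()
        self.current.define(*args, **kwargs)

        def dec(f):
            # Applied last (outermost decorator), after every input and
            # output has been registered.
            self.current.end()
            self.current.set_func(f)
            self._pop()
            return f

        return dec

    def output(self, *args, **kwargs):
        def dec(f):
            return f

        # Registered as soon as the decorator expression is evaluated.
        self.current.add_output(*args, **kwargs)
        return dec

    def input(self, *args, **kwargs):
        def dec(f):
            return f

        self.current.add_input(*args, **kwargs)
        return dec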

define() is the main entry point for creating a new instance of the wrapper object; __call__ delegates to it in order to allow this:

@alg("test", "Test script", group="workshop", group_label="Workshop")

The inner function returned by define() is called last, because it belongs to the outermost decorator, so we use it to call the final methods on the wrapper and to make sure all the inputs and outputs are defined correctly.
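This ordering comes from Python's decorator semantics: all decorator expressions are evaluated top to bottom when the def statement runs, and only then are the resulting decorators applied bottom to top. So @alg(...) starts a definition, each @alg.input(...) / @alg.output(...) call records its parameter immediately, and the dec returned by define() runs last. The stripped-down `MiniFactory` below is hypothetical (not the real ProcessingAlgFactory) and simply logs that order.

```python
class MiniFactory:
    """Hypothetical, stripped-down factory that logs when each step runs."""

    def __init__(self):
        self.log = []
        self.current = None
        self.instances = {}

    def __call__(self, name):
        # Evaluated first: starts a new definition.
        if self.current is not None:
            raise RuntimeError("@alg() called twice on the same function.")
        self.current = {"name": name, "inputs": [], "outputs": []}
        self.log.append(f"define {name}")

        def dec(f):
            # Applied last, because it is the outermost decorator.
            self.log.append(f"end {name}")
            self.current["func"] = f
            self.instances[name] = self.current
            self.current = None
            return f

        return dec

    def input(self, name):
        # Runs while the decorator expression is evaluated.
        self.current["inputs"].append(name)
        self.log.append(f"input {name}")
        return lambda f: f  # identity: the real work is already done

    def output(self, name):
        self.current["outputs"].append(name)
        self.log.append(f"output {name}")
        return lambda f: f


alg = MiniFactory()


@alg("test")
@alg.input("INLAYER")
@alg.input("IN1")
@alg.output("OUT2")
def my_alg(instance, parameters, context, feedback):
    return {"OUT2": "test"}


print(alg.log)
# ['define test', 'input INLAYER', 'input IN1', 'output OUT2', 'end test']
```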

  • ScriptUtils.py will be changed to support reading the created instance from ProcessingAlgFactory

Examples

from qgis.processing import alg
from qgis.core import QgsFeature, QgsFeatureSink, QgsProcessingException

@alg(name="split_lines_new_style", label=alg.tr("Split lines at a given length"), group="examplescripts", group_label=alg.tr("Example Scripts"), icon=r"C:\temp\flame.png")
@alg.input(type=alg.SOURCE, name="INPUT", label="Input layer")
@alg.input(type=alg.DISTANCE, name="DISTANCE", label="Distance", default=30)
@alg.input(type=alg.SINK, name="OUTPUT", label="Output layer")
@alg.output(type=str, name="DISTANCE_OUT", label="Distance out")
def testalg(instance, parameters, context, feedback, inputs):
    """
    Given a distance, splits a line layer into segments of that length
    """
    source = instance.parameterAsSource(parameters, "INPUT", context)
    # Distances are floating point values, so read the parameter as a double
    distance = instance.parameterAsDouble(parameters, "DISTANCE", context)

    if source is None:
        raise QgsProcessingException(instance.invalidSourceError(parameters, "INPUT"))

    (sink, dest_id) = instance.parameterAsSink(parameters, "OUTPUT", context,
                                               source.fields(),
                                               source.wkbType(),
                                               source.sourceCrs())

    if sink is None:
        raise QgsProcessingException(instance.invalidSinkError(parameters, "OUTPUT"))

    # Percentage of progress contributed by each input feature
    total = 100.0 / source.featureCount() if source.featureCount() else 0
    features = source.getFeatures()
    for current, feature in enumerate(features):
        if feedback.isCanceled():
            break
        geom = feature.geometry()
        # Walk along each part, writing one substring per distance step
        for part in geom.parts():
            start = 0
            end = distance
            length = part.length()
            while start < length:
                if feedback.isCanceled():
                    break
                out_feature = QgsFeature(feature)
                out_feature.setGeometry(part.curveSubstring(start, end))
                sink.addFeature(out_feature, QgsFeatureSink.FastInsert)
                start += distance
                end += distance

        feedback.setProgress(int(current * total))

    return {"OUTPUT": dest_id, "DISTANCE_OUT": distance}
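The splitting arithmetic in the example can be checked without QGIS. `segment_ranges` is a hypothetical helper that mirrors the while loop. It clamps the last segment to the part length explicitly, whereas the script leaves that to curveSubstring, which, as far as we understand its documentation, ignores any length beyond the end of the curve. It also rejects a non-positive distance: the @alg.input call above sets no minValue, and with a distance of 0 the script's loop would never advance.

```python
def segment_ranges(length, distance):
    """Return the (start, end) distances along a part of the given length,
    mirroring the while loop in the split-lines example."""
    if distance <= 0:
        # With distance <= 0 the loop would never advance.
        raise ValueError("distance must be positive")
    length = float(length)
    ranges = []
    start, end = 0.0, float(distance)
    while start < length:
        # Clamp the final segment to the end of the part.
        ranges.append((start, min(end, length)))
        start += distance
        end += distance
    return ranges


print(segment_ranges(70, 30))
# [(0.0, 30.0), (30.0, 60.0), (60.0, 70.0)]
```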

Performance Implications

(required if known at design time)

Further Considerations/Improvements

(optional)

Backwards Compatibility

No compatibility issues, as this is only a wrapper over the normal logic.

Issue Tracking ID(s)

qgis/QGIS#8586

Votes

(required)

Member Author

NathanW2 commented Nov 26, 2018

@nyalldawson Early QEP for this while I flesh out more details.

ping @anitagraser @luipir @wonder-sk @m-kuhn and anyone else who is keen.

nyalldawson commented Nov 26, 2018

Looks beautiful to me!

Member

timlinux commented Nov 27, 2018

Nice work, great idea!

Member Author

NathanW2 commented Nov 27, 2018

@nyalldawson How do you feel about this?

@alg(name="test", label="Test script", group="Workshop", groupid="workshop")
@alg.parm(type=int, name="in1", label="in int 2")
@alg.parm(type=str, name="in3", label="in int 2")
@alg.parm(type=alg.DISTANCE, name="in4", label="in int 2")
@alg.parm(type=str, name="OUT1", label="Output")
def my_alg(instance, parameters, context, feedback):
    return {"OUT1": "WAT"}

I wonder whether we should use input and output at all, as by default they are always both anyway, and it might be confusing to the reader when you do this:

@alg("test", "Test script", group="Workshop", groupid="workshop")
@alg.input(type=alg.FEATURE_SOURCE, name="INLAYER", label="Input layer")
@alg.input(type=int, name="IN1", label="Distance", parent="INLAYER")
@alg.output(type=str, name="OUT2", label="Output")
def my_alg(instance, parameters, context, feedback):
    return {
        "IN1": "test",
        "OUT2": "test2"
    }
Member Author

NathanW2 commented Nov 27, 2018

Looking at it again, though, I think I still like the input and output methods, as they make it super clear what the expected output is. You can still return an input if needed, but I'm thinking the clearer API makes more sense here?

Feelings on that?

nyalldawson commented Nov 27, 2018

We definitely need the distinction here; otherwise it's impossible, e.g., to differentiate a string parameter that a user has to enter from a string value generated by the algorithm itself.

Member Author

NathanW2 commented Nov 27, 2018

luipir commented Nov 27, 2018

What would be the expected values for Type? Would they be the values from https://qgis.org/api/classQgsProcessingParameterDefinition.html#a9f99ab59bf4bcc1ff7cc8fe389cd6721?

Member Author

NathanW2 commented Nov 27, 2018

@luipir Yeah, so far it's done like this:

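from functools import partial

from qgis.core import QgsProcessingParameterNumber

# make_string, make_number, make_distance, make_sink and make_source must be
# defined earlier in the module, because the class body below references them.


class ProcessingAlgFactory():
    # Type constants usable as type= in @alg.input / @alg.output
    STRING = "STRING"
    INT = "INT"
    NUMBER = "NUMBER"
    DISTANCE = "DISTANCE"
    SINK = "SINK"
    SOURCE = "SOURCE"

    # Maps Python types and the constants above to parameter factories;
    # partial() pre-binds the numeric type for number parameters.
    typemapping = {
        str: make_string,
        int: partial(make_number, type=QgsProcessingParameterNumber.Integer),
        float: partial(make_number, type=QgsProcessingParameterNumber.Double),
        NUMBER: partial(make_number, type=QgsProcessingParameterNumber.Double),
        INT: partial(make_number, type=QgsProcessingParameterNumber.Integer),
        STRING: make_string,
        DISTANCE: make_distance,
        SINK: make_sink,
        SOURCE: make_source
    }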

@NathanW2 NathanW2 closed this Nov 27, 2018

Member Author

NathanW2 commented Nov 27, 2018

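With those make functions looking like this:

from qgis.core import QgsProcessingParameterNumber

def make_number(**kwargs):
    # Forward all keyword arguments (name, description, type, ...) unchanged
    return QgsProcessingParameterNumber(**kwargs)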

etc

@NathanW2 NathanW2 reopened this Nov 27, 2018

Contributor

rduivenvoorde commented Nov 28, 2018

@NathanW2 Good idea! As someone who hasn't created an algo for some time:

Is the first param (of @alg) the id and the second the comment? Would it be an idea to also add keywords for those (optionally, e.g. in the examples), just to make it super clear for newbies? Or is that too much?

Should we also add (or even require?) a description, both for the algo itself and as part of the inputs/outputs (preferably translatable, of course ;-)?

Thanks Nathan for this already!

Member Author

NathanW2 commented Dec 3, 2018

@rduivenvoorde Yep, I will update some of the examples to show what is possible, and it will throw exceptions if no description is set.

Member Author

NathanW2 commented Dec 3, 2018

Implemented in: qgis/QGIS#8586

Member Author

NathanW2 commented Dec 3, 2018

As part of implementing the changes, the workshop script goes from 115 lines down to 50 using this new method, including the changes from @nyalldawson to make looping over parts easier.

giohappy commented Dec 3, 2018

Thanks @NathanW2. It closely resembles the way params were defined in QGIS 2, but now it's true, standard code.

Have you ever considered the "builder" pattern (maybe with chaining)?

Member Author

NathanW2 commented Dec 3, 2018

giohappy commented Dec 3, 2018

Something like:

params = alg.ParamsBuilder()
params.addInput(type=alg.FEATURE_SOURCE, name="INLAYER", label="Input layer")
params.addOutput(type=alg.SINK, name="OUTPUT", label="Output layer")
params.build()

@alg(name="test", label="Test script", group="Workshop", groupid="workshop", parameters=params)
def my_alg(instance, parameters, context, feedback):
    return {"OUTPUT": dest_id}

And if each method returns the builder instance, you can chain the calls:

params = (alg.ParamsBuilder()
          .addInput(type=alg.FEATURE_SOURCE, name="INLAYER", label="Input layer")
          .addOutput(type=alg.SINK, name="OUTPUT", label="Output layer"))

(....)

It seems cleaner to me than that long list of decorators, and the validation can be done at the builder's build() stage.
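To make the builder suggestion concrete, here is a QGIS-free sketch of a chaining builder that defers all validation to build(). `ParamsBuilder` follows the naming in the comment above, but it is hypothetical and not a QGIS API; the two checks in build() are illustrative assumptions.

```python
class ParamsBuilder:
    """Hypothetical builder in the style suggested above; not a QGIS API."""

    def __init__(self):
        self._inputs = []
        self._outputs = []

    def addInput(self, name, **kwargs):
        self._inputs.append(dict(kwargs, name=name))
        return self  # returning self is what makes chaining possible

    def addOutput(self, name, **kwargs):
        self._outputs.append(dict(kwargs, name=name))
        return self

    def build(self):
        """Validate everything in one place, once all parameters are known."""
        names = [p["name"] for p in self._inputs + self._outputs]
        duplicates = sorted({n for n in names if names.count(n) > 1})
        if duplicates:
            raise ValueError(f"Duplicate parameter names: {duplicates}")
        if not self._outputs:
            raise ValueError("At least one output is required")
        return {"inputs": list(self._inputs), "outputs": list(self._outputs)}


params = (
    ParamsBuilder()
    .addInput("INLAYER", type="FEATURE_SOURCE", label="Input layer")
    .addOutput("OUTPUT", type="SINK", label="Output layer")
    .build()
)
```

This shows the trade-off discussed in the thread: the builder validates once in build(), whereas the decorator approach validates each parameter as its decorator expression is evaluated.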

Member Author

NathanW2 commented Dec 3, 2018

luipir commented Dec 3, 2018

The only advantage I see in using the Builder pattern instead of the Decorator pattern (which is also a GoF pattern) is that the params constructor can be inherited... IMHO in 99% of cases this feature is not used. The only code I've seen that could use it is the LAStools processing provider, to specify base generic parameters, but I suspect that introducing this pattern wouldn't trigger a refactoring of providers.
@giohappy are you planning to develop complex providers that would take advantage of the proposed pattern?

I'm +0.

Member Author

NathanW2 commented Dec 3, 2018

giohappy commented Dec 3, 2018

OK @NathanW2, your last comment convinced me 😉
@luipir I don't have any plans right now, but that's one of the reasons I generally prefer builders. Anyway, it can be extended in the future, or the full class can be used...

volaya commented Jan 14, 2019

Nice!

Two minor comments (sorry for arriving late to this):

- Why an exception if an output is not defined? That is still a valid algorithm...
- Do you plan on having corresponding decorators for feature-based algorithms (so they create an algorithm that extends QgsProcessingFeatureBasedAlgorithm)?

Awesome work!

nyalldawson commented Jan 14, 2019

@volaya I discussed this with @NathanW2 prior to merge, but the thinking here is that EVERY algorithm should have at least one output. Even if the algorithm doesn't create a value, it should still output something, e.g. a "delete file" algorithm should output the name of the file deleted, or whether the file was successfully deleted. I can't think of any algorithms which shouldn't have outputs.

nyalldawson commented Jan 14, 2019

@volaya Oops, sent early. I was also going to say that we really want to encourage developers to add as MANY outputs as possible to algorithms, because the more outputs are available, the more powerful the expressions that use these output values can become.
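To illustrate the "delete file" case, here is a plain-Python sketch of such an algorithm body that still reports outputs. `delete_file` and the output names FILE and DELETED are hypothetical; in an @alg script each would be declared with @alg.output and returned in the results dict.

```python
import os


def delete_file(path):
    """Hypothetical body of a 'delete file' algorithm that still reports
    outputs: the path it was asked to delete and whether it succeeded."""
    deleted = False
    if os.path.isfile(path):
        os.remove(path)
        deleted = True
    return {"FILE": path, "DELETED": deleted}
```

Even when nothing is "created", both values are available to later steps, which is the point made above.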

havatv commented Jan 29, 2019

Being able to create Python scripts in a simple way was a great asset for users of QGIS 2. Without this wrapper, one has to extend QgsProcessingAlgorithm, and that will be a barrier for most of our users. It would therefore be very convenient to have this included in the LTR (3.4). Is that possible?

Member Author

NathanW2 commented Jan 29, 2019

Any objections to me backporting this into 3.4 LTR next Monday?

cc @nyalldawson @wonder-sk @nirvn @luipir @volaya

nyalldawson commented Jan 29, 2019

@NathanW2 Yes, I'd like to delay this for at least one cycle so that we have time to cement the API before making it widely available.

E.g. we ideally need a solution to allow use of these simpler algorithms within a custom provider instead of always being inside the script provider. I think there may also be an issue with forcing outputs and not detecting that some inputs also automatically create outputs (e.g. a sink input should be sufficient on its own). But I'll keep investigating that one...

I'd rather take it slow and make the API rock solid first.
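One possible shape for the refinement mentioned above, sketched without QGIS: destination inputs such as a sink count as implicit outputs, so the "at least one output" rule is satisfied without an explicit @alg.output. `DESTINATION_TYPES` and `effective_outputs` are hypothetical, and which parameter types should count as destinations is an assumption here.

```python
class ProcessingAlgFactoryException(Exception):
    """Stand-in for the exception type named in the proposal."""


# Assumption: input types that create an output by themselves.
DESTINATION_TYPES = {"SINK"}


def effective_outputs(alg_name, inputs, outputs):
    """Return all output names, counting destination inputs (such as a
    feature sink) as outputs; raise if there are none at all.

    inputs and outputs map parameter names to their declared types."""
    implicit = [name for name, kind in inputs.items() if kind in DESTINATION_TYPES]
    names = list(outputs) + implicit
    if not names:
        raise ProcessingAlgFactoryException(
            f"No outputs defined for '{alg_name}' alg. At least one is required."
        )
    return names
```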

Member Author

NathanW2 commented Jan 29, 2019

Member

anitagraser commented Mar 2, 2019

Is the exception for a missing output parameter actually implemented now? I've stripped the example above down to the bare-minimum template (without an output, to mimic the template I had before: https://anitagraser.com/2019/03/02/easy-processing-scripts-comeback-in-qgis-3-6/) and don't see an exception anywhere.
