Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yaml parameters #387

Closed
wants to merge 25 commits into from
Closed

Yaml parameters #387

wants to merge 25 commits into from

Conversation

laem
Copy link

@laem laem commented May 25, 2016

This PR is not ready (review the code at your own risk 😁 ), but exists to expose and discuss the new functionnality

The problem and possible solutions were exposed here. In short : how to throw preprocessing.py away and make the prélèvements sociaux parameters more readable.

Following is a list of proposed features.

YAML parameters

  • does not force escaping < in strings
  • is less verbose and cleaner than XML (using the compact JSON like syntax for leaves if needed)
  • is less scary for non coders, feels more like natural language (don't quote everything...) @benjello can you confirm this ?
  • easier to parse (not certain about this one)

No hierarchy

You can no longer write hierarchical parameters. Hierarchy is heavily used in the prélèvements sociaux domain in openfisca-france--leading to an awful preprocessing.py, hindering programmatic and non-initiated human intervention.

The new VAR node type can replace hierarchy in some cases.

Conditions in parameters (VAR)

Lots of formulas are just operating a switch on some trivial condition towards alternative parameters, e.g. :

class participation_effort_construction

if (effectif_entreprise > 20)
   parameters.participation_effort_construction_plus_de_20 * assiette
else 
   parameters.participation_effort_construction * assiette

This is confusing, it introduces linking-names that are just rules stringified, and prevents contributors from modifying parameters with certainty without diving into python-numpy code.

The proposed syntax is :

- variable: fillon_taux_max
  description: Taux maximum de l'allègement général
  VAR:
    - condition: "20 <= effectif_entreprise"
      VALUES:
        2018-01-01: .2850
        2017-01-01: .2850
    - condition: "effectif_entreprise < 20"
      VALUES:
        2018-01-01: .2810
        2017-01-01: .2810

Condition nodes are just an eval of numpy arrays, and should probably be constrained to a set of possibilities (using regexps ?) to inform the parameter contributor about what's allowed. Nevertheless, I favor good documentation over raising errors to inform.
See variable_parameters.yaml for more examples.

New parameter models

The prélèvements sociaux parameters are making a heavy use of BAREME parameters, even for simpler "cotisations" that only have one tranche.
Hence the new LIN parameter model, implementing a rate*base transformation, possibly with a threshold (it really is a shortcut for a BAREME with a single tranche).

I really think that variables sharing similar computations should lead to the creation of new parameter models.

  • It makes code declarative
  • It's more readable (obscure intermediate variables are gone)
  • Instead of serving just participation_effort_construction: 5.67, this could enable more detailed responses :
participation_effort_construction: 5.67
details: 
  rate: .008
  base: 2300

Short date notation

  • Introduce a sweeter syntax for writing the temporal collection of values.
  • Make the fuzzy implicit : requesting a date more recent than the last parameter will return the last value. If this behaviour does not fit your needs, the old syntax (raising an error in this case) is still available.

XML

<VALUE deb="2016-01-01" fuzzy="true" valeur="9.67" />
<VALUE deb="2015-01-01" fin="2015-12-31" valeur="9.61" />

YAML

VALUES:
  - {deb: 2016-01-01, fuzzy: true,     valeur: 9.67}
  - {deb: 2015-01-01, fin: 2015-12-31, valeur: 9.61}

Short YAML

VALUES:
  2016-01-01: 9.67
  2015-01-01: 9.61

Extensions

YAML parameters (especially without hierarchy) should make it easier to use extensions (no NODE.children pollution). @cbenz can you confirm ?


Next steps

  • Most of this is implemented in this branch, but running core's make test should raise errors.
    The verification of parameters in the parsing step is also way less developed than currently.
  • Introduce a sub class TemplateVariable where the arguments of get_parameter are given as class properties (or computed automatically)
  • I'll try and convert the prélèvements sociaux parameters to YAML, and this will certainly shed light on some limitations of these new approaches
  • Use these parameter models to explicit the computation of a variable in the API response, as illustrated above. This lets user interfaces show these details to the (advanced) user (-> more insight, more trust, more precise error reporting).
  • Constraint VAR conditions to a set of possibilities ?

And all the TODOs in the code.

Questions

  • No possible, hierarchy prevents you from grouping simple parameters of a variable (e.g. two different rates). The VAR node may not be sufficient.
  • If the conversion of prélèvements sociaux to YAML is convincing, this could lead to XML and YAML parameters coexisting at least for a while (or more, if all of this turns out to be useseless for other legislation domains). I'm not sure a better path exists for moving software with such a wide scope.
  • Formulas could evolve into two categories : declarative-yaml ones like LIN, and ad-hoc code. It looks to me like an inevitable and virtuous path for it could enable non-coders to read and update these simpler formulas.

@benjello
Copy link
Member

Very good job @laem !

A few comments though:

  • I confirm that YAML code is more readable.
  • I like the the way the condition is expressed. Implementing it may raise some problems. Should the name appearing in the condition string be a variable of the tax benefit system or just a temporary variable appearing in the formula. I do not have any complicated example in mind but it can be tricky.
  • I am concerned about the hierarchy since it is not avoidable to think at the legislation as a tree (when coding the formulas). This can be implemented using directories and subdirectories (for example the one resulting from the parsing of the barèmes IPP) and keep some hierarchy in the YAML files. A rule should be found.
  • I do not understand the coding implications of using YAML files instead of XML so @cbenz (@eraviart ?) feel free to comment.
  • The scripts translating IPP YAML files to XML parameters should be rewritten but that should be easy.

Looking forward to see the implementation in openfisca-france ;-)

@laem
Copy link
Author

laem commented May 25, 2016

I like the the way the condition is expressed. Implementing it may raise some problems. Should the name appearing

At the moment, formulas are passed along with the parameter call in the formula. This idea is that only formulas can be used in conditions, and conditions could directly trigger simulation.calculate calls in the future (but periods...)

I am concerned about the hierarchy since it is not avoidable to think at the legislation as a tree (when coding the formulas).

In my opinion trees can be avoided by using separate parameter files, VAR nodes, tags and a web view (coming next).

I do not understand the coding implications of using YAML files instead of XML

It's currently implemented as a totally separated process in the new file parameters.py

The scripts translating IPP YAML files to XML parameters should be rewritten but that should be easy.

I haven't yet looked at the YAML files generated from the XLS files, but merging them at this level could be far easier.

Looking forward to see the implementation in openfisca-france ;-)

This test file implements an example of using parameters in formulas on real -france use cases.

@benjello
Copy link
Member

About your call to the parameter plafond_securite_sociale please note that a legislative is not always linked to a tax benefit system variable.

@laem
Copy link
Author

laem commented May 25, 2016

About your call to the parameter plafond_securite_sociale please note that a legislative is not always linked to a tax benefit system variable.

Do you mean that parameters are not necessarily tied to a unique variable ? In this case, no worries, you can of course get whatever parameter you want by name. But in most of the cases in my experience, they are, so this shortcut helps reduce useless code.

2015-01-01: 3170
# Asking for a parameter for the date before the last will
# raise a ParameterNotFound error
# TODO: Or should it return 0 ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or return the next known value ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes of course it's a possibility

@benjello
Copy link
Member

@fpagnoux @laem : what @laem calls a variable in his YAML files is could be called parameter or legislative_parameter

# Values should be written in descending ordered :
# what's most important today first
VALUES:
# Fuzzy is explicit :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@fpagnoux
Copy link
Member

This looks very promising, thanks @laem for this work !

I'm gathering my thoughts and I'll come back ;).

description: SMIC horaire brut
format: float
type: monetary
VALUES:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's with the casing?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just me transcribing caps locked XML... and I was thinking that the transformation type (here just VALUES, there BAREME and LIN-whatever) helps understand what the parameter is about quicker.

@fpagnoux
Copy link
Member

fpagnoux commented May 26, 2016

There is definitely a lot of good things in this. Here are my thoughts and interrogations :

  • Writing the parameters in yaml makes them way more readable and editable! I think we will still need some hierarchy to keep the parameters organized, but I'm pretty sure having a few folders and subfolders will do the job, without having all the weight of the xml.

  • Apart from the format, it seems that what we are addressing here is not that much about parameters, but more about how to build reusable patterns of computations. The BAREME was one of them, and it seems we need other ones, in a clearer structure.

    My vision of the problem is that some variables are using the same computation pattern with different sets of parameters.

    I don't think it should be the role of getParameters to know all these patterns and to do the computation.

To take a simple example:

class agffc(Variable):
    label = u"Cotisation retraite AGFF"

    def formula(self, simulation, period):
            base = simulation.calculate('salaire', period)
            pmss = simulation.getParameter('pmss', instant)

Instead of :

        return period, self.getParameter(
            instant,
            base_options={'base': base, 'factor': pmss}
            )

I'd prefer to have something like :

        return period, MarginalRateBaremeTemplate(
            bareme = simulation.getParameter('agffc') # or self.getParameter() if you want to have it transparent.
            instant = period.start,
            base_options={'base': base, 'factor': pmss}
            )

where MarginalRateBaremeTemplate would be a class or function defined somewhere and accessible. The argument it takes would be defined there, and depends on the template.

The getParameters would just do as it says, get the parameter.

The specific behaviour would be in its own place, more discoverable and explicit. And it would be easier to create new templates (no need to modify a core function).

The parameters would be roughly the same than in what you suggest, but maybe with less details (it would juste define a bareme and it would be up to the template to define what kind of computation we do).

@laem
Copy link
Author

laem commented May 26, 2016

Apart from the format, it seems that what we are addressing here is not that much about parameters, but more about how to build reusable patterns of computations

Yes !

We have two principles for writing formulas :

Ad-hoc code
python classes are responsible for computations by defining ad-hoc code (e.g. manipulating dates then returning a vector, that's what python is good at) for complicated and non-reusable computations.

Reusable code
a) - python classes should explicitely call a templated operation (bareme, rate multiplication...) if it exists, passing arguments coming from local simulation.calculate() calls.

b) - python classes are not used anymore for reusable computations that can reasonably be expressed in YAML (or whatever declarative), which will behind the scenes call the necessary simulation.calculate.

My examples here are halfway between a) and b), but I'm convinced we should explore the latter.

The dramatic drawback of approach a) to me is that your formula parameters are always only half of the story : "Here are some numbers related to agffc, but you won't be able to trust them or edit them without diving into the python-numpy code (because it can switch on conditions, modify the base of the bareme, etc.) that you'll find somewhere, hopefully with a similar name".
If the security of this approach is prefered, since it excludes non-coders, I don't see a reason to keep params in separated files (where the formula's hierarchy is reproduced anyway except of course for those that are shared).
Similarly, declarative-only formulas could be easily ( 😁 or almost) edited in a web interface.

@fpagnoux
Copy link
Member

The risk I see with declarative formulas, totally out of the python code, is that it will work for some easy cases (like cotisations), but clearly we won't be able to do it for more complicated formulas. So we'll end up having two codebases: one in yaml, another in python.

I'm sure we'll still be able to run products like mes-aides or embauche, confined to their own respective domain.

But for developers, it will be hard to understand these two different world. They will probably work in only one of them, and not be able to do much on the other one. And thus we will be giving up the idea of a single product to make the French tax and benefit system readable and executable.

@fpagnoux
Copy link
Member

Are there any news on this PR ? I'm pretty impatient about the yaml parameters ;)

If this is a big piece of work, is there a way we could split it and merge some of it, in a lean approach, to start benefiting from the enhancement ?

@laem
Copy link
Author

laem commented Jul 13, 2016

I'm investigating more expressive YAML syntax for writing variables and a UI to explore them in a different repo.
I still think this branch helps writing and reading parameters, but that's unfortunately not enough for a substantial improvement of the 'cotsoc' domain.

What's missing ?

  • Potential conflicts with @fpagnoux 's metaclass removal (minor)
  • MarginalTaxRate are the only bareme implemented
  • Testing the code, which is working in -core, on real formulas...

@laem
Copy link
Author

laem commented Jul 13, 2016

@fpagnoux We can take a few hours next week (thursday 21 afternoon or friday, or the week after) to see of it fits to the prestations sociales domain and maybe try to convert one or two formulas.
@benjello @cbenz

@MattiSG
Copy link
Member

MattiSG commented Mar 1, 2017

No changes for over 8 months, and still “this PR is not ready (review the code at your own risk 😁 ), but exists to expose and discuss the new functionality” and CI doesn't pass. This seems very monolithic and, even though there was enthusiasm on the switch to YAML, it looks like there was no actionable consensus reached.

In order to increase the readability of the current active work in progress, I close this PR since there is no active development on it.

@MattiSG MattiSG closed this Mar 1, 2017
@guillett guillett deleted the yaml_parameters branch November 15, 2018 07:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants