Ensure consistent results in the presence of "spirals" #817

Merged
merged 28 commits into from
Mar 29, 2019


sandcha
Collaborator

@sandcha sandcha commented Jan 28, 2019

  • Breaking change
  • Improve cycle and spiral detection, giving consistent results more systematically
  • Migration instructions:
    • remove every use of the optional max_nb_cycles parameter
    • avoid relying on cached values of a computation (especially wrong values)

Detailed explanation of the changes

Terminology:

  • a "cycle" describes the situation where computing an OpenFisca variable for a given period eventually triggers the computation of the same variable for the same period; this cannot be resolved, so an exception is raised and the computation is aborted;
  • a "spiral" is a related situation where computing a variable for a given period requires computing the same variable for a different period.

This PR makes that distinction more explicit by the type of exception raised.
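As a minimal sketch of that distinction (not the code of this PR), the check below inspects a list of in-progress calls and raises a different exception for each case. The names `CycleError`, `SpiralError`, `check_for_cycle` and `classify`, and the way `max_spiral_loops` is used here, are assumptions made for illustration; periods are plain strings.

```python
class CycleError(Exception):
    """Same variable requested again for the same period."""


class SpiralError(Exception):
    """Same variable requested again for a different period, too many times."""


def check_for_cycle(stack, variable, period, max_spiral_loops=1):
    """Classify a new request against the calls already in progress.

    `stack` is a list of (variable, period) pairs, outermost call first,
    and does not include the request being checked.
    """
    previous_periods = [p for (name, p) in stack if name == variable]
    if period in previous_periods:
        raise CycleError(f"{variable}@{period} depends on itself")
    if len(previous_periods) >= max_spiral_loops:
        raise SpiralError(f"{variable}@{period} spirals through {previous_periods}")


# Hypothetical calls in progress: variable5 asked for variable6 one month earlier
in_progress = [("variable5", "2019-01"), ("variable6", "2018-12")]


def classify(variable, period):
    """Return 'cycle', 'spiral' or 'ok' for a request made from the innermost call."""
    try:
        check_for_cycle(in_progress, variable, period)
    except CycleError:
        return "cycle"
    except SpiralError:
        return "spiral"
    return "ok"


print(classify("variable5", "2019-01"))  # same variable, same period
print(classify("variable5", "2018-12"))  # same variable, different period
print(classify("variable7", "2019-01"))  # variable not in progress
```

The caller can then abort on the first exception type and fall back to a default value on the second.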

Previously it was the responsibility of callers (country package authors and other reusers) to control the behaviour in the presence of spirals, by setting a limit on the number of "loops" a computation can go through. When that limit was reached, the computation was not aborted with an error but allowed to proceed after returning a default value (usually 0). Combined with the caching of computed values, this produced inconsistent results.
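The inconsistency can be reproduced with a toy model. This is an illustrative sketch, not the pre-PR Core code: the `run` helper, the integer months and the "stop at the first repeat" rule are all assumptions. The two formulas mirror the variable5/variable6 pair quoted later in this thread.

```python
FORMULAS = {
    # variable5 for month N needs variable6 for month N - 1 (period offset)
    "variable5": lambda calculate, month: 5 + calculate("variable6", month - 1),
    # variable6 for month N needs variable5 for the same month
    "variable6": lambda calculate, month: 6 + calculate("variable5", month),
}


def run(order, month=12):
    """Compute both variables in the given order, sharing one cache."""
    cache, stack = {}, []

    def calculate(name, period):
        key = (name, period)
        if key in cache:
            return cache[key]
        if any(frame[0] == name for frame in stack):
            return 0  # spiral: proceed with the default value
        stack.append(key)
        try:
            value = FORMULAS[name](calculate, period)
        finally:
            stack.pop()
        cache[key] = value  # cached even when it was derived from a default
        return value

    results = {name: calculate(name, month) for name in order}
    return results["variable5"], results["variable6"]


print(run(["variable6", "variable5"]))  # (5, 11)
print(run(["variable5", "variable6"]))  # (11, 17)
```

The same inputs give different (variable5, variable6) pairs depending on which variable is requested first, because values computed on top of a default stay in the cache and are reused by the next request.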

This PR removes this responsibility from the caller, and Core now "goes the extra mile" to detect this condition, and to behave more reliably - returning consistent results that do not depend on the order in which computations are performed.

Spirals are more complex to analyse in the absence of static determination of the dependency between variables, so a heuristic is applied in this PR: limit the number of loops to 1. In addition, values computed in a "spiral" configuration should not be cached. We have verified empirically that with these changes, the result of computations no longer depends on their ordering.

This PR effects the following changes:

  • maintain a separate stack structure that mimics the actual Python stack
  • upon exceeding the loop limit, propagate the error up that parallel stack
  • but propagate it only up to the variable for which a spiral exists, not above
  • raise an exception to exit the formula in which the error was detected
  • return the default value from this computation
  • on completion of the original computation, invalidate the cached values of all variables marked as involved in the spiral
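
The first item in the list above, a stack kept alongside Python's own, could look roughly like this. It is a sketch only: the `Tracer` class and its `frame` method are hypothetical names, and the `ValueError` stands in for whatever exception the PR raises to exit a formula.

```python
from contextlib import contextmanager


class Tracer:
    """Keeps an explicit record of the calculations in progress."""

    def __init__(self):
        self.stack = []

    @contextmanager
    def frame(self, name, period):
        self.stack.append((name, period))
        try:
            yield
        finally:
            # Runs on normal return and when an exception passes through,
            # so the explicit stack always matches the Python call stack.
            self.stack.pop()


tracer = Tracer()
snapshots = []


def inner():
    with tracer.frame("variable6", "2018-12"):
        snapshots.append(list(tracer.stack))
        raise ValueError("abort the formula")  # stand-in for the real exception


def outer():
    with tracer.frame("variable5", "2019-01"):
        try:
            inner()
        except ValueError:
            snapshots.append(list(tracer.stack))
            return 0  # stand-in for the variable's default value


result = outer()
print(snapshots)
print(result, tracer.stack)
```

Because the frame is popped in a `finally` clause, an exception that exits a formula leaves the explicit stack in the same state as the interpreter's, which is what makes it safe to inspect when deciding how far an error should propagate.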

Connected to openfisca/openfisca-france#1211

@Morendil Morendil force-pushed the simplify-cycle-detection-redux branch from 55f5e8b to 0ef0fed Compare January 29, 2019 13:46
@Morendil
Contributor

Rebased to get CI configuration with Python 3 only.

@Morendil
Contributor

Morendil commented Feb 1, 2019

Given the following variables (adapted from test_cycles.py, with max_nb_cycles annotation removed):

# 5 -f-> 6 with a period offset: a spiral
#   <---
class variable5(Variable):
    value_type = int
    entity = Person
    definition_period = MONTH

    def formula(person, period):
        variable6 = person('variable6', period.last_month)
        return 5 + variable6


class variable6(Variable):
    value_type = int
    entity = Person
    definition_period = MONTH

    def formula(person, period):
        variable5 = person('variable5', period)
        return 6 + variable5

What we want can be expressed as the following test. For simplicity assume the "spiral heuristic" is that the first time a spiral is detected, we break and return the default value. (In practice, it may have value to allow a spiral to go through a second loop.)

def test_spiral_heuristic():
    simulation = tax_benefit_system.new_scenario().init_from_attributes(
        period = reference_period,
        ).new_simulation(debug = True)
    variable6 = simulation.calculate('variable6', period = reference_period)
    variable5 = simulation.calculate('variable5', period = reference_period)
    variable6_last_month = simulation.calculate('variable6', reference_period.last_month)
    assert_near(variable5, [11])
    assert_near(variable6, [11])
    assert_near(variable6_last_month, variable6)

The rationale is as follows:

  • when we compute variable5 for month N, we first compute variable6 for the month N-1; for this, we compute variable5 for month N-1. This is a spiral so we stop it per the heuristic. We return the default value instead, 0. We add this 0 to 6, giving 6 for variable6. We add this 6 to 5, giving 11 for variable5.
  • when we compute variable6 for month N, we first compute variable5 for the month N; for this, we compute variable6 for month N-1. This is a spiral so we stop it per the heuristic. We return the default value instead, 0. We add this 0 to 5, giving 5 for variable5. We add this 5 to 6, giving 11 for variable6.
  • because we stopped a spiral, nothing should have been cached; the value of variable6_last_month should be the same as the value of variable6, since the result does not depend on any input defined for a specific period, but only on the application of computation rules.
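
The arithmetic in this rationale can be checked with a small evaluator that caches nothing and returns the default the first time a variable reappears on its own call path. This is a sketch under simplifying assumptions (months as integers, a default of 0, a hypothetical `calculate` helper), not the test from the PR.

```python
FORMULAS = {
    # variable5 for month N depends on variable6 for month N - 1
    "variable5": lambda calculate, month: 5 + calculate("variable6", month - 1),
    # variable6 for month N depends on variable5 for month N
    "variable6": lambda calculate, month: 6 + calculate("variable5", month),
}


def calculate(name, month, stack=()):
    """Evaluate without any cache; break a spiral at its first repeat."""
    if any(frame[0] == name for frame in stack):
        return 0  # default value returned in place of the spiralling call
    frames = stack + ((name, month),)
    return FORMULAS[name](lambda n, m: calculate(n, m, frames), month)


N = 12
variable5 = calculate("variable5", N)
variable6 = calculate("variable6", N)
variable6_last_month = calculate("variable6", N - 1)
print(variable5, variable6, variable6_last_month)  # 11 11 11
```

With no cache, each top-level request starts from an empty call path, so the three values match the assertions of test_spiral_heuristic whatever the order of the requests.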

The difficulty is that calculate/run_formula can at the moment only return a value or raise an exception. It cannot return a value tagged with something signifying "do not cache", because whatever it returns can be used in a formula and we do not control how it's used.
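
A small analogy shows why a tag on the returned value would not survive. Plain Python integers are used here instead of the arrays a formula really manipulates, and the `DoNotCache` class is hypothetical.

```python
class DoNotCache(int):
    """An int that carries a hypothetical 'do not cache' marker."""

    do_not_cache = True


tagged_default = DoNotCache(0)  # what a spiral-breaking call might return
result = 5 + tagged_default     # what a formula then does with it

print(type(tagged_default).__name__, hasattr(tagged_default, "do_not_cache"))
print(type(result).__name__, hasattr(result, "do_not_cache"))
```

The sum is an ordinary `int` with no marker, so the caller that caches `result` has no way to know that it was derived from a default value.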



def test_allowed_cycle():
def test_spiral_heuristic():
"""
Calculate variable5 then variable6 then in the order order, to verify that the first calculated variable
Member

other order

@benjello
Member

benjello commented Mar 7, 2019

@Morendil thanks for the details. I would like to be 100% sure. Is every spiral always stopped at the first step, or would it follow max_nb_cycles? As far as I understand, when we hit a spiral we use the default value as a heuristic, but we do not set the variable to this value for that period.

@Morendil
Contributor

Morendil commented Mar 7, 2019

@benjello I should probably have posted an update after the above comment (with test_spiral_heuristic); the real test in the PR is a bit different. But yes, the idea is that when we hit a spiral we return the default but we don't cache the result. It's the interaction of cycle detection and caching that is at the root of the problem.

@Morendil Morendil force-pushed the simplify-cycle-detection-redux branch from 33c4b8d to 04f47f5 Compare March 7, 2019 15:05
@Morendil
Contributor

Morendil commented Mar 7, 2019

Rebased.

@benjello
Member

benjello commented Mar 7, 2019

OK @Morendil, this is neat. How can we help to have this merged ASAP?
Does it cause any trouble for openfisca-france?

@Morendil Morendil force-pushed the simplify-cycle-detection-redux branch from 0927b3c to 88a16d3 Compare March 7, 2019 15:49
@Morendil Morendil changed the title from [WIP] Simplify cycle detection to Guarantee consistent results in the presence of "spirals" Mar 11, 2019
@Morendil Morendil changed the title from Guarantee consistent results in the presence of "spirals" to Ensure consistent results in the presence of "spirals" Mar 11, 2019
@Morendil Morendil force-pushed the simplify-cycle-detection-redux branch from bfa5cef to 64dec34 Compare March 14, 2019 13:10
@Morendil Morendil requested a review from fpagnoux March 14, 2019 13:13
@Morendil Morendil force-pushed the simplify-cycle-detection-redux branch from 64dec34 to b24a2fc Compare March 15, 2019 18:54
@Morendil
Contributor

Morendil commented Mar 16, 2019

I have bad news and good news. The bad news is that we are trying too hard to not cache values involved in spirals, but these variables are recomputed a lot. So this kills performance - not just a little degradation but to the point of feeling like an infinite loop. Unusable in production, anyway.

I'm reminded of the saying "There are only two hard things in Computer Science: cache invalidation and naming things." Also, much gratitude to @sandcha and @fpagnoux for the performance measurement script which identified the problem last night.

The good news is that thanks to our earlier work, and in particular representing the computation stack explicitly, we should still be able to a) solve the original problem, b) simplify the code quite a bit, c) expect some modest performance gains.

Our problem is that successive calls to calculate yield inconsistent results because of cached values. We can see this clearly on the first spiral detected in revenu_disponible:

[['revenu_disponible', '2018', True], ['pensions_nettes', '2018', True], ['chomage_net', '2018-01', True], ['chomage_imposable', '2018-01', False], ['csg_deductible_chomage', '2018-01', False], ['taux_csg_remplacement', '2018-01', False], ['rfr', '2016', False], ['rni', '2016', False], ['rng', '2016', False], ['rbg', '2016', False], ['revenu_categoriel', '2016', False], ['revenu_categoriel_tspr', '2016', False], ['traitements_salaires_pensions_rentes', '2016', False], ['revenu_assimile_salaire_apres_abattements', '2016', False], ['revenu_assimile_salaire', '2016', False], ['chomage_imposable', '2016-01', False]]

The variable rfr is collateral damage: it is caught in the spiral that involves chomage_imposable, so it is set to 0. But this does not matter during a computation. It's only a problem because we reuse the cache across computations (successive calls to calculate).
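
To make the stack dump above easier to read, the following sketch locates the spiral in it. The `find_spiral` helper is hypothetical; the frames are copied from the dump, and the third element of each frame is carried along without being interpreted.

```python
STACK = [
    ['revenu_disponible', '2018', True], ['pensions_nettes', '2018', True],
    ['chomage_net', '2018-01', True], ['chomage_imposable', '2018-01', False],
    ['csg_deductible_chomage', '2018-01', False],
    ['taux_csg_remplacement', '2018-01', False], ['rfr', '2016', False],
    ['rni', '2016', False], ['rng', '2016', False], ['rbg', '2016', False],
    ['revenu_categoriel', '2016', False], ['revenu_categoriel_tspr', '2016', False],
    ['traitements_salaires_pensions_rentes', '2016', False],
    ['revenu_assimile_salaire_apres_abattements', '2016', False],
    ['revenu_assimile_salaire', '2016', False],
    ['chomage_imposable', '2016-01', False],
]


def find_spiral(stack):
    """Return the frames from the first request of the innermost variable
    (for a different period) down to the innermost frame, or [] if none."""
    name, period = stack[-1][0], stack[-1][1]
    for index, frame in enumerate(stack[:-1]):
        if frame[0] == name and frame[1] != period:
            return stack[index:]
    return []


spiral = find_spiral(STACK)
print(spiral[0][:2], "->", spiral[-1][:2])
print([frame[0] for frame in spiral[1:-1]])  # variables caught in between
```

The spiral runs from chomage_imposable for 2018-01 to chomage_imposable for 2016-01, and rfr appears among the frames in between, while revenu_disponible, pensions_nettes and chomage_net sit above it.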

Instead of trying not to cache values involved in a spiral, we can use the same approach but clean the cache of the values afterwards. We check that the stack is empty, to detect returning from the initial call to calculate. I'm hopeful this will have the same result. (I'm pushing the commit that uses this approach, but I haven't yet run the tests for inconsistent values again, or looked at fixing any unit tests that break.)
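
A toy version of this "cache now, purge when the stack is empty" idea is sketched below. It is an illustration under stated assumptions, not the commit being pushed: the `Simulation` class, its method names, the integer months and the default of 0 are all hypothetical, and the formulas are the variable5/variable6 pair from earlier in the thread.

```python
FORMULAS = {
    "variable5": lambda calculate, month: 5 + calculate("variable6", month - 1),
    "variable6": lambda calculate, month: 6 + calculate("variable5", month),
}


class Simulation:
    max_spiral_loops = 1

    def __init__(self):
        self.cache = {}           # (name, month) -> value
        self.stack = []           # explicit stack of calls in progress
        self.invalidated = set()  # cache keys to drop once the stack is empty

    def calculate(self, name, month):
        self.stack.append((name, month))
        try:
            return self._calculate(name, month)
        finally:
            self.stack.pop()
            if not self.stack:  # back from the initial call to calculate
                self._purge()

    def _calculate(self, name, month):
        key = (name, month)
        if key in self.cache:
            return self.cache[key]
        earlier = [frame for frame in self.stack[:-1] if frame[0] == name]
        if key in earlier:
            raise RuntimeError(f"cycle on {key}")
        if len(earlier) >= self.max_spiral_loops:
            self._mark_spiral(name)
            return 0  # default value; never cached
        value = FORMULAS[name](self.calculate, month)
        self.cache[key] = value  # kept for the duration of the top-level call
        return value

    def _mark_spiral(self, name):
        # Walk up from the innermost frame to the frame where the spiral began
        count = 0
        for frame in reversed(self.stack):
            self.invalidated.add(frame)
            if frame[0] == name:
                count += 1
                if count > self.max_spiral_loops:
                    break

    def _purge(self):
        for key in self.invalidated:
            self.cache.pop(key, None)
        self.invalidated.clear()


simulation = Simulation()
first = simulation.calculate("variable6", 12)
second = simulation.calculate("variable5", 12)
print(first, second)  # 11 11
```

Intermediate values are still reused within one top-level call, which is what the earlier "never cache" attempt gave up; only the entries marked as involved in a spiral are dropped once the stack is empty, so a later call cannot pick up a value that was built on a default.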

Also, there's really no reason now to keep raising and catching exceptions, so the code should eventually look simpler; since these operations are expensive, we should also see a performance gain. This does however seem to be sensitive to the chosen value of max_spiral_loops (see below).

Performance on master:

First disposable income test
2.927500 s
Second disposable income test
2.691405 s
Third disposable income test
2.688748 s
First spiral-targeted test
1.298005 s
Second spiral-targeted test
1.320369 s
Third spiral-targeted test
1.310184 s

Performance on this PR, max_spiral_loops = 1:

First disposable income test
2.604535 s
Second disposable income test
2.420090 s
Third disposable income test
2.381879 s
First spiral-targeted test
0.774254 s
Second spiral-targeted test
0.809069 s
Third spiral-targeted test
0.809284 s

Performance on this PR, max_spiral_loops = 2:

First disposable income test
3.156583 s
Second disposable income test
2.956258 s
Third disposable income test
3.317308 s
First spiral-targeted test
1.763587 s
Second spiral-targeted test
1.686750 s
Third spiral-targeted test
1.683628 s
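
To compare the three sets of timings above, the sketch below averages the three runs of each test and expresses the result relative to master. The numbers are copied from the measurements; the dictionary layout and the `relative_to_master` helper are just one hypothetical way to organise them.

```python
from statistics import mean

# Seconds per run, in the order reported above
TIMINGS = {
    "master": {
        "revenu_disponible": [2.927500, 2.691405, 2.688748],
        "spiral": [1.298005, 1.320369, 1.310184],
    },
    "max_spiral_loops=1": {
        "revenu_disponible": [2.604535, 2.420090, 2.381879],
        "spiral": [0.774254, 0.809069, 0.809284],
    },
    "max_spiral_loops=2": {
        "revenu_disponible": [3.156583, 2.956258, 3.317308],
        "spiral": [1.763587, 1.686750, 1.683628],
    },
}


def relative_to_master(config, test):
    """Mean run time of `test` under `config`, as a fraction of master's."""
    return mean(TIMINGS[config][test]) / mean(TIMINGS["master"][test])


for config in ("max_spiral_loops=1", "max_spiral_loops=2"):
    for test in ("revenu_disponible", "spiral"):
        print(f"{config} {test}: {relative_to_master(config, test):.2f}")
```

On these figures, max_spiral_loops = 1 takes roughly 0.89 and 0.61 times as long as master on the two tests, while max_spiral_loops = 2 takes roughly 1.14 and 1.31 times as long, which is the sensitivity mentioned above.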

@Morendil
Contributor

Morendil commented Mar 16, 2019

Unfortunately Core tests are passing but a lot of France tests are broken with this… still investigating.

Later: breakage is due to a bug in Core 27.

@Morendil
Contributor

Morendil commented Mar 17, 2019

The strategy "partially invalidate the cache, removing variables caught in a spiral" runs into some issues.

Here is a simplified version of tests/formulas/rsa/rsa_2017.yaml from France:

- name: Les primes sur un mois ne sont pas moyennées
  input:
    Alicia:
      salaire_net:
        2017-01: 200
  output:
    rsa:
      2017-01: 223
    rsa_fictif:
      2016-12: 0
      2016-11: 535.17 - 200
      2016-10: 535.17 - 200

This fails with the cache invalidation changes, because the rsa_fictif output check only works if rsa_fictif is already in the cache; and because rsa_fictif is caught in a spiral of rsa, we remove it from the cache in this PR.

If you try running this test on master, but interchange rsa and rsa_fictif, the test will fail with:
TypeError: formula_2016_10() missing 1 required positional argument: 'mois_demande'

This is because of extra_params, which prevents rsa_fictif from being computed directly. But of course this breaks the implicit rules of YAML tests: the results should not depend on the order of computation, and any variable should be computable directly without supplying extra parameters.

This confirms the urgency of openfisca/openfisca-france#1284 - and that this extra_params business was really not a good idea.

@Morendil Morendil force-pushed the simplify-cycle-detection-redux branch from ce8b495 to 3acf725 Compare March 17, 2019 16:25
@Morendil
Contributor

Morendil commented Mar 17, 2019

The situation is a bit worse than I thought. This branch causes 6 errors and 12 failures in France tests. The 6 errors are due to the extra_params situation. One of the failures could be solved by max_spiral_loops=2. However, the rest (11) appear to be related to RSA.

Here is a test adapted from one of the cases in tests/formulas/revenu_disponible.yaml:

- name: f1aw_f4ba_2017
  description: Revenus fonciers (4BA) et rentes viagères à titre onéreux (1AW), 2017 (vérifie notamment la formule de revenus_nets_du_capital qui est un peu complexe pour ces deux types de revenu)
  period: 2017
  absolute_error_margin: 1
  input:
    f1aw: 30000
    f4ba: 20000
  output:
    prelevements_sociaux_revenus_capital: -7052 # -(0.7*30000 + 20000) * (0.045 + 0.02 + 0.003 + 0.005 + 0.082)
    revenus_nets_du_capital: (20000 - 7052)
    pensions_nettes: 30000
    irpp: -6593 # Montant calculé sur le simulateur de la DGFiP
    impots_directs: -6593 # Montant calculé sur le simulateur de la DGFiP
    revenu_disponible: (20000 - 7052) + 30000 - 6593
- name: f1aw_f4ba_2017
  description: Revenus fonciers (4BA) et rentes viagères à titre onéreux (1AW), 2017 (vérifie notamment la formule de revenus_nets_du_capital qui est un peu complexe pour ces deux types de revenu)
  period: 2017
  absolute_error_margin: 1
  input:
    f1aw: 30000
    f4ba: 20000
  output:
    prelevements_sociaux_revenus_capital: -7052 # -(0.7*30000 + 20000) * (0.045 + 0.02 + 0.003 + 0.005 + 0.082)
    revenus_nets_du_capital: (20000 - 7052)
    pensions_nettes: 30000
    irpp: -6593 # Montant calculé sur le simulateur de la DGFiP
    impots_directs: -6593 # Montant calculé sur le simulateur de la DGFiP
    revenu_disponible: (20000 - 7052) + 30000 - 6593
    rsa:
      2017-01: 0

Note that we are testing rsa at the end. This test is passing. However, consider the following:

- name: f1aw_f4ba_2017
  description: Revenus fonciers (4BA) et rentes viagères à titre onéreux (1AW), 2017 (vérifie notamment la formule de revenus_nets_du_capital qui est un peu complexe pour ces deux types de revenu)
  period: 2017
  absolute_error_margin: 1
  input:
    f1aw: 30000
    f4ba: 20000
  output:
    prelevements_sociaux_revenus_capital: -7052 # -(0.7*30000 + 20000) * (0.045 + 0.02 + 0.003 + 0.005 + 0.082)
    revenus_nets_du_capital: (20000 - 7052)
    pensions_nettes: 30000
    irpp: -6593 # Montant calculé sur le simulateur de la DGFiP
    impots_directs: -6593 # Montant calculé sur le simulateur de la DGFiP
    revenu_disponible: (20000 - 7052) + 30000 - 6593
- name: f1aw_f4ba_2017
  description: Revenus fonciers (4BA) et rentes viagères à titre onéreux (1AW), 2017 (vérifie notamment la formule de revenus_nets_du_capital qui est un peu complexe pour ces deux types de revenu)
  period: 2017
  absolute_error_margin: 1
  input:
    f1aw: 30000
    f4ba: 20000
  output:
    rsa:
      2017-01: 0
    prelevements_sociaux_revenus_capital: -7052 # -(0.7*30000 + 20000) * (0.045 + 0.02 + 0.003 + 0.005 + 0.082)
    revenus_nets_du_capital: (20000 - 7052)
    pensions_nettes: 30000
    irpp: -6593 # Montant calculé sur le simulateur de la DGFiP
    impots_directs: -6593 # Montant calculé sur le simulateur de la DGFiP
    revenu_disponible: (20000 - 7052) + 30000 - 6593

This test does not pass, because OpenFisca calculates an amount of 535.17 for rsa instead of 0.

This is the first computation OpenFisca performs, so it must be "the right result" for these inputs. The computed RSA of 0 must be the result of a side-effect. And in fact it turns out that rsa is set to 0 as the result of a computation aborted due to a "spiral".

(This is, expressed differently, the same issue reported by @claireleroy)

@sandcha sandcha force-pushed the simplify-cycle-detection-redux branch from 31be591 to 47afc88 Compare March 28, 2019 10:42
@Morendil Morendil merged commit 0bb1c50 into master Mar 29, 2019
@Morendil Morendil deleted the simplify-cycle-detection-redux branch March 29, 2019 09:00
5 participants