Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in TaxBrain file upload logic #578

Closed
martinholmer opened this issue Jul 2, 2017 · 43 comments
Closed

Regression in TaxBrain file upload logic #578

martinholmer opened this issue Jul 2, 2017 · 43 comments
Assignees
Labels

Comments

@martinholmer
Copy link
Contributor

The following message was received from TaxBrain when uploading reform and assumption files that have been successfully uploaded for many months (see the history of the tax-calculator/file-upload-tests/README.md file). First the error and then the uploaded files.

image

The uploaded files are r1.json and a0.json in the tax-calculator/file-upload-tests directory. The above error is generated even though the START YEAR parameter on the TaxBrain file-upload page is set to 2013, which means I was asking for results for the ten years from 2013 through 2022.

It would seem the problem is with TaxBrain because using the taxcalc 0.9.0 package on my local computer runs just fine:

iMac2:file-upload-tests mrh$ conda list taxcalc
# packages in environment at /Users/mrh/anaconda:
#
taxcalc                   0.9.0                    py27_0    local

iMac2:file-upload-tests mrh$ tc puf.csv 2022 --reform r1.json --assump a0.json 
You loaded data for 2009.
Tax-Calculator startup automatically extrapolated your data to 2022.

iMac2:file-upload-tests mrh$ ls -l puf-22-r1-a0.csv 
-rw-r--r--  1 mrh  staff  8802722 Jul  2 17:24 puf-22-r1-a0.csv
iMac2:file-upload-tests mrh$ 

And the full set of local tests runs just fine:

iMac2:file-upload-tests mrh$ ./cli-results.sh 
STARTING WITH CLI-RESULTS : Sun Jul  2 17:08:08 EDT 2017
outfile,inctax,paytax= puf-22-r1-a0.csv 1823.04  1466.41
outfile,inctax,paytax= puf-22-r1-a1.csv 1775.99  1439.72
outfile,inctax,paytax= puf-22-r1-a2.csv 1800.40  1454.31
FINISHED WITH CLI-RESULTS : Sun Jul  2 17:09:22 EDT 2017

@MattHJensen @PeterDSteinberg @brittainhard @jbcrail

@brittainhard
Copy link
Contributor

@martinholmer will take a look at this sometime today.

@martinholmer
Copy link
Contributor Author

@brittainhard said:

will take a look at this [#578] sometime today.

Thanks. Now that we have main channel version of taxcalc 0.9.0 package, I can confirm that using that package on my computer produces results without error.

@brittainhard
Copy link
Contributor

@martinholmer looking at this, the source is definitely taxcalc and not webapp.

The main things I have noticed are this:

  1. The error shows up regardless of what year you select for the file upload.
  2. Webapp uses run_nth_year_taxcalc_model and thendropq_calculate. It does not use TaxCalcIO or the CLI. What would be causing the discrepancy, I don't know.
  3. In growfactors.csv, the max year is indeed 2026.
  4. dropq_calculate uses default arguments for Growfactors, and so is using this file.

Based on this, I think the place to look is in dropq_calculate in dropq_utils.py, and not in webapp. Let me know if you disagree.

@martinholmer
Copy link
Contributor Author

@brittainhard said:

Looking at this [TaxBrain issue #578], the source [of the TaxBrain crash] is definitely taxcalc and not webapp.

If that's the case, I'll be happy to fix Tax-Calculator. But I need from you a bug report that shows the error being generated by Tax-Calculator (not TaxBrain) code. I suggest the best way to do this is for you to create a Tax-Calculator pull request that adds a new test to the test_dropq file, and show that this new test fails for the same reason as TaxBrain failed in issue #578.

@MattHJensen @PeterDSteinberg

@martinholmer
Copy link
Contributor Author

@brittainhard, Just to make my view about the #578 bug clear: I think the four things you noticed do not provide any information about the location (taxcalc vs webapp) of the fatal TaxBrain error reported in issue #578. I do agree with you that the error could be coming from taxcalc, or it might be coming from webapp code, I don't know. Our problem is we don't know where to look for the coding problem. My suggestion (to create a new test in test_dropq.py that generates the same error as generated by TaxBrain) is meant to find out whether or not your hunch about the location of the code is correct. You should be able to create such a failing test easily if your hunch is correct. If you can't create such a failing test, then it seems to me that would point to the webapp code as the location of the bug.

@brittainhard
Copy link
Contributor

@martinholmer i'll set up a PR for that test.

@martinholmer
Copy link
Contributor Author

@brittainhard said six days ago in #578:

I'll set up a [Tax-Calculator] PR for that test [that shows a bug in dropq].

I have the time to fix that bug now before the next release. Can you submit a test that produces the error observed on TaxBrain?

The dropq code has complete (100%) code coverage, but it may be that the tests are not strong enough to detect the bug you believe is in the dropq code.

@brittainhard
Copy link
Contributor

@martinholmer My schedule is very tight right now. Currently preparing for the presentation tomorrow. I will get to it asap. Can you try to replicate this bug on your end?

@MattHJensen
Copy link
Contributor

MattHJensen commented Jul 14, 2017

@brittainhard, why don't you work on the replication w tax-calculator locally (not via webapp) after the presentation is over with. Could be valuable experience working w tax-calculator/dropq. Also, if @martinholmer looks into this, he could end up on a wild goose chase if the bug turns out to be in webapp-public.

@brittainhard
Copy link
Contributor

@MattHJensen sounds good.

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 19, 2017

I tried to upload the Clinton2016.json reform file and received the following error:

screen shot 2017-07-19 at 3 47 24 pm

@PeterDSteinberg
Copy link
Contributor

@hdoupe My first thought is to double check you have the latest reform file format. There are changes periodically and I have in the past thought I found a bug, but then realized I was using an out-of-date version of the reforms despite appearing to be what I wanted. Please disregard if you're sure you're on the right version.

@MattHJensen - it would be good to put standard upload tests like Clinton/Trump tax plans in this Tax-Calculator folder or elsewhere standardized if that has not been done already. Developers affecting the reform file format should be aware they need to update the example reforms we commonly use.

@martinholmer
Copy link
Contributor Author

@hdoupe said on July 19, 2017:

I tried to upload the Clinton2016.json reform file and received the following error:
500 Internal Server Error!

That's interesting because this would seem to be yet another regression in TaxBrain.

When I tried what you just did 17 days ago, the upload of just a reform file worked fine.
What didn't work then was uploading both a reform and an assumption file.

So, even though TaxBrain is still marked as the same version as it was then (0.9.0), it would seem as if somebody has "updated" TaxBrain without testing it. So, now we have an even more broken version of TaxBrain being presented to the public.

@MattHJensen @PeterDSteinberg @brittainhard

@PeterDSteinberg
Copy link
Contributor

@martinholmer @hdoupe Please post the contents of the reform files in question. Is the reform file system also broken for the Trump reform commonly used? If so, please post broken and working Trump/Clinton reform files or email them to the three Continuum staff. It's hard to follow up when there are changing versions in question, changing reforms specs, and potentially out-of-date files being tested against new code.

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 19, 2017

// Title: Select Provisions from the Clinton Tax Plan
// Reform_File_Author: Cody Kallen
// Reform Reference: https://www.hillaryclinton.com/briefing/factsheets/2016/01/12/investing-in-america-by-restoring-basic-fairness-to-our-tax-code/
// Reform Description:
// - 4% surcharge on AGI above $5 million ($2.5 million for married filing separately) (1)
// - Establishment of the "Fair Share Tax" at a 30% rate (aka the Buffett rule) (2)
// - Limit the tax value of itemized deductions to 28% (3)
// - Expansion of the Child Tax Credit and Additional CHild Tax Credit for children under 5 (4)
// - Including more business income in the base for the Net Investment Income Tax (5)
// Reform_Parameter_Map:
// - 1: _AGI_surtax*
// - 2: _FST_AGI_trt
// - 3: _ID_BenefitCap*
// - 4: CTC_c_under5_bonus, ACTC_rt_bonus_under5family
// - 5: NIIT_PT_taxed
{
"policy": {
"_AGI_surtax_trt":
{"2017": [0.04]},
"_AGI_surtax_thd":
{"2017": [[5.0e6, 5.0e6, 2.5e6, 5.0e6, 5.0e6]]},
"_FST_AGI_trt":
{"2017": [0.3]},
"_ID_BenefitCap_Switch":
{"2017": [[true, true, true, true, true, true, false]]},
"_ID_BenefitCap_rt":
{"2017": [0.28]},
"_CTC_c_under5_bonus":
{"2017": [1000]},
"_ACTC_rt_bonus_under5family":
{"2017": [0.3]},
"_NIIT_PT_taxed":
{"2017": [true]}
}
}
// Note: Due to lack of detail, data, or modeling capability, many provisions cannot be scored.
// These omitted provisions include:
// - New structure on capital gains taxes:
// - 39.6% if held less than 2 years
// - 36% if held 2-3 years
// - 32% if held 3-4 years
// - 28% if held 4-5 years
// - 24% if held 5-6 years
// - 20% if held 6 years or longer
// - Limit contributions to tax-deferred and tax-free retirement accounts
// - Limit tax benefits of like-kind exchanges
// - Reduce estate tax exemption to $3.5 million ($7 million for couples)
// - Progressive estate tax rates, with top rate of 65 percent
// - Tax unrealized gains at death for high-income taxpayers
// - Tax carried interest at ordinary tax rates
// - A 20% credit for caregiver expenses
// - Up to $5000 in tax relief for excessive health care costs
// - Financial risk fee on large banks
// - Tax on high-frequency trading
// - Expand small business expensing (Section 179)
// - Expand cash expensing for small businesses
// - Expand start-up deduction, create a small business deduction
// - Expand ACA credit for small businesses
// - Create business tax credits for profit-sharing and apprenticeships
// - Repeal the Cadillac Tax on high-cost health plans
// - New rules to prevent corporate inversions
// - Limit net interest deduction to share in consolidated financial statements
// - "Exit tax" on US multinationals that become foreign residents

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 19, 2017

This is the Clinton2016.json refrom file as found in the Tax-Calculator repo.

I tried several of the reform files inlcuding the Trump2017.json file and all produced the same error.

@brittainhard @PeterDSteinberg @jbcrail I will email the files to you but they should be the same ones that are in the Tax-Calculator repo.

@martinholmer
Copy link
Contributor Author

@PeterDSteinberg, All the reform files that are under discussion in TaxBrain issue #578 are in one of two directories in the Tax-Calculator repo:

tax-calculator/file-upload-tests

and

tax-calculator/taxcalc/reforms

None of these reform files has changed over the course of the past several months. The only thing that has changed is TaxBrain's (in)ability to handle them.

@brittainhard
Copy link
Contributor

@martinholmer @hdoupe @PeterDSteinberg the app hasn't been rebuilt in a while. I think I found the thing that was causing that 500 error. Going to do a push to the test app with the Clinton reform file.

@brittainhard
Copy link
Contributor

@hdoupe try putting that file in on the test app, I got it to work by doing a push to the test app: http://ospc-taxes7.herokuapp.com/taxbrain/1355/

Going to update the main app. Can you send me a link of that failed file upload with the 500 error?

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 19, 2017

@brittainhard I just tried it on the test app and got the following error
ospc-taxes7.herokuapp.com/taxbrain/file/?start_year=2017

screen shot 2017-07-19 at 5 07 31 pm

Here's the link that is showing above the error page. It will just take you back to the file upload page so I don't think it will help too much.
https://www.ospc.org/taxbrain/file/?start_year=2017

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 19, 2017

@brittainhard That was produced by doing the quick calculate option. I'm trying theshow me the results option now.

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 19, 2017

@brittainhard The additional_data keyword is supplied in line 262 of file_input function in views.py:

 submitted_ids, max_q_length = dropq_compute.submit_dropq_small_calculation(
     reforms,
     int(start_year),
     is_file=True,
     additional_data=assumptions
)

when dropq_compute.submit_dropq_small_calculation does not have a keyword additional_data.

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 19, 2017

@brittainhard The full calculation works.

http://ospc-taxes7.herokuapp.com/taxbrain/1358/

@brittainhard
Copy link
Contributor

@hdoupe thanks, I'm just about to put up a PR for that :).

@brittainhard
Copy link
Contributor

@hdoupe @martinholmer fix for quick calc is up on the test app. Looking into the other issue now.

@brittainhard
Copy link
Contributor

@martinholmer I took another dive into the code and I ended up in taxcalc again I noticed this line:
https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/policy.py#L52

It looks like its setting the default max years to 14 instead of 13, which may account for why the app is trying to calculate for year 2027. I fooled around with those values and only got more errors.

Can you take a second look there? I'm still tracking down the bug.

@martinholmer
Copy link
Contributor Author

@brittainhard said about TaxBrain bug report #578, which was filed over three weeks ago:

I took another dive into the code and I ended up in taxcalc again I noticed this line:
https://github.com/open-source-economics/Tax-Calculator/blob/master/taxcalc/policy.py#L52

Line 52 in the policy.py file is the last line of the following lines:

    DEFAULTS_FILENAME = 'current_law_policy.json'
    JSON_START_YEAR = 2013  # remains the same unless earlier data added
    LAST_KNOWN_YEAR = 2017  # last year for which indexed param vals are known
    LAST_BUDGET_YEAR = 2026  # increases by one every calendar year
    DEFAULT_NUM_YEARS = LAST_BUDGET_YEAR - JSON_START_YEAR + 1

@brittainhard continued:

It looks like its setting the default max years to 14 instead of 13, which may account for why the app [that is, TaxBrain] is trying to calculate for year 2027. I fooled around with those values and only got more errors.

Can you take a second look there? I'm still tracking down the bug.

I'm mystified by your statement in several ways.

(1) The current version of Tax-Calculator can compute taxes for 2013, 2014, ..., 2025, and 2026. So, if I count on my fingers the number of years for which it can compute taxes, I get 14, which is the value of DEFAULT_NUM_YEARS. Why in the world do you think its values should be 13? Please explain.

(2) Given that we're looking for the cause of a TaxBrain bug that appeared just recently, why in the world do you think a statement (line 52) which has not changed in two years can be the cause? Please explain.

(3) GIven that you said 19 days ago in response to my request for a simple Tax-Calculator test that shows how calling a dropq method causes the reported TaxBrain error:

I'll set up a PR for that test

I don't understand why you "took another dive into the code and I ended up in taxcalc again".
Where is that promised test?

@MattHJensen @hdoupe

@martinholmer
Copy link
Contributor Author

@hdoupe, Does the TaxBrain error reported at the beginning of issue #578 still occur in the version of TaxBrain you are testing?

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 26, 2017

@martinholmer Yes, the error is still there:
http://ospc-taxes7.herokuapp.com/taxbrain/1373/

However, it appears to work when it does the quick calculation
http://ospc-taxes7.herokuapp.com/taxbrain/1374/

@brittainhard

@brittainhard
Copy link
Contributor

@hdoupe @martinholmer the issue is that taxcalc is trying to calculate beyond the year 2026 when the limit is 2027. That reform file is causing the app to try to calculate beyond 2027, but quickcalc works because it is only calculating one year.

@martinholmer
Copy link
Contributor Author

@hdoupe said about the TaxBrain error reported over three weeks ago in issue #578:

Yes, the error is still there [in the 0.9.1 test version of TaxBrain]:
http://ospc-taxes7.herokuapp.com/taxbrain/1373/

Thanks for the quick response.

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 26, 2017

@brittainhard said

That reform file is causing the app to try to calculate beyond 2027, but quickcalc works because it is only calculating one year.

Ah, thanks for pointing that out.

@martinholmer
Copy link
Contributor Author

@brittainhard said about the TaxBrain error reported in #578:

The issue is that taxcalc is trying to calculate beyond the year 2026 when the limit is 2027. That reform file is causing the app to try to calculate beyond 2027, but quickcalc works because it is only calculating one year.

The last year for which Tax-Calculator can compute taxes is currently 2026. So, I don't understand you statement "when the limit is 2027." What do you mean by that?

Why don't you show us the dropq function call in the TaxBrain code that causes this error and show us the value of each parameter passed to that dropq function. I don't understand why you didn't supply that information weeks ago when you came to the conclusion that the bug is in Tax-Calculator code.

@MattHJensen @hdoupe @PeterDSteinberg

@brittainhard
Copy link
Contributor

@martinholmer I think it's a good idea to give you my understanding of what the problem is.

That reform file has a structure that is unlike those of many other sample files. It has inputs for multiple years instead of one, and contains inputs for years that are beyond the default max input year of TaxBrain, which is 2017. For example, when you add inputs through the form at /taxbrain, you cannot add inputs for any year beyond 2017. When it tries to calculate data starting at any year beyond 2017 it fails because the growthfactors.csv file only has data up to 2026, as I mentioned before.

In my opinion, it is not possible to add any test that can prove which side of this application the bug is coming from. It is coming from both sides. I can supply a pickle file which does cause run_nth_year_taxcalc_model to fail, which I have tested locally. However, to use that file would prove little, since the data is first generated by TaxBrain.

I think we need a good discussion on how to handle this problem. It is not straightforward and there and many factors to consider. This project, like any math-oriented modeling web application, involves many repositories and interfaces working together. We're all limited on time and need to make sure we are doing efficient coordination to make each other aware of what's going on in each repo/interface and how those changes interact in the web application that ties all the pieces together.

@martinholmer
Copy link
Contributor Author

@brittainhard said about the TaxBrain bug reported in #578 (with my emphasis):

I think it's a good idea to give you my understanding of what the problem is.

That reform file has a structure that is unlike those of many other sample files. It has inputs for multiple years instead of one, and contains inputs for years that are beyond the default max input year of TaxBrain, which is 2017. For example, when you add inputs through the form at /taxbrain, you cannot add inputs for any year beyond 2017. When it tries to calculate data starting at any year beyond 2017 it fails because the growthfactors.csv file only has data up to 2026, as I mentioned before.

Given that that boldface statement is clearly false, I'm having a difficult time putting much credence in your understanding of the problem in #578.

@brittainhard also said:

I can supply a pickle file which does cause run_nth_year_taxcalc_model [dropq function] to fail, which I have tested locally.

Excellent! Given that the pickle documentation says the following, please do post the pickle file contents here.

By default, the pickle data format uses a printable ASCII representation.
This is slightly more voluminous than a binary representation. The big 
advantage of using printable ASCII (and of some other characteristics 
of pickle‘s representation) is that for debugging or recovery purposes
it is possible for a human to read the pickled file with a standard 
text editor.

@hdoupe
Copy link
Collaborator

hdoupe commented Jul 26, 2017

Thanks to @brittainhard I was able to go through the celery log and now I think I have a better idea about what's going on. I'm going to try to lay out the problem as I see it.

The error is thrown in the dropq_calculate function in taxcalc.dropq.dropq_utils.py.

# start_year should be 2013 but instead is 2019 ==> increment up to 2019
while calc1.current_year < start_year:
    calc1.increment_year()
calc1.calc_all()
assert calc1.current_year == start_year

...

# increment to current year in calc2, too

...

# increment up from 2019 to 2019 + year_n.  taxcalc.dropq.dropq_utils.dropq_calculate is called 10 times using year_n 
# equal to each of the values from the vector years set in webapp.apps.taxbrain.DropqCompute._get_years(...)
# when year_n is greater than or equal to 8, current year is greater than 2026 and 
# the error is thrown
# increment Calculator objects for year_n years and calculate
for _ in range(0, year_n):
    calc1.increment_year()
    calc2.increment_year()
Traceback (most recent call last):
  File "/anaconda/envs/aei_dropq/lib/python2.7/site-packages/celery/app/trace.py", line 367, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/anaconda/envs/aei_dropq/lib/python2.7/site-packages/celery/app/trace.py", line 622, in __protected_call__
    return self.run(*args, **kwargs)
  File "/Users/henrydoupe/Documents/webapp-public/deploy/taxbrain_server/celery_tasks.py", line 116, in dropq_task_async
    print('dropq_task_async', year, user_mods, first_budget_year, beh_params)
  File "/Users/henrydoupe/Documents/webapp-public/deploy/taxbrain_server/celery_tasks.py", line 93, in dropq_task
    df_bin_i, pdf_bin_i, cdf_bin_i, fiscal_tot_i,
  File "/anaconda/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/dropq/dropq.py", line 54, in run_nth_year_tax_calc_model
    mask_computed=True)
  File "/anaconda/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/dropq/dropq_utils.py", line 141, in dropq_calculate
    calc1.increment_year()
  File "/anaconda/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/calculate.py", line 159, in increment_year
    self.records.increment_year()
  File "/anaconda/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/records.py", line 185, in increment_year
    self._blowup(self.current_year)
  File "/anaconda/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/records.py", line 249, in _blowup
    AWAGE = self.gfactors.factor_value('AWAGE', year)
  File "/anaconda/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/growfactors.py", line 143, in factor_value
    raise ValueError(msg.format(year, self.last_year))
ValueError: year=2027 > Growfactors.last_year=2026

There are two issues here:

  1. In webapp-public/deploy/celery_tasks.py, first_budget_year is passed to dropq_task as None instead of 2013 (or whatever first budget year is selected). The result is that first_year is set to a year listed in the reform. So, in this case, it is set to 2019 instead of 2013. Why is that happening?

  2. In webapp/apps/taxbrain/compute.py, it looks like NUM_BUDGET_YEARS is set to 10 by default:
    NUM_BUDGET_YEARS = int(os.environ.get('NUM_BUDGET_YEARS', 10))
    Is this the case? If so shouldn't actually be 2026 - first_budget_year?

I'm sure I'm missing something here and would appreciate any feedback.

@brittainhard @PeterDSteinberg @martinholmer

@martinholmer
Copy link
Contributor Author

martinholmer commented Jul 26, 2017

@hdoupe said after analyzing the "celery log" that traces TaxBrain operation leading up to the #578 bug:

There are two issues here:

  1. In webapp-public/deploy/taxbrain_server/celery_tasks.py, first_budget_year is passed to dropq_task as None instead of 2013 (or whatever first budget year is selected). The result is that first_year is set to a year listed in the reform. So, in this case, it is set to 2019 instead of 2013. Why is that happening?

  2. In webapp/apps/taxbrain/compute.py, it looks like NUM_BUDGET_YEARS is set to 10 by default:
    NUM_BUDGET_YEARS = int(os.environ.get('NUM_BUDGET_YEARS', 10))
    Is this the case? If so shouldn't actually be 2026 - first_budget_year?

I'm sure I'm missing something here and would appreciate any feedback.

@hdoupe, Thanks for your efforts to unravel the cause of the bug reported in TaxBrain #578.

The "celery log" is still secret, so I have no way of providing feedback about whether or not your interpretation of the "celery log" is correct. So, my comments here are going to assume that your two observations are accurate. I'll leave it up to @brittainhard and @PeterDSteinberg, who have access to the "celery log", to determine the veracity of your two observations.

It seems as if the problem is in TaxBrain as described in your first observation. By way of background, the standard in a full (not quick) run is to show tax results for ten "budget" years with the first year being the one selected by the user on the form. The TaxBrain forms currently offer a start year choice of 2013, 2014, 2015, 2016 or 2017, so taxes can be computed for all ten years even if 2017 is selected (because currently Tax-Calculator can calculate taxes in 2026 at the latest). This information was discussed by @brittainhard and @MattHJensen during late May in TaxBrain issue #557.

So, currently the largest legal first_budget_year (when num_budget_years is ten) is 2017. If your analysis of the secret "celery log" is correct, then the logic of TaxBrain needs to be fixed. The first_budget_year has nothing to do with the earliest reform year in the reform JSON file, it is the value selected by the user on the file-upload page. So, the following code, which was merged into the TaxBrain master branch as part of pull request #556 on or around July 1, 2017, is incorrect:

def dropq_task(year, user_mods, first_budget_year, beh_params, tax_data):
    print("user mods: ", user_mods)
    # The reform style indicates what kind of reform we ran.
    # A list of size 1 with 'True' indicates a standard TaxBrain run
    # A list of size > 1 indicates a file-based reform was run, where each
    # index indicates whether the reform dictionary was non-empty
    # The four reform dictionaries from file-based reforms are:
    # policy, behavior, growth, consumption (in that order)
    reform_style = [True]
    if first_budget_year is not None:
        first_year = int(first_budget_year) 
    else:
        first_year = int(user_mods.keys()[0])

Passing a first_budget_year of None should raise an error here. And as I explained above the None value action first_year = int(user_mods.keys()[0]) is conceptually wrong and is what is causing the TaxBrain #578 bug.

The strange thing about this is that a few months ago TaxBrain did not have this problem; the problem seems to have been introduced in the massive set of #556 changes.

I don't think your second observation is a problem as long as first_budget_year is not too large.
[This sentence is incorrect. See subsequent comment for a correction.]

Does my analysis of the situation seem sensible? If not, what questions do you have?

@martinholmer
Copy link
Contributor Author

martinholmer commented Jul 27, 2017

@martinholmer said in the discussion of the #578 comment by @hdoupe:

I don't think your second observation is a problem as long as first_budget_year is not too large.

This is incorrect. What @hdoupe suggested is correct. Here is what he said:

In webapp/apps/taxbrain/compute.py, it looks like NUM_BUDGET_YEARS is set to 10 by default:

NUM_BUDGET_YEARS = int(os.environ.get('NUM_BUDGET_YEARS', 10))

Is this the case? If so, shouldn't actually be 2026 - first_budget_year?

Background on dropq function arguments start_year and year_n
The dropq_calculate function advances the Calculator objects to start_year and then increments the year of the two Calculator objects year_n times. This is the way dropq has worked from the beginning years ago.

So, if start_year is 2017 and year_n is 2, taxes are calculated for 2019. And therefore, if start_year is 2017 and year_n is 10, dropq will try to increment the year ten times beyond 2017, which will cause an error because the tenth increment will move to 2027 and that is greater than Policy.LAST_BUDGET_YEAR (currently set at 2026).

In TaxBrain what is the value of the environment variable NUM_BUDGET_YEARS?
Is NUM_BUDGET_YEARS passed to dropq as year_n?

In general, if you want ten budget years, you must pass to dropq a year_n value of nine (because start_year is already part of the ten years). Of course, this statement is true only if start_year is no greater than 2017.

So, in conclusion, there appear to be two separate bugs in TaxBrain logic corresponding to the two observations made by @hdoupe.

Those reading this discussion would benefit from reading Tax-Calculator pull request 1495, which adds dropq logic to check for start_year and year_n values that are inconsistent with the Policy.JSON_START_YEAR or Policy.LAST_BUDGET_YEAR constants.

@MattHJensen @hdoupe @PeterDSteinberg @brittainhard

@martinholmer
Copy link
Contributor Author

@hdoupe has shared the complete "celery.log" mentioned earlier in the discussion of TaxBrain issue #578. Issue #578 reported a TaxBrain crash, the source of which has been in doubt for weeks. It is important to stress that the reported TaxBrain crash occurs when doing a reform file upload that worked correctly for many months before the recent TaxBrain changes that are incorporated in the First Stable Release.

Selected information from the celery.log, which will be shown below, indicate clearly that the cause of the crash is new TaxBrain code that misuses the unchanged dropq functions in Tax-Calculator.
Here are the parameters being passed by TaxBrain to dropq:

$ grep -A1 "keywords to dropq" celery.log 

[2017-07-27 09:14:03,607: WARNING/MainProcess] keywords to dropq
[2017-07-27 09:14:03,607: WARNING/MainProcess] {'start_year': 2019, 'user_mods': {u'growdiff_response': {}, u'consumption': {}, u'growdiff_baseline': {}, u'behavior': {}, u'policy': {2018: {u'_SS_Earnings_c': [400000], u'_II_em_cpi': False, u'_II_em': [8000]}, 2019: {u'_SS_Earnings_c': [500000], u'_ALD_InvInc_ec_rt': [0.2]}, 2020: {u'_SS_Earnings_c': [600000], u'_II_em_cpi': True}}, u'gdp_elasticity': {}}, 'year_n': 0}
--
[2017-07-27 09:15:07,192: WARNING/MainProcess] keywords to dropq
[2017-07-27 09:15:07,192: WARNING/MainProcess] {'start_year': 2019, 'user_mods': {u'growdiff_response': {}, u'consumption': {}, u'growdiff_baseline': {}, u'behavior': {}, u'policy': {2018: {u'_SS_Earnings_c': [400000], u'_II_em_cpi': False, u'_II_em': [8000]}, 2019: {u'_SS_Earnings_c': [500000], u'_ALD_InvInc_ec_rt': [0.2]}, 2020: {u'_SS_Earnings_c': [600000], u'_II_em_cpi': True}}, u'gdp_elasticity': {}}, 'year_n': 2}
--
[2017-07-27 09:15:45,075: WARNING/MainProcess] keywords to dropq
[2017-07-27 09:15:45,077: WARNING/MainProcess] {'start_year': 2019, 'user_mods': {u'growdiff_response': {}, u'consumption': {}, u'growdiff_baseline': {}, u'behavior': {}, u'policy': {2018: {u'_SS_Earnings_c': [400000], u'_II_em_cpi': False, u'_II_em': [8000]}, 2019: {u'_SS_Earnings_c': [500000], u'_ALD_InvInc_ec_rt': [0.2]}, 2020: {u'_SS_Earnings_c': [600000], u'_II_em_cpi': True}}, u'gdp_elasticity': {}}, 'year_n': 4}
--
[2017-07-27 09:16:21,640: WARNING/MainProcess] keywords to dropq
[2017-07-27 09:16:21,640: WARNING/MainProcess] {'start_year': 2019, 'user_mods': {u'growdiff_response': {}, u'consumption': {}, u'growdiff_baseline': {}, u'behavior': {}, u'policy': {2018: {u'_SS_Earnings_c': [400000], u'_II_em_cpi': False, u'_II_em': [8000]}, 2019: {u'_SS_Earnings_c': [500000], u'_ALD_InvInc_ec_rt': [0.2]}, 2020: {u'_SS_Earnings_c': [600000], u'_II_em_cpi': True}}, u'gdp_elasticity': {}}, 'year_n': 7}
--
[2017-07-27 09:16:53,262: WARNING/MainProcess] keywords to dropq
[2017-07-27 09:16:53,262: WARNING/MainProcess] {'start_year': 2019, 'user_mods': {u'growdiff_response': {}, u'consumption': {}, u'growdiff_baseline': {}, u'behavior': {}, u'policy': {2018: {u'_SS_Earnings_c': [400000], u'_II_em_cpi': False, u'_II_em': [8000]}, 2019: {u'_SS_Earnings_c': [500000], u'_ALD_InvInc_ec_rt': [0.2]}, 2020: {u'_SS_Earnings_c': [600000], u'_II_em_cpi': True}}, u'gdp_elasticity': {}}, 'year_n': 8}

$ 

We identified earlier in the #578 discussion the TaxBrain bug that sets start_year to 2019, so there is no need to discuss that bug further.

Given that TaxBrain is incorrectly setting the value of start_year to 2019, then whenever year_n is greater than 7, TaxBrain is calling for tax calculation in a year after 2026, which is currently the last year for which Tax-Calculator can compute taxes. That is why the last entry in the log is for year_n equal to 8. Passing start_year=2019 and year_n=8 to dropq will definitely cause an error. So, the celery.log shows the TaxBrain crash is caused by a misuse of the dropq functions in the new TaxBrain code. There was no such misuse of dropq in the older code, which is why this TaxBrain crash appeared only in the last month or so.

So, the first question is whether or not the TaxBrain developers from Continuum Analytics agree with this analysis of the cause of the TaxBrain crash. And if they agree, the next question is when can the TaxBrain code be fixed.

As I said earlier in the #578 discussion, the recently merged Tax-Calculator pull request 1495 has tried to improve dropq checking of passed start_year and year_n values. See the new test of this checking capability to see what kind of start_year/year_n pairs are illegal. It is important for everybody to understand, for example, why passing start_year=2017 and year_n=10 to dropq functions is illegal.

@MattHJensen @hdoupe @PeterDSteinberg @brittainhard

@MattHJensen
Copy link
Contributor

MattHJensen commented Jul 27, 2017

I'm very glad to see progress on this issue, and thanks to @hdoupe, @martinholmer, and @brittainhard for digging into this bug report.

I do have a few suggestions that may speed up discussions in the future.

  • Straightforward statements of disagreement are very helpful, and I think we are all thick skinned enough to not need sugar coating. BUT there is no need to make disagreements or critiques gratuitously harsh. Emotional responses —especially during disagreements — disincentive people from communicating openly and reduces the likelihood that we get to the source of disagreements. (No complaints from me about ecstatic praise!)
  • When a downstream project maintainer thinks there is a bug is in a dependency (upstream project) — s/he should either open a PR to that upstream project with a failing test or paste into an issue the input and unexpected output that demonstrate the potential bug in the upstream project. This never happened in this discussion, but it would have been valuable at the very beginning.

@brittainhard
Copy link
Contributor

@martinholmer @hdoupe @MattHJensen @PeterDSteinberg

Looking at the pickle file, the size is over 500mb. This is most likely because it contains a serialized DataFrame object. Including that in a PR would be undesirable for a number of reasons.

After having taken a second look at the problem, I believe the solution is to fix the code in webapp. To this end, I am going to create a PR which will set up a failing test that we can work from towards a fix. This PR should be up shortly.

Let me know if you have any questions.

@martinholmer
Copy link
Contributor Author

@brittainhard said:

Looking at the pickle file, the size is over 500mb. This is most likely because it contains a serialized DataFrame object. Including that in a PR would be undesirable for a number of reasons.

Yes, your view is very sensible.

After having taken a second look at the problem, I believe the solution is to fix the code in webapp. To this end, I am going to create a PR which will set up a failing test that we can work from towards a fix. This PR should be up shortly.

Great! Let us know if any questions come up that you think we could answer quickly.

@brittainhard
Copy link
Contributor

Closed by #593

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants