
test Results against target values #3

Closed
wants to merge 3 commits
Conversation

sbenthall
Owner

This PR addresses llorracc#7 by providing a test script that compares the data in the Results/ files with target values.

There are a few tricky things about this PR:

  • Only one set of target values, PYbetaPointIndNetWorthResults, is from the original cstwMPC paper. The other target values are currently taken from the data files in this repository as of commit 3aefa00, generated using HARK 0.11.0.
  • Some of those Results do not compare favorably with the original paper's. The third quintile target is 7.285, but commit 3aefa00 gets 5.241, about 30% off the original target, whereas @llorracc has set the criterion as 'not much more than 10%'. However, this change in results was accepted in an earlier review by @llorracc and @mnwhite, so it's not clear what to do here.
  • Because of the design of the original cstwMPC code, which involves a lot of non-standard code execution (using exec()) and file output (saving .txt files with custom formatting), I designed this script to work with the files written to the Results/ directory. This means it doesn't operate like a normal Python unit test of some part of the code; rather, the test must be run by hand to verify results.
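The tolerance check the script performs can be sketched as a small helper. The function name and dictionary layout below are illustrative, not the script's actual interface:

```python
# Illustrative sketch of the tolerance check: flag any value whose
# relative deviation from its target exceeds the allowed fraction.
# (Names and data layout here are hypothetical, not the real script's.)

def compare_results(actual, targets, tolerance=0.10):
    """Return {key: relative_error} for entries outside `tolerance`."""
    failures = {}
    for key, target in targets.items():
        rel_err = abs(actual[key] - target) / abs(target)
        if rel_err > tolerance:
            failures[key] = rel_err
    return failures

# The third-quintile discrepancy discussed above: relative error is
# roughly 0.28, well above the 10% criterion.
print(compare_results({"third_quintile": 5.241},
                      {"third_quintile": 7.285}))
```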

@sbenthall sbenthall closed this Jan 24, 2023
@llorracc

Seb,

Great, I'm very glad to see this!

  • Some of those Results do not compare favorably with the original paper's. The third quintile target is 7.285, but commit 3aefa00 gets 5.241, about 30% off the original target, whereas @llorracc has set the criterion as 'not much more than 10%'. However, this change in results was accepted in an earlier review by @llorracc and @mnwhite, so it's not clear what to do here.

The crucial statistics are those for the MPC, which are all within reasonable tolerance of the cstwMPC paper's numbers. Probably not worth the archaeology to run down why the third quintile is off so much -- except insofar as we want to be sure that these numbers are stable in the sense that if we increase, say, the number of consumers being simulated, they don't change much.

  • Because of the design of the original cstwMPC code, which involves a lot of non-standard code execution (using exec()) and file output (saving .txt files with custom formatting), I designed this script to work with the files written to the Results/ directory. This means it doesn't operate like a normal Python unit test of some part of the code; rather, the test must be run by hand to verify results.

I'd really like to get this into a form that can be run automatically whenever we update the development branch of the HARK toolkit. Whether that requires something in the form of a unit test, I don't know. My goal is to choose a small number of REMARKs that are "unpinned" because they give a thorough workout to the substantive, quantitative results of the toolkit; any code merge that changes those substantive results needs to be closely scrutinized to understand why.

@sbenthall
Owner Author

Point of order -- this PR is a duplicate of econ-ark#9.
At first I made the mistake of opening this PR on my personal fork.

I'll respond to these comments in appropriate places in the econ-ark repository.
