
Trying to understand the testing framework #169

Open
mkhattab940 opened this issue Apr 14, 2018 · 9 comments

@mkhattab940
Contributor

mkhattab940 commented Apr 14, 2018

Hi @zaxtax

My team and I have been working on Hakaru for our undergrad capstone project. We are wrapping things up and writing a paper based on our results. However, we've been having trouble formulating our hypothesis.

We've been writing a number of RoundTrip test cases to test known relationships between distributions. However, we are having trouble understanding what exactly is going on in these tests. We've tried diving into the related files, but none of us are knowledgeable enough with Haskell to figure it out. As best we can tell, it grabs the test cases and passes them to some Maple environment.

@JacquesCarette has told us to ask you to explain it to us.

I'll explain what we've figured out so far, and hopefully you can fill in the gaps for us. For example, we have added a test with the following result:

### Failure in: 6:RoundTrip:7:2:t_rayleigh_to_stdChiSq:0
haskell/Tests/TestTools.hs:130
expected:
chiSq_iid = fn n nat:
            fn mean real:
            fn stdev prob:
            q <~ plate _ of n: normal(mean, stdev)
            return summate i from 0 to size(q):
                   ((q[i] - mean) * prob2real(1/ stdev)) ^ 2
standardChiSq = fn n nat: chiSq_iid(n, nat2real(0), nat2prob(1))
standardChiSq(2)
but got:
q307 <~ normal(+0/1, 1/1)
q315 <~ normal(+0/1, 1/1)
return q307 ^ 2 + q315 ^ 2
Cases: 338  Tried: 287  Errors: 2  Failures: 20
                                               
### Failure in: 6:RoundTrip:7:2:t_rayleigh_to_stdChiSq:1
haskell/Tests/TestTools.hs:130
expected:
chiSq_iid = fn n nat:
            fn mean real:
            fn stdev prob:
            q <~ plate _ of n: normal(mean, stdev)
            return summate i from 0 to size(q):
                   ((q[i] - mean) * prob2real(1/ stdev)) ^ 2
standardChiSq = fn n nat: chiSq_iid(n, nat2real(0), nat2prob(1))
standardChiSq(2)
but got:
X3 <~ uniform(+0/1, +1/1)
return log(real2prob(X3)) * (-2/1)
Cases: 338  Tried: 288  Errors: 2  Failures: 21

So for each test case it looks like two tests are run. I've messed around with hk-maple a bit, and this is roughly what I think is happening (my guess at the underlying commands is sketched right after this list):

  • 0-test: run default and Summarize modes on the expected file and check if their outputs match
  • 1-test: run default mode on the expected file again, run Summarize mode on the test file and check if their outputs match
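
To make that concrete (just a sketch of my assumption above; the .0.hk name is my guess at how the test file is named):

# 0-test (my guess)
hk-maple t_rayleigh_to_stdChiSq.expected.hk               # "expected"
hk-maple -c Summarize t_rayleigh_to_stdChiSq.expected.hk  # "but got"
# 1-test (my guess)
hk-maple t_rayleigh_to_stdChiSq.expected.hk               # "expected"
hk-maple -c Summarize t_rayleigh_to_stdChiSq.0.hk         # "but got"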

However, these outputs don't always exactly match the outputs I get when I run Summarize on these files myself (although they are very close), so I don't think this is exactly what is happening. Can you clarify how these outputs are generated?

We would also like to make sure we understand the purposes of both tests. The 0-test seems to be some sort of preliminary test before the 1-test tests the actual relationship we are interested in. As far as I can tell, Summarize seems to be a more ambitious version of Simplify, so I expect that if their outputs are equal, Summarize can't do any better, right? I think Dr. Carette said it has to do with making sure some sort of change of variables is done correctly. Can you expand on this?

I know the 1-test is meant to produce equivalent code for Hakaru files describing equivalent distributions. Can you explain how the inference algorithms used in the test are meant to accomplish this?

For reference, this is the hypothesis we are currently working with:

Assume we know a relationship between two statistical distributions that transforms distribution A into distribution B, and that we can prove this relationship by analyzing their PDFs.

We hypothesize that, by applying the appropriate transformation to an implementation of distribution A in Hakaru, we can create a Hakaru program whose hk-maple output is a Hakaru program equivalent to the hk-maple output of an implementation of distribution B.
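
To make the hypothesis concrete for the test above (a sketch; these file names are hypothetical, not the actual test files): presumably the relationship behind t_rayleigh_to_stdChiSq is that squaring a Rayleigh(1) draw gives a chi-squared draw with 2 degrees of freedom. Then we would expect

hk-maple rayleigh_squared.hk
hk-maple std_chi_sq_2.hk

to produce equivalent Hakaru programs, where rayleigh_squared.hk implements distribution A plus the squaring transformation and std_chi_sq_2.hk implements distribution B directly.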

Really appreciate your help with this.

@zaxtax
Member

zaxtax commented Apr 14, 2018

Hello @maymoo99!

So I don't see anything wrong with what the tests are doing. Simplify will inline function calls, which is basically what it did here. You can check this behavior by calling Simplify on:

add = fn x nat: fn y nat: x + y
add(3, 4)

As you might know, the Maple code has no knowledge of chi^2, only simple distributions like normal, uniform, etc. So when code is emitted, it will use the sum of two squared normal draws. So test 0 seems fine to me.

I don't fully understand where test 1 comes from and need to dig in further. Generally, you can trace this behavior back to haskell/Tests/TestTools.hs.

Summarize is not necessarily a more powerful version of Simplify. Summarize rewrites programs by looking for particular looping structures in the code. Simplify, on the other hand, tries to rewrite code to find closed forms for different distributions. Their roles are different. A common pattern is to call Simplify followed by Summarize.
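
For instance (just a sketch; model.hk is a placeholder file name), that pattern amounts to something like

hk-maple -c Simplify model.hk > model.simplified.hk
hk-maple -c Summarize model.simplified.hk

i.e. first look for closed forms, then rewrite whatever looping structure remains.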

I hope that helps for now, and I'll add more comments as I get a better idea of what's going on.

@JacquesCarette
Contributor

One of the key pieces of information (for me!) in the above is that our tests are run twice, once with Simplify and once with Summarize -- I did not know this. That would indeed explain quite a few symptoms. At least, if that is indeed what is going on...

Hmm, except that I can't see anything in haskell/Tests/*.hs that mentions Summarize in any form. So I am still quite puzzled as to why it seems that all tests are run twice. I looked through the code, and couldn't find out what's going on there.

@mkhattab940
Contributor Author

mkhattab940 commented Apr 17, 2018

OK, so here's something interesting:

[khattm@cps02 ~/hakaru/tests/RoundTrip2] hk-maple t_rayleigh_to_stdChiSq.expected.hk 
chiSq_iid = fn n nat:
            fn mean real:
            fn stdev prob:
            q <~ plate _ of n: normal(mean, stdev)
            return summate i from 0 to size(q):
                   ((q[i] - mean) * prob2real(1/ stdev)) ^ 2
standardChiSq = fn n nat: chiSq_iid(n, nat2real(0), nat2prob(1))
standardChiSq(2)

[khattm@cps02 ~/hakaru/tests/RoundTrip2] hk-maple -c Simplify t_rayleigh_to_stdChiSq.expected.hk 
q307 <~ normal(+0/1, 1/1)
q315 <~ normal(+0/1, 1/1)
return q307 ^ 2 + q315 ^ 2

[khattm@cps02 ~/hakaru/tests/RoundTrip2] hk-maple -c Summarize t_rayleigh_to_stdChiSq.expected.hk 
q <~ plate _ of 2: normal(+0/1, 1/1)
return summate i from 0 to size(q): q[i] ^ 2

---

[khattm@cps02 ~/hakaru/tests/RoundTrip2] hk-maple -c Simplify t_rayleigh_to_stdChiSq.0.hk 
X3 <~ uniform(+0/1, +1/1)
return log(real2prob(X3)) * (-2/1)

[khattm@cps02 ~/hakaru/tests/RoundTrip2] hk-maple -c Summarize t_rayleigh_to_stdChiSq.0.hk 
x <~ X <~ X <~ uniform(+0/1, +1/1)
          return log(real2prob(X)) * (-2/1)
     return sqrt(real2prob(X))
return prob2real(x ^ 2)

Comparing this with the logs, it looks like this is how the RoundTrip test cases are run:

0:

  • expected: hk-maple t_rayleigh_to_stdChiSq.expected.hk
  • got: hk-maple -c Simplify t_rayleigh_to_stdChiSq.expected.hk

1:

  • expected: hk-maple t_rayleigh_to_stdChiSq.expected.hk
  • got: hk-maple -c Simplify t_rayleigh_to_stdChiSq.0.hk

So it's not Summarize but Simplify that is being run to produce the 'but got' code. Now I'm really confused, because the documentation clearly states that Simplify is hk-maple's default mode, in which case the two commands run in the 0-test should produce the same output no matter what... What's really going on here?

@mkhattab940
Contributor Author

@zaxtax To clarify, it's not that we think something is wrong with the tests; rather, we are trying to understand how they are meant to work. My end goal for this discussion is to refine our hypothesis (stated above) and to be able to speak accurately about our results in the paper we're writing.

Why does the expected file have to pass the 0-test? What exactly are we testing here? What does it imply about the implementation if it passes? What might it imply about the implementation if it doesn't?

It seems the 0-test has to pass for the 1-test to have a chance of passing. When the 0-test fails, it generally seems like the "but got" result gets simplified further than the expected result. Why not reverse the roles of the expected/got algorithms in the 0-test, so that the 1-test could compare the outputs of -c Simplify on both files? Is the intention to prevent the 1-test from passing if the 0-test doesn't pass?

If I could think of the 1-test as running the same simplification algorithm on both files (since, for 0-tests that pass, the 1-test will produce the same result using either algorithm on my expected file), it becomes immediately obvious how we're using the 1-test to test relationships between distributions. But why should two different optimization algorithms, run on two different Hakaru programs, produce the same result when the distributions they implement are equivalent? The story seems to be more complicated than just "if we can simplify both models to the same model, then Hakaru recognizes the relationship", which is essentially the level of nuance at which we can discuss our results at the moment.

I hope these questions can be answered while keeping the discussion relatively high-level.

@mkhattab940
Contributor Author

@zaxtax @JacquesCarette bump

@JacquesCarette
Contributor

I looked at the code, and I can't figure out where the whole 0-test versus 1-test distinction comes from. Perhaps @ccshan knows?

@zaxtax
Member

zaxtax commented Apr 19, 2018 via email

@zaxtax
Member

zaxtax commented Apr 24, 2018

I'm still trying to trace the code. Was my hunch that both programs are being passed to Simplify correct?

@JacquesCarette
Contributor

I'm not even sure of that. I can't even see where the test suite is being run twice! [And I don't know if it was @yuriy0 or @ccshan who set that up]
