Fix dream wfm tests #280
Conversation
I'm not sure if I'm misunderstanding what you are saying, but it sounds to me like you're saying the failures were flukes and have nothing to do with the recent release of tof. The nightlies run three times every night (once on main, once on the latest release, and once with the oldest allowed dependencies), and this particular failure has not happened before. But last night it happened twice, on both main and the latest release, while it didn't happen in the nightly that uses the old version of tof. That strongly indicates some change in the last tof release made this much more likely than before. Edit: The outliers do look very harmless though. I'm not saying they're necessarily a problem, just that it'd be good to know why they happen more frequently now.
I don't think so. Something changed in tof for sure: how the pulse is generated. Old: to select neutrons, do a 1d sampling. New: we have a 2d probability distribution that depends on birth time and wavelength. We flatten the distribution and do a single sampling from it. Overall, it should give quantitatively the same result, but statistical fluctuations will exist.
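For illustration, here is a minimal sketch of what such a flattened 2d sampling can look like, using plain numpy with made-up grids and a dummy distribution; none of this is tof's actual code:

```python
import numpy as np

rng = np.random.default_rng(seed=88)

# Hypothetical grids (not tof's actual internals): birth time and wavelength.
birth_time = np.linspace(0.0, 3.0e-3, 200)   # seconds
wavelength = np.linspace(0.5, 10.0, 300)     # Angstrom

# Some 2d probability distribution p(time, wavelength); a smooth dummy here.
t, w = np.meshgrid(birth_time, wavelength, indexing='ij')
p = np.exp(-((t - 1.5e-3) / 1.0e-3) ** 2) * np.exp(-((w - 3.0) / 2.0) ** 2)

# Flatten the 2d distribution and draw all neutrons in a single sampling,
# instead of doing an independent 1d selection.
p_flat = p.ravel()
p_flat = p_flat / p_flat.sum()
idx = rng.choice(p_flat.size, size=100_000, p=p_flat)
i_time, i_wav = np.unravel_index(idx, p.shape)
birth_times, wavelengths = birth_time[i_time], wavelength[i_wav]
```

Sampling index pairs from the flattened distribution preserves any correlation between birth time and wavelength, at the cost of somewhat different statistical fluctuations than the old scheme.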
You can try it yourself: changing the seed from 88 to 345 causes the test to fail (using the old tof).
Do we use the same seed every run? In that case that explains it a bit better; I thought we used different seeds every run. Although that still doesn't explain why it didn't fail in the nightly test with old dependencies. But like you say, that could just be a fluke.
We have 3 cases: `dream`, `dream_with_time_overlap`, and `v20`. I still don't know where the outliers come from, and it would probably be good to find out...
Why does it not explain that?
Okay, I saw now that in the failing nightly only one of the three cases you mentioned (`dream_with_time_overlap`) failed.
I misunderstood; I thought the seed had changed, but of course the only thing that changed is the package. If you're confident there are no issues with this then I'm fine with it; adjusting the threshold seems like a reasonable fix.
I think I understand where the outliers come from. It's not a coincidence that they only appear in the test case where we have a time overlap in the 2 WFM frames. That region subsequently gets masked out, according to how high the variance is. The outliers are all located around that central region between the frames, where the uncertainty is highest (I plotted the outliers in black on the lookup table). In the present table, we have masked out the region with large uncertainty, but that was just based on an arbitrary threshold we chose. So basically, close to the masked region the uncertainties get larger and exceed the limit we used in the test. If we had either masked out a slightly wider region, or used a less strict error limit in the test, then the test would probably have passed?
Maybe the name "error" is misleading here?
I'm not really sure I understand the distinction in this context. It's a threshold on the standard deviation relative to the value, right? Personally I prefer that threshold quantity, for two reasons: …
For example, we could make the condition for masking the lookup table be: … where …
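The exact condition proposed above was lost, but a hedged sketch of what such a masking condition might look like follows; the relative-standard-deviation criterion, the names, and the threshold value are all assumptions for illustration, not the project's actual implementation:

```python
import numpy as np

def mask_lookup_table(values, variances, threshold=0.1):
    """Return a boolean mask, True where the lookup table is unreliable.

    Assumed condition: mask where the relative standard deviation
    (stddev / |value|) exceeds ``threshold``. Both the condition and
    the names here are hypothetical.
    """
    stddev = np.sqrt(variances)
    with np.errstate(divide='ignore', invalid='ignore'):
        relative = stddev / np.abs(values)
    return relative > threshold
```

With a condition like this, the masked region grows or shrinks directly with the chosen threshold, which would explain why the outliers cluster just outside the currently masked region.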
I think I meant to say that the term "error" can be interpreted in different ways.
Makes sense 👍 I'm fine with calling it something like uncertainty threshold or standard deviation threshold too.


With the latest release of tof, the nightly tests started failing. It turns out that when computing the wavelengths of neutrons, there is a chance of less than 1 in 10,000 of getting outliers where the error is greater than 2%.
Here is a figure of the error for each neutron in the test, using 100K neutrons. We see 5 outliers.

I checked, and even with the previous version of tof, if we use more neutrons than the 10K used in the test, we also start to see outliers. So it is just a statistical fluctuation, and we were simply lucky not to see it before. To fix this, we relax the percentile threshold by a very small amount (100 -> 99.9) to make the test more robust to future changes.