
Error in TS calculation of StandardLLH #237

Open
sgriswol opened this issue Dec 16, 2022 · 2 comments

Comments

@sgriswol

Hello Flarestack Devs,

In flarestack.llh.StandardLLH.calculate_test_statistic() I believe that there may be an error in the TS calculation. Specifically, the order of summation over sources & events may be reversed. For convenience, I am looking at L919--990.

Below I have used some of the notation from Eqs. 2 & 5 of ICRC2021_1116. I have also attached an expression for a floating n_s TS based on Eq. 6 of the ICRC proceeding.
[attached image: floating_ns_TS]

What Flarestack is doing:

  • (L938) Iterating over all M sources...
  • (L957--969) Assuming there are n_coinc > 0 coincident events for a source j, the SoB ratio terms for all coincident events of source j are computed. I assume this is to save computational effort.
  • (L963) These are stored in numpy arrays and are appended to list x. x will have length M after all sources have been iterated over, but each element of x may have different length.
  • (L976) Each element y of list x is first element-wise logged and then summed. Since each element has a shape corresponding to n_coinc, these operations run over all coincident events for that source. Then a sum is taken over all sources.
  • (L978--982) Adding in contributions from background terms
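To make the ordering concrete, the per-source loop described above can be sketched roughly like this (all names are hypothetical, not flarestack's actual variables):

```python
import numpy as np

# Hypothetical sketch of the per-source computation (invented names).
# sob_per_source[j] holds the SoB ratios omega_{i,j} for the n_coinc
# events coincident with source j; n_s[j] is that source's signal
# strength n_{s,j}; n_all is the total number of events N.
def per_source_ts(sob_per_source, n_s, n_all):
    x = []
    for j, sob in enumerate(sob_per_source):
        # (n_{s,j}/N) * (omega_{i,j} - 1) + 1 per coincident event
        x.append((n_s[j] / n_all) * (sob - 1.0) + 1.0)
    # log each per-source array, sum over its events, then sum over sources
    return 2.0 * sum(np.sum(np.log(y)) for y in x)
```

Here the log is applied per (source, event) term, so the sum over sources ends up outside the log.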

I believe the error is at L976. To attempt to rephrase what I think flarestack is doing: It first takes the log of all n_coinc events' SoB ratio terms for a particular source j, sums them, and then sums over all sources.

Shouldn't the innermost sum be over sources, inside the log? That is to say, shouldn't flarestack first sum over all M sources' SoB ratio terms for a particular event i, take the log, and then sum over all N events?
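For contrast, a dense version of the ordering I mean would look roughly like this (again invented names; `sob_matrix[i, j]` stands for omega_{i,j}, zero for non-coincident pairs):

```python
import numpy as np

# Hypothetical event-first ordering: sum over sources j INSIDE the
# log, then sum over events i.
def event_ordered_ts(sob_matrix, n_s, n_all):
    # sob_matrix: shape (N_events, M_sources)
    inner = (np.asarray(n_s) / n_all) * (sob_matrix - 1.0)
    a_i = inner.sum(axis=1) + 1.0   # A_i: sum over the M sources, plus 1
    return 2.0 * np.sum(np.log(a_i))
```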

Please let me know if you have any further questions, want clarification, or if there's something that I've missed.

@JannisNe
Collaborator

Hi, I think the thing is that it is implicitly assumed that each neutrino can only be coincident with one source, i.e. that the distances between the sources are larger than the angular uncertainties. When a neutrino is close to one source it will be far away from any other source, so the spatial PDF will be non-zero only for the close source. That means each element of y in L976 is already the sum over all sources for that event; it's just that the sum has only one summand.

However, when going through this I think there should be an extra term in L976.

With $\omega_{i, j} = S_j(\theta_i, \gamma) / B(\theta_i)$ we have

$$
\begin{align}
\lambda &= 2 \cdot \left[ \sum_i^N \log \left( \left[ \sum_j^M \frac{n_{s,j}}{N} \left( \omega_{i,j} - 1 \right) \right] + 1 \right) \right] \\
&:= 2 \cdot \left[ \sum_i^N \log \left( A_i \right) \right]
\end{align}
$$

$A_i$ is y in L976.

Now assume that for each neutrino event $i$ only one $\omega_{i,j}$ can be non-zero. Without loss of generality we can call that source $j_k$, so that

$$ \omega_{i,j} = \begin{cases} S_j(\theta_i, \gamma) / B(\theta_i) & \mathrm{if}\ j = j_k \\ 0 & \mathrm{otherwise} \end{cases} $$

and we get

$$
\begin{align}
A_i &= \left[ \sum_j^M \frac{n_{s,j}}{N} \left( \omega_{i,j} - 1 \right) \right] + 1 \\
&= \frac{n_{s,j_k}}{N} \left( \omega_{i,j_k} - 1 \right) + 1 + \sum_{j \neq j_k}^M \frac{-n_{s,j}}{N} \\
&= \frac{n_{s,j_k}}{N} \left( \omega_{i,j_k} - 1 \right) + 1 - \frac{n_s - n_{s,j_k}}{N}
\end{align}
$$

How the code is now I think it only calculates

$$ A_i = \frac{n_{s, j_k}}{N} \left( \omega_{i,j_k} -1 \right) + 1 $$

so I wonder if the $- \frac{n_s - n_{s, j_k}}{N}$ term is missing. In the code we would just have to add a `- (kwargs["n_coincident"] - n_j) / kwargs["n_all"]` in L968 and L955.
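A quick numerical check of the algebra above (toy numbers, nothing taken from flarestack):

```python
import numpy as np

# Toy setup: M = 2 sources, one event coincident only with source
# j_k = 0 (omega is zero for the other source).
n_all = 100.0
n_s = np.array([3.0, 2.0])   # n_{s,j} per source
omega = 5.0                   # omega_{i,j_k}

# Full A_i: the sum over all sources inside the log
a_full = (n_s[0] / n_all) * (omega - 1.0) \
    + (n_s[1] / n_all) * (0.0 - 1.0) + 1.0
# What the single-source shortcut computes: only the coincident term
a_code = (n_s[0] / n_all) * (omega - 1.0) + 1.0
# The proposed missing term, -(n_s - n_{s,j_k}) / N
missing = -(n_s.sum() - n_s[0]) / n_all

print(a_full, a_code + missing)
```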

@JannisNe
Collaborator

Thinking about what that means for past results, I don't think it changes anything: $- (n_s - n_{s, j_k})/N$ will be very small because the number of signal neutrinos $n_s$ is much smaller than the total number of events in the sample $N$.

Still, we should change this. After talking to @robertdstein, I think adding `- (kwargs["n_coincident"] - n_j) / kwargs["n_all"]` is not the way to go. Instead, we should do the explicit sum over the sources first. This is a bit tricky because we have to track which neutrino events the terms in L963--969 belong to.
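One possible shape for that explicit sum (a sketch with invented names, not a patch against the actual code): carry the event indices alongside each per-source SoB array and accumulate everything into one per-event array before taking the log.

```python
import numpy as np

# Sketch (hypothetical names). Rewriting A_i as
#   A_i = 1 - n_s/N + sum_j (n_{s,j}/N) * omega_{i,j},
# only coincident (event, source) pairs have omega_{i,j} != 0, so only
# those pairs need adding; the constant -n_s/N covers everything else.
def event_summed_ts(coincident, n_s_list, n_all, n_events):
    # coincident: list of (event_indices, sob_values) per source
    a = np.full(n_events, 1.0 - sum(n_s_list) / n_all)
    for (idx, sob), n_sj in zip(coincident, n_s_list):
        # np.add.at accumulates correctly even with repeated indices
        np.add.at(a, idx, (n_sj / n_all) * sob)
    return 2.0 * np.sum(np.log(a))
```

Using `np.add.at` means an event coincident with more than one source still picks up every source's term, which is exactly the bookkeeping that the current per-source list makes awkward.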
