# Everest 2 Diagnostics

<br><span style="font-size:150%; font-weight:bold;"><span style="color:gray;">[11/09/16]</span> Optimizing the number of neighbors</span>

Looks like the more neighbors you add, the better you do at the faint end but the worse you do at the bright end. I imagine that this is because adding more stars reduces the amplitude of the regularization parameter, so you end up decreasing the weight of the star's own pixel vectors. For faint stars that's OK, since there's not that much information in them, but for bright stars you end up losing de-trending power. The plot below is the **estats** output comparing a run with 10 neighbors and a run with 50 neighbors. It is clear that 10 neighbors is the way to go.

![10 vs 50 neighbors](images/number_of_neighbors.png "10 vs 50 neighbors")

<br><span style="font-size:150%; font-weight:bold;"><span style="color:gray;">[11/09/16]</span> Neighboring PLD</span>

Plotted below is the comparison between the nPLD(PLD) results (blue) and the Everest 1.0 results (yellow). I've zoomed in on the relative CDPP plot on the right. It looks awesome. Ignore the saturated stars, since I wasn't yet implementing column collapse -- stars brighter than mag 11 are definitely being overfit (issue resolved now!). My current goal is to reproduce this *without* running PLD beforehand to get a factor of 2 speed boost.

![Original nPLD](images/nPLD_original.png "Original nPLD")

<br><span style="font-size:150%; font-weight:bold;"><span style="color:gray;">[11/09/16]</span> Super-detrended cloud</span>

There is a cloud of super-detrended stars just below magnitude 11.5. These stars did not have their columns collapsed since they were just shy of the saturation threshold. Based on the ratio of their max pixel flux to the saturation flux (see list below), I suggest a threshold of -10% when collapsing columns, i.e., treating a star as saturated once its max pixel flux comes within 10% of the saturation flux.

![Super-detrended](images/super_detrended.png "Super-detrended")

```0.940860801777, 0.919384317658, 0.944862331189, 0.958934950546, 0.93954328125, 0.946145564303, 0.96732639277, 0.946067420072, 0.944256371287, 0.85614310767, 0.956884969351, 0.931195326593, 0.892291594039, 0.895342764563, 0.940960196554, 0.984913263267, 0.926170642072```
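
As a quick sanity check on the proposed threshold, the snippet below counts how many of the ratios listed above sit at or above 90% of the saturation flux. It is only a sketch: the names `ratios`, `tolerance`, and `collapse` are made up for illustration, and it assumes that "a threshold of -10%" means "collapse columns when max pixel flux / saturation flux >= 1 - 0.1".

```python
import numpy as np

# Ratios of max pixel flux to saturation flux, copied from the list above
ratios = np.array([
    0.940860801777, 0.919384317658, 0.944862331189, 0.958934950546,
    0.93954328125, 0.946145564303, 0.96732639277, 0.946067420072,
    0.944256371287, 0.85614310767, 0.956884969351, 0.931195326593,
    0.892291594039, 0.895342764563, 0.940960196554, 0.984913263267,
    0.926170642072,
])

# Assumed meaning of the threshold: a star counts as saturated if it is
# within 10% of the saturation flux (hypothetical variable name)
tolerance = -0.10
collapse = ratios >= 1.0 + tolerance

print(f"{collapse.sum()} of {ratios.size} stars would have their columns collapsed")
print("Still below the threshold:", ratios[~collapse])
```

With these numbers, 14 of the 17 stars would be picked up; the three that would not are the ones at roughly 0.856, 0.892, and 0.895.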

<br><span style="font-size:150%; font-weight:bold;"><span style="color:gray;">[11/10/16]</span> Saturation threshold of -10%</span>

Here's what the nPLD(PLD) model looks like with a saturation threshold of -0.1, compared to Everest 1.0. The super-detrended cloud is gone. I only see two stars below the continuum, so I think this is a good limit.

![Lower saturation threshold](images/saturation_threshold.png "Lower saturation threshold")

<br><span style="font-size:150%; font-weight:bold;"><span style="color:gray;">[11/10/16]</span> Simple nPLD</span>

The downside of nPLD is that the PLD model must be run beforehand, so you're running `everest` twice for each star. We can instead run the model just once -- I call this *simple* nPLD. The difference is that when PLD is run beforehand, we know where the outliers are for each of the neighboring stars, so we mask them when using their pixels as regressors. With *simple* nPLD, the outliers (thruster fires, deep transits, etc.) are present in the regressors. But as it turns out, it makes no difference! In fact, I find slightly **better** de-trending with *simple* nPLD, as shown below.

![Simple](images/simplenPLD.png "Simple nPLD")

The actual light curves look pretty good, too. And comparing the outliers histogram to the nPLD case, this does not introduce extra outliers. Out of curiosity, I also tested what would happen if we did a quick-and-dirty iterative sigma clipping to remove outliers in the regressors. This involves using a guess for the covariance matrix, since I don't want to waste time optimizing the GP. As it turns out, this leads to slightly *worse* de-trending than regular nPLD, so we're not going to do this.

![No Thrusters](images/nothrusters.png "No Thrusters nPLD")
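
For reference, here is roughly what an iterative sigma clip looks like. This is a simplified stand-in, not the code used for the test above: it clips around the running median with a MAD-based scatter estimate instead of using a guessed covariance matrix, and the function name `sigma_clip_mask` and its parameters are hypothetical.

```python
import numpy as np

def sigma_clip_mask(flux, sigma=5.0, max_iter=10):
    """Return a boolean mask that is True where `flux` is flagged as an outlier.

    Simplification: deviations are measured from the median of the
    currently unmasked points, not from a GP prediction.
    """
    mask = np.zeros(flux.size, dtype=bool)
    for _ in range(max_iter):
        good = flux[~mask]
        med = np.median(good)
        # 1.4826 * MAD approximates the standard deviation for Gaussian noise
        mad = 1.4826 * np.median(np.abs(good - med))
        new_mask = np.abs(flux - med) > sigma * mad
        if np.array_equal(new_mask, mask):
            break  # converged: no change in the set of outliers
        mask = new_mask
    return mask

# Synthetic regressor: white noise plus three injected spikes
rng = np.random.default_rng(42)
regressor = rng.normal(0.0, 1.0, 500)
spikes = [50, 200, 350]
regressor[spikes] += 50.0

mask = sigma_clip_mask(regressor)
print("Flagged cadences:", np.flatnonzero(mask))
```

The flagged cadences would then be dropped from the neighbor's pixels before they are used as regressors.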

I tested all this on campaign 6.0. The data for the original neighboring PLD scheme is in [stats/NeighborRecursive.tsv](stats/NeighborRecursive.tsv); the data for the new "simple" scheme is in [stats/SimpleNeighborRecursive.tsv](stats/SimpleNeighborRecursive.tsv); and the data for simple nPLD with the quick-and-dirty outlier removal is in [stats/NoThrustersRecursive.tsv](stats/NoThrustersRecursive.tsv).

I added a **parent_model** keyword argument to nPLD, which by default will be `None`, meaning nPLD is run on its own (i.e., *simple* nPLD). Setting **parent_model** to "PLD" restores the previous nPLD functionality.

<br><span style="font-size:150%; font-weight:bold;"><span style="color:gray;">[11/10/16]</span> Recursive PLD</span>

While cleaning up the code, I found that I was inadvertently normalizing the pixel weights by the de-trended flux at the current PLD order, rather than by the raw flux. This is what I used to call "recursive" PLD, and I decided against it a while back for some reason. But it turns out that it actually gives you a 1-2% boost in the de-trending power, especially for faint stars. Here I'm comparing recursive nPLD to nPLD (both computed the "simple" way):

![Recursive](images/recursive.png "Recursive nPLD")

Again, these tests were performed on campaign 6.0. The data for the non-recursive simple nPLD is in [stats/SimpleNeighbor.tsv](stats/SimpleNeighbor.tsv).

I've added a **recursive** keyword argument, which from now on will be `True` by default.
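
To make the recursive vs. non-recursive distinction concrete, here is a toy sketch on synthetic data. It is not the `everest` implementation: the regression is a plain ridge fit standing in for the GP-regularized one, and all names (`pld_basis`, `ridge_detrend`, `detrend`, `lam`) are hypothetical. The only thing it is meant to show is where the normalization changes: with `recursive=True`, the pixels at each PLD order are divided by the de-trended flux from the previous order instead of by the raw flux.

```python
import numpy as np
from itertools import combinations_with_replacement

def pld_basis(pixels, norm, order):
    """Regressors of a given PLD order: products of `order` normalized pixels."""
    frac = pixels / norm[:, None]  # (n_time, n_pix) fractional pixel fluxes
    n_pix = pixels.shape[1]
    cols = [np.prod(frac[:, list(c)], axis=1)
            for c in combinations_with_replacement(range(n_pix), order)]
    return np.column_stack(cols)

def ridge_detrend(flux, X, lam=1e-6):
    """Plain ridge regression (assumed stand-in for the real regularized fit)."""
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ flux)
    return flux - X @ w + np.median(flux)

def detrend(pixels, max_order=2, recursive=True):
    raw = pixels.sum(axis=1)
    flux, norm, blocks = raw, raw, []
    for order in range(1, max_order + 1):
        blocks.append(pld_basis(pixels, norm, order))
        flux = ridge_detrend(raw, np.hstack(blocks))
        if recursive:
            # Next order is normalized by the current de-trended flux
            norm = flux
    return flux

# Synthetic target: four pixels sharing a common pointing-drift systematic
rng = np.random.default_rng(0)
n_time = 300
t = np.linspace(0.0, 1.0, n_time)
drift = 0.02 * np.sin(2 * np.pi * 3 * t)
psf_weights = np.array([0.4, 0.3, 0.2, 0.1])
sensitivity = np.array([1.0, -0.5, 0.8, -1.2])
pixels = 1000.0 * psf_weights * (1.0 + np.outer(drift, sensitivity))
pixels += rng.normal(0.0, 0.5, pixels.shape)

raw = pixels.sum(axis=1)
flux_rec = detrend(pixels, recursive=True)
flux_std = detrend(pixels, recursive=False)
print("raw scatter:          ", np.std(raw))
print("recursive scatter:    ", np.std(flux_rec))
print("non-recursive scatter:", np.std(flux_std))
```

On a toy problem like this the two variants should both remove most of the drift; the 1-2% difference quoted above comes from the real campaign 6.0 runs, not from this sketch.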