# Automatic differentiation - gradients of red giant stars

By using Autograd and refactoring most of my Keras code into vectorized numpy, I gain a high degree of leverage in observing what the neural network learns. Here, I investigate the results, the gradients, and the mathematical justification for what the neural networks learn and how this fits into the astrophysical picture. We have three main models to analyze, two of which are written out explicitly below:

1. A classification model assuming the confidence of an RGB star is approximately logistic
2. A regression model for $\Pi_1$ of a red giant, the period spacing
3. A regression model for $\Delta\nu$, the large frequency separation of a red giant

$$
\begin{equation}
\begin{aligned}
\text{Model 1} &= p(x|W, b) = \sigma(r(XW_1 + b_1)W_2 + b_2) &&\text{Architecture: } W_1 \in (7514, 32), b_1 \in (32,), W_2 \in (32, 1), b_2 \in (1,)\\
\text{Model 2} &= \hat{y}_{\Pi_1} = (r(r(XW_1 + b_1)W_2 + b_2)W_3 + b_3)W_4 + b_4 &&\text{Architecture: } W_1 \in (7514, 8), b_1 \in (8,), W_2 \in (8, 4), b_2 \in (4,), W_3 \in (4, 2), b_3 \in (2,), W_4 \in (2, 1), b_4 \in (1,)\\
\end{aligned}
\end{equation}
$$

Here $x$ is a stellar spectrum, and the class label is a random variable taking values in $\{0, 1\}$ that is Bernoulli distributed with probability $p = p(x)$. $r$ is the ReLU function, $r(z) = \max(0, z)$, and $\sigma$ is the sigmoid function, $\sigma(z) = \frac{1}{1+e^{-z}}$. The layers in the regression for period spacing are kept relatively thin to avoid overfitting, and because there did not seem to be a large cost in validation loss even when we bottleneck the last hidden layer to a vector in $\mathbb{R}^2$. The bottleneck is useful because it forces the network to learn a feature-rich vector that is also easy to visualize.

With these explicit equations, the gradients are easy to derive. In this case, we use `Autograd` to perform automatic differentiation on `numpy` operations in Python.
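As a minimal sketch of what this looks like for Model 1, the forward pass and its gradient with respect to the input spectrum can be written directly in NumPy. The weights below are random placeholders with the shapes given above, not the trained weights, and the function names are illustrative. The gradient is written out by hand with the chain rule so that the sketch depends only on NumPy; applying Autograd's `grad` to a version of `model1` written with `autograd.numpy` should give the same vector.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model1(x, W1, b1, W2, b2):
    """p(x) = sigmoid(relu(x W1 + b1) W2 + b2) for one spectrum x of shape (D,)."""
    h = relu(x @ W1 + b1)
    return sigmoid(h @ W2 + b2)[0]

def model1_input_grad(x, W1, b1, W2, b2):
    """Analytic gradient of p with respect to x (chain rule), shape (D,)."""
    a = x @ W1 + b1                      # hidden pre-activation, shape (H,)
    p = sigmoid(relu(a) @ W2 + b2)[0]    # scalar prediction
    mask = (a > 0).astype(float)         # derivative of ReLU at each hidden unit
    # dp/dlogit = p(1 - p); dlogit/dx = W1 @ (mask * W2)
    return p * (1.0 - p) * (W1 @ (mask * W2[:, 0]))

# Assumption: random placeholder weights and a random "spectrum"
rng = np.random.default_rng(0)
D, H = 7514, 32
W1 = rng.normal(scale=0.01, size=(D, H))
b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, 1))
b2 = np.zeros(1)
x = rng.normal(size=D)

g = model1_input_grad(x, W1, b1, W2, b2)
print(g.shape)  # one gradient entry per input pixel
```

Each entry of `g` says how much the predicted probability changes per unit change in the corresponding input pixel, which is the quantity examined in the sections below.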

## Gradients of red giants - classification

Here, we take the gradient of the prediction with respect to the input vector of each red giant and average over all stars to see which features in the stellar spectra produce the greatest change in the prediction. Explicitly, we are plotting

$$\text{Average gradient of red giants} = \frac{1}{N}\sum_{i=1}^N\nabla_{x}\,\hat{p}(x^{(i)})$$

![title](plots/gradient/fig1.png)
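The averaged gradient can be computed for a whole batch at once. The following is a sketch under stated assumptions: `X` is a random stand-in for the matrix of red giant spectra, the weights are random placeholders with the Model 1 shapes, and `batch_input_grads` is an illustrative name rather than a function from the original code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_input_grads(X, W1, b1, W2, b2):
    """Row i is the gradient of p(x_i) with respect to x_i; shape (N, D)."""
    A = X @ W1 + b1                              # (N, H) pre-activations
    p = sigmoid(np.maximum(0.0, A) @ W2 + b2)    # (N, 1) predictions
    dlogit_dh = (A > 0) * W2[:, 0]               # (N, H), ReLU mask times W2
    return p * (1.0 - p) * (dlogit_dh @ W1.T)    # (N, D)

# Assumption: random placeholder data and weights, not the Kepler sample
rng = np.random.default_rng(1)
N, D, H = 16, 7514, 32
X = rng.normal(size=(N, D))
W1 = rng.normal(scale=0.01, size=(D, H))
b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, 1))
b2 = np.zeros(1)

# (1/N) * sum_i grad_x p(x_i): one value per input pixel
avg_grad = batch_input_grads(X, W1, b1, W2, b2).mean(axis=0)
```

Plotting `avg_grad` against the pixel index would give a figure of the same kind as the one above.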

From this plot we can see immediately that the areas of highest magnitude are the index ranges $(5444, 5585)$ and $(6850, 7080)$. Here we view them in wavelength space and look at the corresponding species. Below is the gradient with the x-axis restricted to the open interval $(6850, 7080)$ and converted to wavelength space, $(16740, 16840)$:

![title](plots/gradient/fig2.png)

This wavelength interval mostly corresponds to iron (Fe), specifically Fe I. However, the noticeable downward dip at around $16820.8$ corresponds to Th I (Thorium). Since the output $\hat{p}$ is the probability that a star is a red clump star, this suggests that an increase in Thorium lowers the predicted probability that a star is a red clump star.
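The high-magnitude index ranges quoted above appear to have been read off the plot. One hypothetical way to extract such ranges automatically is to threshold the absolute gradient and collect the contiguous runs above the threshold. The helper name, the threshold, and the toy gradient below are all assumptions made for illustration.

```python
import numpy as np

def high_magnitude_intervals(grad, threshold, min_width=1):
    """Return half-open (start, stop) index ranges where |grad| > threshold."""
    above = np.abs(grad) > threshold
    # Pad with False so every run has both a rising and a falling edge
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[::2], edges[1::2]
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s >= min_width]

# Toy gradient with two obvious bumps
toy = np.zeros(20)
toy[3:6] = 1.0
toy[10:12] = -2.0
print(high_magnitude_intervals(toy, threshold=0.5))  # [(3, 6), (10, 12)]
```

Given a per-pixel wavelength array (not defined here), indexing it at `start` and `stop - 1` would convert each range to wavelength space.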

## Gradients of red giants - $\Pi_1$ regression

Here we take the gradient of the $\Pi_1$ regression output with respect to the input spectrum of each star, average it over all the training examples in the entire Kepler data set, and normalize the result. Specifically, plotted below is:

$$\text{Average gradient of red giants} = \frac{1}{N}\sum_{i=1}^N\nabla_{x}\,\hat{y}(x^{(i)})$$

![title](plots/gradient/fig4.png)
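The same computation for the $\Pi_1$ regression network can be sketched as follows, taking the Model 2 equation exactly as written (ReLU after the first two layers, none after the third). The weights and spectra are random placeholders, the function names are illustrative, and since the normalization is not specified here, dividing by the largest absolute value is an assumption.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def model2_forward(X, params):
    """y_hat = (r(r(X W1 + b1) W2 + b2) W3 + b3) W4 + b4, shape (N,)."""
    (W1, b1), (W2, b2), (W3, b3), (W4, b4) = params
    H1 = relu(X @ W1 + b1)
    H2 = relu(H1 @ W2 + b2)
    return ((H2 @ W3 + b3) @ W4 + b4)[:, 0]

def model2_input_grads(X, params):
    """Gradient of y_hat with respect to each input row, shape (N, D)."""
    (W1, b1), (W2, b2), (W3, b3), (W4, b4) = params
    A1 = X @ W1 + b1
    A2 = relu(A1) @ W2 + b2
    # The last two layers are linear, so they collapse to W3 @ W4
    G = (W3 @ W4)[:, 0] * (A2 > 0)       # (N, 4)
    G = (G @ W2.T) * (A1 > 0)            # (N, 8)
    return G @ W1.T                      # (N, D)

# Assumption: random placeholder weights and spectra
rng = np.random.default_rng(2)
N, D = 16, 7514
shapes = [(D, 8), (8, 4), (4, 2), (2, 1)]
params = [(rng.normal(scale=0.01 if m == D else 0.5, size=(m, n)), np.zeros(n))
          for m, n in shapes]
X = rng.normal(size=(N, D))

avg = model2_input_grads(X, params).mean(axis=0)
scale = np.max(np.abs(avg))
# Assumed normalization: rescale so the largest magnitude is 1
normalized = avg / scale if scale > 0 else avg
```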

The sections of the gradient that are especially noisy and high in magnitude are the index ranges $(0, 980)$ and $(1843, 3558)$. Below is the gradient plotted in wavelength space for $(0, 980)$:

![title](plots/gradient/fig5.png)

## Gradient comparisons
![title](plots/gradient/fig3.png)