# Space weather prediction

Space weather is a branch of space physics and aeronomy concerned with the time varying conditions within the Solar System, including the [solar wind](https://en.wikipedia.org/wiki/Solar_wind), emphasizing the space surrounding the Earth, including conditions in the magnetosphere, ionosphere, thermosphere, and exosphere. [Wikipedia]

We will be interested in the solar wind. It is monitored by the GOES satellites, currently GOES 14 and 15:
![GOES 14-15](goes_14-15.jpg)

They measure the flux of solar particles. The measurements are available on the NOAA website https://www.swpc.noaa.gov/: [current](https://www.swpc.noaa.gov/products/goes-electron-flux) and [historical measurements](ftp://ftp.swpc.noaa.gov/pub/lists/particle/).

![GOES electron flux](20180224_electron.gif)

One way to model a variable for which no suitable regressors are available is to use [**autoregressive models**](https://en.wikipedia.org/wiki/Autoregressive_model) of order $p$,

$$
y_t = \beta_0 + \beta_1 y_{t-1} + \ldots + \beta_p y_{t-p} + \varepsilon_t = \beta^\intercal x_t + \varepsilon_t,
$$

where $x_t = [1, y_{t-1}, \ldots, y_{t-p}]^{\intercal}$. We assume the noise $\varepsilon_t$ to be iid normal.
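
The regressor definition above can be illustrated with a short sketch. The helper name `ar_regressors` and the toy series are hypothetical and only serve to show how $x_t$ is assembled from past values; they are not part of the notebook's code.

```python
import numpy as np

def ar_regressors(y, p):
    """Build regressors and targets for an AR(p) model.

    Row i of X is x_t = [1, y_{t-1}, ..., y_{t-p}] for t = p + i,
    and targets[i] is the corresponding y_t.
    """
    y = np.asarray(y, dtype=float)
    rows = [np.concatenate(([1.0], y[t - p:t][::-1])) for t in range(p, y.size)]
    return np.array(rows), y[p:]

# Toy series (illustrative values only)
y = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
X, targets = ar_regressors(y, p=2)
print(X)
print(targets)
```

Note that the first $p$ samples yield no regressor, which is why modelling cannot start at the first measurement.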

We will use a simplified approach to AR models. In general, the theory is much richer.

Now, let us construct an AR(1) model. We must also admit that reality (the particle flux) varies in time, whereas ordinary Bayesian modelling assumes constant parameters. A popular heuristic workaround is to flatten the prior density before incorporating new data. This flattening, called **forgetting**, increases the uncertainty about the variable of interest. The simplest algorithm, **exponential forgetting**, raises the density to a power $\lambda$,

$$
\left[\pi(\beta, \sigma^2|x_{0:t-1}, y_{0:t-1})\right]^\lambda, \qquad \lambda\in[0.95, 1],
$$

which, under conjugate priors, is equivalent to

$$
\xi_{t-1} \leftarrow \lambda \xi_{t-1}, \qquad \nu_{t-1} \leftarrow \lambda\nu_{t-1}.
$$

# Data set
    :Data_list: 20180225_Gp_part_5m.txt
    :Created: 2018 Feb 25 1536 UTC
    # Prepared by the U.S. Dept. of Commerce, NOAA, Space Weather Prediction Center
    # Please send comments and suggestions to SWPC.Webmaster@noaa.gov 
    # 
    # Label: P > 1 = Particles at >1 Mev
    # Label: P > 5 = Particles at >5 Mev
    # Label: P >10 = Particles at >10 Mev
    # Label: P >30 = Particles at >30 Mev
    # Label: P >50 = Particles at >50 Mev
    # Label: P>100 = Particles at >100 Mev
    # Label: E>0.8 = Electrons at >0.8 Mev
    # Label: E>2.0 = Electrons at >2.0 Mev
    # Label: E>4.0 = Electrons at >4.0 Mev
    # Units: Particles = Protons/cm2-s-sr
    # Units: Electrons = Electrons/cm2-s-sr
    # Source: GOES-15
    # Location: W135
    # Missing data: -1.00e+05
    #
    #                      5-minute  GOES-15 Solar Particle and Electron Flux
    #
    #                 Modified Seconds
    # UTC Date  Time   Julian  of the
    # YR MO DA  HHMM    Day     Day     P > 1     P > 5     P >10     P >30     P >50     P>100     E>0.8     E>2.0     E>4.0
     #-------------------------------------------------------------------------------------------------------------------------
    2018 02 25  0000   58174      0   5.65e+00  1.61e-01  1.26e-01  6.80e-02  5.50e-02  2.82e-02  6.53e+04  1.58e+03 -1.00e+05
    2018 02 25  0005   58174    300   6.13e+00  2.19e-01  1.53e-01  8.97e-02  7.67e-02  4.99e-02  6.53e+04  1.57e+03 -1.00e+05
    2018 02 25  0010   58174    600   5.24e+00  2.21e-01  1.86e-01  1.28e-01  1.15e-01  4.75e-02  6.53e+04  1.54e+03 -1.00e+05
    2018 02 25  0015   58174    900   7.44e+00  2.56e-01  1.40e-01  6.80e-02  5.50e-02  2.82e-02  6.46e+04  1.50e+03 -1.00e+05
    2018 02 25  0020   58174   1200   5.25e+00  3.48e-01  3.12e-01  1.36e-01  1.02e-01  4.14e-02  6.37e+04  1.49e+03 -1.00e+05
    2018 02 25  0025   58174   1500   6.91e+00  2.82e-01  1.70e-01  9.89e-02  8.60e-02  5.91e-02  6.35e+04  1.48e+03 -1.00e+05
    2018 02 25  0030   58174   1800   4.80e+00  1.90e-01  1.54e-01  9.66e-02  8.37e-02  3.74e-02  6.26e+04  1.45e+03 -1.00e+05
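
The header declares `-1.00e+05` as the missing-data marker, and the E>4.0 column above consists of it entirely. A minimal sketch of how such values could be masked before modelling is given below; it parses three of the rows shown above from an in-memory string, and the variable names (`sample`, `flux`) are illustrative.

```python
import io
import numpy as np

# Three data rows copied from the listing above
sample = """\
2018 02 25  0000   58174      0   5.65e+00  1.61e-01  1.26e-01  6.80e-02  5.50e-02  2.82e-02  6.53e+04  1.58e+03 -1.00e+05
2018 02 25  0005   58174    300   6.13e+00  2.19e-01  1.53e-01  8.97e-02  7.67e-02  4.99e-02  6.53e+04  1.57e+03 -1.00e+05
2018 02 25  0010   58174    600   5.24e+00  2.21e-01  1.86e-01  1.28e-01  1.15e-01  4.75e-02  6.53e+04  1.54e+03 -1.00e+05
"""

data = np.genfromtxt(io.StringIO(sample))

# Columns 0-5 hold date and time fields, columns 6-14 hold the nine flux channels
flux = data[:, 6:].copy()
flux[flux == -1.00e+05] = np.nan   # replace the missing-data marker by NaN

print(flux[:, 7])   # E>2.0
print(flux[:, 8])   # E>4.0, missing in all three rows
```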

Let us model E>2.0, i.e., electrons with energy above 2 MeV. We use the AR(1) model

$$
y_t = \beta_0 + \beta_1 y_{t-1} + \varepsilon_t, \qquad \text{iid}\quad \varepsilon_t \sim\mathcal{N}(0, \sigma^2).
$$

Again, we use the normal inverse-gamma (NiG) prior and are interested in predictions.

In [1]:
import sys
sys.path.insert(0, '../zdrojaky/')

import numpy as np
import matplotlib.pyplot as plt
from nig import NiG

We load the data file. It resembles a CSV (comma-separated values) file, but the delimiters are spaces, so we can use _np.genfromtxt()_. We skip the preamble (26 rows).

In [2]:
#file = 'ftp://ftp.swpc.noaa.gov/pub/lists/particle/20180224_Gs_part_5m.txt'
datafile = '20180224_Gs_part_5m.txt'
data = np.genfromtxt(datafile, skip_header=26)  # skip the 26-line preamble
e2 = data[:,13]                                 # column 13 holds the E>2.0 flux
ndat = e2.size

Let us define the prior hyperparameters and the forgetting factor *forg_factor*. Its value is usually between 0.95 and 1, where 0.95 is considered very low.

In [None]:
xi0 = np.diag([1000, .1, .1])
nu0 = 5.
regmodel = NiG(xi0, nu0)

forg_factor = .95

Now we calculate predictions. The estimates $\hat{\beta}$ are in regmodel.Ebeta.

In [3]:
yt_pred = np.zeros(ndat)                        # here we save the predictions

for t in range(2, ndat):
    # regressor x_t = [1, y_{t-1}] of the AR(1) model
    xt = np.array([1, e2[t-1]])
    yt = e2[t]

    # one-step-ahead prediction from the current estimate of beta
    # (assumes regmodel.Ebeta is a 1-D array of length 2)
    yt_pred[t] = np.dot(regmodel.Ebeta, xt)

    # forgetting
    regmodel.xi *= forg_factor
    regmodel.nu *= forg_factor

    # update
    regmodel.update(yt, xt)
    regmodel.log()
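
Since the `NiG` class lives in a separate module, the following self-contained sketch mimics the same predict, forget, update cycle with plain NumPy. It is not the `NiG` implementation: it only accumulates an extended information matrix over $[y_t, 1, y_{t-1}]$ and takes the least-squares point estimate, which is an assumption about how such a class typically works. The synthetic series, its parameters, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic AR(1) series (illustrative values, not GOES data)
beta_true = np.array([1.0, 0.8])
y = np.zeros(500)
for t in range(1, y.size):
    y[t] = beta_true[0] + beta_true[1] * y[t - 1] + rng.normal(scale=0.1)

lam = 0.99                           # forgetting factor
xi = np.diag([1.0, 0.1, 0.1])        # prior statistics over z_t = [y_t, 1, y_{t-1}]
y_pred = np.zeros(y.size)

for t in range(1, y.size):
    xt = np.array([1.0, y[t - 1]])                       # regressor
    beta_hat = np.linalg.solve(xi[1:, 1:], xi[1:, 0])    # current point estimate
    y_pred[t] = beta_hat @ xt                            # one-step-ahead prediction
    xi *= lam                                            # exponential forgetting
    zt = np.concatenate(([y[t]], xt))
    xi += np.outer(zt, zt)                               # data update

rmse_tail = np.sqrt(np.mean((y[300:] - y_pred[300:]) ** 2))
print(beta_hat, rmse_tail)
```

The prediction is always made before the new observation enters the statistics, so each `y_pred[t]` is a genuine one-step-ahead forecast.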

We are interested only in $\hat{\beta}$ and the predictions $\hat{y}_t$, including their quality. Prediction quality is often measured by the RMSE (root mean squared error):

$$
RMSE = \sqrt{MSE} = \sqrt{\frac{1}{T}\sum_{t=1}^{T}(\hat{y}_t - y_t)^2}.
$$

Note that we must skip the initial samples for which no prediction could be made (modelling cannot start at the first measurement, because the regressor needs past values).
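
As a small worked example of the formula, the sketch below computes the RMSE while skipping the leading samples that have no prediction. The helper `rmse` and the toy arrays are hypothetical.

```python
import numpy as np

def rmse(y_true, y_hat, skip=0):
    """Root mean squared error, ignoring the first `skip` samples."""
    err = np.asarray(y_true, dtype=float)[skip:] - np.asarray(y_hat, dtype=float)[skip:]
    return np.sqrt(np.mean(err ** 2))

y_true = np.array([3.0, 4.0, 5.0, 6.0])
y_hat = np.array([0.0, 0.0, 5.0, 8.0])   # first two entries are unfilled placeholders

# Errors after skipping are 0 and -2, so the MSE is 2 and the RMSE is sqrt(2)
print(rmse(y_true, y_hat, skip=2))
```

Including the unfilled placeholders would inflate the error, which is why the skip matters.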

Let us plot the true evolution and predictions, and the histogram and box plots of errors.

In [None]:
errors = e2[2:] - yt_pred[2:]

# RMSE over the samples for which a prediction exists (t >= 2)
RMSE = np.sqrt(np.mean(errors**2))
print('RMSE: ', RMSE)

plt.figure(figsize=(15, 3))
plt.plot(e2)
plt.plot(yt_pred, '+')

plt.figure(figsize=(15,3))
plt.subplot(1,2,1)
plt.hist(errors, bins=100)
plt.subplot(1,2,2)
plt.boxplot(errors, showfliers=False)
plt.show()

Evolution of $\hat{\beta}$:

In [None]:
Ebeta_log = np.array(regmodel.Ebeta_log)
plt.figure(figsize=(15, 5))
plt.subplot(2, 1, 1)            # estimate of beta_0
plt.plot(Ebeta_log[:,0])
plt.subplot(2, 1, 2)            # estimate of beta_1
plt.plot(Ebeta_log[:,1])
plt.show()