# Prospective Learning: A quick introduction

Learning involves updating decision rules based on past experience to improve future performance. Probably approximately correct (PAC) learning has been extremely useful for developing algorithms that minimize the risk, typically defined as the expected loss, on unseen samples under certain assumptions. The central assumption, that samples are independent and identically distributed (IID) within the training dataset and at test time, has served us well. But it is neither testable nor believed to be true in practice. The future is always different from the past: both the distribution of the data and the goals of the learner may change over time. Moreover, those changes may cause the optimal hypothesis to change over time as well. Although numerous approaches have been developed to address this issue, we still lack a first-principles framework for problems where data distributions and goals change in such a way that the optimal hypothesis is time-dependent.

We have developed a theoretical framework called "Prospective Learning". Instead of assuming that data arise from an unknown probability distribution, as in PAC learning, prospective learning assumes that data come from an unknown stochastic process, that the loss considers the future, and that the optimal hypothesis may change over time. A prospective learner uses samples received up to some time $t \in \mathbb{N}$ to output an infinite sequence of predictors, which it uses for making predictions on data at all future times $t' > t$. In other words, a prospective learner uses past data to minimize the expected cumulative risk of the future. To properly define such a learner, let's first define several key ingredients.

### Definitions

1. **Data**: $z_t = (x_t, y_t)$ is the datum at time $t$. Data are drawn from a stochastic process $Z \equiv (Z_t)_{t \in \mathbb{N}}$. We write $z_{\leq t} \equiv (z_1, \dots, z_t)$ for past data and $z_{> t} \equiv (z_{t+1}, \dots)$ for future data.

2. **Hypothesis Class**: A prospective learner selects an infinite sequence of hypotheses $h \equiv (h_1,\dots,h_t,h_{t+1},\dots) \in \mathcal{H}$

3. **Learner**: A map from past data $z_{\leq t}$ to a hypothesis sequence $h \in \mathcal{H}$

4. **Prospective Loss**: Future loss incurred by a hypothesis $h$

$$
    \bar \ell_t(h, Z) = \limsup_{\tau \to \infty} \frac{1}{\tau} \sum_{s=t+1}^{t+\tau} \ell (s, h_s(X_s), Y_s)
$$

5. **Prospective Risk**: Prospective risk at time $t$ is the expected future loss

$$
    R_t(h)
    = \mathbb{E} [ \bar \ell_t(h,Z) \mid z_{\leq  t} ]
    = \int \bar \ell_t(h,Z) \ \mathrm{d} \mathbb{P}_{Z \mid z_{\leq t}}
$$

6. **Prospective Bayes Risk**: Minimum achievable prospective risk by any learner that observes past data $z_{\leq t}$

$$
    R_t^* = \inf_{h\in \sigma(Z_{\leq t})}  R_t(h)
$$
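As a rough illustration of definitions 4 and 5, the sketch below replaces the $\limsup$ with a finite horizon and the conditional expectation with a Monte Carlo average over sampled futures. The names `prospective_loss`, `prospective_risk`, and `sample_future` are assumptions made for this example, and so is the toy process. In general `sample_future` should return trajectories consistent with the observed past $z_{\leq t}$; the toy sampler here ignores the past because its data are IID.

```python
import numpy as np

def zero_one_loss(s, y_pred, y_true):
    """Instantaneous loss l(s, y_hat, y); this choice ignores the time index s."""
    return float(y_pred != y_true)

def prospective_loss(h, xs, ys, t, horizon, loss=zero_one_loss):
    """Finite-horizon stand-in for the limsup average in the prospective loss.

    h(s, x) plays the role of h_s(x); xs[s - 1], ys[s - 1] hold the datum at time s.
    """
    times = range(t + 1, t + horizon + 1)
    return sum(loss(s, h(s, xs[s - 1]), ys[s - 1]) for s in times) / horizon

def prospective_risk(h, sample_future, t, horizon, n_draws=200, seed=0):
    """Monte Carlo estimate of R_t(h): average the loss over sampled futures."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(n_draws):
        xs, ys = sample_future(rng, t + horizon)
        losses.append(prospective_loss(h, xs, ys, t, horizon))
    return float(np.mean(losses))

# Toy process (an assumption for this example): IID inputs, label = sign of input.
def sample_iid(rng, n):
    xs = rng.uniform(-1.0, 1.0, size=n)
    ys = np.where(xs >= 0, 1, -1)
    return xs, ys

def h_good(s, x):
    return 1 if x >= 0 else -1

def h_bad(s, x):
    return -1 if x >= 0 else 1

risk_good = prospective_risk(h_good, sample_iid, t=5, horizon=50, n_draws=10)
risk_bad = prospective_risk(h_bad, sample_iid, t=5, horizon=50, n_draws=10)
print(risk_good, risk_bad)
```

On this noiseless toy process the first hypothesis never errs and the second always errs, so the two estimates sit at the two extremes of the 0-1 loss.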

The following schematic illustration depicts a prospective learner.

<div style="text-align: center;">
    <img src="../assets/cartoon.jpg" alt="Schematic of a prospective learner" style="width: 30%;"/>
</div>

### A simple data generating process

Suppose there are two binary classification problems (“tasks”) where the input is one-dimensional. Inputs for both tasks are drawn from a uniform distribution on the set $[-2, -1] \cup [1, 2]$. Ground-truth labels correspond to the sign of the input for Task 1, and the negative of the sign of the input for Task 2. Now consider a stochastic process where the task switches every $20$ time steps.
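A minimal sketch of how one might sample from this process is given below. It assumes that time starts at $t = 1$, that Task 1 is active first, and that labels are encoded as $\pm 1$; the function names `task_at` and `sample_process` are made up for this example.

```python
import numpy as np

def task_at(t, period=20):
    """Task (1 or 2) active at time t = 1, 2, ...; the task switches every `period` steps."""
    return 1 + ((np.asarray(t) - 1) // period) % 2

def sample_process(n_steps, period=20, seed=0):
    """Draw (x_t, y_t) for t = 1, ..., n_steps from the alternating two-task process."""
    rng = np.random.default_rng(seed)
    # Uniform on [-2, -1] union [1, 2]: a magnitude in [1, 2) times a random side.
    magnitude = rng.uniform(1.0, 2.0, size=n_steps)
    side = rng.choice([-1.0, 1.0], size=n_steps)
    x = side * magnitude
    t = np.arange(1, n_steps + 1)
    # Task 1 labels with sign(x); Task 2 labels with -sign(x).
    flip = np.where(task_at(t, period) == 1, 1.0, -1.0)
    y = (np.sign(x) * flip).astype(int)
    return x, y

x, y = sample_process(80)
print(x[:3], y[:3])      # three samples from Task 1
print(x[20:23], y[20:23])  # three samples from Task 2
```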

<span style="color: red;">@Rice, add the schematic figures of Task A and task B here including the one that shows that the tasks are alternating. Also plot some samples from each task. Add code for sampling data from this process</span>

### Online and continual learning algorithms fall short

Add a short introduction about the methods
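As a hedged starting point, here is a sketch of Follow-the-Leader (FTL) only, run on the alternating two-task process described above. It is restricted to a hypothetical two-element hypothesis class, $x \mapsto \operatorname{sign}(x)$ and $x \mapsto -\operatorname{sign}(x)$, and it assumes Task 1 comes first and labels are $\pm 1$. OGD and BGD are not covered here, and the helper names are illustrative.

```python
import numpy as np

def sample_process(n_steps, period=20, seed=0):
    """Alternating two-task process; assumes Task 1 is active first and labels are +/-1."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n_steps) * rng.uniform(1.0, 2.0, size=n_steps)
    t = np.arange(1, n_steps + 1)
    flip = np.where(((t - 1) // period) % 2 == 0, 1.0, -1.0)
    return x, (np.sign(x) * flip).astype(int)

# Hypothetical two-element hypothesis class: the Task 1 rule and the Task 2 rule.
HYPOTHESES = (lambda x: np.sign(x), lambda x: -np.sign(x))

def follow_the_leader(x, y):
    """At each step, predict with the hypothesis that has the lowest cumulative past loss."""
    cum_loss = np.zeros(len(HYPOTHESES))
    mistakes = np.zeros(len(x))
    for i in range(len(x)):
        leader = int(np.argmin(cum_loss))  # ties go to the first hypothesis
        mistakes[i] = float(HYPOTHESES[leader](x[i]) != y[i])
        # Only after predicting does the learner see y[i] and update every hypothesis.
        cum_loss += [float(h(x[i]) != y[i]) for h in HYPOTHESES]
    return mistakes

x, y = sample_process(200)
mistakes = follow_the_leader(x, y)
print("average instantaneous 0-1 loss:", mistakes.mean())
```

Under these assumptions the Task 1 rule never has a strictly larger cumulative loss than the Task 2 rule, so FTL keeps predicting with it: it is correct on every Task 1 block and wrong on every Task 2 block, for an average error of 0.5 over whole periods, even though a time-aware predictor could reach zero error.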

<span style="color: red;">@Rice, add code that implements FTL, OGD, and BGD and plot the prospective and instantaneous risks</span>

### Prospective Multi-layer Perceptron

Add a short introduction about prospective ERM and how prospective MLP implements it
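The sketch below is not the prospective MLP. It is a deliberately tiny stand-in that only illustrates the idea of a hypothesis that takes time as an input, here written $g(s, x)$, fitted by empirical risk minimization on past data. It assumes the switching period of 20 is known, that Task 1 comes first, and that labels are $\pm 1$; an MLP would have to learn the time dependence from an encoding of $s$ instead. All names are illustrative.

```python
import numpy as np

def sample_process(n_steps, period=20, seed=0):
    """Alternating two-task process; assumes Task 1 is active first and labels are +/-1."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n_steps) * rng.uniform(1.0, 2.0, size=n_steps)
    t = np.arange(1, n_steps + 1)
    flip = np.where(((t - 1) // period) % 2 == 0, 1.0, -1.0)
    return x, (np.sign(x) * flip).astype(int)

def phase(t, period=20):
    """0 or 1: position of time t in the switching cycle (the period is assumed known)."""
    return ((np.asarray(t) - 1) // period) % 2

def fit_time_aware_erm(x, y, period=20):
    """ERM over g(s, x) = c[phase(s)] * sign(x), with one sign c in {+1, -1} per phase."""
    t = np.arange(1, len(x) + 1)
    signs = np.ones(2)
    for p in (0, 1):
        mask = phase(t, period) == p
        if mask.any():
            # Keep c = +1 unless flipping it gives fewer training errors in this phase.
            err_plus = np.mean(np.sign(x[mask]) != y[mask])
            signs[p] = 1.0 if err_plus <= 0.5 else -1.0
    return lambda s, x_new: signs[phase(s, period)] * np.sign(x_new)

# Fit on the first 60 steps, then evaluate on the following 200 time steps.
x, y = sample_process(260)
g = fit_time_aware_erm(x[:60], y[:60])
future_times = np.arange(61, 261)
future_error = float(np.mean(g(future_times, x[60:]) != y[60:]))
print("average 0-1 loss on future times 61..260:", future_error)
```

Because the fitted predictor changes with $s$, it tracks the task switches and makes no errors on the future steps of this noiseless process once both phases have appeared in the past data.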

<span style="color: red;">@Rice, add code that implements prospective MLP and plot the prospective and instantaneous risks</span>

Add the final figure where all the plots are put together

### References
