### Course Description:

This graduate-level course offers a practical approach to probabilistic learning with Gaussian processes (GPs). GPs are a powerful family of methods for modeling and predicting a wide variety of spatio-temporal phenomena. Today they are used for both regression and classification, and their theoretical foundations draw on Bayesian inference, reproducing kernel Hilbert spaces, eigenvalue problems, and numerical integration. Rather than focus *solely* on these theoretical foundations, this course balances theory with practical probabilistic programming, using a variety of ``python``-based packages. The course also discusses practical engineering problems in which GP models cut across other areas of machine learning, including transfer learning, convolutional networks, and normalizing flows.

### Pre-requisites:

- CS1371, MATH2551, MATH2552 (or equivalent)
- Working knowledge of ``python`` including familiarity with ``numpy`` and ``matplotlib`` libraries. 

## Lectures

Below you will find a list of the lectures that form the backbone of this course.


#### L1 | Probability fundamentals I
<details>
<summary>Contents</summary>
    
- Probability
- Conditional probability and independence
- Expectation of a random variable
- Probability density function for a continuous random variable
- Key discrete probability mass functions
- Key continuous probability density functions
</details>

#### L2 | Probability fundamentals II
<details>
<summary>Contents</summary>
    
- Functions of random variables
- Multivariate distributions
- Decision and estimation: basic definitions
- Tests of significance
</details>


#### L3 | Introduction to Bayesian inference
<details>
<summary>Contents</summary>
    
- Introduction to Bayesian modelling
- Conjugate prior distributions
- Bayesian polynomial regression
</details>
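
As a small illustration of conjugacy in Bayesian polynomial regression, the sketch below assumes a zero-mean isotropic Gaussian prior on the coefficients and a known Gaussian noise variance. Under these assumptions the posterior over the coefficients is again Gaussian and available in closed form. The function `bayes_poly_posterior` and the values of `alpha` (prior precision) and `noise_var` are illustrative choices, not part of the course materials.

```python
import numpy as np


def bayes_poly_posterior(x, y, degree, alpha=1.0, noise_var=0.1):
    """Gaussian posterior over polynomial coefficients w.

    Assumes the prior w ~ N(0, alpha^{-1} I) and the likelihood
    y ~ N(Phi w, noise_var I), where Phi holds the powers of x.
    """
    # Design matrix with columns 1, x, x^2, ..., x^degree
    Phi = np.vander(x, degree + 1, increasing=True)
    # Conjugacy: posterior precision = prior precision + data precision term
    precision = alpha * np.eye(degree + 1) + Phi.T @ Phi / noise_var
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / noise_var
    return mean, cov


rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
true_w = np.array([0.5, -1.0, 2.0])
y = np.vander(x, 3, increasing=True) @ true_w + rng.normal(scale=0.1, size=x.size)
mean, cov = bayes_poly_posterior(x, y, degree=2, alpha=1e-2, noise_var=0.01)
print("posterior mean:", mean)
print("posterior std:", np.sqrt(np.diag(cov)))
```

For larger problems one would typically solve with a Cholesky factor rather than call `np.linalg.inv`; the explicit inverse is kept here only for readability.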

#### L4 | The uniqueness of the Normal distribution
<details>
<summary>Contents</summary>
    
- Marginal distributions
- Conditional distributions
- Nataf (and other) transforms
</details>
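
The marginal of a multivariate Normal is obtained by selecting the corresponding sub-blocks of its mean and covariance, and the conditional is again Normal with a closed-form mean and covariance. A minimal NumPy sketch of the conditioning step follows; the helper name `condition_gaussian` is hypothetical.

```python
import numpy as np


def condition_gaussian(mu, Sigma, idx_a, idx_b, x_b):
    """Mean and covariance of x_a given x_b, for x ~ N(mu, Sigma).

    mean = mu_a + S_ab S_bb^{-1} (x_b - mu_b)
    cov  = S_aa - S_ab S_bb^{-1} S_ba
    """
    mu_a, mu_b = mu[idx_a], mu[idx_b]
    # Marginal blocks are simply sub-blocks of Sigma
    S_aa = Sigma[np.ix_(idx_a, idx_a)]
    S_ab = Sigma[np.ix_(idx_a, idx_b)]
    S_bb = Sigma[np.ix_(idx_b, idx_b)]
    # gain = S_ab S_bb^{-1}, computed with a linear solve instead of an inverse
    gain = np.linalg.solve(S_bb, S_ab.T).T
    cond_mean = mu_a + gain @ (x_b - mu_b)
    cond_cov = S_aa - gain @ S_ab.T
    return cond_mean, cond_cov


mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.8], [0.8, 2.0]])
m, C = condition_gaussian(mu, Sigma, [0], [1], np.array([2.0]))
print(m, C)
```

For this bivariate example the conditional mean is 0 + (0.8 / 2.0)(2.0 - 1.0) = 0.4 and the conditional variance is 1.0 - 0.8^2 / 2.0 = 0.68.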

#### L5 | Exact vs. approximate inference
<details>
<summary>Contents</summary>
    
- Maximum likelihood and maximum a posteriori (MAP) estimation
- Markov chain Monte Carlo
- Expectation maximization
</details>
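
To make the MCMC topic concrete, the following is a minimal random-walk Metropolis sampler (the simplest Metropolis-Hastings variant, with a symmetric Gaussian proposal) targeting a density known only up to a normalizing constant. The target distribution, step size, and burn-in length are arbitrary choices made for illustration.

```python
import numpy as np


def metropolis(log_target, x0, n_samples, step=0.5, seed=0):
    """Random-walk Metropolis sampler for a 1-D unnormalized log density."""
    rng = np.random.default_rng(seed)
    samples = np.empty(n_samples)
    x, logp = x0, log_target(x0)
    for i in range(n_samples):
        # Symmetric Gaussian proposal centered on the current state
        proposal = x + step * rng.normal()
        logp_prop = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)), in log space
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        samples[i] = x
    return samples


def log_target(x):
    # Unnormalized log density of N(3, 0.5^2)
    return -0.5 * ((x - 3.0) / 0.5) ** 2


draws = metropolis(log_target, x0=0.0, n_samples=20_000, step=1.0)
kept = draws[2_000:]  # discard burn-in
print(f"mean {kept.mean():.3f}, std {kept.std():.3f}")
```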

#### L6 | Introduction to Gaussian processes

<details>
<summary>Contents</summary>
    
- Motivation and parallels with Bayesian optimization
- Mercer kernels (spectral densities, periodic, Matérn, squared exponential)
- Making new kernels from old ones
- Hyperparameters (and their hyperparameters)
</details>
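
Several of the kernels listed above can be written in a few lines of NumPy. The sketch below uses commonly quoted one-dimensional forms of the squared exponential, Matérn-3/2, and periodic kernels, and illustrates that sums and elementwise products of valid kernels yield valid (symmetric, positive semi-definite) kernels. The function names and hyperparameter values are illustrative assumptions.

```python
import numpy as np


def sq_exp(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel matrix for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def matern32(x1, x2, lengthscale=1.0, variance=1.0):
    """Matern-3/2 kernel matrix for 1-D inputs."""
    r = np.sqrt(3.0) * np.abs(x1[:, None] - x2[None, :]) / lengthscale
    return variance * (1.0 + r) * np.exp(-r)


def periodic(x1, x2, period=1.0, lengthscale=1.0, variance=1.0):
    """Periodic kernel matrix for 1-D inputs."""
    d = np.abs(x1[:, None] - x2[None, :])
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale**2)


x = np.linspace(0.0, 5.0, 40)
# Sums and elementwise (Schur) products of valid kernels are again valid kernels
K_sum = matern32(x, x) + periodic(x, x, period=2.0)
K_prod = sq_exp(x, x, lengthscale=3.0) * periodic(x, x, period=2.0)  # "locally periodic"

for name, K in [("sum", K_sum), ("product", K_prod)]:
    # Symmetric, with eigenvalues non-negative up to floating-point round-off
    print(name, np.allclose(K, K.T), np.linalg.eigvalsh(K).min())
```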

#### L7 | Gaussian likelihoods

<details>
<summary>Contents</summary>
    
- Prediction using noise-free observations
- Prediction with noisy observations
- Weight-space vs. function-space perspectives
- Semi-parametric models
</details>
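
The predictive equations for noisy observations can be sketched following the structure of Algorithm 2.1 in Rasmussen & Williams (2006), which works with a Cholesky factor of the noisy covariance rather than an explicit inverse. This sketch assumes a zero-mean prior, a squared exponential kernel, and fixed hyperparameters; the noise-free case corresponds to shrinking `noise_var` to a small jitter. The helper names are hypothetical.

```python
import numpy as np


def sq_exp(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared exponential kernel matrix for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_predict(x, y, x_star, noise_var=0.01):
    """Predictive mean/variance of f(x_star) and the log marginal likelihood."""
    n = x.size
    # Cholesky factor: K + noise_var * I = L L^T
    L = np.linalg.cholesky(sq_exp(x, x) + noise_var * np.eye(n))
    # alpha = (K + noise_var * I)^{-1} y via two triangular systems
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = sq_exp(x, x_star)  # cross-covariances, shape (n, m)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(sq_exp(x_star, x_star)) - np.sum(v**2, axis=0)
    log_ml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2.0 * np.pi)
    return mean, var, log_ml


rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 6.0, 30))
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
x_star = np.linspace(0.0, 6.0, 100)
mean, var, log_ml = gp_predict(x, y, x_star, noise_var=0.01)
print(f"log marginal likelihood: {log_ml:.2f}")
```

`np.linalg.solve` does not exploit the triangular structure of `L`; a triangular solver such as `scipy.linalg.solve_triangular` would, at the cost of an extra dependency.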

#### L8 | Gaussian & non-Gaussian likelihoods

<details>
<summary>Contents</summary>
    
- Reproducing kernel Hilbert spaces
- *Representer* theorem
- Non-Gaussian likelihoods and classification
</details>

#### L9 | Scaling Gaussian processes I

<details>
<summary>Contents</summary>
    
- Computing the matrix inverse via Cholesky decomposition
- Subset of data approaches
- Nyström approximation
- Inducing points
</details>
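
As a sketch of the Nyström idea, the snippet below approximates an n x n squared exponential Gram matrix from m landmark (inducing) inputs as `K_nn ≈ K_nm K_mm^{-1} K_mn`, stored as an n x m factor `Q` with `K_nn ≈ Q Q^T`. This needs O(nm) memory and O(nm^2) time instead of O(n^2) and O(n^3). The evenly spaced landmarks, jitter value, and problem sizes are assumptions chosen for illustration; the exact matrix is formed here only to measure the error.

```python
import numpy as np


def sq_exp(x1, x2, lengthscale=1.0):
    """Squared exponential kernel matrix for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 10.0, 2000))  # n = 2000 inputs
z = np.linspace(0.0, 10.0, 30)             # m = 30 landmarks (assumed placement)

K_nm = sq_exp(x, z)
K_mm = sq_exp(z, z) + 1e-8 * np.eye(z.size)  # jitter for a stable Cholesky factor
L = np.linalg.cholesky(K_mm)
# Q = K_nm L^{-T}, so that Q Q^T = K_nm K_mm^{-1} K_mn
Q = np.linalg.solve(L, K_nm.T).T

# Only for checking the approximation; avoided in practice
K_exact = sq_exp(x, x)
rel_err = np.linalg.norm(K_exact - Q @ Q.T) / np.linalg.norm(K_exact)
print(f"relative Frobenius error: {rel_err:.2e}")
```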

#### L10 | Scaling Gaussian processes II

<details>
<summary>Contents</summary>
    
- Variational inference
- ELBO derivation
- Minimizing the KL divergence in practice
</details>
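
In sparse variational GPs the ELBO is an expected log-likelihood term minus `KL[q(u) || p(u)]`, where both `q(u) = N(m, S)` and the prior `p(u) = N(0, K_uu)` are Gaussian, so the KL term has a closed form. A minimal Cholesky-based implementation of the Gaussian KL divergence is sketched below; `kl_mvn`, the inducing inputs, and the variational parameters are hypothetical.

```python
import numpy as np


def kl_mvn(m0, S0, m1, S1):
    """KL( N(m0, S0) || N(m1, S1) ) for full-covariance Gaussians."""
    k = m0.size
    L0 = np.linalg.cholesky(S0)
    L1 = np.linalg.cholesky(S1)
    # tr(S1^{-1} S0) = ||L1^{-1} L0||_F^2
    A = np.linalg.solve(L1, L0)
    trace_term = np.sum(A**2)
    # Mahalanobis term (m1 - m0)^T S1^{-1} (m1 - m0)
    b = np.linalg.solve(L1, m1 - m0)
    mahalanobis = b @ b
    # log det S1 - log det S0, using log det S = 2 * sum(log(diag(L)))
    logdet_ratio = 2.0 * (np.sum(np.log(np.diag(L1))) - np.sum(np.log(np.diag(L0))))
    return 0.5 * (trace_term + mahalanobis - k + logdet_ratio)


# Hypothetical example: q(u) = N(m, S) against a GP prior p(u) = N(0, K_uu)
rng = np.random.default_rng(0)
z = np.linspace(0.0, 1.0, 5)  # inducing inputs (assumed)
K_uu = np.exp(-0.5 * ((z[:, None] - z[None, :]) / 0.3) ** 2) + 1e-6 * np.eye(z.size)
m = rng.normal(size=z.size)
S = 0.1 * np.eye(z.size)
print(f"KL[q(u) || p(u)] = {kl_mvn(m, S, np.zeros(z.size), K_uu):.3f}")
```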

#### L11 | Multiple kernel learning

<details>
<summary>Contents</summary>
    
- Empirical Bayes
- Multiple kernel learning
- Generalized additive models
</details>

#### L12 | Revisiting hyperparameter training

<details>
<summary>Contents</summary>
    
- HMC, NUTS, and Gibbs sampling for MCMC
</details>

#### L13 | Gaussian processes and deep neural networks

<details>
<summary>Contents</summary>
    
- Single and deep MLPs
- Deep Gaussian processes
- Posterior inference
</details>


#### L14 | Designing bespoke deep Gaussian processes

<details>
<summary>Contents</summary>
    
- Covariance structure
- Warping inputs
- Parallels to normalizing flows
- Hierarchy of hyperparameters
</details>

#### L15 | Time-series forecasting with Gaussian processes

<details>
<summary>Contents</summary>
    
- Linear Gaussian state-space model
- Kalman smoother
</details>
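
One way to see the link between GPs and Kalman smoothing is the exponential (Matérn-1/2) kernel `variance * exp(-|t - t'| / lengthscale)`, which is exactly a scalar linear Gaussian state-space model. The sketch below runs a Kalman filter followed by a Rauch-Tung-Striebel smoother in O(n) time; under these assumptions the smoothed means and variances coincide with the batch GP posterior marginals. Matérn kernels of higher smoothness need a vector-valued state and are not covered here; the function name and data are hypothetical.

```python
import numpy as np


def ou_kalman_smoother(t, y, lengthscale=1.0, variance=1.0, noise_var=0.1):
    """Posterior marginals of a GP with kernel variance * exp(-|t - t'| / lengthscale).

    Assumes t is sorted in increasing order and y = f(t) + Gaussian noise.
    """
    n = t.size
    A = np.ones(n)                        # transition coefficients A[k]
    m_p, P_p = np.zeros(n), np.zeros(n)   # one-step predictions
    m_f, P_f = np.zeros(n), np.zeros(n)   # filtered estimates
    m, P = 0.0, variance                  # stationary prior at t[0]
    for k in range(n):
        if k > 0:
            # Exact discretization of the Ornstein-Uhlenbeck process
            A[k] = np.exp(-(t[k] - t[k - 1]) / lengthscale)
            m, P = A[k] * m, A[k] ** 2 * P + variance * (1.0 - A[k] ** 2)
        m_p[k], P_p[k] = m, P
        # Measurement update for y[k] = x[k] + noise
        gain = P / (P + noise_var)
        m, P = m + gain * (y[k] - m), (1.0 - gain) * P
        m_f[k], P_f[k] = m, P
    # Rauch-Tung-Striebel backward pass
    m_s, P_s = m_f.copy(), P_f.copy()
    for k in range(n - 2, -1, -1):
        G = P_f[k] * A[k + 1] / P_p[k + 1]
        m_s[k] = m_f[k] + G * (m_s[k + 1] - m_p[k + 1])
        P_s[k] = P_f[k] + G**2 * (P_s[k + 1] - P_p[k + 1])
    return m_s, P_s


rng = np.random.default_rng(4)
t = np.sort(rng.uniform(0.0, 10.0, 50))
y = np.sin(t) + rng.normal(scale=0.3, size=t.size)
m_s, P_s = ou_kalman_smoother(t, y, lengthscale=1.0, variance=1.0, noise_var=0.09)
```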

#### L16 | Conditioning on linear operators

<details>
<summary>Contents</summary>
    
- Integral operators
- Differential operators
</details>

#### L17 | Multi-output Gaussian processes

<details>
<summary>Contents</summary>
    
- Coregionalization models
- Transfer learning across outputs
</details>
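
One common coregionalization construction is the intrinsic coregionalization model (ICM), in which the covariance between output `i` at `x` and output `j` at `x'` is `B[i, j] * k(x, x')` for a positive semi-definite output covariance `B`. The sketch below builds `B = W W^T + diag(kappa)` and the corresponding Kronecker-structured covariance, then draws correlated samples for all outputs. The values of `W`, `kappa`, and the kernel hyperparameters are hypothetical.

```python
import numpy as np


def sq_exp(x1, x2, lengthscale=1.0):
    """Squared exponential kernel matrix for 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


rng = np.random.default_rng(3)
n_outputs, rank = 3, 1
W = rng.normal(size=(n_outputs, rank))  # hypothetical mixing weights
kappa = np.full(n_outputs, 0.1)         # hypothetical output-specific variances
B = W @ W.T + np.diag(kappa)            # coregionalization matrix (PSD)

x = np.linspace(0.0, 1.0, 20)
K_x = sq_exp(x, x, lengthscale=0.3)
# ICM: block (i, j) of the joint covariance is B[i, j] * K_x
K = np.kron(B, K_x)
# Joint draw of all outputs (small jitter added for a stable Cholesky factor)
L = np.linalg.cholesky(K + 1e-6 * np.eye(K.shape[0]))
f = (L @ rng.normal(size=K.shape[0])).reshape(n_outputs, x.size)
print(np.round(B, 2))
```

The off-diagonal entries of `B` are what allow observations of one output to inform predictions of another, which is the mechanism behind transfer learning across outputs in this model.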

#### L18 | Group Polynomial and other kernels

<details>
<summary>Contents</summary>
    
- Orthogonality and data distribution
- Numerical quadrature
</details>
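
As one example of the connection between orthogonal polynomials and numerical quadrature, Gauss-Hermite quadrature places its nodes at the roots of Hermite polynomials, which are orthogonal with respect to a Gaussian weight, and so computes Gaussian expectations efficiently. The sketch below uses NumPy's `hermegauss`, whose weights correspond to the probabilists' weight `exp(-x^2 / 2)`; the helper `gauss_expectation` and its default degree are illustrative choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def gauss_expectation(g, mu=0.0, sigma=1.0, deg=20):
    """Approximate E[g(X)] for X ~ N(mu, sigma^2) with Gauss-Hermite quadrature."""
    # Nodes and weights for the weight function exp(-x^2 / 2);
    # the weights sum to sqrt(2 * pi), hence the normalization below.
    nodes, weights = hermegauss(deg)
    return np.sum(weights * g(mu + sigma * nodes)) / np.sqrt(2.0 * np.pi)


# E[exp(X)] for X ~ N(0.5, 0.3^2) has the closed form exp(mu + sigma^2 / 2)
approx = gauss_expectation(np.exp, mu=0.5, sigma=0.3)
exact = np.exp(0.5 + 0.3**2 / 2)
print(approx, exact)
```

With `deg` nodes the rule is exact for polynomials of degree up to `2 * deg - 1`, so low-order moments are recovered up to round-off.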

## Office hours

Professor Seshadri's office hours:

| Location | Time |
| -------- | ---- |
| GU 341   | TBD  |
| GU 341   | TBD  |

Locations may change during the term.

## Textbooks

This course will make heavy use of the following texts:

- Rasmussen, C. E., Williams, C. K. I., *Gaussian Processes for Machine Learning*, The MIT Press, 2006.
- Murphy, K. P., *Probabilistic Machine Learning: Advanced Topics*, The MIT Press, 2023.

## Important papers

Students are encouraged to read through the following papers:

- [Roberts, S., Osborne, M., Ebden, M., Reece, S., Gibson, N., Aigrain, S., (2013) *Gaussian processes for time-series modelling*, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 371, 20110550](https://doi.org/10.1098/rsta.2011.0550)

- [Dunlop, M., Girolami, M., Stuart, A., Teckentrup, A., (2018) *How Deep Are Deep Gaussian Processes?*, Journal of Machine Learning Research 19, 1-46](https://www.jmlr.org/papers/volume19/18-015/18-015.pdf)

- [Alvarez, M., Lawrence, N., (2011) *Computationally Efficient Convolved Multiple Output Gaussian Processes*, Journal of Machine Learning Research 12, 1459-1500](https://www.jmlr.org/papers/volume12/alvarez11a/alvarez11a.pdf)

- [Van der Wilk, M., Rasmussen, C., Hensman, J., (2017) *Convolutional Gaussian Processes*, 31st Conference on Neural Information Processing Systems](https://dl.acm.org/doi/pdf/10.5555/3294996.3295044)