### Course Description:

This graduate-level course offers a practical approach to probabilistic learning with Gaussian processes (GPs). GPs are a powerful family of methods for modeling and predicting a wide variety of spatio-temporal phenomena. They are used for both regression and classification problems, with theoretical foundations in Bayesian inference, reproducing kernel Hilbert spaces, eigenvalue problems, and numerical integration. Rather than focusing solely on these theoretical foundations, this course balances theory with practical probabilistic programming, using a variety of ``python``-based packages. Practical engineering-relevant problems will also be discussed, cutting across other areas of machine learning such as transfer learning, deep models, and normalizing flows.

### Pre-requisites:

- CS1371, MATH2551, MATH2552 (or equivalent)
- Working knowledge of ``python``, including familiarity with the ``numpy`` and ``matplotlib`` libraries.

## Lectures

This is a preliminary schedule; it may change throughout the term.


#### L1 | Probability fundamentals I
<details>
<summary>Contents</summary>
    
- Probability
- Conditional probability and independence
- Expectation of a random variable
- Probability density function for a continuous random variable
- Key discrete probability mass functions
- Key continuous probability density functions
</details>

#### L2 | Probability fundamentals II
<details>
<summary>Contents</summary>
    
- Functions of random variables
- Multivariate distributions
- Decision and estimation: basic definitions
- Tests of significance
</details>


#### L3 | Introduction to Bayesian inference
<details>
<summary>Contents</summary>
    
- Introduction to Bayesian modelling
- Conjugacy with distributions
- Bayesian polynomial regression
</details>

#### L4 | The uniqueness of the Normal distribution
<details>
<summary>Contents</summary>
    
- Marginal distributions
- Conditional distributions
- Nataf (and other) transforms
</details>

#### L5 | Exact and approximate inference
<details>
<summary>Contents</summary>
    
- Maximum likelihood and maximum a posteriori estimates
- Markov chain Monte Carlo
- Expectation maximization
</details>

#### L6 | Introduction to Gaussian processes

<details>
<summary>Contents</summary>
    
- Motivation and parallels with Bayesian optimization
- Mercer kernels (spectral densities, periodic, Matérn, squared exponential)
- Making new kernels from old ones
- Hyperparameters (and their hyperparameters)
</details>

#### L7 | Gaussian likelihoods

<details>
<summary>Contents</summary>
    
- Prediction using noise-free observations
- Prediction with noisy observations
- Weight-space vs. function-space perspectives
- Semi-parametric models
</details>

#### L8 | Gaussian & non-Gaussian likelihoods

<details>
<summary>Contents</summary>
    
- Reproducing kernel Hilbert spaces
- *Representer* theorem
- Non-Gaussian likelihoods and classification
</details>

#### L9 | Scaling Gaussian processes

<details>
<summary>Contents</summary>
    
- Computing the matrix inverse via Cholesky decomposition
- Subset of data approaches
- Nyström approximation
- Inducing points
</details>

#### L10 | Scaling Gaussian processes II

<details>
<summary>Contents</summary>
    
- Variational inference
- ELBO derivation
- Minimizing the KL-divergence practically
</details>

#### L11 | Multiple kernel learning

<details>
<summary>Contents</summary>
    
- Empirical Bayes
- Multiple kernel learning
- Generalized additive models
</details>

#### L12 | Gaussian processes and deep neural networks

<details>
<summary>Contents</summary>
    
- Single and deep MLPs
- Deep Gaussian processes
- Posterior inference
</details>


#### L13 | Designing bespoke deep Gaussian processes

<details>
<summary>Contents</summary>
    
- Covariance structure
- Warping inputs
- Parallels to normalizing flows
- Hierarchy of hyperparameters
</details>

## Office hours

Professor Seshadri's office hours:

| Location | Time |
| -------- | ---- |
| GU 341   | TBD  |
| GU 341   | TBD  |

Location may change during the term.

## Textbooks

This course will make heavy use of the following texts:

- Rasmussen, C. E., and Williams, C. K. I., *Gaussian Processes for Machine Learning*, The MIT Press, 2006.
- Murphy, K. P., *Probabilistic Machine Learning: Advanced Topics*, The MIT Press, 2023.

## Papers

This course will also rely on the following papers; students are encouraged to read these in their own time.

- Deep GP
- Aeroengine
- GPs with polynomial kernels