<font size="6"><b>[ML for NLP] Lecture 1D: Recap of Optimization</b></font>

This notebook gives a very brief introduction to the key concepts of optimization. There is much more to this complex yet fundamental field. The main parts covered here are:

1. [Minimizing a Function](#Minimizing-a-Function)
2. [Convex Functions](#Convex-Functions)
3. [Convexity and Minimization](#Convexity-and-Minimization)
4. [Gradients](#Gradients)
5. [Hessians](#Hessians)
6. [Matrix Gradients](#Matrix-Gradients)
7. [Gradient Descent](#Gradient-Descent)
8. [Stochastic Gradient Descent](#Stochastic-Gradient-Descent)


---

> Acknowledgments: This lecture is largely based on:
> - Introduction to Probability (2008) by Dimitri Bertsekas and John Tsitsiklis
> - Probabilistic Machine Learning: An Introduction (2021) by Kevin Murphy
> - Lecture slides of the last LxMLS (http://lxmls.it.pt/), which were kindly provided by André Martins and Mário Figueiredo

> _Disclaimer:_ <br>
> Overall, data visualization is a must-have tool in the arsenal of any data scientist. Throughout this notebook, we will use both `matplotlib` and `seaborn` ([see docs](http://seaborn.pydata.org/)) to plot our data. Seaborn has the advantage of being specifically tailored for data science. Moreover, it is built on top of matplotlib, so everything we have learned so far still applies, even though the API changes a little. I strongly recommend learning both libraries :-).

$\newcommand{\Prob}{\mathbb{P}}$
$\newcommand{\Expec}{\mathbb{E}}$
$\newcommand{\Var}{\mathbb{V}}$

### Base MathJax
$\newcommand{\bm}[1]{{\boldsymbol{{#1}}}}$
$\DeclareMathOperator{\argmin}{argmin}$
$\DeclareMathOperator{\argmax}{argmax}$

### imports

In [8]:
# imports
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd

# define options for seaborn
custom_params = {"axes.spines.right": False, "axes.spines.top": False, 
                 "grid.color": ".85", "grid.linestyle": ":"}
sns.set_theme(style="whitegrid", rc=custom_params)

### css

In [2]:
%%html
<style>
.text-center{
    text-align: center;
}
.emphasis{
    font-size: 110%;
}
.visible{
    opacity: 1 !important;
}
.imgs{
    margin: 0px 10px;
}
.blue-box{
    text-align: left;
    color: black;
    padding: 10px 25px;
    background-color: #dbe7ff;
    border: 1px dashed #999;
    margin: 10px 0px;
    display: inline-block;
    clear: both;
    opacity: 0.2;
}
.red-box{
    text-align: left;
    color: black;
    padding: 10px 25px;
    background-color: #e6d5d5;
    border: 1px dashed #999;
    margin: 10px 0px;
    display: inline-block;
    clear: both;
    opacity: 0.2;
}
.green-box{
    text-align: left;
    color: black;
    padding: 10px 25px;
    background-color: #d5e6d8;
    border: 1px dashed #999;
    margin: 10px 0px;
    display: inline-block;
    clear: both;
    opacity: 0.2;
}
.transparent-box{
    text-align: left;
    color: black;
    padding: 10px 25px;
    margin: 10px 0px;
    display: inline-block;
}
.blue-box:hover{
    opacity: 1.0;
}
.red-box:hover{
    opacity: 1.0;
}
.green-box:hover{
    opacity: 1.0;
}
.float-right{
    float: right;
}
.float-left{
    float: left;
}
.margin-left-20{
    margin-left: 20px;
}
.margin-left-50{
    margin-left: 50px;
}
.margin-left-100{
    margin-left: 100px;
}
.clear{
    clear: both;
}
.ex-table{
  border: 1px solid black;
  border-collapse: collapse;
}
.ex-table th{
    border: 1px solid black;
    border-collapse: collapse;
}
.ex-table td{
    border: 1px solid black;
    border-collapse: collapse;
}

</style>

### plot utils

In [9]:
def stylize_plot(ax, xlim=None, ylim=None, xticks=None, yticks=None):
    # draw a dotted grid with line width 0.9 at the major x and y ticks
    ax.grid(lw=0.9, linestyle=":", which='major')
    # restrict the visible range to xlim on the x-axis (similarly for y)
    if xlim is not None:
        ax.set_xlim(*xlim)
    if ylim is not None:
        ax.set_ylim(*ylim)
    # add specific x, y ticks
    if xticks is not None:
        ax.set_xticks(xticks)
    if yticks is not None:
        ax.set_yticks(yticks)
    # remove top and right spines
    ax.spines['top'].set_color('none')
    ax.spines['right'].set_color('none')
    return ax


---
# Minimizing a Function
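
As a minimal sketch of what "minimizing a function" means, we can evaluate a function on a grid and pick the point with the smallest value. The function $f(x) = (x - 2)^2 + 1$, the interval $[-5, 5]$, and the grid resolution are assumptions chosen only for illustration. Brute-force search like this is only practical in very low dimensions.

```python
import numpy as np

def f(x):
    # illustrative function with its minimum value 1 at x = 2
    return (x - 2.0) ** 2 + 1.0

# evaluate f on an evenly spaced grid (the resolution is an arbitrary choice)
grid = np.linspace(-5.0, 5.0, 1001)
values = f(grid)

# index of the smallest value, then the corresponding grid point and value
best_index = np.argmin(values)
x_min, f_min = grid[best_index], values[best_index]
print(f"approximate minimizer: {x_min:.3f}, minimum value: {f_min:.3f}")
```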



---
# Convex Functions
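
A function $f$ is convex if $f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y)$ for all $x, y$ in its domain and all $\lambda \in [0, 1]$. The sketch below searches for random violations of this inequality. The helper name `violates_convexity`, the interval, and the number of trials are assumptions made for this example. Finding no violation is only evidence of convexity, not a proof, whereas a single violation does prove non-convexity.

```python
import numpy as np

def violates_convexity(f, low, high, n_trials=10_000, seed=0, tol=1e-12):
    """Return True if random sampling finds a violation of the convexity inequality."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=n_trials)
    y = rng.uniform(low, high, size=n_trials)
    lam = rng.uniform(0.0, 1.0, size=n_trials)
    # value of f at the convex combination vs. convex combination of the values
    lhs = f(lam * x + (1 - lam) * y)
    rhs = lam * f(x) + (1 - lam) * f(y)
    # tol absorbs floating-point rounding
    return bool(np.any(lhs > rhs + tol))

square_violates = violates_convexity(np.square, -5.0, 5.0)  # x^2 is convex
sine_violates = violates_convexity(np.sin, -5.0, 5.0)       # sin is not convex on [-5, 5]
print(square_violates, sine_violates)
```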

---
# Convexity and Minimization

---
# Gradients
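
The gradient collects the partial derivatives of $f$ with respect to each coordinate. A common sanity check is to compare a hand-derived gradient against a central finite-difference approximation. The function `f`, the helper `numerical_gradient`, and the step size `eps` below are assumptions chosen for illustration.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate the gradient of f at x with central finite differences."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        # partial derivative with respect to coordinate i
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

def f(x):
    # illustrative function: squared norm plus a linear term in the first coordinate
    return np.sum(x ** 2) + 3.0 * x[0]

def grad_f(x):
    # hand-derived gradient of f: 2x, plus 3 in the first coordinate
    g = 2.0 * x
    g[0] += 3.0
    return g

x0 = np.array([1.0, -2.0, 0.5])
print(numerical_gradient(f, x0))
print(grad_f(x0))
```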

---
# Hessians
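
The Hessian is the matrix of second partial derivatives. For a twice-differentiable function, a positive semidefinite Hessian everywhere is equivalent to convexity. As a sketch, we approximate the Hessian of the quadratic $f(x) = \frac{1}{2} x^\top A x$ by finite differences of its gradient and inspect its eigenvalues. The matrix `A`, the evaluation point, and the helper `numerical_hessian` are assumptions made for this example.

```python
import numpy as np

# symmetric matrix chosen for illustration
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def grad_f(x):
    # gradient of f(x) = 0.5 * x^T A x when A is symmetric
    return A @ x

def numerical_hessian(grad, x, eps=1e-5):
    """Approximate the Hessian by central finite differences of the gradient."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        # column j holds the derivative of the gradient along coordinate j
        H[:, j] = (grad(x + step) - grad(x - step)) / (2 * eps)
    # symmetrize to remove small numerical asymmetries
    return 0.5 * (H + H.T)

H = numerical_hessian(grad_f, np.array([0.3, -1.2]))
eigenvalues = np.linalg.eigvalsh(H)
print(H)
print(eigenvalues)  # all positive here, so this quadratic is strictly convex
```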

---
# Matrix Gradients
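
A standard identity in matrix calculus is $\nabla_x (x^\top A x) = (A + A^\top) x$, which reduces to $2 A x$ when $A$ is symmetric. The sketch below checks the identity numerically on a random, deliberately non-symmetric matrix. The dimension, seed, and step size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))  # deliberately not symmetric
x = rng.normal(size=4)

def quadratic_form(v):
    # scalar-valued function v^T A v
    return v @ A @ v

# gradient given by the identity
analytic = (A + A.T) @ x

# central finite differences along each standard basis vector
eps = 1e-6
numeric = np.array([
    (quadratic_form(x + eps * e) - quadratic_form(x - eps * e)) / (2 * eps)
    for e in np.eye(4)
])

max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)
```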

---
# Gradient Descent
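
Gradient descent repeatedly moves against the gradient: $x_{t+1} = x_t - \eta \nabla f(x_t)$, where $\eta > 0$ is the learning rate. Below is a minimal sketch on the convex quadratic $f(x) = \frac{1}{2} x^\top A x - b^\top x$, whose minimizer solves $A x = b$. The matrix `A`, the vector `b`, the learning rate, and the number of steps are assumptions chosen so that the iteration converges on this particular problem.

```python
import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, n_steps=200):
    """Run plain gradient descent with a fixed learning rate."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        # step in the direction of steepest descent
        x = x - learning_rate * grad(x)
    return x

# symmetric positive definite matrix, so the quadratic is strictly convex
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(x):
    # gradient of f(x) = 0.5 * x^T A x - b^T x
    return A @ x - b

x_gd = gradient_descent(grad, np.zeros(2))
x_star = np.linalg.solve(A, b)  # closed-form minimizer for comparison
print(x_gd, x_star)
```

A learning rate that is too large makes the iterates diverge, while one that is too small makes convergence slow. For this quadratic, any fixed rate below $2 / \lambda_{\max}(A)$ converges.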

---
# Stochastic Gradient Descent
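
Stochastic gradient descent replaces the full gradient with an estimate computed on a random mini-batch of the data, which makes each step much cheaper on large datasets. The following sketch fits a linear least-squares model on synthetic data. The data, the function name `sgd_least_squares`, and all hyperparameters are assumptions made for illustration, not settings from the lecture.

```python
import numpy as np

def sgd_least_squares(X, y, learning_rate=0.05, batch_size=32, n_epochs=50, seed=0):
    """Minimize the mean squared error of a linear model with mini-batch SGD."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_epochs):
        # visit the samples in a fresh random order every epoch
        order = rng.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            # gradient of the mean squared error on this mini-batch
            grad = 2.0 * X[idx].T @ residual / len(idx)
            w = w - learning_rate * grad
    return w

# synthetic regression problem with known weights and a little noise
rng = np.random.default_rng(42)
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(500, 3))
y = X @ w_true + 0.01 * rng.normal(size=500)

w_sgd = sgd_least_squares(X, y)
print(w_sgd)
```

Because each step uses a noisy gradient, the iterates with a fixed learning rate hover around the minimizer instead of settling exactly on it. Decaying the learning rate over time is a common remedy.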