# Week 7: More PCA

## Goals:
- Basics of LaTeX and Markdown
- More PCA

## Problem set 1

Let's briefly talk about problem set 1.

## LaTeX

![](graphics/LaTeX_project_logo_bird.png)

<font size="-2">
© Jonas Jacek <a href="https://creativecommons.org/licenses/by/4.0/">CC-BY-4.0</a>
</font>

[$\LaTeX$](https://en.wikipedia.org/wiki/LaTeX) is the main typesetting system for mathematics. It is pronounced either "law-tech" or "lay-tech".

Virtually all typeset mathematics is now written in $\LaTeX$. It is also widely used in physics and computer science.

**Basic idea:**
 
Write $\TeX$ or $\LaTeX$ code in a `tex` file, then compile it to produce a `pdf` file.

A basic example looks something like this:

---

```latex
\documentclass{article}         % Always at start

% Additional packages
\usepackage{amsmath}

% Additional preamble code 

\begin{document}                % Required

Here is where you put content.

\end{document}                  % Always at the end
```

---

Simply put, $\LaTeX$ enables one to write very complicated formulae with ease.
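
As a small illustration of how a formula is typed, here is a sketch of a fragment that would sit between `\begin{document}` and `\end{document}` in the template above. The formula is my own example (the eigenvalues of a symmetric $2 \times 2$ matrix), chosen only because eigenvalues come up again in PCA; the `pmatrix` environment assumes the `amsmath` package is loaded.

```latex
% Body fragment: place between \begin{document} and \end{document}.
% Requires \usepackage{amsmath} for the pmatrix environment.
The eigenvalues of the symmetric matrix
\[
    A = \begin{pmatrix} a & b \\ b & d \end{pmatrix}
\]
are
\[
    \lambda_{\pm} = \frac{a + d}{2} \pm \sqrt{\frac{(a - d)^2}{4} + b^2}.
\]
```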

<img src="graphics/corollary.png" width=800></img>

**Note:** To render $\LaTeX$ code on the web (e.g. in `html`), there are two standards: [MathJax](https://en.wikipedia.org/wiki/MathJax) and [KaTeX](https://en.wikipedia.org/wiki/KaTeX). I prefer KaTeX, but it is more limited than MathJax.

## Markdown

![](graphics/Markdown-mark.png)

<font size="-2">
Dustin Curtis <a href="https://creativecommons.org/publicdomain/zero/1.0/deed.en">CC0</a>
</font>


[Markdown](https://en.wikipedia.org/wiki/Markdown) is a markup language for creating formatted text, most commonly converted to `html`.

It is used in blogs and websites, on GitHub (e.g. README files), in Jupyter notebooks, and more.

I use it as an "`html` lite" and a "$\LaTeX$ lite". For example, my website is written primarily in Markdown, which gets converted to `html`.

One can do very simple html-styled formatting with very little code.

---

Code:
```markdown
The font can be **emboldened** or *italicised*.
```

Output:

The font can be **emboldened** or *italicised*.

---

Code:
```markdown
1. First item
1. Second item
42. Third item
```

Output:
1. First item
1. Second item
42. Third item
    
---

Code:
```markdown
- red item
- blue items
  - one blue item
  - another blue item
```

Output:
- red item
- blue items
  - one blue item
  - another blue item
  
---

One can sometimes write (simple) $\LaTeX$ code in Markdown, depending on the renderer.

Code:
```markdown
We define a function $f : \mathbb{R} \to \mathbb{R}_{> 0}$ given by
$$
    x \mapsto e^{x^2}. 
$$
```

Output:

We define a function $f : \mathbb{R} \to \mathbb{R}_{> 0}$ given by
$$
    x \mapsto e^{x^2}. 
$$

---

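One can also align equations so that they can be easily read. (The "proof" below is a classic fallacy, since $\sqrt{ab} = \sqrt{a}\sqrt{b}$ fails when $a$ and $b$ are both negative; the point here is the alignment.)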

Code:
```markdown
$$
\begin{aligned}
    1 &= \sqrt{1} \\
    &= \sqrt{(-1)\cdot (-1)} \\
    &= \sqrt{-1} \cdot \sqrt{-1} \\
    &= i \cdot i \\
    &= -1.
\end{aligned}
$$
```

Output:
$$
\begin{aligned}
    1 &= \sqrt{1} \\
    &= \sqrt{(-1)\cdot (-1)} \\
    &= \sqrt{-1} \cdot \sqrt{-1} \\
    &= i \cdot i \\
    &= -1.
\end{aligned}
$$

---

## Back to Python computations

## Problem

Using the data set in `data/UN_IRE_data_smaller.csv`, perform PCA. Build your own class to interact with the data.
1. Write methods like `__init__` and `__repr__`. 
2. Write a method to compute all of the principal components (return them in order).
3. Find a reasonable $k$ such that almost all of the total variability is captured in the first $k$ principal components. 
4. Project the data onto the first $k$ components.
5. Plot the projected data on the first two components. 

(This is a 'smaller' version of what you will do for Problem Set 2.)
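
To make steps 2 to 4 concrete, here is a minimal sketch on synthetic data (not the UN data set, and not wrapped in a class). The function names `principal_components`, `smallest_k` and `project`, and the 95% threshold, are my own choices for illustration; treat this as one possible approach under those assumptions rather than the expected solution.

```python
import numpy as np

def principal_components(X):
    """Eigenvalues (largest first) and matching unit eigenvectors (as columns)
    of the covariance matrix of X, whose rows are observations."""
    Xc = X - X.mean(axis=0)                # centre each column
    C = Xc.T @ Xc / len(Xc)                # m x m covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]      # reorder so the largest variance comes first
    return eigvals[order], eigvecs[:, order]

def smallest_k(eigvals, threshold=0.95):
    """Smallest k such that the first k components capture at least
    `threshold` of the total variance (the threshold is an assumption)."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratios, threshold)) + 1
    return min(k, len(eigvals))

def project(X, eigvecs, k):
    """Coordinates of the centred data in the first k principal directions."""
    Xc = X - X.mean(axis=0)
    return Xc @ eigvecs[:, :k]

# Synthetic stand-in for real data: 200 observations of 5 variables
# that lie close to a 2-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

eigvals, eigvecs = principal_components(X)
k = smallest_k(eigvals, threshold=0.95)
Z = project(X, eigvecs, k)
print(k, Z.shape)
```

The variance of each column of `Z` equals the corresponding eigenvalue, which is a handy sanity check. Step 5 is then a scatter plot of the first two columns of the projected data (for instance with matplotlib).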

In class, I started typing the following:

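```python
import pandas as pd
import numpy as np

class DataFrameDeluxe:
    """Thin wrapper around a pandas DataFrame for the PCA exercise."""

    def __init__(self, df):
        self.dataframe = df
        self.n = len(df)            # number of rows (observations)
        self.m = len(df.columns)    # number of columns (variables)

    def __repr__(self):
        s = "Deluxe Data Frame =====\n\tFirst five rows:\n\t{}".format(self.dataframe.head())
        return s

    def covariance_matrix(self):
        # Rows of X are variables, columns are observations.
        # X @ X.T / n is the covariance matrix only if every column of the
        # data already has mean zero.
        X = np.array(self.dataframe).T
        return X @ X.T / self.n
```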
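
One thing to keep in mind about `covariance_matrix`: the formula `X @ X.T / n` agrees with the covariance matrix only when each column of the data has mean zero. The following is a small check on synthetic data (the array `data` is made up for illustration), comparing the formula against `np.cov` with `bias=True`, which also divides by $n$.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: 50 observations (rows) of 4 variables, with mean far from zero.
data = rng.normal(loc=3.0, scale=2.0, size=(50, 4))
n = len(data)

# Reference covariance matrix; bias=True divides by n instead of n - 1.
reference = np.cov(data, rowvar=False, bias=True)

# Same formula as covariance_matrix, applied to centred data.
X = (data - data.mean(axis=0)).T
by_hand = X @ X.T / n
matches_centred = np.allclose(by_hand, reference)

# Same formula applied to the raw (uncentred) data.
X_raw = data.T
matches_raw = np.allclose(X_raw @ X_raw.T / n, reference)

print(matches_centred)   # True
print(matches_raw)       # False
```

So if the columns of a data set are not already centred, one would subtract the column means before applying the formula.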

```python
df = pd.read_csv("data/UN_IRE_data_smaller.csv")
dfd = DataFrameDeluxe(df)
print(dfd.covariance_matrix())
```

[[ 1.          0.101418   -0.02993994 -0.22063903 -0.2530163   0.11588741
   0.02810676 -0.17463691  0.65876099]
 [ 0.101418    1.          0.8971059   0.62950541 -0.90440474  0.990278
  -0.94809571 -0.28735216 -0.15559027]
 [-0.02993994  0.8971059   1.          0.50951845 -0.79944017  0.9106246
  -0.86673234 -0.09157589 -0.27482051]
 [-0.22063903  0.62950541  0.50951845  1.         -0.60601261  0.63773029
  -0.75443647 -0.42784083 -0.33518557]
 [-0.2530163  -0.90440474 -0.79944017 -0.60601261  1.         -0.91513812
   0.89814213  0.3016471   0.0313164 ]
 [ 0.11588741  0.990278    0.9106246   0.63773029 -0.91513812  1.
  -0.96046096 -0.31324433 -0.16713343]
 [ 0.02810676 -0.94809571 -0.86673234 -0.75443647  0.89814213 -0.96046096
   1.          0.29599282  0.24904734]
 [-0.17463691 -0.28735216 -0.09157589 -0.42784083  0.3016471  -0.31324433
   0.29599282  1.         -0.24027416]
 [ 0.65876099 -0.15559027 -0.27482051 -0.33518557  0.0313164  -0.16713343
   0.24904734 -0.24027416  1.    