## Problem

Generate 1000 samples of $\mathbf{x}\in \mathbb{R}^3$ from the multivariate normal distribution with $\mu=(1,3,5)^T$ and covariance $\Sigma \in \mathbb{R}^{3\times 3}$, a randomly generated positive definite matrix. <br>
Then generate $Y$ from $Y_i=\beta_0 + X_i^T\beta + \epsilon_i$, where $\beta_0=-5$, $\beta=(10,-5,5)^T$, and $\epsilon_i \overset{iid}{\sim} N(0,1)$. <br>
Fit a standard linear regression, and also estimate the parameters with stochastic gradient descent (SGD). In SGD, let the batch size be 1/10 of the data and the learning rate be 1/100.

## Functions Used

```np.random``` generates random samples from various distributions. <br>
```np.linalg``` can solve ordinary least squares (OLS) problems and linear systems.

* ```np.random.multivariate_normal(mean=mu, cov=Sigma, size=N) ``` 
* ```np.random.uniform(low=0, high=1, size=N)```
* ```np.random.normal(loc=0, scale=1, size=N)```
* ```np.hstack((Amat, Bmat))``` : stacks arrays column-wise. The input is a single __sequence of np.arrays__ (array1, array2, array3, $\cdots$), which is why there are two pairs of parentheses.
* ```np.linalg.lstsq(Xmat, Yvec)``` : ordinary least squares. Xmat must include a constant (all-ones) column for the intercept.
* ```np.linalg.solve(Amat, Bmat)```: computes $A^{-1}B$ by solving $AX=B$, without forming $A^{-1}$ explicitly.
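
As a quick illustration of these three functions, the sketch below uses a tiny, made-up noiseless dataset (an assumption for illustration, not the problem's data) where $y = 2 + 3x_1 - x_2$ exactly, so both ```lstsq``` and ```solve``` on the normal equations should recover $(2, 3, -1)$.

```python
import numpy as np

# Tiny noiseless example (illustrative data): y = 2 + 3*x1 - 1*x2.
X = np.array([[0., 1.], [1., 0.], [2., 3.], [4., 1.], [3., 5.]])
y = 2 + 3 * X[:, 0] - 1 * X[:, 1]

# np.hstack takes ONE argument: a tuple of arrays joined column-wise.
Xmat = np.hstack((np.ones((X.shape[0], 1)), X))  # shape (5, 3), constant column first

# lstsq returns (solution, residuals, rank, singular values).
coef_lstsq, residuals, rank, sv = np.linalg.lstsq(Xmat, y, rcond=None)

# Same estimate from the normal equations X^T X b = X^T y.
coef_solve = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ y)

print(coef_lstsq)  # should be close to (2, 3, -1)
print(coef_solve)  # should match coef_lstsq
```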

```torch``` can minimize a given objective function by __stochastic gradient descent (SGD)__. Before that, we need to put the data in the right form: first convert the np.arrays into torch.tensors, then wrap them in a torch.utils.data.__DataLoader__ object.

* ```torch.from_numpy()``` : turns an np.array into a torch.tensor of the same shape and dtype (the two share memory).
* ```torch.set_default_dtype(torch.float64)``` : numpy's default float type is float64 while torch's default is torch.float32, so the two are incompatible unless we change torch's default.
* ```from torch.utils.data import TensorDataset, DataLoader``` : to do SGD, we first wrap the data in a torch.utils.data.__TensorDataset__, and then in a torch.utils.data.__DataLoader__, an iterable that shuffles the data and returns it in mini-batches.
* ```train_ds = TensorDataset(input, output)```: both input and output are torch.tensors with the same number of rows.
* ```train_dl = DataLoader(train_ds, batch_size, shuffle=True)``` : batch_size must be an integer such as 100, not a float such as 100.0.

Next we form a model, optimizer, and a loss function. 

* ```model = torch.nn.Linear(p,K)```: creates a linear model with p inputs and K outputs. It randomly initializes $K$ bias terms and a $K\times p$ weight matrix.
* ```opt = torch.optim.SGD(model.parameters(), lr)```: creates an optimizer from the model's parameters and the learning rate.
* ```import torch.nn.functional as F``` : torch.nn.functional contains loss functions such as __mse_loss__ and __cross_entropy__, as well as activation functions such as __softmax__.
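
For reference, this is what one SGD step computes for ```nn.Linear(p, 1)``` with ```F.mse_loss``` (whose default reduction is the mean over the batch) and plain ```torch.optim.SGD``` (no momentum, its default). Writing the model as $\hat y_i = b + x_i^T w$ and a mini-batch as $B$, with learning rate $\eta$:

$$
L(b,w) = \frac{1}{|B|}\sum_{i\in B}(\hat y_i - y_i)^2,\qquad
\nabla_w L = \frac{2}{|B|}\sum_{i\in B}(\hat y_i - y_i)\,x_i,\qquad
\frac{\partial L}{\partial b} = \frac{2}{|B|}\sum_{i\in B}(\hat y_i - y_i),
$$

$$
w \leftarrow w - \eta\,\nabla_w L,\qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial b}.
$$

In this problem $|B| = N/10 = 100$ and $\eta = 0.01$, and $w$ and $b$ play the roles of $\beta$ and $\beta_0$.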

Inside the training loop, each mini-batch (xb, yb) is processed as follows.

    pred=model(xb)
    loss=F.mse_loss(pred, yb)
    loss.backward()
    opt.step()
    opt.zero_grad()
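
To see what those five lines do, here is a NumPy sketch that runs the same mini-batch SGD by hand, with the MSE gradient written out instead of obtained from ```loss.backward()```. The helper name ```sgd_linear```, its initialization, and the synthetic standard-normal data in the usage example are assumptions for illustration; this is not the notebook's torch code.

```python
import numpy as np

def sgd_linear(X, y, lr=1e-2, batch_size=10, num_epochs=200, seed=0):
    """Hypothetical helper: mini-batch SGD for y ~ b + X @ w under mean-squared error."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = rng.normal(scale=0.1, size=p)  # small random init (nn.Linear also starts random)
    b = 0.0
    for _ in range(num_epochs):
        idx = rng.permutation(n)  # reshuffle every epoch, like shuffle=True
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            xb, yb = X[batch], y[batch]
            pred = b + xb @ w  # pred = model(xb)
            resid = pred - yb
            # Gradients of mean(resid**2): what loss.backward() would compute.
            grad_w = 2.0 * xb.T @ resid / len(batch)
            grad_b = 2.0 * resid.mean()
            # opt.step(): plain SGD update. Gradients are recomputed from scratch
            # each step, which is what opt.zero_grad() guarantees in torch.
            w -= lr * grad_w
            b -= lr * grad_b
    return b, w

# Usage on synthetic data with the problem's coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = -5 + X @ np.array([10., -5., 5.]) + rng.normal(scale=0.1, size=500)
b_hat, w_hat = sgd_linear(X, y, lr=1e-2, batch_size=50, num_epochs=200)
print(b_hat, w_hat)  # should be close to -5 and (10, -5, 5)
```

Here the features are centered and well scaled, which makes plain SGD converge quickly; with the notebook's uncentered features (mean $(1,3,5)$) convergence of the bias can be slower.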

## Generate the data

```python
import numpy as np
```

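```python
np.random.seed(0)
N = 1000; p = 3
mu = np.array([1, 3, 5])

# Sigma = W W^T is symmetric positive semidefinite (positive definite when W is invertible).
W = np.random.uniform(low=0, high=1, size=9).reshape(3, 3)
Sigma = np.matmul(W, W.transpose())

# N draws of x in R^3, returned as an N-by-3 array.
Xdat = np.random.multivariate_normal(mean=mu, cov=Sigma, size=N)
# Prepend a column of ones so that beta[0] acts as the intercept beta_0.
ones = np.ones((N, 1))
Xmat = np.hstack((ones, Xdat))

beta = np.array([-5., 10, -5, 5])  # (beta_0, beta^T), stored as floats
error = np.random.normal(0, 1, N)
Yvec = np.matmul(Xmat, beta) + error
```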
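
Because $\Sigma = WW^T$, it is symmetric positive semidefinite, and positive definite whenever $W$ is invertible (which holds here). A quick sanity check, assuming the same seed and call order as the cell above so that ```Xdat``` is reproduced exactly:

```python
import numpy as np

np.random.seed(0)
mu = np.array([1, 3, 5])
W = np.random.uniform(low=0, high=1, size=9).reshape(3, 3)
Sigma = W @ W.T

# Symmetric by construction: (W W^T)^T = W W^T.
print(np.allclose(Sigma, Sigma.T))

# eigvalsh is meant for symmetric matrices; a valid covariance has no negative
# eigenvalues, and all three are strictly positive when W is invertible.
eigvals = np.linalg.eigvalsh(Sigma)
print(eigvals)

# The sample mean of the 1000 draws should be near mu = (1, 3, 5).
Xdat = np.random.multivariate_normal(mean=mu, cov=Sigma, size=1000)
print(Xdat.mean(axis=0))
```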

## Linear Regression: Standard Way

* ```np.linalg.lstsq```: the input matrix must include the constant column.

```python
res = np.linalg.lstsq(Xmat, Yvec, rcond=None)  # (solution, residuals, rank, singular values)
res[0]  # estimated (beta_0, beta_1, beta_2, beta_3)
```

* Manual computation: solve the normal equations $X^TX\beta = X^TY$, again with the constant column included in $X$.

```python
XTX = np.matmul(Xmat.transpose(), Xmat)
XTY = np.matmul(Xmat.transpose(), Yvec)
np.linalg.solve(XTX, XTY)  # solves X^T X b = X^T Y without forming the inverse
```
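
The normal equations come from setting the gradient of the squared error to zero. Here $X$ denotes ```Xmat``` (including the column of ones) and the coefficient vector stacks $(\beta_0, \beta^T)^T$:

$$
\hat\beta = \arg\min_{\beta}\, \lVert Y - X\beta\rVert^2,\qquad
\nabla_\beta \lVert Y - X\beta\rVert^2 = -2X^T(Y - X\beta) = 0
\;\Longrightarrow\; X^TX\hat\beta = X^TY .
$$

Both approaches give the same estimate here. ```lstsq``` works from an SVD of $X$ and avoids forming $X^TX$, whose condition number is the square of that of $X$, so it is the numerically safer choice when the columns are nearly collinear.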

## Linear Regression: Stochastic Gradient

### Step 1: Convert the data to torch.tensors and form the DataLoader.

* Convert the np.arrays to torch.tensors.
* Set torch's default float type. numpy's default is float64 and torch's default is torch.float32, so the model (torch) parameters must be set to float64 to match the data imported from numpy.
* Since torch.nn.Linear(p,K) includes bias parameters automatically, we do not include the constant column in the input matrix.
* However, the Y-vector must be an N-by-1 matrix, not an N-vector, so that its shape matches the model's N-by-1 output.
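
The N-by-1 requirement comes from broadcasting. ```nn.Linear(p, 1)``` returns predictions of shape (B, 1); if the target had shape (B,), then ```pred - yb``` would broadcast to a (B, B) matrix and the loss would average over all $B^2$ pairs (```F.mse_loss``` emits a warning about this). NumPy follows the same broadcasting rules, so the effect can be shown without torch; the arrays below are illustrative:

```python
import numpy as np

B = 5
pred = np.arange(B, dtype=float).reshape(B, 1)  # shape (B, 1), like model(xb)
y_vec = np.arange(B, dtype=float)               # shape (B,): wrong target shape
y_col = y_vec.reshape(B, 1)                     # shape (B, 1): correct target shape

wrong = (pred - y_vec) ** 2  # broadcasts to (B, B)
right = (pred - y_col) ** 2  # stays (B, 1)

print(wrong.shape, wrong.mean())  # (5, 5) 4.0
print(right.shape, right.mean())  # (5, 1) 0.0
```

The predictions equal the targets exactly, yet the mis-shaped version reports a loss of 4.0.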

```python
import torch
# New torch parameters become float64, matching the float64 data from numpy.
torch.set_default_dtype(torch.float64)
```

```python
Y_torch = torch.from_numpy(Yvec.reshape(N, 1))  # N-by-1 target
Xdat_torch = torch.from_numpy(Xdat)             # N-by-p inputs, no constant column
```

* We first combine X and Y into a torch.utils.data.__TensorDataset__, then wrap it in a torch.utils.data.__DataLoader__, which feeds shuffled mini-batches to SGD.

```python
from torch.utils.data import TensorDataset, DataLoader

# Pair each row of X with its Y. Use the torch tensors, not the numpy arrays.
train_ds = TensorDataset(Xdat_torch, Y_torch)
batch_size = N // 10  # must be an int (100)
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
```
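
Conceptually, ```shuffle=True``` draws a fresh random order of the row indices each epoch and slices it into consecutive batches (the last batch is smaller when N is not divisible by the batch size). The generator below is a hypothetical NumPy stand-in for that behavior, not DataLoader's actual implementation:

```python
import numpy as np

def iterate_minibatches(n, batch_size, rng):
    """Hypothetical helper: yield index arrays for one epoch, mimicking shuffle=True."""
    perm = rng.permutation(n)  # new random order every epoch
    for start in range(0, n, batch_size):
        yield perm[start:start + batch_size]  # last batch may be smaller

N = 1000
batch_size = N // 10  # integer division gives the int 100
rng = np.random.default_rng(0)
batches = list(iterate_minibatches(N, batch_size, rng))

print(len(batches), batches[0].shape)  # 10 (100,)
```

Every row appears exactly once per epoch, so one epoch of 10 batches makes 10 parameter updates.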

### Step 2: Set up the model and optimizer.

```python
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(p, 1)  # maps a B-by-p batch to a B-by-1 output: xb @ W^T + b
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
num_epochs = 1000
```

### Step 3: Iterations

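```python
for epoch in range(num_epochs):
    for xb, yb in train_dl:  # one shuffled mini-batch at a time
        pred = model(xb)
        loss = F.mse_loss(pred, yb)
        loss.backward()      # compute gradients
        opt.step()           # update parameters
        opt.zero_grad()      # reset gradients, which otherwise accumulate
    if (epoch + 1) % 100 == 0:
        # loss of the last mini-batch in this epoch, not of the full data
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))
```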

* Results: compare the learned weight and bias with the OLS estimates above.

```python
list(model.parameters())  # [weight (1-by-p), bias (1,)]
```