# Day 11: Odds and Ends

### BUSI 520 - Python for Business Research
### Kerry Back, JGSB, Rice University

### Outline 

- Debugging 
- Timing code 
- Parallel processing
- GPUs
- Geometry of ridge regression

### Error messages 

- Find the last line of your own code in the traceback (before it goes into any Python libraries, if it does).
- The error is in that line or in some object that appears in that line.
- Ask Julius/GitHub Copilot/ChatGPT or search Google if the message is not clear.
- Inspect all of the objects in that line to see if they are what you expect them to be.

### How to inspect objects

- Use `type()`, `.shape`, `.head()`, or whatever else fits the object.
- Use print statements.
- Use CTRL-SHIFT-P to bring up the command palette and search for `Jupyter: Open Variables View` to see the current values of all variables.
- Use the Data Wrangler extension to see the data in a table.
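
As a minimal sketch of these checks, the small DataFrame below (made-up data, with hypothetical column names `ret` and `ticker`) is inspected with `type()`, `.shape`, `.dtypes` and `.head()`:

```python
import numpy as np
import pandas as pd

# Hypothetical example data, only for illustration
df = pd.DataFrame({"ret": [0.01, -0.02, 0.03], "ticker": ["A", "B", "C"]})
arr = df["ret"].to_numpy()

print(type(df))     # which class is this object?
print(df.shape)     # (number of rows, number of columns)
print(df.dtypes)    # data type of each column
print(df.head(2))   # first two rows

print(type(arr), arr.shape, arr.dtype)  # same checks for a numpy array
```

A surprising type, shape or dtype here is often the cause of an error several lines later.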

### Errors inside functions

- Try to avoid them by testing code on examples before you put it into a function.
- Take code out of the function if there's an error inside the function and test on examples.
  - Use CTRL-] to indent a block of code and CTRL-[ to unindent it.
- Use print statements inside the function.
- Use the debugger.

### pdb debugger

- Ask GitHub Copilot how to use the pdb debugger in Jupyter.
- pdb.set_trace() 
- %debug magic command
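
A minimal sketch of both approaches, using a hypothetical function `growth_rates` with a `debug` flag added only for illustration. With `debug=True`, execution pauses at `pdb.set_trace()` and waits for commands at the `(Pdb)` prompt.

```python
import pdb

def growth_rates(prices, debug=False):
    """Return period-over-period growth rates of a list of prices."""
    rates = []
    for prev, curr in zip(prices[:-1], prices[1:]):
        if debug:
            # Execution pauses here. At the (Pdb) prompt, try:
            #   p prev, curr   print variables
            #   n              run the next line
            #   c              continue to the next pause
            #   q              quit the debugger
            pdb.set_trace()
        rates.append(curr / prev - 1)
    return rates

print(growth_rates([100, 110, 121]))

# growth_rates([100, 0, 121]) raises ZeroDivisionError.
# In Jupyter, running %debug in the next cell opens the debugger at the
# line that failed, where prev and curr can be inspected.
```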

### Timing code

- Use `time.time` or `timeit.timeit`.
- Or use the `%%time` or `%%timeit` magic commands in Jupyter.
- `%%time` runs the cell once; `%%timeit` runs it many times and reports the mean and standard deviation per loop.

In [39]:
import numpy as np 

def add_numbers(n):
    return np.arange(n+1).sum()

In [40]:
%%timeit 

add_numbers(100000)

113 μs ± 10.3 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


In [47]:
import time 

start = time.time()
add_numbers(100000)
end = time.time()
print(f"elapsed time: {end-start}")

elapsed time: 0.0010008811950683594
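
Outside of Jupyter, the same measurements can be sketched with the standard library. Note that `timeit.timeit` returns the total time for `number` calls, so it is divided by the number of calls here. `time.perf_counter` is swapped in for `time.time` because it has higher resolution for short intervals; a single run is still noisy and can differ a lot from the average.

```python
import time
import timeit

import numpy as np

def add_numbers(n):
    return np.arange(n + 1).sum()

# timeit.timeit returns the TOTAL seconds taken by `number` calls
n_calls = 1000
total = timeit.timeit(lambda: add_numbers(100000), number=n_calls)
print(f"timeit: {total / n_calls * 1e6:.1f} microseconds per call")

# A single timed run with a high-resolution clock
start = time.perf_counter()
add_numbers(100000)
elapsed = time.perf_counter() - start
print(f"single run: {elapsed * 1e6:.1f} microseconds")
```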


### Parallel processing

- Use the joblib library or multiprocessing.Pool to parallelize loops.
- Other libraries are available for more complex parallel processing.
- Only worthwhile for tasks that take a long time to run.
- For small tasks, parallel processing is too much overhead.

In [11]:
from joblib import Parallel, delayed

In [48]:
start = time.time()
lst = [add_numbers(n) for n in range(100000)]
end = time.time()
print(f"elapsed time: {end-start}")

elapsed time: 4.282323360443115


In [49]:
start = time.time()

lst = Parallel(n_jobs=-1, verbose=0)(
    delayed(add_numbers)(n) for n in range(100000)
)

end = time.time()
print(f"elapsed time: {end-start}")

elapsed time: 2.8435566425323486
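
One way to reduce the overhead per task is to hand each worker a chunk of inputs rather than a single input. The sketch below uses hypothetical helpers `add_numbers_chunk` and `parallel_in_chunks`. joblib already batches small tasks automatically, so the gain from manual chunking may be modest and should be timed on your own machine.

```python
import numpy as np
from joblib import Parallel, delayed

def add_numbers(n):
    return np.arange(n + 1).sum()

def add_numbers_chunk(ns):
    # One task processes a whole chunk of inputs
    return [add_numbers(n) for n in ns]

def parallel_in_chunks(n_max, n_chunks=8, n_jobs=-1):
    # Split 0, 1, ..., n_max-1 into n_chunks contiguous pieces
    chunks = np.array_split(np.arange(n_max), n_chunks)
    results = Parallel(n_jobs=n_jobs)(
        delayed(add_numbers_chunk)(chunk) for chunk in chunks
    )
    # Flatten the list of per-chunk results, preserving order
    return [x for chunk in results for x in chunk]

lst = parallel_in_chunks(1000, n_chunks=4, n_jobs=2)
print(len(lst), lst[:5])
```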


### GPU processing

- Many cores, each slower than CPU cores.
- Good for tasks that can be parallelized.
- Code still runs on CPU, but tasks are sent to GPU.
- Various libraries will automatically send tasks to the GPU and parallelize them.  
  - Nvidia: numpy -> cupy, pandas -> cudf, scikit-learn -> cuml
- Google Colab has free GPU access: Runtime/Select Runtime Type/GPU

### cupy

[https://colab.research.google.com/drive/1xhORH4VQr5vuaDVsvKY54JW_dIE5EMUz?usp=sharing](https://colab.research.google.com/drive/1xhORH4VQr5vuaDVsvKY54JW_dIE5EMUz?usp=sharing)
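
A minimal sketch of the numpy -> cupy swap, assuming cupy is installed and a CUDA GPU is available (for example on a Colab GPU runtime). The alias `xp` is just a naming convention used here, and the code falls back to numpy when no usable GPU is found.

```python
import numpy as np

try:
    import cupy as xp
    xp.arange(1).sum()  # forces a GPU call; fails if no usable GPU
except Exception:
    xp = np  # fall back to numpy on the CPU

def add_numbers(n):
    # Same function as before, with np replaced by xp.
    # int() brings the result back to the CPU as a Python integer.
    return int(xp.arange(n + 1, dtype=xp.int64).sum())

print(xp.__name__, add_numbers(100000))
```

For a computation this small, copying data to and from the GPU can easily cost more than the computation itself.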

### cudf

- Can rewrite code to use cudf instead of pandas.  Almost everything is exactly the same - just use cudf.DataFrame instead of pd.DataFrame, etc.
- Or can use magic %load_ext cudf.pandas to automatically run pandas code with cudf: 

[https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/getting_started_tutorials/cudf_pandas_colab_demo.ipynb](https://colab.research.google.com/github/rapidsai-community/showcase/blob/main/getting_started_tutorials/cudf_pandas_colab_demo.ipynb)

## Geometry of Ridge Regression

### OLS

- Demean $X$ and $y$, with no constant column in $X$
- $(1/N)X'y$ is sample covariance of $y$ with $x$'s.
- $(1/N)X'X$ is sample covariance matrix of $x$'s.
- OLS is 
$$(\text{sample cov of $X$})^{-1} \times \text{sample cov of $X$ with $y$}$$
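
A quick numerical check of this formula on simulated data (the sample size, coefficients and seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
X = rng.normal(size=(N, K))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)

# Demean X and y, so no constant column is needed
X = X - X.mean(axis=0)
y = y - y.mean()

cov_X = X.T @ X / N    # sample covariance matrix of the x's
cov_Xy = X.T @ y / N   # sample covariances of the x's with y
beta_cov = np.linalg.solve(cov_X, cov_Xy)

# Compare with least squares computed directly
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_cov, beta_ols))
```

The factors of $1/N$ cancel, so this is the usual $(X'X)^{-1}X'y$.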

### Why ridge is called ridge 

- Ridge is 
$$\min \frac{1}{2}(y-X\beta)'(y-X\beta) + \frac{\lambda}{2} \beta'\beta$$
- FOC is
$$X'(y-X\beta) - \lambda \beta = 0$$
- Solution is
$$\beta = (X'X + \lambda I)^{-1}X'y$$
- So, relative to OLS, ridge replaces the sample cov matrix $(1/N)X'X$ with $(1/N)(X'X + \lambda I)$.
- $\lambda I$ adds a ridge to the diagonal of the sample cov matrix of $X$.
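
A sketch that checks the first-order condition and the ridge on the diagonal numerically, using simulated data and an arbitrary illustrative value $\lambda = 10$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
X = rng.normal(size=(N, K))
X = X - X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)
y = y - y.mean()

lam = 10.0  # penalty parameter lambda (arbitrary value)

A = X.T @ X + lam * np.eye(K)
beta_ridge = np.linalg.solve(A, X.T @ y)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# The FOC X'(y - X beta) - lambda beta = 0 holds at the ridge solution
foc = X.T @ (y - X @ beta_ridge) - lam * beta_ridge
print(np.allclose(foc, 0))

# Only the diagonal of X'X changes: lambda is added to each diagonal element
print(np.allclose(np.diag(A), np.diag(X.T @ X) + lam))

# Ridge coefficients are shrunk toward zero relative to OLS
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))
```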

### Singular value decomposition

Take the singular value decomposition $X = UDV'$.

- With $X$ being $N \times K$ and $N \ge K$, $U$ is $N \times K$, $D$ is $K \times K$, and $V$ is $K \times K$.
- $U$ has orthonormal columns ($U'U = I$), and $V$ is an orthogonal matrix ($V'V = VV' = I$).
  - Columns of $U$ are the eigenvectors of $XX'$ corresponding to nonzero eigenvalues.
  - Columns of $V$ are the eigenvectors of $X'X$.
- $D$ is a diagonal matrix containing the singular values of $X$, which are all nonzero when $X$ has full column rank.
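
In numpy, the thin SVD (`full_matrices=False`) returns $U$ as $N \times K$, the singular values as a vector of length $K$, and $V'$ as $K \times K$. A sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
X = rng.normal(size=(N, K))
X = X - X.mean(axis=0)

# numpy returns the singular values as a vector d and returns V' (not V)
U, d, Vt = np.linalg.svd(X, full_matrices=False)
D = np.diag(d)
V = Vt.T

print(U.shape, D.shape, V.shape)         # (200, 3) (3, 3) (3, 3)
print(np.allclose(U.T @ U, np.eye(K)))   # columns of U are orthonormal
print(np.allclose(V.T @ V, np.eye(K)))   # V is orthogonal
print(np.allclose(U @ D @ V.T, X))       # X = U D V'
```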

### OLS again 

- $X'X = VD^2V'$.  This is the diagonalization of the nonsingular matrix $X'X$.
- The elements of $D^2$ are the eigenvalues of $X'X$.
- $(X'X)^{-1} = VD^{-2}V'$.
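
These three statements can be checked numerically. The sketch below uses simulated data and assumes $X$ has full column rank, so that all $d_i > 0$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
X = rng.normal(size=(N, K))
X = X - X.mean(axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T
XtX = X.T @ X

# X'X = V D^2 V'
print(np.allclose(XtX, V @ np.diag(d**2) @ V.T))

# The squared singular values are the eigenvalues of X'X
# (eigvalsh returns them in ascending order, so sort d^2 to match)
eigvals = np.linalg.eigvalsh(XtX)
print(np.allclose(np.sort(d**2), eigvals))

# (X'X)^{-1} = V D^{-2} V'
print(np.allclose(np.linalg.inv(XtX), V @ np.diag(1 / d**2) @ V.T))
```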


### Ridge again 

- We can write 
$$(X'X + \lambda I) = VD^2V' + \lambda VV' = V(D^2 + \lambda I)V'$$
- This implies
$$(X'X + \lambda I)^{-1} = V(D^2 + \lambda I)^{-1}V'$$
- The matrix $(D^2 + \lambda I)^{-1}$ is diagonal with 
$$\frac{1}{d_i^2 + \lambda}$$ 
on the diagonal.
- So ridge works by inflating the variances of the eigenvectors of the sample cov matrix of $X$: each eigenvalue $d_i^2$ of $X'X$ is replaced by $d_i^2 + \lambda$.
- The proportional inflation ($(d_i^2 + \lambda)/d_i^2$) is larger for eigenvectors with smaller eigenvalues.
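
A sketch of the inverse formula and the proportional inflation on simulated data, with an arbitrary illustrative value $\lambda = 10$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
X = rng.normal(size=(N, K))
X = X - X.mean(axis=0)

lam = 10.0  # arbitrary penalty parameter
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# (X'X + lambda I)^{-1} = V (D^2 + lambda I)^{-1} V'
lhs = np.linalg.inv(X.T @ X + lam * np.eye(K))
rhs = V @ np.diag(1 / (d**2 + lam)) @ V.T
print(np.allclose(lhs, rhs))

# Proportional inflation of each eigenvalue. numpy sorts the singular
# values from largest to smallest, so the inflation increases along the vector.
inflation = (d**2 + lam) / d**2
print(inflation)
```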

### OLS and Ridge predictions 

- OLS:
$$X\beta = UDV'V(D^2)^{-1}V'VDU'y = UU'y = \sum_{i=1}^K u_i(u_i'y)$$
- Ridge:
$$X\beta = UDV'V(D^2 + \lambda I)^{-1}V'VDU'y$$
$$ = UD^2(D^2 + \lambda I)^{-1}U'y = \sum_{i=1}^K u_i\frac{d_i^2}{d_i^2 + \lambda}(u_i'y)$$
- Ridge can be seen as scaling down the covariances $u_i'y$ by the factor $d_i^2/(d_i^2 + \lambda)$. The proportional effect is larger for lower variance eigenvectors.
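
The two prediction formulas can be checked against predictions computed from the coefficients directly. This is a sketch on simulated data, with an arbitrary illustrative value $\lambda = 10$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
X = rng.normal(size=(N, K))
X = X - X.mean(axis=0)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)
y = y - y.mean()

lam = 10.0  # arbitrary penalty parameter
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Predictions from the coefficient formulas
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)

# Predictions from the SVD formulas
shrink = d**2 / (d**2 + lam)               # shrinkage factor for each u_i
pred_ols = U @ (U.T @ y)                   # sum of u_i (u_i'y)
pred_ridge = U @ (shrink * (U.T @ y))      # sum of u_i * shrink_i * (u_i'y)

print(np.allclose(X @ beta_ols, pred_ols))
print(np.allclose(X @ beta_ridge, pred_ridge))
print(shrink)  # closer to zero for smaller singular values
```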