<center><h1> Runtime of bolasso (by coordinate descent) under sklearn's built-in parallelism </h1></center>

## #1: import all modules

* <font size="4"> "pickle" is used to save all computation results into ".p" files, which can be loaded later. </font>

* <font size="4"> For simplicity and elegance, all relevant functions and classes are coded in "simul_cd_parallel.py", the module imported in the first cell. </font>
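As a minimal sketch of the save-and-reload pattern described above, the cell below writes a dictionary to a ".p" file and reads it back. The dictionary layout and the file name `bolasso_runtime.p` are hypothetical, since the structure of the real results is not shown in this notebook.

```python
import os
import pickle
import tempfile

# Hypothetical result record; the real notebook's result structure is not shown.
results = {"n_dim": 400, "sample_size": 200, "runtime_s_per_it": 26.32}

# Write into a temporary directory so the sketch does not clutter the project folder.
path = os.path.join(tempfile.mkdtemp(), "bolasso_runtime.p")

# Save: pickle needs the file opened in binary write mode.
with open(path, "wb") as f:
    pickle.dump(results, f)

# Load: binary read mode returns an equal copy of the original object.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded)
```

Only unpickle files you created yourself, because `pickle.load` can execute arbitrary code from an untrusted file.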

In [1]:
%reset -f

from simul_cd_parallel import simul_func

import mkl
import numpy             as np
import matplotlib.pyplot as plt
import pickle
import os
import errno

## make sure NumPy is linked against Intel MKL (the optimized BLAS/LAPACK library) for maximum performance.

In [2]:
mkl.get_version_string()

'Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications'

In [3]:
print('This was obtained using the following MKL configuration:')

np.show_config()

This was obtained using the following MKL configuration:
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ning/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ning/anaconda3/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ning/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ning/anaconda3/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ning/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ning/anaconda3/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/home/ning/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/home/ning/anaconda3/include']
Supported SIMD extensions in this NumPy install:
    baseline =

# set simulation parameters

In [4]:
n_info    = 5
step_size = -0.01
num_rep   = 200
rnd_seed  = 0

n_dim_0 = 400 ; sample_size_0 = 200
n_dim_1 = 800 ; sample_size_1 = 400
n_dim_2 = 1200; sample_size_2 = 600

---

## **Read this before replication**

## #1. the ["tqdm progress bar"](https://github.com/tqdm/tqdm)
### After running all the code, you should see a progress bar below each simulation function. The progress bars are produced by the Python package *"tqdm"*, whose overhead is negligible (about 80ns per iteration for the graphical output), so they do not affect the accuracy of the runtime measurements.

## #2. the graphical interface of progress bar

### The progress bar looks like the following (for example, the one below *trial.simul_bolasso()*).

![the tqdm progress bar](./progress_bar.png)

### From left to right, it displays

* <font size="4.5"> percentage of finished repetitions </font>
* <font size="4.5"> the progress bar </font>
* <font size="4.5"> number of finished repetitions &nbsp; $/$ &nbsp; number of total repetitions </font>
* <font size="4.5"> $[$ time spent &nbsp;  $<$ &nbsp;  time left to finish all repetitions, &nbsp;  average runtime based on finished repetitions $]$ </font>
* <font size="4.5"> Note that the average runtime is reported in either **iterations per second (it/s)** or **seconds per iteration (s/it)**; take the reciprocal of **it/s** to make a clear comparison </font>
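The reciprocal conversion in the last bullet can be sketched as a small helper that puts both kinds of rate on the same "seconds per iteration" scale. The function name `seconds_per_iteration` is hypothetical and is not part of tqdm; it only parses the rate text as it appears at the end of a progress bar.

```python
import re


def seconds_per_iteration(rate: str) -> float:
    """Convert a rate string such as '26.32s/it' or '4.00it/s' to seconds per iteration."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(s/it|it/s)\s*", rate)
    if match is None:
        raise ValueError(f"unrecognised rate string: {rate!r}")
    value, unit = float(match.group(1)), match.group(2)
    # 's/it' is already on the target scale; 'it/s' needs the reciprocal.
    return value if unit == "s/it" else 1.0 / value


slow = seconds_per_iteration("26.32s/it")  # already seconds per iteration
fast = seconds_per_iteration("4.00it/s")   # reciprocal of 4 iterations per second

print(slow, fast)
```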

## #3. the runtime length issue of bolasso

### Beware that the bolasso computation can take a very long time on some CPUs

---

# #3(a) : the runtime for Amdahl's law : $p/n = 1000/100$

In [5]:
n_dim_00 = 1000 ; sample_size_00 = 100

trial = simul_func(sample_size_00, n_dim_00, n_info, 3, step_size, rnd_seed)

trial.simul_bolasso()

100%|██████████| 3/3 [02:28<00:00, 49.61s/it]
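For reference, Amdahl's law bounds the speedup from $N$ workers by $1 / ((1 - f) + f/N)$, where $f$ is the fraction of the work that can run in parallel. The sketch below evaluates that formula; the value `f = 0.9` is purely illustrative and is not measured from bolasso.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Theoretical speedup under Amdahl's law for a given parallel fraction and worker count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


# Illustrative only: f = 0.9 is an assumed parallel fraction, not a bolasso measurement.
for workers in (1, 2, 4, 8):
    print(workers, round(amdahl_speedup(0.9, workers), 3))
```

Given a serial runtime and a parallel runtime on a known number of cores, the same formula can be inverted to estimate $f$.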


## #4 : $\log(p)/n \rightarrow 0$

In [6]:
num_rep = 10

## #4(a): $p/n=400/200$ 

In [7]:
trial = simul_func(sample_size_0, n_dim_0, n_info, num_rep, step_size, rnd_seed)

### run 10 repetitions for bolasso 

In [8]:
trial.simul_bolasso()

100%|██████████| 10/10 [04:23<00:00, 26.32s/it]


## #4(b): $p/n=800/400$

In [9]:
trial = simul_func(sample_size_1, n_dim_1, n_info, num_rep, step_size, rnd_seed)

### run 10 repetitions for bolasso 

In [10]:
trial.simul_bolasso()

100%|██████████| 10/10 [15:42<00:00, 94.24s/it]


## #4(c): $p/n=1200/600$

In [11]:
trial = simul_func(sample_size_2, n_dim_2, n_info, num_rep, step_size, rnd_seed)

### run 10 repetitions for bolasso 

In [12]:
trial.simul_bolasso()

100%|██████████| 10/10 [46:59<00:00, 281.95s/it]


---
## #4 : $p/n \rightarrow 0$

In [13]:
n_dim_3 = 100 ; sample_size_3 = 100
n_dim_4 = 100 ; sample_size_4 = 150
n_dim_5 = 100 ; sample_size_5 = 200

## #4(d) : $p/n = 100/100$

In [14]:
trial = simul_func(sample_size_3, n_dim_3, n_info, num_rep, step_size, rnd_seed)

### run 10 repetitions for bolasso 

In [15]:
trial.simul_bolasso()

100%|██████████| 10/10 [02:15<00:00, 13.55s/it]


## #4(d): $p/n = 100/150$

In [16]:
trial = simul_func(sample_size_4, n_dim_4, n_info, num_rep, step_size, rnd_seed)

### run 10 repetitions for bolasso 

In [17]:
trial.simul_bolasso()

100%|██████████| 10/10 [10:31<00:00, 63.11s/it]


## #4(e): $p/n=100/200$

In [18]:
trial = simul_func(sample_size_5, n_dim_5, n_info, num_rep, step_size, rnd_seed)

### run 10 repetitions for bolasso 

In [19]:
trial.simul_bolasso()

100%|██████████| 10/10 [10:30<00:00, 63.06s/it]


---
## #4 : $p/n \rightarrow 1$

In [20]:
n_dim_6 = 150 ; sample_size_6 = 100
n_dim_7 = 200 ; sample_size_7 = 150
n_dim_8 = 250 ; sample_size_8 = 200

## #4(f): $p/n=150/100$

In [21]:
trial = simul_func(sample_size_6, n_dim_6, n_info, num_rep, step_size, rnd_seed)

### run 10 repetitions for bolasso 

In [22]:
trial.simul_bolasso()

100%|██████████| 10/10 [02:13<00:00, 13.39s/it]


## #4(g): $p/n=200/150$

In [23]:
trial = simul_func(sample_size_7, n_dim_7, n_info, num_rep, step_size, rnd_seed)

### run 10 repetitions for bolasso 

In [24]:
trial.simul_bolasso()

100%|██████████| 10/10 [02:40<00:00, 16.05s/it]


## #4(h): $p/n=250/200$

In [25]:
trial = simul_func(sample_size_8, n_dim_8, n_info, num_rep, step_size, rnd_seed)

### run 10 repetitions for bolasso 

In [26]:
trial.simul_bolasso()

100%|██████████| 10/10 [03:10<00:00, 19.01s/it]


---

## #5. runtime graph plot

In [5]:
num_rep          = 3
n_info           = 5
step_size        = -0.02
rnd_seed         = 0
n_repeat_solar   = 10    
n_repeat_bsolar  = 3     

# feature number = 100

In [6]:
n_dim       = 100   
sample_size = 100

trial = simul_func(sample_size, n_dim, n_info, num_rep, step_size, rnd_seed)
trial.simul_bolasso()

100%|████████████████████████████████████████████████████████████████████████████████| 3/3 [00:41<00:00, 13.84s/it]


# feature number = 200

In [7]:
n_dim       = 200   
sample_size = 200

trial = simul_func(sample_size, n_dim, n_info, num_rep, step_size, rnd_seed)
trial.simul_bolasso()

100%|████████████████████████████████████████████████████████████████████████████████| 3/3 [00:54<00:00, 18.14s/it]


# feature number = 400

In [8]:
n_dim       = 400   
sample_size = 400

trial = simul_func(sample_size, n_dim, n_info, num_rep, step_size, rnd_seed)
trial.simul_bolasso()

100%|████████████████████████████████████████████████████████████████████████████████| 3/3 [02:18<00:00, 46.10s/it]


# feature number = 600

In [9]:
n_dim       = 600   
sample_size = 600

trial = simul_func(sample_size, n_dim, n_info, num_rep, step_size, rnd_seed)
trial.simul_bolasso()

100%|███████████████████████████████████████████████████████████████████████████████| 3/3 [06:32<00:00, 130.67s/it]


# feature number = 800

In [10]:
n_dim       = 800   
sample_size = 800

trial = simul_func(sample_size, n_dim, n_info, num_rep, step_size, rnd_seed)
trial.simul_bolasso()

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [13:15<00:00, 265.29s/it]


# feature number = 1000

In [11]:
n_dim       = 1000   
sample_size = 1000

trial = simul_func(sample_size, n_dim, n_info, num_rep, step_size, rnd_seed)
trial.simul_bolasso()

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [22:57<00:00, 459.31s/it]


# feature number = 1200

In [12]:
n_dim       = 1200   
sample_size = 1200

trial = simul_func(sample_size, n_dim, n_info, num_rep, step_size, rnd_seed)
trial.simul_bolasso()

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [46:58<00:00, 939.53s/it]


# feature number = 1400

In [13]:
n_dim       = 1400   
sample_size = 1400

trial = simul_func(sample_size, n_dim, n_info, num_rep, step_size, rnd_seed)
trial.simul_bolasso()

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [1:27:38<00:00, 1752.76s/it]


# feature number = 1600

In [14]:
n_dim       = 1600   
sample_size = 1600

trial = simul_func(sample_size, n_dim, n_info, num_rep, step_size, rnd_seed)
trial.simul_bolasso()

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [2:24:09<00:00, 2883.30s/it]
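One way to summarise the runs above is to fit a straight line to log(runtime) against log(feature number); the slope estimates the exponent with which runtime grows. The sketch below uses the seconds-per-repetition figures printed by the progress bars in this section and plain least squares. Because the sample size equals the feature number in every run here, the exponent describes the joint scaling in $n$ and $p$. The helper name `loglog_slope` is an assumption for illustration.

```python
import math

# (feature number, seconds per repetition) as reported by the progress bars in this section.
runtimes = [
    (100, 13.84), (200, 18.14), (400, 46.10),
    (600, 130.67), (800, 265.29), (1000, 459.31),
    (1200, 939.53), (1400, 1752.76), (1600, 2883.30),
]


def loglog_slope(points):
    """Least-squares slope of log(runtime) against log(size)."""
    xs = [math.log(size) for size, _ in points]
    ys = [math.log(seconds) for _, seconds in points]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


slope = loglog_slope(runtimes)
print(f"estimated runtime exponent: {slope:.2f}")
```

With only three repetitions per size, the estimate is a rough guide rather than a precise complexity measurement.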


---

## #6. output the raw results into HTML

In [3]:
!rm -rf bolasso_cd_runtime.html
!jupyter nbconvert --to html bolasso_cd_runtime.ipynb 

[NbConvertApp] Converting notebook bolasso_cd_runtime.ipynb to html
[NbConvertApp] Writing 645603 bytes to bolasso_cd_runtime.html
