<center><h1> bsolar vs. bolasso: the subsample selection frequency </h1></center>

## This notebook compares the subsample selection frequency of bsolar and bolasso

## #1: import all modules

* <font size="4.5"> For simplicity and elegance, all relevant functions and classes are coded in "bootstrap_demo_parallel.py". </font>

In [None]:
%reset -f

import numpy                as np
import matplotlib.pyplot    as plt
import pickle
import os
import errno
import warnings

from bootstrap_demo_parallel import one_shot_simul

warnings.simplefilter(action='ignore', category=FutureWarning)

---

## #2(a): define input values

| <font size="4.5"> variable name </font> | <font size="4.5">  meaning </font> |
|-|-|
| <font size="4.5">  sample_size  </font> | <font size="4.5">  the sample size $n$ in the paper; </font>| 
| <font size="4.5">  n_dim        </font> | <font size="4.5">  the number of variables (informative + redundant) in $X$, $p$ in the paper; </font>| 
| <font size="4.5">  n_info       </font> | <font size="4.5">  the number of informative variables in $X$; </font>| 
| <font size="4.5">  n_repeat_solar </font> | <font size="4.5">  the number of subsamples generated by solar; </font>| 
| <font size="4.5">  n_repeat_bsolar </font> | <font size="4.5">  the number of subsamples generated by bsolar; </font>| 
| <font size="4.5">  num_rep      </font> | <font size="4.5">  the total number of repetitions of this simulation; </font>|
| <font size="4.5">  step_size    </font> | <font size="4.5">  the step size for tuning $c$; </font>| 
| <font size="4.5">  rnd_seed     </font> | <font size="4.5">  the random seed value; </font>| 
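A negative *step_size* suggests that $c$ is scanned downward. The sketch below shows one way such a grid could be built, assuming (this is not stated in the notebook) that $c$ is a threshold in $[0, 1]$ searched from 1 toward 0. `c_grid` is a hypothetical helper, not a function from "bootstrap_demo_parallel.py".

```python
import numpy as np

def c_grid(step_size=-0.02, c_max=1.0, c_min=0.0):
    """Descending grid of candidate thresholds c (hypothetical helper).

    Assumes c is scanned from c_max down to (but excluding) c_min in
    increments of step_size, which must be negative.
    """
    if step_size >= 0:
        raise ValueError("step_size must be negative for a descending grid")
    n_steps = int(round((c_max - c_min) / -step_size))
    # integer indices avoid the floating-point drift of np.arange with a float step
    return c_max + step_size * np.arange(n_steps)

grid = c_grid(-0.02)
print(len(grid), grid[:3], grid[-1])
```

With `step_size = -0.02` this gives 50 candidate values running from 1.0 down to 0.02.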

## #2(b): define DGP

* <font size="4.5"> The population regression equation is $$Y = 2\cdot \mathbf{x}_0 + 3\cdot \mathbf{x}_1 + 4\cdot \mathbf{x}_2 + 5\cdot \mathbf{x}_3 + 6\cdot \mathbf{x}_4  + u.$$ </font>
* <font size="4.5"> To change the simulation settings, simply change the input values. If you change *n_info*, the DGP adjusts as follows: </font>
    * <font size="4.5"> for $i \in \{0, 1, 2, \ldots, p-1\}$ with $i > \mbox{n_info} - 1$, $\beta_i = 0$ in the population;</font>
    * <font size="4.5"> for $i \in \{0, 1, 2, \ldots, p-1\}$ with $i \leqslant \mbox{n_info} - 1$, $\beta_i = i + 2$ in the population.</font>
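The coefficient rule above can be written out directly. The sketch below draws one sample from this DGP under assumptions the notebook does not state: the columns of $X$ and the error $u$ are i.i.d. standard normal, and NumPy's `default_rng` is used for the draws. The actual simulation in "bootstrap_demo_parallel.py" may generate the data differently; `simulate_dgp` is a hypothetical name.

```python
import numpy as np

def simulate_dgp(sample_size=200, n_dim=100, n_info=5, seed=0):
    """Draw (X, Y) from the linear DGP described above (hypothetical helper).

    Assumption: X and u are i.i.d. standard normal.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((sample_size, n_dim))
    # beta_i = i + 2 for the first n_info variables, 0 for the rest
    beta = np.zeros(n_dim)
    beta[:n_info] = np.arange(n_info) + 2
    u = rng.standard_normal(sample_size)
    Y = X @ beta + u
    return X, Y, beta

X, Y, beta = simulate_dgp()
print(beta[:6])  # [2. 3. 4. 5. 6. 0.]
```

With the default `n_info = 5`, this reproduces the coefficients $2, 3, 4, 5, 6$ of $\mathbf{x}_0, \ldots, \mathbf{x}_4$ and sets every other $\beta_i$ to zero.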

In [None]:
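sample_size     = 200    # n: sample size
n_dim           = 100    # p: total number of variables in X
n_info          = 5      # number of informative variables
n_repeat_solar  = 10     # number of subsamples generated by solar
n_repeat_bsolar = 10     # number of subsamples generated by bsolar
step_size       = -0.02  # step size for tuning c
rnd_seed        = 0      # random seed
plot_on         = False  # plotting switch passed to one_shot_simul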

---

## #3: compute bsolar and bolasso

### NumPy, scikit-learn and Python are actively updated. If you use different versions, replication results may differ slightly from the paper (see Read_me_first.docx for details).

### First, we instantiate the simulation class *one_shot_simul* from "bootstrap_demo_parallel.py"

In [None]:
# fix the random seed for reproducibility
np.random.seed(rnd_seed)

trial = one_shot_simul(sample_size, n_dim, n_info, n_repeat_solar, n_repeat_bsolar, step_size, rnd_seed, plot_on)

* <font size="5">  If you only want to load our saved raw results, set *repro=False*; </font>
* <font size="5">  to replicate the computation from scratch, set *repro=True*. </font>

In [None]:
repro = True

### Then we compute bsolar and bolasso on the simulated data (or load the saved results)

In [None]:
pkl_file = "./numerical_result/bsolar_demo.p"

if repro:
    # compute everything
    Qc_list_bsolar, Qc_list_bolasso = trial.simul_func()

    # save both results into "bsolar_demo.p";
    # create the subdirectory if it does not exist
    # (exist_ok=True also guards against the race condition)
    os.makedirs(os.path.dirname(pkl_file), exist_ok=True)

    with open(pkl_file, "wb") as f:
        pickle.dump(Qc_list_bsolar, f)
        pickle.dump(Qc_list_bolasso, f)

else:
    # load the saved results instead of recomputing them;
    # objects must be loaded in the same order they were dumped
    with open(pkl_file, "rb") as f:
        Qc_list_bsolar  = pickle.load(f)
        Qc_list_bolasso = pickle.load(f)
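The cell above stores two objects in one pickle file by calling `pickle.dump` twice, so they must be read back with two `pickle.load` calls in the same order. The sketch below isolates that pattern in two hypothetical helpers, `save_results` and `load_results`, and writes to a temporary directory; the lists saved are placeholder values, not simulation output.

```python
import os
import pickle
import tempfile

def save_results(path, *objects):
    """Pickle several objects into one file, one dump per object."""
    dirname = os.path.dirname(path)
    if dirname:  # os.makedirs("") would raise, so only create real directories
        os.makedirs(dirname, exist_ok=True)
    with open(path, "wb") as f:
        for obj in objects:
            pickle.dump(obj, f)

def load_results(path, n_objects):
    """Read back n_objects pickles in the order they were written."""
    with open(path, "rb") as f:
        return [pickle.load(f) for _ in range(n_objects)]

with tempfile.TemporaryDirectory() as tmp:
    pkl_file = os.path.join(tmp, "numerical_result", "bsolar_demo.p")
    # placeholder objects standing in for Qc_list_bsolar and Qc_list_bolasso
    save_results(pkl_file, [0.9, 0.8], [1.0, 0.5])
    a, b = load_results(pkl_file, 2)
    print(a, b)  # [0.9, 0.8] [1.0, 0.5]
```

Storing a single tuple with one `pickle.dump` would remove the ordering requirement, at the cost of changing the file format the notebook already uses.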

---

## #4: plots and results of the simulation

### Step 1: we plot the subsample selection frequency list of bsolar (repeating solar *n_repeat_bsolar* times)

In [None]:
trial.q_list_bsolar(Qc_list_bsolar)
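The subsample selection frequency of a variable is the fraction of subsample runs in which it is selected. As a minimal sketch, assuming each solar run inside bsolar yields a set of selected column indices, the frequencies could be tallied as follows. `selection_frequency` and the example selections are hypothetical and do not reflect how *q_list_bsolar* is implemented.

```python
import numpy as np

def selection_frequency(selected_sets, n_dim):
    """Fraction of subsample runs in which each variable is selected.

    selected_sets: iterable of collections of selected column indices,
    one per subsample run (e.g. one per solar run inside bsolar).
    """
    counts = np.zeros(n_dim)
    n_runs = 0
    for selected in selected_sets:
        idx = np.asarray(sorted(set(selected)), dtype=int)
        counts[idx] += 1
        n_runs += 1
    if n_runs == 0:
        raise ValueError("need at least one subsample run")
    return counts / n_runs

# made-up selections from three runs: x0..x4 always, plus one noise variable twice
runs = [{0, 1, 2, 3, 4, 17}, {0, 1, 2, 3, 4}, {0, 1, 2, 3, 4, 42}]
freq = selection_frequency(runs, n_dim=100)
print(np.flatnonzero(freq >= 0.5))  # [0 1 2 3 4]
```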

### Step 2: we plot the subsample selection frequency list of bolasso (repeating lasso 256 times)

In [None]:
trial.q_list_bolasso(Qc_list_bolasso)
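Bolasso, as usually described, fits the lasso on bootstrap resamples and keeps the variables selected in every replicate, i.e. those with selection frequency 1. The sketch below computes the bootstrap selection frequency with scikit-learn's `Lasso`, assuming a fixed, hypothetical penalty `alpha=0.1` for every replicate; the module used by this notebook may choose the penalty differently. The data are drawn from the DGP above with standard normal $X$ and $u$ (an assumption) and only 20 variables to keep it fast.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_frequency(X, Y, n_boot=256, alpha=0.1, seed=0):
    """Bootstrap-lasso selection frequency of each column of X.

    Assumption: the same penalty alpha is used in every bootstrap replicate.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        coef = Lasso(alpha=alpha).fit(X[idx], Y[idx]).coef_
        counts += coef != 0               # count nonzero coefficients
    return counts / n_boot

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
beta = np.zeros(20)
beta[:5] = np.arange(5) + 2
Y = X @ beta + rng.standard_normal(200)

freq = bolasso_frequency(X, Y, n_boot=50)
print(np.round(freq[:8], 2))
print("bolasso support:", np.flatnonzero(freq == 1.0))
```

Increasing `n_boot` toward 256 makes the frequency estimates smoother but multiplies the number of lasso fits accordingly.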

### Finally, we export this notebook to an HTML file

In [None]:
!rm -rf subsample_frequency_bolasso_bsolar.html
!jupyter nbconvert --to html subsample_frequency_bolasso_bsolar.ipynb