Updating Initial Z Parameter Sweep Results #13

Merged
merged 11 commits into from
May 10, 2018
1.initial-z-sweep/README.md: 72 additions & 28 deletions
@@ -4,7 +4,7 @@

## Latent Space Dimensionality

Compression algorithms reduce the dimensionality of input data by forcing it through a bottleneck with a restricted number of dimensions.
A common problem is deciding how many "useful" latent space features are present in the data.
The solution differs across problems and goals.
For example, when visualizing large differences between groups of data, a highly restrictive bottleneck, usually 2 or 3 features, is required.
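
To make the bottleneck idea concrete, the sketch below builds a minimal Keras autoencoder whose latent dimensionality is a single argument. This is a generic illustration only, not the Tybalt or ADAGE code used in this repository, and the input size of 5,000 features is an arbitrary placeholder.

```python
# Minimal bottleneck autoencoder sketch (generic illustration, not the
# repository's Tybalt or ADAGE implementations).
from keras.layers import Dense, Input
from keras.models import Model


def build_autoencoder(num_features, latent_dim):
    # The encoder compresses the input down to `latent_dim` latent features
    inputs = Input(shape=(num_features,))
    encoded = Dense(latent_dim, activation="relu")(inputs)

    # The decoder reconstructs the original input from the bottleneck
    decoded = Dense(num_features, activation="sigmoid")(encoded)

    autoencoder = Model(inputs=inputs, outputs=decoded)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder


# Sweeping latent_dim (e.g. 5, 25, 50, 75, 100, 125) changes how aggressively
# the data are compressed; num_features = 5000 is an arbitrary example value.
model = build_autoencoder(num_features=5000, latent_dim=25)
```

Choosing `latent_dim` is exactly the dimensionality decision discussed above.
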
@@ -48,15 +48,34 @@ We sweep over the following parameter combinations for Tybalt and ADAGE models:
| Kappa | 0, 0.5, 1 | |
| Sparsity | | 0, 0.000001, 0.001 |
| Noise | | 0, 0.1, 0.5 |
| Weights | | tied |

This resulted in the training of 540 Tybalt models and 648 ADAGE models.
Importantly, we also include results of a parameter sweep of 1,080 ADAGE models with _untied_ weights, which we ran previously.
For all downstream applications we use ADAGE models with _tied_ weights, but we also report the _untied_ results here.
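
The model counts follow from the size of the hyperparameter grid. The sketch below shows the enumeration pattern only; the dimension, kappa, sparsity, and noise lists come from the table above, while the learning rate, batch size, and epoch lists (which multiply the counts up to 540 and 648) are defined in the sweep scripts and are not reproduced here.

```python
# Sketch of how the sweep sizes arise from the hyperparameter grid.
# Only the values listed in the table above are shown; the learning rate,
# batch size, and epoch lists live in the sweep scripts themselves.
from itertools import product

dims = [5, 25, 50, 75, 100, 125]
kappas = [0, 0.5, 1]               # Tybalt only
sparsities = [0, 0.000001, 0.001]  # ADAGE only
noises = [0, 0.1, 0.5]             # ADAGE only

# Tybalt: 6 dimensions x 3 kappas = 18 base combinations, multiplied by the
# learning rate / batch size / epoch settings to reach 540 trained models.
print(len(list(product(dims, kappas))))               # 18

# Tied weight ADAGE: 6 dimensions x 3 sparsities x 3 noises = 54 base
# combinations, again multiplied by the optimizer settings to reach 648 models.
print(len(list(product(dims, sparsities, noises))))   # 54
```
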

Our goal was to determine optimal hyperparameter combinations for both models across various bottleneck dimensionalities.

## Results

We report the results in a series of visualizations and tables for Tybalt and ADAGE separately below.

To compile the results of the parameter sweep, run the following commands:

```bash
# Compile Tybalt parameter sweep results
python scripts/summarize_paramsweep.py --results_directory 'param_sweep/param_sweep_tybalt/' --output 'parameter_sweep_tybalt_full_results.tsv'

# Compile ADAGE parameter sweep results
python scripts/summarize_paramsweep.py --results_directory 'param_sweep/param_sweep_adage/' --output 'parameter_sweep_adage_tiedweights_full_results.tsv'

# Compile untied ADAGE parameter sweep results
python scripts/summarize_paramsweep.py --results_directory 'param_sweep/param_sweep_adage_untied' --output 'parameter_sweep_adage_full_results.tsv'

# Visualize the results of the sweep for all models
Rscript --vanilla scripts/param_sweep_latent_space_viz.R
```
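
The compiled TSVs can then be queried to reproduce the optimal hyperparameter tables below. Here is a hedged pandas sketch of that selection; it assumes the compiled file has one row per trained model with its end-of-training validation loss in a `val_loss` column and hyperparameter columns named as in the raw training-history files (`num_components`, `learning_rate`, `batch_size`, `epochs`, `sparsity`, `noise`).

```python
# Sketch: select the best hyperparameter combination per latent dimensionality.
# Assumes one row per trained model with end-of-training `val_loss`; adjust
# column names if the compiled TSV uses a different layout.
import pandas as pd

sweep = pd.read_csv("parameter_sweep_adage_tiedweights_full_results.tsv", sep="\t")

# For each latent space dimensionality, keep the row with the lowest
# end-of-training validation loss
best = sweep.loc[sweep.groupby("num_components")["val_loss"].idxmin()]
print(best[["num_components", "sparsity", "noise", "epochs",
            "batch_size", "learning_rate", "val_loss"]])
```
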

### Tybalt

Tybalt model performance varied from model to model, but was generally stable across all hyperparameter combinations (**Figure 1**).
@@ -82,14 +101,14 @@ For Tybalt, the optimal hyperparameters across dimensionality estimates are:

| Dimensions | Kappa | Epochs | Batch Size | Learning Rate | End Loss |
| :--------- | :---- | :----- | :--------- | :------------ | :------- |
| 5          | 0.5   | 100    | 50         | 0.001         | 2525.4   |
| 25         | 0     | 100    | 50         | 0.001         | 2479.6   |
| 50         | 0     | 100    | 100        | 0.001         | 2465.5   |
| 75         | 0     | 100    | 150        | 0.001         | 2460.5   |
| 100        | 0     | 100    | 150        | 0.0005        | 2456.5   |
| 125        | 0     | 100    | 150        | 0.0005        | 2457.4   |

Generally, it appears that the optimal `learning rate` and `kappa` decrease while the `batch size` increases as the dimensionality increases.
Training of models with optimal hyperparameters is shown in **Figure 3**.

![](figures/z_param_tybalt/z_parameter_tybalt_best.png?raw=true)
@@ -98,45 +117,70 @@
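
For reference, `kappa` controls the warm-up of the KL divergence term in Tybalt's variational loss: the KL weight starts near zero and is increased by `kappa` each epoch until it reaches 1. The schedule below is our own generic sketch of this behavior, not the exact Tybalt warm-up callback.

```python
# Generic KL warm-up schedule sketch (our illustration, not Tybalt's code).
# The VAE loss is roughly: reconstruction + beta * KL, with beta ramped by kappa.
def kl_weight_schedule(kappa, n_epochs):
    beta = 0.0
    schedule = []
    for _ in range(n_epochs):
        schedule.append(beta)
        beta = min(beta + kappa, 1.0)  # under this sketch, kappa = 0 never raises beta
    return schedule


print(kl_weight_schedule(kappa=0.5, n_epochs=5))  # [0.0, 0.5, 1.0, 1.0, 1.0]
```
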

### ADAGE

### Untied Weights

With untied weights, ADAGE model performance varied from model to model, and models failed to converge with high levels of sparsity (**Figure 4**).
High levels of sparsity fail _worse_ with increasing dimensionality.

![](figures/z_param_adage/z_parameter_adage.png?raw=true)

**Figure 4.** The loss of validation sets at the end of training for all 1,080 untied weight ADAGE models.

After removing `sparsity = 0.001`, we see a clearer picture (**Figure 5**).

![](figures/z_param_adage/z_parameter_adage_remove_sparsity.png?raw=true)

**Figure 5.** The loss of validation sets at the end of training for 720 untied weight ADAGE models.

A similar pattern appears, in which lower dimensionality benefits from increased sparsity.
ADAGE models are also generally stable, particularly at high dimensions.

It appears that `learning rate` is globally optimal at 0.0005; epochs at 100; batch size at 50; sparsity at 0; with decreasing noise for larger z dimensions.

![](figures/z_param_adage/z_parameter_adage_best.png?raw=true)

**Figure 6.** Training optimal untied weight ADAGE models across different latent space dimensions.

### Tied Weights

By constraining the compression and decompression networks to share the same weights (tied weights), we again observed variable performance from model to model.
ADAGE models failed to converge with low learning rates (**Figure 7**).

![](figures/z_param_adage_tied_weights/z_parameter_adage_tiedweights.png?raw=true)

**Figure 7.** The loss of validation sets at the end of training for 648 tied weight ADAGE models.
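
As a concrete illustration of weight tying, and of where the `noise` and `sparsity` hyperparameters enter an ADAGE-style denoising autoencoder, here is a minimal numpy sketch. It is our own simplification (additive Gaussian corruption and an L1 penalty on the shared weight matrix), not the repository's ADAGE implementation.

```python
# Tied-weight denoising autoencoder sketch (simplified illustration only).
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def adage_like_loss(X, W, b_enc, b_dec, noise=0.1, sparsity=1e-6, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)

    # `noise`: corrupt the input before encoding (shown here as additive Gaussian)
    X_noisy = X + noise * rng.standard_normal(X.shape)

    # Tied weights: the decoder reuses the transpose of the encoder matrix W,
    # so a single weight matrix is shared by compression and decompression
    hidden = sigmoid(X_noisy @ W + b_enc)
    reconstruction = sigmoid(hidden @ W.T + b_dec)

    # Reconstruction error plus an L1 penalty scaled by `sparsity`
    mse = np.mean((X - reconstruction) ** 2)
    return mse + sparsity * np.abs(W).sum()


# Example: 10 samples, 50 input features, latent dimensionality of 5
rng = np.random.default_rng(42)
X = rng.random((10, 50))
W = 0.01 * rng.standard_normal((50, 5))
loss = adage_like_loss(X, W, b_enc=np.zeros(5), b_dec=np.zeros(50), rng=rng)
```
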

After removing `learning rate = 1e-05` and `learning rate = 5e-05`, we see a clearer picture (**Figure 8**).

![](figures/z_param_adage_tied_weights/z_param_adage_remove_learningrate_tiedweights.png?raw=true)

**Figure 8.** The loss of validation sets at the end of training for 432 tied weight ADAGE models.

It appears the models perform better without any induced sparsity.

This analysis allowed us to select optimal models based on tested hyperparameters.
For tied weight ADAGE models, the optimal hyperparameters across dimensionality estimates are:

| Dimensions | Sparsity | Noise | Epochs | Batch Size | Learning Rate | End Loss |
| :--------- | :------- | :---- | :----- | :--------- | :------------ | :------- |
| 5          | 0        | 0.0   | 100    | 50         | 0.0015        | 0.0042   |
| 25         | 0        | 0.0   | 100    | 50         | 0.0015        | 0.0029   |
| 50         | 0        | 0.0   | 100    | 50         | 0.0005        | 0.0023   |
| 75         | 0        | 0.0   | 100    | 50         | 0.0005        | 0.0019   |
| 100        | 0        | 0.0   | 100    | 50         | 0.0005        | 0.0017   |
| 125        | 0        | 0.0   | 100    | 50         | 0.0005        | 0.0016   |

It appears that `learning rate` decreases for higher dimensional models, while epochs are globally optimal at 100; batch size at 50; and noise and sparsity at 0.
See https://github.com/greenelab/tybalt/issues/127 for more details about zero noise.

![](figures/z_param_adage_tied_weights/z_parameter_adage_best_tiedweights.png?raw=true)

**Figure 9.** Training optimal tied weight ADAGE models across different latent space dimensions.

## Summary

Selection of hyperparameters across different latent space dimensionalities operated as expected.
Loss was higher for lower dimensions, and lower dimensions benefited the most from increased regularization.
In general, tied weight ADAGE models performed better than untied weight ADAGE models and required less regularization.
We will use tied weight ADAGE models in all downstream analyses.
Nevertheless, we have obtained a broad set of optimal hyperparameters for use in a larger and more specific sweep of dimensionality.


Binary file modified 1.initial-z-sweep/figures/z_param_adage/z_parameter_adage.pdf
Binary file modified 1.initial-z-sweep/figures/z_param_adage/z_parameter_adage.png
Binary file modified 1.initial-z-sweep/figures/z_param_adage/z_parameter_adage_best.png
Binary file modified 1.initial-z-sweep/figures/z_param_tybalt/z_parameter_tybalt.pdf
Binary file modified 1.initial-z-sweep/figures/z_param_tybalt/z_parameter_tybalt.png
@@ -0,0 +1,101 @@
loss val_loss num_components learning_rate batch_size epochs sparsity noise seed
0 0.1632493313754133 0.1629537305041174 100 1e-05 100 100 1e-06 0.1 5344
1 0.16309046103281294 0.16294219161397847 100 1e-05 100 100 1e-06 0.1 5344
2 0.163079408032046 0.1629297489946212 100 1e-05 100 100 1e-06 0.1 5344
3 0.1630636197818897 0.16290945578460117 100 1e-05 100 100 1e-06 0.1 5344
4 0.16303681312847224 0.16287482803191372 100 1e-05 100 100 1e-06 0.1 5344
5 0.16299226730546323 0.16281895032480134 100 1e-05 100 100 1e-06 0.1 5344
6 0.1629232176347614 0.16273524765692762 100 1e-05 100 100 1e-06 0.1 5344
7 0.16282321472947173 0.16261750504599146 100 1e-05 100 100 1e-06 0.1 5344
8 0.16268637473494868 0.16246021348028328 100 1e-05 100 100 1e-06 0.1 5344
9 0.16250760176010412 0.16225867744666247 100 1e-05 100 100 1e-06 0.1 5344
10 0.16228251978014846 0.16200887969690353 100 1e-05 100 100 1e-06 0.1 5344
11 0.16200751502647745 0.16170758825151166 100 1e-05 100 100 1e-06 0.1 5344
12 0.1616796714820584 0.16135226377290696 100 1e-05 100 100 1e-06 0.1 5344
13 0.16129675526007675 0.1609409052373177 100 1e-05 100 100 1e-06 0.1 5344
14 0.16085711557832677 0.16047223548793313 100 1e-05 100 100 1e-06 0.1 5344
15 0.16035966686916905 0.15994535453954534 100 1e-05 100 100 1e-06 0.1 5344
16 0.15980377716469385 0.15935995845339407 100 1e-05 100 100 1e-06 0.1 5344
17 0.15918928821553396 0.15871602633790155 100 1e-05 100 100 1e-06 0.1 5344
18 0.15851640743374465 0.15801396052442004 100 1e-05 100 100 1e-06 0.1 5344
19 0.15778568537057758 0.15725439727006846 100 1e-05 100 100 1e-06 0.1 5344
20 0.1569979212864633 0.15643841395126515 100 1e-05 100 100 1e-06 0.1 5344
21 0.1561542248380598 0.15556703275771597 100 1e-05 100 100 1e-06 0.1 5344
22 0.15525583505218005 0.15464184811366863 100 1e-05 100 100 1e-06 0.1 5344
23 0.15430424667546386 0.15366413852377753 100 1e-05 100 100 1e-06 0.1 5344
24 0.15330105200318409 0.15263576378774404 100 1e-05 100 100 1e-06 0.1 5344
25 0.1522479568964563 0.1515584448024855 100 1e-05 100 100 1e-06 0.1 5344
26 0.15114678299105025 0.15043398468338665 100 1e-05 100 100 1e-06 0.1 5344
27 0.14999940249420646 0.14926420736252963 100 1e-05 100 100 1e-06 0.1 5344
28 0.14880769219046153 0.1480513627654943 100 1e-05 100 100 1e-06 0.1 5344
29 0.14757365022091237 0.14679693674022828 100 1e-05 100 100 1e-06 0.1 5344
30 0.14629913196890784 0.14550320444694118 100 1e-05 100 100 1e-06 0.1 5344
31 0.14498614356585388 0.14417193493052344 100 1e-05 100 100 1e-06 0.1 5344
32 0.14363657598058246 0.14280504430059213 100 1e-05 100 100 1e-06 0.1 5344
33 0.1422523420488266 0.1414044607224776 100 1e-05 100 100 1e-06 0.1 5344
34 0.14083529559379576 0.13997209161969285 100 1e-05 100 100 1e-06 0.1 5344
35 0.13938727588682306 0.13850970581248778 100 1e-05 100 100 1e-06 0.1 5344
36 0.13791009455749614 0.13701895725487465 100 1e-05 100 100 1e-06 0.1 5344
37 0.1364054357447333 0.13550165939570671 100 1e-05 100 100 1e-06 0.1 5344
38 0.1348750548133069 0.13395961527548841 100 1e-05 100 100 1e-06 0.1 5344
39 0.13332059017574566 0.13239421077709101 100 1e-05 100 100 1e-06 0.1 5344
40 0.13174362739776577 0.13080723688530563 100 1e-05 100 100 1e-06 0.1 5344
41 0.13014575748062251 0.12920002294844718 100 1e-05 100 100 1e-06 0.1 5344
42 0.12852850518284903 0.12757429435624548 100 1e-05 100 100 1e-06 0.1 5344
43 0.12689337750932364 0.12593138936776013 100 1e-05 100 100 1e-06 0.1 5344
44 0.12524175648910682 0.12427282033853196 100 1e-05 100 100 1e-06 0.1 5344
45 0.12357506409365626 0.12259981215898715 100 1e-05 100 100 1e-06 0.1 5344
46 0.12189469003463806 0.12091397868478718 100 1e-05 100 100 1e-06 0.1 5344
47 0.1202020440918019 0.11921618410840107 100 1e-05 100 100 1e-06 0.1 5344
48 0.11849826811673 0.11750831791953226 100 1e-05 100 100 1e-06 0.1 5344
49 0.11678476681040259 0.1157910283935729 100 1e-05 100 100 1e-06 0.1 5344
50 0.11506269055776583 0.11406616683132086 100 1e-05 100 100 1e-06 0.1 5344
51 0.11333334167440204 0.11233438454081665 100 1e-05 100 100 1e-06 0.1 5344
52 0.11159783871292416 0.11059725879874062 100 1e-05 100 100 1e-06 0.1 5344
53 0.10985738603111915 0.10885537887488178 100 1e-05 100 100 1e-06 0.1 5344
54 0.10811304702784337 0.10711059408571252 100 1e-05 100 100 1e-06 0.1 5344
55 0.10636600591213206 0.10536361066390522 100 1e-05 100 100 1e-06 0.1 5344
56 0.10461730668422087 0.10361539130684119 100 1e-05 100 100 1e-06 0.1 5344
57 0.10286803923290294 0.10186731721737877 100 1e-05 100 100 1e-06 0.1 5344
58 0.1011192426542869 0.10012017239128525 100 1e-05 100 100 1e-06 0.1 5344
59 0.09937197155689126 0.0983749899882168 100 1e-05 100 100 1e-06 0.1 5344
60 0.0976271955815127 0.0966328109928112 100 1e-05 100 100 1e-06 0.1 5344
61 0.0958858943202424 0.09489474640269975 100 1e-05 100 100 1e-06 0.1 5344
62 0.0941490391494906 0.09316153427464278 100 1e-05 100 100 1e-06 0.1 5344
63 0.09241752484082537 0.09143419032120824 100 1e-05 100 100 1e-06 0.1 5344
64 0.09069230715346158 0.08971348262611945 100 1e-05 100 100 1e-06 0.1 5344
65 0.08897425064249274 0.08800049309604731 100 1e-05 100 100 1e-06 0.1 5344
66 0.08726428156752501 0.08629605080464378 100 1e-05 100 100 1e-06 0.1 5344
67 0.08556320097200545 0.08460079223366838 100 1e-05 100 100 1e-06 0.1 5344
68 0.08387182805305055 0.0829158728221553 100 1e-05 100 100 1e-06 0.1 5344
69 0.082191004031856 0.08124173372684412 100 1e-05 100 100 1e-06 0.1 5344
70 0.08052149254012446 0.07957938444524554 100 1e-05 100 100 1e-06 0.1 5344
71 0.07886409623799333 0.07792928541575245 100 1e-05 100 100 1e-06 0.1 5344
72 0.07721951111896667 0.07629279268147358 100 1e-05 100 100 1e-06 0.1 5344
73 0.07558844291993531 0.07467024822031433 100 1e-05 100 100 1e-06 0.1 5344
74 0.0739716502096334 0.0730616862525293 100 1e-05 100 100 1e-06 0.1 5344
75 0.0723696823891671 0.07146880187880454 100 1e-05 100 100 1e-06 0.1 5344
76 0.07078331284526869 0.06989159266553333 100 1e-05 100 100 1e-06 0.1 5344
77 0.06921304218726361 0.06833095444207216 100 1e-05 100 100 1e-06 0.1 5344
78 0.06765954147054223 0.06678719769919937 100 1e-05 100 100 1e-06 0.1 5344
79 0.06612334374135886 0.06526111676615087 100 1e-05 100 100 1e-06 0.1 5344
80 0.06460499238646554 0.06375311685716686 100 1e-05 100 100 1e-06 0.1 5344
81 0.06310493511847719 0.06226378626859368 100 1e-05 100 100 1e-06 0.1 5344
82 0.06162373322874178 0.06079349114592351 100 1e-05 100 100 1e-06 0.1 5344
83 0.06016191218403025 0.059342672252774835 100 1e-05 100 100 1e-06 0.1 5344
84 0.058719754655245544 0.05791189609535376 100 1e-05 100 100 1e-06 0.1 5344
85 0.057297760252508684 0.05650136400781684 100 1e-05 100 100 1e-06 0.1 5344
86 0.05589622916710511 0.05511180562290115 100 1e-05 100 100 1e-06 0.1 5344
87 0.0545156001890826 0.053742800001523004 100 1e-05 100 100 1e-06 0.1 5344
88 0.05315615205656193 0.052395531486476486 100 1e-05 100 100 1e-06 0.1 5344
89 0.051818228556995666 0.05106980701786789 100 1e-05 100 100 1e-06 0.1 5344
90 0.05050213097609544 0.049765827443132446 100 1e-05 100 100 1e-06 0.1 5344
91 0.04920808947018806 0.048484288605313806 100 1e-05 100 100 1e-06 0.1 5344
92 0.04793632055727056 0.04722506354502098 100 1e-05 100 100 1e-06 0.1 5344
93 0.04668698816758033 0.04598821473106667 100 1e-05 100 100 1e-06 0.1 5344
94 0.045460239755916385 0.04477437703528596 100 1e-05 100 100 1e-06 0.1 5344
95 0.044256347643536025 0.04358293162892811 100 1e-05 100 100 1e-06 0.1 5344
96 0.043075310173830636 0.04241465606806266 100 1e-05 100 100 1e-06 0.1 5344
97 0.04191729778579628 0.04126956216504226 100 1e-05 100 100 1e-06 0.1 5344
98 0.0407823579090024 0.040147631742696664 100 1e-05 100 100 1e-06 0.1 5344
99 0.03967057958158788 0.03904852053927417 100 1e-05 100 100 1e-06 0.1 5344