diff --git a/docs/android-chrome-192x192.png b/docs/android-chrome-192x192.png new file mode 100644 index 00000000..fca973b2 Binary files /dev/null and b/docs/android-chrome-192x192.png differ diff --git a/docs/android-chrome-512x512.png b/docs/android-chrome-512x512.png new file mode 100644 index 00000000..e2bccbd5 Binary files /dev/null and b/docs/android-chrome-512x512.png differ diff --git a/docs/apple-touch-icon.png b/docs/apple-touch-icon.png new file mode 100644 index 00000000..5f5ea395 Binary files /dev/null and b/docs/apple-touch-icon.png differ diff --git a/docs/approach.png b/docs/approach.png new file mode 100644 index 00000000..2488a64c Binary files /dev/null and b/docs/approach.png differ diff --git a/docs/browserconfig.xml b/docs/browserconfig.xml new file mode 100644 index 00000000..4630ccc4 --- /dev/null +++ b/docs/browserconfig.xml @@ -0,0 +1,9 @@ + + + + + + #009688 + + + diff --git a/docs/favicon-192x192.png b/docs/favicon-192x192.png new file mode 100644 index 00000000..fca973b2 Binary files /dev/null and b/docs/favicon-192x192.png differ diff --git a/docs/icon.svg b/docs/icon.svg new file mode 100644 index 00000000..d4f9ed6a --- /dev/null +++ b/docs/icon.svg @@ -0,0 +1,78 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 00000000..d537f7af --- /dev/null +++ b/docs/index.html @@ -0,0 +1,518 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+
+
+
+
+

Summary

+

+ BioBombe examines many low-dimensional representations of gene + expression data. Named after the large mechanical devices built by + Alan Turing and other cryptologists during World War II to decode + secret messages, BioBombe represents an approach to decipher hidden + messages embedded in gene expression data. In the manuscript, we use + this approach to compare the biological representations learned by + various compression algorithms across different latent space + dimensionalities ranging from k = 2 to k = 200. +

+

+ This website provides convenient links to all the resources produced + for the manuscript, including software; processed gene expression + input data; compressed features for all algorithms, latent + dimensionalities, and initializations across all datasets; and + performance results and models for the large-scale cancer-type and + mutation classification analysis. +

+

Primary Findings

+

+ There does not exist a single optimal algorithm or latent + dimensionality for learning biological representations. Many different biological signatures are learned by different + compression algorithms at various latent dimensionalities. A + practitioner aiming to optimize feature discovery by compressing + gene expression data should use multiple algorithms across a large + range of latent dimensionalities. +

+
+
+
+
+

Citation

+

+ Sequential compression across latent space dimensions enhances + gene expression signatures
+ Way, G.P., Zietz, M., Himmelstein, D.S., Greene, C.S.
+ bioRxiv preprint (2019) · + doi:10.1101/573782 +

+
+
+
+
+

Approach

+

+ We train five compression algorithms (principal components analysis + (PCA), independent components analysis (ICA), non-negative matrix + factorization (NMF), denoising autoencoders (DAE), and variational + autoencoders (VAE)) using three benchmark gene expression datasets: + The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression + project (GTEx), and the Therapeutically Applicable Research To + Generate Effective Treatments (TARGET) project, across a wide range + of latent dimensionalities (k). We assess model performance by + measuring reconstruction, correlation between input and + reconstructed output, model stability, and gene set coverage. +

+
+ +
+
+
+
+
+

Resources

+

Software

+

+ Development · Archive +

+

+ Includes code, data, documentation, results, figures, and a + computational environment for the full analysis. Each numbered + module represents specific data processing steps or analysis + results. +

+

Input Data

+

+ Data +

+

+ Includes processed training and testing datasets for TCGA, GTEx, and + TARGET as Git LFS files +

+

Heterogeneous Networks

+

+ Gene Sets · Info +

+

+ Includes real and permuted hetnets¹ for MSigDB² + and xCell³ gene sets, as well as more details about + heterogeneous networks. +

+

+ 1 + Systematic integration of biomedical knowledge prioritizes drugs + for repurposing
+ Himmelstein DS, Lizee A, Hessler C, Brueggeman L, Chen SL, Hadley D, + Green A, Khankhanian P, Baranzini SE.
+ eLife (2017) · + doi:10.7554/eLife.26726 +

+

+ 2 + The Molecular Signatures Database Hallmark Gene Set Collection
+ Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov + JP, Tamayo P.
+ Cell Systems (2015) · 1:417-25 +

+

+ 3 + xCell: digitally portraying the tissue cellular heterogeneity + landscape
+ Aran D, Hu Z, Butte AJ.
+ Genome Biology (2017) · + doi:10.1186/s13059-017-1349-1 +

+

Compressed Features

+

+ Results: +

+

+ TCGA · GTEX · TARGET +

+

+ Randomly Permuted Data: +

+

+ TCGA · GTEX · TARGET +

+

TCGA Classification Results

+

+ Results +

+

+ Includes BioBombe feature coefficients, sample activation scores, + and classifier metrics for all supervised learning models (elastic + net logistic regression) trained to predict cancer-type and mutation + status. +

+
+
+
+
+

Acknowledgements

+

+ This work was funded in part by The Gordon and Betty Moore + Foundation under GBMF 4552 (CSG), the National Institutes of + Health's National Human Genome Research Institute under R01 HG010067 + (CSG), and the National Institutes of Health under T32 HG000046 + (GPW). +

+

+ We would like to thank Jaclyn Taroni, Yoson Park, and Alexandra Lee + for insightful discussions and code review. We also thank Jo Lynne + Rokita and John Maris for insightful discussions regarding the + neuroblastoma analysis. +

+
+
+
+ + + + diff --git a/docs/logo.svg b/docs/logo.svg new file mode 100644 index 00000000..966466a6 --- /dev/null +++ b/docs/logo.svg @@ -0,0 +1,151 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/mstile-144x144.png b/docs/mstile-144x144.png new file mode 100644 index 00000000..45389a60 Binary files /dev/null and b/docs/mstile-144x144.png differ diff --git a/docs/mstile-150x150.png b/docs/mstile-150x150.png new file mode 100644 index 00000000..17cea0b0 Binary files /dev/null and b/docs/mstile-150x150.png differ diff --git a/docs/mstile-310x150.png b/docs/mstile-310x150.png new file mode 100644 index 00000000..4077cc08 Binary files /dev/null and b/docs/mstile-310x150.png differ diff --git a/docs/mstile-310x310.png b/docs/mstile-310x310.png new file mode 100644 index 00000000..1020a85c Binary files /dev/null and b/docs/mstile-310x310.png differ diff --git a/docs/mstile-70x70.png b/docs/mstile-70x70.png new file mode 100644 index 00000000..f3380ef5 Binary files /dev/null and b/docs/mstile-70x70.png differ diff --git a/docs/share-thumbnail.png b/docs/share-thumbnail.png new file mode 100644 index 00000000..9be2625f Binary files /dev/null and b/docs/share-thumbnail.png differ diff --git a/docs/site.webmanifest b/docs/site.webmanifest new file mode 100644 index 00000000..b20abb7c --- /dev/null +++ b/docs/site.webmanifest @@ -0,0 +1,19 @@ +{ + "name": "", + "short_name": "", + "icons": [ + { + "src": "/android-chrome-192x192.png", + "sizes": "192x192", + "type": "image/png" + }, + { + "src": "/android-chrome-512x512.png", + "sizes": "512x512", + "type": "image/png" + } + ], + "theme_color": "#ffffff", + "background_color": "#ffffff", + "display": "standalone" +}