
Commit

Merge pull request #666 from isb-cgc/staging
Cost management, ml notebooks, st. jude
jhphan authored Mar 2, 2022
2 parents 9ba1a36 + 52e3652 commit 7e336e2
Showing 5 changed files with 93 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -30,6 +30,7 @@ The `ISB-CGC <https://isb-cgc.org>`_ aims to serve the needs of a broad range of
   sections/HowtoRequestCloudCredits
   sections/BestPractices
   sections/Benefits
   sections/CostManagement
   sections/office_hours

.. toctree::
2 changes: 1 addition & 1 deletion docs/source/sections/Benefits.rst
@@ -69,7 +69,7 @@ Most bioinformaticians today are likely accustomed to using the high performance
| | | |
+-----------+-------------------------------------+-----------------------------------------+

***Be careful of costs**
***Be careful of costs**. See the `Cost Management <CostManagement.html>`_ page for more information.



63 changes: 63 additions & 0 deletions docs/source/sections/CostManagement.rst
@@ -0,0 +1,63 @@
===============
Cost Management
===============

This section details a few use cases and their approximate costs in order to help users estimate cloud costs for their analyses.

Estimating Costs for Common Bioinformatics and Data Analysis Tasks
==================================================================

The following table summarizes order-of-magnitude costs for common data analysis tasks, estimated from the linked example notebooks. For example, an order-of-magnitude cost of $10 indicates that the task can cost up to $10. Estimated costs between $10 and $100 are reported as the next order of magnitude, $100:

.. list-table::
   :widths: 100 25 25 25
   :align: center
   :header-rows: 1

   * - Bioinformatics / Data Analysis Task
     - Dataset(s)
     - Tools
     - Approx. Cost (Max)
   * - | Identify differentially expressed genes
       | `Example: Breast Cancer Tumor vs. Normal <https://github.com/isb-cgc/Community-Notebooks/blob/master/Notebooks/How_to_analyze_differential_expression_between_paired_tumor_and_normal_samples.ipynb>`_
     - TCGA
     - BigQuery, Colab, Python, R
     - $1
   * - | Train a prediction model using gene expression data
       | `Example: Logistic Regression, Ovarian Cancer Chemo Response <https://github.com/isb-cgc/Community-Notebooks/blob/master/MachineLearning/How_to_build_an_RNAseq_logistic_regression_classifier_with_BigQuery_ML.ipynb>`_
       | `Example: Logistic Regression, Breast Cancer Tumor vs. Normal <https://github.com/isb-cgc/Community-Notebooks/blob/master/TeachingMaterials/2021-10-NIHLibrarySession/BigQueryMachineLearning.ipynb>`_
     - TCGA
     - BigQuery, BigQuery ML, Colab, Python, R
     - $1 ($100) \*
   * - | Train a linear regression model using gene expression data
       | `Example: Linear Regression, Kidney Cancer Survival <https://github.com/isb-cgc/Community-Notebooks/blob/master/MachineLearning/How_to_predict_cancer_survival_with_BigQueryML.ipynb>`_
     - TCGA
     - BigQuery, BigQuery ML, Colab, Python
     - $1 ($100) \*
   * - | Train a deep neural network (DNN) regression model using gene expression data
       | `Example: Regression w/ TensorFlow, Kidney Cancer Survival <https://github.com/isb-cgc/Community-Notebooks/blob/master/MachineLearning/How_to_predict_cancer_survival_with_TensorFlow.ipynb>`_
     - TCGA
     - BigQuery, Colab, TensorFlow, Compute Engine w/ GPUs
     - $1 \*\*
   * - | Analyze RNA-seq data using the GDC workflow
       | `Example: GDC RNA-seq CWL Workflow <https://github.com/NCI-GDC/gdc-rnaseq-cwl>`_
     - TCGA
     - Compute Engine, Cloud Storage, CWL
     - $10 \*\*\*

* \*BigQuery ML costs depend on data size. In these examples, a subset of data was extracted to a temporary table, which was used as input to BigQuery ML. This reduces costs substantially. If using all gene features of a TCGA dataset, costs can grow to the order of $100.

* \*\*With small datasets, use of GPUs in Colab does not cost extra (unless using `Colab Pro <https://research.google.com/colaboratory/faq.html>`_). However, if TensorFlow code is executed in a VM with GPUs, the hourly cost can range from $1 to $10.

* \*\*\*Cost per sample depends on sample size (i.e., number of reads) and processing time.

* BigQuery ML vs. TensorFlow w/ Compute Engine or Colab GPUs: When choosing between these tools for machine learning, consider the following guidelines:

  - TensorFlow w/ Compute Engine or Colab GPUs: Appropriate for data exploration or parameter tuning requiring multiple iterations of training and evaluation.

  - BigQuery ML: Appropriate for production deployment of machine learning models. For example, after optimizing model parameters, train and deploy the final model with BigQuery ML.
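
The order-of-magnitude convention used in the cost table can be sketched in a few lines of Python. This is only an illustration: the function name ``order_of_magnitude_cost`` is hypothetical, and the choice of $1 as the smallest reported bucket is an assumption based on the smallest value that appears in the table.

```python
def order_of_magnitude_cost(estimated_cost_usd):
    """Round an estimated cost (USD) up to the next power of ten.

    Assumption: $1 is the smallest bucket reported, so any estimate
    at or below $1 is reported as $1.
    """
    if estimated_cost_usd < 0:
        raise ValueError("estimated cost must be non-negative")
    bucket = 1
    # Multiply by ten until the bucket covers the estimate.
    while bucket < estimated_cost_usd:
        bucket *= 10
    return bucket


# Hypothetical estimates, chosen only to show the bucketing.
for estimate in (0.40, 10, 35, 100):
    print(estimate, "->", order_of_magnitude_cost(estimate))
```

Under this sketch, an estimate of exactly $10 stays in the $10 bucket, while any estimate above $10 and up to $100 is reported as $100, which matches the convention described before the table.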





6 changes: 6 additions & 0 deletions docs/source/sections/MachineLearningNotebooks.rst
@@ -11,10 +11,16 @@ Machine learning methods have enabled researchers to leverage and integrate the
   * - How to build an RNA-seq logistic regression classifier
     - `Python <https://github.com/isb-cgc/Community-Notebooks/blob/master/MachineLearning/How_to_build_an_RNAseq_logistic_regression_classifier.ipynb>`_
     - `R <https://github.com/isb-cgc/Community-Notebooks/blob/master/MachineLearning/How_to_build_an_RNAseq_logistic_regression_classifier_R.ipynb>`_
   * - How to build an RNA-seq logistic regression classifier with BigQuery ML
     - `Python <https://github.com/isb-cgc/Community-Notebooks/blob/master/MachineLearning/How_to_build_an_RNAseq_logistic_regression_classifier_with_BigQuery_ML.ipynb>`_
     -
   * - How to perform nearest centroid classification using BigQuery
     - `Python <https://nbviewer.jupyter.org/github/isb-cgc/Community-Notebooks/blob/master/Notebooks/How_to_perform_Nearest_Centroid_Classification_with_BigQuery.ipynb>`_
     - `R <https://github.com/isb-cgc/Community-Notebooks/blob/master/Notebooks/How_to_perform_Nearest_Centroid_Classification_with_BigQuery.md>`_
   * - How to predict cancer survival with BigQuery ML
     - `Python <https://github.com/isb-cgc/Community-Notebooks/blob/master/MachineLearning/How_to_predict_cancer_survival_with_BigQueryML.ipynb>`_
     -
   * - How to predict cancer survival with TensorFlow
     - `Python <https://github.com/isb-cgc/Community-Notebooks/blob/master/MachineLearning/How_to_predict_cancer_survival_with_TensorFlow.ipynb>`_
     -

22 changes: 22 additions & 0 deletions docs/source/sections/ProgrammaticAccess.rst
@@ -26,3 +26,25 @@ We have compiled a collection of tutorials and sample workflows designed to intr
   :hidden:

   gcp-info/GCE-101.rst


St. Jude Bioinformatics Tools
=============================

The following bioinformatics tools and workflows developed by St. Jude have been containerized and made available for execution in the cloud. Each link below leads to the tool's original documentation. If you would like guidance on running these tools on ISB-CGC, please attend our `office hours <office_hours.html>`_ or contact us (feedback@isb-cgc.org).

.. list-table::
   :widths: 100 25
   :align: center
   :header-rows: 0

   * - **CICERO** (Clipped-reads Extended for RNA Optimization) is an assembly-based algorithm to detect diverse classes of driver gene fusions from RNA-seq.
     - `GitHub <https://github.com/stjude/CICERO>`_
   * - **RNAIndel** calls coding indels from tumor RNA-Seq data and classifies them as somatic, germline, or artifactual.
     - `GitHub <https://github.com/stjude/RNAIndel>`_
   * - **Teltale** is a program that computes the fraction of telomeric reads in a BAM file.
     - `GitHub <https://github.com/stjude/teltale>`_
   * - **NetBID** (Network-based Bayesian Inference of Drivers) is a data-driven systems biology pipeline and toolkit for finding drivers from transcriptomics, proteomics, and phosphoproteomics data, where the drivers can be either transcription factors (TF) or signaling factors (SIG).
     - `GitHub <https://github.com/jyyulab/NetBID>`_

