
Commit

Merge pull request #612 from isb-cgc/John-staging
Add machine learning page that references new notebook
jhphan committed Aug 30, 2021
2 parents dd64ecf + ef92d9a commit 101fef8
Showing 7 changed files with 296 additions and 28 deletions.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -54,6 +54,7 @@ The `ISB-CGC <https://isb-cgc.org>`_ aims to serve the needs of a broad range of

   sections/HowTos
   sections/RegulomeExplorerNotebooks
   sections/MachineLearningNotebooks
   sections/TutorialsAndHow-ToGuides
   sections/Releases
   sections/Quick-links-updated
14 changes: 14 additions & 0 deletions docs/source/sections/MachineLearningNotebooks.rst
@@ -0,0 +1,14 @@
**************************
Machine Learning Notebooks
**************************
Machine learning methods have enabled researchers to leverage and integrate the vast amounts of diverse cancer data to reveal new insights, develop better diagnostics, and improve therapy. ISB-CGC offers examples of how to use Google Cloud resources to train and use machine learning models for a variety of cancer applications and datasets.

.. list-table::
   :widths: 100 10 10
   :align: center
   :header-rows: 0

   * - How to build an RNA-seq logistic regression classifier
     - `Python <https://github.com/isb-cgc/Community-Notebooks/blob/master/MachineLearning/How_to_build_an_RNAseq_logistic_regression_classifier.ipynb>`_
     -

2 changes: 1 addition & 1 deletion docs/source/sections/ProgrammaticAccess.rst
@@ -17,7 +17,7 @@ ISB-CGC provides programmatic access to cancer data (both open and controlled-ac
Workflow Gallery
================

We have compiled a collection of tutorials and sample workflows designed to introduce users to running workflows (CWL, Nextflow, Snakemake, WDL) on the Google Cloud Platform (GCP). ISB-CGC does not make the choice for users but instead enables as many workflow technologies as possible through documentation, support, and where necessary, infrastructure.
We have compiled a collection of tutorials and sample workflows designed to introduce users to running workflows (CWL, Nextflow, Snakemake, WDL, and GeneFlow) on the Google Cloud Platform (GCP). ISB-CGC does not make the choice for users but instead enables as many workflow technologies as possible through documentation, support, and where necessary, infrastructure.



57 changes: 43 additions & 14 deletions docs/source/sections/gcp-info/Cheatsheet.rst
@@ -1,64 +1,63 @@
VM Workflow Tools Installation Cheat Sheet
##########################################

When working with a new Virtual Machine (VM), more often than not installing software, packages and dependencies is required, and the process can be cumbersome. This cheat sheet was created with running workflows on Google Cloud VM in mind. It contains quick shortcuts to install common software, dependencies, and quick fixes.
When working with a new Virtual Machine (VM), installing software, packages, and dependencies is usually required, and the process can be cumbersome. This cheat sheet was created with running workflows on Google Cloud VM in mind. It contains quick shortcuts to install common software, dependencies, and quick fixes.

********
NEXTFLOW
********

Install:
========

::

   $ export NXF_VER=20.01.0
   $ export NXF_MODE=google
   $ curl https://get.nextflow.io | bash
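The installer writes a ``nextflow`` launcher into the current directory. A minimal sketch for making it callable from anywhere, assuming ``~/bin`` as the destination (a hypothetical choice; any directory on ``PATH`` works) and a hypothetical helper name ``install_nextflow_launcher``:

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the launcher created by
# `curl https://get.nextflow.io | bash` into a bin directory on PATH.
install_nextflow_launcher() {
  local launcher="${1:-./nextflow}"   # file written by the installer
  local bin_dir="${2:-$HOME/bin}"     # assumed destination directory

  if [ ! -f "$launcher" ]; then
    echo "launcher not found: $launcher" >&2
    return 1
  fi

  mkdir -p "$bin_dir"
  chmod +x "$launcher"
  mv "$launcher" "$bin_dir/nextflow"

  # Prepend bin_dir to PATH for this shell unless it is already present.
  case ":$PATH:" in
    *":$bin_dir:"*) ;;
    *) export PATH="$bin_dir:$PATH" ;;
  esac
}
```

Run it in the directory where you ran the installer, then confirm with ``nextflow -version``. The ``PATH`` change only lasts for the current session unless you also add it to ``~/.bashrc``.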



*******************
*********
SNAKEMAKE
*******************
*********

Step 1 install Miniconda:
=========================

`Installer <https://docs.conda.io/en/latest/miniconda.html#linux-installers>`_
| `Instruction <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_
| `Instructions <https://conda.io/projects/conda/en/latest/user-guide/install/index.html>`_

.. note:: After the ``conda init fish`` step, you need to **restart your VM command line**.
   In addition, if the ``conda`` command is not found, try: ``export PATH=$HOME/miniconda3/bin:$PATH``
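The ``PATH`` workaround for a missing ``conda`` command can be made safe to re-run (for example from ``~/.bashrc``). A minimal sketch, assuming Miniconda was installed to the default prefix ``~/miniconda3``; the function name ``ensure_conda_on_path`` is hypothetical:

```shell
#!/usr/bin/env bash
# Add Miniconda's bin directory to PATH only when `conda` is not already
# found, and never add the same directory twice.
ensure_conda_on_path() {
  local conda_bin="${1:-$HOME/miniconda3/bin}"   # assumed install prefix

  if command -v conda >/dev/null 2>&1; then
    return 0                                     # conda already available
  fi
  if [ ! -x "$conda_bin/conda" ]; then
    echo "conda not found in $conda_bin" >&2
    return 1
  fi
  case ":$PATH:" in
    *":$conda_bin:"*) ;;
    *) export PATH="$conda_bin:$PATH" ;;
  esac
}
```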



Step 2 install Snakemake:
=========================

`Instruction <https://snakemake.readthedocs.io/en/stable/getting_started/installation.html#conda-install>`_

`Instructions <https://snakemake.readthedocs.io/en/stable/getting_started/installation.html#conda-install>`_

Installer:

::

   $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
Snakemake environment:
----------------------

Create and activate an environment for Snakemake from a file (yml/yaml):

::

   $ conda env create --name <yourEnvironmentName> --file environment.yaml
   $ conda activate <yourEnvironmentName>
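The commands above expect an environment file. A minimal sketch that writes one for Snakemake; the environment name ``snakemake``, the helper name ``write_snakemake_env``, and the channel list (``conda-forge`` and ``bioconda``, the channels used in the Snakemake installation guide) are assumptions to adapt:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal conda environment file for Snakemake.
write_snakemake_env() {
  local out="${1:-environment.yaml}"
  cat > "$out" <<'EOF'
name: snakemake
channels:
  - conda-forge
  - bioconda
dependencies:
  - snakemake
EOF
}
```

After running ``write_snakemake_env``, create the environment with ``conda env create --name snakemake --file environment.yaml``; the ``--name`` option overrides the ``name:`` field in the file.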

Updating current environment
Update current environment:

::

   $ conda env update -f environment.yml

.. note:: For more conda commands, visit: `Conda Cheat sheet <https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf>`_.



***
WDL
***
@@ -70,17 +69,33 @@ WDL
   $ wget https://github.com/broadinstitute/cromwell/releases/download/52/cromwell-52.jar
   $ wget https://github.com/broadinstitute/cromwell/releases/download/52/womtool-52.jar
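With both JARs downloaded, a WDL workflow can be checked with Womtool's ``validate`` command and then executed with Cromwell's single-workflow ``run`` mode. A minimal sketch, assuming Java is installed and the JAR names match the version 52 downloads above; ``run_wdl`` and the file names in the usage line are hypothetical:

```shell
#!/usr/bin/env bash
# JAR locations; defaults match the version 52 downloads above.
CROMWELL_JAR="${CROMWELL_JAR:-cromwell-52.jar}"
WOMTOOL_JAR="${WOMTOOL_JAR:-womtool-52.jar}"

# Validate a WDL file with Womtool, then run it locally with Cromwell.
# Usage: run_wdl workflow.wdl [inputs.json]
run_wdl() {
  local wdl="$1" inputs="${2:-}"

  java -jar "$WOMTOOL_JAR" validate "$wdl" || return 1

  if [ -n "$inputs" ]; then
    java -jar "$CROMWELL_JAR" run "$wdl" --inputs "$inputs"
  else
    java -jar "$CROMWELL_JAR" run "$wdl"
  fi
}
```

For example, ``run_wdl myWorkflow.wdl myWorkflow.inputs.json``. Validating first surfaces syntax errors before Cromwell starts any tasks.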
***
CWL
***

::

   $ sudo apt-get install -y python3-pip
   $ pip3 install --upgrade pip
   $ pip3 install cwltool
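Once installed, ``cwltool`` can check a CWL document without executing it (``--validate``) and then run it with a job (inputs) file, writing results to ``--outdir``. A minimal sketch; ``run_cwl``, the default output directory ``cwl-output``, and the file names are hypothetical:

```shell
#!/usr/bin/env bash
# Validate a CWL tool or workflow, then run it with a YAML/JSON job file.
# Usage: run_cwl workflow.cwl job.yml [output_dir]
run_cwl() {
  local cwl="$1" job="$2" outdir="${3:-cwl-output}"

  cwltool --validate "$cwl" || return 1
  cwltool --outdir "$outdir" "$cwl" "$job"
}
```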


********
GENEFLOW
********

`Instructions and Source Code <https://github.com/CDCgov/geneflow2>`_

Install Python3 and Pip (see below), then install GeneFlow in a virtual environment with the following:

::

   $ python3 -m venv gf
   $ source gf/bin/activate
   $ pip3 install geneflow

Be sure to always activate the GeneFlow Python virtual environment before using the GeneFlow command line.
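Because GeneFlow lives in the ``gf`` virtual environment, a small shell function can activate it on demand. A minimal sketch, assuming the environment was created in your home directory (``~/gf``); the function name ``gf_on`` is hypothetical:

```shell
#!/usr/bin/env bash
# Activate the GeneFlow virtual environment unless it is already active.
gf_on() {
  local venv_dir="${1:-$HOME/gf}"   # assumed location of `python3 -m venv gf`

  if [ "${VIRTUAL_ENV:-}" = "$venv_dir" ]; then
    return 0                        # already active
  fi
  if [ ! -f "$venv_dir/bin/activate" ]; then
    echo "no virtual environment at $venv_dir" >&2
    return 1
  fi
  . "$venv_dir/bin/activate"
}
```

With the function in ``~/.bashrc``, run ``gf_on`` before any ``geneflow`` command; ``deactivate`` leaves the environment.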


*******************
Common dependencies
@@ -114,6 +129,9 @@ DOCKER

Install:
--------

For Ubuntu:

::

   $ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
@@ -122,6 +140,17 @@ Install:
   $ apt-cache policy docker-ce
   $ sudo apt-get install -y docker-ce

For Debian:

::

   $ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
   $ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debian $(lsb_release -cs) stable"
   $ sudo apt-get update
   $ apt-cache policy docker-ce
   $ sudo apt-get install -y docker-ce
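The Ubuntu and Debian instructions differ only in the distribution name inside the Docker repository URL. A minimal sketch that derives it from the ``ID`` field of ``/etc/os-release``; the function name ``docker_repo_url`` is hypothetical and only these two distributions are handled:

```shell
#!/usr/bin/env bash
# Print the Docker apt repository base URL for this distribution.
docker_repo_url() {
  local os_release="${1:-/etc/os-release}"
  local id

  # Read ID (e.g. "ubuntu" or "debian") in a subshell so the other
  # os-release variables do not leak into the caller's environment.
  id=$(. "$os_release" && printf '%s' "${ID:-}")

  case "$id" in
    ubuntu|debian) printf 'https://download.docker.com/linux/%s\n' "$id" ;;
    *) echo "unsupported distribution: ${id:-unknown}" >&2; return 1 ;;
  esac
}
```

With ``repo=$(docker_repo_url)``, the same two lines work on either system: ``curl -fsSL "$repo/gpg" | sudo apt-key add -`` and ``sudo add-apt-repository "deb [arch=amd64] $repo $(lsb_release -cs) stable"``.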


Check docker status:
--------------------
::
9 changes: 5 additions & 4 deletions docs/source/sections/gcp-info/GCE-101.rst
@@ -4,12 +4,12 @@ Running Workflows on ISB-CGC

The ISB-CGC platform is intentionally designed to be a light-weight infrastructure above the Google Cloud Platform (GCP). As a result, our end-users have at their disposal all of the tools and technologies that the GCP has to offer. This includes full **sudo** access to Google Cloud Compute Engines and Virtual Machines. For more detailed information about GCP tools and technologies, please see their in-depth `documentation page <https://cloud.google.com/docs>`_.

We have put together some basic documentation and workflow examples for ISB-CGC users who are both new to the GCP as well as to the commonly used workflow languages in -omics research. We have chosen the workflow languages of CWL, NextFlow, Snakemake and WDL. If there are other workflow languages you are interested in, please let us know and we'll put together some examples for them as well (email us at feedback@isb-cgc.org).
We have put together some basic documentation and workflow examples for ISB-CGC users who are both new to the GCP as well as to the commonly used workflow languages in -omics research. We have chosen the workflow languages of CWL, NextFlow, Snakemake, WDL, and GeneFlow. If there are other workflow languages you are interested in, please let us know and we'll put together some examples for them as well (email us at feedback@isb-cgc.org).


Getting Started
================
The links in this section: **(1)** help new users get familiar with the Google Cloud Platform tools and technologies (virtual machines and cloud storage) necessary to run workflows with your ISB-CGC Google Cloud Platform project, **(2)** provide a cheatsheet of the packages and dependencies required on the virtual machines to successfully execute the workflow examples provided below. **(3)** Calculating cloud costs before running large workflows is very crucial. We provide here some tips and tricks to determining costs before running workflows.
The links in this section: **(1)** help new users get familiar with the Google Cloud Platform tools and technologies (virtual machines and cloud storage) necessary to run workflows with your ISB-CGC Google Cloud Platform project, **(2)** provide a cheatsheet of the packages and dependencies required on the virtual machines to successfully execute the workflow examples provided below. **(3)** Calculating cloud costs before running large workflows is very crucial. We provide here some tips and tricks to determining costs before running workflows.
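As a back-of-the-envelope check, VM compute cost is roughly the hourly rate multiplied by the run time and the number of VMs. A minimal sketch; the function name ``estimate_cost`` is hypothetical, the hourly rate is a parameter you would look up in Google Cloud pricing for your machine type and region, and storage, network egress, and other charges are not included:

```shell
#!/usr/bin/env bash
# Rough compute-cost estimate: rate (USD/hour) x hours x number of VMs.
# Usage: estimate_cost HOURLY_RATE HOURS NUM_VMS
estimate_cost() {
  local rate="$1" hours="$2" vms="$3"
  awk -v r="$rate" -v h="$hours" -v n="$vms" 'BEGIN { printf "%.2f\n", r * h * n }'
}
```

With a hypothetical rate of 0.50 USD/hour, ``estimate_cost 0.5 10 4`` prints ``20.00`` for four VMs running ten hours each.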


- `Launching a Virtual Machine <gcp-info2/LaunchVM.html>`_
@@ -36,8 +36,9 @@ We have put together these examples to **(1)** demonstrate the syntax of these w
- `Snakemake RNA-seq <FirstWorkflow.html>`_
- `CWL RNA-seq <CWL-RNAseq.html>`_
- `Nextflow RNA-seq <Nextflow-RNAseq.html>`_
- `Running Nextflow Blast <Nextflow-Blast.html>`_
- `Running CWL Blast <CWL-Blast.html>`_
- `GeneFlow RNA-seq <GeneFlow-RNAseq.html>`_
- `Nextflow Blast <Nextflow-Blast.html>`_
- `CWL Blast <CWL-Blast.html>`_

.. toctree::
   :maxdepth: 1
