Merge pull request #647 from isb-cgc/staging
Staging
DeenaBleich committed Oct 27, 2021
2 parents 81c453a + 5f05ca2 commit 9eed019
Showing 11 changed files with 41 additions and 26 deletions.
2 changes: 2 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,2 @@
sphinx==4.2.0
docutils==0.16
21 changes: 12 additions & 9 deletions docs/source/sections/BestPractices.rst
@@ -22,25 +22,26 @@ Most of the same linux commands, scripts, pipelines/workflows, genomics software



a.) The basics and best practices on how to launch virtual machines (VMs) are described `here <https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/gcp-info/gcp-info2/LaunchVM.html>`_ in our documentation. **NOTE: When launching VMs, please maintain the default firewall settings.**
a. The basics and best practices on how to launch virtual machines (VMs) are described `here <https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/gcp-info/gcp-info2/LaunchVM.html>`_ in our documentation. **NOTE: When launching VMs, please maintain the default firewall settings.**


b.) Compute Engine instances can run the public images for Linux and Windows Server that Google provides as well as private custom images that you can `create <https://cloud.google.com/compute/docs/images/create-delete-deprecate-private-images>`_ or `import from your existing systems <https://cloud.google.com/compute/docs/images/importing-virtual-disks>`_.
b. Compute Engine instances can run the public images for Linux and Windows Server that Google provides as well as private custom images that you can `create <https://cloud.google.com/compute/docs/images/create-delete-deprecate-private-images>`_ or `import from your existing systems <https://cloud.google.com/compute/docs/images/importing-virtual-disks>`_.

Be careful as you spin up a machine; larger machines cost more. If you are not using a machine, shut it down. You can always restart it easily when you need it.

Example use-case: You would like to run a Windows-only genomics software package on the TCGA data. You can create a Windows-based VM instance.


c.) More details on how to deploy docker containers on VMs are described here in Google’s documentation: `deploying containers <https://cloud.google.com/compute/docs/containers/deploying-containers>`_
c. More details on how to deploy docker containers on VMs are described here in Google’s documentation: `deploying containers <https://cloud.google.com/compute/docs/containers/deploying-containers>`_

d.) A good way to estimate costs for running a workflow/pipeline on large data sets is to test them first on a small subset of data.
d. A good way to estimate costs for running a workflow/pipeline on large data sets is to test them first on a small subset of data.

e.) There are different VM types depending on the sort of jobs you wish to execute. By default, when you create a VM instance, it remains active until you either stop it or delete it. The costs associated with VM instances are detailed here: `compute pricing <https://cloud.google.com/compute/pricing>`_
e. There are different VM types depending on the sort of jobs you wish to execute. By default, when you create a VM instance, it remains active until you either stop it or delete it. The costs associated with VM instances are detailed here: `compute pricing <https://cloud.google.com/compute/pricing>`_

f.) If you plan on running many short compute-intensive jobs (for example indexing and sorting thousands of large bam files), you can execute your jobs on preemptible virtual machines. They are 80% cheaper than regular instances. `preemptible vms <https://cloud.google.com/preemptible-vms/>`_
f. If you plan on running many short compute-intensive jobs (for example, indexing and sorting thousands of large bam files), you can execute your jobs on preemptible virtual machines. They are 80% cheaper than regular instances; a small scripting sketch follows the example use-cases below. `preemptible vms <https://cloud.google.com/preemptible-vms/>`_

**Example use-cases:**

- Using preemptible VMs, researchers were able to quantify transcript levels on over 11K TCGA RNAseq samples for a total cost of $1,065.49.
Tatlow PJ, Piccolo SR. `A cloud-based workflow to quantify transcript-expression levels in public cancer compendia <https://www.nature.com/articles/srep39259>`_. Scientific Reports 6, 39259
- Also Broad’s popular variant caller pipeline, GATK, was designed to be able to run on preemptible VMs.
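
As a rough illustration of item f. above, here is a minimal sketch of launching one preemptible worker per input file by calling the gcloud command-line tool from Python. It assumes the Cloud SDK is installed and authenticated; the bucket path, zone, machine type, and startup-script file are hypothetical placeholders, not ISB-CGC-specific values.

.. code-block:: python

    # Sketch only: one preemptible VM per BAM file to index and sort.
    # Assumes the Cloud SDK (gcloud) is installed and authenticated; all
    # names and paths below are hypothetical placeholders.
    import subprocess

    bam_files = [
        "gs://my-example-bucket/bams/sample1.bam",
        "gs://my-example-bucket/bams/sample2.bam",
    ]

    for i, bam in enumerate(bam_files):
        subprocess.run([
            "gcloud", "compute", "instances", "create", f"bam-worker-{i}",
            "--zone", "us-central1-a",
            "--machine-type", "n1-standard-2",
            "--preemptible",                    # much cheaper, but may be stopped at any time
            "--metadata", f"bam-path={bam}",    # the startup script reads this value
            "--metadata-from-file", "startup-script=index_and_sort.sh",
        ], check=True)

Because a preemptible VM can be stopped at any time, each job should be small and safe to rerun (for example, write results to a bucket only when the job finishes).
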
@@ -52,17 +53,19 @@ Storage on the Cloud

The Google Cloud Platform offers a number of different storage options for your virtual machine instances: `disks <https://cloud.google.com/compute/docs/disks/>`_

a.) `Block Storage: <https://cloud.google.com/compute/docs/disks/#pdspecs>`_
a. `Block Storage: <https://cloud.google.com/compute/docs/disks/#pdspecs>`_

- By default, each virtual machine instance has a single boot persistent disk that contains the operating system. The default size is 10GB but can be adjusted up to 64TB. (Be careful! High costs here, spend wisely!)
- Persistent disks are restricted to the zone where your instance is located.
- Use persistent disks if you are running analyses that require low latency and high-throughput.

b.) `Object Storage: <https://cloud.google.com/compute/docs/disks/#gcsbuckets>`_ Google Cloud Storage (GCS) buckets are the most flexible and economical storage option.
b. `Object Storage: <https://cloud.google.com/compute/docs/disks/#gcsbuckets>`_ Google Cloud Storage (GCS) buckets are the most flexible and economical storage option.

- Unlike persistent disks, Cloud Storage buckets are not restricted to the zone where your instance is located.
- Additionally, you can read and write data to a bucket from multiple instances simultaneously.
- You can mount a GCS bucket to your VM instance when latency is not a priority or when you need to share data easily between multiple instances or zones.
An example use-case: You want to slice thousands of bam files and save the resulting slices to share with a collaborator who has instances in another zone to use for downstream statistical analyses.

An example use-case: You want to slice thousands of bam files and save the resulting slices to share with a collaborator who has instances in another zone to use for downstream statistical analyses.
- You can save objects to GCS buckets including images, videos, blobs and unstructured data.
A comparison table detailing the current pricing of Google’s storage options can be found here: `storage features <https://cloud.google.com/storage/features/>`_
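
A minimal sketch of the bucket-sharing use case above, using the google-cloud-storage Python client; the bucket and object names are hypothetical placeholders, and the code assumes default application credentials are available on the instances.

.. code-block:: python

    # Sketch only: write a BAM slice to a shared bucket from one VM and read it
    # back on a collaborator's instance in another zone. Bucket and object names
    # are hypothetical; assumes `pip install google-cloud-storage` and default
    # application credentials.
    from google.cloud import storage

    bucket = storage.Client().bucket("my-example-shared-bucket")

    # On the VM that produced the slice:
    bucket.blob("slices/sample1.region.bam").upload_from_filename("sample1.region.bam")

    # On any other instance, in any zone or region:
    bucket.blob("slices/sample1.region.bam").download_to_filename("sample1.region.bam")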

1 change: 1 addition & 0 deletions docs/source/sections/BigQueryTableSearchUI.rst
@@ -235,6 +235,7 @@ Table Access in Google BigQuery
To access the BigQuery tables in Google Cloud Console directly from the Table Search UI, simply click on the **Open** button on the right-hand side.

**Note:**

* If you have previously accessed the Google Cloud Platform and have a Google Cloud Platform project already set up, this button will automatically open up the table in the Google BigQuery Console as depicted in the image below.

* If you have never accessed Google Cloud Platform, you will be presented with a Google login page. You can use any Google ID to log in. Instructions on how to create a Google identity if you don't already have one can be found `here <HowToGetStartedonISB-CGC.html#data-access-and-google-cloud-project-setup>`_. You will be prompted to create a project, free of charge. Once you create the project, you will be directed to the BigQuery table you wished to open in the Google BigQuery Cloud Platform Console.
2 changes: 1 addition & 1 deletion docs/source/sections/HowToGetStarted-Analysis.rst
@@ -33,7 +33,7 @@ filter data from one or more public data sets (such as TCGA, CCLE, and TARGET),
* `ISB-CGC Mitelman Database <https://mitelmandatabase.isb-cgc.org/>`_
* - The *TP53* Database
| *Explore TP53 variant data that have been reported in the published literature or are available in other public databases.*
- * ISB-CGC The *TP53* Database `Documentation <https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/the_TP53_databaset.html>`_
- * ISB-CGC The *TP53* Database `Documentation <https://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/the_TP53_database.html>`_
* ISB-CGC The *TP53* `Database <https://tp53.isb-cgc.org/>`_

Cancer data analysis using Google BigQuery
3 changes: 3 additions & 0 deletions docs/source/sections/HowTos.rst
@@ -124,5 +124,8 @@ I'm an advanced user, how do I...
* - Compare gene expression in tumor against gene expression in normal tissue?
- `Python <https://github.com/isb-cgc/Community-Notebooks/blob/master/Notebooks/How_to_analyze_differential_expression_between_paired_tumor_and_normal_samples.ipynb>`_
-
* - Identify cancer pathways from the Reactome database that are related to a set of genes?
- `Python <https://github.com/isb-cgc/Community-Notebooks/blob/master/Notebooks/How_to_use_the_Reactome_BQ_dataset.ipynb>`_
-

*Notebook inspired by a `Query of the Month Blog <QueryOfTheMonthClub.html>`_ post
13 changes: 9 additions & 4 deletions docs/source/sections/QueryOfTheMonthClub.rst
@@ -586,13 +586,13 @@ the table is being partitioned by a date.

Now, *Felipe* notes:

- **CLUSTER BY wiki, title**: Whenever people query using the wiki
- CLUSTER BY wiki, title: Whenever people query using the wiki
column, BigQuery will optimize these queries. These queries will
be optimized even further if the user also filters by title. If
the user only filters by title, clustering won’t work, as the
order is important (think boxes inside boxes).

- **require_partition_filter=true**: This option reminds my users to
- require_partition_filter=true: This option reminds my users to
always add a date filtering clause to their queries. That’s how I
remind them that their queries could be cheaper if they only
query through a fraction of the year.
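
To make these two options concrete, here is a hedged sketch of creating a table like the one Felipe describes, using BigQuery DDL submitted through the Python client; the project, dataset, and source-table names are hypothetical placeholders.

.. code-block:: python

    # Sketch only: a date-partitioned table clustered by wiki and title, with
    # require_partition_filter turned on. All project/dataset/table names are
    # hypothetical placeholders; assumes `pip install google-cloud-bigquery`.
    from google.cloud import bigquery

    client = bigquery.Client()
    ddl = """
    CREATE TABLE `my-project.my_dataset.wiki_pageviews`
    PARTITION BY DATE(datehour)              -- date filters limit how much data is scanned
    CLUSTER BY wiki, title                   -- order matters: wiki first, then title
    OPTIONS (require_partition_filter = TRUE)
    AS
    SELECT datehour, wiki, title, views
    FROM `my-project.my_dataset.pageviews_raw`
    """
    client.query(ddl).result()
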
@@ -1691,8 +1691,10 @@ Next we're going to start up a set of VMs, link them together as a cluster, and
We're still going to use googleComputeEngineR to start up VMs, keeping them in a list, and
then using the future package to `create the cluster <https://cran.r-project.org/web/packages/future/index.html>`_.
Here's a couple cloudyr links: `massively parallel <https://cloudyr.github.io/googleComputeEngineR/articles/massive-parallel.html>`_
and `install and auth <https://cloudyr.github.io/googleComputeEngineR/articles/installation-and-authentication.html>`_.
Here are a couple of cloudyr links:
- `massively parallel <https://cloudyr.github.io/googleComputeEngineR/articles/massive-parallel.html>`_
- `install and auth <https://cloudyr.github.io/googleComputeEngineR/articles/installation-and-authentication.html>`_.
.. code-block:: r
@@ -2484,6 +2486,7 @@ really ramp up, otherwise the model is 'not getting any traction'.
Once we have created a model, we have a few options of what to do with it:
- evaluation functions: `ML.EVALUATE <https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate>`_ and `ML.ROC_CURVE <https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-roc>`_ (which only applies to logistic regression models)
- prediction function: `ML.PREDICT <https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict>`_
- inspection functions: `ML.TRAINING_INFO <https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-train>`_, `ML.FEATURE_INFO <https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-feature>`_, and `ML.WEIGHTS <https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-weights>`_
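
As a small, hedged sketch of what calling two of these functions looks like from the BigQuery Python client (the dataset, model, and table names are hypothetical placeholders):

.. code-block:: python

    # Sketch only: evaluate a trained BigQuery ML model, then score new rows.
    # Dataset, model, and table names are hypothetical placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()

    # Evaluation metrics for the trained model on held-out rows.
    for row in client.query("""
        SELECT *
        FROM ML.EVALUATE(MODEL `my_dataset.my_logistic_model`,
                         (SELECT * FROM `my_dataset.held_out_rows`))
    """).result():
        print(dict(row))

    # Predictions for new rows.
    predictions = list(client.query("""
        SELECT *
        FROM ML.PREDICT(MODEL `my_dataset.my_logistic_model`,
                        (SELECT * FROM `my_dataset.new_rows`))
    """).result())
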
@@ -5811,6 +5814,7 @@ with gene mutations. Therefore we'll be using two tables and a small set of
variables:
+ `isb-cgc:TCGA_bioclin_v0.Clinical <https://bigquery.cloud.google.com/table/isb-cgc:TCGA_bioclin_v0.Clinical>`_ for survival data
- **days_to_last_known_alive**: This field indicates the number of days to the last
follow up appointment (still alive) or until death, relative to "time zero" (typically
the day of diagnosis).
@@ -5819,6 +5823,7 @@ variables:
were known to still
be "Alive", while 3622 were "Dead", and 4 were of unknown vital status.
+ `isb-cgc:TCGA_hg38_data_v0.Somatic_Mutation <https://bigquery.cloud.google.com/table/isb-cgc:TCGA_hg38_data_v0.Somatic_Mutation>`_ for mutation status
- **Variant_Classification**: eg Missense_Mutation, Silent, 3'UTR, Intron, etc (18 different values occur in this table)
- **Variant_Type**: one of 3 possible values: SNP, DEL, INS
- **IMPACT**: one of 4 values: LOW, MODERATE, HIGH, or MODIFIER
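
A hedged sketch of pulling these variables together for one gene of interest; the join on case_barcode, the vital_status column, and the Hugo_Symbol filter are assumptions about the table schemas rather than something taken from this post, and the gene is just an illustrative choice.

.. code-block:: python

    # Sketch only: survival fields for TCGA cases carrying a mutation in a gene
    # of interest. The join key (case_barcode), vital_status column, and
    # Hugo_Symbol filter are assumed schema details; verify against the tables.
    from google.cloud import bigquery

    query = """
    SELECT DISTINCT
      clin.case_barcode,
      clin.days_to_last_known_alive,
      clin.vital_status,
      mut.Variant_Classification,
      mut.Variant_Type,
      mut.IMPACT
    FROM `isb-cgc.TCGA_bioclin_v0.Clinical` AS clin
    JOIN `isb-cgc.TCGA_hg38_data_v0.Somatic_Mutation` AS mut
      ON clin.case_barcode = mut.case_barcode
    WHERE mut.Hugo_Symbol = 'TP53'
    """
    for row in bigquery.Client().query(query).result():
        print(row.case_barcode, row.days_to_last_known_alive, row.vital_status)
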
1 change: 1 addition & 0 deletions docs/source/sections/gcp-info/GCE-101.rst
@@ -47,6 +47,7 @@ We have put together these examples to **(1)** demonstrate the syntax of these w
FirstWorkflow
CWL-RNAseq
Nextflow-RNAseq
GeneFlow-RNAseq
Nextflow-Blast
CWL-Blast

3 changes: 3 additions & 0 deletions docs/source/sections/gcp-info/gcp-info2/LaunchVM.rst
@@ -26,15 +26,18 @@ On your Google Cloud Platform project console page:
* **Choose the Compute Engine option from the menu icon in the upper-left corner.**

* Choose the VM instances page.

#. Note: the first time you visit the page, you will see two options: "Create Instance" or "Take the quickstart". After that, you will see a page with a list of existing (running or stopped) VMs.

* Select the Create Instance option, and customize your instance preferences:

#. **Name:** this name is relatively arbitrary, choose something that is meaningful to you;
#. **Zone**: choose one of the us-east or us-central zones;
#. **Machine type**: you can specify a VM with anywhere between 1 and 16 cores (aka vCPUs), and with up to 100 GB of RAM (you can try the "Customize" view if you prefer a more graphical approach); note that as you change the specifications of the VM, the estimated cost shown on this page will update;
#. **Boot disk**: the default boot disk and OS will be shown, but you can change this as you wish: the "Change" button will result in a flyout panel where you can choose from a variety of Preconfigured images (Debian, CentOS, Ubuntu, RedHat, etc) or previously created images or disks; you can also choose between "standard disks" and faster (and more expensive) solid-state drives (SSDs), and specify the size of the disk (up to 64TB).

* Once you have all of the options set, you can click on the blue **Create** button.

#. Creating the VM should take less than a minute, after which you will see it listed on the "VM instances" page, with the Name, Zone, Disk, Network, and External IP address shown. There is also an SSH button that you can use directly from the Console.
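
For users who prefer to script the same steps, here is a minimal sketch with the google-cloud-compute client library; the project ID, zone, machine type, image family, and disk size are hypothetical placeholders that mirror the console fields above.

.. code-block:: python

    # Sketch only: the console steps above (name, zone, machine type, boot disk)
    # expressed with the google-cloud-compute client library
    # (`pip install google-cloud-compute`). All names and sizes are hypothetical.
    from google.cloud import compute_v1

    project_id, zone = "my-example-project", "us-central1-a"

    boot_disk = compute_v1.AttachedDisk(
        boot=True,
        auto_delete=True,
        initialize_params=compute_v1.AttachedDiskInitializeParams(
            source_image="projects/debian-cloud/global/images/family/debian-11",
            disk_size_gb=50,
        ),
    )
    instance = compute_v1.Instance(
        name="my-analysis-vm",                                    # Name
        machine_type=f"zones/{zone}/machineTypes/e2-standard-4",  # Machine type
        disks=[boot_disk],                                        # Boot disk
        network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
    )
    operation = compute_v1.InstancesClient().insert(
        project=project_id, zone=zone, instance_resource=instance
    )
    operation.result()  # blocks until the VM is created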


5 changes: 5 additions & 0 deletions docs/source/sections/office_hours.rst
@@ -23,3 +23,8 @@ We have **virtual Office Hours on Tuesdays and Thursdays** for any questions on
- 11:00am – 12:00pm Eastern
- John Phan
- http://meet.google.com/jai-kgkg-sii


Note

If you are unable to join either meeting link, please email feedback@isb-cgc.org.
2 changes: 2 additions & 0 deletions docs/source/sections/webapp/IGV-Browser.rst
@@ -24,6 +24,7 @@ To view the selected files in the IGV Browser, click on the "Launch IGV" button


NOTES:

- You will only be able to view controlled access sequence files if you have `logged in as a registered dbGaP authorized user <http://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/Gaining-Access-To-Controlled-Access-Data.html>`_.
- You will need to disable your browser pop-up blocker to view files with IGV. If you see a 403 error when using the IGV viewer, the pop-up blocker is the cause of that error. Turn off the blocker and try again.

@@ -37,6 +38,7 @@ To load BAM files from ISB-CGC Google Cloud Storage, use the "File" > "Load from


NOTE:

- You will only be able to view controlled access sequence files if you have `logged in as a registered dbGaP authorized user <http://isb-cancer-genomics-cloud.readthedocs.io/en/latest/sections/Gaining-Access-To-Controlled-Access-Data.html>`_.


14 changes: 2 additions & 12 deletions docs/source/sections/webapp/Menu.rst
@@ -19,21 +19,11 @@ The **MENU** bar supplies links for the following Web App features:
- Saved Workbooks - Displays all your saved workbooks and allows you to edit, duplicate or delete the workbooks.
- Create a New Workbook - Allows you to create a new workbook by selecting the data source and analysis type.

* **PROGRAMS** - This menu item provides a shortcut to the programs you have created if you uploaded your own data.

- Saved Programs - Here you can:

* Edit or delete a Saved Program
* Start a New Workbook
* Create a New Program

- Upload Program Data - Here you can:

* Create a new program for analysis. To create a new program you provide a name for program, name for your project, and attach files that meet our Data Type requirements. Please see `Program Data Upload <program_data_upload.html>`_ for more information on data type accepted by the ISB-CGC.
* **PROGRAMS** - This menu item provides a shortcut to view public programs which are available in the ISB-CGC Web App.

- Public Programs - Here you can:

* View the public programs that are currently in the ISB-CGC system.
* View the public programs that are currently in the ISB-CGC Web App.

* **ANALYSES** - From here you can Create, Edit Details, Duplicate, Delete, or Share Analyses. You can use a specific analysis type to create a new workbook customized with the specific data (Genes and miRNAs, Variables, Cohorts) you have selected. The plot types that you can select are:

