Home
This is the pre-conference workshop wiki where we'll take you through the projects used for the conference.
- Setup
- Part 1: SRA and NCBI Virus
  - Introduction
  - Background
  - Introductory Exercises
  - Projects
- Part 2: NCBI Pathogen Detection
  - Background
  - Exercises
  - Project 1: Use BigQuery to search MicroBIGG-E and Isolates data
  - Project 2: Build a phylogeny of reference blaKPC alleles
  - Project 3: Look for evidence of positively selected sites in blaKPC genes
- Demonstrate use of BigQuery in the Google Cloud Console and on the command line with `bq`
- Show how BigQuery can be used to do analysis of microbigge, isolates, and isolate_exceptions tables and how they relate to the web interface
- Demonstrate downloading sequences and phylogenetic analysis from the Reference Gene Catalog and visualization using iTOL
- Demonstrate using `gsutil` to download MicroBIGG-E contig sequences from cloud storage buckets
- Demonstrate the use of seqkit to perform some common operations on FASTA files
- Show how to slice out coding sequences from contig sequences and perform simple selection analysis on genes in MicroBIGG-E
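As a rough illustration of the last objective, the sketch below slices a coding sequence out of a contig given its coordinates and strand. It is not the workshop's method: the workshop uses seqkit, while this version sticks to plain bash so that it runs without extra software. The toy contig, the helper name `slice_cds`, and the assumption that coordinates are 1-based and inclusive are all hypothetical choices made for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy contig (hypothetical data, not taken from MicroBIGG-E).
workdir=$(mktemp -d)
cat > "$workdir/contig.fa" <<'EOF'
>contig1
AAACCCATGGCT
TTTTAAGGG
EOF

# slice_cds FASTA START END STRAND
# Assumes a single-record FASTA file, 1-based inclusive coordinates,
# and STRAND given as + or -.
slice_cds() {
    local fasta=$1 start=$2 end=$3 strand=$4 seq sub
    # Join the wrapped sequence lines into one string, skipping the header.
    seq=$(grep -v '^>' "$fasta" | tr -d '\n')
    # Bash substrings are 0-based, so shift the start by one.
    sub=${seq:start-1:end-start+1}
    if [ "$strand" = "-" ]; then
        # Reverse-complement features that lie on the minus strand.
        sub=$(printf '%s' "$sub" | rev | tr 'ACGTacgt' 'TGCAtgca')
    fi
    printf '%s\n' "$sub"
}

plus_cds=$(slice_cds "$workdir/contig.fa" 7 18 +)
minus_cds=$(slice_cds "$workdir/contig.fa" 1 6 -)
echo "plus:  $plus_cds"
echo "minus: $minus_cds"
```

With real data, the start, end, and strand values would come from the coordinates reported for each hit, and a dedicated tool such as seqkit is a better fit for multi-record FASTA files.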
The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) Initiative has launched the NIH Cloud Lab program, which lets you experiment with cloud computing for your research. You can request a GCP or AWS account and will receive $500 in cloud credits and three months of access, along with biomedical tutorials that walk you through common cloud-based research use cases. The program is currently available to intramural researchers and is expected to open to extramural researchers in the coming months. Learn more at https://cloud.nih.gov/resources/cloudlab/
Please let us know at suggest@ncbi.nlm.nih.gov whether this workshop has been useful, whether you are using any of these resources in your work, and especially whether you use them in your publications.
Commands used to set up the VM image: these are the commands used to install the workshop software on the stock Google Cloud Ubuntu 20.04 image.
This work was supported by the National Center for Biotechnology Information of the National Library of Medicine (NLM) and the National Institute of Allergy and Infectious Diseases (NIAID), National Institutes of Health.
- Setup
- Part 1: SRA and Viral Resources
  - Introduction
  - Background
  - Introductory Exercises
    - Exercise 1: Review metadata table contents and help docs
    - Exercise 2: Review taxonomy table and help docs
    - Exercise 3: Review STAT results table and help docs
    - Exercise 4: Review VCF results table and help docs
  - Project A: Find SARS2 data with paired Illumina and ONT samples, generated using ARTICv3
  - Project B: Find SARS-CoV-2 data with low reference coverage
    - Part 1: Retrieving SRA data
    - Part 2: Retrieving a reference genome sequence
      - Command 4: To retrieve the reference genome as a fasta, we can use edirect.
    - Part 3: Writing bash scripts to calculate and plot reference genome coverage
    - Part 4: Analyzing your SRA records
      - Command 9: Now analyze the two SRA records
  - Project C: Find variant calls that are common between paired Illumina and ONT SARS-CoV-2 records
- Part 2: NCBI Pathogen Detection
  - Background
  - Project 1
  - Project 2 Generate tree of KPC alleles to examine evolution of size variants
    - Background
    - Step 1 Download FASTA file of all blaKPC alleles from the Reference Gene Catalog
    - Step 2 Filter for sequences less than 297 amino-acids in length
    - Step 3 Align the sequences with Muscle
    - Step 4 Infer the tree using RAxML
    - Step 5 Visualize the tree in iTOL
    - Discussion
  - Project 3 Selection analysis on 293-aa blaKPC genes
    - Background
    - Step 1 Get a list of contigs with sequences of interest
    - Step 2 Download contig sequences
    - Step 3 Cut out coding sequences
    - Step 4 Prepare sequences for selection analysis
    - Step 5 Use RAxML to infer a tree
    - Step 6 Run FUBAR test with HyPhy
    - Discussion