# Tutorial

I assume you've already installed refgenie. In this tutorial I'll show you a few ways to use refgenie, both from the command line (the commands prefixed with `!`) and from Python.

To start, initialize an empty refgenie configuration file from the shell and subscribe to the desired asset server:

In [1]:
!refgenie init -c refgenie.yaml -s http://rg.databio.org:82

Initialized genome configuration file: /Users/mstolarczyk/code/refgenie/docs_jupyter/refgenie.yaml
Created directories:
 - /Users/mstolarczyk/code/refgenie/docs_jupyter/data
 - /Users/mstolarczyk/code/refgenie/docs_jupyter/alias


Here's what the new configuration file looks like:

In [2]:
!cat refgenie.yaml

config_version: 0.4
genome_folder: /Users/mstolarczyk/code/refgenie/docs_jupyter
genome_servers: 
 - http://rg.databio.org:82
genomes: null


Now let's switch to Python and load the configuration file with `refgenconf`:

In [3]:
import refgenconf
rgc = refgenconf.RefGenConf("refgenie.yaml")

Use `listr` to see what's available on the server:

In [4]:
rgc.listr()

{'http://rg.databio.org:82/v3/assets': OrderedDict([('hg38',
               ['bowtie2_index:default',
                'fasta.chrom_sizes:default',
                'fasta.fai:default',
                'fasta:default']),
              ('human_repeats',
               ['bwa_index:default',
                'fasta.chrom_sizes:default',
                'fasta.fai:default',
                'fasta:default',
                'hisat2_index:default']),
              ('mouse_chrM2x',
               ['bowtie2_index:default',
                'bwa_index:default',
                'fasta.chrom_sizes:default',
                'fasta.fai:default',
                'fasta:default']),
              ('rCRSd',
               ['bowtie2_index:default',
                'fasta.chrom_sizes:default',
                'fasta.fai:default',
                'fasta:default'])])}
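The result is a plain dictionary keyed by server URL, so it can be filtered with ordinary Python. The sketch below is only an illustration: it works on a literal copy of the output above rather than a live server call, and `genomes_with_asset` is a hypothetical helper, not part of `refgenconf`.

```python
from collections import OrderedDict

# Literal copy of the listr() result shown above, keyed by server API URL.
remote_assets = {
    "http://rg.databio.org:82/v3/assets": OrderedDict([
        ("hg38", ["bowtie2_index:default", "fasta.chrom_sizes:default",
                  "fasta.fai:default", "fasta:default"]),
        ("human_repeats", ["bwa_index:default", "fasta.chrom_sizes:default",
                           "fasta.fai:default", "fasta:default",
                           "hisat2_index:default"]),
        ("mouse_chrM2x", ["bowtie2_index:default", "bwa_index:default",
                          "fasta.chrom_sizes:default", "fasta.fai:default",
                          "fasta:default"]),
        ("rCRSd", ["bowtie2_index:default", "fasta.chrom_sizes:default",
                   "fasta.fai:default", "fasta:default"]),
    ])
}


def genomes_with_asset(listing, asset_name):
    """Hypothetical helper: map each server URL to the genomes offering asset_name."""
    matches = {}
    for server_url, genomes in listing.items():
        hits = []
        for genome, assets in genomes.items():
            # Each entry has the form "asset:tag"; compare the asset part only.
            names = {entry.split(":", 1)[0] for entry in assets}
            if asset_name in names:
                hits.append(genome)
        matches[server_url] = hits
    return matches


bwa_genomes = genomes_with_asset(remote_assets, "bwa_index")
print(bwa_genomes)
```

In a live session you would pass the value returned by `rgc.listr()` instead of the literal copy.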

Use `pull` to download one of the assets:

In [5]:
rgc.pull("mouse_chrM2x", "fasta", "default")

(['194f8681e3d9e35b9eca2d17ec5e36bbf5e8c2beea486496', 'fasta', 'default'],
 {'asset_path': 'fasta',
  'asset_digest': '8dfe402f7d29d5b036dd8937119e4404',
  'archive_digest': 'deae753231ebb9df82622c7140e0bd3a',
  'asset_size': '46.8KB',
  'archive_size': '9.1KB',
  'seek_keys': {'fasta': '194f8681e3d9e35b9eca2d17ec5e36bbf5e8c2beea486496.fa',
   'fai': '194f8681e3d9e35b9eca2d17ec5e36bbf5e8c2beea486496.fa.fai',
   'chrom_sizes': '194f8681e3d9e35b9eca2d17ec5e36bbf5e8c2beea486496.chrom.sizes'},
  'asset_parents': [],
  'asset_children': ['194f8681e3d9e35b9eca2d17ec5e36bbf5e8c2beea486496/bwa_index:default',
   '194f8681e3d9e35b9eca2d17ec5e36bbf5e8c2beea486496/bowtie2_index:default']},
 'http://rg.databio.org:82')
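Judging from the output above, `pull` returns a three-element tuple: the genome digest, asset, and tag; a dictionary of asset attributes; and the server the asset came from. That reading is inferred from the printed value, not from a documented contract. A minimal sketch of unpacking it, using an abbreviated literal copy of the result:

```python
# Genome digest as printed in the pull() output above.
GENOME_DIGEST = "194f8681e3d9e35b9eca2d17ec5e36bbf5e8c2beea486496"

# Abbreviated copy of the returned tuple (asset_parents and asset_children
# are omitted for brevity).
pull_result = (
    [GENOME_DIGEST, "fasta", "default"],
    {
        "asset_path": "fasta",
        "asset_digest": "8dfe402f7d29d5b036dd8937119e4404",
        "archive_digest": "deae753231ebb9df82622c7140e0bd3a",
        "asset_size": "46.8KB",
        "archive_size": "9.1KB",
        "seek_keys": {
            "fasta": f"{GENOME_DIGEST}.fa",
            "fai": f"{GENOME_DIGEST}.fa.fai",
            "chrom_sizes": f"{GENOME_DIGEST}.chrom.sizes",
        },
    },
    "http://rg.databio.org:82",
)

# Assumed structure: (identifiers, attributes, server URL).
(genome_digest, asset, tag), attributes, server = pull_result

print(f"Pulled {asset}:{tag} for genome {genome_digest} from {server}")
for seek_key, filename in attributes["seek_keys"].items():
    # Each seek key names one file inside the downloaded asset.
    print(f"  {seek_key}: {filename}")
```

Note that the files are named after the genome digest rather than the human-readable alias `mouse_chrM2x`.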

Once it's downloaded, use `seek` to retrieve the local path to it:

In [6]:
rgc.seek("mouse_chrM2x", "fasta")

'/Users/mstolarczyk/code/refgenie/docs_jupyter/alias/mouse_chrM2x/fasta/default/mouse_chrM2x.fa'
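The returned path appears to follow the pattern `<genome_folder>/alias/<genome>/<asset>/<tag>/<file>`. This layout is inferred from the paths printed in this tutorial and may not hold for other configurations. Under that assumption, the components can be recovered with `pathlib`:

```python
from pathlib import PurePosixPath

# Path returned by rgc.seek("mouse_chrM2x", "fasta") above.
seek_path = PurePosixPath(
    "/Users/mstolarczyk/code/refgenie/docs_jupyter/alias/"
    "mouse_chrM2x/fasta/default/mouse_chrM2x.fa"
)

# Assumed layout (inferred, not a documented contract):
# <genome_folder>/alias/<genome>/<asset>/<tag>/<file>
genome, asset, tag, filename = seek_path.parts[-4:]

# Five levels up from the file is the folder that contains "alias".
genome_folder = seek_path.parents[4]

print(genome, asset, tag, filename)
print(genome_folder)
```

In this example `genome_folder` matches the `genome_folder` entry in `refgenie.yaml`.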

You can get the unique asset digest with `id`:

In [7]:
rgc.id("mouse_chrM2x", "fasta")

'8dfe402f7d29d5b036dd8937119e4404'

## Building and pulling from the command line

Instead of pulling a fasta asset, we can build one. Back in the shell, we'll first download the Revised Cambridge Reference Sequence (the human mitochondrial genome, chosen because it's small), then build the fasta asset from the downloaded file:

In [8]:
!wget -O rCRSd.fa.gz http://big.databio.org/refgenie_raw/files.rCRSd.fasta.fasta

--2020-11-02 08:49:28--  http://big.databio.org/refgenie_raw/files.rCRSd.fasta.fasta
Resolving big.databio.org (big.databio.org)... 128.143.245.181, 128.143.245.182
Connecting to big.databio.org (big.databio.org)|128.143.245.181|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8399 (8.2K) [application/octet-stream]
Saving to: ‘rCRSd.fa.gz’


2020-11-02 08:49:29 (10.3 KB/s) - ‘rCRSd.fa.gz’ saved [8399/8399]



In [9]:
!refgenie build rCRSd/fasta -c refgenie.yaml  --files fasta=rCRSd.fa.gz -R

Using 'default' as the default tag for 'rCRSd/fasta'
Recipe validated successfully against a schema: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/refgenie/schemas/recipe_schema.yaml
Building 'rCRSd/fasta:default' using 'fasta' recipe
Initializing genome: rCRSd
Loaded AnnotatedSequenceDigestList (1 sequences)
Set genome alias (511fb1178275e7d529560d53b949dba40815f195623bce8e: rCRSd)
Created alias directories: 
 - /Users/mstolarczyk/code/refgenie/docs_jupyter/alias/rCRSd
Saving outputs to:
- content: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/511fb1178275e7d529560d53b949dba40815f195623bce8e
- logs: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/511fb1178275e7d529560d53b949dba40815f195623bce8e/fasta/default/_refgenie_build
### Pipeline run code and environment:

*              Command:  `/Library/Frameworks/Python.framework/Versions/3.6/bin/refgenie build rCRSd/fasta -c refgenie.yaml --files fasta=rCRSd.fa.gz -R`
*         Compute host:  Michal

In [10]:
!refgenie seek rCRSd/fasta -c refgenie.yaml

/Users/mstolarczyk/code/refgenie/docs_jupyter/alias/rCRSd/fasta/default/rCRSd.fa


You can do the same thing from within Python:

In [11]:
rgc = refgenconf.RefGenConf("refgenie.yaml")
rgc.seek("rCRSd", "fasta")

'/Users/mstolarczyk/code/refgenie/docs_jupyter/alias/rCRSd/fasta/default/rCRSd.fa'

Now, if you have `bowtie2-build` in your `PATH`, you can build the bowtie2 index.

You can check what a recipe needs with `--requirements`:


In [12]:
!refgenie build rCRSd/bowtie2_index -c refgenie.yaml --requirements

'bowtie2_index' recipe requirements: 
- assets:
	fasta (fasta asset for genome); default: fasta


Since I already have the fasta asset, I don't need anything else to build the `bowtie2_index` asset:

In [13]:
!refgenie build rCRSd/bowtie2_index -c refgenie.yaml

Using 'default' as the default tag for 'rCRSd/bowtie2_index'
Recipe validated successfully against a schema: /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/refgenie/schemas/recipe_schema.yaml
Building 'rCRSd/bowtie2_index:default' using 'bowtie2_index' recipe
Saving outputs to:
- content: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/511fb1178275e7d529560d53b949dba40815f195623bce8e
- logs: /Users/mstolarczyk/code/refgenie/docs_jupyter/data/511fb1178275e7d529560d53b949dba40815f195623bce8e/bowtie2_index/default/_refgenie_build
### Pipeline run code and environment:

*              Command:  `/Library/Frameworks/Python.framework/Versions/3.6/bin/refgenie build rCRSd/bowtie2_index -c refgenie.yaml`
*         Compute host:  MichalsMBP
*          Working dir:  /Users/mstolarczyk/code/refgenie/docs_jupyter
*            Outfolder:  /Users/mstolarczyk/code/refgenie/docs_jupyter/data/511fb1178275e7d529560d53b949dba40815f195623bce8e/bowtie2_index/default/_refge

Asset digest: 1262e30d4a87db9365d501de8559b3b4
Default tag for '511fb1178275e7d529560d53b949dba40815f195623bce8e/bowtie2_index' set to: default

### Pipeline completed. Epilogue
*        Elapsed time (this run):  0:00:01
*  Total elapsed time (all runs):  0:00:01
*         Peak memory (this run):  0.0003 GB
*        Pipeline completed time: 2020-11-02 08:49:34
Finished building 'bowtie2_index' asset
Created alias directories: 
 - /Users/mstolarczyk/code/refgenie/docs_jupyter/alias/rCRSd/bowtie2_index/default
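The three `refgenie build` invocations in this section share one shape: a `genome/asset` registry path, the config file, optional `--files` inputs, and optional extra flags. If you want to script builds from Python, one option is to assemble the argument list first and hand it to `subprocess` later. The helper below, `refgenie_build_argv`, is hypothetical and only reproduces the flags used in this tutorial.

```python
import shlex


def refgenie_build_argv(registry_path, config, files=None, extra_flags=()):
    """Hypothetical helper: assemble an argv list for `refgenie build`."""
    argv = ["refgenie", "build", registry_path, "-c", config]
    if files:
        argv.append("--files")
        # Inputs are passed as key=path, e.g. fasta=rCRSd.fa.gz. Passing
        # several pairs after a single --files is an assumption; this
        # tutorial only shows one.
        argv.extend(f"{key}={path}" for key, path in files.items())
    argv.extend(extra_flags)
    return argv


fasta_cmd = refgenie_build_argv(
    "rCRSd/fasta", "refgenie.yaml",
    files={"fasta": "rCRSd.fa.gz"}, extra_flags=["-R"],
)
requirements_cmd = refgenie_build_argv(
    "rCRSd/bowtie2_index", "refgenie.yaml", extra_flags=["--requirements"],
)
bowtie2_cmd = refgenie_build_argv("rCRSd/bowtie2_index", "refgenie.yaml")

for cmd in (fasta_cmd, requirements_cmd, bowtie2_cmd):
    # Quote each argument so the printed line is safe to paste into a shell.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

With refgenie installed, each list could be run with `subprocess.run(cmd, check=True)`. Building the list rather than a single string avoids shell quoting problems when file paths contain spaces.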


You can see a list of available recipes like this:

In [14]:
!refgenie list -c refgenie.yaml --recipes

bismark_bt1_index, bismark_bt2_index, blacklist, bowtie2_index, bwa_index, cellranger_reference, dbnsfp, dbsnp, ensembl_gtf, ensembl_rb, epilog_index, fasta, fasta_txome, feat_annotation, gencode_gtf, hisat2_index, kallisto_index, refgene_anno, salmon_index, salmon_partial_sa_index, salmon_sa_index, star_index, suffixerator_index, tallymer_index
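The recipe list is printed as one comma-separated line, which makes it easy to check for a recipe before starting a build. A minimal sketch, working on a copy of the output above instead of calling refgenie:

```python
# Copy of the `refgenie list --recipes` output shown above.
recipes_line = (
    "bismark_bt1_index, bismark_bt2_index, blacklist, bowtie2_index, "
    "bwa_index, cellranger_reference, dbnsfp, dbsnp, ensembl_gtf, "
    "ensembl_rb, epilog_index, fasta, fasta_txome, feat_annotation, "
    "gencode_gtf, hisat2_index, kallisto_index, refgene_anno, "
    "salmon_index, salmon_partial_sa_index, salmon_sa_index, star_index, "
    "suffixerator_index, tallymer_index"
)

# Split on commas and strip the surrounding whitespace from each name.
recipes = [name.strip() for name in recipes_line.split(",")]

wanted = ["fasta", "bowtie2_index", "not_a_recipe"]
for name in wanted:
    status = "available" if name in recipes else "missing"
    print(f"{name}: {status}")
```

Here `not_a_recipe` is a made-up name included only to show the negative case.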


You can get the unique digest for any asset with `refgenie id`:

In [15]:
!refgenie id rCRSd/fasta -c refgenie.yaml

4eb430296bc02ed7e4006624f1d5ac53


## Versions

In [16]:
from platform import python_version 
python_version()

'3.6.5'

In [17]:
!refgenie --version

refgenie 0.10.0-dev | refgenconf 0.10.0-dev
