# Tutorial

I assume you've already installed refgenie. In this tutorial I'll show you a few ways to use refgenie, both from the command line (cells whose commands start with `!` run in the shell) and from Python.

To start, initialize an empty refgenie configuration file from the shell:

In [1]:
!refgenie init -c refgenie.yaml

Initialized genome configuration file: /home/nsheff/code/refgenie/docs_jupyter/refgenie.yaml


Here's what it looks like:

In [2]:
!cat refgenie.yaml

config_version: 0.3
genome_folder: /home/nsheff/code/refgenie/docs_jupyter
genome_servers: ['http://refgenomes.databio.org']
genomes: null


Now let's switch to Python and load that configuration file with `refgenconf`:

In [3]:
import refgenconf
rgc = refgenconf.RefGenConf("refgenie.yaml")

Use `pull` to download an asset from the server. Here we request the `fasta` asset for the `hs38d1` genome, with the `default` tag:

In [4]:
rgc.pull("hs38d1", "fasta", "default")


(['hs38d1', 'fasta', 'default'],
 {'archive_digest': '310c578812a64fcdf08d2df60d7b79b4',
  'archive_size': '1.7MB',
  'asset_children': ['hs38d1/star_index:default',
   'hs38d1/bwa_index:default',
   'hs38d1/bowtie2_index:default',
   'hs38d1/bismark_bt1_index:default',
   'hs38d1/bismark_bt2_index:default',
   'hs38d1/hisat2_index:default',
   'hs38d1/tallymer_index:default',
   'hs38d1/suffixerator_index:default'],
  'asset_digest': 'eddf5466faa3391a7114e87648466dcb',
  'asset_parents': [],
  'asset_path': 'fasta',
  'asset_size': '6.0MB',
  'seek_keys': {'chrom_sizes': 'hs38d1.chrom.sizes',
   'fai': 'hs38d1.fa.fai',
   'fasta': 'hs38d1.fa'}},
 'http://refgenomes.databio.org')
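Judging from the output above, `pull` returns a 3-tuple: the `[genome, asset, tag]` list, a dictionary of asset attributes, and the URL of the server the asset came from. A minimal sketch of unpacking it, using a trimmed copy of that output as a literal (the structure is inferred from this output, not from the refgenconf documentation):

```python
# Trimmed copy of the tuple returned by rgc.pull() above (structure inferred from the output)
result = (
    ["hs38d1", "fasta", "default"],
    {
        "asset_digest": "eddf5466faa3391a7114e87648466dcb",
        "asset_path": "fasta",
        "seek_keys": {
            "chrom_sizes": "hs38d1.chrom.sizes",
            "fai": "hs38d1.fa.fai",
            "fasta": "hs38d1.fa",
        },
    },
    "http://refgenomes.databio.org",
)

# Unpack the registry path parts, the attribute dict, and the server URL
(genome, asset, tag), attrs, server = result

print(f"{genome}/{asset}:{tag} pulled from {server}")
print("digest:", attrs["asset_digest"])
# Each seek key names a file inside the asset directory
for key, filename in sorted(attrs["seek_keys"].items()):
    print(f"  {key}: {filename}")
```

In a live session you could unpack the value of `rgc.pull(...)` the same way instead of pasting a literal. Note that `asset_digest` is the same value that `id()` returns for this asset.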

Once it's downloaded, use `seek` to get its local path:

In [5]:
rgc.seek("hs38d1", "fasta")

'/home/nsheff/code/refgenie/docs_jupyter/hs38d1/fasta/default/hs38d1.fa'
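Comparing this path with the `pull` output suggests the layout `genome_folder/genome/asset_path/tag/<seek key value>`. The hypothetical helper below rebuilds the path from those pieces; it only mirrors what this output shows and is not how refgenconf itself resolves paths:

```python
import posixpath

def expected_asset_file(genome_folder, genome, asset_path, tag, seek_value):
    """Hypothetical helper: join path parts in the order seen in seek() output."""
    # posixpath keeps "/" separators regardless of the current OS
    return posixpath.join(genome_folder, genome, asset_path, tag, seek_value)

path = expected_asset_file(
    "/home/nsheff/code/refgenie/docs_jupyter",  # genome_folder from refgenie.yaml
    "hs38d1",     # genome
    "fasta",      # asset_path from the pull() attributes
    "default",    # tag
    "hs38d1.fa",  # seek_keys["fasta"] from the pull() attributes
)
print(path)
```

The `rCRSd` paths later in this tutorial follow the same pattern.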

You can get the asset's unique digest with `id()`:

In [6]:
rgc.id("hs38d1", "fasta")

'eddf5466faa3391a7114e87648466dcb'

## Building and pulling from the command line

Instead of pulling a fasta asset, we can also build one. Back in the shell, we'll download the Revised Cambridge Reference Sequence (the human mitochondrial genome, chosen because it's small):

In [7]:
!wget http://big.databio.org/refgenie_raw/rCRSd.fa.gz

--2020-03-13 16:11:59--  http://big.databio.org/refgenie_raw/rCRSd.fa.gz
Resolving big.databio.org (big.databio.org)... 128.143.245.181
Connecting to big.databio.org (big.databio.org)|128.143.245.181|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8399 (8.2K) [application/octet-stream]
Saving to: ‘rCRSd.fa.gz’


2020-03-13 16:11:59 (214 MB/s) - ‘rCRSd.fa.gz’ saved [8399/8399]
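Before building, you may want to peek inside the downloaded file. A minimal sketch using only the standard library, assuming the input is a gzip-compressed FASTA; `fasta_headers` is a hypothetical helper, not part of refgenie. The demo writes a tiny temporary file so it runs anywhere; point it at `rCRSd.fa.gz` to inspect the real download:

```python
import gzip
import os
import tempfile

def fasta_headers(path):
    """Return sequence names (header lines without '>') from a gzip-compressed FASTA."""
    headers = []
    with gzip.open(path, "rt") as handle:  # "rt" decompresses and decodes to text
        for line in handle:
            if line.startswith(">"):
                headers.append(line[1:].strip())
    return headers

# Self-contained demo on a tiny temporary file with made-up content
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.fa.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write(">seq1\nGATCACAGGT\n")
    headers = fasta_headers(demo_path)

print(headers)  # ['seq1']
```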



In [8]:
!refgenie build rCRSd/fasta -c refgenie.yaml  --files fasta=rCRSd.fa.gz -R

Using 'default' as the default tag for 'rCRSd/fasta'
Building 'rCRSd/fasta:default' using 'fasta' recipe
Saving outputs to:
- content: /home/nsheff/code/refgenie/docs_jupyter/rCRSd
- logs: /home/nsheff/code/refgenie/docs_jupyter/rCRSd/fasta/default/_refgenie_build
### Pipeline run code and environment:

*              Command:  `/home/nsheff/.local/bin/refgenie build rCRSd/fasta -c refgenie.yaml --files fasta=rCRSd.fa.gz -R`
*         Compute host:  puma
*          Working dir:  /home/nsheff/code/refgenie/docs_jupyter
*            Outfolder:  /home/nsheff/code/refgenie/docs_jupyter/rCRSd/fasta/default/_refgenie_build/
*  Pipeline started at:   (03-13 16:11:59) elapsed: 0.0 _TIME_

### Version log:

*       Python version:  3.7.6
*          Pypiper dir:  `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
*      Pypiper version:  0.12.1
*         Pipeline dir:  `/home/nsheff/.local/bin`
*     Pipeline version:  None

### Arguments passed to pipeline:

* `asset_registry_paths`:  `[

In [9]:
!refgenie seek rCRSd/fasta -c refgenie.yaml

/home/nsheff/code/refgenie/docs_jupyter/rCRSd/fasta/default/rCRSd.fa


You can do the same thing from within python:

In [10]:
rgc = refgenconf.RefGenConf("refgenie.yaml")
rgc.seek("rCRSd", "fasta")

'/home/nsheff/code/refgenie/docs_jupyter/rCRSd/fasta/default/rCRSd.fa'

Now, if `bowtie2-build` is in your `PATH`, you can build the bowtie2 index with no further requirements.

You can list a recipe's requirements with `--requirements`:

In [11]:
!refgenie build rCRSd/bowtie2_index -c refgenie.yaml --requirements

'bowtie2_index' recipe requirements: 
- assets:
	fasta (fasta asset for genome); default: fasta


Since I already have the fasta asset, I don't need anything else to build `bowtie2_index`:
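The decision above boils down to two checks: the required assets already exist locally, and the `bowtie2-build` executable is on your `PATH`. A minimal sketch of both, with a hypothetical helper and plain Python collections standing in for refgenie's internal bookkeeping (which may look quite different):

```python
import shutil

def missing_assets(required, local_assets):
    """Return required asset names not yet available locally (hypothetical helper)."""
    return [name for name in required if name not in local_assets]

# Requirement printed by `refgenie build rCRSd/bowtie2_index --requirements`
bowtie2_requirements = ["fasta"]
# Assets already built for rCRSd at this point in the tutorial
rcrsd_assets = {"fasta"}

missing = missing_assets(bowtie2_requirements, rcrsd_assets)
print("missing assets:", missing or "none")

# The recipe relies on bowtie2-build, so check that it can be found on PATH
tool_path = shutil.which("bowtie2-build")
print("bowtie2-build:", tool_path or "not found on PATH")
```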

In [12]:
!refgenie build rCRSd/bowtie2_index -c refgenie.yaml

Using 'default' as the default tag for 'rCRSd/bowtie2_index'
Building 'rCRSd/bowtie2_index:default' using 'bowtie2_index' recipe
Saving outputs to:
- content: /home/nsheff/code/refgenie/docs_jupyter/rCRSd
- logs: /home/nsheff/code/refgenie/docs_jupyter/rCRSd/bowtie2_index/default/_refgenie_build
### Pipeline run code and environment:

*              Command:  `/home/nsheff/.local/bin/refgenie build rCRSd/bowtie2_index -c refgenie.yaml`
*         Compute host:  puma
*          Working dir:  /home/nsheff/code/refgenie/docs_jupyter
*            Outfolder:  /home/nsheff/code/refgenie/docs_jupyter/rCRSd/bowtie2_index/default/_refgenie_build/
*  Pipeline started at:   (03-13 16:12:02) elapsed: 0.0 _TIME_

### Version log:

*       Python version:  3.7.6
*          Pypiper dir:  `/home/nsheff/.local/lib/python3.7/site-packages/pypiper`
*      Pypiper version:  0.12.1
*         Pipeline dir:  `/home/nsheff/.local/bin`
*     Pipeline version:  None

### Arguments passed to pipeline:

* `asset_r

To see your local genomes and assets, along with the available recipes, use `refgenie list`:

In [13]:
!refgenie list -c refgenie.yaml

Server subscriptions: http://refgenomes.databio.org
Local genomes: hs38d1, rCRSd
Local recipes: bismark_bt1_index, bismark_bt2_index, blacklist, bowtie2_index, bwa_index, cellranger_reference, dbnsfp, dbsnp, ensembl_gtf, ensembl_rb, epilog_index, fasta, fasta_txome, feat_annotation, gencode_gtf, hisat2_index, kallisto_index, refgene_anno, salmon_index, salmon_partial_sa_index, salmon_sa_index, star_index, suffixerator_index, tallymer_index
Local assets:
              hs38d1/   fasta.chrom_sizes:default, fasta.fai:default, fasta:default
               rCRSd/   bowtie2_index:default, fasta.chrom_sizes:default, fasta.fai:default, fasta:default
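In the `Local assets` block, each genome is followed by entries of the form `asset:tag`, with seek keys written as `asset.seek_key:tag`. If you want that in structured form, here is a sketch that parses the text shown above; `parse_local_assets` is hypothetical and depends on this exact output format:

```python
# Trimmed copy of the `refgenie list` output above (recipes line omitted)
LIST_OUTPUT = """\
Server subscriptions: http://refgenomes.databio.org
Local genomes: hs38d1, rCRSd
Local assets:
              hs38d1/   fasta.chrom_sizes:default, fasta.fai:default, fasta:default
               rCRSd/   bowtie2_index:default, fasta.chrom_sizes:default, fasta.fai:default, fasta:default
"""

def parse_local_assets(text):
    """Map each genome to its asset names, dropping seek keys and tags (hypothetical helper)."""
    assets = {}
    in_block = False
    for line in text.splitlines():
        if line.startswith("Local assets:"):
            in_block = True  # genome lines follow this header
            continue
        if not in_block or "/" not in line:
            continue
        genome, entries = line.split("/", 1)
        names = set()
        for entry in entries.split(","):
            name = entry.strip().split(":")[0]  # drop ":tag"
            names.add(name.split(".")[0])       # drop ".seek_key"
        assets[genome.strip()] = names
    return assets

assets = parse_local_assets(LIST_OUTPUT)
for genome, names in sorted(assets.items()):
    print(genome, sorted(names))
```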


You can get the unique digest for any asset with `refgenie id`:

In [14]:
!refgenie id rCRSd/fasta -c refgenie.yaml

rCRSd/fasta:default,4eb430296bc02ed7e4006624f1d5ac53
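The output pairs a registry path of the form `genome/asset:tag` with the asset's digest, separated by a comma. A hypothetical parser for that line, assuming the format seen in this tutorial (which also shows seek keys written as `asset.seek_key`):

```python
def parse_id_line(line):
    """Split 'genome/asset[.seek_key]:tag,digest' into its parts (hypothetical helper)."""
    registry_path, digest = line.strip().split(",", 1)
    genome, rest = registry_path.split("/", 1)
    asset_part, _, tag = rest.partition(":")       # tag follows the colon
    asset, _, seek_key = asset_part.partition(".")  # optional seek key follows a dot
    return {
        "genome": genome,
        "asset": asset,
        "seek_key": seek_key or None,
        "tag": tag or None,
        "digest": digest,
    }

info = parse_id_line("rCRSd/fasta:default,4eb430296bc02ed7e4006624f1d5ac53")
print(info["genome"], info["asset"], info["tag"], info["digest"])
```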


## Versions

In [15]:
from platform import python_version 
python_version()

'3.5.2'

In [16]:
!refgenie --version

refgenie 0.9.0-dev


In [17]:
refgenconf.__version__

'0.7.0-dev'