# Getting started: initializing, adding data, and saving your SwanGraph

First, if you haven't already, make sure to [install Swan](https://github.com/fairliereese/swan_vis/wiki#installation).
After installing, you'll be able to run Swan from Python.

Then, download the data and the reference transcriptome annotation from [here](https://hpc.oit.uci.edu/~freese/swan_files/). The bash commands to do so are given below.

Skip to section: 
* [Example data download](#data_download)
* [Starting and initializing your SwanGraph](#init)
* [Add transcript models (GTF) and abundance info](#gtf_ab)
* [Saving and loading your SwanGraph](#save_load)
* [Adding transcript models (GTF) and abundance info separately](#gtf_ab_sep)
* [Adding transcript models (TALON db) and abundance info](#db_ab)

## <a name="data_download"></a> Download example data

In [13]:
# mkdir data
# mkdir figures
# cd data/

# # gencode v29 human annotation
# wget https://hpc.oit.uci.edu/~freese/swan_files/gencode.v29.annotation.gtf

# # hepg2 data
# wget https://hpc.oit.uci.edu/~freese/swan_files/hepg2_1_talon.gtf
# wget https://hpc.oit.uci.edu/~freese/swan_files/hepg2_2_talon.gtf

# # hffc6 data
# wget https://hpc.oit.uci.edu/~freese/swan_files/hffc6_1_talon.gtf
# wget https://hpc.oit.uci.edu/~freese/swan_files/hffc6_2_talon.gtf
# wget https://hpc.oit.uci.edu/~freese/swan_files/hffc6_3_talon.gtf

# # abundance file
# wget https://hpc.oit.uci.edu/~freese/swan_files/all_talon_abundance_filtered.tsv

# # example saved SwanGraph
# wget https://hpc.oit.uci.edu/~freese/swan_files/swan.p


# # talon database
# wget https://hpc.oit.uci.edu/~freese/swan_files/talon.db
# cd ../

In [16]:
annot_gtf = 'data/gencode.v29.annotation.gtf'
hep_1_gtf = 'data/hepg2_1_talon.gtf'
hep_2_gtf = 'data/hepg2_2_talon.gtf'
hff_1_gtf = 'data/hffc6_1_talon.gtf'
hff_2_gtf = 'data/hffc6_2_talon.gtf'
hff_3_gtf = 'data/hffc6_3_talon.gtf'
ab_file = 'data/all_talon_abundance_filtered.tsv'

talon_db = 'data/talon.db'
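
Before building the graph, it can help to confirm that every downloaded file is where the path variables say it is. The following is a minimal sketch using only the standard library; `missing_files` is a hypothetical helper written for this tutorial, not part of Swan.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not point to an existing file, in input order.

    Hypothetical helper for this tutorial; it is not part of swan_vis.
    """
    return [p for p in paths if not Path(p).is_file()]


# Usage with the variables defined in the cell above:
# missing = missing_files([annot_gtf, hep_1_gtf, hep_2_gtf, hff_1_gtf,
#                          hff_2_gtf, hff_3_gtf, ab_file, talon_db])
# if missing:
#     raise FileNotFoundError('Missing input files: ' + ', '.join(missing))
```

Failing early here avoids discovering a bad path only after the annotation has already been loaded.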

## <a name="init"></a>Starting up Swan and initializing your SwanGraph

In [17]:
import swan_vis as swan

Initialize an empty SwanGraph and add the transcriptome annotation to the SwanGraph.

In [18]:
sg = swan.SwanGraph()
sg.add_annotation(annot_gtf)

Adding dataset annotation to the SwanGraph.


## <a name="gtf_ab"></a>Adding transcript models (GTF) and abundance information at the same time

Add each dataset to the SwanGraph, along with the corresponding abundance information from the abundance matrix. The `count_cols` argument is the name of the column in the abundance file that holds the counts for the dataset being added.

In [13]:
sg.add_dataset('HepG2_1', hep_1_gtf,
	counts_file=ab_file,
	count_cols='hepg2_1')
sg.add_dataset('HepG2_2', hep_2_gtf,
	counts_file=ab_file,
	count_cols='hepg2_2')
sg.add_dataset('HFFc6_1', hff_1_gtf,
	counts_file=ab_file,
	count_cols='hffc6_1')
sg.add_dataset('HFFc6_2', hff_2_gtf,
	counts_file=ab_file,
	count_cols='hffc6_2')
sg.add_dataset('HFFc6_3', hff_3_gtf,
	counts_file=ab_file,
	count_cols='hffc6_3')

Adding dataset HepG2_1 to the SwanGraph.
Adding dataset HepG2_2 to the SwanGraph.
Adding dataset HFFc6_1 to the SwanGraph.
Adding dataset HFFc6_2 to the SwanGraph.
Adding dataset HFFc6_3 to the SwanGraph.
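
The five `add_dataset()` calls above follow one pattern, so they can also be driven from a table. The sketch below assumes the abundance file is tab-separated with a header row (suggested by the `.tsv` extension, not confirmed here). `DATASETS`, `missing_count_cols`, and `add_datasets` are hypothetical names introduced for illustration; only `add_dataset()` and its `counts_file` and `count_cols` arguments come from the cell above.

```python
import csv

# Dataset name -> (GTF path, count column), copied from the calls above.
DATASETS = {
    'HepG2_1': ('data/hepg2_1_talon.gtf', 'hepg2_1'),
    'HepG2_2': ('data/hepg2_2_talon.gtf', 'hepg2_2'),
    'HFFc6_1': ('data/hffc6_1_talon.gtf', 'hffc6_1'),
    'HFFc6_2': ('data/hffc6_2_talon.gtf', 'hffc6_2'),
    'HFFc6_3': ('data/hffc6_3_talon.gtf', 'hffc6_3'),
}


def missing_count_cols(ab_file, count_cols):
    """Return the requested columns that are absent from the TSV header.

    Assumes a tab-separated file whose first row is the header.
    """
    with open(ab_file, newline='') as f:
        header = next(csv.reader(f, delimiter='\t'), [])
    return [c for c in count_cols if c not in header]


def add_datasets(sg, datasets, ab_file):
    """Call sg.add_dataset() once per entry, mirroring the cell above."""
    for name, (gtf, col) in datasets.items():
        sg.add_dataset(name, gtf, counts_file=ab_file, count_cols=col)


# Usage, given the `sg` and `ab_file` defined earlier:
# cols = [col for _, col in DATASETS.values()]
# assert not missing_count_cols(ab_file, cols)
# add_datasets(sg, DATASETS, ab_file)
```

Checking the header first catches a misspelled column name before any dataset is added.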


##  <a name="save_load"></a>Saving and loading your SwanGraph

After adding your data, you can save your SwanGraph so that you can work with it again without re-adding everything. `save_graph()` takes a file name prefix and appends `.p` to it, as the output below shows.

In [11]:
sg.save_graph('swan')

Saving graph as swan.p


To reload the saved graph, pass the file name to the `SwanGraph` constructor.

In [12]:
sg = swan.SwanGraph('swan.p')

Graph from swan.p loaded
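
Saving and reloading combine naturally into a "load if saved, otherwise build" pattern. The sketch below is one way to express it; `load_or_build` is a hypothetical helper, and the loader and builder are passed in as callables so the snippet does not depend on Swan itself.

```python
import os


def load_or_build(path, load, build):
    """Return load(path) if the saved file exists, otherwise build().

    Hypothetical helper: `load` takes the file path, `build` takes no
    arguments and constructs the object from scratch.
    """
    if os.path.isfile(path):
        return load(path)
    return build()


# Usage with Swan, where build_graph is a hypothetical function that
# creates a SwanGraph, adds the annotation and datasets, calls
# sg.save_graph('swan'), and returns sg:
# sg = load_or_build('swan.p', swan.SwanGraph, build_graph)
```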


##  <a name="gtf_ab_sep"></a>Adding transcript models (GTF) and abundance information separately

Swan can also run without abundance information, although many of Swan's analysis functions depend on it. To load only the transcript models, leave out the `counts_file` and `count_cols` arguments to `add_dataset()`, as shown below.

In [16]:
sg = swan.SwanGraph()
sg.add_annotation(annot_gtf)
sg.add_dataset('HepG2_1', hep_1_gtf)
sg.add_dataset('HepG2_2', hep_2_gtf)
sg.add_dataset('HFFc6_1', hff_1_gtf)
sg.add_dataset('HFFc6_2', hff_2_gtf)
sg.add_dataset('HFFc6_3', hff_3_gtf)

Adding dataset annotation to the SwanGraph.
Adding dataset HepG2_1 to the SwanGraph.
Adding dataset HepG2_2 to the SwanGraph.
Adding dataset HFFc6_1 to the SwanGraph.
Adding dataset HFFc6_2 to the SwanGraph.
Adding dataset HFFc6_3 to the SwanGraph.


If you have added only transcript models to the graph via `add_dataset()` and want to add abundance information afterwards, use the `add_abundance()` function as shown below. The string passed to `count_cols` is the column in the abundance file that corresponds to the dataset, and the string passed to `dataset_name` is the name of a dataset that was already added to the SwanGraph in the previous code block.

In [8]:
sg.add_abundance(ab_file, count_cols='hepg2_1', dataset_name='HepG2_1')
sg.add_abundance(ab_file, count_cols='hepg2_2', dataset_name='HepG2_2')
sg.add_abundance(ab_file, count_cols='hffc6_1', dataset_name='HFFc6_1')
sg.add_abundance(ab_file, count_cols='hffc6_2', dataset_name='HFFc6_2')
sg.add_abundance(ab_file, count_cols='hffc6_3', dataset_name='HFFc6_3')
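
In this example, each abundance column is simply the lowercased dataset name (`'HepG2_1'` pairs with `'hepg2_1'`), so the calls above can be sketched as a loop. That naming coincidence is specific to this tutorial's data and is an assumption of the sketch; `DATASET_NAMES` and `add_all_abundance` are hypothetical names, while `add_abundance()` and its arguments are used exactly as in the cell above.

```python
# Dataset names as added to the SwanGraph in the previous code block.
DATASET_NAMES = ['HepG2_1', 'HepG2_2', 'HFFc6_1', 'HFFc6_2', 'HFFc6_3']


def add_all_abundance(sg, ab_file, dataset_names):
    """Call sg.add_abundance() once per dataset.

    Assumes each count column is the lowercased dataset name, which holds
    for this tutorial's abundance file but not for data in general.
    """
    for name in dataset_names:
        sg.add_abundance(ab_file, count_cols=name.lower(), dataset_name=name)


# Usage, given the `sg` and `ab_file` defined earlier:
# add_all_abundance(sg, ab_file, DATASET_NAMES)
```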

## <a name="db_ab"></a> Adding transcript models (TALON db) and abundance information

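Swan is also directly compatible with TALON databases and can pull transcript models from them. In the calls below, the TALON database path takes the place of the GTF, and the third positional argument names the dataset to pull from the database. There are plans to expand this compatibility to make the transition between TALON and Swan even smoother.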

In [20]:
sg.add_dataset('HepG2_1', talon_db, 'hepg2_1',
	counts_file=ab_file,
	count_cols='hepg2_1')
sg.add_dataset('HepG2_2', talon_db, 'hepg2_2',
	counts_file=ab_file,
	count_cols='hepg2_2')
sg.add_dataset('HFFc6_1', talon_db, 'hffc6_1',
	counts_file=ab_file,
	count_cols='hffc6_1')
sg.add_dataset('HFFc6_2', talon_db, 'hffc6_2',
	counts_file=ab_file,
	count_cols='hffc6_2')
sg.add_dataset('HFFc6_3', talon_db, 'hffc6_3',
	counts_file=ab_file,
	count_cols='hffc6_3')

Adding dataset HepG2_1 to the SwanGraph.
Getting transcripts for data/talon.db from hepg2_1


OperationalError: near "WHERE": syntax error
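
A TALON database is an SQLite file, and the `OperationalError` above is raised by Python's `sqlite3` module. When a query against the database fails, it can be useful to look at what the file actually contains. The sketch below lists the tables in any SQLite file using only the standard library; `list_tables` is a hypothetical helper, and it makes no assumption about TALON's schema.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite database file, sorted.

    Note: sqlite3.connect() creates an empty database if db_path does not
    exist, so check the path first when that matters.
    """
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Usage with the path defined earlier:
# print(list_tables(talon_db))
```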