# Getting started: initializing, adding data, and saving your SwanGraph

First, if you haven't already, make sure to [install Swan](https://github.com/fairliereese/swan_vis/wiki#installation).
After installing, you'll be able to run Swan from Python.

Then, download the data and the reference transcriptome annotation from [here](http://crick.bio.uci.edu/freese/swan_files_example/). The bash commands to do so are given below.

The main workflow to get started with Swan consists of:
1. [Adding a reference transcriptome (optional)](#add_trans)
2. Adding a transcriptome for your samples
    * [From a GTF](#add_gtf)
    * [From a TALON db](#add_db)
3. [Adding datasets and their expression values](#add_ab)
4. [Adding metadata to your datasets](#add_meta)
 
Other sections: 
* [Example data download](#data_download)
* [Starting and initializing your SwanGraph](#init)
* [Saving and loading your SwanGraph](#save_load)

This page can also be read from top to bottom; just be aware that some steps will then be run more than once.

For information on the file formats needed to use Swan, please read the [file format specifications FAQ](https://freese.gitbook.io/swan/faqs/file_formats).

<!-- Running this tutorial (with only one of the dataset addition options) on my laptop took around 7 minutes and 5 GB of RAM.  -->

## <a name="data_download"></a> Download example data

This is the data used in the [Swan publication](https://academic.oup.com/bioinformatics/article/37/9/1322/5912931).

Run this block in your bash terminal:
```bash
mkdir data
mkdir figures
cd data/

# download files
wget http://crick.bio.uci.edu/freese/swan_files.tgz

# expand files 
tar -xzf swan_files.tgz
mv swan_files/* .
rm -r swan_files/

# download reference annotation
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_29/gencode.v29.annotation.gtf.gz
gunzip gencode.v29.annotation.gtf.gz

cd ../
```

Alternatively, run the tutorial on a smaller example dataset restricted to chr20.

Run this block in your bash terminal:

```bash
mkdir data
mkdir figures
cd data/

# download files
wget http://crick.bio.uci.edu/freese/swan_files_example.tgz
    
# expand files 
tar -xzf swan_files_example.tgz
mv swan_files_example/* .
rm -r swan_files_example/

cd ../
```
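Before moving on, it can help to confirm that the files the rest of this tutorial refers to are present in `data/`. The following is a minimal sketch using only the Python standard library. The helper name `find_missing` is hypothetical (it is not part of Swan), and the file list is simply copied from the paths used later on this page, so adjust it if your download differs.

```python
from pathlib import Path

# Input files referenced later in this tutorial (relative to the working directory).
EXPECTED_FILES = [
    'data/gencode.v29.annotation.gtf',
    'data/all_talon_observedOnly.gtf',
    'data/all_talon_abundance_filtered.tsv',
    'data/talon.db',
    'data/all_pass_list.csv',
    'data/metadata.tsv',
]

def find_missing(paths, root='.'):
    """Return the subset of `paths` that do not exist as files under `root`."""
    root = Path(root)
    return [p for p in paths if not (root / p).is_file()]

if __name__ == '__main__':
    missing = find_missing(EXPECTED_FILES)
    if missing:
        print('Missing files:')
        for p in missing:
            print('  ' + p)
    else:
        print('All expected input files were found.')
```

An empty result means every listed file was found; anything reported as missing should be downloaded again before continuing.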

## <a name="init"></a>Starting up Swan and initializing your SwanGraph

The rest of the code in this tutorial should be run in Python.

Initialize an empty SwanGraph. The transcriptome annotation is added to it in the sections that follow.

```python
import swan_vis as swan

# initialize a new SwanGraph
sg = swan.SwanGraph()
```

**Note:** to initialize a SwanGraph in single-cell mode (which will avoid calculating percent isoform use \[pi\] numbers for each cell), use the following code:

```python
sg = swan.SwanGraph(sc=True)
```

```python
annot_gtf = 'data/gencode.v29.annotation.gtf'
data_gtf = 'data/all_talon_observedOnly.gtf'
ab_file = 'data/all_talon_abundance_filtered.tsv'
talon_db = 'data/talon.db'
pass_list = 'data/all_pass_list.csv'
meta = 'data/metadata.tsv'
```

## <a name="add_trans"></a>Adding a reference transcriptome

```python
# add an annotation transcriptome
sg.add_annotation(annot_gtf)
```

```
Adding annotation to the SwanGraph
```


## <a name="add_gtf"></a>Adding transcript models from a GTF

Add all filtered transcript models to the SwanGraph.

```python
# add the transcript models for your samples to the SwanGraph
sg.add_transcriptome(data_gtf)
```

```
Adding transcriptome to the SwanGraph
```

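A GTF stores one feature per line in nine tab-separated columns, with identifiers such as `gene_id` and `transcript_id` in the final attributes column. As a rough illustration of the kind of information such a file carries (this is not Swan's parser), the sketch below groups exon coordinates by transcript. The function name `exons_by_transcript` and the three-line GTF snippet are hypothetical.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

def exons_by_transcript(lines):
    """Group exon (start, end) pairs by transcript_id from GTF lines."""
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith('#'):
            continue  # skip blank lines and comment lines
        fields = line.rstrip('\n').split('\t')
        if len(fields) != 9 or fields[2] != 'exon':
            continue  # only exon features are collected here
        attrs = dict(ATTR_RE.findall(fields[8]))
        exons[attrs['transcript_id']].append((int(fields[3]), int(fields[4])))
    return dict(exons)

# Hypothetical GTF content: one transcript with two exons.
gtf = [
    'chr20\tTALON\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr20\tTALON\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr20\tTALON\texon\t400\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
]
print(exons_by_transcript(gtf))
```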

## <a name="add_ab"></a>Adding datasets and their abundance

Use an abundance matrix with columns for each desired dataset to add datasets to the SwanGraph.

```python
# add each dataset's abundance information to the SwanGraph
sg.add_abundance(ab_file)
```

```
Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.
```




##  <a name="save_load"></a>Saving and loading your SwanGraph

Following this, you can save your SwanGraph so you can easily work with it again without re-adding all the data.

```python
# save the SwanGraph as a Python pickle file
sg.save_graph('swan')
```

```
Saving graph as swan.p
```


You can then reload the saved graph.

```python
# load up a saved SwanGraph from a pickle file
sg = swan.read('swan.p')
```

```
Read in graph from swan.p
```

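Saving writes the SwanGraph to a Python pickle file (the `.p` file named in the output above). As a generic illustration of that round trip, independent of Swan, the sketch below pickles a stand-in dictionary to a temporary file and reads it back. The stand-in object and the file name `example.p` are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in for a SwanGraph: any picklable Python object works for this demo.
graph = {'datasets': ['hepg2_1', 'hepg2_2'], 'n_transcripts': 3}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.p')

    # Save: serialize the object to disk.
    with open(path, 'wb') as f:
        pickle.dump(graph, f)

    # Load: restore an equal (but distinct) object from the file.
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored == graph)
```

Because unpickling can execute arbitrary code, only load pickle files that come from a source you trust.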

##  <a name="add_db"></a>Adding transcript models from a TALON DB

Swan is also compatible with TALON databases and can pull transcript models directly from them. You can optionally pass in a list of isoforms from [`talon_filter_transcripts`](https://github.com/mortazavilab/TALON#talon_filter) to filter your input transcript models.

```python
# for this new example, create a new empty SwanGraph
sg = swan.SwanGraph()

# and add the annotation transcriptome to it
sg.add_annotation(annot_gtf)

# add transcriptome from TALON db
sg.add_transcriptome(talon_db, pass_list=pass_list)

# add each dataset's abundance information to the SwanGraph
sg.add_abundance(ab_file)
```

```
Adding annotation to the SwanGraph

Adding transcriptome to the SwanGraph

Adding abundance for datasets hepg2_1, hepg2_2, hffc6_1, hffc6_2, hffc6_3 to SwanGraph.
```


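To make the effect of a pass list concrete, here is a minimal sketch of filtering transcript models by ID in plain Python. It assumes that each row of the pass list is `gene_ID,transcript_ID` with no header line; check your own file before relying on that. The helper names `read_pass_list` and `filter_transcripts`, and all IDs shown, are hypothetical and unrelated to Swan's internals.

```python
import csv
import io

def read_pass_list(handle):
    """Return the set of transcript IDs in a pass list.

    Assumes each row is `gene_ID,transcript_ID` with no header line.
    """
    return {row[1] for row in csv.reader(handle) if len(row) >= 2}

def filter_transcripts(transcripts, passed_ids):
    """Keep only transcript models whose ID is in `passed_ids`."""
    return [t for t in transcripts if t['transcript_id'] in passed_ids]

# Hypothetical pass list and transcript models, for illustration only.
pass_list_text = "1,10\n1,11\n2,25\n"
transcripts = [
    {'transcript_id': '10', 'gene_id': '1'},
    {'transcript_id': '12', 'gene_id': '1'},
    {'transcript_id': '25', 'gene_id': '2'},
]

passed = read_pass_list(io.StringIO(pass_list_text))
kept = filter_transcripts(transcripts, passed)
print([t['transcript_id'] for t in kept])
```

Transcript `12` is dropped because it does not appear in the pass list.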


##  <a name="add_meta"></a>Adding metadata

Swan provides functionality to perform tests and plotting on the basis of metadata categories.

```python
sg.add_metadata(meta)
```

```
AnnData expects .obs.index to contain strings, but got values like:
    [0, 1, 2, 3, 4]

    Inferred to be: integer

  value_idx = self._prep_dim_index(value.index, attr)
```


```python
sg.adata.obs
```

| index | dataset | cell_line | replicate |
| --- | --- | --- | --- |
| hepg2_1 | hepg2_1 | hepg2 | 1 |
| hepg2_2 | hepg2_2 | hepg2 | 2 |
| hffc6_1 | hffc6_1 | hffc6 | 1 |
| hffc6_2 | hffc6_2 | hffc6 | 2 |
| hffc6_3 | hffc6_3 | hffc6 | 3 |

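Since the metadata is keyed by dataset name, it can be worth checking that the names in your metadata file agree with the datasets you added before calling `add_metadata`. The sketch below does this with the standard library, assuming a tab-separated file with a `dataset` column like the one shown above. The helper name `check_metadata` is hypothetical, and one row is left out of the example on purpose.

```python
import csv
import io

def check_metadata(handle, dataset_names):
    """Compare the `dataset` column of a metadata TSV with expected dataset names.

    Returns (missing, extra): datasets absent from the metadata, and
    metadata entries that match no expected dataset.
    """
    rows = list(csv.DictReader(handle, delimiter='\t'))
    in_meta = {row['dataset'] for row in rows}
    expected = set(dataset_names)
    return sorted(expected - in_meta), sorted(in_meta - expected)

# Metadata mirroring the table above, with hffc6_3 deliberately omitted.
meta_text = (
    "dataset\tcell_line\treplicate\n"
    "hepg2_1\thepg2\t1\n"
    "hepg2_2\thepg2\t2\n"
    "hffc6_1\thffc6\t1\n"
    "hffc6_2\thffc6\t2\n"
)
datasets = ['hepg2_1', 'hepg2_2', 'hffc6_1', 'hffc6_2', 'hffc6_3']

missing, extra = check_metadata(io.StringIO(meta_text), datasets)
print('missing from metadata:', missing)
print('not among datasets:', extra)
```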

```python
# save the SwanGraph as a Python pickle file
sg.save_graph('data/swan')
sg.save_graph('swan')
sg.save_graph('data/swan_files_full')
# sg.save_graph('data/swan_back')
```

```
Saving graph as data/swan.p
Saving graph as swan.p
Saving graph as data/swan_files_full.p
```


```python
print(sg.adata.layers['counts'][:5, :5])
print(sg.adata.layers['tpm'][:5, :5])
print(sg.adata.layers['pi'][:5, :5])
```

```
[[ 98.  43.   4.  23.   0.]
 [207.  66.   6.  52.   0.]
 [100. 148.   0.  82.   0.]
 [108. 191.   0.  98.   0.]
 [ 91. 168.   2. 106.   0.]]
[[196.13847    86.06076     8.005652   46.032497    0.       ]
 [243.97517    77.789185    7.071744   61.28845     0.       ]
 [131.32097   194.35504     0.        107.6832      0.       ]
 [137.06158   242.39594     0.        124.37069     0.       ]
 [147.9865    273.20584     3.2524502 172.37987     0.       ]]
[[100.       100.       100.       100.         0.      ]
 [ 99.519226 100.        60.000004 100.         0.      ]
 [ 98.039215 100.         0.       100.         0.      ]
 [ 99.08257  100.         0.       100.         0.      ]
 [100.       100.       100.       100.         0.      ]]
```

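The `counts` and `tpm` layers printed above are related by a per-dataset scaling. As a minimal sketch, assuming TPM here means counts rescaled so that each dataset sums to one million across all transcripts, the normalization looks like this in plain Python. The function name `counts_to_tpm` and the counts are hypothetical; this is not Swan's own code and does not use the tutorial's data.

```python
def counts_to_tpm(counts):
    """Scale one dataset's transcript counts so that they sum to one million."""
    total = sum(counts)
    if total == 0:
        # Avoid dividing by zero for a dataset with no counts.
        return [0.0 for _ in counts]
    return [c * 1e6 / total for c in counts]

# Hypothetical counts for one dataset with four transcripts.
counts = [50, 30, 20, 0]
tpm = counts_to_tpm(counts)
print(tpm)
```

Because the scaling factor depends on each dataset's total, the same raw count can correspond to different TPM values in different datasets.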

```python
df = swan.calc_pi(sg.adata, sg.t_df, obs_col='dataset')
```

```python
print(sg.adata.layers['counts'][:5, :5])
print(sg.adata.layers['tpm'][:5, :5])
print(sg.adata.layers['pi'][:5, :5])
```

```
[[ 98.  43.   4.  23.   0.]
 [207.  66.   6.  52.   0.]
 [100. 148.   0.  82.   0.]
 [108. 191.   0.  98.   0.]
 [ 91. 168.   2. 106.   0.]]
[[196.13847    86.06076     8.005652   46.032497    0.       ]
 [243.97517    77.789185    7.071744   61.28845     0.       ]
 [131.32097   194.35504     0.        107.6832      0.       ]
 [137.06158   242.39594     0.        124.37069     0.       ]
 [147.9865    273.20584     3.2524502 172.37987     0.       ]]
[[100.       100.       100.       100.         0.      ]
 [ 99.519226 100.        60.000004 100.         0.      ]
 [ 98.039215 100.         0.       100.         0.      ]
 [ 99.08257  100.         0.       100.         0.      ]
 [100.       100.       100.       100.         0.      ]]
```

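The `pi` layer printed above holds percent isoform use: each transcript's expression as a percentage of the total expression of its gene, so a value of 99.5 means that transcript accounts for about 99.5% of its gene's expression in that dataset. The sketch below illustrates that definition on made-up numbers in plain Python. It is not Swan's implementation, and the function name `percent_isoform`, the transcript IDs, and the values are all hypothetical.

```python
from collections import defaultdict

def percent_isoform(expression, transcript_to_gene):
    """Compute percent isoform use for a single sample.

    expression: dict mapping transcript ID -> expression value (e.g. TPM)
    transcript_to_gene: dict mapping transcript ID -> gene ID
    Returns a dict mapping transcript ID -> percentage of its gene's total.
    """
    # Sum expression over all transcripts belonging to each gene.
    gene_totals = defaultdict(float)
    for tid, value in expression.items():
        gene_totals[transcript_to_gene[tid]] += value

    pi = {}
    for tid, value in expression.items():
        total = gene_totals[transcript_to_gene[tid]]
        # Genes with no expression get 0 rather than a division error.
        pi[tid] = 100.0 * value / total if total > 0 else 0.0
    return pi

# Hypothetical example: geneA has two isoforms, geneB has one unexpressed isoform.
tpm = {'tA1': 30.0, 'tA2': 10.0, 'tB1': 0.0}
t2g = {'tA1': 'geneA', 'tA2': 'geneA', 'tB1': 'geneB'}
print(percent_isoform(tpm, t2g))
```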