# ExploSig Connect Demo

## Step 1. Install

In [0]:
!pip install explosig-connect

## Step 2. Import

In [0]:
from explosig_connect import connect

## Step 3. Connect!

Executing the `connect` function with no parameters starts a new "empty" ExploSig session. On Colab, a new tab will be opened.

In other Python environments, the `how` parameter selects how the new session is opened:
- `connect(how='nb_link')`: prints the session URL as a link (in a Jupyter notebook).
- `connect(how='nb_js')`: injects JavaScript that automatically opens the session URL (in a Jupyter notebook).
- `connect(how='browser')`: opens the session URL in a new tab of the default browser.

In [0]:
conn = connect()
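
If the same notebook is run in several environments, the choice between the options above can be automated. The sketch below is only an illustration: `choose_connect_kwargs` is a hypothetical helper (not part of `explosig_connect`), and the environment checks are common heuristics rather than guarantees. It does not call `connect` itself, so it runs without the package installed.

```python
import sys


def choose_connect_kwargs():
    """Hypothetical helper: pick keyword arguments for connect().

    The environment detection below is a heuristic, not an official API.
    """
    # On Colab, the text above says connect() with no parameters opens a new tab.
    if "google.colab" in sys.modules:
        return {}

    try:
        from IPython import get_ipython
        shell = get_ipython()  # None when not running under IPython
    except ImportError:
        shell = None

    # A ZMQ-based shell usually indicates a Jupyter notebook kernel.
    if shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell":
        return {"how": "nb_link"}

    # Fall back to opening the default browser (plain Python scripts, terminals).
    return {"how": "browser"}


connect_kwargs = choose_connect_kwargs()
print(connect_kwargs)
```

In the notebook, the result would be passed along as `conn = connect(**connect_kwargs)`.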

## Step 4. Process data

This is where you would run your own analysis to obtain mutational signature data to send to ExploSig for visualization.

To demonstrate this, we present an example in which we reproduce results from Kasar et al. \[1\].

In [0]:
import pandas as pd

# Load data
data_url = "https://raw.githubusercontent.com/keller-mark/explosig-connect/master/examples/data/{filename}"
# - counts data
sbs_counts_nosplit_df = pd.read_csv(data_url.format(filename="counts.DFCI-30-Kasar2015.nosplit.WGS.SBS-96.tsv"), sep='\t', index_col=0)
sbs_counts_nmd1000_df = pd.read_csv(data_url.format(filename="counts.DFCI-30-Kasar2015.nmd1000.WGS.SBS-96.tsv"), sep='\t', index_col=0)
# - signatures and exposures data, preprocessed using the code at
#   https://github.com/keller-mark/Reproducing-Kasar2015
sbs_sigs_nosplit_df = pd.read_csv(data_url.format(filename="nosplit_run_5_W.txt"), sep='\t', index_col=0).transpose()
sbs_sigs_nmd1000_df = pd.read_csv(data_url.format(filename="nmd1000_run_30_W.txt"), sep='\t', index_col=0).transpose()
sbs_exps_nosplit_df = pd.read_csv(data_url.format(filename="nosplit_run_5_H.txt"), sep='\t', index_col=0).transpose()
sbs_exps_nmd1000_df = pd.read_csv(data_url.format(filename="nmd1000_run_30_H.txt"), sep='\t', index_col=0).transpose()


# Generate mutation type counts df
# (only SBS mutations are counted here, so DBS and INDEL are set to 0)
counts_nosplit_df = pd.DataFrame(
    index=sbs_counts_nosplit_df.index.values.tolist(),
    columns=["SBS", "DBS", "INDEL"],
    data=[
        {"SBS": sbs_count, "DBS": 0, "INDEL": 0}
        for sbs_count in sbs_counts_nosplit_df.sum(axis=1).values.tolist()
    ]
)
counts_nmd1000_df = pd.DataFrame(
    index=sbs_counts_nmd1000_df.index.values.tolist(),
    columns=["SBS", "DBS", "INDEL"],
    data=[
        {"SBS": sbs_count, "DBS": 0, "INDEL": 0}
        for sbs_count in sbs_counts_nmd1000_df.sum(axis=1).values.tolist()
    ]
)

# Generate samples metadata df
sample_metadata_nosplit_df = pd.DataFrame(
    index=sbs_counts_nosplit_df.index.values.tolist(),
    columns=["Study"],
    data=[ {"Study": "DFCI-30-Kasar2015"} for sample_id in sbs_counts_nosplit_df.index.values.tolist() ]
)
sample_metadata_nmd1000_df = pd.DataFrame(
    index=sbs_counts_nmd1000_df.index.values.tolist(),
    columns=["Study"],
    data=[ {"Study": "DFCI-30-Kasar2015"} for sample_id in sbs_counts_nmd1000_df.index.values.tolist() ]
)
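
To make the shape of the mutation type counts table easier to see, here is a small sketch on made-up data. The sample names, category labels, and counts are invented for illustration and are not taken from the Kasar et al. files. It builds the same three-column layout as the cell above, using a per-sample row sum and scalar broadcasting instead of a list of dicts.

```python
import pandas as pd

# Toy counts matrix (hypothetical values): rows are samples,
# columns are SBS mutation categories.
toy_sbs_counts_df = pd.DataFrame(
    data=[[3, 0, 2], [1, 4, 1]],
    index=["sample_A", "sample_B"],
    columns=["A[C>A]A", "A[C>A]C", "A[C>A]G"],
)

# One row per sample: total SBS count, with DBS and INDEL fixed at 0
# because only SBS data is available in this example.
toy_counts_df = pd.DataFrame({
    "SBS": toy_sbs_counts_df.sum(axis=1),
    "DBS": 0,
    "INDEL": 0,
})

print(toy_counts_df)
```

The resulting frame is indexed by sample ID and has the columns `SBS`, `DBS`, and `INDEL`, matching `counts_nosplit_df` and `counts_nmd1000_df`.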

## Step 5. Send data

In [0]:
def send_nosplit_data():
    conn.send_sample_metadata(sample_metadata_nosplit_df)
    conn.send_mutation_type_counts(counts_nosplit_df)
    conn.send_signatures('SBS', sbs_sigs_nosplit_df)
    conn.send_exposures('SBS', sbs_exps_nosplit_df)

def send_nmd1000_data():
    conn.send_sample_metadata(sample_metadata_nmd1000_df)
    conn.send_mutation_type_counts(counts_nmd1000_df)
    conn.send_signatures('SBS', sbs_sigs_nmd1000_df)
    conn.send_exposures('SBS', sbs_exps_nmd1000_df)


# Uncomment the following line to send non-split data
#send_nosplit_data()

# Send split data
send_nmd1000_data()
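
Before calling the send functions above, it can help to check that the data frames agree with each other. The sketch below assumes, based on the transposes in the loading cell, that signatures are rows of the signatures frame and columns of the exposures frame, and that exposures and counts are both indexed by sample ID. `find_shape_problems` is a hypothetical helper, not part of `explosig_connect`, and the toy values are invented.

```python
import pandas as pd


def find_shape_problems(sigs_df, exps_df, counts_df):
    """Hypothetical check: return a list of human-readable inconsistencies."""
    problems = []
    # Assumption: signature names index the signatures frame
    # and label the columns of the exposures frame.
    if list(sigs_df.index) != list(exps_df.columns):
        problems.append("signature names differ between signatures and exposures")
    # Assumption: every sample with exposures also has a row in the counts frame.
    missing = set(exps_df.index) - set(counts_df.index)
    if missing:
        problems.append(f"samples missing from counts: {sorted(missing)}")
    return problems


# Toy frames with made-up values.
toy_sigs_df = pd.DataFrame(
    [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]],
    index=["Sig1", "Sig2"],
    columns=["A[C>A]A", "A[C>A]C", "A[C>A]G"],
)
toy_exps_df = pd.DataFrame(
    [[4.0, 1.0], [2.0, 4.0]],
    index=["sample_A", "sample_B"],
    columns=["Sig1", "Sig2"],
)
toy_counts_df = pd.DataFrame(
    {"SBS": [5, 6], "DBS": 0, "INDEL": 0},
    index=["sample_A", "sample_B"],
)

good_result = find_shape_problems(toy_sigs_df, toy_exps_df, toy_counts_df)
bad_result = find_shape_problems(
    toy_sigs_df, toy_exps_df.rename(columns={"Sig2": "SigX"}), toy_counts_df
)
print(good_result)
print(bad_result)
```

With the real data, the call would be `find_shape_problems(sbs_sigs_nmd1000_df, sbs_exps_nmd1000_df, counts_nmd1000_df)`. An empty list means these particular checks found nothing, not that the data is guaranteed to be in the format ExploSig expects.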


## References
\[1\] Kasar S, Kim J, Improgo R, Tiao G, Polak P, Haradhvala N, Lawrence MS, Kiezun A, Fernandes SM, Bahl S, Sougnez C, et al. Whole-genome sequencing reveals activation-induced cytidine deaminase signatures during indolent chronic lymphocytic leukaemia evolution. _Nature Communications_. 2015 Dec 7;6:8866.