## Infer Causal Structure on Scanpy Data

#### Structure:
A: Load data from file and inspect its structure

B: Algorithms
1. GRNBoost2
2. GIES
3. DCDI

Dependencies:
 Use a conda environment with:
 - scanpy python-igraph leidenalg

 GRNBoost2:
 - conda install -c bioconda arboreto

 GIES:
 - pip install gies

In [None]:
import numpy as np
import pandas as pd
import scanpy as sc
import matplotlib.pyplot as plt

import scp_infer as scpi

In [None]:
results_file = '../data/edited/Schraivogel_chr8_ad-scaled_10gene.h5ad'  # preprocessed AnnData file that is read below

1. Read File

In [None]:
adata = sc.read_h5ad(results_file)

Check what the count distribution looks like:

In [None]:
# 1st step: extract gene names and cell names from the AnnData object
gene_names = adata.var_names
cell_names = adata.obs_names

#print("Data matrix shape: ", df.shape)
#print("sample: ", df.iloc[0:3,0:3])
print(len(gene_names), "genes: ", list(gene_names[:3]))
print(len(cell_names), "cells: ", list(cell_names[:1]))

# 2nd step: extract metadata from the AnnData object, including perturbation information
metadata = adata.obs
metadata.head()

# Look at more perturbation labels
# print(adata.obs['perturbation'].astype(str).copy()[1000:1020])

In [None]:
# print([i for i in adata.var['mean'][0:10]])
# print([i for i in adata.var['std'][0:10]])
# print corresponding perturbation labels
print('Perturbations: ', [i for i in adata.obs['perturbation'][:10]])

scpi.adata.print_expression_mean_std(adata)
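
A per-gene summary of this kind can be sketched with plain NumPy. This is only an illustration on synthetic counts: the matrix `X`, the gene names, and the Poisson rates are made up here, and it is not necessarily what `scpi.adata.print_expression_mean_std` does internally.

```python
import numpy as np

# Synthetic cells x genes count matrix (hypothetical data, not the Schraivogel file)
rng = np.random.default_rng(0)
X = rng.poisson(lam=[1.0, 5.0, 20.0], size=(200, 3)).astype(float)
gene_names = ["geneA", "geneB", "geneC"]  # placeholder names

# Mean and standard deviation of each gene across all cells
means = X.mean(axis=0)
stds = X.std(axis=0)

for name, m, s in zip(gene_names, means, stds):
    print(f"{name}: mean={m:.2f}, std={s:.2f}")
```

For data that has already been scaled, the means should be close to 0 and the standard deviations close to 1.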

# B. Algorithms

### 1. GRNBoost2

In [None]:
run_GRNBoost = False
if run_GRNBoost:
    grnb = scpi.inference.grnboost2.GRNBoost2Imp(adata, verbose=True)
    grnb.convert_data()
    grnboost_matrix = grnb.infer()
    # Label the plot with the algorithm that produced the matrix
    scpi.visualize.plot_adjacency_matrix(grnboost_matrix, title="GRNBoost2")

### 2. GIES

1. Reshape Count matrix
2. Run GIES


GIES matrix format (samples grouped by intervention target):
- data: n_interventions x n_samples per intervention (truncated to the smallest group) x n_variables
- interventions: 1 x n_interventions

Data distribution:
- scale each variable to mean 0 and std 1

-> intervened values are strongly negative (<< 0)
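
The reshaping described above can be sketched roughly as follows. This is a minimal illustration, not the code behind `GIESImp.convert_data`: the helper `to_gies_format`, the `"control"` label, and the toy data are all assumptions, and "take min." is read here as truncating every group to the size of the smallest one.

```python
import numpy as np

def to_gies_format(X, labels, var_names, control="control"):
    """Hypothetical helper: group samples by intervention target.

    Returns an array of shape (n_interventions, n_min, n_variables)
    and a list with the intervened variable indices of each group.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Scale every variable to mean 0 and std 1 (guard against constant columns)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std

    # Unique targets in order of first appearance
    targets = list(dict.fromkeys(labels.tolist()))

    # Truncate every group to the size of the smallest one so they stack
    n_min = min(int(np.sum(labels == t)) for t in targets)
    data = np.stack([X[labels == t][:n_min] for t in targets])

    # Empty list = observational (control) group, otherwise the target's column index
    interventions = [[] if t == control else [var_names.index(t)] for t in targets]
    return data, interventions

# Toy example: 9 cells, 3 genes, two perturbed genes plus controls
rng = np.random.default_rng(0)
X = rng.normal(size=(9, 3))
labels = ["control"] * 4 + ["geneA"] * 3 + ["geneC"] * 2
data, interventions = to_gies_format(X, labels, ["geneA", "geneB", "geneC"])

print(data.shape)      # (3, 2, 3)
print(interventions)   # [[], [0], [2]]
```

Truncating to the smallest group discards samples from the larger groups; that is the price of a rectangular array in this sketch.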

In [None]:
run_GIES = True
data_GIES = True

In [None]:
if run_GIES or data_GIES:
    gies_imp = scpi.inference.gies.GIESImp(adata, verbose=True)
    gies_imp.convert_data(singularized=False)

In [None]:
# Save the data if it should be used externally
if data_GIES:
    np.save("../data/temp/gies_data_matrix.npy", gies_imp.data_matrix)

    import json

    with open("../data/temp/gies_intervention_list.json", 'w') as f:
        # indent=2 is not needed but makes the file human-readable 
        # if the data is nested
        json.dump(gies_imp.intervention_list, f, indent=2) 
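
An external script could read these two files back with `np.load` and `json.load`. The sketch below demonstrates the round trip on placeholder data in a temporary directory; the array contents and the intervention list are made up, and only the file names mirror the ones used above.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

# Placeholder stand-ins for gies_imp.data_matrix and gies_imp.intervention_list
data_matrix = np.zeros((3, 2, 4))
intervention_list = [[], [0], [2]]

with tempfile.TemporaryDirectory() as tmp:
    matrix_path = Path(tmp) / "gies_data_matrix.npy"
    list_path = Path(tmp) / "gies_intervention_list.json"

    # Write both files the same way as in the cell above
    np.save(matrix_path, data_matrix)
    with open(list_path, "w") as f:
        json.dump(intervention_list, f, indent=2)

    # Read them back, as an external consumer would
    loaded_matrix = np.load(matrix_path)
    with open(list_path) as f:
        loaded_interventions = json.load(f)

print(loaded_matrix.shape, loaded_interventions)
```

JSON only stores plain lists and numbers, so the intervention list must not contain NumPy integer types when it is dumped.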

In [None]:
# Run GIES
if run_GIES:
    gies_matrix = gies_imp.infer(plot=True)
    scpi.visualize.plot_adjacency_matrix(gies_matrix, title="GIES")

### 3. DCDI

In [None]:
run_DCDI = False


In [None]:
if run_DCDI:
    dcdi_imp = scpi.inference.dcdi.DCDIImp(adata, verbose=True)
    dcdi_imp.convert_data()
    dcdi_matrix = dcdi_imp.infer()
    scpi.visualize.plot_adjacency_matrix(dcdi_matrix, title="DCDI")