# Preliminary work
This session shows how to infer transcription factor activities from single-cell RNA-seq and spatial transcriptomics data using three methods: decoupleR, pySCENIC, and STAN. Please work through this notebook after you have [set up the environment](https://github.com/osmanbeyoglulab/Tutorials-on-ISMB-2024?tab=readme-ov-file#environment-set-up).

In [1]:
import scanpy as sc
import warnings
warnings.filterwarnings("ignore")
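
Silencing every warning also hides deprecation notices that can explain version-related failures later in the session. A narrower alternative, sketched here with the standard library only, ignores one chosen category and lets the rest through. The choice of `FutureWarning` and the helper name `count_visible` are assumptions made for illustration.

```python
import warnings

def count_visible(categories):
    """Emit one warning per category and return how many were not suppressed."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")                            # show everything by default
        warnings.filterwarnings("ignore", category=FutureWarning)  # illustrative choice
        for category in categories:
            warnings.warn("demo", category)
        return len(caught)

visible = count_visible([FutureWarning, UserWarning])
print(visible)
```

Filters set inside `warnings.catch_warnings()` are restored on exit, so the same pattern can wrap a single noisy call without changing the rest of the session.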

## Downloading datasets
The following datasets are used for demonstration. Each call below downloads its dataset via `scanpy` into the directory `data`.

In [2]:
sc.datasets.pbmc3k()





AnnData object with n_obs × n_vars = 2700 × 32738
    var: 'gene_ids'

In [3]:
sc.datasets.pbmc3k_processed()





AnnData object with n_obs × n_vars = 2638 × 1838
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
    var: 'n_cells'
    uns: 'draw_graph', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups'
    obsm: 'X_pca', 'X_tsne', 'X_umap', 'X_draw_graph_fr'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'

In [4]:
sc.datasets.visium_sge(sample_id="V1_Human_Lymph_Node")





AnnData object with n_obs × n_vars = 4035 × 36601
    obs: 'in_tissue', 'array_row', 'array_col'
    var: 'gene_ids', 'feature_types', 'genome'
    uns: 'spatial'
    obsm: 'spatial'
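
To confirm what the three calls wrote to disk, a small standard-library helper can list the files under the download directory. This is only a sketch: the function name `list_downloads` is hypothetical, and it assumes the default location `data` mentioned above (scanpy takes this location from `sc.settings.datasetdir`).

```python
from pathlib import Path

def list_downloads(root="data"):
    """Return (relative path, size in MB) for every file under root, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []                      # nothing downloaded yet
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return [
        (p.relative_to(root).as_posix(), round(p.stat().st_size / 1e6, 2))
        for p in files
    ]

for name, size_mb in list_downloads():
    print(f"{size_mb:8.2f} MB  {name}")
```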

## decoupleR

In [5]:
pip install decoupler

Note: you may need to restart the kernel to use updated packages.


In [6]:
import decoupler
decoupler.__version__

'1.5.0'
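
Because the tools in this notebook are sensitive to package versions, it can help to record all of them in one place. The sketch below uses `importlib.metadata` from the standard library, which reads the installed version without importing the package. The helper name `installed_versions` is hypothetical, and the package list is simply the set of packages this notebook imports.

```python
from importlib import metadata

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(["scanpy", "decoupler", "pyscenic", "squidpy", "statsmodels"])
for name, version in report.items():
    print(f"{name:12s} {version or 'not installed'}")
```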

## pySCENIC

In [7]:
pip install pyscenic



Note: you may need to restart the kernel to use updated packages.


In [8]:
import pyscenic
pyscenic.__version__

'0.12.1'

### Downloading resources and databases

In [9]:
!mkdir -p resources_pyscenic
!curl https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt \
    -o resources_pyscenic/allTFs_hg38.txt
!curl https://resources.aertslab.org/cistarget/motif2tf/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl \
    -o resources_pyscenic/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
!curl https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    -o resources_pyscenic/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
!curl https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather \
    -o resources_pyscenic/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
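
The ranking databases are large, so rerunning the cell repeats long downloads. A minimal sketch of a skip-if-present download in Python follows. The helper `fetch` and its `downloader` parameter are hypothetical names, and `urllib.request.urlretrieve` stands in for `curl`; the URL in the usage comment is the TF list from the cell above.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

def fetch(url, dest_dir, downloader=urlretrieve):
    """Download url into dest_dir unless a non-empty file with that name exists."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)         # same effect as `mkdir -p`
    target = dest_dir / Path(urlparse(url).path).name   # file name taken from the URL path
    if target.exists() and target.stat().st_size > 0:
        return target, False                            # already present, skipped
    downloader(url, str(target))
    return target, True

# Usage (commented out so the sketch does not hit the network when pasted):
# fetch("https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt",
#       "resources_pyscenic")
```

A partially downloaded file would be treated as present by this check, so delete it before retrying an interrupted transfer.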



## STAN
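Install the additional packages that STAN depends on, `squidpy` and `statsmodels`. The pinned versions require Python 3.9 or later; on an older interpreter pip rejects them, as in the output below, and the versions already installed stay in use.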

In [10]:
pip install squidpy==1.5.0

ERROR: Ignored the following versions that require a different python version: 1.3.0 Requires-Python >=3.9; 1.3.1 Requires-Python >=3.9; 1.4.0 Requires-Python >=3.9; 1.4.1 Requires-Python >=3.9; 1.5.0 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement squidpy==1.5.0 (from versions: 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.1.2, 1.2.0, 1.2.1, 1.2.2, 1.2.3)
ERROR: No matching distribution found for squidpy==1.5.0
Note: you may need to restart the kernel to use updated packages.


In [11]:
import squidpy
squidpy.__version__

'1.2.3'

In [12]:
pip install statsmodels==0.14.2

ERROR: Ignored the following versions that require a different python version: 0.14.2 Requires-Python >=3.9
ERROR: Could not find a version that satisfies the requirement statsmodels==0.14.2 (from versions: 0.4.0, 0.4.1, 0.4.3, 0.5.0rc1, 0.5.0, 0.6.0rc1, 0.6.0rc2, 0.6.0, 0.6.1, 0.8.0rc1, 0.8.0, 0.9.0rc1, 0.9.0, 0.10.0rc2, 0.10.0, 0.10.1, 0.10.2, 0.11.0rc1, 0.11.0rc2, 0.11.0, 0.11.1, 0.12.0rc0, 0.12.0, 0.12.1, 0.12.2, 0.13.0rc0, 0.13.0, 0.13.1, 0.13.2, 0.13.3, 0.13.4, 0.13.5, 0.14.0rc0, 0.14.0, 0.14.1)
ERROR: No matching distribution found for statsmodels==0.14.2
Note: you may need to restart the kernel to use updated packages.


In [13]:
import statsmodels
statsmodels.__version__

'0.14.0'
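
The errors above come from pip's `Requires-Python` check: `squidpy==1.5.0` and `statsmodels==0.14.2` need Python 3.9 or later, so on an older interpreter the install fails and the existing versions remain. A hedged sketch of making that choice explicit is shown here. The helper `requirement_for` is hypothetical, and the fallback versions are simply the ones reported by the version checks in this notebook, not a recommendation.

```python
import sys

# package -> (pin that needs Python >= 3.9, version this notebook's environment reported)
PINS = {
    "squidpy": ("1.5.0", "1.2.3"),
    "statsmodels": ("0.14.2", "0.14.0"),
}

def requirement_for(package, python_version=None):
    """Return a pip requirement string suited to the given (major, minor) Python version."""
    major_minor = tuple(python_version or sys.version_info[:2])
    pinned, fallback = PINS[package]
    version = pinned if major_minor >= (3, 9) else fallback
    return f"{package}=={version}"

for package in PINS:
    print(requirement_for(package))   # uses the running interpreter's version
```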

### Downloading supporting files

In [15]:
!mkdir -p resources_stan
!curl https://raw.githubusercontent.com/vitkl/cell2location_paper/1c645a0519f8f27ecef18468cf339d35d99f42e7/notebooks/selected_results/lymph_nodes_analysis/CoLocationModelNB4V2_34clusters_4039locations_10241genes_input_inferred_V4_batch1024_l2_0001_n_comb50_5_cps5_fpc3_alpha001/W_cell_density.csv \
    -o resources_stan/W_cell_density.csv
!curl https://raw.githubusercontent.com/vitkl/cell2location_paper/1c645a0519f8f27ecef18468cf339d35d99f42e7/notebooks/selected_results/lymph_nodes_analysis/CoLocationModelNB4V2_34clusters_4039locations_10241genes_input_inferred_V4_batch1024_l2_0001_n_comb50_5_cps5_fpc3_alpha001/manual_GC_annot.csv \
    -o resources_stan/manual_GC_annot.csv
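
Before using the two files, it is worth checking that each download really is a CSV and not an error page. A small sketch using the standard `csv` module follows; the helper name `peek_csv` is hypothetical, and nothing is assumed about the column names.

```python
import csv
from pathlib import Path

def peek_csv(path, n_rows=3):
    """Return the header and the first n_rows data rows of a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])                              # empty list for an empty file
        rows = [row for _, row in zip(range(n_rows), reader)]  # stop after n_rows
    return header, rows

for name in ("W_cell_density.csv", "manual_GC_annot.csv"):
    path = Path("resources_stan") / name
    if path.exists():                                          # skip if not downloaded yet
        header, rows = peek_csv(path)
        print(name, len(header), "columns;", header[:5])
```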



In [14]:
import session_info
session_info.show()