## **File: per_sample_cNMF.ipynb**
---
Description: Splits the original h5ad file into one AnnData object per Sample ID. The source file is located here: <br>
**<span style="color:magenta">/home/james/data/immune_exclusion_data/outer_combined_all4_dat.h5ad</span>** <br>
The per-sample output is then used by the automated cNMF script: sample_cnmf.sh

#### **Imports and Directories**

In [1]:
import pandas as pd
import scanpy as sc
import MILWRM.ST as st
import sys; sys.path.append("/home/james/git/spatial_CRC_atlas/resources/ST/")
from visium_utils import deconvolve_cnmf
from MILWRM.ST import assemble_pita

# silence FutureWarning messages so the notebook output stays readable when using Scanpy
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [2]:
# make the output directory (no error if it already exists)
import os
os.makedirs(
    "/home/james/data/immune_exclusion_data/seperate_sample_data",
    exist_ok=True,
)
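The directory step and the per-sample file naming used later in this notebook (`sep_{}_sample.h5ad`) can be combined into one small helper. The sketch below is illustrative only: `sample_output_path` is a hypothetical name that does not appear in the notebook, and a temporary directory stands in for the real data path so the example has no side effects.

```python
from pathlib import Path
import tempfile


def sample_output_path(out_dir, sample_id):
    """Hypothetical helper: create out_dir if needed and return the
    per-sample file path following the sep_{id}_sample.h5ad pattern."""
    out_dir = Path(out_dir)
    # parents=True also creates missing parent directories;
    # exist_ok=True makes repeated calls safe
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"sep_{sample_id}_sample.h5ad"


# Demonstration in a throwaway directory instead of the real data path
with tempfile.TemporaryDirectory() as tmp:
    demo_path = sample_output_path(Path(tmp) / "seperate_sample_data", "9142_s2")
    demo_name = demo_path.name
    demo_dir_exists = demo_path.parent.is_dir()

print(demo_name)
```

Building the path in one place keeps the directory name and the file pattern from drifting apart if either is changed later.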

### **Import data**
---
**<span style="color:magenta">/home/james/data/immune_exclusion_data/outer_combined_all4_dat.h5ad</span>** <br>

In [3]:
# Data
immune_adata = sc.read_h5ad('/home/james/data/immune_exclusion_data/outer_combined_all4_dat.h5ad')

### **Data separation**
---
**Description:** Use the Sample ID to subset the data into one AnnData object per sample
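As a quick sanity check on the grouping logic, the per-sample counts can be previewed on the observation table alone, without copying any expression data. This is a minimal sketch on a toy table: the sample IDs and cell-type labels below are made up for illustration, `summarize_samples` is a hypothetical helper, and only the column names `SampleId` and `Cell_Type` are taken from this notebook.

```python
import pandas as pd

# Toy stand-in for immune_adata.obs (values are invented for illustration)
obs = pd.DataFrame({
    "SampleId": ["A_s1", "A_s1", "A_s2", "A_s2", "A_s2"],
    "Cell_Type": ["T cell", "B cell", "T cell", "T cell", "Myeloid"],
})


def summarize_samples(obs):
    """Hypothetical helper: count observations and distinct cell types
    for every sample ID."""
    return obs.groupby("SampleId", sort=False).agg(
        n_obs=("Cell_Type", "size"),
        n_cell_types=("Cell_Type", "nunique"),
    )


summary = summarize_samples(obs)
print(summary)
```

Note that these counts are taken before any gene or cell filtering, so the number of observations can differ from the shapes printed after filtering.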

In [37]:
# Unique Sample IDs
samples = immune_adata.obs['SampleId'].unique()

for i in samples:
    # create a sub-AnnData object containing a single sample
    sub_data = immune_adata[immune_adata.obs['SampleId'] == i].copy()
    # number of distinct cell types in this sample (counted before filtering)
    n_cell_types = len(sub_data.obs['Cell_Type'].unique())
    # remove genes that are not detected in any cell
    sc.pp.filter_genes(sub_data, min_cells = 1)
    # remove cells with 0 total counts
    sc.pp.filter_cells(sub_data, min_counts = 1)
    # write one AnnData object per sample to the new directory (currently disabled)
    #sub_data.write_h5ad('/home/james/data/immune_exclusion_data/seperate_sample_data/sep_{}_sample.h5ad'.format(i))
    # print the structure of each subsample after filtering
    print(f"Sample: {i} \n Shape: {sub_data.shape} \n Number of Cell Types: {n_cell_types}")

Sample: 9142_s2 
 Shape: (5569, 15432) 
 Number of Cell Types: 13
Sample: 9142_s1 
 Shape: (1921, 13896) 
 Number of Cell Types: 6
Sample: 10096_s1 
 Shape: (20871, 15725) 
 Number of Cell Types: 12
Sample: 10096_s2 
 Shape: (831, 13337) 
 Number of Cell Types: 6
Sample: 10096_s3 
 Shape: (10322, 15655) 
 Number of Cell Types: 10
Sample: 10096_s4 
 Shape: (12418, 15421) 
 Number of Cell Types: 8
Sample: 10180_01_s1 
 Shape: (13425, 15143) 
 Number of Cell Types: 11
Sample: 10180_01_s2 
 Shape: (1644, 13394) 
 Number of Cell Types: 5
Sample: 10180_01_s3 
 Shape: (1476, 14100) 
 Number of Cell Types: 9
Sample: 10180_01_s4 
 Shape: (6922, 15275) 
 Number of Cell Types: 12
Sample: 10180_02_s1 
 Shape: (8038, 15091) 
 Number of Cell Types: 10
Sample: 10180_02_s2 
 Shape: (3248, 14188) 
 Number of Cell Types: 9
Sample: 10180_02_s3 
 Shape: (13742, 15594) 
 Number of Cell Types: 12
Sample: 10180_02_s4 
 Shape: (7859, 15325) 
 Number of Cell Types: 12
Sample: 10284_s1 
 Shape: (1618, 12764) 
 