In [1]:
import sys
# add the path to basiss
sys.path.append('../')
from basiss.preprocessing import Sample 

import pickle as pkl

The experimental data consists of several tightly linked layers of information. The main parts are:

1) BaSISS (mut) or ISS (exp and imm) decoded signals
2) Background tissue image (DAPI)
3) Segmented nuclei locations
4) Selected regions of interest

In addition, because of the large image size and decoding limitations, some large images are split into 'Top' and 'Bottom' parts. These must be registered back onto the whole-slide DAPI background.

To make downstream analysis easier, we store these layers in a single `basiss.preprocessing.Sample` object, which represents a fluorescent imaging experiment on a single tissue slide together with its metadata.

To create a sample and register signals, run:
```
from basiss.preprocessing import Sample 
sample = Sample(iss_data='path_to_decoded_iss_data',
                image='path_to_original_image_used_in_decoding',
                cell_data='path_to_segmented_nuclei_position',
                masks_svg='path_to_regions_as_svg')
sample.transform_points2background('path_to_full_background_DAPI')
```
If a split sample needs to be combined, run the code above for each part (here `sample_top` and `sample_bottom`), then run:
```
import copy
sample = copy.deepcopy(sample_top.add_gene_data(sample_bottom))
```

# PD9694 (Case 1)
Case 1 consists of two oestrogen receptor-positive, HER2-negative primary invasive breast cancers (PBC) within a 5 cm bed of DCIS. We sampled both PBCs (PD9694a,c or ER1/ER2) and three regions from the DCIS (PD9694d,l,m or D1, D2 and D3).

For all tissue samples, we have **main BaSISS** (mut), **ISS oncology** (exp) and **ISS immune** (imm). In addition to the main BaSISS layer (R1), for samples PD9694d,a,c (D1, ER1, ER2) we have a **validation BaSISS** technical replicate performed on consecutive slides (R0).

## BaSISS main

In [2]:
# fusing split images

labels = ['d', 'd', 'm', 'm']
sections = ['Top', 'Bottom', 'Top', 'Bottom']
masks_svgs = ['../submission/external_data/PD9694/regions/Mut_PD9694d_path.svg',
              '../submission/external_data/PD9694/regions/Mut_PD9694d_path.svg',
              None,
              None]

sample_list = []
for i in range(len(labels)):
    print(f'MutR1_PD9694{labels[i]}')
    sample = Sample(iss_data=f'../submission/external_data/PD9694/GMM_decoding/decoding/MutR1_PD9694{labels[i]}_{sections[i]}_GMMdecoding.csv',
                    image=f'../submission/external_data/PD9694/GMM_decoding/restored_DAPI/MutR1_PD9694{labels[i]}_{sections[i]}.tif',
                    cell_data=f'../submission/external_data/PD9694/cell_segmentation/MutR1_PD9694{labels[i]}_cellpos.csv',
                    masks_svg=masks_svgs[i])
    #map on the scaffold full image
    sample.transform_points2background(f'../submission/external_data/PD9694/DAPI_background/MutR1_PD9694{labels[i]}.tif', upsampling=15)
    sample_list.append(sample)

sample_list[0].add_gene_data(sample_list[1])
sample_list[2].add_gene_data(sample_list[3])
d_mut_sample = sample_list[0]
m_mut_sample = sample_list[2]


MutR1_PD9694d
image load complete
good matches 495 / 3000
MutR1_PD9694d
image load complete
good matches 418 / 3000
MutR1_PD9694m
image load complete
good matches 859 / 3000
MutR1_PD9694m
image load complete
good matches 682 / 3000
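
Every loading cell in this notebook builds the same four input paths from a case ID, a panel prefix, a sample label and, for split slides, a section name. The sketch below collects that naming pattern in one place. `sample_paths` and `ROOT` are hypothetical names introduced here for illustration and are not part of `basiss`; the directory layout is copied from the paths used in the cells.

```python
ROOT = '../submission/external_data'  # data root used throughout this notebook

def sample_paths(case, prefix, label, section=None, root=ROOT):
    """Return the input paths for one sample (hypothetical helper, not a basiss function)."""
    stem = f'{prefix}_{case}{label}'
    # Only the decoding inputs carry the '_Top' / '_Bottom' suffix;
    # cell positions and the DAPI background are per whole slide.
    part = f'{stem}_{section}' if section else stem
    base = f'{root}/{case}'
    return {
        'iss_data': f'{base}/GMM_decoding/decoding/{part}_GMMdecoding.csv',
        'image': f'{base}/GMM_decoding/restored_DAPI/{part}.tif',
        'cell_data': f'{base}/cell_segmentation/{stem}_cellpos.csv',
        'background': f'{base}/DAPI_background/{stem}.tif',
    }

paths = sample_paths('PD9694', 'MutR1', 'd', section='Top')
background = paths.pop('background')
for name, path in paths.items():
    print(name, path)

# With basiss importable, the remaining keys match the Sample arguments used above:
# sample = Sample(masks_svg=None, **paths)
# sample.transform_points2background(background, upsampling=15)
```

Keeping the pattern in one function means a change to the directory layout only has to be made once, rather than in every cell.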


In [3]:
labels = ['a', 'c', 'l']
masks_svgs = ['../submission/external_data/PD9694/regions/Mut_PD9694a_path.svg',
              '../submission/external_data/PD9694/regions/Mut_PD9694c_path.svg',
              '../submission/external_data/PD9694/regions/Mut_PD9694l_path.svg']

sample_list = []
for i in range(len(labels)):
    print(f'MutR1_PD9694{labels[i]}')
    sample = Sample(iss_data=f'../submission/external_data/PD9694/GMM_decoding/decoding/MutR1_PD9694{labels[i]}_GMMdecoding.csv',
                    image=f'../submission/external_data/PD9694/GMM_decoding/restored_DAPI/MutR1_PD9694{labels[i]}.tif',
                    cell_data=f'../submission/external_data/PD9694/cell_segmentation/MutR1_PD9694{labels[i]}_cellpos.csv',
                    masks_svg=masks_svgs[i])
    #map on the scaffold image (in case decoding image was shifted)
    sample.transform_points2background(f'../submission/external_data/PD9694/DAPI_background/MutR1_PD9694{labels[i]}.tif', upsampling=15)
    sample_list.append(sample)

MutR1_PD9694a
image load complete
good matches 725 / 3000
MutR1_PD9694c
image load complete
good matches 947 / 3000
MutR1_PD9694l
image load complete
good matches 2613 / 3000
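
The `good matches N / 3000` lines vary a lot between samples, so it can help to tabulate them after a run. The sketch below parses printed output of the form shown above into (sample name, fraction) pairs. It assumes only the three-line pattern visible in this notebook; what counts as an acceptable fraction is not stated here, so no threshold is applied. `match_fractions` is a hypothetical helper, not part of `basiss`.

```python
import re

MATCH_LINE = re.compile(r'good matches (\d+) / (\d+)')

def match_fractions(log_text):
    """Pair each printed sample name with its 'good matches' fraction."""
    results = []
    name = None
    for line in log_text.splitlines():
        line = line.strip()
        found = MATCH_LINE.fullmatch(line)
        if found:
            good, total = map(int, found.groups())
            results.append((name, good / total))
        elif line and line != 'image load complete':
            # Any other non-empty line is treated as the sample name printed by the loop
            name = line
    return results

# Output copied from the cell above
log = """MutR1_PD9694a
image load complete
good matches 725 / 3000
MutR1_PD9694c
image load complete
good matches 947 / 3000
MutR1_PD9694l
image load complete
good matches 2613 / 3000"""

for name, fraction in match_fractions(log):
    print(f'{name}: {fraction:.2f}')
```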


In [4]:
# list of BaSISS R1 samples
mut_sample_list = [d_mut_sample] + sample_list[:] + [m_mut_sample]

## BaSISS Validation

In [5]:
labels = ['d', 'a', 'c']
masks_svgs = [None,
              None,
              None]

val_sample_list = []
for i in range(len(labels)):
    print(f'MutR0_PD9694{labels[i]}')
    sample = Sample(iss_data=f'../submission/external_data/PD9694/GMM_decoding/decoding/MutR0_PD9694{labels[i]}_GMMdecoding.csv',
                    image=f'../submission/external_data/PD9694/GMM_decoding/restored_DAPI/MutR0_PD9694{labels[i]}.tif',
                    cell_data=f'../submission/external_data/PD9694/cell_segmentation/MutR0_PD9694{labels[i]}_cellpos.csv',
                    masks_svg=masks_svgs[i])
    #map on the scaffold image (in case decoding image was shifted)
    sample.transform_points2background(f'../submission/external_data/PD9694/DAPI_background/MutR0_PD9694{labels[i]}.tif', upsampling=15)
    val_sample_list.append(sample)

MutR0_PD9694d
image load complete
good matches 1110 / 3000
MutR0_PD9694a
image load complete
good matches 1297 / 3000
MutR0_PD9694c
image load complete
good matches 1037 / 3000


## ISS oncology and immune

In [6]:
labels = ['d', 'a', 'c', 'l', 'm']
masks_svgs = [f'../submission/external_data/PD9694/regions/Exp_PD9694{label}_path.svg' for label in labels[:-1]] + [None]

exp_sample_list = []
for i in range(len(labels)):
    print(f'Exp_PD9694{labels[i]}')
    sample = Sample(iss_data=f'../submission/external_data/PD9694/GMM_decoding/decoding/Exp_PD9694{labels[i]}_GMMdecoding.csv',
                    image=f'../submission/external_data/PD9694/GMM_decoding/restored_DAPI/Exp_PD9694{labels[i]}.tif',
                    cell_data=f'../submission/external_data/PD9694/cell_segmentation/Exp_PD9694{labels[i]}_cellpos.csv',
                    masks_svg=masks_svgs[i])
    #map on the scaffold image (in case decoding image was shifted)
    sample.transform_points2background(f'../submission/external_data/PD9694/DAPI_background/Exp_PD9694{labels[i]}.tif', upsampling=15)
    exp_sample_list.append(sample)

Exp_PD9694d
image load complete
good matches 1871 / 3000
Exp_PD9694a
image load complete
good matches 2498 / 3000
Exp_PD9694c
image load complete
good matches 2659 / 3000
Exp_PD9694l
image load complete
good matches 2903 / 3000
Exp_PD9694m
image load complete
good matches 1385 / 3000


In [7]:
labels = ['m', 'm']
sections = ['Top', 'Bottom']
masks_svgs = [None,
              None]

sample_list = []
for i in range(len(labels)):
    print(f'Imm_PD9694{labels[i]}')
    sample = Sample(iss_data=f'../submission/external_data/PD9694/GMM_decoding/decoding/Imm_PD9694{labels[i]}_{sections[i]}_GMMdecoding.csv',
                    image=f'../submission/external_data/PD9694/GMM_decoding/restored_DAPI/Imm_PD9694{labels[i]}_{sections[i]}.tif',
                    cell_data=f'../submission/external_data/PD9694/cell_segmentation/Imm_PD9694{labels[i]}_cellpos.csv',
                    masks_svg=masks_svgs[i])
    #map on the scaffold full image
    sample.transform_points2background(f'../submission/external_data/PD9694/DAPI_background/Imm_PD9694{labels[i]}.tif', upsampling=15)
    sample_list.append(sample)
    
sample_list[0].add_gene_data(sample_list[1])
m_imm_sample = sample_list[0]

Imm_PD9694m
image load complete
good matches 1474 / 3000
Imm_PD9694m
image load complete
good matches 1016 / 3000


In [8]:
labels = ['d', 'a', 'c', 'l']
masks_svgs = [f'../submission/external_data/PD9694/regions/Exp_PD9694{label}_path.svg' for label in labels]

imm_sample_list = []
for i in range(len(labels)):
    print(f'Imm_PD9694{labels[i]}')
    sample = Sample(iss_data=f'../submission/external_data/PD9694/GMM_decoding/decoding/Imm_PD9694{labels[i]}_GMMdecoding.csv',
                    image=f'../submission/external_data/PD9694/GMM_decoding/restored_DAPI/Imm_PD9694{labels[i]}.tif',
                    cell_data=f'../submission/external_data/PD9694/cell_segmentation/Imm_PD9694{labels[i]}_cellpos.csv',
                    masks_svg=masks_svgs[i])
    #map on the scaffold image (in case decoding image was shifted)
    sample.transform_points2background(f'../submission/external_data/PD9694/DAPI_background/Imm_PD9694{labels[i]}.tif', upsampling=15)
    imm_sample_list.append(sample)


Imm_PD9694d
image load complete
good matches 1014 / 3000
Imm_PD9694a
image load complete
good matches 908 / 3000
Imm_PD9694c
image load complete
good matches 1218 / 3000
Imm_PD9694l
image load complete
good matches 2551 / 3000


In [9]:
imm_sample_list = imm_sample_list + [m_imm_sample]


Save all objects as a pickled dictionary, for easy access

In [10]:
#save all the data

saved_list = {'imm_sample_list':imm_sample_list, 'exp_sample_list':exp_sample_list, 'mut_sample_list':mut_sample_list, 'val_sample_list':val_sample_list}

with open('../submission/generated_data/data_structures/data_case1_saved.pkl', 'wb') as file:
    pkl.dump(saved_list, file)
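
Reading the dictionary back in a later notebook is the mirror image of the cell above. A minimal sketch follows; `load_saved` is a hypothetical helper name. It runs on stand-in data (lists of labels instead of `Sample` objects) written to a temporary file, so it works without `basiss`. For the real file, `basiss` must be importable, because pickle restores objects by looking up their class.

```python
import pickle as pkl
import tempfile
from pathlib import Path

def load_saved(path):
    """Load a pickled dictionary of sample lists."""
    with open(path, 'rb') as file:
        return pkl.load(file)

# Stand-in for the Case 1 dictionary: same keys and list lengths as above,
# but plain labels in place of Sample objects.
stand_in = {
    'imm_sample_list': ['d', 'a', 'c', 'l', 'm'],
    'exp_sample_list': ['d', 'a', 'c', 'l', 'm'],
    'mut_sample_list': ['d', 'a', 'c', 'l', 'm'],
    'val_sample_list': ['d', 'a', 'c'],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'data_case1_saved.pkl'
    with open(path, 'wb') as file:
        pkl.dump(stand_in, file)
    loaded = load_saved(path)

for key, samples in loaded.items():
    print(key, len(samples))

# For the real data, with basiss on sys.path:
# loaded = load_saved('../submission/generated_data/data_structures/data_case1_saved.pkl')
```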

# PD14780 (Case 2)
Case 2 includes two PBCs of ‘triple-negative’ subtype (lacking oestrogen, progesterone and HER2 receptors). We sampled both PBCs (TN1/TN2) and an axillary lymph node that contained metastatic deposits (sample LN1).

For all tissue samples, we have **main BaSISS** (mut), **ISS oncology** (exp) and **ISS immune** (imm).

## BaSISS 

In [11]:
labels = ['a', 'd', 'e']
masks_svgs = ['../submission/external_data/PD14780/regions/Mut_PD14780a_path.svg',
              None,
              '../submission/external_data/PD14780/regions/Mut_PD14780e_path.svg']

mut_sample_list = []
for i in range(len(labels)):
    print(f'Mut_PD14780{labels[i]}')
    sample = Sample(iss_data=f'../submission/external_data/PD14780/GMM_decoding/decoding/Mut_PD14780{labels[i]}_GMMdecoding.csv',
                    image=f'../submission/external_data/PD14780/GMM_decoding/restored_DAPI/Mut_PD14780{labels[i]}.tif',
                    cell_data=f'../submission/external_data/PD14780/cell_segmentation/Mut_PD14780{labels[i]}_cellpos.csv',
                    masks_svg=masks_svgs[i])
    #map on the scaffold image (in case decoding image was shifted)
    sample.transform_points2background(f'../submission/external_data/PD14780/DAPI_background/Mut_PD14780{labels[i]}.tif', upsampling=15)
    mut_sample_list.append(sample)

Mut_PD14780a
image load complete
good matches 2552 / 3000
Mut_PD14780d
image load complete
good matches 2693 / 3000
Mut_PD14780e
image load complete
good matches 3000 / 3000


## ISS oncology and immune

In [12]:
labels = ['a', 'd', 'e']
masks_svgs = ['../submission/external_data/PD14780/regions/Exp_PD14780a_path.svg',
              None,
              '../submission/external_data/PD14780/regions/Exp_PD14780e_path.svg']
              

exp_sample_list = []
for i in range(len(labels)):
    print(f'Exp_PD14780{labels[i]}')
    sample = Sample(iss_data=f'../submission/external_data/PD14780/GMM_decoding/decoding/Exp_PD14780{labels[i]}_GMMdecoding.csv',
                    image=f'../submission/external_data/PD14780/GMM_decoding/restored_DAPI/Exp_PD14780{labels[i]}.tif',
                    cell_data=f'../submission/external_data/PD14780/cell_segmentation/Exp_PD14780{labels[i]}_cellpos.csv',
                    masks_svg=masks_svgs[i])
    #map on the scaffold image (in case decoding image was shifted)
    sample.transform_points2background(f'../submission/external_data/PD14780/DAPI_background/Exp_PD14780{labels[i]}.tif', upsampling=15)
    exp_sample_list.append(sample)


Exp_PD14780a
image load complete
good matches 2324 / 3000
Exp_PD14780d
image load complete
good matches 1670 / 3000
Exp_PD14780e
image load complete
good matches 1505 / 3000


In [13]:
labels = ['a', 'd', 'e']
masks_svgs = ['../submission/external_data/PD14780/regions/Imm_PD14780a_path.svg',
              None,
              '../submission/external_data/PD14780/regions/Imm_PD14780e_path.svg']
imm_sample_list = []
for i in range(len(labels)):
    print(f'Imm_PD14780{labels[i]}')
    sample = Sample(iss_data=f'../submission/external_data/PD14780/GMM_decoding/decoding/Imm_PD14780{labels[i]}_GMMdecoding.csv',
                    image=f'../submission/external_data/PD14780/GMM_decoding/restored_DAPI/Imm_PD14780{labels[i]}.tif',
                    cell_data=f'../submission/external_data/PD14780/cell_segmentation/Imm_PD14780{labels[i]}_cellpos.csv',
                    masks_svg=masks_svgs[i])
    #map on the scaffold image (in case decoding image was shifted)
    sample.transform_points2background(f'../submission/external_data/PD14780/DAPI_background/Imm_PD14780{labels[i]}.tif', upsampling=15)
    imm_sample_list.append(sample)


Imm_PD14780a
image load complete
good matches 2566 / 3000
Imm_PD14780d
image load complete
good matches 1635 / 3000
Imm_PD14780e
image load complete
good matches 2536 / 3000


Save all objects as a pickled dictionary, for easy access

In [14]:
saved_list = {'imm_sample_list':imm_sample_list, 'exp_sample_list':exp_sample_list, 'mut_sample_list':mut_sample_list}

with open('../submission/generated_data/data_structures/data_case2_saved.pkl', 'wb') as file:
    pkl.dump(saved_list, file)
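
Because names such as `mut_sample_list` and `exp_sample_list` are reused for both cases, running cells out of order can leave the wrong samples in a list. A cheap guard is to compare list lengths against what the cells above are expected to produce before saving. The sketch below is one way to do that; `EXPECTED_COUNTS` and `count_mismatches` are hypothetical names, and the counts are read off the `labels` lists in this notebook.

```python
# Expected number of samples per list, taken from the labels used in each cell
EXPECTED_COUNTS = {
    'case1': {'mut_sample_list': 5, 'exp_sample_list': 5,
              'imm_sample_list': 5, 'val_sample_list': 3},
    'case2': {'mut_sample_list': 3, 'exp_sample_list': 3,
              'imm_sample_list': 3},
}

def count_mismatches(saved, expected):
    """Return {key: (found, expected)} for each list that is missing or has the wrong length."""
    problems = {}
    for key, n_expected in expected.items():
        n_found = len(saved[key]) if key in saved else None
        if n_found != n_expected:
            problems[key] = (n_found, n_expected)
    return problems

# Stand-in for the Case 2 dictionary, with labels in place of Sample objects
case2_stand_in = {
    'mut_sample_list': ['a', 'd', 'e'],
    'exp_sample_list': ['a', 'd', 'e'],
    'imm_sample_list': ['a', 'd', 'e'],
}

print(count_mismatches(case2_stand_in, EXPECTED_COUNTS['case2']))
# A dictionary carrying Case 1 lists by mistake would be reported here
print(count_mismatches({'mut_sample_list': list('dacl')}, EXPECTED_COUNTS['case2']))
```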