# File format and folder structure

The folder structure of this project is designed so that no juggling of folders and paths is needed when using this code. After downloading the data files from the given GitHub links, all you need to do is place the .zip files, or their extracted contents, in the right folders inside the repository's `datasets/` folder.
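If you prefer to extract an archive yourself, a minimal sketch with Python's standard `zipfile` module could look like the following. The `datasets/<name>/` target layout and the helper name `extract_to_datasets` are assumptions for illustration only; `handle_raw_data` (used below) may expect the archives in a different location.

```python
import zipfile
from pathlib import Path


def extract_to_datasets(zip_path, datasets_dir, name):
    """Hypothetical helper: unpack one downloaded archive into datasets/<name>/."""
    target = Path(datasets_dir) / name  # assumed per-dataset folder, e.g. datasets/kumar
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)  # keeps the archive's internal folder structure
    return target
```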

Images can be in any format that cv2 can read, and the masks use the same format as in the HoVer-Net paper:

- .mat files
- 'inst_map' is the key for accessing the nuclei instance maps
- 'type_map' is the key for accessing the nuclei type maps, if the dataset contains them.
- For now, only consep and pannuke have type maps, so the 'type_map' key works only on them.
- The only difference from the HoVer-Net format is that nuclei centroids and instance types are not saved to the .mat files.
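A minimal sketch of writing and reading masks in this format with `scipy.io`, assuming (as in HoVer-Net) that 0 marks background in `inst_map`. Because centroids and instance types are not stored, it also shows one way to recompute them. `save_mask`, `load_mask` and `centroids_and_types` are hypothetical helpers, and taking the majority type inside each nucleus is an assumption of this example.

```python
import numpy as np
from scipy.io import loadmat, savemat


def save_mask(path, inst_map, type_map=None):
    """Hypothetical helper: write masks under the 'inst_map' / 'type_map' keys."""
    data = {"inst_map": np.asarray(inst_map, dtype=np.int32)}
    if type_map is not None:
        data["type_map"] = np.asarray(type_map, dtype=np.int32)
    savemat(path, data)


def load_mask(path):
    """Return (inst_map, type_map); type_map is None if the file has no such key."""
    mat = loadmat(path)
    return mat["inst_map"], mat.get("type_map")


def centroids_and_types(inst_map, type_map=None):
    """Recompute per-nucleus centroids (x, y) and majority types, which are not saved."""
    centroids, types = {}, {}
    for inst_id in np.unique(inst_map):
        if inst_id == 0:  # assumed background label
            continue
        region = inst_map == inst_id
        ys, xs = np.nonzero(region)
        centroids[int(inst_id)] = (float(xs.mean()), float(ys.mean()))
        if type_map is not None:
            # assumed rule: the most frequent type value inside the nucleus wins
            vals, counts = np.unique(type_map[region], return_counts=True)
            types[int(inst_id)] = int(vals[np.argmax(counts)])
    return centroids, types
```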

Running these cells with `overlays=True` also draws the instance and type map overlays into their corresponding folders for further visual inspection.
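The repository's overlay routine is not shown here. As a rough, numpy-only sketch of the idea, instance boundaries can be taken as nucleus pixels whose label differs from a horizontal or vertical neighbour, and then painted onto a copy of the image. The function names and the green default colour are illustrative choices, not the repository's.

```python
import numpy as np


def instance_boundaries(inst_map):
    """Hypothetical helper: True on nucleus pixels that touch a different label."""
    b = np.zeros(inst_map.shape, dtype=bool)
    diff_h = inst_map[:, 1:] != inst_map[:, :-1]  # label change between columns
    diff_v = inst_map[1:, :] != inst_map[:-1, :]  # label change between rows
    # mark the pixels on both sides of every label change
    b[:, 1:] |= diff_h
    b[:, :-1] |= diff_h
    b[1:, :] |= diff_v
    b[:-1, :] |= diff_v
    return b & (inst_map > 0)  # keep only pixels that belong to a nucleus


def draw_overlay(image, inst_map, color=(0, 255, 0)):
    """Return a copy of an (H, W, 3) image with instance outlines painted in `color`."""
    overlay = image.copy()
    overlay[instance_boundaries(inst_map)] = color
    return overlay
```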

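```python
import sys

# make the parent folder and ../utils importable from this notebook
sys.path.append("..")
sys.path.append("../utils")

# these star imports supply ProjectFileManager and conf used below
from file_manager import *
from config import *
```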

```python
# build the project file manager from the loaded config
fm = ProjectFileManager.from_conf(conf)
```

# Kumar

This moves the raw data to its own folder, writes the .mat masks, and copies the images to their own folders.

```python
kumar_raw_data_dir = conf['paths']['raw_data_dirs']['kumar']
# keep the .zip files (rm_zips=False) and draw the overlays (overlays=True)
fm.handle_raw_data("kumar", kumar_raw_data_dir, rm_zips=False, overlays=True)
```
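The Kumar conversion code itself is not shown. As a hedged sketch, assuming the annotations have already been parsed into one polygon (a list of (x, y) vertices) per nucleus, an array for the 'inst_map' key can be rasterized with an even-odd (ray casting) test on pixel centres. `polygon_to_mask` and `polygons_to_inst_map` are hypothetical names, and where polygons overlap, the later one simply overwrites the earlier one.

```python
import numpy as np


def polygon_to_mask(vertices, shape):
    """Even-odd rasterization: True where a pixel centre lies inside the polygon."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    px, py = xs + 0.5, ys + 0.5  # pixel centres
    inside = np.zeros(shape, dtype=bool)
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # does this edge straddle the horizontal line through the pixel centre?
        crosses = (y0 > py) != (y1 > py)
        with np.errstate(divide="ignore", invalid="ignore"):
            # horizontal edges give inf/nan here, but they never satisfy `crosses`
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        # toggle pixels whose rightward ray crosses this edge
        inside ^= crosses & (px < x_cross)
    return inside


def polygons_to_inst_map(polygons, shape):
    """Hypothetical helper: label nucleus k (1-based) with value k; 0 stays background."""
    inst_map = np.zeros(shape, dtype=np.int32)
    for k, poly in enumerate(polygons, start=1):
        inst_map[polygon_to_mask(poly, shape)] = k
    return inst_map
```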

# Consep

This moves the raw original data to its own folder and copies the .mat masks and images to the right folders.

```python
consep_raw_data_dir = conf['paths']['raw_data_dirs']['consep']
fm.handle_raw_data("consep", consep_raw_data_dir, rm_zips=False, overlays=True)
```
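`handle_raw_data` does this copying for you. Purely to illustrate the step, a standard-library sketch could sort files into `images/` and `labels/` by extension, mirroring the `train/images` and `train/labels` layout that `fm.data_dirs` reports further down. The helper name `copy_split`, and the assumption that images are `.png` and masks are `.mat`, are made up for this example.

```python
import shutil
from pathlib import Path


def copy_split(src_dir, dst_dir):
    """Hypothetical helper: copy .png files to dst/images and .mat files to dst/labels."""
    src, dst = Path(src_dir), Path(dst_dir)
    (dst / "images").mkdir(parents=True, exist_ok=True)
    (dst / "labels").mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.rglob("*")):  # list is built before copying starts
        suffix = f.suffix.lower()
        if suffix == ".png":  # assumed image extension
            target = dst / "images" / f.name
        elif suffix == ".mat":  # assumed mask extension
            target = dst / "labels" / f.name
        else:
            continue  # skips folders and unrelated files
        shutil.copy2(f, target)
        copied.append(target)
    return copied
```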

# Pannuke

This converts the .npy files to .png images and .mat masks and moves them to their corresponding folders.
- Takes a couple of minutes.

```python
pannuke_raw_data_dir = conf['paths']['raw_data_dirs']['pannuke']
fm.handle_raw_data("pannuke", pannuke_raw_data_dir, rm_zips=False, overlays=True)
```
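As a sketch of the mask half of that conversion, assume each PanNuke mask is an (H, W, 6) array whose first five channels hold per-type nucleus instance ids and whose last channel is background. One patch can then be turned into an `inst_map` and a `type_map` as below. The function name and the 1-to-5 type numbering (channel index + 1) are assumptions; the repository's converter may number types differently.

```python
import numpy as np


def pannuke_mask_to_maps(mask):
    """Hypothetical helper: split one assumed (H, W, 6) mask into inst_map and type_map."""
    h, w, _ = mask.shape
    inst_map = np.zeros((h, w), dtype=np.int32)
    type_map = np.zeros((h, w), dtype=np.int32)
    next_id = 1
    for channel in range(5):  # channels 0-4: nucleus types; channel 5 (background) skipped
        layer = mask[..., channel]
        for old_id in np.unique(layer):
            if old_id == 0:
                continue
            region = layer == old_id
            inst_map[region] = next_id  # relabel so ids are unique across channels
            type_map[region] = channel + 1  # assumed numbering: type = channel index + 1
            next_id += 1
    return inst_map, type_map
```

Because the relabelling walks the channels in order, instance ids end up grouped by type; that is a side effect of this sketch, not something the text above guarantees.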

```python
dd = fm.data_dirs["pannuke"]
```

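```python
from omegaconf.dictconfig import DictConfig

# the fm.data_dirs entries are omegaconf DictConfig objects, not plain dicts
print(type(dd))
isinstance(dd, DictConfig)
```

```
<class 'omegaconf.dictconfig.DictConfig'>
True
```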

```python
from omegaconf import OmegaConf

# convert the DictConfig into a plain Python dict for display
OmegaConf.to_container(dd)
```

```
{'train_im': '../../datasets/pannuke/train/images',
 'train_gt': '../../datasets/pannuke/train/labels',
 'test_im': '../../datasets/pannuke/test/images',
 'test_gt': '../../datasets/pannuke/test/labels'}
```

```python
# scratch inputs for prototyping fold selection: tissues (g), folds (e) and
# which files to collect (r: "img", "mask" or "both"); sets drop the duplicates
g = {"Adrenal_gland", "Breast", "Breast"}
e = {"fold1", "fold3", "fold3"}
r = "both"
print(sorted(e))
print(sorted(g))
```

```
['fold1', 'fold3']
['Adrenal_gland', 'Breast']
```


```python
from pathlib import Path

# pick the file suffix to match: .png images, .mat masks, or both ("")
if r == "img":
    wc2 = ".png"
elif r == "mask":
    wc2 = ".mat"
else:
    wc2 = ""

g = sorted(g)  # tissues
e = sorted(e)  # folds

paths = []
for d in dict(fm.data_dirs["pannuke"]).values():
    for tissue in g:
        # file names look like "<tissue>_<fold>_<index>", e.g. "Breast_fold1_99.mat"
        for wc in (f"{tissue}_{fold}" for fold in e):
            paths.extend(Path(d).glob(f"*{wc}*{wc2}"))

paths = sorted(paths)
path_dict = {}
if r == "both":
    # split the collected paths into images and masks by extension
    path_dict["img"] = [path for path in paths if path.suffix == ".png"]
    path_dict["mask"] = [path for path in paths if path.suffix == ".mat"]
else:
    path_dict[r] = paths

path_dict["mask"]
```

```
PosixPath('../../datasets/pannuke/train/labels/Breast_fold1_99.mat')
```
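The prototype above matches names with `*{tissue}_{fold}*` and sorts paths as plain strings, so `Ovarian_fold3_10.png` lands before `Ovarian_fold3_2.png`, as in the `get_pannuke_fold` output below; the unanchored pattern would also let `fold1` match a hypothetical `fold10`. A hedged, self-contained variant is sketched here. `collect_pannuke_paths` and `natural_key` are hypothetical names, the anchored `<tissue>_<fold>_*` pattern and the numeric-aware sort are choices made for this example, and it is not the repository's `get_pannuke_fold`.

```python
import re
from pathlib import Path


def natural_key(path):
    """Split a file name into text and integer chunks so that 2 sorts before 10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path.name)]


def collect_pannuke_paths(dirs, folds, tissues, which="both"):
    """Hypothetical helper: gather '<tissue>_<fold>_*' files from the given folders."""
    suffix = {"img": ".png", "mask": ".mat"}.get(which, "")
    paths = []
    for d in dirs:
        for tissue in sorted(set(tissues)):
            for fold in sorted(set(folds)):
                # anchored at the start of the name, so "fold3" cannot match "fold30"
                paths.extend(Path(d).glob(f"{tissue}_{fold}_*{suffix}"))
    paths.sort(key=lambda p: (str(p.parent), natural_key(p)))
    if which == "both":
        return {
            "img": [p for p in paths if p.suffix == ".png"],
            "mask": [p for p in paths if p.suffix == ".mat"],
        }
    return {which: paths}
```

Sorting by folder first and then by the natural key keeps images and masks in matching order, provided both folders contain the same base names.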

```python
# select folds 3 and 2 of the Ovarian tissue; "both" returns image and mask paths
fm.get_pannuke_fold(['fold3', 'fold2'], ['Ovarian'], "both")
```

{'img': [PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_0.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_1.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_10.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_11.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_12.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_13.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_14.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_15.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_16.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_17.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_18.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_19.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold3_2.png'),
  PosixPath('../../datasets/pannuke/test/images/Ovarian_fold