# Bellastore Demo

The purpose of this demo is to showcase the main functionality of this package.

The main purpose of `bellastore` is to organize whole slide image (WSI) scans at both the file system and the database level.
To do so, `bellastore` creates and manages an ingress and a storage database, moves valid WSI files from ingress to storage within the file system, and records each move in the databases.

This ultimately leads to a comprehensive storage database that can be queried efficiently to retrieve files from a storage containing possibly tens of thousands of WSIs.

The ingress database, on the other hand, records the origin of the files. Clinical labs often encode valuable information in folder and file names, which are recorded directly in the ingress database. Thus, even after files are moved and renamed at the storage level, all the original source metadata remains tracked.

Furthermore, the ingress ensures that only slides that are not already tracked reach the storage. A possible duplicate is still recorded in the ingress (as it could hold valuable metadata), but the WSI will not proceed to the storage. WSI identity is checked by hashing the file to be moved, which is the most time-consuming step in the pipeline.
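
To illustrate why hashing dominates the runtime, here is a minimal sketch of a chunked file hash. The function name `file_hash`, the choice of SHA-256 and the chunk size are assumptions made for this illustration; the hashing routine actually used by `bellastore` may differ.

```python
import hashlib
from pathlib import Path


def file_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file (hypothetical helper, not bellastore API)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so a multi-gigabyte slide is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files with identical content yield the same digest regardless of their names or locations, which is what makes duplicate detection independent of the folder structure delivered by the clinic. Every byte of every file has to be read once, so the cost grows linearly with the size of the scans.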

The beauty of `bellastore` is that the storage database can be extended, in the spirit of relational databases, with all sorts of metadata such as patient identifiers, clinical grades or cohort identifiers.
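
As a rough sketch of such an extension, the snippet below attaches a metadata table to a storage table via the scan hash and queries both with a join. It runs against an in-memory SQLite database; the `PatientInfo` table, its columns and all inserted values are hypothetical and only serve the illustration. The `Storage` columns mirror the ones printed further below in this demo.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Stand-in for the storage table managed by bellastore.
con.execute(
    "CREATE TABLE Storage (hash TEXT PRIMARY KEY, filepath TEXT, filename TEXT, scanname TEXT)"
)
# Hypothetical metadata table, linked to the storage table by the scan hash.
con.execute(
    "CREATE TABLE PatientInfo (hash TEXT REFERENCES Storage(hash), patient_id TEXT, grade INTEGER)"
)

con.executemany(
    "INSERT INTO Storage VALUES (?, ?, ?, ?)",
    [
        ("a1", "/storage/scan_0", "scan_0.ndpi", "scan_0"),
        ("b2", "/storage/scan_1", "scan_1.ndpi", "scan_1"),
    ],
)
con.executemany(
    "INSERT INTO PatientInfo VALUES (?, ?, ?)",
    [("a1", "P-001", 3), ("b2", "P-002", 1)],
)

# Retrieve the file paths of all scans with a clinical grade of at least 2.
rows = con.execute(
    """
    SELECT s.scanname, s.filepath, p.patient_id
    FROM Storage AS s
    JOIN PatientInfo AS p ON p.hash = s.hash
    WHERE p.grade >= 2
    ORDER BY s.scanname
    """
).fetchall()
```

Keying the metadata on the hash rather than on the file name keeps the link stable even when files are renamed at the storage level.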

The philosophy of the `bellastore` backend is that it keeps track of the file system via the `Fs` class and keeps this in sync with the databases by the inheriting `Db` class.

This demo showcases the workflow of `bellastore` by encapsulating the main integration test `tests/test_db_fs.py::test_classic`. 

## Setting up a mock ingress and storage

First, we need to mock the state of the file system and the databases.

In the main use case of `bellastore` we are in the following scenario:
- there are scans already recorded in both the ingress and the storage database
- the respective scans are present in the storage file system

Now we imagine new scans arriving from the clinic, which we need to integrate into the existing file system and databases.

In [2]:
import os
from os.path import join as _j
from pathlib import Path
from typing import List
import sqlite3
from tempfile import TemporaryDirectory

from bellastore.utils.scan import Scan
from bellastore.database.db import Db

In [None]:
def create_scans(path: Path, amount=4) -> List[Scan]:
    '''
    Mocks scans on a specified path.

    The mock scans are just text files containing content unique to each scan.
    However, they carry the file extension .ndpi.

    Parameters
    -----------
    path : Path
        The shared directory holding the mocked scans
    amount : int
        The amount of scans to be created
    
    Returns
    --------
    List[Scan]
        The list of created scans
    '''
    scans = []
    for i in range(amount):
        p = path / f"scan_{i}.ndpi"
        p.write_text(f"Content of scan_{i}.ndpi", encoding="utf-8")
        scan = Scan(str(p))
        scans.append(scan)
    return scans

In [None]:
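# the root of the fs holding both storage and ingress
# keep a reference to the TemporaryDirectory object: once it is garbage
# collected, the directory and everything inside it is deleted
tmp_root = TemporaryDirectory()
root_dir = tmp_root.name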

# create scans in ingress
ingress_dir = Path(root_dir) / "new_scans"
os.makedirs(ingress_dir)
ingress_scans = create_scans(path=ingress_dir, amount=4)

In [19]:
# initialize the database holding the ingress and storage tables
db = Db(root_dir=root_dir, ingress_dir=ingress_dir, filename='scans.sqlite')
str(db)



├── backup
├── new_scans
│   ├── scan_0.ndpi
│   ├── scan_1.ndpi
│   ├── scan_2.ndpi
│   └── scan_3.ndpi
└── storage
    └── scans.sqlite


'Ingress DB:\n Empty DataFrame\nColumns: [hash, filepath, filename]\nIndex: []\n Storage DB:\n Empty DataFrame\nColumns: [hash, filepath, filename, scanname]\nIndex: []\n'

The `Db` class now holds all information about both the database and the file system:
- there are 4 scans in the ingress (`new_scans`)
- the storage contains only the database file `scans.sqlite`
- the database holds two empty tables `Ingress` and `Storage`

Now it is time to insert the scans from the ingress into the storage.

Note that this is a complex process that requires the following actions:
- check whether the file in the ingress is a valid scan
    - if not, it stays in the ingress as it is
- hash the scan to make it comparable to existing scans
- compare the hash to the hashes already recorded in the **ingress table**
    - if there is an entry with identical hash, path and scanname, the file is removed from the fs
- compare the hash to the hashes already recorded in the **storage table**
    - if the hash is already in the storage table, record the scan only in the ingress table and then remove the file from the fs
- add the scan to the **storage database**
    - move the file into the storage directory
    - record the scan in the storage table
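
The decision logic of these steps can be sketched as a small pure function. This is only an illustration of the branching described above, not the `bellastore` implementation: the `Action` enum, the `decide` function and its parameters are hypothetical names, and the database tables are stood in for by plain Python sets.

```python
from enum import Enum
from typing import Set, Tuple


class Action(Enum):
    LEAVE_IN_INGRESS = "invalid scan, file stays untouched"
    REMOVE_FILE = "exact entry already in ingress table, file removed"
    RECORD_IN_INGRESS_AND_REMOVE = "hash already in storage, ingress entry added, file removed"
    MOVE_TO_STORAGE = "new scan, file moved and recorded in both tables"


def decide(
    is_valid: bool,
    scan_hash: str,
    filepath: str,
    scanname: str,
    ingress_rows: Set[Tuple[str, str, str]],  # (hash, filepath, scanname) already recorded
    storage_hashes: Set[str],                 # hashes already present in storage
) -> Action:
    # 1. Invalid files are never touched.
    if not is_valid:
        return Action.LEAVE_IN_INGRESS
    # 2. Identical hash, path and scanname: nothing new to learn from this file.
    if (scan_hash, filepath, scanname) in ingress_rows:
        return Action.REMOVE_FILE
    # 3. Known content from a new origin: keep the origin metadata, drop the file.
    if scan_hash in storage_hashes:
        return Action.RECORD_IN_INGRESS_AND_REMOVE
    # 4. Unknown content: the scan proceeds to the storage.
    return Action.MOVE_TO_STORAGE
```

The order of the checks matters: the ingress comparison comes first, so that a file already recorded with the same origin does not create a second ingress entry.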