# Overview of Client

The `Client()` object, part of the `dl_train.client` module, provides a
simple interface for loading and preprocessing data for neural network
training. Given a properly formatted `*.csv` file that references the data
locations, a `Client()` object can be initialized with nothing more than a
`*.yml` configuration file. In this notebook we will explore the following
functionality:

**Configuration**

* `*.csv` file format
* `*.yml` file settings
* rates
* specs
* split

**Loading Data**

* via `self.get(...)` method
* via `self.generator(...)` method

**Testing**

* via `self.test(...)` method

# Set Up

Let us first set up the required imports and
dependencies:

In [None]:
from dl_train.client import Client

# Configuration

A fully functional `Client()` object can be instantiated with a
`*.yml` file containing the necessary configurations:

In [None]:
# --- Instantiate a new Client
client = Client('./client.yml')

The following shows the contents of the `./client.yml` file in the current directory:

```yml
_db: ../../../../dl_utils/data/bet/ymls/db.yml
rates:
  sampling:
    fg: 0.5
    bg: 0.5 
  training:
    train: 0.8
    valid: 0.2
specs:
  batch: 16
  xs:
    dat:
      dtype: float32
      loads: dat
      norms:
        clip: 
          min: 0
          max: 256
        shift: 64
        scale: 64
      shape:
      - 1
      - 512
      - 512
      - 1
  ys:
    bet:
      dtype: uint8
      loads: bet 
      norms: null
      shape:
      - 1
      - 512
      - 512
      - 1
split: 
  fold: -1
  cohorts:
    fg: fg
    bg: bg 
```
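
Before handing a configuration like this to `Client()`, it can help to check that it is internally consistent. The following is a minimal sketch, not part of `dl_train`: the `check_config` helper is hypothetical, the dictionary mirrors only part of the file above (as a YAML parser would return it), and both checks rest on assumptions (that each group of rates sums to 1, and that every sampled cohort is declared under `split.cohorts`).

```python
# Plain-dict mirror of part of ./client.yml, as a YAML parser would return it.
config = {
    "rates": {
        "sampling": {"fg": 0.5, "bg": 0.5},
        "training": {"train": 0.8, "valid": 0.2},
    },
    "specs": {
        "batch": 16,
        "xs": {"dat": {"dtype": "float32", "shape": [1, 512, 512, 1]}},
        "ys": {"bet": {"dtype": "uint8", "shape": [1, 512, 512, 1]}},
    },
    "split": {"fold": -1, "cohorts": {"fg": "fg", "bg": "bg"}},
}


def check_config(cfg):
    """Hypothetical sanity checks; returns a list of problem descriptions."""
    problems = []

    # Assumption: each group of rates behaves like a probability distribution.
    for group, rates in cfg["rates"].items():
        total = sum(rates.values())
        if abs(total - 1.0) > 1e-6:
            problems.append(f"rates.{group} sums to {total}, expected 1.0")

    # Assumption: every sampled cohort must be declared under split.cohorts.
    for cohort in cfg["rates"]["sampling"]:
        if cohort not in cfg["split"]["cohorts"]:
            problems.append(f"cohort '{cohort}' missing from split.cohorts")

    return problems


problems = check_config(config)
print(problems or "config looks consistent")
```

An empty list means both assumed conditions hold; whether `Client()` itself enforces them is not stated in this notebook.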

Let us walk through this configuration file in detail.

## Database

The prepared database file path is set as `_db` in the first line of the configuration file; it may be referenced as either a `*.csv` or `*.yml` file. See the tutorial on generic database objects for further information.

For the `Client()` object, a special database is required whereby **each row** in the `*.csv` file represents a unique signature for a single training **example** in the dataset. For 3D datasets, this may yield a new row for every slice or slab in the entire volume (if the network is designed to train on a slice or slab basis).

This schematic shows a representative `*.csv` file:

```
             fname-dat       fname-lbl       coord    bg      fg        mu      sd
             
patient_00   /path/to/file   /path/to/file   0.0      True    False     100.0   20.0
patient_00   /path/to/file   /path/to/file   0.1      True    False     100.0   20.0
patient_00   /path/to/file   /path/to/file   0.2      True    False     100.0   20.0
patient_00   /path/to/file   /path/to/file   0.3      False   True      100.0   20.0
patient_00   /path/to/file   /path/to/file   0.4      False   True      100.0   20.0
patient_00   /path/to/file   /path/to/file   0.5      False   True      100.0   20.0
...
patient_00   /path/to/file   /path/to/file   1.0      True    False     100.0   20.0
patient_01   /path/to/file   /path/to/file   0.0      True    False     150.0   25.0
...

```
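
To make the one-row-per-example layout concrete, here is a minimal sketch that parses a toy table shaped like the schematic using only the Python standard library. The index column name `sid` and the `parse_row` helper are hypothetical, and the paths are placeholders as in the schematic; this is not how `Client()` reads the file internally.

```python
import csv
import io
from collections import Counter

# Toy stand-in for the *.csv file. Paths are placeholders, as in the
# schematic, and the index column name `sid` is hypothetical.
raw = """sid,fname-dat,fname-lbl,coord,bg,fg,mu,sd
patient_00,/path/to/file,/path/to/file,0.0,True,False,100.0,20.0
patient_00,/path/to/file,/path/to/file,0.5,False,True,100.0,20.0
patient_01,/path/to/file,/path/to/file,0.0,True,False,150.0,25.0
"""


def parse_row(row):
    """Convert the string fields of one row into Python types."""
    parsed = dict(row)
    for key in ("coord", "mu", "sd"):
        parsed[key] = float(row[key])
    for key in ("bg", "fg"):
        parsed[key] = row[key] == "True"
    return parsed


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(raw))]

# Each row is one training example, so counting rows per study ID gives
# the number of slices (or slabs) contributed by each patient.
examples_per_patient = Counter(r["sid"] for r in rows)
print(dict(examples_per_patient))
```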

This format is consistent with the standard database object format (e.g. all columns containing filenames must be prefixed with `fname-`, the row indices represent patient study IDs, etc.). In addition, the following three types of header columns should be considered as needed:

1. If you are using a 3D individual slice- or slab-based network, you will need to provide a column named `coord` that contains the **normalized** coordinate in `[0, 1]` for that slice or slab of data.

2. If you are using any stratified sampling technique, you will need to create columns containing a boolean vector that corresponds to each individual cohort. In the example above, we have two cohorts (`bg` and `fg`, which correspond to slices containing background and foreground mask values, respectively) from which we plan to sample at a balanced 50/50 distribution. Keep in mind that the defined cohorts **do not** need to be mutually exclusive, nor do they need to cover the entire dataset; they simply need to correspond to cohorts for which you plan to implement stratified sampling.

3. If you are using parameters for image preprocessing that cannot be dynamically inferred during data loading (e.g. the mean of the entire 3D volume when loading data in a slice-by-slice manner), you will need to create column(s) that contain the required information.
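
The three kinds of columns above can be sketched for a toy volume as follows. This is an illustration under stated assumptions rather than the `dl_train` implementation: `build_rows` and `sample_batch` are hypothetical helpers, the volume is a list of flattened slices, `mu` and `sd` are assumed to be the volume-wide mean and standard deviation, and `bg` is simply defined as the complement of `fg` (cohorts need not be mutually exclusive in general).

```python
import random
import statistics


def build_rows(sid, volume, mask):
    """Return one row (dict) per slice of a toy volume.

    volume, mask: lists of slices, each slice a flat list of values.
    """
    n = len(volume)
    # Volume-wide statistics cannot be inferred from a single slice,
    # so they are computed once here and stored on every row.
    voxels = [v for sl in volume for v in sl]
    mu = statistics.fmean(voxels)
    sd = statistics.pstdev(voxels)

    rows = []
    for i, sl in enumerate(mask):
        fg = any(sl)  # slice contains at least one foreground voxel
        rows.append({
            "sid": sid,
            "coord": i / (n - 1) if n > 1 else 0.0,  # normalized to [0, 1]
            "fg": fg,
            "bg": not fg,  # toy choice: bg is the complement of fg
            "mu": mu,
            "sd": sd,
        })
    return rows


def sample_batch(rows, rates, size, rng):
    """Draw `size` rows, choosing a cohort per draw according to `rates`.

    Raises IndexError if a cohort with a nonzero rate has no rows.
    """
    cohorts = list(rates)
    weights = [rates[c] for c in cohorts]
    pools = {c: [r for r in rows if r[c]] for c in cohorts}
    batch = []
    for _ in range(size):
        cohort = rng.choices(cohorts, weights=weights)[0]
        batch.append(rng.choice(pools[cohort]))
    return batch


# Toy volume: 5 slices of 4 voxels each; slices 1-3 contain foreground.
volume = [[90, 110, 100, 100] for _ in range(5)]
mask = [[0, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]

rows = build_rows("patient_00", volume, mask)
batch = sample_batch(rows, {"fg": 0.5, "bg": 0.5}, size=16, rng=random.Random(0))
print([r["coord"] for r in rows])
print(sum(r["fg"] for r in batch), "foreground rows in a batch of", len(batch))
```

Sampling the cohort first and then a row within it is what keeps the batch near the requested balance even though only two of the five slices are background.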

# Tear Down