****
# links
[Kaggle Challenge - Getting Started](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge#Getting%20Started) <br>
[md ai GitHub](https://github.com/mdai/ml-lessons) <br>
[md ai My projects](https://public.md.ai/hub/projects/user) <br>
[Google Colab kaggle](https://colab.research.google.com/github/mdai/ml-lessons/blob/master/lesson3-rsna-pneumonia-detection-kaggle.ipynb) <br>
[darknet](https://github.com/pjreddie/darknet) <br>
[darknet train classifier from scratch](https://pjreddie.com/darknet/train-cifar/) <br>
****
## DarkNet, You Only Look Once YOLO & python wrappers

[YOLO](https://pjreddie.com/darknet/yolo/) <br>
[YOLO - python](https://github.com/madhawav/YOLO3-4-Py) <br>
[YOLO - py - docker](https://github.com/madhawav/YOLO3-4-Py/tree/master/docker) <br>
[darknetpy pypi](https://pypi.org/project/darknetpy/) <br>
[darknetpy GitHub](https://github.com/danielgatis/darknetpy) <br>
[lightnet GitHub](https://github.com/explosion/lightnet) <br>
****
## DarkNet Training Example:
[darknet train from scratch](https://pjreddie.com/darknet/train-cifar/) <br>
```bash
# cfg/cifar.data:
classes=10
train  = data/cifar/train.list
valid  = data/cifar/test.list
labels = data/cifar/labels.txt
backup = backup/
top=2
```
* Files:
    * cfg/cifar.data: location of the backup directory, the labels file, and the train and test list files.
    * cfg/cifar_small.cfg: the complete network configuration.
* Write the training and test image paths to list files:
****
```bash
cd data/cifar
find `pwd`/train -name \*.png > train.list
find `pwd`/test -name \*.png > test.list
cd ../..
```
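If you would rather stay in Python, the same list files can be produced with `pathlib`. This is a minimal sketch that assumes the images are `.png` files somewhere under `train/` and `test/`. `write_image_list` is a hypothetical helper name. Like `find`, it searches recursively and writes absolute paths. Unlike `find`, it also sorts them.

```python
import os
from pathlib import Path


def write_image_list(image_dir, list_file, pattern="*.png"):
    """Write the absolute path of every file matching `pattern` under
    `image_dir` (searched recursively, like `find`) to `list_file`,
    one path per line. Returns the number of paths written."""
    paths = sorted(os.path.abspath(p) for p in Path(image_dir).rglob(pattern))
    with open(list_file, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return len(paths)
```

For example, `write_image_list("data/cifar/train", "data/cifar/train.list")` stands in for the `find ... > train.list` line above.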
****
* Train the network (or restart training from a saved checkpoint):
```bash
../../darknet/darknet classifier train cfg/kaggle.data cfg/kaggle.cfg
# To restart, append the checkpoint to resume from, replacing x with the number
# of the last completed checkpoint (e.g. kaggle_7.weights):
# ../../darknet/darknet classifier train cfg/kaggle.data cfg/kaggle.cfg ../data/backup/kaggle_x.weights
```
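Picking x by hand is error-prone, because a plain alphabetical listing puts `kaggle_10.weights` before `kaggle_9.weights`. The sketch below picks the checkpoint with the largest number instead. It assumes the `<name>_<N>.weights` naming used in the restart example above. `latest_weights` is a hypothetical helper, not part of darknet.

```python
import os
import re


def latest_weights(backup_dir, prefix="kaggle"):
    """Return the path of the highest-numbered <prefix>_<N>.weights file in
    backup_dir, comparing N numerically, or None if there is no such file."""
    pattern = re.compile(r"^%s_(\d+)\.weights$" % re.escape(prefix))
    best_n, best_path = -1, None
    for name in os.listdir(backup_dir):
        m = pattern.match(name)
        if m and int(m.group(1)) > best_n:
            best_n, best_path = int(m.group(1)), os.path.join(backup_dir, name)
    return best_path
```

The result of `latest_weights("../data/backup/")` can be appended to the training command in place of `../data/backup/kaggle_x.weights`.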
****
* Validate the model:
```bash
darknet classifier valid cfg/cifar.data cfg/cifar_small.cfg backup/cifar_small.backup
```
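The `top=2` line in the `.data` file appears to set the k of the top-k accuracy that darknet's classifier reports next to top-1 during validation. This is based on a reading of darknet's classifier code, so check it against your build. With only 3 classes, top-2 accuracy is a loose measure. For reference, here is a small sketch of top-k accuracy computed from per-image class scores. The inputs and the name `top_k_accuracy` are illustrative.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true class index is among the k highest scores.

    scores: one list of per-class scores per sample
    labels: the true class index of each sample
    """
    hits = 0
    for row, label in zip(scores, labels):
        # class indices ordered from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```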

```python
%matplotlib inline
import os
import sys
import random
from shutil import copyfile
import glob

import pylab
import pandas as pd
import pydicom
import numpy as np

from PIL import Image

# local helper module: ../src/kaggle_wrangler.py
sys.path.insert(1, '../src/')
import kaggle_wrangler as kgwr

kaggle_data_dir = '../data/all'
train_data_dir = os.path.join(kaggle_data_dir, 'stage_1_train_images')
test_data_dir = os.path.join(kaggle_data_dir, 'stage_1_test_images')
preprocessed_images_dir = '../data/train_data_selected'
```

## Sample a training and test set from the resized images

```python
#                                    Define - designate
test_set_fraction = 0.1
config_dir = 'cfg'
# destination dirs for the split (these replace the raw-image paths set above)
train_data_dir = '../data/kaggle_train'
test_data_dir = '../data/kaggle_test'

#                                    Locate
all_files_list = os.listdir(preprocessed_images_dir)

number_of_samples = len(all_files_list)
test_set_size = int(number_of_samples * test_set_fraction)
train_set_size = number_of_samples - test_set_size
print('%40s: %9i'%('total number of files', number_of_samples))
print('%40s: %9i'%('train set files', train_set_size))
print('%40s: %9i'%('test set files', test_set_size))

#                                    Sample
train_set = random.sample(all_files_list, train_set_size)
test_set = list(set(all_files_list) - set(train_set))
print('\n%40s: %9s'%('type', 'size'))
print('%40s: %9i'%(type(train_set), len(train_set)))
print('%40s: %9i'%(type(test_set), len(test_set)))
```
```
                   total number of files:     30980
                         train set files:     27882
                          test set files:      3098

                                    type:      size
                          <class 'list'>:     27882
                          <class 'list'>:      3098
```
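Calling `random.sample` without a seed produces a different split on every run, so `train.list` and `test.list` cannot be regenerated later. The sketch below makes the same 90/10 split reproducible by sorting the names (because `os.listdir` order is arbitrary) and shuffling them with a seeded private RNG. `split_files` and the default seed are assumptions, not part of the original notebook. With the 30980 files above, it still yields 27882 training and 3098 test files.

```python
import random


def split_files(files, test_fraction=0.1, seed=0):
    """Return (train, test) lists; test gets int(len(files) * test_fraction) items.
    The same input set and seed always give the same split."""
    files = sorted(files)            # os.listdir order is arbitrary
    rng = random.Random(seed)        # private RNG, global random state untouched
    rng.shuffle(files)
    n_test = int(len(files) * test_fraction)
    return files[n_test:], files[:n_test]
```

Usage: `train_set, test_set = split_files(all_files_list, test_set_fraction, seed=0)`.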


```python
#                                    Copy train set
train_files_name = os.path.join(config_dir, 'train.list')
test_files_name = os.path.join(config_dir, 'test.list')

# copyfile() and open() do not create missing directories
os.makedirs(config_dir, exist_ok=True)
os.makedirs(train_data_dir, exist_ok=True)
os.makedirs(test_data_dir, exist_ok=True)

train_files_list = []
test_files_list = []

for train_file in train_set:
    src_file_name = os.path.join(preprocessed_images_dir, train_file)
    dest_file_name = os.path.join(train_data_dir, train_file)
    copyfile(src_file_name, dest_file_name)
    train_files_list.append(dest_file_name)

with open(train_files_name, 'w') as f:
    for item in train_files_list:
        f.write("%s\n" % item)
print(train_files_name, 'written from %i'%(len(train_files_list)), 'items in train_files_list')

#                                    Copy test set
for test_file in test_set:
    src_file_name = os.path.join(preprocessed_images_dir, test_file)
    dest_file_name = os.path.join(test_data_dir, test_file)
    copyfile(src_file_name, dest_file_name)
    test_files_list.append(dest_file_name)

with open(test_files_name, 'w') as f:
    for item in test_files_list:
        f.write("%s\n" % item)
print(test_files_name, 'written from %i'%(len(test_files_list)), 'items in test_files_list')
```
```
cfg/train.list written from 27882 items in train_files_list
cfg/test.list written from 3098 items in test_files_list
```
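Before training, it is cheap to check that every path in the list files exists and that no image appears in both lists. The paths written above are relative, so they resolve against the directory darknet is started from. This assumes darknet is launched from the notebook directory, as in the commands in this document. `read_list` and `check_lists` are hypothetical helpers.

```python
import os


def read_list(list_file):
    """Non-empty, stripped lines of a darknet list file."""
    with open(list_file) as f:
        return [line.strip() for line in f if line.strip()]


def check_lists(train_list, test_list):
    """Return (missing_paths, overlapping_file_names) for two list files."""
    train, test = read_list(train_list), read_list(test_list)
    missing = [p for p in train + test if not os.path.isfile(p)]
    overlap = set(map(os.path.basename, train)) & set(map(os.path.basename, test))
    return missing, sorted(overlap)
```

Usage: `missing, overlap = check_lists('cfg/train.list', 'cfg/test.list')`. Both results should be empty.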


### Write the config files and train the network

```python
os.path.isfile('../../darknet/darknet')
```
```
True
```

```
%%writefile cfg/labels.txt
Normal_Negative
NoLuOpNotNorm_Negative
LuOp_Positive
```
```
Overwriting cfg/labels.txt
```
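Darknet's classifier does not read a separate label file for each image. As far as I understand its `fill_truth` routine in `data.c`, it assigns the class whose label string occurs as a substring of the image path, and it warns when zero or several labels match. Each path in the list files should therefore contain exactly one of the strings in `labels.txt`, and that includes the directory part of the path. Below is a sketch of that check. It assumes file names that end in `<label>.png`, as produced by the preprocessing step. `labels_matching` and `bad_paths` are hypothetical names.

```python
def labels_matching(path, labels):
    """Labels that occur as substrings of `path` (assumed to mirror how
    darknet's classifier derives the class from the file path)."""
    return [label for label in labels if label in path]


def bad_paths(paths, labels):
    """Paths that match zero labels or more than one label."""
    return [p for p in paths if len(labels_matching(p, labels)) != 1]
```

Usage: `bad_paths(read lines of cfg/train.list, ['Normal_Negative', 'NoLuOpNotNorm_Negative', 'LuOp_Positive'])` should return an empty list.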


```
%%writefile cfg/kaggle.data
classes=3
train  = cfg/train.list
valid  = cfg/test.list
labels = cfg/labels.txt
backup = ../data/backup/
top=2
```
```
Overwriting cfg/kaggle.data
```


```bash

# initial start:
../../darknet/darknet classifier train cfg/kaggle.data cfg/kaggle.cfg

# for a restart:
../../darknet/darknet classifier train cfg/kaggle.data cfg/kaggle.cfg ../data/backup/kaggle_x.weights
```
#### Modified config files for the Kaggle data (grayscale):
* cfg/kaggle.data
    * classes = 3                # (instead of 10)
    * backup  = ../data/backup/  # (create the backup dir first)
* cfg/kaggle.cfg
    * removed the color augmentation settings:
        * hue=.1
        * saturation=.75
        * exposure=.75
    * last [convolutional] layer: filters=3 (one per class, instead of 10)
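One way to derive `kaggle.cfg` from a copy of `cifar_small.cfg` is to drop the three color-augmentation keys programmatically. Hue and saturation shifts have no meaning for grayscale images. This is a sketch that assumes one `key=value` entry per line. `remove_cfg_keys` is a hypothetical helper, and every other line, comments included, is kept verbatim.

```python
def remove_cfg_keys(cfg_text, keys):
    """Return cfg_text without `key=value` lines whose key is in `keys`
    (whitespace around the key is ignored)."""
    kept = []
    for line in cfg_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in keys:
            continue  # drop this setting
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Usage: read `cfg/cifar_small.cfg`, pass its text with `{"hue", "saturation", "exposure"}`, and write the result to `cfg/kaggle.cfg` before editing the class count.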


```bash
[net]
batch=128
subdivisions=1
height=28
width=28
channels=1
max_crop=32
min_crop=32

learning_rate=0.1
policy=poly
power=4
max_batches = 5000
momentum=0.9
decay=0.0005
```
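A quick way to see what the `[net]` schedule means in practice is to compute it. The sketch assumes each of the `max_batches` iterations consumes `batch=128` images (with `subdivisions=1`). It also assumes the `poly` policy computes `learning_rate * (1 - batch_num / max_batches) ** power`, which is how I read darknet's learning-rate code, ignoring any `burn_in`. Under those assumptions, 5000 batches cover about 23 passes over the 27882 training images. With `power=4`, the rate drops to 6.25% of its base value by the halfway point.

```python
def poly_learning_rate(base_lr, batch_num, max_batches, power):
    """'poly' schedule: base_lr * (1 - batch_num / max_batches) ** power."""
    return base_lr * (1 - batch_num / max_batches) ** power


def epochs(max_batches, batch, n_train):
    """Number of passes over n_train images implied by max_batches batches."""
    return max_batches * batch / n_train


print(round(epochs(5000, 128, 27882), 1))
for b in (0, 1250, 2500, 3750, 5000):
    print(b, poly_learning_rate(0.1, b, 5000, 4))
```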

```
%%writefile cfg/kaggle.cfg
[net]
batch=128
subdivisions=1
height=28
width=28
channels=3
max_crop=32
min_crop=32

learning_rate=0.1
policy=poly
power=4
max_batches = 5000
momentum=0.9
decay=0.0005

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[dropout]
probability=.5

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[dropout]
probability=.5

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[dropout]
probability=.5

[convolutional]
filters=3
size=1
stride=1
pad=1
activation=leaky

[avgpool]

[softmax]
groups=1
```
```
Overwriting cfg/kaggle.cfg
```
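To see why the last `[convolutional]` layer has `filters=3`, it helps to trace the feature-map shape through this config. The sketch makes three assumptions. In darknet, `pad=1` means a padding of `size // 2`. `[avgpool]` averages each channel over the whole map. The `[maxpool]` padding is `size - 1`, which is darknet's default as far as I know, although padding 0 gives the same sizes here. The 28x28 input shrinks to 7x7 after the two 2x2 max-pools. The final 1x1 convolution produces one 7x7 map per class, and average pooling reduces those maps to the 3 values the softmax turns into class probabilities.

```python
def conv_out(n, size, stride=1, pad=1):
    """Spatial size after [convolutional]; pad=1 means padding size // 2."""
    padding = size // 2 if pad else 0
    return (n + 2 * padding - size) // stride + 1


def maxpool_out(n, size=2, stride=2):
    """Spatial size after [maxpool], assuming padding of size - 1."""
    return (n + (size - 1) - size) // stride + 1


# Layers of cfg/kaggle.cfg: ("conv", filters, size) or (layer_type,)
LAYERS = (
    [("conv", 128, 3)] * 3 + [("maxpool",), ("dropout",)]
    + [("conv", 256, 3)] * 3 + [("maxpool",), ("dropout",)]
    + [("conv", 512, 3)] * 3 + [("dropout",)]
    + [("conv", 3, 1), ("avgpool",), ("softmax",)]
)


def trace(layers, size=28):
    """Return (layer_type, height_and_width, channels) after each layer."""
    out, channels = [], None
    for layer in layers:
        kind = layer[0]
        if kind == "conv":
            channels = layer[1]
            size = conv_out(size, layer[2])
        elif kind == "maxpool":
            size = maxpool_out(size)
        elif kind == "avgpool":
            size = 1  # one average per channel
        # dropout and softmax keep the shape
        out.append((kind, size, channels))
    return out


for row in trace(LAYERS):
    print(row)
```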


## Pastable code for writing the labels file
```python
import os

#                                       get the labels, write the labels file
def get_underscore_labels(data_dir):
    """ labels_list = get_underscore_labels(data_dir)

    Collect the unique '<a>_<b>' endings of the file names in data_dir,
    e.g. 'xyz_Normal_Negative.png' -> 'Normal_Negative', in first-seen order.
    Note: os.listdir order is arbitrary, so the label order (the class index
    darknet uses) may differ between machines.
    """
    labels_list = []
    for name in os.listdir(data_dir):
        s2 = name.split('.')[0].split('_')
        new_name = s2[-2] + '_' + s2[-1]
        if new_name not in labels_list:
            labels_list.append(new_name)
    return labels_list

# preprocessed_images_dir is defined in the first notebook cell
labels_list = get_underscore_labels(preprocessed_images_dir)
for label in labels_list:
    print(label)

config_dir = 'cfg'
labels_file_name = os.path.join(config_dir, 'labels.txt')
# write only if cfg/ exists and labels.txt is not already there
if os.path.isdir(config_dir) and not os.path.isfile(labels_file_name):
    with open(labels_file_name, 'w') as f:
        for item in labels_list:
            f.write("%s\n" % item)
else:
    print('Not writing:', labels_file_name)
```