# Batch training demonstration

#### Imports

In [1]:
# training automation
from simcat import BatchTrain
# defining the model
from keras.models import Sequential, load_model
from keras.layers import Dense

Using TensorFlow backend.


#### Define the parameters for training:

* Where is the data?
* What do we want to name the outputs?
* What proportion do we want to split into training vs. testing?
* Do we want to exclude scenarios with introgression between sister taxa?
* Do we want to exclude scenarios where introgression is really low?
* Do we want to make a "zero" category that includes all remaining simulations with magnitude under some number?

In [2]:
tester = BatchTrain.BatchTrain(input_name='cleaned',
                    output_name='model_training',
                    directory='../../imb_8tip_20mil/merged/',
                    prop_training=0.9,
                    exclude_sisters=True,
                    exclude_magnitude=0.1,
                    to_zero_magnitude=None
                   )

77840 total simulations.
53249 total simulations compatible with parameters.
Data split into 47924 training and 5325 testing simulations.


Analysis reference file saved to ../../imb_8tip_20mil/merged/model_training.analysis.h5
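The reported split sizes are consistent with shuffling the 53249 compatible simulations and assigning the first 90% (rounded down) to training. The following is a minimal sketch of that idea using only the standard library. It is not simcat's actual implementation; the function name `split_indices` and the `seed` argument are hypothetical.

```python
import random


def split_indices(num_sims, prop_training, seed=0):
    """Shuffle simulation indices, then split them into training and testing lists.

    Hypothetical helper for illustration; not part of simcat.
    """
    indices = list(range(num_sims))
    random.Random(seed).shuffle(indices)
    # Round down, so any leftover simulation goes to the testing set.
    num_train = int(num_sims * prop_training)
    return indices[:num_train], indices[num_train:]


train_idx, test_idx = split_indices(53249, 0.9)
print(len(train_idx), len(test_idx))
```

With these inputs the sketch yields 47924 training and 5325 testing indices, matching the counts printed by `BatchTrain`.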


#### An "analysis.h5" file has been saved as output. It contains the indices of the simulations assigned to the training and testing datasets, as well as some metadata about the training run.
#### A "onehot_dict.csv" file has also been saved; it maps between integer codes and the literal string labels.
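To illustrate what a mapping between integer codes and string labels is used for, here is a small sketch of one-hot encoding and decoding. The label strings are hypothetical placeholders, and the layout of the real "onehot_dict.csv" file is not assumed here; the dictionaries are built in memory instead of being read from disk.

```python
# Hypothetical labels; the real ones describe simulated introgression scenarios.
labels = ["scenario_a", "scenario_b", "scenario_c"]

# Integer code for each label, and the reverse lookup.
label_to_code = {label: code for code, label in enumerate(sorted(set(labels)))}
code_to_label = {code: label for label, code in label_to_code.items()}


def one_hot(code, num_classes):
    """Return a vector of zeros with a single 1.0 at position `code`."""
    vector = [0.0] * num_classes
    vector[code] = 1.0
    return vector


def decode(probabilities):
    """Map a vector of class probabilities back to the most likely label."""
    best_code = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return code_to_label[best_code]


encoded = one_hot(label_to_code["scenario_b"], len(label_to_code))
predicted = decode([0.1, 0.7, 0.2])
```

The one-hot vectors are the targets that a `categorical_crossentropy` loss expects, and `decode` shows how a softmax output can be turned back into a readable label.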

### Define a neural network

In [3]:
# Neural network architecture defined with Keras tools
model = Sequential()
model.add(Dense(1000, input_dim=tester.input_shape, activation='relu'))
model.add(Dense(tester.num_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
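For readers unfamiliar with Keras, the sketch below spells out what a forward pass through this kind of two-layer network computes: a dense layer with ReLU, followed by a dense layer with softmax. It uses plain Python with tiny, made-up layer sizes and random weights, so it only illustrates the arithmetic; it is not how Keras implements the layers, and all names in it are hypothetical.

```python
import math
import random


def relu(values):
    """Replace negative activations with zero."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Hidden ReLU layer followed by a softmax output layer."""
    hidden = relu(dense(inputs, hidden_w, hidden_b))
    return softmax(dense(hidden, out_w, out_b))


# Toy sizes for illustration only (the notebook uses 1000 hidden units).
rng = random.Random(0)
input_dim, hidden_units, num_classes = 4, 6, 3
hidden_w = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(hidden_units)]
hidden_b = [0.0] * hidden_units
out_w = [[rng.uniform(-1, 1) for _ in range(hidden_units)] for _ in range(num_classes)]
out_b = [0.0] * num_classes

probs = forward([0.5, -1.0, 2.0, 0.0], hidden_w, hidden_b, out_w, out_b)
```

The output `probs` has one entry per class and sums to 1, which is why the final layer's width is set to `tester.num_classes`.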

### Initialize the network model for the BatchTrain object, which also saves the model to a file

In [4]:
tester.init_model(model)

New neural network saved to: ../../imb_8tip_20mil/merged/model_training.model.h5


### Now designate the batch size and the number of epochs, and train!

In [5]:
tester.train(batch_size=200,
             num_epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


#### The model is automatically saved to disk after each epoch.
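To give a sense of how `batch_size` relates to the size of the training set, the sketch below slices 47924 training simulations into mini-batches of 200. It assumes the final, partial batch is kept rather than dropped; whether `BatchTrain` does this internally is not confirmed by this notebook, and `batch_slices` is a hypothetical helper.

```python
def batch_slices(num_samples, batch_size):
    """Yield (start, stop) index pairs covering all samples in order."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


slices = list(batch_slices(47924, 200))
num_batches = len(slices)
last_start, last_stop = slices[-1]
last_batch_size = last_stop - last_start
print(num_batches, last_batch_size)
```

Under that assumption each epoch consists of 240 weight updates, the last of which uses only 124 simulations.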