# 1. Generate Test Data

Generate a dataset for use in subsequent tests and save it.
This is more efficient than creating the data each time, and it ensures that the same data is used across multiple tests.

In [1]:
import sys
sys.path.append('../src')
import numpy as np
import os
from pathlib import Path
from utils.mnist_reader import get_and_save_train_test_dataset

## Specify the data requirements

Specify the problem space for this test; the appropriate training and test data will then be generated.

The problem space describes the data that will be generated. 
* `dataset`: The name of the OpenML dataset to use. For example, `mnist_784` or `fashion-MNIST`. The number of features `n` in **Table 1** in the paper is derived from the data.
* `precision_required`: The number of possible discrete values, excluding zero, that can be assigned to a feature. This is `d-1` in **Table 1** in the paper.
* `trains_per_class`: The number of training examples from each class.
* `tests_per_class`: The number of test examples from each class.
* `trains_in_test_set`: Include the training examples in the test data. Usually this should be false, but it is useful for checking for overfitting.
* `training_labels`: Set to `None` to include examples from all labels during training. To train only a subset of networks, provide the labels as a list. For example: `['0','8']`.
* `testing_labels`: Set to `None` to include examples from all labels during testing. To test only a subset of networks, provide the labels as a list. For example: `['0','8']`.
* `shuffle`: Set to `False` for class incremental learning. Set to `True` to shuffle the training data. Note that this makes no difference to the test results as the subnetworks learn independently.
* `use_edge_detection`: Set to `True` to incorporate a Prewitt edge detection step into the pre-processing. **Note that this was not explored in the paper.** 
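
To make `precision_required` concrete, here is a minimal sketch of one way to reduce 8-bit pixel values to `d-1` non-zero levels. It assumes simple linear scaling with rounding; the function name `reduce_precision` and the `max_value` argument are hypothetical, and the repository's own pre-processing may quantise differently.

```python
import numpy as np


def reduce_precision(pixels, precision_required, max_value=255):
    """Map values in [0, max_value] onto the integers 0..precision_required.

    Assumption: linear scaling followed by rounding. This is an
    illustration only, not necessarily what utils.mnist_reader does.
    """
    scaled = np.asarray(pixels, dtype=float) / max_value
    return np.rint(scaled * precision_required).astype(int)


pixels = np.array([0, 30, 128, 255])
levels = reduce_precision(pixels, precision_required=7)
print(levels)  # [0 1 4 7]
```

With `precision_required` set to 7, each feature takes one of eight values (zero plus seven non-zero levels), which corresponds to `d = 8`.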

In [2]:
data_params = {'dataset': 'mnist_784',  # mnist_784
               'trains_per_class': 50,  # 5000,
               'tests_per_class': 1222,  # 1000,
               'trains_in_test_set': False,
               'training_labels': None,  # ['1', '8'], #None, # ['0', '2', '3', '4', '5', '6', '7', '8', '9'],
               'testing_labels': None,
               'precision_required': 7,
               'shuffle': False,
               'use_edge_detection': False}
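
As a rough illustration of what `trains_per_class` and `training_labels` imply, the sketch below picks a fixed number of examples per label from a small synthetic array. The helper `select_per_class` is hypothetical and assumes that examples are taken in their original order and grouped by class (the unshuffled, class-incremental case); the selection logic in `get_and_save_train_test_dataset` may differ.

```python
import numpy as np


def select_per_class(x, y, per_class, labels=None):
    """Take the first `per_class` examples of each label, grouped by label.

    Hypothetical helper for illustration. `labels=None` means use every
    label present in `y`.
    """
    wanted = sorted(set(y.tolist())) if labels is None else labels
    idx = np.concatenate(
        [np.flatnonzero(y == label)[:per_class] for label in wanted]
    )
    return x[idx], y[idx]


# Tiny synthetic stand-in for the real dataset (made-up values).
y = np.array(['0', '1', '0', '1', '8', '8', '0'])
x = np.arange(len(y)).reshape(-1, 1)

x_sel, y_sel = select_per_class(x, y, per_class=2, labels=['0', '8'])
print(y_sel)  # ['0' '0' '8' '8']
```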



## Define where the data should be stored

The example below simply builds a directory name from the dataset name and the number of training and test examples per class. If you are also varying other data parameters, include them in the name so that each configuration is saved separately.

If the specified directory already exists, saving will fail. This prevents previously generated data from being overwritten.

In [3]:
data_root_dir = '../datasets'
data_sub_dir = f"split_{data_params['dataset']}_{data_params['trains_per_class']}_{data_params['tests_per_class']}"
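
If more granular names are needed, one option is a small helper that encodes further parameters, combined with a check that the target directory does not exist yet. The sketch below is only an assumption about how this could be done: `build_sub_dir` and its naming scheme are hypothetical, and a temporary directory stands in for `../datasets`.

```python
import tempfile
from pathlib import Path


def build_sub_dir(params):
    """Hypothetical naming helper that also encodes precision and edge detection."""
    name = (
        'split_{dataset}_{trains_per_class}_{tests_per_class}'
        '_p{precision_required}'
    ).format(**params)
    return name + ('_edge' if params['use_edge_detection'] else '')


params = {'dataset': 'mnist_784',
          'trains_per_class': 50,
          'tests_per_class': 1222,
          'precision_required': 7,
          'use_edge_detection': False}

# A temporary directory stands in for the real data root.
with tempfile.TemporaryDirectory() as root:
    target = Path(root) / build_sub_dir(params)
    exists_before = target.exists()  # False: safe to generate and save
    target.mkdir()
    exists_after = target.exists()   # True: a second save here would fail

print(build_sub_dir(params))  # split_mnist_784_50_1222_p7
```

Checking `target.exists()` before generating the data avoids doing the work only to have the save step fail.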

## Generate and save the data

In [4]:
x_train, y_train, x_test, y_test = get_and_save_train_test_dataset(
    data_root_dir=data_root_dir,
    data_sub_dir=data_sub_dir,
    data_params=data_params,
)

Looking for previously acquired mnist_784 dataset in the folder ../datasets/mnist_784 (Current working dir is /Users/katy/Code/neurogen_classifier/examples)
Reading MNIST data from ../datasets/mnist_784/x.npy and ../datasets/mnist_784/y.npy  (Current working dir is /Users/katy/Code/neurogen_classifier/examples)
... precision reduced.  to 7 ...
mnist_784 data has been loaded. Preparing the training and test examples. ...test and train data has been prepared.
Saving train and test examples to directory ../datasets/split_mnist_784_50_1222. Current directory is /Users/katy/Code/neurogen_classifier/examples.
Directory ../datasets/split_mnist_784_50_1222 does not exist. Creating it.
Data examples and data description saved in directory ../datasets/split_mnist_784_50_1222.
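
A quick sanity check after generation is to confirm that each class has the requested number of examples. The sketch below assumes the label arrays are one-dimensional arrays of string labels, as the `['0','8']` examples above suggest; `class_counts` is a hypothetical helper, and a synthetic array stands in for `y_train`.

```python
import numpy as np


def class_counts(labels):
    """Return a {label: count} dictionary for an array of labels."""
    values, counts = np.unique(labels, return_counts=True)
    return {str(v): int(c) for v, c in zip(values, counts)}


# Synthetic stand-in for y_train: three classes with 50 examples each.
y_demo = np.repeat(['0', '1', '2'], 50)

counts = class_counts(y_demo)
all_match = all(c == 50 for c in counts.values())
print(counts)  # {'0': 50, '1': 50, '2': 50}
```

On the real data, the same check would compare the counts for `y_train` against `data_params['trains_per_class']` and those for `y_test` against `data_params['tests_per_class']`.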


## Using the data

To reload the data, import the loader and pass it the directory the data was saved to:
```
from utils.mnist_reader import load_train_test_dataset

full_save_dir = os.path.join(data_root_dir, data_sub_dir)
load_train_test_dataset(full_save_dir)
```

