<p>
     <img src="https://raw.githubusercontent.com/hhelmbre/Rockstar-Lifestyle/master/doc/Logo.png" width="19%" align="left">
 
</p>

---

# Neural Network Using Python
---

### The following Jupyter notebook uses neural networks to extract quantifiable data from images. It relies on the neuralnet.py module and the imcrop.py image-manipulation module.

---

The following functions will be used:

* neuralnet( dataset = [], NN_settings = None, train = False, save_settings = True, print_acc = False)
    * Description: This wrapper function uses a neural network to estimate the protein count of the input images. The solver is L-BFGS, a quasi-Newton method that is reported to converge quickly and to work well with low-dimensional input. L-BFGS is a poor fit for larger problems because of its high memory requirement and because it does not use minibatches. Other methods could be tried (Adam, SGD BatchNorm, Adadelta, Adagrad), but L-BFGS was the best choice for this application given the time constraints in producing this package.
    * Parameters: 
    
        - dataset: dataset to use neural network on
        - NN_settings: classifier data
        - train: decide on whether to train or test
        - save_settings: decide on whether to save classifier data to 'data' folder
        - print_acc: passes bool to accuracy() to tell function to print accuracy
    
    * Output:
        - count: The counts of the input dataset
    
    
* accuracy( test_x, test_y, classifier, output = False)
    * Description: This function checks the accuracy of the neural network by sending testing data through the classifier and outputting the accuracy
    * Parameters: 
    
        - test_x: testing data
        - test_y: answered data
        - classifier: classifier object set by MLPClassifier
        - output: boolean operator to determine whether to output info or not

    * Output:
        - acc: accuracy of the neural network
    
    
* create_train_set(numb_sets=None, prev_set=None, bin_list=None, gauss_blur_list=None)
    * Description: This function creates a training set of n datasets. If a previous set is provided, the function will add to the previous set and output a backup set along with the changed set.
    
     *Note:* This function is currently weak. The image generation is not up to par and will be changed in the future to provide a better training set for the neural network. Right now, the generated training sets only contain protein counts between 5600 and 5800, which is too narrow a range to train the neural network properly.
     
    * Parameters: 
    
        - numb_sets: number of training sets to create
        - prev_set: list of existing training objects, if the user has one
        - bin_list: list of bins to use in MRH calculations
        - gauss_blur_list: list of gauss blur vars to use
    
    * Output:
        - prev_set: list of training data
        - backup: list of backup training data in case new set is unacceptable

* load_objects(name='untitled.dat')
    * Description: This function loads an object-based dataset from a subfolder using dill, an extension of pickle (a module that serializes Python objects)
    * Parameters: 
    
        - name: name of filename to use, default = untitled.dat
    
    * Output:
        - output: list of objects

* _loadall(dir_par_path, name)
    * Description: This is a private helper for load_objects that returns a generator over the objects stored in the named file at the given path
    * Parameters: 
    
        - dir_par_path: folder path of the given file
        - name: name of the file to load
    
    * Output:
        - generator: generator of the loaded file
          or
          [] if directory doesn't exist

* save_objects(dataset, name='untitled.dat')
    * Description: This function saves a set of objects into a subfolder named 'data' using dill, an extension of pickle (a module that serializes Python objects).
    
     *Note:* To save a single object, wrap it in a list of length 1, i.e. save_objects([obj], name = 'file_name')
     
    * Parameters: 
    
        - dataset: array of objects to save
        - name: name of file pickle will be saved to

The following objects will be used:

* TestObj()
    * Description: Object for test and training values
    * Contains:
        - res : image resolution
        - data : image data
        - MRH : Multi Resolution Histogram data for image
        - GB : Gaussian blur data for image
        - bin_list : bin list for corresponding MRH
        - heights : heights from MRH difference
        - count : number of objects in image
    
* NNSettings()
    * Description: Stores settings for the neural network. Mostly relevant to sgd; will be used more in future projects
    * Contains:
        - classifier : classifier object
        - learn_rate : learning rate
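
As a rough picture of the two objects listed above, here is a minimal sketch written as dataclasses. The class names `SketchTestObj` and `SketchNNSettings`, the default values, and the type hints are assumptions made for illustration; only the field names come from the lists above, and the real classes in neuralnet.py may be laid out differently.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class SketchTestObj:
    """Hypothetical stand-in for TestObj; field names follow the list above."""
    res: Optional[tuple] = None                         # image resolution
    data: Any = None                                    # image data
    MRH: List[Any] = field(default_factory=list)        # multi-resolution histogram data
    GB: List[Any] = field(default_factory=list)         # Gaussian blur data
    bin_list: List[int] = field(default_factory=list)   # bins used for the MRH
    heights: List[Any] = field(default_factory=list)    # heights from MRH difference
    count: Optional[int] = None                         # number of objects in image


@dataclass
class SketchNNSettings:
    """Hypothetical stand-in for NNSettings."""
    classifier: Any = None                              # classifier object
    learn_rate: Optional[float] = None                  # learning rate


example = SketchTestObj(res=(256, 256), count=5684)
settings = SketchNNSettings(learn_rate=0.001)
print(example.res, example.count, settings.learn_rate)
```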

##### Before starting on the example, there is one thing to note: pickling is a risky way to store data. The data files used in this package are Python objects serialized with pickling, and loading a pickle can execute arbitrary code. Anyone who tampers with these files could therefore run malicious Python os commands on your machine. For this demo, we will use pickling, but in the future we will look into other solutions for storing large sets of data.
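
To make the pickling risk concrete, here is a small and deliberately harmless sketch using the standard-library `pickle` module. The `Payload` class is hypothetical and stands in for a tampered object inside a data file. The point is that loading the bytes calls whatever callable the file names, which is why pickles from untrusted sources should not be loaded.

```python
import pickle


class Payload:
    """Harmless stand-in for a tampered object inside a pickle file."""

    def __reduce__(self):
        # On load, pickle calls the returned callable with these arguments.
        # eval("1 + 1") is harmless, but any importable callable could go here.
        return (eval, ("1 + 1",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # runs eval("1 + 1") instead of rebuilding Payload
print(result)
```

The loaded value is the result of the call rather than a `Payload` instance, which shows that the file's contents, not the loading code, decided what was executed.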

In addition to this, note that these files are very large, so anyone who wishes to contribute or modify files will need to download and use [Git Large File Storage](https://git-lfs.github.com/). The extension is easy to set up:
1. Download the extension from the link provided
2. Go to desired master branch
3. Execute the following commands:

    i. Set up git lfs
    ```console
    git lfs install 
    ```
    ii. Select type of file to track
    ```console
    git lfs track "*.dat"
    ```
    iii. Make sure .gitattributes is tracked
    ```console
    git add .gitattributes 
    ```

And that's it. Git will automatically use LFS for all .dat files.

---

**Before going through this example, please use Git LFS to download all files located in the /data/ folder of this repository into the folder where your package is installed. For example, mine is located in:**

**C:/Users/David/Miniconda3/lib/site-packages/rockstarlifestyle/data/**

Future iterations won't require this step, but the data files needed to run the neural network are large and take a very long time to produce. This example shows how to make a whole new training set from scratch, but doing so is not recommended right now.
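
If you are unsure where the package is installed, the following sketch shows one way to find the folder. The helper name `package_data_dir` is hypothetical, and it assumes the data folder sits directly inside the package directory, as in the example path above. It is demonstrated with the standard-library `json` package so that it runs without rockstarlifestyle installed.

```python
import importlib
import os


def package_data_dir(package_name, subfolder="data"):
    """Return the path of `subfolder` inside an installed package."""
    module = importlib.import_module(package_name)
    package_root = os.path.dirname(os.path.abspath(module.__file__))
    return os.path.join(package_root, subfolder)


# Demonstrated with a standard-library package so the sketch runs anywhere;
# swap in "rockstarlifestyle" once the package is installed.
print(package_data_dir("json"))
```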

---


The packages we will be using in this example:

In [1]:
import warnings
from rockstarlifestyle import neuralnet as nn
from rockstarlifestyle import imcrop as iC
import skimage.io as sio

To start, we need data for our NN, so we will run through the imcrop example, which can be found in imcrop_demo.ipynb.

In [2]:
# Adjust these depending on your image location and name:
file = 'P10_PAM_ipsi_40x_hippo_scan_MaxIP'
filetype = '.png'
path = r"C:/Users/David/Rockstar-Lifestyle/Images/"
res_x = 256
res_y = 256
im_path = path + file + filetype
timpng = sio.imread(im_path)
img_stack = iC.img_crop(path, file, filetype, (res_x, res_y))
obj_stack = iC.stackmrh(img_stack)

Great, now we have a stack of objects with all the data we need. Next, we should save this data as a pickle. This function will also create a folder named /data/ in your package folder if you don't already have one. The code will automatically rename your file if you name it improperly.

In [3]:
nn.save_objects(obj_stack, name = 'stacked_img_test')

  "renaming to '{}'.".format(name))


saving to: C:/Users/David/Miniconda3/lib/site-packages/rockstarlifestyle/data/
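
For readers curious about what saving a list of objects and reading it back through a generator can look like, here is a minimal sketch. It is not the package's implementation: it uses the standard-library `pickle` instead of dill, writes to a temporary folder, and the names `ensure_dat_extension`, `save_objects_sketch`, and `load_all_sketch` are hypothetical. The renaming rule (appending `.dat`) is also an assumption based on the file names used in this demo.

```python
import os
import pickle
import tempfile


def ensure_dat_extension(name):
    """Append '.dat' when the name does not already end with it."""
    return name if name.endswith(".dat") else name + ".dat"


def save_objects_sketch(dataset, folder, name="untitled.dat"):
    """Write each object in `dataset` as its own pickle record."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, ensure_dat_extension(name))
    with open(path, "wb") as handle:
        for obj in dataset:
            pickle.dump(obj, handle)
    return path


def load_all_sketch(path):
    """Yield objects one at a time until the file is exhausted."""
    if not os.path.exists(path):
        return
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                break


with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_objects_sketch([{"count": 1}, {"count": 2}], tmp, "demo")
    loaded = list(load_all_sketch(saved_path))

print(os.path.basename(saved_path), loaded)
```

Writing one record per object is what makes the single-object case need a list of length 1: the saver iterates over whatever it is given.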


Testing to see if we saved correctly:

In [4]:
objects = nn.load_objects(name = 'stacked_img_test.dat')

In [5]:
objects[10]

<rockstarlifestyle.imcrop.ImgID at 0x2b1022f89e8>

Great! Now that we know how to save, let's load some files.

In [6]:
training_set = nn.load_objects('training_set.dat')

In [7]:
print(training_set[0])
print(len(training_set))
print(training_set[0].data)

<__main__.test_obj object at 0x000002B107A819E8>
201
[[  0.   0.   0. ...   0.   0.   0.]
 [  0.   0.   0. ...   0.   0.   0.]
 [  0.   0.   0. ...   0.   0.   0.]
 ...
 [  0.   0.   0. ...   0.   0.   0.]
 [  0.   0.   0. ...   0.   0.   0.]
 [  0.   0.   0. ... 255.   0. 255.]]


Our training set is already pretty large, but let's add one more dataset to it as an example. We will create a new training set and also a backup in case our new training set doesn't work.

In [8]:
training_set, backup = nn.create_train_set(numb_sets=1, prev_set=training_set)

Wowee, that sure did take forever! Good thing we already have a set saved. Future work on this project will help to alleviate this wait time and also get better data for better training sets.

In [9]:
print(training_set[0])
print(len(training_set))
print(training_set[0].data)

<__main__.test_obj object at 0x000002B107A819E8>
202
[[  0.   0.   0. ...   0.   0.   0.]
 [  0.   0.   0. ...   0.   0.   0.]
 [  0.   0.   0. ...   0.   0.   0.]
 ...
 [  0.   0.   0. ...   0.   0.   0.]
 [  0.   0.   0. ...   0.   0.   0.]
 [  0.   0.   0. ... 255.   0. 255.]]


As shown, we have 1 extra dataset in our training array. Let's go ahead and save this for future work.

In [10]:
nn.save_objects(training_set, 'training_set')

saving to: C:/Users/David/Miniconda3/lib/site-packages/rockstarlifestyle/data/


  "renaming to '{}'.".format(name))


In [26]:
# Use the first 80% of the sets for training and the rest for testing
split = int(len(training_set) * 0.8)
training = training_set[:split]
testing = training_set[split:]
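
The cell above holds out the last 20% of the sets for testing. The same idea can be wrapped in a small helper, sketched below. The function name `split_dataset` and its optional shuffling are assumptions added for illustration; the notebook itself splits in order without shuffling.

```python
import random


def split_dataset(items, train_fraction=0.8, shuffle=False, seed=0):
    """Split a sequence into (training, testing) lists."""
    items = list(items)
    if shuffle:
        # Seeded shuffle so the split is reproducible between runs.
        random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


# 202 matches the length of the training set printed earlier.
train_part, test_part = split_dataset(range(202))
print(len(train_part), len(test_part))
```

Shuffling before splitting can matter when the stored sets are ordered in some systematic way, since an ordered split would then test on a different kind of data than it trained on.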

We can now create our neural network. To do this, we'll use the neuralnet() function, which can either train on or test a dataset depending on the user input.

For training, this function trains a classifier network using lbfgs, a quasi-Newton method that works with an approximation of the Hessian. Based on our research, even though it uses a lot of memory, it is a good choice for low-dimensional data like ours. It is also much easier to work with than the more popular sgd method, which requires the user to tune it until the loss function converges. Future work will look more into these different methods, but due to project time constraints, we are going with lbfgs. We've set save_settings = True, which means a classifier object containing the neural network info will be saved into our data folder as 'classifier_info.dat' for future use and modification.

In [27]:
nn.neuralnet(training, train = True, save_settings= True, print_acc = False)

saving to: C:/Users/David/Miniconda3/lib/site-packages/rockstarlifestyle/data/
Training complete


Let's load it up:

In [28]:
classifier = nn.load_objects('classifier_info.dat')

This returns the MLPClassifier object wrapped in a list, so let's pull out the single object:

In [29]:
classifier[0]

MLPClassifier(activation='tanh', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='lbfgs', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)
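
For reference, a classifier with settings similar to the ones printed above can be built directly with scikit-learn, as in the sketch below. The data here is synthetic and purely illustrative: the feature count, sample count, and the three label values are assumptions, and the real features in this package come from the image histogram data rather than random numbers.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data: 60 samples with 8 features each, and integer
# "counts" used as class labels.
X = rng.random((60, 8))
y = rng.choice([5671, 5684, 5694], size=60)

clf = MLPClassifier(
    solver="lbfgs",            # quasi-Newton solver, no learning rate to tune
    activation="tanh",
    hidden_layer_sizes=(100,),
    alpha=0.0001,
    max_iter=200,
    random_state=0,
)
clf.fit(X, y)

predicted = clf.predict(X[:5])
print(predicted)
```

Because this is a classifier, each distinct count is treated as its own class, so predictions are always drawn from the counts seen during training.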

And now we can test it out

In [31]:
nn.neuralnet(testing, nn_settings=classifier[0], train = False, save_settings=False)

[array([5680, 5685, 5648, 5667, 5704, 5684, 5705, 5671, 5684, 5704, 5671,
        5698, 5684, 5672, 5709, 5704, 5694, 5684, 5685, 5684, 5713, 5685,
        5672, 5671, 5685, 5694, 5713, 5684, 5698, 5704, 5696, 5694, 5684,
        5709, 5707, 5675, 5672, 5680, 5705, 5705, 5713, 5684, 5694, 5658,
        5694, 5694, 5675, 5671, 5684, 5684, 5671, 5695, 5678, 5705, 5694,
        5713, 5713, 5648, 5671, 5667, 5657, 5696, 5694, 5690, 5703, 5684,
        5668, 5694, 5672, 5678, 5709, 5676, 5672, 5648, 5705, 5683, 5707,
        5684, 5689, 5684, 5687, 5684, 5684, 5696, 5684, 5685, 5648, 5694,
        5694, 5672, 5685, 5705, 5704, 5694, 5691, 5696, 5705, 5708, 5675,
        5641, 5666, 5668, 5685, 5707, 5694, 5709, 5675, 5713, 5673, 5713,
        5698, 5676, 5667, 5648, 5709, 5680, 5713, 5690, 5688, 5661, 5678,
        5689, 5668, 5696, 5697, 5698, 5675, 5681, 5694, 5704, 5717, 5684,
        5671, 5678, 5680, 5713, 5671, 5648, 5648, 5689, 5698, 5708, 5706,
        5684, 5648, 5684, 5705, 5648, 
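
One way to put a number on how close predictions like these are to the known counts is sketched below. The helper names `accuracy_sketch` and `mean_abs_error_sketch` are hypothetical and are not the package's accuracy() function, and the numbers in the usage lines are made up for illustration rather than taken from this notebook.

```python
def accuracy_sketch(predicted, expected, output=False):
    """Fraction of predictions that match the expected counts exactly."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    acc = matches / len(expected)
    if output:
        print("accuracy: {:.1%}".format(acc))
    return acc


def mean_abs_error_sketch(predicted, expected):
    """Average absolute difference between predicted and expected counts."""
    return sum(abs(p - e) for p, e in zip(predicted, expected)) / len(expected)


# Made-up numbers purely for illustration, not results from this notebook.
acc = accuracy_sketch([5680, 5685, 5648, 5667], [5680, 5685, 5650, 5667])
mae = mean_abs_error_sketch([5680, 5685, 5648, 5667], [5680, 5685, 5650, 5667])
print(acc, mae)
```

Exact-match accuracy is strict for count data, since a prediction that is off by two proteins scores the same as one that is off by a thousand. The mean absolute error is a complementary view that rewards near misses.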

Great! So it seems to work well for the generated data, but how well does it work for real data?

In [32]:
nn.neuralnet(obj_stack, nn_settings=classifier[0], train = False, save_settings=False)

[array([5685, 5709, 5709, 5658, 5691, 5694, 5658, 5689, 5689, 5678, 5678,
        5662, 5698, 5689, 5658, 5662, 5658, 5662, 5662, 5662, 5713, 5662,
        5708, 5662, 5662, 5708, 5709, 5685, 5708, 5698, 5711, 5709, 5658,
        5658, 5658, 5717, 5709, 5709, 5662, 5685, 5658, 5658, 5661, 5708,
        5662, 5689, 5708, 5667, 5662, 5662, 5689, 5689, 5658, 5658, 5662,
        5662, 5689, 5689, 5708, 5678, 5678, 5691, 5689, 5685, 5658, 5662,
        5689, 5685, 5662, 5661, 5661, 5685, 5708, 5698, 5689, 5689, 5708,
        5689, 5704, 5658, 5658, 5658, 5689, 5658, 5661, 5658, 5708, 5668,
        5658, 5661, 5662, 5658, 5685, 5689, 5708, 5662, 5662, 5661, 5689,
        5706, 5717, 5662, 5704, 5663, 5662, 5662, 5662, 5654, 5662, 5694,
        5694, 5713, 5668, 5708, 5708, 5658, 5678, 5668, 5662, 5668, 5708,
        5694, 5663, 5708, 5708, 5668, 5713, 5694, 5708, 5713, 5709, 5678,
        5709, 5662, 5662, 5713, 5662, 5662, 5678, 5689, 5662, 5711, 5709,
        5662, 5662, 5708, 5668, 5713, 

Well, that doesn't look good. As expected, the trained neural network outputs results similar to its training dataset: our training images only had protein counts between 5600 and 5800. In future work, we will need to make the image data a lot more robust in order to produce better results. There are also other solutions, which are detailed extensively in the .py file.
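
The effect described above can be reproduced on synthetic data. In the sketch below, which assumes scikit-learn's `MLPClassifier` and uses made-up random features, the network is trained only on labels between 5600 and 5800 and then asked about inputs that look nothing like the training data. A classifier can only answer with labels it has seen, so every prediction still falls in that range.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)

# Synthetic training data whose labels all fall between 5600 and 5800.
X_train = rng.random((80, 6))
y_train = rng.integers(5600, 5801, size=80)

clf = MLPClassifier(solver="lbfgs", hidden_layer_sizes=(20,),
                    max_iter=200, random_state=1)
clf.fit(X_train, y_train)

# Inputs far outside the training range, standing in for unfamiliar images.
X_new = rng.random((10, 6)) * 50.0
predicted = clf.predict(X_new)

print(predicted.min() >= 5600, predicted.max() <= 5800)
```

Widening the range of counts in the training images, or treating the count as a regression target instead of a class label, are two directions that would address this limitation.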

That concludes the neural network demo. There is still a lot of work to be done, but the neural network itself seems to work very well.