# ML - memory estimation

*I will be grateful for reviews, suggestions and fixes - [jacek@golem.network](mailto:jacek@golem.network)*

This document explores some issues with the memory requirements of the sequential-verification algorithm described [here](https://github.com/imapp-pl/golem_rd/wiki/Verification-of-iterative-tasks) (this link does not work).  
It is part of the ML PoC task described [here](https://github.com/imapp-pl/golem_rd/issues/91).

## Remarks

 1. Neural networks can be very big, that is true. Currently, the large ones have about 1-12 GB of weight data, plus the training data. That is because GPUs with 1-12 GB of RAM currently dominate both the private and commercial segments: private deep learning researchers typically use NVIDIA 1060/1080/Titan X cards with 6/12 GB, or cloud solutions (e.g. Amazon GPUs with 1/2/4/8 GB of RAM). A network typically has to fit on one card to leverage its computational power, although multi-GPU solutions are also in use.
 2. That doesn't mean all networks are that big. Although it is currently thought best to maximize the network size and then reduce overfitting with regularization techniques, the long training times and the difficulty of training large networks, plus the marginal benefit of using them when the user has only a limited amount of data, mean that medium-sized networks also have some use.
 3. Even if that stops being true at some point, there are other sequential algorithms to which, I believe, this verification algorithm can be applied, if nothing better is developed first.

The second notebook in this directory states a hypothesis that, under some additional assumptions about the rationality of agents, only a very limited (maybe even a very small constant) number of verification steps has to be done: about 2-5 checks should suffice.  
**Important!** The analysis in the second notebook assumes we can choose any fragment of the solution to validate. However, if we were to precommit to checking $k$ steps out of all epochs, then after the last check the provider would have no incentive to continue working on the task.  
It is therefore important to randomize the number of checks too, in some clever way (i.e. probably most of the first checks will be done in the middle of the computation; assuming we can only afford to run 2-3 checks, and the analysis in the second notebook showed that this suffices, we still have to somehow choose the next point).
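
One simple way to keep the number of checks itself random is to decide independently for every epoch whether it gets checked, so that only the expected number of checks is fixed. The sketch below is an illustration under that assumption, not the scheme analysed in the second notebook: the function name `sample_check_epochs` and its parameters are hypothetical, and the selection is uniform over epochs rather than concentrated in the middle of the computation.

```python
import random


def sample_check_epochs(num_epochs, expected_checks, rng):
    """Return the epochs to verify, chosen independently per epoch.

    Each epoch is selected with probability expected_checks / num_epochs,
    so the number of checks is random (binomial) with the requested mean.
    Hypothetical helper, not part of any Golem API.
    """
    if not 0 < expected_checks <= num_epochs:
        raise ValueError("expected_checks must be in (0, num_epochs]")
    probability = expected_checks / num_epochs
    return [epoch for epoch in range(num_epochs) if rng.random() < probability]


# Example: 100 epochs, 3 checks on average; the seed is only for repeatability.
rng = random.Random(0)
print(sample_check_epochs(num_epochs=100, expected_checks=3, rng=rng))
```

Because every epoch can still be selected regardless of how many checks already happened, the provider cannot conclude that the last check is behind them. Note that this simple variant can also produce zero checks for a given task.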

So, it's just a matter of how much data we can afford to send.  
 - If it is measured in tens of megabytes, then this would seriously prevent such a scheme from being used in Golem.
 - If it is measured in hundreds of megabytes or in gigabytes, then it would be possible to validate small neural networks.
 - If it is measured in tens of gigabytes, we can afford to validate all currently used networks.

I've checked snapshot files from PyTorch, Keras and TensorFlow experimentally and there is almost no storage overhead, i.e. matrices take about $\textrm{number of matrix elements} \times \textrm{size of float32}$ (4 bytes per element).
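
The rule of thumb above can be turned into a quick estimate. This is a minimal sketch assuming float32 weights and no container overhead; the helper name `checkpoint_size_bytes` and the example layer shapes are made up for illustration and do not correspond to the networks measured in this document.

```python
import math

BYTES_PER_FLOAT32 = 4


def checkpoint_size_bytes(layer_shapes, bytes_per_param=BYTES_PER_FLOAT32):
    """Estimate the snapshot size as (number of parameters) * (bytes per parameter)."""
    total_params = sum(math.prod(shape) for shape in layer_shapes)
    return total_params * bytes_per_param


# Hypothetical fully connected network 784 -> 1024 -> 10, with biases.
shapes = [(784, 1024), (1024,), (1024, 10), (10,)]
size = checkpoint_size_bytes(shapes)
print(f"{size / 2**20:.2f} MiB")  # prints 3.11 MiB
```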

Sending files can be improved a little by using some compression method.  
The results of an experiment on a small and a medium network are below.

Small network (4.2M):
 - gzip - 3.9M
 - bzip2 - 4.0M
 - xz - 3.9M
 - 7zip - 3.9M
 
Medium network (168M):
 - gzip - 155M
 - bzip2 - 158M
 - xz - 153M
 - 7zip - 153M

This shows that there is not much to gain from compressing a single network file.

Code:

```
#!/bin/bash
# Compress each snapshot with gzip, bzip2, xz and 7zip to compare archive sizes.
for net in small medium; do
    tar -czvf "$net.gzip" "$net"   # gzip
    tar -cjvf "$net.bzip" "$net"   # bzip2
    tar -cJvf "$net.xz" "$net"     # xz
    7za a "$net.7z" "$net"         # 7zip
done
```
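
For a check that does not depend on the snapshot files, a similar comparison can be sketched in Python with the standard-library compressors. This is not the script that produced the numbers above: it compresses an in-memory buffer of synthetic float32 values (Gaussian, an assumption standing in for real weights) instead of tar archives, it leaves out 7zip, and `compressed_sizes` is a hypothetical helper name.

```python
import bz2
import gzip
import lzma
import random
import struct


def compressed_sizes(data):
    """Return the size in bytes of `data` raw and after gzip, bzip2 and xz."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data)),
        "bzip2": len(bz2.compress(data)),
        "xz": len(lzma.compress(data)),  # lzma defaults to the .xz format
    }


# Synthetic "weights": 100,000 Gaussian values packed as little-endian float32.
rng = random.Random(0)
values = [rng.gauss(0.0, 0.05) for _ in range(100_000)]
weights = struct.pack(f"<{len(values)}f", *values)

for name, size in compressed_sizes(weights).items():
    print(f"{name:>5}: {size} bytes ({size / len(weights):.0%} of raw)")
```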

At the same time, keep in mind that in our case one state of the network actually consists of 2 states of the physical network (i.e. at the beginning and at the end of an epoch).  
Although the compressors do a slightly better job here (probably because not all parameters change during one iteration), this is still about a 2x increase in memory usage.

Small network (8.5M):
 - gzip - 8.5M
 - bzip2 - 8.5M
 - xz - 4.9M
 - 7zip - 4.9M
 
Medium network (336M):
 - gzip - 309M
 - bzip2 - 315M
 - xz - 306M
 - 7zip - 305M

I haven't done any experiments to check whether multiple checkpoints would compress better together, but judging by the results just above, I think we can safely assume that is not the case.

**BUT there is a possible modification of the original idea: instead of using a hash to precommit, we use a random, small part of the network. Then we don't have to send the whole "after" state; we can just compare against this "hash", a fingerprint of the network.  
It can still be used both for precommitment and for verification.**
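
A minimal sketch of such a fingerprint, under assumptions that the text above does not settle: the weights are taken as one flat list of floats, the sampled positions are derived from a shared seed, and the names `fingerprint` and `matches` are hypothetical. Who chooses the seed, and when it is revealed, is left open here.

```python
import random


def fingerprint(weights, seed, sample_size):
    """Return (index, value) pairs for a seeded random subset of the weights."""
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(len(weights)), sample_size))
    return [(i, weights[i]) for i in indices]


def matches(weights, committed, tolerance=0.0):
    """Check recomputed weights against a previously committed fingerprint."""
    return all(abs(weights[i] - value) <= tolerance for i, value in committed)


# Toy example: the provider commits to 10 sampled weights out of 1000.
after_state = [float(i) for i in range(1000)]
committed = fingerprint(after_state, seed=42, sample_size=10)

# The verifier recomputes the epoch and compares only the sampled positions.
print(matches(after_state, committed))
```

Only `sample_size` values travel over the network instead of the whole "after" state. The `tolerance` parameter is there because recomputed floating-point weights may not be bit-identical; a suitable value is not established in this document.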