An implementation of machine learning on secure data: a model can be trained and used on data that is kept private. We use the MPyC library to perform a secure multi-party computation (MPC) that trains a forest of decision trees with an algorithm similar to the C4.5 machine learning algorithm.
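For readers unfamiliar with C4.5, the following is a minimal plaintext sketch of the gain ratio criterion that C4.5 uses to choose the attribute to split on. It is not the code in this repository and it is not secure: it works on ordinary Python values, whereas the MPC version would have to evaluate a comparable criterion on secret-shared data. The function names (`entropy`, `gain_ratio`, `best_attribute`) and the dict-per-row data layout are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attribute):
    """Gain ratio of splitting on one categorical attribute (hypothetical helper)."""
    total = len(labels)

    # Group the class labels by the value each row has for this attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)

    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    information_gain = entropy(labels) - remainder

    # Split information penalises attributes that have many distinct values.
    split_information = -sum(
        len(g) / total * log2(len(g) / total) for g in groups.values()
    )

    # An attribute with a single value carries no split information.
    if split_information == 0:
        return 0.0
    return information_gain / split_information


def best_attribute(rows, labels, attributes):
    """Pick the attribute with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, labels, a))
```

A decision tree is then grown by splitting on the best attribute and repeating the procedure on each resulting subset. A forest trains several such trees.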
Install Python 3.7, then invoke:
pip install -r requirements.txt
The spect.py and balance.py files contain examples of how to specify a dataset and to train a random forest on this data. These examples can be run as follows:
python spect.py
python balance.py
Please keep in mind that these computations are much slower than their non-MPC counterparts.
Run the tests by invoking:
pytest
Run tests in watch mode:
ptw [-c]
(The -c flag causes the screen to be cleared before each run.)
Profile an example run and view the results with SnakeViz:
pip install snakeviz
python -m cProfile -o spect.stats spect.py
snakeviz spect.stats
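If a browser is not available, the same stats file can be inspected in the terminal with the standard library's `pstats` module. The sketch below assumes that `spect.stats` was written by the cProfile command above. The helper name `summarize_profile` is hypothetical and not part of this repository.

```python
import io
import pstats


def summarize_profile(stats_path, limit=10):
    """Return a text report of the `limit` costliest entries by cumulative time."""
    buffer = io.StringIO()
    stats = pstats.Stats(stats_path, stream=buffer)
    # Drop directory prefixes for readability, then sort by cumulative time.
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

Calling `print(summarize_profile("spect.stats"))` prints the ten entries with the highest cumulative time, which is usually enough to see where an MPC run spends most of its time.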
This algorithm was developed as part of the SODA project. Many thanks to Mark Abspoel, Daniel Escudero and Nikolaj Volgushev for designing the decision tree algorithm for MPC (see chapter 6 of this SODA document). Many thanks to Berry Schoenmakers, who developed MPyC and helped us throughout the implementation of this algorithm.