This is the repository of the DAIG (Distributed A.I Grid) project client program. It is based on PyQt5 because we use TensorFlow for model training and other tasks.
DAIG (Distributed A.I Grid) is a distributed machine learning system based on deep learning. Deep learning methods usually require more training time than other machine learning methods. One way to shorten training is to use multiple GPUs, but that is expensive. So, instead of multiple GPUs, we tried to use the idle PC resources of other people.
The DAIG system consists of a Learning requestor, Resource providers, and a Management server. The Learning requestor creates a project and uploads training data to the Management server. The Management server then distributes training data shards and model information to the registered Resource providers. When all training data shards have been used for learning, the Management server saves the final model and weights to object storage. The Learning requestor can download the trained model at any time.
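The sharding step described above can be illustrated with a minimal sketch. It assumes the training data fits in NumPy arrays and that shards are roughly equal in size; `make_shards` is a hypothetical helper for illustration, not a function from this repository, and the actual server may split data differently.

```python
import numpy as np


def make_shards(features, labels, num_shards):
    """Split a dataset into roughly equal shards (hypothetical helper)."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    # np.array_split tolerates sizes that do not divide evenly.
    feature_parts = np.array_split(features, num_shards)
    label_parts = np.array_split(labels, num_shards)
    return list(zip(feature_parts, label_parts))


# Toy data: 10 samples with 2 features each.
features = np.arange(20, dtype=np.float32).reshape(10, 2)
labels = np.arange(10)

shards = make_shards(features, labels, num_shards=3)
shard_sizes = [len(shard_features) for shard_features, _ in shards]
print(shard_sizes)  # [4, 3, 3]
```

Each `(features, labels)` pair would then be sent to one Resource provider together with the model information.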
Name | Version | Usage |
---|---|---|
Django | 3.1.7 | for server development |
boto3 | 1.17.67 | for object storage |
numpy | 1.19.5 | for data manipulation |
requests | 2.25.1 | for http communication |
h5py | 3.1.0 | for model saving |
iamport | not specified | for payment processing |
We built the DAIG distribution and result gathering system on K-batch sync SGD. It gathers the trained gradients using an all-reduce method. The K-batch size can be controlled by the Learning requestor, so the final result is also controlled by the Learning requestor.
This is the server program, so you should also check https://github.com/netroid314/ASWCS_front.
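As a rough illustration of the idea, one K-batch sync SGD step waits for the first K gradients to arrive and applies their average as a single update. The sketch below is an assumption-laden toy, not the code in this repository: `k_batch_sync_step` is a hypothetical name, gradients are plain NumPy arrays, and a simple mean stands in for the result of the all-reduce.

```python
import numpy as np


def k_batch_sync_step(weights, gradient_stream, k, learning_rate):
    """Apply one update from the first k gradients to arrive (hypothetical)."""
    collected = []
    for gradient in gradient_stream:
        collected.append(gradient)
        if len(collected) == k:
            break  # later gradients are not used in this step
    if len(collected) < k:
        raise ValueError("fewer than k gradients arrived")
    # Average the k gradients; this stands in for the all-reduce result.
    mean_gradient = np.mean(collected, axis=0)
    return weights - learning_rate * mean_gradient


weights = np.zeros(2)
# Gradients in arrival order; the third arrives too late for k=2.
arrivals = [
    np.array([1.0, 2.0]),
    np.array([3.0, 4.0]),
    np.array([100.0, 100.0]),
]
new_weights = k_batch_sync_step(weights, iter(arrivals), k=2, learning_rate=0.1)
print(new_weights)
```

With `k=2`, only the first two gradients contribute, so a slow provider does not hold up the update. A larger K averages more gradients per step at the cost of waiting longer.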
First, install the Python libraries listed above.
Alternatively, you can install them from the requirements file.
Then use manage.py to launch the Django server. For example:
python manage.py runserver 0.0.0.0:8000
Refer to the Django documentation for more details.
However, DAIG also focuses on balancing work among Resource providers, so depending on the situation it may not be pure K-batch sync SGD.
This project has been developed by Korean developers, so there are some Korean comments. This is the server program, so please also check https://github.com/netroid314/ASWCS_front.