Code repo for "Gymnasie Arbete" (Swedish upper-secondary diploma project)
- Use MNIST dataset for metrics
- Create TFJobs (https://www.kubeflow.org/docs/components/training/tftraining/) with MNIST and deploy on k8s
- Create and deploy the MNIST model with single-node computation
- Deploy the MNIST model with multi-worker computation (https://www.tensorflow.org/guide/distributed_training#multiworkermirroredstrategy) using MultiWorkerMirroredStrategy
- Run metrics on both setups (TFJobs on k8s and multi-worker computation)
- Analyze metrics
- Done
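For the multi-worker step, TensorFlow discovers its peers through the `TF_CONFIG` environment variable. A minimal sketch of what each worker pod would set before training starts; the hostnames and port are placeholders, not the actual deployment's names:

```python
import json
import os

# Hypothetical worker addresses; in the real deployment these would be
# the k8s Service DNS names of the worker pods.
workers = ["mnist-worker-0:2222", "mnist-worker-1:2222"]

# Every worker gets the same cluster spec but its own task index.
tf_config = {
    "cluster": {"worker": workers},
    "task": {"type": "worker", "index": 0},  # index differs per pod
}
os.environ["TF_CONFIG"] = json.dumps(tf_config)

# With TF_CONFIG set before TensorFlow initializes, the strategy
# resolves the cluster automatically:
#   import tensorflow as tf
#   strategy = tf.distribute.MultiWorkerMirroredStrategy()
#   with strategy.scope():
#       model = build_mnist_model()  # model must be created in scope
```

In a TFJob, Kubeflow's operator injects `TF_CONFIG` into each pod automatically, so this manual construction is only needed for the hand-rolled multi-worker setup.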
-
The model_save path is only passed into the app as an env variable, so saving needs to be handled internally by the app.
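A minimal sketch of resolving the save path inside the app; `MODEL_SAVE_PATH` is an assumed variable name (the actual env var name is not given in these notes):

```python
import os

def resolve_save_path(default="/tmp/model"):
    """Read the model save path from the environment.

    MODEL_SAVE_PATH is an assumed name; the deployment's env var may
    differ. Falling back to a local default lets the app also run
    outside the cluster.
    """
    return os.environ.get("MODEL_SAVE_PATH", default)

# Inside the training app, after model.fit(...):
#   model.save(resolve_save_path())
```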
-
Mount a PVC and let multiple nodes write to it: https://stackoverflow.com/questions/67345577/can-we-connect-multiple-pods-to-the-same-pvc
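When several workers write to the same PVC mount, giving each worker its own subdirectory avoids them clobbering each other's files. A sketch; the mount path `/mnt/shared` and the `WORKER_INDEX` env var are assumptions (in a TFJob the index could instead be parsed from `TF_CONFIG`):

```python
import os

def worker_output_dir(mount="/mnt/shared", env_var="WORKER_INDEX"):
    """Build a per-worker directory under the shared PVC mount.

    Each worker writing under its own worker-<index> subdirectory
    keeps concurrent writes to the shared volume from colliding.
    """
    index = os.environ.get(env_var, "0")
    path = os.path.join(mount, f"worker-{index}")
    os.makedirs(path, exist_ok=True)
    return path
```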
- I created the AI model
- I downloaded the Kubeflow manifests to the cluster using kustomize (make sure the k8s cluster is v1.21.1 and kustomize is v3.2.0)
- "k create -f mnist.yaml"
- what needs to be coded, with containers, and how much can be taken from Kubeflow
- a more concrete timeline
- make sure all the parts are there
-
explain the hypothesis
-
use diagrams; show loss divergence in the results
https://github.com/nottombrown/distributed-tensorflow-example/blob/master/example.py
Implement a parameter server (PS) to speed up synchronisation between workers.
Should only need to specify the PS job to join the strategy, and the worker job to do the default work.
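A sketch of that job-type dispatch, assuming each task reads its role from a `TF_CONFIG`-style JSON string (in the TF1-style example linked above, a PS task calls `server.join()` and blocks, while a worker task runs the training loop):

```python
import json
import os

def run_task(tf_config_json):
    """Dispatch on the task type from a TF_CONFIG-style JSON string.

    Returns a label for illustration. In the real app, roughly:
      "ps"     -> start a tf.distribute.Server and call .join()
      "worker" -> run the default training work
    """
    task = json.loads(tf_config_json)["task"]["type"]
    if task == "ps":
        return "joining as parameter server"
    elif task == "worker":
        return "running default training work"
    raise ValueError(f"unknown task type: {task}")
```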
8 PS, 12 workers: 71 s
2 PS, 4 workers: 90 s
time to beat: 27.4 s