# NGC Digits App on HPE Container Platform (GPU & persistent storage)

This tutorial shows how to use a Kubernetes cluster provided by the HPE Container Platform (HPE CP) to deploy containers with GPUs, persistent storage, and access to data through an NFS mount. It also shows how to reach an endpoint in the deployed container through the HPE CP gateway.

Towards the end of this tutorial, an interactive part lets you work with the deployed application and simulate a crash of the system, followed by a rebuild without data loss.

![image.png](attachment:image.png)

## Authenticate & Init 

In [1]:
kubectl plugin list
#kubectl hpecp --help

The following compatible plugins are available:

/usr/local/bin/kubectl-hpecp


In [2]:
kubectl hpecp refresh hcp50.demo.local --hpecp-user=dgeisse --hpecp-pass=$(cat ~/.dgeisse_passwd) --insecure-skip-tls-verify



Retrieved new Kube Config from HPECP server at hcp50.demo.local:8080.
The KUBECONFIG environment variable HAS NOT been set.
Your current session WILL NOT have the new configuration.
To persist these changes by loading all current Kube Config
values into your default Kube Config file, run the
following command:

    KUBECONFIG="/root/.kube/.hpecp/hcp50.demo.local/config:/root/.kube/config-backup" kubectl config view --raw > /root/.kube/config

To persist these changes by changing your local KUBECONFIG
environment variable, run the following command:

    export KUBECONFIG="/root/.kube/.hpecp/hcp50.demo.local/config"

CAUTION - both of these commands will OVERWRITE your current
Kube Config settings. This is probably what you want, but
to confirm that this command will not break your system,
run the following command to view the resulting Kube
Config file:

    KUBECONFIG="/root/.kube/.hpecp/hcp50.demo.local/config:/root/.kube/config" kubectl config view



In [3]:
export KUBECONFIG="/root/.kube/.hpecp/hcp50.demo.local/config"

In [4]:
kubectl config set-context --current --namespace=k8s-demo-tenant

Context "CTC Boeblingen-CTC K8s Cluster-dgeisse" modified.
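Optionally, confirm that the session now uses the HPE CP kubeconfig and the tenant namespace before creating resources. This is a minimal sketch: the helper name `check_namespace` is hypothetical, while `kubectl config current-context` and `kubectl config view --minify` are standard kubectl commands.

```shell
# check_namespace (hypothetical helper): succeed only if the current kubectl
# context uses the expected default namespace; prints what it found.
check_namespace() {
  expected_ns="$1"
  # --minify restricts the output to the current context; {..namespace} selects its namespace
  current_ns="$(kubectl config view --minify -o 'jsonpath={..namespace}')"
  echo "context: $(kubectl config current-context), namespace: ${current_ns:-<unset>}"
  [ "$current_ns" = "$expected_ns" ]
}
```

For example, `check_namespace k8s-demo-tenant` should report the `k8s-demo-tenant` namespace and return success after the cell above.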


## Create Persistent Volume Claim

In [5]:
kubectl get sc

NAME                PROVISIONER        AGE
default (default)   com.mapr.csi-kdf   53d


In [None]:
cat << 'EOF' | kubectl apply -f -
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: digits-pv-claim
  labels:
    run: ngc-example
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
EOF
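The claim does not name a storage class, so it is provisioned by the cluster's default class (`default`, backed by the `com.mapr.csi-kdf` provisioner listed above). Before deploying, you can poll the claim until its phase is `Bound`. This is a sketch: the helper `wait_for_pvc_bound`, the 2-second interval, and the 120-second default timeout are illustrative choices. If the storage class uses the `WaitForFirstConsumer` binding mode, the claim stays `Pending` until a pod that uses it is scheduled, so a timeout here is not necessarily an error.

```shell
# wait_for_pvc_bound (hypothetical helper): poll a PVC until its phase is Bound.
# Arguments: claim name, optional timeout in seconds (default 120).
wait_for_pvc_bound() {
  pvc="$1"
  limit="${2:-120}"
  waited=0
  while [ "$waited" -lt "$limit" ]; do
    phase="$(kubectl get pvc "$pvc" -o 'jsonpath={.status.phase}')"
    if [ "$phase" = "Bound" ]; then
      echo "$pvc is Bound"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "$pvc still ${phase:-unknown} after ${limit}s" >&2
  return 1
}
```

Usage: `wait_for_pvc_bound digits-pv-claim`.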

## Create Digits Deployment with GPU

The following deployment definition includes:
- A Digits application image from the NVIDIA GPU Cloud (NGC) (`image`)
- A request and limit of one GPU, `nvidia.com/gpu: "1"` (`resources`)
- The directory `/workspace` mounted on MapR persistent storage through the claim created above (`volumeMounts` and `volumes`)
- The tenant storage (TenantShare), mounted through FSMount by the `hpecp.hpe.com/fsmount` pod label
In [None]:
cat << 'EOF' | kubectl apply -f -
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-ngc
spec:
  selector:
    matchLabels:
      run: ngc-example
  replicas: 1
  template:
    metadata:
      labels:
        run: ngc-example
        hpecp.hpe.com/fsmount: k8s-demo-tenant
    spec:
      containers:
        - name: hello-ngc
          image: nvcr.io/nvidia/digits:20.03-tensorflow-py3            
          ports:
            - containerPort: 5000
              protocol: TCP
          volumeMounts:
          - name: ngc-persistent-storage
            mountPath: /workspace
          resources: 
            requests: 
              nvidia.com/gpu: "1"
            limits: 
              nvidia.com/gpu: "1"
      volumes:
      - name: ngc-persistent-storage
        persistentVolumeClaim:
          claimName: digits-pv-claim

EOF
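Pulling the NGC image for the first time can take several minutes. To check that the pod is running and actually sees its GPU, you can wait for the rollout and then list the GPUs inside the container. This is a sketch: the helper `verify_gpu_pod` is hypothetical, and it assumes `nvidia-smi` is available in the container (it is normally injected by the NVIDIA container runtime). With the limit of one GPU above, a single `GPU 0` line is expected.

```shell
# verify_gpu_pod (hypothetical helper): wait until the deployment has rolled out,
# then list the GPUs visible inside one of its pods.
verify_gpu_pod() {
  deploy="${1:-hello-ngc}"
  kubectl rollout status "deployment/$deploy" --timeout=300s || return 1
  # deploy/<name> lets kubectl pick a pod of the deployment; assumes nvidia-smi exists in the image
  kubectl exec "deploy/$deploy" -- nvidia-smi -L
}
```

Usage: `verify_gpu_pod hello-ngc`.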

## Create Service Endpoint

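Adding the label `hpecp.hpe.com/hpecp-internal-gateway: "true"` makes the HPE Container Platform expose the service through its gateway. The assigned gateway host and port then appear as an annotation on the service.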

In [None]:
cat << 'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: ngc-demo-lb
  labels:
    hpecp.hpe.com/hpecp-internal-gateway: "true"
spec:
  selector: 
    run: ngc-example
  ports:
  - name: http-hello-ngc
    protocol: TCP
    port: 5000
    targetPort: 5000
  type: NodePort
EOF

In [8]:
kubectl describe service ngc-demo-lb

Name:                     ngc-demo-lb
Namespace:                k8s-demo-tenant
Labels:                   hpecp.hpe.com/hpecp-internal-gateway=true
Annotations:              hpecp-internal-gateway/5000: hcp50gateway.demo.local:10131
Selector:                 run=ngc-example
Type:                     NodePort
IP:                       10.99.248.4
Port:                     http-hello-ngc  5000/TCP
TargetPort:               5000/TCP
NodePort:                 http-hello-ngc  30750/TCP
Endpoints:                <none>
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason  Age   From         Message
  ----    ------  ----  ----         -------
  Normal  HpeCp   2s    hpecp-agent  Created HPECP K8S service
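The gateway address is stored in the service annotation `hpecp-internal-gateway/<port>` (here `hpecp-internal-gateway/5000`), so it can be read with JSONPath instead of scanning the `describe` output. This is a minimal sketch: the helper `gateway_endpoint` is hypothetical and prints only `host:port` as recorded in the annotation, so prepend the URL scheme your gateway uses. `Endpoints: <none>` above typically just means that no ready Digits pod was selected yet when the command ran.

```shell
# gateway_endpoint (hypothetical helper): print the HPE CP gateway host:port that the
# platform recorded for a service port in the hpecp-internal-gateway/<port> annotation.
gateway_endpoint() {
  svc="$1"
  port="$2"
  # '-' and '/' are valid inside a JSONPath field name, so no escaping is needed here
  kubectl get service "$svc" -o "jsonpath={.metadata.annotations.hpecp-internal-gateway/$port}"
}
```

Usage: `gateway_endpoint ngc-demo-lb 5000`, which for the output above corresponds to `hcp50gateway.demo.local:10131`.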


## Use Digits

### First use: Download MNIST Dataset

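If it is not already there, download the MNIST dataset (or another demo dataset) to the TenantShare, which the FSMount makes available in the pod under `/bd-fs-mnt/TenantShare`. First check that the pod is `Running`.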

In [None]:
kubectl get pods 

In [None]:
# deploy/hello-ngc lets kubectl pick the running pod, so no generated pod name has to be copied
kubectl exec deploy/hello-ngc -- python -m digits.download_data mnist /bd-fs-mnt/TenantShare/repo/mnist
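To make the download safe to re-run, you can first test whether the target folder already exists on the shared tenant storage. This is a sketch: the helper `download_mnist_if_missing` is hypothetical, and the `test -d` check assumes a POSIX userland inside the Digits image.

```shell
# download_mnist_if_missing (hypothetical helper): run the DIGITS download script
# only when the target directory does not exist yet on the TenantShare mount.
download_mnist_if_missing() {
  target="${1:-/bd-fs-mnt/TenantShare/repo/mnist}"
  if kubectl exec deploy/hello-ngc -- test -d "$target"; then
    echo "MNIST already present in $target, skipping download"
  else
    kubectl exec deploy/hello-ngc -- python -m digits.download_data mnist "$target"
  fi
}
```

Because the TenantShare outlives the pod, later runs of this helper skip the download.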

### Import data set and train CNN

To access the Digits environment, open the gateway address shown in the *Annotations* line of the `kubectl describe service ngc-demo-lb` output above (`hcp50gateway.demo.local:10131` in this example).
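Before opening a browser, you can check from the notebook that the Digits web UI answers through the gateway. This is a sketch: the helper `digits_http_status` is hypothetical, and the default `http` scheme is an assumption about how your gateway is configured. A `200` or redirect code means the UI responds; `000` means curl got no HTTP response at all.

```shell
# digits_http_status (hypothetical helper): print the HTTP status code the Digits
# web UI returns through the gateway. Arguments: host:port, optional scheme (default http).
digits_http_status() {
  endpoint="$1"
  scheme="${2:-http}"
  # -s hides progress, -k accepts self-signed certificates, -o discards the body,
  # -w prints only the status code, --max-time bounds the wait to 10 seconds
  curl -s -k -o /dev/null --max-time 10 -w '%{http_code}' "$scheme://$endpoint/"
}
```

Usage: `digits_http_status hcp50gateway.demo.local:10131`.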

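Then follow the [NVIDIA DIGITS MNIST example](https://docs.nvidia.com/deeplearning/digits/digits-container-getting-started/index.html#example1_mnist), starting directly at step 5, to import the MNIST dataset and train a first CNN. Where the example refers to its own dataset location, enter the path the dataset was downloaded to above instead; the download script places the training images in the `train` subfolder, here `/bd-fs-mnt/TenantShare/repo/mnist/train`.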

## Delete Resources

Use the following cells to simulate a crash or to clean up your environment afterwards.

### Delete deployment and service (simulate crash)

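Run the following cell to delete the deployment and the service, which makes the application unavailable as in a crash. Then go back to [Create Digits Deployment with GPU](#Create-Digits-Deployment-with-GPU), recreate the deployment and the service, and check that the application still has its data: the persistent volume claim `digits-pv-claim` is not deleted, so everything stored under `/workspace` survives. Look up the gateway address again afterwards, because the recreated service may be assigned a different gateway port.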
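To make the persistence check concrete, you can write a marker file to the PVC-backed `/workspace` before running the deletion cell below, and read it back once the deployment has been recreated. This is a sketch: the helpers `write_marker` and `check_marker` and the file name `.persistence-check` are hypothetical.

```shell
# write_marker / check_marker (hypothetical helpers): leave a timestamped file on the
# PVC-backed /workspace mount, then read it back from the recreated pod.
MARKER=/workspace/.persistence-check

write_marker() {
  # $MARKER is expanded locally, so the pod receives the full path
  kubectl exec deploy/hello-ngc -- sh -c "date > $MARKER"
}

check_marker() {
  # Prints the original timestamp and succeeds only if the file survived the redeploy
  kubectl exec deploy/hello-ngc -- cat "$MARKER"
}
```

Run `write_marker` now, delete and recreate the deployment, then run `check_marker`: the timestamp it prints should predate the deletion.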

In [None]:
kubectl delete deployment hello-ngc
kubectl delete service ngc-demo-lb

### Delete entire demo (clean up)

In [None]:
kubectl delete deployment hello-ngc --ignore-not-found
kubectl delete service ngc-demo-lb --ignore-not-found
kubectl delete pvc digits-pv-claim --ignore-not-found
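Deleting the claim also removes the volume and the data under `/workspace` if the storage class's reclaim policy is `Delete` (the default for dynamically provisioned volumes); the MNIST dataset on the TenantShare is not part of the claim and stays in place. To confirm that nothing from the demo is left, you can list the three objects by name. This is a sketch; the helper `leftover_resources` is hypothetical.

```shell
# leftover_resources (hypothetical helper): list any demo objects that still exist;
# --ignore-not-found turns "not found" into empty output instead of an error.
leftover_resources() {
  kubectl get deployment/hello-ngc service/ngc-demo-lb pvc/digits-pv-claim --ignore-not-found -o name
}
```

Empty output from `leftover_resources` means the cleanup is complete.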

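Finally, stop using the retrieved kubeconfig in this session and remove the files that the `kubectl hpecp` plugin stored under `/root/.kube/.hpecp`, leaving the rest of `/root/.kube` (for example `config-backup`) in place:

In [None]:
unset KUBECONFIG
rm -rf /root/.kube/.hpecp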