# Running the causal-model Python Script as a Kubernetes Job

This notebook walks through, step by step, how to run the causal-model Python script as a Kubernetes job.

<video width="760" height="500" controls src="./media/pod_creation_and_copy_scripts.mp4" />

# Step 0: Prerequisites




1. You must have been added to a Nautilus namespace.
2. You must have kubectl installed. There is a notebook to assist you [here](./Step1-kubectl_installation.ipynb).
3. You must have a PVC on the Nautilus cluster in your assigned namespace. There is a notebook to assist you [here](./Step2-persistant_volume_creation.ipynb).



# Step 1: Create the causal-model Script to Run as a Kubernetes Job

### Step 1A: Create the causal-model-a.py File
You can find the Python script here: [causal-model-a.py](./scripts/causal-model-a.py)

### Step 1B: Create a Script to Install the Required Python Libraries Before Running the Model
You can find the script here: [run_install.sh](./scripts/run_install.sh)

# Step 2: Copying Our Script to the Cluster

### Step 2A: Spawn Pod with PVC
You now need to spawn a pod on the cluster with your persistent volume attached.

Note that the video shows an older YAML file that had a security/permission problem.

For a refresher, [here is a sample YAML file](./yaml/pod_pvc.yml). Be sure to change the pod `name` and every `persistentVolume-name` placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: {{pod name}}
spec:
  containers:
  - name: pod-name-sso
    image: ubuntu:20.04
    command: ["sh", "-c", "echo 'Im a new pod' && sleep infinity"]
    resources:
      limits:
        memory: 12Gi
        cpu: 2
      requests:
        memory: 10Gi
        cpu: 2
    volumeMounts:
    - mountPath: /data
      name: {{persistentVolume-name}}
  volumes:
    - name: {{persistentVolume-name}}
      persistentVolumeClaim:
        claimName: {{persistentVolume-name}}
```
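
If you prefer not to edit the file by hand, the substitution can be scripted. The following is a minimal sketch, not part of the original workflow: `fill_template` and the example values `my-pod` and `my-pvc` are hypothetical, and it assumes the template uses exactly the `{{pod name}}` and `{{persistentVolume-name}}` markers shown above.

```python
def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace each {{key}} marker in the template with its value."""
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template


# A shortened stand-in for the pod template; the values are placeholders.
example = "metadata:\n  name: {{pod name}}\nclaimName: {{persistentVolume-name}}\n"
rendered = fill_template(
    example,
    {"pod name": "my-pod", "persistentVolume-name": "my-pvc"},
)
print(rendered)

# To render the real file (path taken from the apply command in this notebook):
# from pathlib import Path
# path = Path("./yaml/pod_pvc.yml")
# path.write_text(fill_template(path.read_text(), {...}))
```

Plain string replacement is used here because the markers contain spaces and hyphens, which most templating libraries would not accept as variable names.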

Once you have updated those values, you can run the following cell:

In [None]:
!kubectl apply -f ./yaml/pod_pvc.yml

"""
The command above may fail with the following error:

(base) jovyan@jupyter-myoungkyu-40unomaha-2eedu:~/0-kubectrl$ kubectl apply -f pod_pvc.yml 
Error from server: error when creating "pod_pvc.yml": admission webhook "pod.nrp-nautilus.io" denied the request: Please accept the AUP at the user portal.
"""

"""
To resolve this error, open https://portal.nrp-nautilus.io/, scroll down to the Acceptable Use Policy, and accept the policy.
"""

### Step 2B: Copy the File to the PVC
Run the following cell until your pod is `Running`:

In [None]:
! kubectl get pods
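
Instead of re-running the cell by hand, the check can be polled from Python. This is a sketch under assumptions: the helper names `get_pod_phase` and `wait_for_phase` are hypothetical, and it reads the pod's `.status.phase` field through `kubectl get pod -o jsonpath=...`. Note that the phase field reports `Pending` while the `STATUS` column of `kubectl get pods` may show `ContainerCreating`.

```python
import subprocess
import time


def get_pod_phase(pod_name: str) -> str:
    """Ask kubectl for the pod's phase (Pending, Running, Succeeded, ...)."""
    result = subprocess.run(
        ["kubectl", "get", "pod", pod_name, "-o", "jsonpath={.status.phase}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_phase(pod_name, target="Running", get_phase=get_pod_phase,
                   interval=5.0, timeout=600.0):
    """Poll until the pod reaches `target`; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_phase(pod_name) == target:
            return True
        time.sleep(interval)
    return False


# Example call (replace the pod name with your own):
# wait_for_phase("gp-engine-unoselab01-pod1")
```

The `get_phase` parameter exists only so the loop can be exercised without a cluster.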

Once your pod is running, we can copy the library installation script and the causal-model Python script to the PVC attached to the pod. Replace `gp-engine-unoselab01-pod1` with your pod's name:
 

In [None]:
! kubectl cp ./scripts/causal-model-a.py gp-engine-unoselab01-pod1:/data/causal-model-a.py

In [None]:
! kubectl cp ./scripts/run_install.sh gp-engine-unoselab01-pod1:/data/run_install.sh

#### Note:
The order in which you copy the scripts does not matter, because they run in the order given by the `command` in the job specification YAML.

See the job specification YAML in Step 3:
```yaml
command: ["sh", "-c", "bash run_install.sh && python3 /data/causal-model-a.py"]

```
Here, the library installation script runs before the causal-model script.
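
The `&&` in that command also means the second step runs only if the first one succeeds. The sketch below illustrates this locally with stand-in commands: `false` plays a failing install step, `true` a successful one, and `echo` the model script. It assumes a POSIX `sh` is available and does not touch the cluster.

```python
import subprocess

# Failing first step: the command after && is skipped.
failed = subprocess.run(
    ["sh", "-c", "false && echo model-ran"], capture_output=True, text=True
)

# Successful first step: the command after && runs.
succeeded = subprocess.run(
    ["sh", "-c", "true && echo model-ran"], capture_output=True, text=True
)

print(repr(failed.stdout), failed.returncode)        # '' 1
print(repr(succeeded.stdout), succeeded.returncode)  # 'model-ran\n' 0
```

In the job, this means a failure in `run_install.sh` stops the container before `causal-model-a.py` starts.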

We can check that the copy was successful with the `exec` subcommand of `kubectl`. Again, replace `gp-engine-unoselab01-pod1` with your pod's name:

In [None]:
! kubectl exec gp-engine-unoselab01-pod1 -- cat /data/causal-model-a.py

In [None]:
! kubectl exec gp-engine-unoselab01-pod1 -- cat /data/run_install.sh
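
Reading the files back with `cat` works for short scripts. For a stricter check, one option is to compare checksums. This is a sketch under assumptions: the helper names are hypothetical, and it assumes the `sha256sum` utility is present in the pod's image.

```python
import hashlib
import subprocess
import tempfile


def local_sha256(path: str) -> str:
    """Hex SHA-256 digest of a local file."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def parse_sha256sum(output: str) -> str:
    """Extract the digest from sha256sum output ('<digest>  <path>')."""
    return output.split()[0]


def remote_sha256(pod_name: str, remote_path: str) -> str:
    """Run sha256sum inside the pod (assumes the image provides it)."""
    result = subprocess.run(
        ["kubectl", "exec", pod_name, "--", "sha256sum", remote_path],
        capture_output=True, text=True, check=True,
    )
    return parse_sha256sum(result.stdout)


# Demo of the local half only, using a throwaway file.
with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"print('hello')\n")
    tmp.flush()
    demo_digest = local_sha256(tmp.name)

# Against the cluster (replace the pod name with your own):
# same = local_sha256("./scripts/causal-model-a.py") == remote_sha256(
#     "gp-engine-unoselab01-pod1", "/data/causal-model-a.py")
```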

# Step 3: Building the Job Specification YAML

We now have everything we need to run our causal-model job. The final to-do item is to create a YAML job specification file. There is a template file for this in the repository [here](./causal_model_job.yml).


```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{job name}}
spec:
  ttlSecondsAfterFinished: 86400 # a day
  template:
    spec:
      automountServiceAccountToken: false
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/region
                    operator: In
                    values:
                      - us-central 
      containers:
        - name: job-casual-model-train-container
          image: gitlab-registry.nrp-nautilus.io/gp-engine/jupyter-stacks/bigdata-2023:latest
          workingDir: /data
          command: ["sh", "-c", "bash run_install.sh && python3 /data/causal-model-a.py"]
          volumeMounts:
            - name: {{ pvc_name }} # must match the volume name under "volumes"
              mountPath: /data
          resources:
            limits:
              memory: 21Gi
              cpu: "8"
              nvidia.com/gpu: 1
            requests:
              memory: 20Gi
              cpu: "8"    
              nvidia.com/gpu: 1
          securityContext:
            allowPrivilegeEscalation: false
            runAsUser: {{runAsUserID}}
      volumes:
        - name: {{ pvc_name }}
          persistentVolumeClaim:
            claimName: {{ pvc_name }}
      restartPolicy: Never
  backoffLimit: 1

```

Fill in the job name, the PVC name, and the `runAsUser` ID in the template.
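
Before submitting the job, it can help to confirm that no `{{ ... }}` marker was left behind. The following is a small sketch; `unfilled_placeholders` is a hypothetical helper, and the regular expression assumes markers look like the ones in the template above, with or without inner spaces.

```python
import re

# Matches {{name}} and {{ name }} and captures the name without padding.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def unfilled_placeholders(text: str) -> list[str]:
    """Return the names of any {{ ... }} markers still present in the text."""
    return PLACEHOLDER.findall(text)


sample = "name: {{job name}}\nclaimName: {{ pvc_name }}\nrunAsUser: 0\n"
print(unfilled_placeholders(sample))  # ['job name', 'pvc_name']

# Against the real file (path taken from the create command in Step 4):
# from pathlib import Path
# print(unfilled_placeholders(Path("./yaml/causal_model_job.yml").read_text()))
```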

#### Note:
Add a `securityContext` to the job specification YAML and set `runAsUser` to the user ID copied from the pod's container shell (usually `0`). A value of `0` runs the container as root, which gives it the permissions needed for the file I/O operations in the job.
```yaml
  securityContext:
    allowPrivilegeEscalation: false
    runAsUser: 0
```
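
One way to look up that user ID without opening an interactive shell is to run `id -u` in the pod. This is a sketch, assuming the `id` utility exists in the pod's image; `container_uid` is a hypothetical helper, and its `run` parameter is there only so the function can be tried without a cluster.

```python
import subprocess


def container_uid(pod_name: str, run=subprocess.run) -> int:
    """Return the numeric user ID that commands run as inside the pod."""
    result = run(
        ["kubectl", "exec", pod_name, "--", "id", "-u"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


# Example call (replace the pod name with your own):
# uid = container_uid("gp-engine-unoselab01-pod1")
```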

<video width="760" height="500" controls src="./media/running_causal_model_job.mp4" />

# Step 4: Start the Job

Run the cell below to start the job:

In [16]:
! kubectl create -f ./yaml/causal_model_job.yml

job.batch/job-casual-model-gp-engine-unoselab01 created


Re-run the `kubectl get pods` and `kubectl get jobs` cells in Step 5 until your job reaches the `Complete` status. Its pod goes through the stages `Pending`, `ContainerCreating`, and `Running` along the way.
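
The job's state can also be read programmatically. The sketch below is an illustration with hypothetical helper names: it inspects the `status.conditions` list of the Job object returned by `kubectl get job -o json`, where a finished job carries a condition of type `Complete` or `Failed` with status `"True"`.

```python
import json
import subprocess


def job_state(job: dict) -> str:
    """Classify a Job object as Complete, Failed, or still in progress."""
    for condition in job.get("status", {}).get("conditions", []):
        if condition.get("status") == "True":
            if condition.get("type") == "Complete":
                return "Complete"
            if condition.get("type") == "Failed":
                return "Failed"
    return "In progress"


def fetch_job(job_name: str) -> dict:
    """Fetch the Job object from the cluster as a Python dict."""
    result = subprocess.run(
        ["kubectl", "get", "job", job_name, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


# Example call (replace the job name with your own):
# print(job_state(fetch_job("job-casual-model-gp-engine-unoselab01")))
```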

# Step 5: Review the Output of the Job

The job creates a pod whose name is the job name followed by a random suffix, such as `job-casual-model-gp-engine-unoselab01-85qrx` in the output below. Let's check the output of that pod to see our accuracy. Replace the pod name in the `kubectl logs` cell below with your own:

In [19]:
! kubectl get pods

NAME                                          READY   STATUS              RESTARTS   AGE
gp-engine-unoselab01-pod1                     1/1     Running             0          47m
job-casual-model-gp-engine-unoselab01-85qrx   0/1     ContainerCreating   0          36s


In [None]:
!kubectl get jobs

In [None]:
! kubectl logs -f job-casual-model-gp-engine-unoselab01-85qrx

#### You can also save the job logs to a separate file for analysis

<img src="./media/causal-model-job-logs.png" />

In [None]:
!kubectl logs -f <job-pod-name> > causal-model-job.txt
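
The same capture can be done from Python, which avoids shell redirection. This is a sketch with a hypothetical helper, `save_output`; the demo uses `echo` as a stand-in command so that it runs without a cluster. Unlike the cell above, the commented `kubectl logs` call omits `-f`, so it saves only the logs produced so far.

```python
import os
import subprocess
import tempfile


def save_output(command: list[str], path: str) -> int:
    """Run a command, write its stdout and stderr to `path`, return the exit code."""
    with open(path, "w") as handle:
        completed = subprocess.run(command, stdout=handle, stderr=subprocess.STDOUT)
    return completed.returncode


# Demo with a stand-in command and a throwaway directory.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
demo_code = save_output(["echo", "hello"], demo_path)

# Against the cluster (replace the placeholder with your job's pod name):
# save_output(["kubectl", "logs", "<job-pod-name>"], "causal-model-job.txt")
```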

# Step 6: Delete the Job and the Pod

The final step is to delete the job we ran and the pod we spawned. Replace `<job-name>` and `<job-pod-name>` below with the appropriate names:

In [None]:
! kubectl delete job <job-name>

In [None]:
! kubectl delete pod <job-pod-name>