# Model Deployment on Kubernetes
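This demo walks through several ways to run and schedule model deployments on Kubernetes: autoscaling with the Horizontal Pod Autoscaler, node affinity, node selectors, taints and tolerations, and GPU assignment.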

## Having a Look Around
The `manifests` directory contains the Deployment, HPA, and Service definitions used in this demo:

```bash
manifests
├── deployment.yaml
├── hpa.yaml
└── service.yaml
```

Deploy it! For example, run `kubectl apply -f manifests/` from the directory that contains `manifests`.

## Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) adds and removes pod replicas automatically, based on observed metrics such as CPU utilization.

This is especially useful for ML/AI workloads, where request volume can swing widely.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-model-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-model-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

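`scaleTargetRef` identifies the resource to scale, here the `ml-model-deployment` Deployment.

`minReplicas` and `maxReplicas` set the lower and upper bounds on the number of running pods, whether the load is light or heavy.

`metrics` defines what the HPA watches in order to trigger scaling events. Here it targets an average CPU utilization of 70% across pods, measured against each container's CPU request.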
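
To get a feel for how the CPU target translates into replica counts, the sketch below applies the core formula from the Kubernetes HPA documentation, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, and then clamps the result to `minReplicas` and `maxReplicas`. `desired_replicas` is a hypothetical helper written for this README, not a kubectl command. It ignores the controller's tolerance band and stabilization window, so treat the numbers as approximate.

```bash
# Hypothetical helper: estimate the replica count the HPA would aim for.
# Args: current_replicas current_util% target_util% min_replicas max_replicas
desired_replicas() {
  current_replicas=$1; current_util=$2; target_util=$3
  min_replicas=$4; max_replicas=$5
  # Integer ceiling division: ceil(a / b) == (a + b - 1) / b
  desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
  # Clamp to the bounds set in the HPA spec
  if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
  if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
  echo "$desired"
}

# Values from the manifest above: target 70%, min 2, max 10
desired_replicas 2 140 70 2 10   # 2 pods at 140% of their CPU request -> prints 4
desired_replicas 4 35 70 2 10    # 4 pods at 35% -> prints 2
desired_replicas 8 140 70 2 10   # formula gives 16, capped at maxReplicas -> prints 10
```

Because utilization is measured relative to the CPU request, changing `resources.requests.cpu` on the container changes when scaling kicks in, even if the actual CPU usage stays the same.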

## Scale it

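## Node Affinity
Node affinity restricts which nodes a pod can be scheduled on, based on node labels. This Deployment requires its pods to run on `node02`: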

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      # Hard rule enforced at scheduling time; running pods are not evicted if node labels change later.
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - "node02"
      containers:
      - name: ml-model-container
        image: wbassler/mobilenetv3lg-flask:v1.0
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "200m"
            memory: "250Mi"
          limits:
            cpu: "200m"
            memory: "250Mi"
```

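## Node Selector
`nodeSelector` is the simpler form of the same constraint: a pod is scheduled only onto nodes that carry every listed label.
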
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      nodeSelector:
        kubernetes.io/hostname: "node02"
      containers:
      - name: ml-model-container
        image: wbassler/mobilenetv3lg-flask:v1.0
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "200m"
            memory: "250Mi"
          limits:
            cpu: "200m"
            memory: "250Mi"
```

## Taints
A taint keeps new pods off a node unless they carry a matching toleration. Taint `node02`:

```bash
kubectl taint nodes node02 role=pytorch:NoSchedule
```

Verify that the taint was applied:

```bash
kubectl describe node node02 | grep Taints
```

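## Tolerations
A toleration lets a pod be scheduled onto a tainted node. With the `Equal` operator, its key, value, and effect must match the taint.

```yaml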
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-model-deployment
spec:
  selector:
    matchLabels:
      app: ml-model
  template:
    metadata:
      labels:
        app: ml-model
    spec:
      tolerations:
      # Must match the taint applied above: role=pytorch:NoSchedule
      - key: "role"
        operator: "Equal"
        value: "pytorch"
        effect: "NoSchedule"
      nodeSelector:
        kubernetes.io/hostname: "node02"
      containers:
      - name: ml-model-container
        image: wbassler/mobilenetv3lg-flask:v1.0
        imagePullPolicy: Always
        resources:
          requests:
            cpu: "200m"
            memory: "250Mi"
          limits:
            cpu: "200m"
            memory: "250Mi"
```


## Assigning GPUs