# Nomad Advanced Job Placement

teaser: |
Explore advanced Nomad job placement strategies with Constraints, Affinities, and Spread.

description: |-
This track will show how you can control job placement in Nomad with:
- [Constraints](https://www.nomadproject.io/docs/job-specification/constraint/)
- [Affinities](https://www.nomadproject.io/docs/job-specification/affinity/)
- [Spread](https://www.nomadproject.io/docs/job-specification/spread/)

Together, these illustrate the flexibility of Nomad in this area.

You will also learn about Nomad's [Variable Interpolation](https://www.nomadproject.io/docs/runtime/interpolation/), which allows applications deployed by Nomad to do things like listen on ports dynamically selected by Nomad.

You will deploy a Nomad cluster and run Nomad jobs that deploy a web application and [Traefik](https://containo.us/traefik/), which will provide load balancing across multiple instances of the application.

Before running this track, we suggest you run the **Nomad Basics** and **Nomad Simple Cluster** tracks.

<img src=https://storage.googleapis.com/instruqt-hashicorp-tracks/logo/nomad.png width=100>


# Prep

## AWS Credentials

Set your AWS credentials. I got mine from the Instruqt terminal with this command.

```bash
env | grep -iE "^aws.*access" | xargs -I{} echo export {}
```

In [1]:
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
export AWS_DEFAULT_REGION=us-west-2
export AWS_REGION=$AWS_DEFAULT_REGION
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export TF_VAR_aws_access_key_id=$AWS_ACCESS_KEY_ID
export TF_VAR_aws_secret_access_key=$AWS_SECRET_ACCESS_KEY
export TF_INPUT=false

printf "%s\n" "#==> Creds:" "$AWS_REGION" "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY"

#==> Creds:
us-west-2
<redacted>
<redacted>
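
Printing the secret key in full, as the cell above does, leaves it in the saved notebook output. A minimal sketch of a safer check shows only the first four characters of each value. The `mask_secret` helper is hypothetical and not part of the original track.

```bash
# Hypothetical helper: print the first 4 characters of a value, then asterisks.
mask_secret() {
  printf '%.4s****\n' "$1"
}

# Same check as above, but with the key values masked.
printf "%s\n" "#==> Creds:" "${AWS_REGION:-unset}" \
  "$(mask_secret "${AWS_ACCESS_KEY_ID:-}")" \
  "$(mask_secret "${AWS_SECRET_ACCESS_KEY:-}")"
```

This still confirms that the variables are set without exposing the secret itself.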


Create a default VPC if needed.

In [2]:
aws configure set region us-west-2 --profile default
aws ec2 create-default-vpc > /dev/null || true
printf "\n#==> Show VPC ids\n"
aws ec2 describe-vpcs | jq -r '.[] | .[] | .VpcId'


#==> Show VPC ids
vpc-022ce4fd180ffc1af


## Clone Repo

In [None]:
pushd /tmp >/dev/null
git clone https://github.com/phanclan/nomad_terraform
# cp -r /tmp/nomad/terraform /tmp/Nomad
popd >/dev/null

In [3]:
# rm -rf /tmp/Nomad/{ssh_key,cluster} 

mkdir -p /tmp/nomad_terraform/ssh_key
# mkdir -p /tmp/nomad_terraform/cluster

## create terragrunt.hcl - ssh_key

In [4]:
cat > /tmp/nomad_terraform/ssh_key/terragrunt.hcl <<"EOL"
# Indicate where to source the terraform module from.
# The URL used here is a shorthand for
# "tfr://registry.terraform.io/cloudposse/key-pair/aws?version=0.18.3".
# Note the extra `/` after the protocol is required for the shorthand
# notation.
terraform {
  source = "tfr:///cloudposse/key-pair/aws?version=0.18.3"
  extra_arguments "plan" {
    commands = [
      "plan",
    ]
    arguments = [
      "-input=false",
    ]
  }
  extra_arguments "apply" {
    commands = [
      "apply",
      "destroy"
    ]
    arguments = [
      "-input=false",
      "-auto-approve"
    ]
  }
}

# generate "versions" {
#   path = "versions.tf"
#   if_exists = "overwrite"
#   contents = <<EOF
# terraform {
#   # required_version = "~1.1.0"
#   required_providers{
#     aws = {
#       source = "hashicorp/aws"
#       version = ">= 2.70.0"
#     }
#   }
# }
# EOF
# }

# generate "provider" {
#   path = "provider.tf"
#   if_exists = "overwrite"
#   contents = <<EOF
# provider "aws" {
#   region = "us-west-2"   # region to deploy the resources into
# }
# EOF
# }

# Indicate the input values to use for the variables of the module.
inputs = {
  ssh_public_key_path       = "/tmp/Nomad/ssh_key"
  generate_ssh_key     = true
  name                 = "aws-key-pair"
  # tags = {
  #   Terraform   = "true"
  #   Environment = "root"
  #   Name        = "Terragrunt-${path_relative_to_include()}"
  # }
}

EOL

In [6]:
pushd /tmp/nomad_terraform/ssh_key >/dev/null
time terragrunt apply > tf_apply_ssh_key_out.txt 2>&1 &
popd >/dev/null


real	0m12.590s
user	0m6.405s
sys	0m1.031s
[1]+  Done                    time terragrunt apply > tf_apply_ssh_key_out.txt 2>&1


In [7]:
pushd /tmp/nomad_terraform/ssh_key >/dev/null
terragrunt output -raw public_key_filename
popd >/dev/null

WARN[0000] No double-slash (//) found in source URL /cloudposse/key-pair/aws. Relative paths in downloaded Terraform code may not work.
/tmp/Nomad/ssh_key/aws-key-pair.pub

## create terragrunt.hcl - cluster

### packer

In [8]:
ls /tmp/nomad_terraform/aws

[0m[01;34menv[0m  [01;34mmodules[0m  packer.json  README.md


In [16]:
pushd /tmp/nomad_terraform/aws >/dev/null
tee packer.json <<EOL
{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-west-2",
    "source_ami_filter": {
      "filters": {
        "virtualization-type": "hvm",
        "architecture": "x86_64",
        "name": "ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-*",
        "block-device-mapping.volume-type": "gp2",
        "root-device-type": "ebs"
      },
      "owners": ["099720109477"],
      "most_recent": true
    },
    "instance_type": "t3.large",
    "ssh_username": "ubuntu",
    "ami_name": "hashistack {{timestamp}}"
  }],
  "provisioners":  [
  {
    "type": "shell",
    "inline": [
      "sudo mkdir /ops",
      "sudo chmod 777 /ops"
    ]
  },
  {
    "type": "file",
    "source": "../shared",
    "destination": "/ops"
  },
  {
    "type": "file",
    "source": "../examples",
    "destination": "/ops"
  },
  {
    "type": "shell",
    "script": "../shared/scripts/setup.sh",
    "environment_vars": [
      "INSTALL_NVIDIA_DOCKER=true"
    ]
  }]
}
EOL

time packer build packer.json > /tmp/packer_nomad_out.txt 2>&1 &
popd >/dev/null


real	1m54.013s
user	0m0.476s
sys	0m0.316s
[1]+  Exit 1                  time packer build packer.json > /tmp/packer_nomad_out.txt 2>&1

In [48]:
tail -n 30 /tmp/packer_nomad_out.txt

ami-0753fcee97d3a73ad


### terraform configs

Get the AMI ID from the Packer build output.

In [50]:
grep ami /tmp/packer_nomad_out.txt | tail -n 1 | awk '{print $NF}'

ami-0753fcee97d3a73ad
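
The pipeline above takes the last field of the last line containing the letters `ami`, so it depends on the exact layout of the log. A slightly stricter sketch, assuming the ID has the usual `ami-` plus hexadecimal form, picks the last such ID anywhere in the input. The `extract_ami` helper is hypothetical, and the sample line is only illustrative.

```bash
# Hypothetical helper: print the last AMI ID (ami-<hex>) found on stdin.
extract_ami() {
  awk 'match($0, /ami-[0-9a-f]+/) { id = substr($0, RSTART, RLENGTH) }
       END { if (id != "") print id }'
}

# Example with a sample line; for the real build log, run:
#   extract_ami < /tmp/packer_nomad_out.txt
printf '%s\n' 'us-west-2: ami-0753fcee97d3a73ad' | extract_ami
```

If the build failed and no ID is present, the helper prints nothing, which makes the failure easy to detect before the value is pasted into `terragrunt.hcl`.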


Customize the inputs below. Use the `ami` value from the output above.

In [52]:
cat > /tmp/nomad_terraform/aws/env/terragrunt.hcl <<"EOL"
# The module is sourced locally from main.tf (../modules/hashistack),
# so the remote source below stays commented out.
terraform {
  #source = "git::https://github.com/hashicorp/nomad.git//terraform/aws/modules/hashistack"
  extra_arguments "plan" {
    commands = [
      "plan",
    ]
    arguments = [
      "-input=false",
    ]
  }
  extra_arguments "apply" {
    commands = [
      "apply",
      "destroy",
    ]
    arguments = [
      "-input=false",
      "-auto-approve"
    ]
  }
}

generate "versions" {
  path = "versions.tf"
  if_exists = "overwrite"
  contents = <<EOF
terraform {
  # required_version = "~1.1.0"
  required_providers{
    aws = {
      source = "hashicorp/aws"
      version = "~> 3.75.2"
    }
  }
}
EOF
}

generate "provider" {
  path = "provider.tf"
  if_exists = "overwrite"
  contents = <<EOF
provider "aws" {
  region = "us-west-2"   # region to deploy the resources into
}
EOF
}

#// Indicate the input values to use for the variables of the module.
inputs = {
  name         = "pphan"
  region        = "us-west-2"
  ami           = "ami-0753fcee97d3a73ad"
  server_instance_type  = "m5.large"
  client_instance_type  = "m5.large"
  client_count  = 3
  server_count  = 3
  key_name      = "aws-key-pair"
  whitelist_ip = "98.234.158.216/32"
  root_block_device_size = 16
  nomad_binary           = "none"
#   consul_version = "1.13.1"
#   nomad_version  = "1.3.5"
#   owner          = "pphan"
#   vpc_id         = "vpc-0a3da3e09494785db"
#   #//optional
#   public_ip      = true
#   #consul_config = {}
#   tags = {
#     Terraform   = "true"
#     Environment = "root"
#     Name        = "Terragrunt-${path_relative_to_include()}"
#   }
}

EOL
echo done

done


In [18]:
cat > /tmp/nomad_terraform/aws/env/main.tf <<"EOL"
module "hashistack" {
  source = "../modules/hashistack"
  name                   = var.name
  region                 = var.region
  ami                    = var.ami
  server_instance_type   = var.server_instance_type
  client_instance_type   = var.client_instance_type
  key_name               = var.key_name
  server_count           = var.server_count
  client_count           = var.client_count
  retry_join             = var.retry_join
  nomad_binary           = var.nomad_binary
  root_block_device_size = var.root_block_device_size
  whitelist_ip           = var.whitelist_ip
}

variable "name" {
  description = "Used to name various infrastructure components"
}

variable "whitelist_ip" {
  description = "IP to whitelist for the security groups (set 0.0.0.0/0 for world)"
}

variable "region" {}

variable "ami" {}

variable "server_instance_type" {}

variable "client_instance_type" {}

variable "root_block_device_size" {}

variable "key_name" {}

variable "server_count" {}

variable "client_count" {}

variable "retry_join" {
  type = map(string)
  default = {
    provider  = "aws"
    tag_key   = "ConsulAutoJoin"
    tag_value = "auto-join"
  }
}

variable "nomad_binary" {}
EOL

In [96]:
cat > /tmp/nomad_terraform/aws/env/outputs.tf <<"EOL"
output "IP_Addresses" {
  value = <<CONFIGURATION
Client public IPs: ${join(", ", module.hashistack.client_public_ips)}
Server public IPs: ${join(", ", module.hashistack.server_public_ips)}
To connect, add your private key and SSH into any client or server with
`ssh ubuntu@PUBLIC_IP`. You can test the integrity of the cluster by running:
  $ consul members
  $ nomad server members
  $ nomad node status
If you see an error message like the following when running any of the above
commands, it usually indicates that the configuration script has not finished
executing:
"Error querying servers: Get http://127.0.0.1:4646/v1/agent/members: dial tcp
127.0.0.1:4646: getsockopt: connection refused"
Simply wait a few seconds and rerun the command if this occurs.
The Nomad UI can be accessed at http://${module.hashistack.server_lb_ip}:4646/ui.
The Consul UI can be accessed at http://${module.hashistack.server_lb_ip}:8500/ui.
Set the following for access from the Nomad CLI:
  export NOMAD_ADDR=http://${module.hashistack.server_lb_ip}:4646
CONFIGURATION
}
output "consul_http_addr" {
  value = "http://${module.hashistack.server_lb_ip}:8500"
}
output "nomad_addr" {
  value = "http://${module.hashistack.server_lb_ip}:4646"
}
output "client_public_ips_2" {
  value = module.hashistack.client_public_ips_2
}
EOL
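
The `IP_Addresses` output above notes that commands can fail with a connection error until the configuration script has finished, and suggests rerunning them. One way to automate that is a small retry loop. This is a sketch only: the `retry` helper, the attempt count, and the delay are assumptions, not part of the track.

```bash
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have failed.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while ! "$@"; do
    if [ "$retry_n" -ge "$retry_attempts" ]; then
      return 1
    fi
    retry_n=$((retry_n + 1))
    sleep "$retry_delay"
  done
}

# Example: wait for the Nomad servers to respond (up to 30 tries, 10s apart).
#   retry 30 10 nomad server members
```

The helper returns a non-zero status once the attempts are exhausted, so a later cell can stop instead of running against a cluster that is not ready.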

#### extra client

Add one extra client as an EC2 Spot instance. For current prices, see https://aws.amazon.com/ec2/spot/pricing/.


In [119]:
cat > /tmp/nomad_terraform/aws/modules/hashistack/client2.tf <<"EOL"
resource "aws_spot_instance_request" "client2" {
  ami                    = var.ami
  spot_price             = "0.14"
  instance_type          = "m5.2xlarge"
  key_name               = var.key_name
  vpc_security_group_ids = [aws_security_group.primary.id]
  count                  = 1
  depends_on             = [aws_instance.server]
  ipv6_address_count     = 0
  ipv6_addresses         = []

  # instance tags
  tags = merge(
    {
      "Name" = "${var.name}-client-${count.index}"
    },
    {
      "${var.retry_join.tag_key}" = "${var.retry_join.tag_value}"
    },
  )

  root_block_device {
    volume_type           = "gp3"
    volume_size           = var.root_block_device_size
    delete_on_termination = "true"
  }

  ebs_block_device {
    device_name           = "/dev/xvdd"
    volume_type           = "gp3"
    volume_size           = "50"
    delete_on_termination = "true"
  }

  user_data = templatefile("${path.root}/user-data-client.sh",
    {
      region = var.region
      retry_join = chomp(
        join(
          " ",
          formatlist("%s=%s ", keys(var.retry_join), values(var.retry_join)),
        ),
      )
      nomad_binary = var.nomad_binary
    }
  )
  iam_instance_profile = aws_iam_instance_profile.instance_profile.name
}
output "client_public_ips_2" {
  value = aws_spot_instance_request.client2[*].public_ip
}
EOL

### copy user-data scripts

In [23]:
pushd /tmp/nomad_terraform/aws/env >/dev/null
cp ./us-east/user-data-*.sh .
popd >/dev/null

### terraform init

In [105]:
pushd /tmp/nomad_terraform/aws/env >/dev/null
terragrunt init -upgrade -force-copy
popd >/dev/null

[0m[1mUpgrading modules...[0m
- hashistack in ../modules/hashistack

[0m[1mInitializing the backend...[0m

[0m[1mInitializing provider plugins...[0m
- Finding hashicorp/aws versions matching "~> 3.75.2"...
- Using previously-installed hashicorp/aws v3.75.2

[0m[1m[32mTerraform has been successfully initialized![0m[32m[0m
[0m[32m
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.[0m


### terraform apply

In [124]:
pushd /tmp/nomad_terraform/aws/env
time terragrunt apply > /tmp/tf_apply_nomad_out.txt 2>&1 &
# time terragrunt refresh
popd

/tmp/nomad_terraform/aws/env /media/code/hc_demos-jupyter/Nomad
[1] 11798
/media/code/hc_demos-jupyter/Nomad


In [143]:
tail -n 50 /tmp/tf_apply_nomad_out.txt

          [32m+[0m [0m[1m[0mvolume_id[0m[0m             = "vol-07ca49affc3ccdb56"
            [90m# (4 unchanged attributes hidden)[0m[0m
        }
    }


Unless you have made equivalent changes to your configuration, or ignored the
relevant attributes using ignore_changes, the following plan may include
actions to undo or respond to these changes.
[90m
─────────────────────────────────────────────────────────────────────────────[0m
[0m
[1mChanges to Outputs:[0m[0m
  [33m~[0m [0m[1m[0mclient_public_ips_2[0m[0m = [
      [31m-[0m [0m[90mnull[0m[0m,
      [32m+[0m [0m"35.90.197.196",
    ]

You can apply this plan to save these new output values to the Terraform
state, without changing any real infrastructure.
[0m[1m[32m
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.
[0m[0m[1m[32m
Outputs:

[0mIP_Addresses = <<EOT
Client public IPs: 35.90.27.18, 34.216.150.37, 34.217.94.196
Server public IPs: 35.89.162.19, 54.212.87.37, 54.244.58.182


## set nomad and consul variables

In [144]:
pushd /tmp/nomad_terraform/aws/env >/dev/null
export NOMAD_ADDR=$(terragrunt output -raw nomad_addr)
export CONSUL_HTTP_ADDR=$(terragrunt output -raw consul_http_addr)
# export NOMAD_ADDR=http://pphan-server-lb-2092905469.us-west-2.elb.amazonaws.com:4646
# export CONSUL_HTTP_ADDR=http://pphan-server-lb-2092905469.us-west-2.elb.amazonaws.com:8500
printf "%s\n" "Nomad UI: $NOMAD_ADDR" "Consul UI: $CONSUL_HTTP_ADDR"
popd >/dev/null

Nomad UI: http://pphan-server-lb-992086560.us-west-2.elb.amazonaws.com:4646
Consul UI: http://pphan-server-lb-992086560.us-west-2.elb.amazonaws.com:8500


In [137]:
pushd /tmp/nomad_terraform/aws/env >/dev/null
terragrunt output -json client_public_ips_2 | jq -r '.[0]'
# export NOMAD_ADDR=$(terragrunt output -raw nomad_addr)
# export NOMAD_ADDR=http://pphan-server-lb-2092905469.us-west-2.elb.amazonaws.com:4646
# printf "%s\n" "Nomad UI: $NOMAD_ADDR" "Consul UI:$CONSUL_HTTP_ADDR"
popd >/dev/null

35.90.197.196


- slug: verify-nomad-cluster-health

# Verify the Health of Your Nomad Cluster

teaser: |
    Verify the health of the Nomad cluster that has been deployed for you.

## notes:

In this challenge, you will verify the health of the Nomad cluster that has been deployed for you by the track's setup scripts. This will include checking the health of a Consul cluster that has been set up on the same VMs.

In later challenges, you will run Nomad jobs that deploy a web application and the Traefik load balancer. You will then update them using Nomad's various options for controlling job placement.

---

The cluster is running:
- 3 Nomad/Consul servers
- 3 Nomad/Consul clients, plus the extra Spot client if you added `client2.tf` above

They are using software versions:
- Nomad 1.3.1
- Consul 1.12.2

First, verify that all Consul agents are running and connected to the cluster:

In [145]:
consul members

Node              Address             Status  Type    Build   Protocol  DC   Partition  Segment
ip-172-31-35-254  172.31.35.254:8301  alive   server  1.12.2  2         dc1  default    <all>
ip-172-31-38-221  172.31.38.221:8301  alive   server  1.12.2  2         dc1  default    <all>
ip-172-31-38-99   172.31.38.99:8301   alive   server  1.12.2  2         dc1  default    <all>
ip-172-31-2-150   172.31.2.150:8301   alive   client  1.12.2  2         dc1  default    <default>
ip-172-31-33-53   172.31.33.53:8301   alive   client  1.12.2  2         dc1  default    <default>
ip-172-31-36-101  172.31.36.101:8301  alive   client  1.12.2  2         dc1  default    <default>
ip-172-31-36-154  172.31.36.154:8301  alive   client  1.12.2  2         dc1  default    <default>


You should see 6 Consul agents with the "`alive`" status (7 if you added the extra client, as in the output above).

```
Node              Address             Status  Type    Build   Protocol  DC   Partition  Segment
ip-172-31-32-161  172.31.32.161:8301  alive   server  1.12.2  2         dc1  default    <all>
ip-172-31-35-92   172.31.35.92:8301   alive   server  1.12.2  2         dc1  default    <all>
ip-172-31-46-21   172.31.46.21:8301   alive   server  1.12.2  2         dc1  default    <all>
ip-172-31-36-155  172.31.36.155:8301  alive   client  1.12.2  2         dc1  default    <default>
ip-172-31-38-164  172.31.38.164:8301  alive   client  1.12.2  2         dc1  default    <default>
ip-172-31-46-251  172.31.46.251:8301  alive   client  1.12.2  2         dc1  default    <default>
```
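
The same check can be scripted. As a rough sketch, the hypothetical `count_alive` helper below counts the rows whose `Status` column reads `alive`, assuming the column layout shown in the sample output. The sample rows are trimmed to the first four columns.

```bash
# Hypothetical helper: count agents with Status "alive" in `consul members` output.
count_alive() {
  awk 'NR > 1 && $3 == "alive" { n++ } END { print n + 0 }'
}

# Example with trimmed sample rows; against a live cluster, run:
#   consul members | count_alive
count_alive <<'SAMPLE'
Node              Address             Status  Type
ip-172-31-32-161  172.31.32.161:8301  alive   server
ip-172-31-36-155  172.31.36.155:8301  alive   client
SAMPLE
```

Comparing the printed number with the expected agent count gives a quick pass or fail signal.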

Check that the Nomad servers are running:

In [146]:
nomad server members

Name                     Address        Port  Status  Leader  Raft Version  Build  Datacenter  Region
ip-172-31-35-254.global  172.31.35.254  4648  alive   false   3             1.3.1  dc1         global
ip-172-31-38-221.global  172.31.38.221  4648  alive   true    3             1.3.1  dc1         global
ip-172-31-38-99.global   172.31.38.99   4648  alive   false   3             1.3.1  dc1         global


You should see 3 Nomad servers with the "`alive`" status.

```
Name                     Address        Port  Status  Leader  Raft Version  Build  Datacenter  Region
ip-172-31-32-161.global  172.31.32.161  4648  alive   false   3             1.3.1  dc1         global
ip-172-31-35-92.global   172.31.35.92   4648  alive   false   3             1.3.1  dc1         global
ip-172-31-46-21.global   172.31.46.21   4648  alive   true    3             1.3.1  dc1         global
```

Check the status of the Nomad client nodes:

In [147]:
nomad node status

ID        DC   Name              Class   Drain  Eligibility  Status
086e619b  dc1  ip-172-31-2-150   <none>  false  eligible     ready
22f005c9  dc1  ip-172-31-36-154  <none>  false  eligible     ready
84c13196  dc1  ip-172-31-36-101  <none>  false  eligible     ready
160234a4  dc1  ip-172-31-33-53   <none>  false  eligible     ready


You should see 3 Nomad clients with the "`ready`" status (4 if you added the extra client, as in the output above).

```
ID        DC   Name              Class   Drain  Eligibility  Status
ba90fa7e  dc1  ip-172-31-36-155  <none>  false  eligible     ready
48d0b218  dc1  ip-172-31-38-164  <none>  false  eligible     ready
6f750cb2  dc1  ip-172-31-46-251  <none>  false  eligible     ready
```

You can also check the status of the Nomad server and clients in the Nomad and Consul UIs.

In [148]:
printf "%s\n" "Consul UI: $CONSUL_HTTP_ADDR" "Nomad UI: $NOMAD_ADDR"

Consul UI: http://pphan-server-lb-992086560.us-west-2.elb.amazonaws.com:8500
Nomad UI: http://pphan-server-lb-992086560.us-west-2.elb.amazonaws.com:4646


In the next challenge, you will run jobs that deploy a web application and the Traefik load balancer.

---

- slug: deploy-the-jobs

# Deploy a Web Application and Traefik with Nomad

teaser: |
Deploy a web application and Traefik with Nomad jobs.

## notes:

In this challenge, you will run Nomad jobs that deploy a web application and [Traefik](https://containo.us/traefik/), which will serve as a load balancer in front of multiple instances of the web app.

In later challenges, you will learn about Nomad Spread, Constraints, and Affinities.

---

In this challenge, you will run two Nomad jobs:
* The first will deploy 6 instances of a web app.
* The second will run Traefik as a load balancer for the web app.

## Inspect the webapp.nomad Job

Let's begin by inspecting the Nomad jobs and getting familiar with what you're going to deploy.

Create the "`webapp.nomad`" job specification file with the cell below and inspect its contents.

In [149]:
mkdir -p /tmp/nomad_terraform/jobs
cat > /tmp/nomad_terraform/jobs/webapp.nomad <<-EOF
job "webapp" {
  datacenters = ["dc1"]
  group "webapp" {
    count = 6
    network {
      port  "http" {}
    }
    task "server" {
      env {
        PORT    = "\${NOMAD_PORT_http}"
        NODE_IP = "\${NOMAD_IP_http}"
      }
      driver = "docker"
      config {
        image = "hashicorp/demo-webapp-lb-guide"
        ports = ["http"]
      }
      resources {
        cpu    = 20
        memory = 678
      }
      service {
        name = "webapp"
        port = "http"
        tags = [
          "traefik.tags=service",
          "traefik.frontend.rule=PathPrefixStrip:/myapp",
        ]
        check {
          type     = "http"
          path     = "/"
          interval = "2s"
          timeout  = "2s"
        }
      }
    }
  }
}
EOF

- This will deploy 6 instances of our web app to your Nomad cluster since the `count` of the "`webapp`" task group is set to 6.
- Note, however, that we have not yet used any of the job placement stanzas mentioned in this track's description.
    - So, Nomad is free to place the 6 instances wherever it wants.

Since the job specification does not specify a static port to use, Nomad will select a dynamic port for each web app instance.
- This allows us to run more than one instance of the web app on each Nomad client.
- In contrast, if we had specified a static port, we could only have run one instance per Nomad client.

Since we are using dynamic ports, each instance of the web app has to listen on the right port.
- The job enables this with [variable interpolation](https://nomadproject.io/docs/runtime/interpolation/): it sets the `PORT` and `NODE_IP` environment variables to `${NOMAD_PORT_http}` and `${NOMAD_IP_http}` respectively.
- When each instance of the web app starts, it can read those environment variables and bind to the correct IP and port.
    - This is achieved in combination with the [port parameters](https://nomadproject.io/docs/job-specification/network/#port-parameters) in the network stanza of the job specification.
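
From the application's side, this amounts to reading two environment variables at startup. The sketch below shows the idea in shell. The `bind_address` helper and its fallback values are assumptions for illustration and do not reflect how the demo web app is actually implemented.

```bash
# Hypothetical helper: build the listen address from the variables the job sets.
# The fallbacks (0.0.0.0 and 8080) are assumed defaults for running outside Nomad.
bind_address() {
  printf '%s:%s\n' "${NODE_IP:-0.0.0.0}" "${PORT:-8080}"
}

# Inside an allocation, Nomad would already have set these; example values here.
NODE_IP=172.31.36.101 PORT=23456 bind_address
```

Because every allocation receives its own `PORT` value, several instances can share one client without colliding.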

## Run the webapp.nomad Job

Navigate to the `/tmp/nomad_terraform/jobs` directory:

In [150]:
cd /tmp/nomad_terraform/jobs

Run the "`webapp.nomad`" job with this command on the "`Server`" tab:

In [151]:
nomad job run webapp.nomad > /tmp/nomad_job_run.txt 2>&1 &

[1] 12362


In [152]:
tail -n 50 /tmp/nomad_job_run.txt

==> 2022-09-07T21:50:16-07:00: Monitoring evaluation "e3b43658"
    2022-09-07T21:50:16-07:00: Evaluation triggered by job "webapp"
    2022-09-07T21:50:16-07:00: Evaluation within deployment: "d08ba604"
    2022-09-07T21:50:16-07:00: Allocation "81407a40" created: node "160234a4", group "webapp"
    2022-09-07T21:50:16-07:00: Allocation "a821c6ba" created: node "22f005c9", group "webapp"
    2022-09-07T21:50:16-07:00: Allocation "f2906fdc" created: node "84c13196", group "webapp"
    2022-09-07T21:50:16-07:00: Allocation "33d4b95f" created: node "086e619b", group "webapp"
    2022-09-07T21:50:16-07:00: Allocation "464cd457" created: node "160234a4", group "webapp"
    2022-09-07T21:50:16-07:00: Allocation "577157b5" created: node "160234a4", group "webapp"
    2022-09-07T21:50:16-07:00: Evaluation status changed: "pending" -> "complete"
==> 2022-09-07T21:50:16-07:00: Evaluation "e3b43658" finished with status "complete"
==> 2022-09-07T21:50:16-07:00: Monitoring deployment "d08ba604"
 

This should return something like this:

```
==> Monitoring evaluation "a05672bc"
Evaluation triggered by job "webapp"
Evaluation within deployment: "5692a28d"
Allocation "6bc9d9e6" created: node "33fd8505", group "webapp"
Allocation "1b90c684" created: node "3006bb6d", group "webapp"
Allocation "56b0671c" created: node "2f4a35ac", group "webapp"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "a05672bc" finished with status "complete"
```

You can check the status of the job by selecting the "`webapp`" job on the "Nomad UI" tab.

In [153]:
printf "%s\n" "Nomad UI: $NOMAD_ADDR"

Nomad UI: http://pphan-server-lb-992086560.us-west-2.elb.amazonaws.com:4646


After about 1 minute, you should see that the job has 6 healthy allocations, each representing a single instance of the web app.

Please also check the status of the job with the Nomad CLI:

In [154]:
nomad job status webapp

ID            = webapp
Name          = webapp
Submit Date   = 2022-09-07T21:50:16-07:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
webapp      0       0         6        0       0         0     0

Latest Deployment
ID          = d08ba604
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
webapp      6        6       1        0          2022-09-08T05:00:39Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
33d4b95f  086e619b  webapp      0        run      running  25s ago  1s ago
464cd457  160234a4  webapp      0        run      running  25s ago  8s ago
577157b5  160234a4  webap
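
To check the running count without reading the whole report, the `Summary` table can be parsed. The sketch below assumes the table layout shown in the output above, where `Running` is the fourth column. The `running_count` helper is hypothetical.

```bash
# Hypothetical helper: print the Running count for a task group from the
# Summary table of `nomad job status` output read on stdin.
running_count() {
  awk -v group="$1" '
    /Summary/                  { in_summary = 1; next }
    in_summary && $1 == group  { print $4; exit }
    in_summary && NF == 0      { in_summary = 0 }
  '
}

# Against a live cluster, run:
#   nomad job status webapp | running_count webapp
```

The helper stops at the first matching row, so the `Deployed` table further down, which also starts with the group name, is not read by mistake.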

You can also inspect the "`Consul UI`" tab to see the health of the web app instances that have all been registered as services in Consul.

In [None]:
printf "%s\n" "Consul UI: $CONSUL_HTTP_ADDR"

- Click on the "`webapp`" service.
    - Note how the instances are spread across the clients.
- They might or might not be evenly distributed since we did not specify any job placement stanzas.

## Inspect the traefik.nomad Job

Create the "`traefik.nomad`" job specification file with the cell below and inspect its contents.

In [155]:
cat > /tmp/nomad_terraform/jobs/traefik.nomad <<-EOF
job "traefik" {
  region      = "global"
  datacenters = ["dc1"]
  type        = "service"
  group "traefik" {
    count = 1
    network {
      port "http" {
        static = 8080
      }
      port "api" {
        static = 8081
      }
    }
    task "traefik" {
      driver = "docker"
      config {
        image        = "traefik:1.7"
        network_mode = "host"
        volumes = [
          "local/traefik.toml:/etc/traefik/traefik.toml",
        ]
      }
      template {
        data = <<EOD
[entryPoints]
    [entryPoints.http]
    address = ":8080"
    [entryPoints.traefik]
    address = ":8081"
[api]
    dashboard = true
# Enable Consul Catalog configuration backend.
[consulCatalog]
endpoint = "127.0.0.1:8500"
domain = "consul.localhost"
prefix = "traefik"
constraints = ["tag==service"]
EOD
        destination = "local/traefik.toml"
      }
      resources {
        cpu    = 250
        memory = 128
      }
      service {
        name = "traefik"
        check {
          name     = "alive"
          type     = "tcp"
          port     = "http"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
EOF

[1]+  Done                    nomad job run webapp.nomad > /tmp/nomad_job_run.txt 2>&1


What does this job file do?
- This will deploy a Docker container that runs Traefik.
- Traefik listens on port `8080` and proxies requests for the web app to the instances' dynamic ports allocated by Nomad.
- This job uses Nomad's [template](https://www.nomadproject.io/docs/job-specification/template/) stanza to write out a Traefik configuration file, "`traefik.toml`".
    - Traefik will read this when started.
    - The template includes Traefik's [constraints config](https://docs.traefik.io/providers/consul-catalog/#constraints) for Consul's services catalog with this setting:

```
constraints = ["tag==service"]
```

If you look back at the "`webapp.nomad`" job specification, you will see that the same service tag was specified in the `tags` list of the `service` stanza, which registers the web app with Consul.

```
tags = [
"traefik.tags=service",
"traefik.frontend.rule=PathPrefixStrip:/myapp",
]
```

We think it's pretty cool that:
- Nomad deploys both jobs
- and registers them as Consul services
- and that Traefik then uses the registrations of the web app instances with Consul to determine how to direct traffic to them.

## Run the traefik.nomad Job

Run the "`traefik.nomad`" job with this command on the "`Server`" tab:

In [156]:
nomad job run traefik.nomad > /tmp/nomad_job_run_traefik.txt 2>&1 &

[1] 12380


In [157]:
time head -n 10 /tmp/nomad_job_run_traefik.txt

==> 2022-09-07T21:50:57-07:00: Monitoring evaluation "9eeacc3a"
    2022-09-07T21:50:57-07:00: Evaluation triggered by job "traefik"
==> 2022-09-07T21:50:58-07:00: Monitoring evaluation "9eeacc3a"
    2022-09-07T21:50:58-07:00: Evaluation within deployment: "e75cc2d8"
    2022-09-07T21:50:58-07:00: Allocation "acfa4d6d" created: node "160234a4", group "traefik"
    2022-09-07T21:50:58-07:00: Evaluation status changed: "pending" -> "complete"
==> 2022-09-07T21:50:58-07:00: Evaluation "9eeacc3a" finished with status "complete"
==> 2022-09-07T21:50:58-07:00: Monitoring deployment "e75cc2d8"
    
2022-09-07T21:50:58-07:00

real	0m0.001s
user	0m0.001s
sys	0m0.000s


This should return something like this:

```
==> Monitoring evaluation "6765c131"
    Evaluation triggered by job "traefik"
    Evaluation within deployment: "d15e6190"
    Allocation "0e36e38a" created: node "44d88b4b", group "traefik"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "6765c131" finished with status "complete"
```

As before, you can check the status of the job by selecting the "`traefik`" job on the "Nomad UI" tab.

Check the status of the job with the Nomad CLI by running this command on the "`Server`" tab:

In [158]:
nomad job status traefik

ID            = traefik
Name          = traefik
Submit Date   = 2022-09-07T21:50:57-07:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
traefik     0       0         1        0       0         0     0

Latest Deployment
ID          = e75cc2d8
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
traefik     1        1       0        0          2022-09-08T05:00:57Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created  Modified
acfa4d6d  160234a4  traefik     0        run      running  11s ago  7s ago


**NEED TO MAKE APPLICABLE OUTSIDE OF INSTRUQT**

Unfortunately, you cannot load the web app or Traefik UIs yet because we have not exposed Instruqt tabs for them.
- In fact, we would have had to add tabs exposing port `8081` on all 3 Nomad clients in order to expose the Traefik dashboard since we could not predict in advance which Nomad client Traefik would be deployed to with the current "`traefik.nomad`" job specification. We will fix this in the next challenge.


---

- slug: use-constraint

# Use Nomad's Constraint Stanza

teaser: |

Use Nomad's constraint stanza to tightly control the placement of the Traefik job.

## notes:

In this challenge, you will update the Traefik job to run on a specific Nomad client node so that you can visit the Traefik Dashboard on a new Instruqt tab.

You will do this by using Nomad's [constraint](https://www.nomadproject.io/docs/job-specification/constraint/) stanza that allows Nomad operators to tightly control the placement of a job's allocations.

assignment:

In this challenge, you will use Nomad's [constraint](https://www.nomadproject.io/docs/job-specification/constraint/) stanza to restrict Traefik to run on a specific Nomad client.

We will be using a constraint that filters on a [node variable](https://www.nomadproject.io/docs/runtime/interpolation/#node-variables) of the Nomad client nodes, but you could also use [client metadata](https://www.nomadproject.io/docs/configuration/client#custom-metadata-network-speed-and-node-class).

Please navigate back to the `/tmp/nomad_terraform/jobs` directory on the "`Server`" tab:

In [159]:
cd /tmp/nomad_terraform/jobs

## Edit the traefik.nomad Job Specification

Edit the "`traefik.nomad`" job specification file on the "`Jobs`" tab, making the following changes:

First, grab a node to deploy the job to.

In [160]:
TRAEFIK_NODE=$(nomad node status | tail -n 1 | awk '{print $3}')
echo $TRAEFIK_NODE

[1]+  Done                    nomad job run traefik.nomad > /tmp/nomad_job_run_traefik.txt 2>&1
ip-172-31-33-53


Find the line with `count = 1` and add the following constraint stanza after it. Outside of Instruqt, replace `client1` with the node name you stored in `TRAEFIK_NODE`:

```go
constraint {
  attribute = "${node.unique.name}"
  value     = "client1"
}
```

If you prefer, you can do the editing with this command on the "`Server`" tab:

In [161]:
mv traefik.nomad.bak_$(date +%Y%m%d) traefik.nomad
ls -lrt
sed -i".bak_$(date +%Y%m%d)" "s/count = 1/count = 1\n\n\
      constraint { \n \
        attribute = \"\${node.unique.name}\"\n \
        value     = \"${TRAEFIK_NODE}\"\n \
      }/g" \
  traefik.nomad
cat traefik.nomad

mv: cannot stat 'traefik.nomad.bak_20220907': No such file or directory
total 8
-rw-rw-r-- 1 pephan pephan  750 Sep  7 21:49 webapp.nomad
-rw-rw-r-- 1 pephan pephan 1155 Sep  7 21:50 traefik.nomad
job "traefik" {
  region      = "global"
  datacenters = ["dc1"]
  type        = "service"
  group "traefik" {
    count = 1

      constraint { 
         attribute = "${node.unique.name}"
         value     = "ip-172-31-33-53"
       }
    network {
      port "http" {
        static = 8080
      }
      port "api" {
        static = 8081
      }
    }
    task "traefik" {
      driver = "docker"
      config {
        image        = "traefik:1.7"
        network_mode = "host"
        volumes = [
          "local/traefik.toml:/etc/traefik/traefik.toml",
        ]
      }
      template {
        data = <<EOD
[entryPoints]
    [entryPoints.http]
    address = ":8080"
    [entryPoints.traefik]
    address = ":8081"
[api]
    dashboard = true
# Enable Consul Catalog configuration backend.
[cons

## Re-run the traefik.nomad Job

Next, re-run the "`traefik.nomad`" job with this command on the "Server" tab:

In [162]:
time nomad job run traefik.nomad > /tmp/nomad_job_run_traefik.txt 2>&1 &

[1] 12414


In [163]:
cat /tmp/nomad_job_run_traefik.txt | (head ; tail)


==> 2022-09-07T21:51:38-07:00: Monitoring evaluation "85b2258e"
    2022-09-07T21:51:38-07:00: Evaluation triggered by job "traefik"
==> 2022-09-07T21:51:39-07:00: Monitoring evaluation "85b2258e"
    2022-09-07T21:51:39-07:00: Evaluation within deployment: "b34278a2"
    2022-09-07T21:51:39-07:00: Allocation "acfa4d6d" modified: node "160234a4", group "traefik"
    2022-09-07T21:51:39-07:00: Evaluation status changed: "pending" -> "complete"
==> 2022-09-07T21:51:39-07:00: Evaluation "85b2258e" finished with status "complete"
==> 2022-09-07T21:51:39-07:00: Monitoring deployment "b34278a2"
    
2022-09-07T21:51:39-07:00


This should return something like this:

```
==> Monitoring evaluation "63a2e467"
    Evaluation triggered by job "traefik"
    Evaluation within deployment: "662516d9"
    Allocation "b42c964c" created: node "99187f90", group "traefik"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "63a2e467" finished with status "complete"
```

- Look at the "`traefik`" job in the Nomad UI
  - You will see that there is 1 allocation currently `running`.
  - Click on the ID of that allocation in the `Client` column.
  - You will be taken to the "`client1`" node.

This shows that the `constraint` worked as desired.
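
The same check can be made from the CLI by mapping the Node ID reported by `nomad job status traefik` back to a node name. The helper below is a minimal sketch: the function name `node_name_for_id` is hypothetical, and it assumes the `ID  DC  Name ...` column layout that the earlier `awk '{print $3}'` command relies on.

```shell
# node_name_for_id NODE_ID
# Reads `nomad node status` output on stdin and prints the Name of the
# node whose short ID equals NODE_ID.
# Assumption: the columns are "ID DC Name ..." as in this track's output;
# other Nomad versions may order the columns differently.
node_name_for_id() {
  awk -v id="$1" 'NR > 1 && $1 == id { print $3 }'
}
```

For example, `nomad node status | node_name_for_id 160234a4` should print the name held in `TRAEFIK_NODE` if the constraint was honored.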

In [None]:
#DEBUGGING
nomad node status
nomad job status traefik

### **NEED TO ADD RULE TO ALLOW ALL FROM MY IP**

## Verify

Now, you can visit the Traefik dashboard on the "`Traefik UI`" tab.

In [168]:
TRAEFIK_NODE_PUBLIC=$(aws ec2 describe-instances \
  | jq -r ".Reservations[].Instances[] \
  | select(.PrivateDnsName | contains(\"${TRAEFIK_NODE}\")) | .PublicIpAddress")

echo Traefik UI: http://${TRAEFIK_NODE_PUBLIC}:8081

Traefik UI: http://34.216.150.37:8081


- You can see the URLs for the 6 instances of the web app that it has registered.
- This tab was pre-configured to point to the `nomad-client-1` node since we knew in advance the `constraint` you would use.
- It accesses that node on port `8081`, which is Traefik's admin port.

Right-click any of those URLs in the "`backend-webapp`" table.
- Select "`Copy Link Address`".
- Then run a command like this, where `<your_url>` is the URL you copied:
```shell
curl <your_url>
```

In [174]:
ssh -i /tmp/Nomad/ssh_key/aws-key-pair \
  -o "StrictHostKeyChecking no" ubuntu@${TRAEFIK_NODE_PUBLIC} \
  curl http://172.31.33.53:22616

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:Welcome! You are on node 172.31.33.53:22616
100    44  100    44    0     0  65281      0 --:--:-- --:--:-- --:--:-- 44000


- You should see something like this: `Welcome! You are on node 10.132.0.66:20478`

By specifying the IP and the port that Nomad dynamically selected, you are hitting one of the webapp allocations directly just as Traefik does.

Next, run the following `curl` command:

In [175]:
for i in {1..6}; do
curl http://${TRAEFIK_NODE_PUBLIC}:8080/myapp
done
echo http://${TRAEFIK_NODE_PUBLIC}:8080/myapp

Welcome! You are on node 172.31.36.154:28165
Welcome! You are on node 172.31.33.53:22616
Welcome! You are on node 172.31.33.53:25028
Welcome! You are on node 172.31.33.53:26874
Welcome! You are on node 172.31.36.101:20772
Welcome! You are on node 172.31.36.154:27310
http://34.216.150.37:8080/myapp


This will return a similar message.

In this case, you are actually hitting Traefik on Nomad client 1 and it is load balancing your request to one of the 6 webapp instances. If you repeat the command a few times, you will see that the IP and port returned are different each time.
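
To see how the requests were distributed, the responses can be tallied per client IP. This is a minimal sketch that only parses the `Welcome! You are on node <ip>:<port>` lines shown above; the function name `tally_backends` is hypothetical.

```shell
# tally_backends
# Reads lines like "Welcome! You are on node 172.31.33.53:22616" on stdin
# and prints "<count> <ip>" per client IP, busiest first.
tally_backends() {
  awk '/You are on node/ {
         split($NF, parts, ":")   # last field is "<ip>:<port>"
         hits[parts[1]]++
       }
       END { for (ip in hits) print hits[ip], ip }' | sort -k1,1nr -k2,2
}
```

One way to use it is to pipe the loop into the function: `for i in {1..6}; do curl -s http://${TRAEFIK_NODE_PUBLIC}:8080/myapp; done | tally_backends`.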

You can also visit the web app on the "`Web App UI`" tab.
- This tab also points to the `nomad-client-1` node but listens on port `8080` which is what Traefik is using to load balance requests to the web app.
- You will see the same message that the `curl` command gave.
- If you click the Instruqt refresh button (clockwise arrow) to the right of the "`Web App UI`" tab, the IP and port displayed will also change.

In the next challenge, you will use Nomad's spread stanza to distribute the allocations of your "`webapp.nomad`" job evenly across your 3 Nomad clients.

tabs:
- title: Traefik UI
type: service
hostname: nomad-client-1
port: 8081
- title: Web App UI
type: service
hostname: nomad-client-1
path: /myapp
port: 8080


- slug: use-spread
  
# Use Nomad's Spread Stanza

  teaser: |
    Use Nomad's spread stanza to distribute load evenly across your Nomad clients.

notes:

In this challenge, you will update the logic that Nomad uses to distribute allocations of the web app to the 3 Nomad clients in your cluster.

Specifically, you will use the [spread](https://www.nomadproject.io/docs/job-specification/spread/) stanza to evenly distribute allocations of the web app across all 3 Nomad clients.

assignment: |-

In this challenge, you will use Nomad's [spread](https://www.nomadproject.io/docs/job-specification/spread/) stanza to spread the "`webapp.nomad`" job's allocations evenly across the 3 Nomad clients of your cluster.

This demonstrates how Nomad can increase the failure tolerance of applications.

The `spread` stanza allows operators to spread allocations over datacenters, availability zones, or even racks in a physical datacenter. By default, when using `spread`, the scheduler will attempt to place allocations equally among the available values of the given target.
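
As a worked example of "equally among the available values", the sketch below divides a number of allocations across a number of target values as evenly as possible. It only illustrates the arithmetic of an ideal split: the function name `even_split` is hypothetical, and the real scheduler treats spread as a scoring preference, so actual placements can differ.

```shell
# even_split COUNT TARGETS
# Prints how many allocations each of TARGETS values would receive if
# COUNT allocations were divided as evenly as possible. Any remainder is
# handed out one at a time to the first targets.
even_split() {
  awk -v count="$1" -v targets="$2" 'BEGIN {
    if (targets < 1) exit 1
    base  = int(count / targets)
    extra = count % targets
    for (i = 1; i <= targets; i++)
      printf "%d%s", base + (i <= extra ? 1 : 0), (i < targets ? " " : "\n")
  }'
}
```

With the values used in this challenge, `even_split 6 3` prints `2 2 2`, which is the distribution you should expect to see once the job has been redeployed.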

## Edit the webapp.nomad Job

Navigate back to the `/tmp/nomad_terraform/jobs` directory on the "`Server`" tab:

In [177]:
cd /tmp/nomad_terraform/jobs

Edit the "`webapp.nomad`" job specification file on the "`Jobs`" tab, making the following changes:

Find the line that has `count = 6` and add the following spread stanza after it:

```go
spread {
  attribute = "${node.unique.name}"
}
```

If you prefer, you can do the editing with this command on the "`Server`" tab:

In [178]:
mv webapp.nomad.bak_$(date +%Y%m%d) webapp.nomad
ls -lrt

sed -i".bak_$(date +%Y%m%d)" \
  's/count = 6/count = 6\n\
    spread { \
      attribute = "${node.unique.name}" \
    }/g' \
  webapp.nomad
head -n 10  webapp.nomad

mv: cannot stat 'webapp.nomad.bak_20220907': No such file or directory
total 12
-rw-rw-r-- 1 pephan pephan  750 Sep  7 21:49 webapp.nomad
-rw-rw-r-- 1 pephan pephan 1155 Sep  7 21:50 traefik.nomad.bak_20220907
-rw-rw-r-- 1 pephan pephan 1267 Sep  7 21:51 traefik.nomad
job "webapp" {
  datacenters = ["dc1"]
  group "webapp" {
    count = 6

    spread { 
      attribute = "${node.unique.name}" 
    }
    network {
      port  "http" {}


**NOTE**:
- We do not specify a `value` the way we did in the `constraint` stanza in the last challenge.
- The whole point here is to spread allocations evenly across all Nomad clients based on their names.

**PP - ADD MORE INFO**

You can view the current allocations for the "`webapp`" job in either of two ways:
- In the Nomad UI:
    - Select the "`webapp`" job.
    - Click on the "`Allocations`" tab under the job.
    - Focus on the "`Client`" column.
- With the Nomad CLI:
    - Run `nomad job status webapp` and look at the "`Allocations`" section at the bottom of the output.
    ```text
    ...
    Allocations
    ID        Node ID   Task Group  Version  Desired  Status   Created    Modified
    36655ee2  69bb47a1  webapp      0        run      running  13m8s ago  12m29s ago
    37ea648c  71670de6  webapp      0        run      running  13m8s ago  12m32s ago
    49874897  f3d852dd  webapp      0        run      running  13m8s ago  12m33s ago
    73498c1a  71670de6  webapp      0        run      running  13m8s ago  12m31s ago
    90263f17  69bb47a1  webapp      0        run      running  13m8s ago  12m30s ago
    cddf807a  69bb47a1  webapp      0        run      running  13m8s ago  12m28s ago    
    ```
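    - Focus on the "`Node ID`" column. The allocations might or might not be evenly distributed across the 3 Nomad clients. In this sample they are not: node `69bb47a1` runs 3 allocations, `71670de6` runs 2, and `f3d852dd` runs 1.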

## Re-run the webapp.nomad job

Now let's re-run the "`webapp.nomad`" job and see the changes that occur:

In [179]:
time nomad job run webapp.nomad > /tmp/nomad_job_run.txt 2>&1 &

[1] 12580


In [182]:
(head -n 10 ; echo ; tail -n 15) < /tmp/nomad_job_run.txt

==> 2022-09-07T22:06:36-07:00: Monitoring evaluation "0a89b585"
    2022-09-07T22:06:36-07:00: Evaluation triggered by job "webapp"
==> 2022-09-07T22:06:37-07:00: Monitoring evaluation "0a89b585"
    2022-09-07T22:06:37-07:00: Evaluation within deployment: "3c970f9e"
    2022-09-07T22:06:37-07:00: Allocation "0a5bddcf" created: node "84c13196", group "webapp"
    2022-09-07T22:06:37-07:00: Evaluation status changed: "pending" -> "complete"
==> 2022-09-07T22:06:37-07:00: Evaluation "0a89b585" finished with status "complete"
==> 2022-09-07T22:06:37-07:00: Monitoring deployment "3c970f9e"
    
2022-09-07T22:06:37-07:00


Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
webapp      6        2       1        0          2022-09-08T05:16:48Z
    
2022-09-07T22:07:01-07:00
ID          = 3c970f9e
Job ID      = webapp
Job Version = 1
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
we

- Monitor the new deployment of the job in the Nomad UI
- Or by periodically re-running `nomad job status webapp`.
- Pay particular attention to `running` allocations.

In [183]:
nomad job status webapp

ID            = webapp
Name          = webapp
Submit Date   = 2022-09-07T22:06:36-07:00
Type          = service
Priority      = 50
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost  Unknown
webapp      0       1         5        0       3         1     0

Latest Deployment
ID          = 3c970f9e
Status      = running
Description = Deployment is running

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
webapp      6        3       2        0          2022-09-08T05:17:03Z

Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
84399f09  22f005c9  webapp      1        run      pending   1s ago      1s ago
36fcc5d1  84c13196  webapp      1        run      running   16s ago     3s ago
0a5bddcf  84c

After all six allocations are healthy, you should see 2 webapp allocations on each Nomad client.

Sample Output
```shell
Allocations
ID        Node ID   Task Group  Version  Desired  Status    Created     Modified
6d4e5a11  69bb47a1  webapp      1        run      running   8m ago      7m48s ago
83dd87eb  71670de6  webapp      1        run      running   8m16s ago   8m1s ago
c74711fe  f3d852dd  webapp      1        run      running   8m29s ago   8m17s ago
2e6adccf  69bb47a1  webapp      1        run      running   8m44s ago   8m31s ago
1fdb24b9  71670de6  webapp      1        run      running   8m58s ago   8m45s ago
a1340eb4  f3d852dd  webapp      1        run      running   9m13s ago   9m ago
36655ee2  69bb47a1  webapp      0        stop     complete  23m33s ago  8m ago
73498c1a  71670de6  webapp      0        stop     complete  23m33s ago  8m58s ago
90263f17  69bb47a1  webapp      0        stop     complete  23m33s ago  8m43s ago
37ea648c  71670de6  webapp      0        stop     complete  23m33s ago  8m15s ago
49874897  f3d852dd  webapp      0        stop     complete  23m33s ago  8m29s ago
cddf807a  69bb47a1  webapp      0        stop     complete  23m33s ago  9m13s ago
```

This shows that the `spread` stanza caused Nomad to spread the allocations evenly as expected.
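
To confirm the distribution without reading the table by eye, the running allocations can be tallied per Node ID. This is a minimal sketch, assuming the column order shown in the sample output above (Status in the sixth column); the function name `allocs_per_node` is hypothetical.

```shell
# allocs_per_node
# Reads `nomad job status <job>` output on stdin and prints
# "<count> <node-id>" for every node that has running allocations.
# Assumption: task group names contain no spaces, so Status is field 6.
allocs_per_node() {
  awk '
    $1 == "Allocations" { in_allocs = 1; next }   # start of the section
    in_allocs && $6 == "running" { count[$2]++ }  # field 2 is the Node ID
    END { for (node in count) print count[node], node }
  ' | sort -k1,1nr -k2,2
}
```

Run it as `nomad job status webapp | allocs_per_node`. On the sample output above it reports a count of 2 for each of the three node IDs, and the stopped version 0 allocations are ignored because their status is `complete`.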

In the next challenge, you will use the `affinity` stanza to express your preference on where Nomad should run the webapp allocations.

---

- slug: use-affinity

# Use Nomad's Affinity Stanza

teaser: |
Use Nomad's affinity stanza to loosely control the placement of jobs.

## notes:

In this challenge, you will use Nomad's [affinity](https://www.nomadproject.io/docs/job-specification/affinity/) stanza to loosely control the placement of the "`webapp`" job.

You will specify a preference on where Nomad should run the job's allocations but let Nomad make the final decision, which will factor in your affinity preferences along with Nomad's default job anti-affinity and bin packing algorithms.

assignment:

In this challenge, you will use Nomad's [affinity](https://www.nomadproject.io/docs/job-specification/affinity/) stanza to loosely control the placement of the "`webapp`" job. You will specify a preference on where Nomad should run the job's allocations but let Nomad make the final decision.

The `affinity` stanza allows operators to express placement preference for a set of nodes. Affinities may be expressed on attributes or client metadata. Additionally, affinities may be specified at the `job`, `group`, or `task` levels for ultimate flexibility.

For this challenge we will be utilizing the underlying host machine type to choose where to run the allocations of the "`webapp`" job. The machine types are as follows:

| client  | type          |
| ------- | ------------- |
| client1 | n1-standard-2 |
| client2 | n1-standard-1 |
| client3 | n1-standard-1 |

## Edit the webapp.nomad Job

Please navigate back to the `/tmp/nomad_terraform/jobs` directory on the "`Server`" tab:

In [184]:
cd /tmp/nomad_terraform/jobs

Edit the "`webapp.nomad`" job specification file on the "`Jobs`" tab, making the following changes:

Find the spread stanza and replace it with the following affinity stanza:
```go
affinity {
  attribute = "${attr.platform.gce.machine-type}"
  value     = "n1-standard-2"
  weight    = 100
}
```

This tells Nomad that you would like it to deploy all allocations of the "`webapp`" job to the "`client1`" node
- since that is the only Nomad client using the "`n1-standard-2`" machine type.
- We have set the `weight` of the affinity stanza to the highest possible value, `100`.

If you prefer, you can do the editing with these commands on the "`Server`" tab:

In [None]:
mv webapp.nomad.bak2_$(date +%Y%m%d) webapp.nomad
sed -i".bak2_$(date +%Y%m%d)" '6,8d' webapp.nomad   #delete spread

sed -i \
  's/count = 6/count = 6\n\
      affinity { \
        attribute = "${attr.platform.gce.machine-type}" \
        value     = "n1-standard-2" \
        weight    = 100 \
      }/g' \
  webapp.nomad
head -n 20 webapp.nomad

**NOTE**: Negative weights can be specified to indicate "`anti-affinities`".
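
The sketch below renders an affinity stanza and rejects weights outside the range of -100 to 100 that the Nomad affinity documentation gives for `weight`. The function name `affinity_stanza` is hypothetical; it is only a convenience for experimenting with positive and negative weights.

```shell
# affinity_stanza ATTRIBUTE VALUE WEIGHT
# Prints an affinity stanza for the given attribute, value and weight.
# Returns 1 if WEIGHT is not an integer between -100 and 100.
affinity_stanza() {
  attr=$1 value=$2 weight=$3
  digits=${weight#-}                 # allow one leading minus sign
  case $digits in
    ''|*[!0-9]*) echo "weight must be an integer" >&2; return 1 ;;
  esac
  if [ "$weight" -lt -100 ] || [ "$weight" -gt 100 ]; then
    echo "weight must be between -100 and 100" >&2
    return 1
  fi
  printf 'affinity {\n  attribute = "%s"\n  value     = "%s"\n  weight    = %s\n}\n' \
    "$attr" "$value" "$weight"
}
```

Calling `affinity_stanza '${attr.platform.gce.machine-type}' n1-standard-2 100` reproduces the stanza above, and a weight of `-100` would express the strongest anti-affinity. Because this notebook runs the cluster on AWS, the GCE attribute may not be fingerprinted on your clients. As an assumption to verify with `nomad node status -verbose <node-id>`, the AWS counterpart is expected to be `${attr.platform.aws.instance-type}`.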

To make it easier to track the new deployment of the "`webapp`" job, let's first stop it with this command:

In [None]:
nomad job stop -purge webapp

This will completely remove the "webapp" job from the list of jobs in the Nomad UI.

## Re-run the webapp.nomad job with affinity

Now, let's re-run the "`webapp`" job again:

In [None]:
time nomad job run webapp.nomad > /tmp/nomad_job_run.txt 2>&1 &

In [None]:
(head -n 10;tail -n 10) < /tmp/nomad_job_run.txt
# cat /tmp/nomad_job_run.txt | (sed -u 10q; echo; tail -n 2)

The job should deploy 6 new allocations, but probably will not deploy all of them to the "`client1`" node as you had requested.

You can check where the allocations were actually deployed by inspecting the "`webapp`" job in the Nomad UI and looking at the "`Allocations`" tab of the job.
- You can sort the allocations by clicking on the "`Status`" column header until all the running allocations are at the top.

Check its status with the Nomad CLI using this command:

In [None]:
nomad job status webapp

Then, get detailed information on how Nomad decided where to deploy one of the allocations that was deployed to `client1`:
```shell
nomad alloc status -verbose <alloc>
```

In [None]:
nomad alloc status -verbose $(nomad job status webapp | tail -n 5 \
  | sort -k 2 | sed -n '3p' | awk '{print $1}')

- Replace `<alloc>` with one of the allocation IDs in the first column of the "`Allocations`" section at the bottom of the `nomad job status webapp` output, choosing one whose Node ID matches the ID of the "`client1`" node.
- You can determine the Node ID of `client1` on the `Clients` section of the Nomad UI.

If you look at the "`Placement Metrics`" section at the bottom, you will see various scores for each of the 3 Nomad clients. See this [section](https://www.nomadproject.io/docs/job-specification/affinity/#example-placement-metadata) for an explanation of the scores.

Sample Output
```text
Placement Metrics
Node                    binpack  job-anti-affinity  node-affinity  node-reschedule-penalty  final score
e4818dac-a9a3-500c-...  0.458    -0.833             1              0                        0.208
0acc8763-ee83-4ad6-...  0.203    0                  0              0                        0.203
e42317e7-510d-95a0-...  0.203    0                  0              0                        0.203
```

The placement score is affected by the following factors.

- `binpack` - Scores nodes according to how well they fit requirements.
    - Optimizes for using minimal number of nodes.
- `job-anti-affinity` - A penalty added for additional instances of the same job on a node, used to avoid having too many instances of a job on the same node.
- `node-reschedule-penalty` - Used when the job is being rescheduled.
    - Nomad adds a penalty to avoid placing the job on a node where it has failed to run before.
- `node-affinity` - Used when the criteria specified in the affinity stanza matches the node.
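
The numbers in the sample output are consistent with the final score being the arithmetic mean of the components that contributed a non-zero score. The sketch below reproduces that arithmetic; it is an inference from the sample rows rather than a statement of the scheduler's exact implementation, and the function name `final_score` is hypothetical.

```shell
# final_score SCORE...
# Prints the mean of the non-zero arguments, rounded to 3 decimals.
# Assumption: a component shown as 0 did not take part in the average.
final_score() {
  printf '%s\n' "$@" | awk '
    $1 != 0 { sum += $1; n++ }
    END { if (n) printf "%.3f\n", sum / n; else print "0.000" }'
}
```

For the first sample row, `final_score 0.458 -0.833 1 0` averages three components, (0.458 - 0.833 + 1) / 3, and prints `0.208`. For the other two rows only `binpack` is non-zero, so `final_score 0.203 0 0 0` prints `0.203`. This is why a perfect `node-affinity` score of 1 barely wins once the job anti-affinity penalty is applied.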

There are several reasons why Nomad might not deploy all allocations according to your `affinity` stanza's preferences:
* Nomad automatically applies a job anti-affinity rule which discourages co-locating multiple instances of a task group.
* Nomad applies a bin packing algorithm that attempts to optimize the resource utilization and density of applications in order to leave large blocks of resources available on some Nomad clients in case a future job attempts to schedule allocations that require large amounts of memory and CPU.

You can read more about both of these concepts in Nomad's [Scheduling](https://www.nomadproject.io/docs/internals/scheduling/scheduling/) documentation.

Congratulations on completing the Nomad Advanced Job Placement track!

### Command Summary

In [None]:
nomad job stop -purge webapp
nomad job run webapp.nomad
nomad job status webapp
nomad alloc status -verbose $(nomad job status webapp | tail -n 5 \
  | sort -k 2 | sed -n '3p' | awk '{print $1}')

# Clean Up

### terraform destroy

In [None]:
pushd /tmp/nomad_terraform/aws/env
time terragrunt destroy > /tmp/tf_destroy_nomad_out.txt 2>&1 &
popd

In [None]:
tail -n 50 /tmp/tf_destroy_nomad_out.txt

In [None]:
rm /tmp/Nomad/cluster/terraform.tfstate*

# Q&A

**Question:** Is there any documentation around the behavior of updating the [scheduling algorithm](https://www.nomadproject.io/api-docs/operator/scheduler#update-scheduler-configuration) on a Nomad cluster from binpack to spread? Specifically, are allocs spread when redeployed, or does Nomad try to balance things proactively after the setting is applied?

**Answer:** Correct, the scheduling change is only forward looking. The best answer to get things spread after the fact is a set of rolling drains.

**Daniel Santos:** We did this in our Prod Nomad clusters a couple of weeks ago, and the spread behaviour applies only after new allocs / evals are placed.