## Organizing pods with labels

At this point, you have two pods running in your cluster. When deploying actual
applications, most users will end up running many more pods. As the number of
pods increases, the need for categorizing them into subsets becomes more and
more evident.


For example, with microservices architectures, the number of deployed microservices
can easily exceed 20. Those components will probably be replicated
(multiple copies of the same component will be deployed) and multiple versions or
releases (stable, beta, canary, and so on) will run concurrently. This can lead to hundreds
of pods in the system. Without a mechanism for organizing them, you end up
with a big, incomprehensible mess, such as the one shown in figure 3.6. The figure
shows pods of multiple microservices, with several running multiple replicas, and others
running different releases of the same microservice.


It’s evident you need a way of organizing them into smaller groups based on arbitrary
criteria, so every developer and system administrator dealing with your system can easily
see which pod is which. And you’ll want to operate on every pod belonging to a certain
group with a single action instead of having to perform the action for each pod
individually.

**Organizing pods and all other Kubernetes objects is done through labels.**

### Introducing labels

Labels are a simple, yet incredibly powerful, Kubernetes feature for organizing not
only pods, but all other Kubernetes resources. A label is an **arbitrary key-value pair** you
attach to a resource, which is then used when selecting resources with label selectors
(resources are filtered based on whether they include the label specified in the selector).
A resource can have more than one label, as long as the keys of those labels are
unique within that resource. You usually attach labels to resources when you create
them, but you can also add labels or modify the values of existing
labels later without having to recreate the resource.

Let’s turn back to the microservices example from figure 3.6. By adding labels to
those pods, you get a much-better-organized system that everyone can easily make
sense of. Each pod is labeled with two labels:

- app, which specifies which app, component, or microservice the pod belongs to.
- rel, which shows whether the application running in the pod is a stable, beta,
  or canary release.

> DEFINITION A canary release is when you deploy a new version of an application
next to the stable version, and only let a small fraction of users hit the
new version to see how it behaves before rolling it out to all users. This prevents
bad releases from being exposed to too many users.

By adding these two labels, you’ve essentially organized your pods into two dimensions.

Every developer or ops person with access to your cluster can now easily see the system’s
structure and where each pod fits in by looking at the pod’s labels.

### Specifying labels when creating a pod

Now, you’ll see labels in action by creating a new pod with two labels. Create a new file
called kubia-manual-with-labels.yaml with the contents of the following listing.

```yml
apiVersion: v1
kind: Pod
metadata:
  name: kubia-manual-v2
  labels:                      # two labels attached to the pod
    creation_method: manual
    env: prod
spec:
  containers:
  - image: luksa/kubia
    name: kubia
    ports:
    - containerPort: 8080
      protocol: TCP
```

You’ve included the labels creation_method=manual and env=prod in the metadata.labels section.
You’ll create this pod now:

    kubectl apply -f kubia-manual-with-labels.yaml

The kubectl get pods command doesn’t list any labels by default, but you can see
them by using the --show-labels switch:

    kubectl get pod --show-labels

If you’re only interested in certain labels, you can specify them with the -L switch
instead of listing all labels, and have each displayed in its own column. List pods again
and show the columns for the two labels you’ve attached to your kubia-manual-v2 pod:

    kubectl get pod -L creation_method,env

### Modifying labels of existing pods

Labels can also be added to and modified on existing pods. Because the kubia-manual
pod was also created manually, let’s add the creation_method=manual label to it:

    kubectl label pod kubia-manual creation_method=manual

Now, let’s also change the env=prod label to env=debug on the kubia-manual-v2 pod,
to see how existing labels can be changed.

> NOTE You need to use the --overwrite option when changing existing labels.

    kubectl label po kubia-manual-v2 env=debug --overwrite

List the pods again to see the updated labels:
    
    kubectl get po -L creation_method,env

As you can see, attaching labels to resources is trivial, and so is changing them on
existing resources. It may not be evident right now, but this is an incredibly powerful
feature, as you’ll see in the next chapter. But first, let’s see what you can do with these
labels, in addition to displaying them when listing pods.

## Listing subsets of pods through label selectors

Attaching labels to resources so you can see the labels next to each resource when listing
them isn’t that interesting. But labels go hand in hand with label selectors. Label
selectors allow you to select a subset of pods tagged with certain labels and perform an
operation on those pods. A label selector is a criterion that filters resources based
on whether they include a certain label with a certain value.

A label selector can select resources based on whether the resource

- Contains (or doesn’t contain) a label with a certain key
- Contains a label with a certain key and value
- Contains a label with a certain key, but with a value not equal to the one you
  specify

### Listing pods using a label selector

Let’s use label selectors on the pods you’ve created so far. To see all pods you created
manually (you labeled them with creation_method=manual), do the following:

    kubectl get po -l creation_method=manual
    
To list all pods that include the env label, whatever its value is:

    kubectl get po -l env

And those that don’t have the env label:

    kubectl get po -l '!env'

> NOTE Make sure to use single quotes around !env, so the bash shell doesn’t
evaluate the exclamation mark.
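
The commands above cover label presence, absence, and exact matches. The remaining forms (inequality, set membership, and several criteria at once) can be sketched as follows. This is a minimal sketch rather than part of the original walkthrough: the function name selector_tour is hypothetical and only groups the commands so that nothing runs until you call it, and the prod and debug values simply reuse the env values applied earlier.

```shell
# Hypothetical helper: runs a few additional selector forms against the cluster.
selector_tour() {
  # creation_method is not "manual" (kubectl also matches pods without the label)
  kubectl get po -l 'creation_method!=manual'

  # env is set to either prod or debug
  kubectl get po -l 'env in (prod,debug)'

  # env is neither prod nor debug (pods without the env label match as well)
  kubectl get po -l 'env notin (prod,debug)'

  # comma-separated criteria must all hold: created manually AND env=debug
  kubectl get po -l 'creation_method=manual,env=debug'
}
```

Run selector_tour after pasting the function into your shell. If you followed the labeling steps above, the last command should list only kubia-manual-v2, because it is the only pod carrying both labels.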

## Using labels and selectors to constrain pod scheduling

All the pods you’ve created so far have been scheduled pretty much randomly across
your worker nodes. As I mentioned in the previous chapter, this is the proper way
of working in a Kubernetes cluster. Because Kubernetes exposes all the nodes in the
cluster as a single, large deployment platform, it shouldn’t matter to you what node a
pod is scheduled to. Because each pod gets the exact amount of computational
resources it requests (CPU, memory, and so on) and its accessibility from other pods
isn’t at all affected by the node the pod is scheduled to, usually there shouldn’t be any
need for you to tell Kubernetes exactly where to schedule your pods.

Certain cases exist, however, where you’ll want to have at least a little say in where
a pod should be scheduled. A good example is when your hardware infrastructure
isn’t homogeneous. If some of your worker nodes have spinning hard drives, whereas
others have SSDs, you may want to schedule certain pods to one group of nodes and
the rest to the other. Another example is when you need to schedule pods performing
intensive GPU-based computation only to nodes that provide the required GPU
acceleration.

You never want to say specifically what node a pod should be scheduled to, because
that would couple the application to the infrastructure, whereas the whole idea of
Kubernetes is hiding the actual infrastructure from the apps that run on it. But if you
want to have a say in where a pod should be scheduled, instead of specifying an exact
node, you should describe the node requirements and then let Kubernetes select a
node that matches those requirements. This can be done through node labels and
node label selectors.
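
To make this concrete, here is a minimal sketch of describing a node requirement instead of naming a node. Several names are assumptions made for illustration: the gpu=true label, the pod name kubia-gpu, and the function name schedule_to_gpu_node are all hypothetical, and the node name is whatever you pass in from the output of kubectl get nodes. The luksa/kubia image is reused from the earlier listing.

```shell
# Hypothetical helper: label one node as GPU-equipped, then create a pod that
# may only be scheduled to nodes carrying that label.
# $1 is a node name taken from `kubectl get nodes`.
schedule_to_gpu_node() {
  # Attach the (illustrative) gpu=true label to the node
  kubectl label node "$1" gpu=true

  # Confirm which nodes now match the label selector
  kubectl get nodes -l gpu=true

  # Create a pod whose nodeSelector states the requirement, not a node name
  kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: kubia-gpu
spec:
  nodeSelector:
    gpu: "true"
  containers:
  - image: luksa/kubia
    name: kubia
EOF
}
```

Calling schedule_to_gpu_node with one of your node names labels that node, lists the matching nodes, and creates the pod. The scheduler then considers only nodes whose labels include gpu=true. If no node carries the label, the pod stays pending until one does.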

### Scheduling to one specific node

A pod’s nodeSelector can also target one exact node, because each node has
a unique label with the key kubernetes.io/hostname and its value set to the actual hostname
of the node. But setting the nodeSelector to a specific node through the hostname
label may leave the pod unschedulable if that node is offline. You shouldn’t
think in terms of individual nodes. Always think about logical groups of nodes that satisfy
certain criteria specified through label selectors.

