# Labelling

Now that we have our input and output defined, we need to label the correct output target for each input.

Labeling is one of the most important steps in the AI/ML lifecycle. A popular saying is "garbage in, garbage out": the model is only as good as the data you feed it.

Below is a simple NER labelling task.


## Instructions

You are given titles of declared incidents.

We want to extract two key pieces of information from the titles, if they exist:

1. The service affected (SVC)
2. The environment affected (ENV)

The text will be represented as a list of words.

You will label the first word of the relevant entity with: B-SVC or B-ENV

Any subsequent words will be labeled I-SVC or I-ENV

Unrelated words will be labeled 'O' by default.


Example:

```python
# proxy gateway is down in prod-us-east-3
labels = [
    [
        "B-SVC",  # proxy
        "I-SVC",  # gateway
        "O",  # is
        "O",  # down
        "O",  # in
        "B-ENV",  # prod-us-east-3
    ],
    ...
]
```

Note: this problem is known as a Named Entity Recognition (NER) problem.
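To check that a label sequence says what you meant, it can help to decode the tags back into entities. The following is a minimal sketch of that decoding for the B-/I-/O scheme described above; the function name `bio_to_entities` is an assumption made for illustration and is not part of the original task.

```python
def bio_to_entities(words, tags):
    """Group B-/I-/O tags into (entity_type, text) pairs."""
    entities = []
    current_type, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; flush any open one first.
            if current_words:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_words.append(word)
        else:
            # "O" (or a stray I- tag with no matching B-) closes the open entity.
            if current_words:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_words:
        entities.append((current_type, " ".join(current_words)))
    return entities


words = "proxy gateway is down in prod-us-east-3".split()
tags = ["B-SVC", "I-SVC", "O", "O", "O", "B-ENV"]
print(bio_to_entities(words, tags))
# [('SVC', 'proxy gateway'), ('ENV', 'prod-us-east-3')]
```

In this sketch a stray `I-` tag without a preceding `B-` is silently dropped; a stricter version could raise an error instead so that labelling mistakes surface early.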

In [1]:
data = [
    'Alertmanager cluster failed to send alerts',
    'alerting bypass dspermissions',
    'SM Probes Bad Deploy',
    'Grafana Loki improper scaledown',
    'Loki Prod Alerting',
    'Logs Production Authentication Issues',
    'Tempo Grafana 75 query',
    'Tempo dev-us-central-0 Writes',
    'OnCall Unable to Create Integrations due to Alertmanager',
    'Grafana OnCall sync_organization_async task is stuck',
]
labels = []

In [2]:
# Alertmanager cluster failed to send alerts
label1 = [
    "O",  # Alertmanager
    "O",  # cluster
    "O",  # failed
    "O",  # to
    "O",  # send
    "O",  # alerts
]
labels.append(label1)


In [3]:
# alerting bypass dspermissions
label2 = [
    "O",  # alerting
    "O",  # bypass
    "O",  # dspermissions
]
labels.append(label2)


In [4]:
# SM Probes Bad Deploy
label3 = [
    "O",  # SM
    "O",  # Probes
    "O",  # Bad
    "O",  # Deploy
]
labels.append(label3)


In [5]:
# Grafana Loki improper scaledown
label4 = [
    "O",  # Grafana
    "O",  # Loki
    "O",  # improper
    "O",  # scaledown
]
labels.append(label4)


In [6]:
# Loki Prod Alerting
label5 = [
    "O",  # Loki
    "O",  # Prod
    "O",  # Alerting
]
labels.append(label5)


In [7]:
# Logs Production Authentication Issues
label6 = [
    "O",  # Logs
    "O",  # Production
    "O",  # Authentication
    "O",  # Issues
]
labels.append(label6)


In [8]:
# Tempo Grafana 75 query
label7 = [
    "O",  # Tempo
    "O",  # Grafana
    "O",  # 75
    "O",  # query
]
labels.append(label7)

In [9]:
# Tempo dev-us-central-0 Writes
label8 = [
    "O",  # Tempo
    "O",  # dev-us-central-0
    "O",  # Writes
]
labels.append(label8)


In [10]:
# OnCall Unable to Create Integrations due to Alertmanager
label9 = [
    "O",  # OnCall
    "O",  # Unable
    "O",  # to
    "O",  # Create
    "O",  # Integrations
    "O",  # due
    "O",  # to
    "O",  # Alertmanager
]
labels.append(label9)


In [11]:
# Grafana OnCall sync_organization_async task is stuck
label10 = [
    "O",  # Grafana
    "O",  # OnCall
    "O",  # sync_organization_async
    "O",  # task
    "O",  # is
    "O",  # stuck
]
labels.append(label10)

In [12]:
actual = [
    ['B-SVC', 'O', 'O', 'O', 'O', 'O'],
    ['O', 'O', 'O'],
    ['B-SVC', 'O', 'O', 'O'],
    ['O', 'B-SVC', 'O', 'O'],
    ['B-SVC', 'B-ENV', 'O'],
    ['B-SVC', 'B-ENV', 'O', 'O'],
    ['B-SVC', 'B-SVC', 'O', 'O'],
    ['B-SVC', 'B-ENV', 'O'],
    ['B-SVC', 'O', 'O', 'O', 'O', 'O', 'O', 'B-SVC'],
    ['O', 'B-SVC', 'O', 'O', 'O', 'O'],
]

In [13]:
# calculate how many lines are correct
correct = [l == a for l, a in zip(labels, actual)]
print(sum(correct))

1
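
Counting whole lines is a strict measure: one wrong tag makes the entire line wrong. As a complementary view, the sketch below also computes the share of individual words tagged correctly. The helper names `exact_match_count` and `token_accuracy` are assumptions for illustration, and the two rows are a small subset of the titles above with every word predicted as 'O'.

```python
def exact_match_count(predicted, expected):
    """Number of rows whose tag sequence matches exactly."""
    return sum(p == e for p, e in zip(predicted, expected))


def token_accuracy(predicted, expected):
    """Fraction of individual tags that match, pooled over all rows."""
    pairs = [
        (p, e)
        for pred_row, exp_row in zip(predicted, expected)
        for p, e in zip(pred_row, exp_row)
    ]
    return sum(p == e for p, e in pairs) / len(pairs)


# "alerting bypass dspermissions" and "Loki Prod Alerting"
predicted = [["O", "O", "O"], ["O", "O", "O"]]
expected = [["O", "O", "O"], ["B-SVC", "B-ENV", "O"]]

print(exact_match_count(predicted, expected))  # 1
print(f"{token_accuracy(predicted, expected):.2f}")  # 0.67
```

Note that labelling everything 'O' still scores fairly well per word, because most words are not entities. Neither number tells the whole story on its own.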


This may be a toy example, but we can still see some places where labelling can be challenging.

## 1. Context required:

There are several Grafana products/services that not everyone might be familiar with.

e.g. SM - Synthetic monitoring, Loki - log storage

### Strategy:

Try to reduce the context needed for the task. If that's not possible, provide context with the labels.
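One way to provide context is to show annotators a short glossary hint next to each title. The sketch below assumes a hypothetical `GLOSSARY` dictionary and helper `annotate_with_context`; only the two entries mentioned above are included, and a real glossary would need to be written by someone who knows the products.

```python
# Hypothetical glossary; the two entries come from the examples above.
GLOSSARY = {
    "SM": "Synthetic monitoring",
    "Loki": "log storage",
}


def annotate_with_context(title, glossary=GLOSSARY):
    """Append glossary hints for any known word in the title."""
    hints = [f"{word} = {glossary[word]}" for word in title.split() if word in glossary]
    if not hints:
        return title
    return f"{title}  [{'; '.join(hints)}]"


print(annotate_with_context("SM Probes Bad Deploy"))
# SM Probes Bad Deploy  [SM = Synthetic monitoring]
```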

## 2. Multiple correct answers:

There are multiple correct ways to label.

For example, labeling only "OnCall" as the service versus labeling the entire "Grafana OnCall" as the service.

This might not be a big problem for us, but the model could get confused if the labeling is inconsistent.

If the model is consistent but the labels are not, we might be penalizing the model for doing the "right" thing.

### Strategy:

Take a look at the data first and do a test labelling run to find the difficult cases. Then write down in the labelling guideline how each of those cases should be handled, so that every annotator resolves them the same way.
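A test run can be scanned for possible inconsistencies automatically. The sketch below flags words that received more than one distinct tag across titles; the function name `find_inconsistent_words` is an assumption for illustration, and the three rows are taken from the titles and labels in this notebook.

```python
from collections import defaultdict

titles = [
    "Grafana Loki improper scaledown",
    "Tempo Grafana 75 query",
    "Grafana OnCall sync_organization_async task is stuck",
]
tags = [
    ["O", "B-SVC", "O", "O"],
    ["B-SVC", "B-SVC", "O", "O"],
    ["O", "B-SVC", "O", "O", "O", "O"],
]


def find_inconsistent_words(titles, tags):
    """Return words (lowercased) that were given more than one distinct tag."""
    seen = defaultdict(set)
    for title, row in zip(titles, tags):
        for word, tag in zip(title.split(), row):
            seen[word.lower()].add(tag)
    return {word: sorted(found) for word, found in seen.items() if len(found) > 1}


print(find_inconsistent_words(titles, tags))
# {'grafana': ['B-SVC', 'O']}
```

A flagged word is not necessarily an error, since the same word can legitimately play different roles in different titles. It is a prompt to review the case and decide whether the guideline needs a rule for it.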


## 3. Scaling challenges:

Imagine labelling hundreds or thousands of examples. Not only does it become time-consuming, but the chance of errors also increases.

### Strategy:

Have multiple annotators annotate every task, then take the majority answer among the annotators.
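A minimal sketch of majority voting, applied per word, is shown below. The function name `majority_vote` and the three annotator sequences are made up for illustration; they are not real annotation results.

```python
from collections import Counter


def majority_vote(annotations):
    """Merge tag sequences for one title (one sequence per annotator) word by word."""
    lengths = {len(sequence) for sequence in annotations}
    if len(lengths) != 1:
        raise ValueError("all annotators must label the same number of words")
    merged = []
    for position_tags in zip(*annotations):
        # most_common(1) returns the tag with the highest count at this position.
        tag, _count = Counter(position_tags).most_common(1)[0]
        merged.append(tag)
    return merged


# Grafana OnCall sync_organization_async task is stuck
annotator_a = ["O", "B-SVC", "O", "O", "O", "O"]
annotator_b = ["B-SVC", "I-SVC", "O", "O", "O", "O"]
annotator_c = ["O", "B-SVC", "O", "O", "O", "O"]

print(majority_vote([annotator_a, annotator_b, annotator_c]))
# ['O', 'B-SVC', 'O', 'O', 'O', 'O']
```

Two caveats apply to this sketch. With an even number of annotators a tie is possible, and here it would be broken by whichever tag was seen first, so an odd number of annotators is safer. Voting per word can also produce a sequence that no annotator wrote, such as an `I-` tag without a preceding `B-`, so the merged result is worth validating.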

In [14]:
list(zip(data, actual))

[('Alertmanager cluster failed to send alerts',
  ['B-SVC', 'O', 'O', 'O', 'O', 'O']),
 ('alerting bypass dspermissions', ['O', 'O', 'O']),
 ('SM Probes Bad Deploy', ['B-SVC', 'O', 'O', 'O']),
 ('Grafana Loki improper scaledown', ['O', 'B-SVC', 'O', 'O']),
 ('Loki Prod Alerting', ['B-SVC', 'B-ENV', 'O']),
 ('Logs Production Authentication Issues', ['B-SVC', 'B-ENV', 'O', 'O']),
 ('Tempo Grafana 75 query', ['B-SVC', 'B-SVC', 'O', 'O']),
 ('Tempo dev-us-central-0 Writes', ['B-SVC', 'B-ENV', 'O']),
 ('OnCall Unable to Create Integrations due to Alertmanager',
  ['B-SVC', 'O', 'O', 'O', 'O', 'O', 'O', 'B-SVC']),
 ('Grafana OnCall sync_organization_async task is stuck',
  ['O', 'B-SVC', 'O', 'O', 'O', 'O'])]

Labeling is often associated with training a model. But the most important reason we label is actually to have data to evaluate the model.

This matters especially with GenAI, where we often don't train the model ourselves: developers frequently skip labeling altogether and have no real idea how their model is actually doing.