# AIG230 NLP (Week 3 Lab) — Notebook 1: Text Representation

This notebook focuses on **turning raw text into numeric features** you can use in real-world ML systems.

You will build:
- a clean **train/test split**
- **Bag-of-Words** (binary and count)
- **Document-Term Matrix** (DTM)
- **TF-IDF** (with n-grams)
- **Hashing trick** (production-friendly)
- basic **retrieval** (cosine similarity) and a **baseline classifier**
- model **persistence** (save/load)

## 0) Setup


In [30]:
%pip install numpy pandas scikit-learn



In [31]:

import re
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, HashingVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, confusion_matrix
import joblib


## 1) A small, realistic dataset (you can replace with your own CSV)


In industry, text often comes with:
- an **ID**
- free-text **description**
- a **label** (category, priority, intent, topic) or a target (churn, fraud, etc.)

Here we create a toy dataset that looks like support tickets / ops incidents.  
Swap this section with a `pd.read_csv(...)` in your own workflows.
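
A minimal sketch of that swap, assuming a CSV with `text` and `label` columns as in Exercise 4. The inline CSV string and the `load_tickets` helper are hypothetical; in practice you would pass a file path to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice pass a file path such as "tickets.csv".
csv_text = """text,label
"VPN keeps disconnecting every 10 minutes",network
"Password reset link is expired",auth
"Printer driver install fails with error code 1603",device
"""

def load_tickets(source) -> pd.DataFrame:
    """Load tickets, validate columns, and add sequential IDs (hypothetical helper)."""
    df = pd.read_csv(source)
    missing = {"text", "label"} - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")
    # Drop rows without text and make sure every text value is a string.
    df = df.dropna(subset=["text"]).copy()
    df["text"] = df["text"].astype(str)
    df["ticket_id"] = [f"T-{i:03d}" for i in range(1, len(df) + 1)]
    return df[["ticket_id", "text", "label"]]

tickets = load_tickets(io.StringIO(csv_text))
print(tickets)
```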


In [32]:

data = [
    ("T-001", "VPN keeps disconnecting every 10 minutes on Windows 11 after latest update", "network"),
    ("T-002", "Password reset link is expired and user cannot login to the portal", "auth"),
    ("T-003", "Email delivery delayed, outbound messages queued for hours", "messaging"),
    ("T-004", "Cannot install printer driver, installer fails with error code 1603", "device"),
    ("T-005", "MFA prompt never arrives on mobile app, user stuck at login", "auth"),
    ("T-006", "WiFi signal drops in meeting rooms, access point reboot helps temporarily", "network"),
    ("T-007", "Outlook search not returning results, index seems corrupted", "messaging"),
    ("T-008", "Laptop battery drains fast after BIOS update, power settings unchanged", "device"),
    ("T-009", "Portal shows 500 error when submitting form, happened after deployment", "app"),
    ("T-010", "API requests timing out, latency spike observed in last hour", "app"),
    ("T-011", "User cannot access shared drive, permission denied though in correct group", "auth"),
    ("T-012", "Teams calls have choppy audio, jitter high on corporate network", "network"),
    ("T-013", "Push notifications not working on Android for the app", "app"),
    ("T-014", "Mailbox is full and cannot receive emails, auto-archive not running", "messaging"),
    ("T-015", "Bluetooth mouse not pairing after restart, device shows as unknown", "device"),
]

df = pd.DataFrame(data, columns=["ticket_id", "text", "label"])
df


|   | ticket_id | text | label |
|---|---|---|---|
| 0 | T-001 | VPN keeps disconnecting every 10 minutes on Wi... | network |
| 1 | T-002 | Password reset link is expired and user cannot... | auth |
| 2 | T-003 | Email delivery delayed, outbound messages queu... | messaging |
| 3 | T-004 | Cannot install printer driver, installer fails... | device |
| 4 | T-005 | MFA prompt never arrives on mobile app, user s... | auth |
| 5 | T-006 | WiFi signal drops in meeting rooms, access poi... | network |
| 6 | T-007 | Outlook search not returning results, index se... | messaging |
| 7 | T-008 | Laptop battery drains fast after BIOS update, ... | device |
| 8 | T-009 | Portal shows 500 error when submitting form, h... | app |
| 9 | T-010 | API requests timing out, latency spike observe... | app |


### Train/test split


In [33]:

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.33, random_state=42, stratify=df["label"]
)

print("Train size:", len(X_train))
print("Test size:", len(X_test))


Train size: 10
Test size: 5


## 2) Tokenization basics and normalization (lightweight, practical)


In production pipelines you typically do **minimal, safe normalization**:
- lowercase
- normalize whitespace
- optionally strip obvious punctuation
- keep numbers when they carry meaning (error codes, versions, dates)

Heavy normalization (stemming, aggressive regexes) can hurt when your text includes:
error codes, product names, IDs, or domain terminology.
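
A sketch of the optional punctuation step that keeps codes and versions intact. The helper name `normalize_keep_codes` and its rule (drop punctuation only at token edges) are assumptions for illustration, not part of the lab:

```python
import re

def normalize_keep_codes(text: str) -> str:
    """Lowercase, drop punctuation at token edges, keep punctuation inside
    tokens (e.g. 'auto-archive', 'v2.3'), then collapse whitespace."""
    text = text.lower()
    # A punctuation run is removed unless it has word characters on BOTH sides,
    # so "1603," becomes "1603" while "auto-archive" and "v2.3" survive.
    text = re.sub(r"(?<!\w)[^\w\s]+|[^\w\s]+(?!\w)", " ", text)
    return re.sub(r"\s+", " ", text).strip()

print(normalize_keep_codes("Error 1603, auto-archive not running (v2.3)!"))
# error 1603 auto-archive not running v2.3
```

Stemming is deliberately absent: "disconnects" and "disconnecting" stay distinct tokens.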


In [34]:

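def simple_normalize(text: str) -> str:
    text = text.lower()                       # case-fold
    text = re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
    return text

# For inspection only: the vectorizers below receive the raw X_train / X_test
# and apply their own lowercasing (lowercase=True is their default).
df["text_norm"] = df["text"].map(simple_normalize)
df[["ticket_id","text_norm","label"]].head()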


|   | ticket_id | text_norm | label |
|---|---|---|---|
| 0 | T-001 | vpn keeps disconnecting every 10 minutes on wi... | network |
| 1 | T-002 | password reset link is expired and user cannot... | auth |
| 2 | T-003 | email delivery delayed, outbound messages queu... | messaging |
| 3 | T-004 | cannot install printer driver, installer fails... | device |
| 4 | T-005 | mfa prompt never arrives on mobile app, user s... | auth |


## 3) Vocabulary + Document-Term Matrix (DTM) with CountVectorizer


**CountVectorizer** builds:
- a vocabulary (token → column index)
- a sparse matrix where rows are documents and columns are tokens

This is the classic **Document-Term Matrix** representation.


In [35]:

count_vec = CountVectorizer(
    lowercase=True,
    token_pattern=r"(?u)\b\w+\b",  # \w+ also keeps 1-character tokens; the default pattern (\w\w+) drops them
    min_df=1
)

X_train_counts = count_vec.fit_transform(X_train)
X_test_counts  = count_vec.transform(X_test)

print("DTM shape (train):", X_train_counts.shape)
print("Vocabulary size:", len(count_vec.vocabulary_))


DTM shape (train): (10, 92)
Vocabulary size: 92


### Inspect the vocabulary and a single row


In [9]:

# Show a small slice of the vocabulary (token -> index)
vocab_items = sorted(count_vec.vocabulary_.items(), key=lambda x: x[1])[:25]
vocab_items


[('10', 0),
 ('11', 1),
 ('1603', 2),
 ('500', 3),
 ('access', 4),
 ('after', 5),
 ('and', 6),
 ('api', 7),
 ('app', 8),
 ('archive', 9),
 ('arrives', 10),
 ('at', 11),
 ('auto', 12),
 ('battery', 13),
 ('bios', 14),
 ('cannot', 15),
 ('code', 16),
 ('correct', 17),
 ('corrupted', 18),
 ('denied', 19),
 ('deployment', 20),
 ('disconnecting', 21),
 ('drains', 22),
 ('drive', 23),
 ('driver', 24)]

In [10]:

# Look at a specific document row: non-zero entries (token counts)
row_id = 0
row = X_train_counts[row_id]
inv_vocab = {idx: tok for tok, idx in count_vec.vocabulary_.items()}

nz_cols = row.nonzero()[1]
tokens_counts = sorted([(inv_vocab[c], int(row[0, c])) for c in nz_cols], key=lambda x: -x[1])
tokens_counts[:20]


[('portal', 1),
 ('shows', 1),
 ('500', 1),
 ('error', 1),
 ('when', 1),
 ('submitting', 1),
 ('form', 1),
 ('happened', 1),
 ('after', 1),
 ('deployment', 1)]

## 4) Binary vs Count-based Bag-of-Words


Binary BoW: token present or not (good for short texts and some classification tasks)  
Count BoW: raw frequency (baseline for many pipelines)

Both discard word order.
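
A small check of both points, using hypothetical ticket fragments: `binary=True` caps counts at 1, and two documents with the same words in a different order get identical rows.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["error error timeout", "user cannot login", "login cannot user"]
pattern = r"(?u)\b\w+\b"

counts = CountVectorizer(token_pattern=pattern).fit(docs)
binary = CountVectorizer(token_pattern=pattern, binary=True).fit(docs)

X_counts = counts.transform(docs).toarray()
X_binary = binary.transform(docs).toarray()
err = counts.vocabulary_["error"]  # both vectorizers share the same vocabulary

print("count of 'error' in doc 0:", X_counts[0, err])   # 2
print("binary 'error' in doc 0:", X_binary[0, err])     # 1
# Same tokens in a different order -> identical bag-of-words rows.
print("order ignored:", (X_counts[1] == X_counts[2]).all())  # True
```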


In [11]:

binary_vec = CountVectorizer(binary=True, token_pattern=r"(?u)\b\w+\b")
X_train_binary = binary_vec.fit_transform(X_train)


In [12]:
X_train_binary.shape


(10, 92)

In [13]:
print(X_train_binary)

<Compressed Sparse Row sparse matrix of dtype 'int64'
	with 104 stored elements and shape (10, 92)>
  Coords	Values
  (0, 61)	1
  (0, 76)	1
  (0, 3)	1
  (0, 27)	1
  (0, 88)	1
  (0, 80)	1
  (0, 31)	1
  (0, 34)	1
  (0, 5)	1
  (0, 20)	1
  (1, 7)	1
  (1, 67)	1
  (1, 83)	1
  (1, 57)	1
  (1, 45)	1
  (1, 78)	1
  (1, 55)	1
  (1, 37)	1
  (1, 44)	1
  (1, 36)	1
  (2, 5)	1
  (2, 87)	1
  (2, 42)	1
  (2, 21)	1
  (2, 28)	1
  :	:
  (7, 49)	1
  (7, 70)	1
  (7, 60)	1
  (7, 65)	1
  (7, 35)	1
  (7, 81)	1
  (8, 54)	1
  (8, 58)	1
  (8, 72)	1
  (8, 69)	1
  (8, 68)	1
  (8, 38)	1
  (8, 73)	1
  (8, 18)	1
  (9, 56)	1
  (9, 86)	1
  (9, 50)	1
  (9, 64)	1
  (9, 53)	1
  (9, 10)	1
  (9, 52)	1
  (9, 8)	1
  (9, 79)	1
  (9, 11)	1
  (9, 47)	1


## 5) TF-IDF (a refinement, not a replacement)


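TF-IDF scales each token's count by its inverse document frequency: tokens that appear in many documents (such as "after" or "not") are downweighted, while tokens concentrated in a few documents are upweighted as more distinctive.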

In industry, TF-IDF with **n-grams** is a strong baseline for:
- ticket routing
- intent detection
- spam detection
- incident clustering


In [14]:

tfidf_vec = TfidfVectorizer(
    ngram_range=(1,2),
    token_pattern=r"(?u)\b\w+\b",
    min_df=1,
    sublinear_tf=True
)
X_train_tfidf = tfidf_vec.fit_transform(X_train)
X_test_tfidf  = tfidf_vec.transform(X_test)

In [15]:
X_train_tfidf.shape


(10, 186)

## 6) Quick retrieval: 'find similar tickets' with cosine similarity


A very common industry use case is **nearest neighbor retrieval** for:
- deduplication
- suggesting knowledge base articles
- finding similar past incidents


In [16]:
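# Fit a separate vectorizer on the full corpus for retrieval; calling
# tfidf_vec.fit_transform here would silently overwrite the train-only fit above.
search_vec = TfidfVectorizer(ngram_range=(1,2), token_pattern=r"(?u)\b\w+\b", sublinear_tf=True)
X_all = search_vec.fit_transform(df["text"])

def search_similar(query: str, X_corpus, top_k: int = 5):
    query_vec = search_vec.transform([query])
    sims = cosine_similarity(query_vec, X_corpus).flatten()  # similarity to every ticket
    top_indices = np.argsort(-sims)[:top_k]                   # highest scores first
    return df.iloc[top_indices][["ticket_id", "text", "label"]].assign(similarity=sims[top_indices])

search_similar("login mfa not working on phone", X_all, top_k=5)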


|    | ticket_id | text | label | similarity |
|---|---|---|---|---|
| 12 | T-013 | Push notifications not working on Android for ... | app | 0.426113 |
| 4 | T-005 | MFA prompt never arrives on mobile app, user s... | auth | 0.21186 |
| 1 | T-002 | Password reset link is expired and user cannot... | auth | 0.069304 |
| 6 | T-007 | Outlook search not returning results, index se... | messaging | 0.054095 |
| 14 | T-015 | Bluetooth mouse not pairing after restart, dev... | device | 0.048894 |


## 7) Classification baseline (Logistic Regression)


For text classification, a strong baseline is:

**TF-IDF → Linear model (LogReg / Linear SVM)**

This is fast, reliable, easy to explain, and often hard to beat without deep learning.
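
The same pipeline shape works with the linear SVM mentioned above. A minimal sketch that swaps in scikit-learn's `LinearSVC`; the six training texts are hypothetical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Tiny illustrative training set (hypothetical, not the notebook's tickets).
texts = [
    "vpn keeps disconnecting", "wifi signal drops",
    "password reset expired", "mfa prompt missing",
    "printer driver fails", "mouse not pairing",
]
labels = ["network", "network", "auth", "auth", "device", "device"]

svm_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"(?u)\b\w+\b", sublinear_tf=True)),
    ("model", LinearSVC()),  # linear SVM in place of LogisticRegression
])
svm_pipeline.fit(texts, labels)
print(svm_pipeline.predict(["vpn disconnecting again", "printer fails"]))
```

Unlike `LogisticRegression`, `LinearSVC` has no `predict_proba`; use `decision_function` if you need per-class scores.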


In [17]:

clf = LogisticRegression(max_iter=2000)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1,2),
        token_pattern=r"(?u)\b\w+\b",
        sublinear_tf=True
    )),
    ("model", clf)
])

pipeline.fit(X_train, y_train)
pred = pipeline.predict(X_test)

print(classification_report(y_test, pred))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))


              precision    recall  f1-score   support

         app       0.00      0.00      0.00         1
        auth       0.50      1.00      0.67         1
      device       0.00      0.00      0.00         1
   messaging       0.00      0.00      0.00         1
     network       1.00      1.00      1.00         1

    accuracy                           0.40         5
   macro avg       0.30      0.40      0.33         5
weighted avg       0.30      0.40      0.33         5

Confusion matrix:
 [[0 1 0 0 0]
 [0 1 0 0 0]
 [1 0 0 0 0]
 [1 0 0 0 0]
 [0 0 0 0 1]]




## 8) Production pattern: HashingVectorizer (no stored vocab)


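In production, you may need:
- constant memory usage
- no stored vocabulary of raw tokens in the saved model (a partial privacy benefit)
- streaming support
- easier deployment across services

**HashingVectorizer** avoids building a vocabulary: each token is hashed straight to a column index. Tradeoffs: distinct tokens can collide in the same column, and you cannot map a column back to its token when inspecting the model.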
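
Two properties are easy to check directly. The sketch below (hypothetical texts) shows that two independently created `HashingVectorizer`s produce identical features with no fit step, and that squeezing 10 distinct tokens into `n_features=4` columns must cause collisions:

```python
from sklearn.feature_extraction.text import HashingVectorizer

docs = ["vpn keeps disconnecting after update", "password reset link expired"]

# Stateless: two separately created vectorizers map text to the same columns
# without any fit step or stored vocabulary.
h1 = HashingVectorizer(n_features=2**18, alternate_sign=False, token_pattern=r"(?u)\b\w+\b")
h2 = HashingVectorizer(n_features=2**18, alternate_sign=False, token_pattern=r"(?u)\b\w+\b")
same = (h1.transform(docs) != h2.transform(docs)).nnz == 0
print("identical without fitting:", same)  # True

# Collisions: 10 distinct tokens into 4 columns (pigeonhole principle).
tokens = "alpha bravo charlie delta echo foxtrot golf hotel india juliett"
tiny = HashingVectorizer(n_features=4, alternate_sign=False, norm=None)
row = tiny.transform([tokens])
print("distinct tokens:", len(tokens.split()), "non-zero columns:", row.nnz)
print("total count preserved:", row.sum())  # 10.0
```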


In [18]:


hash_pipeline = Pipeline([
    ("hash", HashingVectorizer(
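        n_features=2**18,          # 262,144 columns: far more than this corpus's vocabulary, so collisions are rare (default is 2**20)
        alternate_sign=False,      # keep all values non-negative like TF counts (default True randomizes signs so collision effects roughly cancel)
        norm="l2",                 # scale each document vector to unit length
        token_pattern=r"(?u)\b\w+\b",
        ngram_range=(1,2)          # unigrams + bigrams; no IDF weighting here, unlike the TF-IDF pipeline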
    )),
    ("model", LogisticRegression(max_iter=2000))
])

hash_pipeline.fit(X_train, y_train)
pred_hash = hash_pipeline.predict(X_test)

print(classification_report(y_test, pred_hash))
print("Confusion matrix:\n", confusion_matrix(y_test, pred_hash))



              precision    recall  f1-score   support

         app       0.00      0.00      0.00         1
        auth       1.00      1.00      1.00         1
      device       0.00      0.00      0.00         1
   messaging       0.00      0.00      0.00         1
     network       1.00      1.00      1.00         1

    accuracy                           0.40         5
   macro avg       0.40      0.40      0.40         5
weighted avg       0.40      0.40      0.40         5

Confusion matrix:
 [[0 0 0 1 0]
 [0 1 0 0 0]
 [1 0 0 0 0]
 [1 0 0 0 0]
 [0 0 0 0 1]]




## 9) Save and load the model (typical deployment step)


In [19]:
# Save the trained pipeline to a file

joblib.dump(pipeline, "ticket_classifier.pkl")


['ticket_classifier.pkl']

In [20]:
# Load the pipeline from the file
loaded_pipeline = joblib.load("ticket_classifier.pkl")


In [21]:
# Test new text with the loaded pipeline
sample_text = [
    "VPN disconnects after update and network drops frequently"
]

loaded_pipeline.predict(sample_text)


array(['network'], dtype=object)
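
Before deploying, it is worth checking that a saved pipeline reproduces the original predictions. A self-contained sketch with a hypothetical four-ticket model, saved to a temporary directory (the file name is also hypothetical):

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = ["vpn keeps disconnecting", "wifi signal drops", "password reset expired", "mfa prompt missing"]
labels = ["network", "network", "auth", "auth"]
pipe = Pipeline([("tfidf", TfidfVectorizer()), ("model", LogisticRegression(max_iter=2000))])
pipe.fit(texts, labels)

new_texts = ["vpn drops again", "reset mfa prompt"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ticket_classifier.joblib")
    joblib.dump(pipe, path)          # serialize the whole pipeline (vectorizer + model)
    restored = joblib.load(path)
    same = (restored.predict(new_texts) == pipe.predict(new_texts)).all()

print("round-trip predictions identical:", same)  # True
```

joblib files are pickles, so only load them from trusted sources, and load them with the same scikit-learn version used for training.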

## Exercises (do these during lab)
1) Add 10 more tickets to `data` with realistic wording and labels. Re-train and compare results.  
2) Try `ngram_range=(1,3)` and observe what changes.  
3) For retrieval, test at least 3 queries and explain why the top result makes sense.  
4) Replace the dataset with a CSV you create (columns: `text`, `label`) and rerun the notebook.


In [22]:
# 1) Add 10 more tickets to `data` with realistic wording and labels. Re-train and compare results. 

data = [
    ("T-001", "VPN keeps disconnecting every 10 minutes on Windows 11 after latest update", "network"),
    ("T-002", "Password reset link is expired and user cannot login to the portal", "auth"),
    ("T-003", "Email delivery delayed, outbound messages queued for hours", "messaging"),
    ("T-004", "Cannot install printer driver, installer fails with error code 1603", "device"),
    ("T-005", "MFA prompt never arrives on mobile app, user stuck at login", "auth"),
    ("T-006", "WiFi signal drops in meeting rooms, access point reboot helps temporarily", "network"),
    ("T-007", "Outlook search not returning results, index seems corrupted", "messaging"),
    ("T-008", "Laptop battery drains fast after BIOS update, power settings unchanged", "device"),
    ("T-009", "Portal shows 500 error when submitting form, happened after deployment", "app"),
    ("T-010", "API requests timing out, latency spike observed in last hour", "app"),
    ("T-011", "User cannot access shared drive, permission denied though in correct group", "auth"),
    ("T-012", "Teams calls have choppy audio, jitter high on corporate network", "network"),
    ("T-013", "Push notifications not working on Android for the app", "app"),
    ("T-014", "Mailbox is full and cannot receive emails, auto-archive not running", "messaging"),
    ("T-015", "Bluetooth mouse not pairing after restart, device shows as unknown", "device"),
    # New tickets added below    
    ("T-016", "User account locked after too many failed login attempts, cannot unlock via self-service", "auth"),
    ("T-017", "DNS resolution failing intermittently, websites not loading for multiple users", "network"),
    ("T-018", "Mobile app crashes on launch after latest update, shows blank screen then closes", "app"),
    ("T-019", "Cannot connect to VPN, authentication fails with 'invalid credentials' though password works elsewhere", "auth"),
    ("T-020", "Email attachments not downloading in Outlook, stuck on 'Downloading' for large files", "messaging"),
    ("T-021", "WiFi connected but no internet access, renew IP fixes temporarily", "network"),
    ("T-022", "Printer prints gibberish characters, correct driver installed but issue persists", "device"),
    ("T-023", "API returns 502 Bad Gateway under load, started after recent config change", "app"),
    ("T-024", "Teams desktop app not opening, hangs on loading screen after Windows update", "app"),
    ("T-025", "Mailbox search very slow, indexing status shows incomplete for several hours", "messaging"),


]

newdf = pd.DataFrame(data, columns=["ticket_id", "text", "label"])


X_train, X_test, y_train, y_test = train_test_split(
    newdf["text"], newdf["label"], test_size=0.33, random_state=42, stratify=newdf["label"]
)




clf = LogisticRegression(max_iter=2000)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1,2),
        token_pattern=r"(?u)\b\w+\b",
        sublinear_tf=True
    )),
    ("model", clf)
])

pipeline.fit(X_train, y_train)
pred = pipeline.predict(X_test)

print(classification_report(y_test, pred))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))



              precision    recall  f1-score   support

         app       0.29      1.00      0.44         2
        auth       1.00      0.50      0.67         2
      device       0.00      0.00      0.00         1
   messaging       0.00      0.00      0.00         2
     network       0.00      0.00      0.00         2

    accuracy                           0.33         9
   macro avg       0.26      0.30      0.22         9
weighted avg       0.29      0.33      0.25         9

Confusion matrix:
 [[2 0 0 0 0]
 [0 1 1 0 0]
 [1 0 0 0 0]
 [2 0 0 0 0]
 [2 0 0 0 0]]




After adding 10 more tickets, accuracy went from 0.40 (2 of 5 test tickets correct) to 0.33 (3 of 9). The two numbers are not directly comparable: the test set changed, and with only 5 or 9 test tickets a single prediction moves accuracy by 20 or 11 percentage points.

The new tickets also add more varied wording, but each label still has only 4 (device) to 6 (app) tickets in total, and just 3 or 4 per label in the training split. With five categories and so little data, the model cannot learn stable patterns yet; the confusion matrix shows 7 of the 9 test tickets predicted as `app`, the most frequent training label.

Overall, adding a small amount of data does not always improve measured performance immediately. With a larger and more balanced dataset, we would expect the results to become more stable and improve.
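
On a 9-ticket test set each ticket is worth 1/9 of the accuracy, about 11 percentage points. One way to get a steadier estimate is k-fold cross-validation, which this notebook does not otherwise use. The sketch below runs stratified 3-fold cross-validation on the 15 original tickets (3 folds is the maximum here, since each label has exactly 3 tickets) and reports the spread across folds:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

# The 15 original tickets from Section 1: (text, label).
tickets = [
    ("VPN keeps disconnecting every 10 minutes on Windows 11 after latest update", "network"),
    ("Password reset link is expired and user cannot login to the portal", "auth"),
    ("Email delivery delayed, outbound messages queued for hours", "messaging"),
    ("Cannot install printer driver, installer fails with error code 1603", "device"),
    ("MFA prompt never arrives on mobile app, user stuck at login", "auth"),
    ("WiFi signal drops in meeting rooms, access point reboot helps temporarily", "network"),
    ("Outlook search not returning results, index seems corrupted", "messaging"),
    ("Laptop battery drains fast after BIOS update, power settings unchanged", "device"),
    ("Portal shows 500 error when submitting form, happened after deployment", "app"),
    ("API requests timing out, latency spike observed in last hour", "app"),
    ("User cannot access shared drive, permission denied though in correct group", "auth"),
    ("Teams calls have choppy audio, jitter high on corporate network", "network"),
    ("Push notifications not working on Android for the app", "app"),
    ("Mailbox is full and cannot receive emails, auto-archive not running", "messaging"),
    ("Bluetooth mouse not pairing after restart, device shows as unknown", "device"),
]
texts = [text for text, _ in tickets]
labels = [label for _, label in tickets]

pipe = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"(?u)\b\w+\b", sublinear_tf=True)),
    ("model", LogisticRegression(max_iter=2000)),
])

# Every ticket is tested exactly once; stratification keeps one ticket per label in each fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")

print("per-fold accuracy:", np.round(scores, 2))
print(f"mean {scores.mean():.2f} +/- {scores.std():.2f}")
```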

In [25]:
# 2) Try `ngram_range=(1,3)` and observe what changes.  

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1,3),
        token_pattern=r"(?u)\b\w+\b",
        sublinear_tf=True
    )),
    ("model", LogisticRegression(max_iter=2000))  # fresh estimator; Pipeline refits in place and does not copy clf
])

pipeline.fit(X_train, y_train)
pred = pipeline.predict(X_test)

print(classification_report(y_test, pred))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))


              precision    recall  f1-score   support

         app       0.22      1.00      0.36         2
        auth       0.00      0.00      0.00         2
      device       0.00      0.00      0.00         1
   messaging       0.00      0.00      0.00         2
     network       0.00      0.00      0.00         2

    accuracy                           0.22         9
   macro avg       0.04      0.20      0.07         9
weighted avg       0.05      0.22      0.08         9

Confusion matrix:
 [[2 0 0 0 0]
 [2 0 0 0 0]
 [1 0 0 0 0]
 [2 0 0 0 0]
 [2 0 0 0 0]]




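Using `ngram_range=(1,3)` lowered accuracy from 0.33 to 0.22 (3 to 2 correct out of 9), and the confusion matrix shows every test ticket predicted as `app`, the most frequent training label (4 of 16). Adding trigrams makes the feature space much larger while the dataset stays small. One plausible explanation is that each L2-normalized TF-IDF vector now spreads its weight over many more columns, most of which never reappear in the test tickets, so the shared evidence per ticket shrinks and the model falls back on the class prior rather than overfitting. With a test set this small, the change amounts to a single prediction and should not be over-interpreted.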
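
The growth in feature count is easy to measure. A sketch that fits `TfidfVectorizer` with increasing `ngram_range` on four of the original tickets:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "VPN keeps disconnecting every 10 minutes on Windows 11 after latest update",
    "Password reset link is expired and user cannot login to the portal",
    "Email delivery delayed, outbound messages queued for hours",
    "Cannot install printer driver, installer fails with error code 1603",
]

sizes = {}
for ngram_range in [(1, 1), (1, 2), (1, 3)]:
    vec = TfidfVectorizer(ngram_range=ngram_range, token_pattern=r"(?u)\b\w+\b")
    vec.fit(docs)
    sizes[ngram_range] = len(vec.vocabulary_)  # number of feature columns
    print(ngram_range, "->", sizes[ngram_range], "features")
```

The number of training tickets stays the same while the number of columns the model must weight keeps growing.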

In [26]:
# 3) For retrieval, test at least 3 queries and explain why the top result makes sense.  

# --- Build retrieval index (use the full dataset df) ---
retrieval_vec = TfidfVectorizer(
    ngram_range=(1,2),
    token_pattern=r"(?u)\b\w+\b",
    sublinear_tf=True
)

doc_texts = df["text"].tolist()
doc_matrix = retrieval_vec.fit_transform(doc_texts)  # TF-IDF vectors for all tickets

def retrieve_top_k(query, k=3):
    q_vec = retrieval_vec.transform([query])
    sims = cosine_similarity(q_vec, doc_matrix).ravel()  # similarity vs all tickets
    top_idx = np.argsort(sims)[::-1][:k]
    results = []
    for i in top_idx:
        results.append({
            "ticket_id": df.iloc[i]["ticket_id"],
            "label": df.iloc[i]["label"],
            "text": df.iloc[i]["text"],
            "score": float(sims[i])
        })
    return results

# --- Test at least 3 queries ---
queries = [
    "VPN disconnects after update every few minutes",
    "Outlook search not working and emails are not showing in results",
    "500 error when submitting the portal form after deployment"
]

for q in queries:
    print("\nQUERY:", q)
    top = retrieve_top_k(q, k=3)
    print("Top result:")
    print(" ", top[0]["ticket_id"], "|", top[0]["label"], "| score:", round(top[0]["score"], 3))
    print(" ", top[0]["text"])
    print("\nNext results:")
    for r in top[1:]:
        print(" ", r["ticket_id"], "|", r["label"], "| score:", round(r["score"], 3))
        print("  ", r["text"])



QUERY: VPN disconnects after update every few minutes
Top result:
  T-001 | network | score: 0.442
  VPN keeps disconnecting every 10 minutes on Windows 11 after latest update

Next results:
  T-008 | device | score: 0.142
   Laptop battery drains fast after BIOS update, power settings unchanged
  T-015 | device | score: 0.057
   Bluetooth mouse not pairing after restart, device shows as unknown

QUERY: Outlook search not working and emails are not showing in results
Top result:
  T-007 | messaging | score: 0.467
  Outlook search not returning results, index seems corrupted

Next results:
  T-013 | app | score: 0.221
   Push notifications not working on Android for the app
  T-014 | messaging | score: 0.179
   Mailbox is full and cannot receive emails, auto-archive not running

QUERY: 500 error when submitting the portal form after deployment
Top result:
  T-009 | app | score: 0.731
  Portal shows 500 error when submitting form, happened after deployment

Next results:
  T-002 | auth 

Query 1: VPN disconnects after update every few minutes
The top result was ticket T-001 (network, score 0.442). Both texts describe the same VPN connectivity problem and share the tokens "vpn", "after", "update", "every", and "minutes". Note that "disconnects" does not match "disconnecting": TF-IDF compares exact tokens, and this pipeline does no stemming. The runner-up, T-008 (device), shares only "after" and "update", hence its much lower score.

Query 2: Outlook search not working and emails are not showing in results
The top result was ticket T-007 (messaging, score 0.467). Both texts are about Outlook search failing to return results, and they share "outlook", "search", "not", and "results", plus the bigrams "outlook search" and "search not". The runner-up, T-013 (app), matches mainly on the generic phrase "not working".

Query 3: 500 error when submitting the portal form after deployment
The top result was ticket T-009 (app) with a score of 0.731, the strongest match of the three queries. The query repeats almost all of the ticket's distinctive terms ("portal", "500", "error", "submitting", "form", "deployment") and several of its bigrams, such as "500 error", "when submitting", and "after deployment". The next result, T-002 (auth), most likely ranks second because it shares "portal" (and the bigram "the portal"), a much weaker overlap.
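
TF-IDF matches exact tokens only, so inflected forms such as "disconnects" and "disconnecting" never overlap unless you add stemming or lemmatization. A quick check on a one-ticket corpus (T-001's text):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["VPN keeps disconnecting every 10 minutes on Windows 11 after latest update"]
vec = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b").fit(corpus)

print("disconnecting" in vec.vocabulary_)  # True
print("disconnects" in vec.vocabulary_)    # False: unseen tokens are dropped at transform time

doc = vec.transform(corpus)
q_exact = vec.transform(["vpn disconnecting"])
q_variant = vec.transform(["vpn disconnects"])  # only "vpn" survives
print(cosine_similarity(q_exact, doc)[0, 0] > cosine_similarity(q_variant, doc)[0, 0])  # True
```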

In [29]:
# 4) Replace the dataset with a CSV you create (columns: `text`, `label`) and rerun the notebook.
df = pd.read_csv("tickets2.csv")
df["ticket_id"] = [f"T-{i:03d}" for i in range(1, len(df) + 1)]

X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.33, random_state=42, stratify=df["label"]
)

print("Train size:", len(X_train), "Test size:", len(X_test))
print(pd.Series(y_train).value_counts())
print(pd.Series(y_test).value_counts())



clf = LogisticRegression(max_iter=2000)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1,2),
        token_pattern=r"(?u)\b\w+\b",
        sublinear_tf=True
    )),
    ("model", clf)
])

pipeline.fit(X_train, y_train)
pred = pipeline.predict(X_test)

print(classification_report(y_test, pred))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))


Train size: 16 Test size: 9
label
app          4
network      3
auth         3
device       3
messaging    3
Name: count, dtype: int64
label
app          2
messaging    2
network      2
auth         2
device       1
Name: count, dtype: int64
              precision    recall  f1-score   support

         app       0.29      1.00      0.44         2
        auth       1.00      0.50      0.67         2
      device       0.00      0.00      0.00         1
   messaging       0.00      0.00      0.00         2
     network       0.00      0.00      0.00         2

    accuracy                           0.33         9
   macro avg       0.26      0.30      0.22         9
weighted avg       0.29      0.33      0.25         9

Confusion matrix:
 [[2 0 0 0 0]
 [0 1 1 0 0]
 [1 0 0 0 0]
 [2 0 0 0 0]
 [2 0 0 0 0]]


