<a href="https://colab.research.google.com/github/octavioeac/LLM-AZ/blob/main/prototype_llm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SparseHeadClassifier: Multi-Sparse Attention Transformer for Text Quality Classification in Technical QA Descriptions

## Abstract
In this work, we introduce SparseHeadClassifier, a Transformer-based architecture for evaluating the quality of textual descriptions in technical QA environments such as JIRA. The model uses a novel multi-sparse attention mechanism in which each attention head applies a distinct sparsity strategy (top-k, local, or full attention), allowing the model to capture diverse contextual patterns efficiently. Each head's output passes through a dimensionality-reducing compression step before concatenation, which improves computational efficiency while preserving salient features. The final classification layer assigns one of three quality labels (Good, Regular, or Poor), enabling automated evaluation of QA documentation. Experiments show that SparseHeadClassifier balances expressiveness and efficiency and can serve as a robust tool for assessing the consistency and clarity of issue-tracking text in real-world software projects.
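The three sparsity strategies named in the abstract can be illustrated without any framework. The sketch below is only an assumption about how top-k, local, and full attention might be expressed as masks over one matrix of attention scores; the function names, the window size, the value of `k`, and the toy scores are all hypothetical and do not come from the notebook.

```python
import math

def full_mask(scores):
    """Full attention: every query may attend to every key."""
    n = len(scores)
    return [[True] * n for _ in range(n)]

def local_mask(scores, window=1):
    """Local attention: query i sees only keys within `window` positions of i."""
    n = len(scores)
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]

def topk_mask(scores, k=2):
    """Top-k attention: each query keeps only its k highest-scoring keys."""
    mask = []
    for row in scores:
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(ranked[:k])
        mask.append([j in keep for j in range(len(row))])
    return mask

def masked_softmax(scores, mask):
    """Row-wise softmax in which masked-out positions receive zero weight."""
    weights = []
    for row, allowed in zip(scores, mask):
        top = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy 4x4 score matrix (query rows, key columns); the values are made up.
scores = [
    [2.0, 0.5, 0.1, 1.5],
    [0.3, 1.8, 0.9, 0.2],
    [0.1, 0.7, 2.2, 1.1],
    [1.0, 0.2, 0.4, 2.5],
]

# One mask per hypothetical head, each using a different strategy.
strategies = {
    "full": full_mask(scores),
    "local": local_mask(scores, window=1),
    "top-k": topk_mask(scores, k=2),
}
for name, mask in strategies.items():
    weights = masked_softmax(scores, mask)
    print(name, [round(w, 2) for w in weights[0]])
```

Each head would then use its own weights to mix the value vectors, so the heads differ only in which positions they are allowed to see.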

SparseHeadClassifier is a Transformer model that analyzes text descriptions from platforms like JIRA and classifies each one as Good, Regular, or Poor. Its distinguishing feature is that each attention head uses a different sparsity strategy: some heads attend only to the top-k highest-scoring tokens, others attend to nearby tokens, and others attend to the full sequence. This gives the model a richer view of the text while being more efficient. Each head compresses its output, and the concatenated result is passed to a neural network that makes the final quality prediction. The model is intended for identifying low-quality, inconsistent, or vague bug and test descriptions in software teams.
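The compress, concatenate, and classify steps described above can be sketched in plain Python to make the shapes concrete. Everything here is an assumption for illustration: the sizes (`num_heads`, `head_dim`, `compressed_dim`), the helper names, and the random untrained weights are hypothetical, so the predicted label carries no meaning.

```python
import random

LABELS = ["Good", "Regular", "Poor"]  # class names taken from the text

def linear(vec, weight):
    """Multiply a vector by a weight matrix given as a list of output rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]

def random_matrix(rng, rows, cols):
    """Hypothetical untrained weights, used only to illustrate shapes."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
num_heads, head_dim, compressed_dim = 3, 8, 2  # assumed sizes

# One pooled output vector per head (stand-ins for real attention outputs).
head_outputs = [
    [rng.uniform(-1.0, 1.0) for _ in range(head_dim)] for _ in range(num_heads)
]

# Each head has its own compression matrix: head_dim -> compressed_dim.
compressors = [random_matrix(rng, compressed_dim, head_dim) for _ in range(num_heads)]
compressed = [linear(out, w) for out, w in zip(head_outputs, compressors)]

# Concatenate the compressed head outputs into a single feature vector.
features = [value for vec in compressed for value in vec]

# Final classification layer: one logit per quality label.
classifier = random_matrix(rng, len(LABELS), len(features))
logits = linear(features, classifier)
prediction = LABELS[max(range(len(LABELS)), key=lambda i: logits[i])]

print(len(features), prediction)
```

With these assumed sizes, compressing before concatenation shrinks the classifier input from 3 x 8 = 24 values to 3 x 2 = 6.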

In [2]:
import pandas as pd

In [12]:
import pandas as pd

# Path to the CSV export on the mounted Google Drive
url = "/content/drive/MyDrive/autozone/AutoZone JIRA 2025-07-01T21_24_24-0500(in).csv"

# Read the CSV file from that path
df = pd.read_csv(url)

# Show the first rows as a sanity check
print(df.head())

                                             Summary Issue key  Issue id  \
0  Verify Autozoner SelfCheckout Actions Courtesy...  QR-26591    746670   
1            Verify GapToEarn Qualifies No AzRewards  QR-26590    746697   
2  Verify Login to Autozone Rewards account with ...  QR-26589    746647   
3  Verify Age Restricted Modal not shown by addin...  QR-26588    746612   
4  Verify SaveCart button is enabled by opening A...  QR-26587    746719   

  Issue Type                      Labels                    Labels.1  \
0       Test  POS_Regression_Handover_II                    Reviewed   
1       Test  POS_Regression_Handover_II                    Reviewed   
2       Test  POS_Regression_Handover_II                    Reviewed   
3       Test        AgilityJIRASyncissue  POS_Regression_Handover_II   
4       Test  POS_Regression_Handover_II                    Reviewed   

                         Labels.2                        Labels.3 Labels.4  \
0  Standard_Regression_Automatio

In [13]:
headers = list(df.columns)
print(headers)

['Summary', 'Issue key', 'Issue id', 'Issue Type', 'Labels', 'Labels.1', 'Labels.2', 'Labels.3', 'Labels.4', 'Labels.5', 'Description', 'Custom field (API Environment)', 'Custom field (ARS #)', 'Custom field (ARS Title)', 'Custom field (Agility URL)', 'Custom field (Application Name)', 'Custom field (Business Process)', 'Custom field (Classifications)', 'Custom field (Connection String)', 'Custom field (Country)', 'Custom field (Database Name)', 'Custom field (Difference between versions)', 'Custom field (Environment)', 'Custom field (Epic Link)', 'Custom field (Expected Outcome)', 'Custom field (Expected Output)', 'Custom field (Experience Type)', 'Custom field (Feature/Menu)', 'Custom field (Feature/Menu).1', 'Custom field (Flagged)', 'Custom field (Hardware)', 'Custom field (Issue Owner)', 'Custom field (Manual Test Case)', 'Custom field (Original story points)', 'Custom field (PO Review)', 'Custom field (Parent Link)', 'Custom field (Platform/Framework)', 'Custom field (Pre-Deploym

In [14]:
columns_to_keep = [
    'Issue key',
    'Summary',
    'Issue Type',
    'Labels',
    'Labels.1', 'Labels.2', 'Labels.3', 'Labels.4', 'Labels.5',
    'Description',
    'Custom field (Expected Outcome)',
    'Custom field (Environment)',
    'Custom field (Severity)',
    'Custom field (Test Identifier)',
    'Custom field (Zephyr Teststep)',
    'Sprint',
    'Custom field (Team)'
]

# Keep only the columns that exist in the file (avoids a KeyError if some are missing)
columns_present = [col for col in columns_to_keep if col in df.columns]

# Create the reduced DataFrame
df_reduced = df[columns_present]

# Check the result
print("\nColumns kept:")
print(df_reduced.columns)

print("\nFirst rows of the cleaned DataFrame:")
print(df_reduced.head())


Columns kept:
Index(['Issue key', 'Summary', 'Issue Type', 'Labels', 'Labels.1', 'Labels.2',
       'Labels.3', 'Labels.4', 'Labels.5', 'Description',
       'Custom field (Expected Outcome)', 'Custom field (Environment)',
       'Custom field (Severity)', 'Custom field (Test Identifier)',
       'Custom field (Zephyr Teststep)', 'Sprint', 'Custom field (Team)'],
      dtype='object')

First rows of the cleaned DataFrame:
  Issue key                                            Summary Issue Type  \
0  QR-26591  Verify Autozoner SelfCheckout Actions Courtesy...       Test   
1  QR-26590            Verify GapToEarn Qualifies No AzRewards       Test   
2  QR-26589  Verify Login to Autozone Rewards account with ...       Test   
3  QR-26588  Verify Age Restricted Modal not shown by addin...       Test   
4  QR-26587  Verify SaveCart button is enabled by opening A...       Test   

                       Labels                    Labels.1  \
0  POS_Regression_Handover_II             
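The column filter in the cell above drops any requested column that the export lacks without reporting it. A minimal sketch of one way to surface those names is shown here; `split_columns` is a hypothetical helper, and the two short lists are stand-ins for `columns_to_keep` and `list(df.columns)`.

```python
def split_columns(requested, available):
    """Return (present, missing), preserving the order of `requested`."""
    available = set(available)
    present = [col for col in requested if col in available]
    missing = [col for col in requested if col not in available]
    return present, missing

# Stand-in for list(df.columns); the real export has many more fields.
available_columns = ["Summary", "Issue key", "Issue id", "Issue Type", "Labels", "Description"]
# Stand-in for columns_to_keep.
requested_columns = ["Issue key", "Summary", "Description", "Sprint"]

present, missing = split_columns(requested_columns, available_columns)
print("present:", present)
print("missing:", missing)
```

In the notebook this could be called as `split_columns(columns_to_keep, df.columns)`, with `present` used to index `df`.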

In [16]:
# Clean the data so that prompts are built only from rows with usable content
def has_meaningful_data(row):
    # A field counts as filled when it is not NaN and not blank after stripping
    def filled(field):
        return pd.notna(row.get(field)) and str(row[field]).strip() != ""

    has_labels = any(filled(label) for label in ['Labels', 'Labels.1', 'Labels.2', 'Labels.3', 'Labels.4', 'Labels.5'])
    # Require a summary plus at least one of: description, expected outcome, or a label
    return filled('Summary') and (filled('Description') or filled('Custom field (Expected Outcome)') or has_labels)

# Apply the filter
df_filtered = df_reduced[df_reduced.apply(has_meaningful_data, axis=1)]

# Check how many rows remain
print(f"Total cases before cleaning: {len(df_reduced)}")
print(f"Total cases with meaningful information: {len(df_filtered)}")

Total cases before cleaning: 120
Total cases with meaningful information: 120
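Since all 120 rows pass the filter, the output above does not show which rows it would reject. The sketch below applies the same rule to a few plain dictionaries; the sample rows are invented for illustration, and `has_text` and `is_meaningful` are hypothetical pandas-free stand-ins for the notebook's `has_meaningful_data`.

```python
import math

LABEL_COLUMNS = ["Labels", "Labels.1", "Labels.2", "Labels.3", "Labels.4", "Labels.5"]

def has_text(value):
    """True when value is neither missing (None/NaN) nor blank after stripping."""
    if value is None:
        return False
    if isinstance(value, float) and math.isnan(value):
        return False
    return str(value).strip() != ""

def is_meaningful(row):
    """Summary is required, plus a description, an expected outcome, or a label."""
    has_supporting = (
        has_text(row.get("Description"))
        or has_text(row.get("Custom field (Expected Outcome)"))
        or any(has_text(row.get(col)) for col in LABEL_COLUMNS)
    )
    return has_text(row.get("Summary")) and has_supporting

# Invented rows: a label-only case, a blank description, and an empty summary.
rows = [
    {"Summary": "Verify login", "Description": float("nan"), "Labels": "Reviewed"},
    {"Summary": "Verify login", "Description": "   "},
    {"Summary": "", "Description": "Steps to reproduce"},
]
print([is_meaningful(r) for r in rows])  # [True, False, False]
```

Only the first row is kept: a single label is enough to back up a summary, whereas whitespace-only fields and a missing summary both cause rejection.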


In [17]:
# Build the prompts for the first 10 cases
label_columns = ['Labels', 'Labels.1', 'Labels.2', 'Labels.3', 'Labels.4', 'Labels.5']
context_fields = ['Custom field (Expected Outcome)', 'Custom field (Environment)', 'Custom field (Severity)', 'Custom field (Test Identifier)', 'Custom field (Zephyr Teststep)', 'Sprint', 'Custom field (Team)']

for number, (_, row) in enumerate(df_filtered.head(10).iterrows(), start=1):
    prompt_parts = []

    # Summary and Description first
    if pd.notna(row['Summary']) and str(row['Summary']).strip():
        prompt_parts.append(f"Summary: {row['Summary']}")
    if pd.notna(row.get('Description')) and str(row['Description']).strip():
        prompt_parts.append(f"Description: {row['Description']}")

    # Labels
    labels = [str(row[label]) for label in label_columns if pd.notna(row.get(label)) and str(row[label]).strip()]
    if labels:
        prompt_parts.append(f"Labels: {', '.join(labels)}")

    # Other key context fields
    for field in context_fields:
        if pd.notna(row.get(field)) and str(row[field]).strip():
            # For "Custom field (...)" columns, show only the name inside the parentheses
            pretty_name = field.split('(')[-1].strip(')') if '(' in field else field
            prompt_parts.append(f"{pretty_name}: {row[field]}")

    # The instruction line stays in Spanish: "Generate a test case with the following data:"
    prompt = "Genera un caso de prueba con los siguientes datos:\n" + "\n".join(prompt_parts)

    # Number prompts by position, since the DataFrame index can have gaps after filtering
    print(f"\n--- Prompt #{number} ---")
    print(prompt)



--- Prompt #1 ---
Genera un caso de prueba con los siguientes datos:
Summary: Verify Autozoner SelfCheckout Actions Courtesy Discount with GreyShirt password
Labels: POS_Regression_Handover_II, Reviewed, Standard_Regression_Automation
Test Identifier: Automated_Selenium

--- Prompt #2 ---
Genera un caso de prueba con los siguientes datos:
Summary: Verify GapToEarn Qualifies No AzRewards
Description: Verify Gap To Earn section is displayed with the SCO changes and Sign-In button is working as expected
Labels: POS_Regression_Handover_II, Reviewed, Standard_Regression_Automation
Test Identifier: Automated_Selenium

--- Prompt #3 ---
Genera un caso de prueba con los siguientes datos:
Summary: Verify Login to Autozone Rewards account with invalid Rewards Id for Espanol
Description: Verify Login to Autozone Rewards account with invalid Rewards Id for Espanol
Labels: POS_Regression_Handover_II, Reviewed, Standard_Regression_Automation
Test Identifier: Automated_Selenium

--- Prompt #4 ---
Ge
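The prompt assembly in the loop above could also be packaged as a function that returns the string instead of printing it, which makes it easier to reuse and to check. This is a minimal sketch working on a plain dictionary; `build_prompt`, `pretty_name`, and `has_text` are hypothetical names, and the sample row copies the values visible in Prompt #2.

```python
LABEL_COLUMNS = ["Labels", "Labels.1", "Labels.2", "Labels.3", "Labels.4", "Labels.5"]
CONTEXT_FIELDS = [
    "Custom field (Expected Outcome)", "Custom field (Environment)",
    "Custom field (Severity)", "Custom field (Test Identifier)",
    "Custom field (Zephyr Teststep)", "Sprint", "Custom field (Team)",
]
# Spanish instruction line used by the notebook:
# "Generate a test case with the following data:"
HEADER = "Genera un caso de prueba con los siguientes datos:"

def has_text(value):
    """True unless value is None, NaN (NaN != NaN), or blank after stripping."""
    return value is not None and value == value and str(value).strip() != ""

def pretty_name(field):
    """'Custom field (Severity)' -> 'Severity'; other names pass through."""
    return field.split("(")[-1].strip(")") if "(" in field else field

def build_prompt(row, header=HEADER):
    """Assemble the prompt text from the filled fields of one row."""
    parts = []
    for field in ("Summary", "Description"):
        if has_text(row.get(field)):
            parts.append(f"{field}: {row[field]}")
    labels = [str(row[col]) for col in LABEL_COLUMNS if has_text(row.get(col))]
    if labels:
        parts.append(f"Labels: {', '.join(labels)}")
    for field in CONTEXT_FIELDS:
        if has_text(row.get(field)):
            parts.append(f"{pretty_name(field)}: {row[field]}")
    return header + "\n" + "\n".join(parts)

# Sample row with the values shown in Prompt #2.
row = {
    "Summary": "Verify GapToEarn Qualifies No AzRewards",
    "Description": "Verify Gap To Earn section is displayed with the SCO changes "
                   "and Sign-In button is working as expected",
    "Labels": "POS_Regression_Handover_II",
    "Labels.1": "Reviewed",
    "Labels.2": "Standard_Regression_Automation",
    "Labels.3": float("nan"),
    "Custom field (Test Identifier)": "Automated_Selenium",
}
print(build_prompt(row))
```

Returning the string keeps prompt construction separate from display, so the same function could feed a model call or be written to a file.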

In [1]:
%pip install torch

Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-cupti-cu12==12.4.127 (from torch)
  Downloading nvidia_cuda_cupti_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cudnn-cu12==9.1.0.70 (from torch)
  Downloading nvidia_cudnn_cu12-9.1.0.70-py3-none-manylinux2014_x86_64.whl.metadata (1.6 kB)
Collecting nvidia-cublas-cu12==12.4.5.8 (from torch)
  Downloading nvidia_cublas_cu12-12.4.5.8-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cufft-cu12==11.2.1.3 (from torch)
  Downloading nvidia_cufft_cu12-11.2.1.3-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-curand-cu12==10.3.5.147 (from torch)
  Downloading nvidia_curand_cu12-10.3.5