<a href="https://colab.research.google.com/github/kesterlyn-wilson/applied-bioinformatics/blob/main/KiA_algo2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# README

An internal support toolkit for monitoring and improving knowledge capture and retrieval.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
# Requirements (for running in terminal, will transition over to VPN):
* Python 3.8+
* Libraries: `pandas`, `scikit-learn`, `sentence-transformers`

Run `pip install -r requirements.txt` in your terminal.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

# Module:

* `kba_4u`: recommends existing KBAs for ECES tickets based on contextual similarity. Intended for knowledge capture.

  **Inputs:**
  - `tickets.csv` — Jira export
  - `kba_index.csv` — title + tags of KBs

  **Outputs:**
  - Suggested KBA match (cosine similarity > 0.5)
  - If no high matches, suggested title for new article (coming soon)
  - Terminal output (HTML output coming soon)
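
The matching rule described above can be sketched on toy vectors. This is only an illustration, assuming that a match means the highest cosine similarity and that it must exceed a threshold; the function name `best_match` and the 3-dimensional vectors are made up for the example, and the real module uses sentence embeddings instead.

```python
import numpy as np

def best_match(ticket_vec, kba_vecs, threshold):
    """Return (index, score) of the most similar KBA; index is None below the threshold."""
    ticket_vec = np.asarray(ticket_vec, dtype=float)
    kba_vecs = np.asarray(kba_vecs, dtype=float)
    # Cosine similarity is the dot product of L2-normalised vectors
    t = ticket_vec / np.linalg.norm(ticket_vec)
    k = kba_vecs / np.linalg.norm(kba_vecs, axis=1, keepdims=True)
    sims = k @ t
    idx = int(np.argmax(sims))
    score = float(sims[idx])
    return (idx if score > threshold else None), score

# Toy "embeddings" (hypothetical values, for illustration only)
kba_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
match_idx, match_score = best_match([0.9, 0.1, 0.0], kba_vecs, threshold=0.5)
no_match_idx, no_match_score = best_match([0.0, 0.0, 1.0], kba_vecs, threshold=0.5)
```

Here the first ticket vector points almost the same way as the first KBA vector, so it is matched; the second is orthogonal to both and falls below the threshold, which corresponds to the "suggest a new article" case.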

# Run it

!pip install sentence-transformers
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Load model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Load data
tickets = pd.read_csv('/content/Resolved Tickets (Jira).csv')  # CHANGE: path to your Jira ticket export
kbas = pd.read_csv('/content/kba_index.csv', encoding='latin1')  # CHANGE: path to your KBA index

# Prepare and embed KBA texts
kba_texts = (
    kbas['Title'].fillna('') + ' ' +
    kbas['Summary'].fillna('') + ' ' +
    kbas['Initial Diagnosis'].fillna('') + ' ' +
    kbas['Troubleshooting & Repair'].fillna('')
).tolist()
# Keep the default NumPy output so scikit-learn's cosine_similarity can consume it, including on GPU runtimes
kba_vecs = model.encode(kba_texts)

# Ticket columns joined into the text that gets embedded
ticket_fields = [
    'Summary',
    'Description',
    'Actions',
    'Error Message (if applicable)',
    'SW Version (if applicable)',
    'Comments',
]

# Loop through tickets
for idx, row in tickets.iterrows():
    # Skip missing columns and NaN cells; str(NaN) would otherwise add the literal text 'nan'
    ticket_text = ' '.join(
        str(row[field]) for field in ticket_fields
        if field in row.index and pd.notna(row[field])
    )

    ticket_vec = model.encode([ticket_text])

    # Compute similarity
    sims = cosine_similarity(ticket_vec, kba_vecs)[0]
    best_idx = sims.argmax()
    best_score = sims[best_idx]

    # Always print ticket info
    print(f"\nTicket: {row['Issue key']} — {row['Summary']}")

    # Threshold check
    if best_score > 0.5:
        print(f"Recommended KBA: {kbas.iloc[best_idx]['Title']} (Score: {best_score:.2f})")
    else:
        print("No strong match. Suggest creating new KBA.")

    # Next 3 best matches (excluding the top match)
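    ranked_indices = sims.argsort()[::-1]
    # Drop best_idx explicitly: with tied scores, argmax and argsort can disagree on which index comes first
    next_best_indices = [i for i in ranked_indices if i != best_idx][:3]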

    print("Next Best Matches:")
    for i in next_best_indices:
        print(f"   - {kbas.iloc[i]['Title']} (Score: {sims[i]:.2f})")
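
The ranking step at the end of the loop can also be pulled into a small helper, which makes it easier to test without loading the model. The sketch below is one possible way to do it, not part of the toolkit: `top_k_matches` is a hypothetical name, and the scores and titles are invented for the example.

```python
import numpy as np

def top_k_matches(sims, titles, k=4):
    """Return up to k (title, score) pairs, highest score first."""
    sims = np.asarray(sims, dtype=float)
    # argsort is ascending, so sort the negated scores; a stable sort keeps ties in input order
    order = np.argsort(-sims, kind="stable")[:k]
    return [(titles[i], float(sims[i])) for i in order]

# Invented scores and titles, for illustration only
scores = [0.31, 0.72, 0.05, 0.55, 0.48]
titles = ["KBA A", "KBA B", "KBA C", "KBA D", "KBA E"]
ranked = top_k_matches(scores, titles)
best, next_best = ranked[0], ranked[1:]
```

Taking the top match and the next three from a single sorted list means the two can never overlap, even when several KBAs share the same score.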
