## String Comparisons with spaCy

### Installs

In [17]:
!pip install spacy fuzzywuzzy

Collecting fuzzywuzzy
  Downloading fuzzywuzzy-0.18.0-py2.py3-none-any.whl (18 kB)
Installing collected packages: fuzzywuzzy
Successfully installed fuzzywuzzy-0.18.0


### Imports

In [37]:
import spacy

try:
    nlp = spacy.load("en_core_web_lg")
except OSError:
    print("Downloading the 'en_core_web_lg' model...")
    spacy.cli.download("en_core_web_lg")
    nlp = spacy.load("en_core_web_lg")


Downloading the 'en_core_web_lg' model...
Collecting en-core-web-lg==3.5.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.5.0/en_core_web_lg-3.5.0-py3-none-any.whl (587.7 MB)
[2K     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 587.7/587.7 MB 6.3 MB/s eta 0:00:00
Installing collected packages: en-core-web-lg
Successfully installed en-core-web-lg-3.5.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_lg')


### v1: Compare Similarity of Two Strings

In [15]:
alert = nlp("Increase in FCIs on /api/v1/azure/status over the past 5 minutes")
changes = [
    "CHNG559732: Recent deployment on azuredatabasenodeserv for manifest ID azuredatabasenodeserv-062723230148699022",
    "CHNG494690: Recent deployment on legacycassandradbnodeserv for manifest ID legacycassandradbnodeserv-062723230148700108",
    "CHNG336829: Recent deployment on oldpostgresadapternodeweb for manifest ID oldpostgresadapternodeweb-062723230148703462"
]

# Calculate similarity with each change string
for change in changes:
    change_doc = nlp(change)
    similarity = alert.similarity(change_doc)
    print(alert, "<->", change_doc)
    print("Similarity:", similarity)
    print()


Increase in FCIs on /api/v1/azure/status over the past 5 minutes <-> CHNG559732: Recent deployment on azuredatabasenodeserv for manifest ID azuredatabasenodeserv-062723230148699022
Similarity: 0.5732614955585756

Increase in FCIs on /api/v1/azure/status over the past 5 minutes <-> CHNG494690: Recent deployment on legacycassandradbnodeserv for manifest ID legacycassandradbnodeserv-062723230148700108
Similarity: 0.5732614955585756

Increase in FCIs on /api/v1/azure/status over the past 5 minutes <-> CHNG336829: Recent deployment on oldpostgresadapternodeweb for manifest ID oldpostgresadapternodeweb-062723230148703462
Similarity: 0.5732614955585756
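
All three scores are identical to the last digit. One plausible explanation, not verified in this notebook, is that `Doc.similarity` compares averaged token vectors, and the tokens that differ between the change strings (ticket IDs, service names, manifest IDs) are out of vocabulary and so contribute zero vectors. The toy sketch below uses made-up 3-dimensional vectors and hypothetical helper names (`doc_vector`, `cosine`), not spaCy's actual vectors, to show the effect.

```python
import math

# Made-up 3-d vectors for in-vocabulary words (illustrative only).
TOY_VECTORS = {
    "increase": [0.9, 0.1, 0.2],
    "minutes": [0.3, 0.8, 0.1],
    "recent": [0.4, 0.6, 0.3],
    "deployment": [0.7, 0.2, 0.5],
}
ZERO = [0.0, 0.0, 0.0]  # stand-in for an out-of-vocabulary token


def doc_vector(tokens):
    """Average the token vectors, using ZERO for unknown tokens."""
    vectors = [TOY_VECTORS.get(tok, ZERO) for tok in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


alert_vec = doc_vector(["increase", "minutes"])
# The two change strings differ only in a token that has no vector.
score_a = cosine(alert_vec, doc_vector(["recent", "deployment", "azuredatabasenodeserv"]))
score_b = cosine(alert_vec, doc_vector(["recent", "deployment", "legacycassandradbnodeserv"]))

print(score_a == score_b)  # True: the unknown tokens add nothing to either average
```

If this is what is happening, the service names never influence the score, so whole-document similarity cannot tell the three change tickets apart.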



### v2: Match Extracted Entities with Fuzzy Comparison

In [44]:
import spacy
from fuzzywuzzy import fuzz

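# Load the model once; reloading it on every call is slow.
nlp = spacy.load("en_core_web_lg")

def extract_entities(text):
    """Return the text of each named entity spaCy finds in `text`."""
    doc = nlp(text)
    print(doc)
    entities = [ent.text for ent in doc.ents]
    print(entities)
    return entities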

def find_relationships(alert, change_tickets):
    alert_entities = extract_entities(alert)
    relationships = []

    for ticket in change_tickets:
        ticket_entities = extract_entities(ticket)

        for alert_entity in alert_entities:
            for ticket_entity in ticket_entities:
                similarity = fuzz.ratio(alert_entity.lower(), ticket_entity.lower())
                if similarity >= 10:  # fuzz.ratio returns 0-100, so 10 is very permissive; raise it as needed
                    relationships.append((alert_entity, ticket_entity, ticket))

    return relationships

def main():
    alert = "Increase in FCIs on /api/v1/foo/status over the past 5 minutes"

    changes = [
        "CHNG559732: Recent deployment on foodatabasenodeserv for manifest ID azuredatabasenodeserv-062723230148699022",
        "CHNG494690: Recent deployment on legacycassandradbnodeserv for manifest ID legacycassandradbnodeserv-062723230148700108",
        "CHNG336829: Recent deployment on oldpostgresadapternodeweb for manifest ID oldpostgresadapternodeweb-062723230148703462",
        # Add more change ticket titles as needed
    ]

    relationships = find_relationships(alert, changes)

    if relationships:
        print("Relationships found:")
        for alert_entity, ticket_entity, ticket in relationships:
            print(f"Alert Entity: {alert_entity}")
            print(f"Ticket Entity: {ticket_entity}")
            print(f"Change Ticket: {ticket}")
            print()
    else:
        print("No relationships found.")

if __name__ == "__main__":
    main()
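
The threshold in `find_relationships` matters because `fuzz.ratio` returns an integer from 0 to 100, and two unrelated strings that share only a couple of characters can still clear 10. The sketch below uses the standard library's `difflib.SequenceMatcher` as a rough stand-in for `fuzz.ratio`. This is an assumption: fuzzywuzzy's score may differ slightly depending on whether python-Levenshtein is installed. `approx_ratio` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def approx_ratio(a, b):
    """Rough stand-in for fuzz.ratio: a 0-100 similarity score."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


THRESHOLD = 10  # same value as in find_relationships

# Lowercased entity pairs: a time phrase and a URL fragment vs. a ticket ID.
pairs = [
    ("the past 5 minutes", "chng559732"),
    ("/api", "chng559732"),
]
for alert_entity, ticket_entity in pairs:
    score = approx_ratio(alert_entity, ticket_entity)
    print(alert_entity, "<->", ticket_entity, score, score >= THRESHOLD)
```

A time phrase and a ticket ID have nothing meaningful in common, yet a few shared characters are enough to pass a threshold of 10. Raising the threshold well above 10 is one way to filter out such pairs; the right value depends on the data.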


Increase in FCIs on /api/v1/foo/status over the past 5 minutes
['/api', 'the past 5 minutes']
CHNG559732: Recent deployment on foodatabasenodeserv for manifest ID azuredatabasenodeserv-062723230148699022
['CHNG559732']
CHNG494690: Recent deployment on legacycassandradbnodeserv for manifest ID legacycassandradbnodeserv-062723230148700108
[]
CHNG336829: Recent deployment on oldpostgresadapternodeweb for manifest ID oldpostgresadapternodeweb-062723230148703462
['CHNG336829', 'oldpostgresadapternodeweb-062723230148703462']
Relationships found:
Alert Entity: the past 5 minutes
Ticket Entity: CHNG559732
Change Ticket: CHNG559732: Recent deployment on foodatabasenodeserv for manifest ID azuredatabasenodeserv-062723230148699022

Alert Entity: the past 5 minutes
Ticket Entity: CHNG336829
Change Ticket: CHNG336829: Recent deployment on oldpostgresadapternodeweb for manifest ID oldpostgresadapternodeweb-062723230148703462

Alert Entity: the past 5 minutes
Ticket Entity: oldpostgresadapternodeweb-