<a href="https://colab.research.google.com/github/osman-mo94/Sarcopenia-NLP-project/blob/main/synthetic_letters17_07_22.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install and import packages

In [102]:
!pip install docx2txt

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [103]:
!pip install spacy


Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [104]:
!pip install negspacy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [105]:
!pip install scispacy

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [106]:
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_bc5cdr_md-0.5.0.tar.gz

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_bc5cdr_md-0.5.0.tar.gz
  Using cached https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_ner_bc5cdr_md-0.5.0.tar.gz (120.2 MB)


In [107]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import docx2txt
import spacy
from spacy.matcher import PhraseMatcher
from spacy.pipeline import EntityRuler
from negspacy.negation import Negex
from negspacy.termsets import termset
from spacy.tokens import Span
import scispacy
from scispacy.abbreviation import AbbreviationDetector
from spacy import displacy

In [108]:
!python -m spacy download en_core_web_sm

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting en-core-web-sm==3.2.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl (13.9 MB)
[K     |████████████████████████████████| 13.9 MB 8.1 MB/s 
[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_sm')


In [109]:
#Initialize nlp pipeline with scispacy model (for processing biomedical, scientific and clinical text)
nlp = spacy.load("en_core_web_sm", exclude=["ner"])

#Add abbreviation detector for medical abbreviations
#nlp.add_pipe("abbreviation_detector")

#I have not included the scispacy model or abbreviation detector in this version as no current benefit

In [110]:
#View components of nlp pipeline
nlp.component_names

['tok2vec', 'tagger', 'parser', 'senter', 'attribute_ruler', 'lemmatizer']

In [111]:
#Mount google drive so that colab can access files in my google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Import letters for analysis

In [112]:
#Import letters (note that these letters do not refer to real patients)
letter_A = docx2txt.process('/content/drive/MyDrive/NLP projects/Dummy letters/dummy letters/Letter A.docx')
letter_B = docx2txt.process('/content/drive/MyDrive/NLP projects/Dummy letters/dummy letters/Letter B.docx')

In [113]:
print(letter_A)

Mr A Smith

567 Ghengis Khan Drive, Newcastle NE4 5XX



Diagnoses:

Poor mobility due to chronic pain, low confidence and previous falls 

Weight loss, anaemia and raised inflammatory markers of unknown aetiology 

Low mood secondary to poor mobility 

Breathlessness and elevated BNP awaiting echo 

Chronic back pain with degenerative changes on MRI 

Urinary frequency and incontinence 



Other diagnoses:

Complex partial epileptic seizures

Hypertension 

Osteoarthritis with bilateral total hip replacements

Atrial fibrillation 

Asthma 

Patent foramen ovale 

Previous cerebellar stroke 



Medications:

Atorvastatin

Docusate

Ferrous fumarate

Vitamin-D3

Furosemide

Gaviscon

Flutiform inhaler

Salbutamol inhaler

Lansoprazole

Losartan 

Paracetamol 

Phenytoin

Tegretol slow release

Codeine

Warfarin



Suggested changes to medication 

Reduce codeine 15 mg dose but try and use regularly 2-3 times per day 



Follow up arrangements

 I will organise ultrasound of the abdomen 

In [114]:
print(letter_B)

Mrs B Smith

Flat 1, Farringdon Road, Newcastle NE2 5DH

Date of Birth: 01/01/1932





Diagnoses: 

Falls due to gait and balance disorder 

Hyperthyroidism due to thyroxine over-replacement

Sarcopenia 

Orthostatic hypotension 



Existing diagnoses: 

Hypertension 

Hypothyroidism 

Vitamin B12 deficiency 

Previous fractured wrist

Visual impairment due to cataracts 



Medications: 

Alendronic acid 

Calcium and vitamin-D 

Bendroflumethiazide 

Vitamin B12 

Simvastatin 

Ramipril 



Medication changes: 

Please reduce thyroxine dose to 75mcg once daily



Follow up arrangements: 

I will write back when I see the results of her 24 hour electrocardiogram



For Primary care 

Could you please forward me a copy of the 24 hour blood pressure monitor that she says she had in your surgery recently? 



I saw Mrs Smith for a face-to-face appointment at the Belsay Clinic today; she was accompanied by her son. She gives a history of two falls over the last few months and feels unstea

In [115]:
#Apply nlp pipeline to letter A
doc_A = nlp(letter_A)

In [116]:
#Apply nlp pipeline to letter B
doc_B = nlp(letter_B)

# Build PhraseMatcher

Create a PhraseMatcher to identify terms that are related to Muscle Weakness - these terms are based on a crowd-sourced list created by clinicians that frequently encounter Sarcopenia.

In [117]:
#Import crowd-sourced list of sarcopenia terms
sarcopenia_terms_list = pd.read_csv("/content/drive/MyDrive/NLP projects/weakness terms/list_of_sarcopenia_terms170622.csv")
print(sarcopenia_terms_list)

             Muscle weakness
0                       Weak
1               Uses a stick
2       Uses a walking stick
3               Uses a frame
4        Uses a zimmer frame
..                       ...
85              Cannot stand
86           Unable to stand
87       No rehab potential 
88  Limited rehab potential 
89           Mechanical fall

[90 rows x 1 columns]


In [118]:
#Convert the above list to a python list: 
sarcopenia_terms_list = sarcopenia_terms_list["Muscle weakness"].tolist()
print(sarcopenia_terms_list)

['Weak', 'Uses a stick', 'Uses a walking stick', 'Uses a frame', 'Uses a zimmer frame', 'Uses a walker', 'Uses walking aid', 'Furniture walks', 'Difficulty mobilising', 'Difficulty walking', 'Difficulty standing', 'Difficulty climbing stairs', 'Cannot climb stairs', 'Bedbound', 'Hoist transfer', 'Slowed up', 'Limited mobility', 'Needs assistance', 'Difficulty carrying', 'Multiple falls', 'Fallen x times', 'Mob with AO2/1', 'Transfer with AO2/1', 'Sara stedy', 'Rotastand', 'WZF (wheeled zimmer frame)', 'ZF (zimmer frame)', '4WW', '3WW', 'Delta frame (3WW)', 'POC', 'Raised toilet seat', 'Bed lever', 'Combined toilet seat and frame', 'Skandia', 'Mowbray', 'Free standing toilet frame', 'Etwell trolley', 'Kitchen trolley', 'Riser recliner', 'Raised chair ', 'Commode', 'Mobile commode', 'Stairlift', 'Stair assessment', 'Off legs', 'FLOF', 'multifactorial fall', 'CCA or pendant alarm', 'fractured pubic rami/ramus', 'rami/ramus #', '#NOF', 'Fractured NOF', 'Fractured neck of femur', 'Strugglin

In [119]:
#Add "muscle weakness" which has been missed from list
sarcopenia_terms_list.append("muscle weakness")
print(sarcopenia_terms_list)

['Weak', 'Uses a stick', 'Uses a walking stick', 'Uses a frame', 'Uses a zimmer frame', 'Uses a walker', 'Uses walking aid', 'Furniture walks', 'Difficulty mobilising', 'Difficulty walking', 'Difficulty standing', 'Difficulty climbing stairs', 'Cannot climb stairs', 'Bedbound', 'Hoist transfer', 'Slowed up', 'Limited mobility', 'Needs assistance', 'Difficulty carrying', 'Multiple falls', 'Fallen x times', 'Mob with AO2/1', 'Transfer with AO2/1', 'Sara stedy', 'Rotastand', 'WZF (wheeled zimmer frame)', 'ZF (zimmer frame)', '4WW', '3WW', 'Delta frame (3WW)', 'POC', 'Raised toilet seat', 'Bed lever', 'Combined toilet seat and frame', 'Skandia', 'Mowbray', 'Free standing toilet frame', 'Etwell trolley', 'Kitchen trolley', 'Riser recliner', 'Raised chair ', 'Commode', 'Mobile commode', 'Stairlift', 'Stair assessment', 'Off legs', 'FLOF', 'multifactorial fall', 'CCA or pendant alarm', 'fractured pubic rami/ramus', 'rami/ramus #', '#NOF', 'Fractured NOF', 'Fractured neck of femur', 'Strugglin

In [120]:
#Now convert all items in the list to the lower case form
for item in range(len(sarcopenia_terms_list)):
  sarcopenia_terms_list[item] = sarcopenia_terms_list[item].lower()

print(sarcopenia_terms_list)

['weak', 'uses a stick', 'uses a walking stick', 'uses a frame', 'uses a zimmer frame', 'uses a walker', 'uses walking aid', 'furniture walks', 'difficulty mobilising', 'difficulty walking', 'difficulty standing', 'difficulty climbing stairs', 'cannot climb stairs', 'bedbound', 'hoist transfer', 'slowed up', 'limited mobility', 'needs assistance', 'difficulty carrying', 'multiple falls', 'fallen x times', 'mob with ao2/1', 'transfer with ao2/1', 'sara stedy', 'rotastand', 'wzf (wheeled zimmer frame)', 'zf (zimmer frame)', '4ww', '3ww', 'delta frame (3ww)', 'poc', 'raised toilet seat', 'bed lever', 'combined toilet seat and frame', 'skandia', 'mowbray', 'free standing toilet frame', 'etwell trolley', 'kitchen trolley', 'riser recliner', 'raised chair ', 'commode', 'mobile commode', 'stairlift', 'stair assessment', 'off legs', 'flof', 'multifactorial fall', 'cca or pendant alarm', 'fractured pubic rami/ramus', 'rami/ramus #', '#nof', 'fractured nof', 'fractured neck of femur', 'strugglin

In [121]:
#Define a list of terms indicative of muscle weakness
weakness_list = ["muscle weakness", "weak", "uses a stick", "uses a walking stick", 
                    "uses a frame", "uses a zimmer frame", "uses a walker", "uses a walking aid",
                    "furniture walks", "difficulty mobilising", "difficulty walking", "wheelchair"
                    "difficulty standing", "difficulty climbing stairs", "cannot climb stairs", "housebound",
                    "bedbound", "hoist transfer", "slowed up", "limited mobility", "poor mobility"
                    "needs assistance", "difficulty carrying", "falls", "fallen",
                    "found on floor", "long lie"]

In [122]:
#append "weakness list" with crowd-sourced "Sarcopenia_terms_list"
weakness_list.extend(sarcopenia_terms_list)
print(weakness_list)

['muscle weakness', 'weak', 'uses a stick', 'uses a walking stick', 'uses a frame', 'uses a zimmer frame', 'uses a walker', 'uses a walking aid', 'furniture walks', 'difficulty mobilising', 'difficulty walking', 'wheelchairdifficulty standing', 'difficulty climbing stairs', 'cannot climb stairs', 'housebound', 'bedbound', 'hoist transfer', 'slowed up', 'limited mobility', 'poor mobilityneeds assistance', 'difficulty carrying', 'falls', 'fallen', 'found on floor', 'long lie', 'weak', 'uses a stick', 'uses a walking stick', 'uses a frame', 'uses a zimmer frame', 'uses a walker', 'uses walking aid', 'furniture walks', 'difficulty mobilising', 'difficulty walking', 'difficulty standing', 'difficulty climbing stairs', 'cannot climb stairs', 'bedbound', 'hoist transfer', 'slowed up', 'limited mobility', 'needs assistance', 'difficulty carrying', 'multiple falls', 'fallen x times', 'mob with ao2/1', 'transfer with ao2/1', 'sara stedy', 'rotastand', 'wzf (wheeled zimmer frame)', 'zf (zimmer fr

In [123]:
#Initialize matcher
matcher = PhraseMatcher(nlp.vocab)

#Apply spaCy nlp pipeline to list of weakness terms
weakness_terms = [nlp(i) for i in weakness_list]


In [124]:
#Add weakness terms to PhraseMatcher
matcher.add("WEAKNESS TERM", weakness_terms)

In [125]:
#Add pattern for SARC-F score
sarcf_list = ["SARC-F", "SARC F", "SARCF", "sarc-f", "sarc f", "Sarc f", "Sarc F", "Sarc-f", "Sarc-F"]

sarcf_terms = [nlp(i) for i in sarcf_list]

matcher.add("SARC-F", sarcf_terms)


In [126]:
#Add pattern for Sarcopenia diagnosis
sarcopenia_diagnosis = ["Sarcopenia", "sarcopenia"]

sarcopenia_terms = [nlp(i) for i in sarcopenia_diagnosis]

#Add to matcher
matcher.add("Sarcopenia", sarcopenia_terms)


In [127]:
#Apply matcher to letter A
matchesA = matcher(doc_A)

for match_id, start, end in matchesA: 
  span = doc_A[start:end]
  match_id_string = nlp.vocab.strings[match_id]
  print("Match:",match_id_string, "-", span.text, "( Location = ", start, end, ")")

Match: WEAKNESS TERM - falls ( Location =  27 28 )
Match: WEAKNESS TERM - falls ( Location =  255 256 )
Match: WEAKNESS TERM - fallen ( Location =  303 304 )
Match: WEAKNESS TERM - housebound ( Location =  321 322 )
Match: SARC-F - SARC-F ( Location =  982 985 )


The matcher applied to letter A has identified: 
4x weakness terms
1x SARC-F term

In [128]:
#Apply matcher to letter B
matchesB = matcher(doc_B)

for match_id, start, end in matchesB: 
  span = doc_B[start:end]
  match_id_string = nlp.vocab.strings[match_id]
  print("Match:",match_id_string, "-", span.text, "( Location = ", start, end, ")")

Match: Sarcopenia - Sarcopenia ( Location =  39 40 )
Match: WEAKNESS TERM - falls ( Location =  180 181 )
Match: WEAKNESS TERM - falls ( Location =  211 212 )
Match: SARC-F - Sarc-F ( Location =  540 543 )
Match: WEAKNESS TERM - falls ( Location =  676 677 )
Match: Sarcopenia - sarcopenia ( Location =  687 688 )
Match: WEAKNESS TERM - falls ( Location =  732 733 )
Match: Sarcopenia - sarcopenia ( Location =  802 803 )


The matcher applied to letter B has identified:
4x weakness terms
1x SARC-F term
2x Sarcopenia terms

Using the PhraseMatcher only identifies exact matches. A rule-based matcher with more flexibility is likely to identify more matches.

# Try a rule-based matcher

In [129]:
#Import rule-based matcher
from spacy.matcher import Matcher

In [130]:
#Initialize matcher
rb_matcher = Matcher(nlp.vocab)

#Add patterns for weakness
weakness_pattern = [
                    [{"LEMMA": "fall"}], [{"LEMMA": "weak"}], [{"LOWER": "housebound"}], [{"LOWER": "bedbound"}],
                    [{"LEMMA": "use"}, {"LOWER": "a", "OP": "?"}, {"LEMMA": "walk", "OP": "?"}, {"LOWER": "stick"}],
                    [{"LEMMA": "use"}, {"LOWER": "a", "OP": "?"}, {"LEMMA": "walk", "OP": "?"}, {"LOWER": "zimmer", "OP": "?"}, {"LOWER": "frame"}],
                    [{"LEMMA": "use"}, {"LOWER": "a", "OP": "?"}, {"LEMMA": "walk"}, {"LOWER": "aid", "OP": "?"}],
                    [{"LOWER": "furniture"}, {"LEMMA": "walk"}], [{"LEMMA": "difficult"}, {"LEMMA": "walk"}],
                    [{"LEMMA": "difficult"}, {"LEMMA": "mobilise"}], [{"LEMMA": "difficult"}, {"LEMMA": "stand"}],
                    [{"LEMMA": "difficult"}, {"LOWER": "with", "OP": "?"}, {"LEMMA": "climb", "OP": "?"}, {"LEMMA": "stair"}],
                    [{"LOWER": "cannot"}, {"LEMMA": "climb", "OP": "?"}, {"LEMMA": "stair"}],
                    [{"LOWER": "can't"}, {"LEMMA": "climb", "OP": "?"}, {"LEMMA": "stair"}],
                    [{"LEMMA": "hoist"}, {"LEMMA": "transfer"}], [{"LEMMA": "slow"}, {"LOWER": "up"}],
                    [{"LEMMA": "limit"}, {"LOWER": "mobility"}],  [{"LOWER": "poor"}, {"LOWER": "mobility"}],
                    [{"LEMMA": "need"}, {"LEMMA": "assist"}], [{"LEMMA": "require"}, {"LEMMA": "assist"}],
                    [{"LEMMA": "difficult"}, {"LEMMA": "carry"}], [{"LOWER": "found"}, {"LOWER": "on"}, {"LOWER": "floor"}],
                    [{"LOWER": "long"}, {"LOWER": "lie"}], [{"LEMMA": "lack"}, {"LOWER": "of", "OP": "?"}, {"LOWER": "mobility"}],
                    [{"LEMMA": "lack"}, {"LOWER": "of", "OP": "?"}, {"LEMMA": "strength"}],
                    [{"LEMMA": "mob"}, {"LOWER": "with", "OP": "?"}, {"LOWER": "AO2"}],
                    [{"LEMMA": "mob"}, {"LOWER": "with", "OP": "?"}, {"LOWER": "AO1"}],
                    [{"LEMMA": "transfer"}, {"LOWER": "with", "OP": "?"}, {"LOWER": "AO2"}],
                    [{"LEMMA": "transfer"}, {"LOWER": "with", "OP": "?"}, {"LOWER": "AO1"}],
                    [{"LOWER": "sara stedy"}], [{"LOWER": "sara-stedy"}], [{"LOWER": "rotastand"}],
                    [{"LOWER": "wzf"}], [{"LOWER": "zf"}], [{"LOWER": "4ww"}], [{"LOWER": "3ww"}],
                    [{"LOWER": "delta-frame"}], [{"LOWER": "delta frame"}], [{"LOWER": "poc"}], 
                    [{"LEMMA": "pack"}, {"LOWER": "of", "OP": "?"}, {"LEMMA": "care"}],
                    [{"LEMMA": "raise"}, {"LOWER": "toilet", "OP": "?"}, {"LOWER": "seat"}],
                    [{"LOWER": "bed"}, {"LOWER": "lever"}], [{"LOWER": "skandia"}], [{"LOWER": "mowbray"}],
                    [{"LOWER": "free standing toilet frame"}], [{"LOWER": "etwell trolley"}], 
                    [{"LOWER": "kitchen trolley"}], [{"LOWER": "riser recliner"}], [{"LOWER": "raised chair"}],
                    [{"LOWER": "mobile", "OP": "?"}, {"LOWER": "commode"}], [{"LOWER": "stairlift"}], 
                    [{"LOWER": "stair assessment"}], [{"LOWER": "off legs"}], [{"LOWER": "flof"}],
                    [{"LOWER": "cca alarm"}], [{"LOWER": "pendant alarm"}], 
                    [{"LEMMA": "fracture"}, {"LOWER": "pubic", "OP": "?"}, {"LEMMA": "ramus"}],
                    [{"TEXT": "#"}, {"LOWER": "pubic", "OP": "?"}, {"LEMMA": "ramus"}],
                    [{"TEXT": "#NOF"}], [{"TEXT": "#"}, {"LOWER": "nof"}],
                    [{"TEXT": "#"}, {"LOWER": "neck of femur"}], [{"LEMMA": "fracture"}, {"LOWER": "nof"}],
                    [{"LEMMA": "fracture"}, {"LOWER": "nof"}], [{"LEMMA": "fracture"}, {"LOWER": "neck of femur"}],
                    [{"LEMMA": "struggle"}, {"IS_ALPHA": True, "OP": "?"}, {"LEMMA": "stair"}],
                    [{"LEMMA": "deteriorate"}, {"LEMMA": "mobile"}], [{"LEMMA": "frail"}],
                    [{"LEMMA": "downstair"}, {"LEMMA": "living"}], [{"LOWER": "cga"}],
                    [{"LOWER": "comprehensive geriatric assessment"}], [{"LOWER": "chairbound"}],
                    [{"LEMMA": "decondition"}], [{"LEMMA": "mobile"}, {"LEMMA": "decline"}],
                    [{"LOWER": "has"}, {"LEMMA": "care"}], [{"LOWER": "respite care"}], [{"LOWER": "gaunt"}],
                    [{"LEMMA": "muscle"}, {"LEMMA": "waste"}], [{"LEMMA": "cachexia"}],
                    [{"LEMMA": "muscle"}, {"LEMMA": "atrophy"}], [{"LEMMA": "loss"}, {"IS_ALPHA": True, "OP": "?"}, {"LEMMA": "function"}],
                    [{"LEMMA": "difficult"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "adl"}],
                    [{"LEMMA": "depend"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "care"}],
                    [{"LEMMA": "fragile"}], [{"LOWER": "immobile"}], [{"LOWER": "anorexia"}],
                    [{"LEMMA": "reduce"}, {"LEMMA": "strength"}], [{"LEMMA": "limit"}, {"LEMMA": "strength"}], 
                    [{"LEMMA": "quad"}, {"LEMMA": "waste"}], [{"LOWER": "low"}, {"LEMMA": "muscle"}, {"LOWER": "bulk"}],
                    [{"LOWER": "loss"}, {"LOWER": "of"}, {"LEMMA": "quad", "OP": "?"}, {"LOWER": "bulk"}],
                    [{"LOWER": "stuck"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "toilet"}],
                    [{"LEMMA": "loss"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "power"}],
                    [{"LOWER": "can't"}, {"LOWER": "stand"}], [{"LOWER": "cannot"}, {"LOWER": "stand"}],
                    [{"LOWER": "unable"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "stand"}],
                    [{"LOWER": "no"}, {"LEMMA": "rehab"}, {"LOWER": "potential"}],
                    [{"LOWER": "limited"}, {"LEMMA": "rehab"}, {"LOWER": "potential"}],
                    [{"LOWER": "mechanical fall"}]
]


#Add patternS to matcher
rb_matcher.add("WEAKNESS TERM", weakness_pattern)


In [131]:
#Apply rb_matcher to letter A

rb_matchesA = rb_matcher(doc_A)

for match_id, start, end in rb_matchesA: 
  span = doc_A[start:end]
  match_id_string = nlp.vocab.strings[match_id]
  print("Match:",match_id_string, "-", span.text, "( Location = ", start, end, ")")

Match: WEAKNESS TERM - Poor mobility ( Location =  16 18 )
Match: WEAKNESS TERM - falls ( Location =  27 28 )
Match: WEAKNESS TERM - poor mobility ( Location =  45 47 )
Match: WEAKNESS TERM - poor mobility ( Location =  252 254 )
Match: WEAKNESS TERM - falls ( Location =  255 256 )
Match: WEAKNESS TERM - fallen ( Location =  303 304 )
Match: WEAKNESS TERM - lack of mobility ( Location =  311 314 )
Match: WEAKNESS TERM - housebound ( Location =  321 322 )
Match: WEAKNESS TERM - lacks strength ( Location =  363 365 )
Match: WEAKNESS TERM - falling ( Location =  370 371 )
Match: WEAKNESS TERM - fall ( Location =  398 399 )
Match: WEAKNESS TERM - lack of strength ( Location =  1100 1103 )


In [132]:
#Apply rb_matcher to letter B
rb_matchesB = rb_matcher(doc_B)

for match_id, start, end in rb_matchesB: 
  span = doc_B[start:end]
  match_id_string = nlp.vocab.strings[match_id]
  print("Match:",match_id_string, "-", span.text, "( Location = ", start, end, ")")

Match: WEAKNESS TERM - falls ( Location =  180 181 )
Match: WEAKNESS TERM - falls ( Location =  211 212 )
Match: WEAKNESS TERM - fell ( Location =  233 234 )
Match: WEAKNESS TERM - fell ( Location =  333 334 )
Match: WEAKNESS TERM - falling ( Location =  596 597 )
Match: WEAKNESS TERM - fell ( Location =  615 616 )
Match: WEAKNESS TERM - falls ( Location =  676 677 )
Match: WEAKNESS TERM - falls ( Location =  732 733 )


The rule-based matcher was successful in identifying more terms. 



# Split "WEAKNESS TERM" into 2 lists: weakness vs. loss of muscle bulk

In [133]:
#Re-define list of weakness paterns
weakness_pattern = [
                    [{"LEMMA": "weak"}], [{"LOWER": "housebound"}], [{"LOWER": "bedbound"}],
                    [{"LEMMA": "use"}, {"LOWER": "a", "OP": "?"}, {"LEMMA": "walk", "OP": "?"}, {"LOWER": "stick"}],
                    [{"LEMMA": "use"}, {"LOWER": "a", "OP": "?"}, {"LEMMA": "walk", "OP": "?"}, {"LOWER": "zimmer", "OP": "?"}, {"LOWER": "frame"}],
                    [{"LEMMA": "use"}, {"LOWER": "a", "OP": "?"}, {"LEMMA": "walk"}, {"LOWER": "aid", "OP": "?"}],
                    [{"LOWER": "furniture"}, {"LEMMA": "walk"}], [{"LEMMA": "difficult"}, {"LEMMA": "walk"}],
                    [{"LEMMA": "difficult"}, {"LEMMA": "mobilise"}], [{"LEMMA": "difficult"}, {"LEMMA": "stand"}],
                    [{"LEMMA": "difficult"}, {"LOWER": "with", "OP": "?"}, {"LEMMA": "climb", "OP": "?"}, {"LEMMA": "stair"}],
                    [{"LOWER": "cannot"}, {"LEMMA": "climb", "OP": "?"}, {"LEMMA": "stair"}],
                    [{"LOWER": "can't"}, {"LEMMA": "climb", "OP": "?"}, {"LEMMA": "stair"}],
                    [{"LEMMA": "hoist"}, {"LEMMA": "transfer"}], [{"LEMMA": "slow"}, {"LOWER": "up"}],
                    [{"LEMMA": "limit"}, {"LOWER": "mobility"}],  [{"LOWER": "poor"}, {"LOWER": "mobility"}],
                    [{"LEMMA": "need"}, {"LEMMA": "assist"}], [{"LEMMA": "require"}, {"LEMMA": "assist"}],
                    [{"LEMMA": "difficult"}, {"LEMMA": "carry"}], [{"LOWER": "found"}, {"LOWER": "on"}, {"LOWER": "floor"}],
                    [{"LOWER": "long"}, {"LOWER": "lie"}], [{"LEMMA": "lack"}, {"LOWER": "of", "OP": "?"}, {"LOWER": "mobility"}],
                    [{"LEMMA": "lack"}, {"LOWER": "of", "OP": "?"}, {"LEMMA": "strength"}],
                    [{"LEMMA": "mob"}, {"LOWER": "with", "OP": "?"}, {"LOWER": "AO2"}],
                    [{"LEMMA": "mob"}, {"LOWER": "with", "OP": "?"}, {"LOWER": "AO1"}],
                    [{"LEMMA": "transfer"}, {"LOWER": "with", "OP": "?"}, {"LOWER": "AO2"}],
                    [{"LEMMA": "transfer"}, {"LOWER": "with", "OP": "?"}, {"LOWER": "AO1"}],
                    [{"LOWER": "sara stedy"}], [{"LOWER": "sara-stedy"}], [{"LOWER": "rotastand"}],
                    [{"LOWER": "wzf"}], [{"LOWER": "zf"}], [{"LOWER": "4ww"}], [{"LOWER": "3ww"}],
                    [{"LOWER": "delta-frame"}], [{"LOWER": "delta frame"}], [{"LOWER": "poc"}], 
                    [{"LEMMA": "pack"}, {"LOWER": "of", "OP": "?"}, {"LEMMA": "care"}],
                    [{"LEMMA": "raise"}, {"LOWER": "toilet", "OP": "?"}, {"LOWER": "seat"}],
                    [{"LOWER": "bed"}, {"LOWER": "lever"}], [{"LOWER": "skandia"}], [{"LOWER": "mowbray"}],
                    [{"LOWER": "free standing toilet frame"}], [{"LOWER": "etwell trolley"}], 
                    [{"LOWER": "kitchen trolley"}], [{"LOWER": "riser recliner"}], [{"LOWER": "raised chair"}],
                    [{"LOWER": "mobile", "OP": "?"}, {"LOWER": "commode"}], [{"LOWER": "stairlift"}], 
                    [{"LOWER": "stair assessment"}], [{"LOWER": "off legs"}], [{"LOWER": "flof"}],
                    [{"LOWER": "cca alarm"}], [{"LOWER": "pendant alarm"}], 
                    [{"LEMMA": "fracture"}, {"LOWER": "pubic", "OP": "?"}, {"LEMMA": "ramus"}],
                    [{"TEXT": "#"}, {"LOWER": "pubic", "OP": "?"}, {"LEMMA": "ramus"}],
                    [{"TEXT": "#NOF"}], [{"TEXT": "#"}, {"LOWER": "nof"}],
                    [{"TEXT": "#"}, {"LOWER": "neck of femur"}], [{"LEMMA": "fracture"}, {"LOWER": "nof"}],
                    [{"LEMMA": "fracture"}, {"LOWER": "nof"}], [{"LEMMA": "fracture"}, {"LOWER": "neck of femur"}],
                    [{"LEMMA": "struggle"}, {"IS_ALPHA": True, "OP": "?"}, {"LEMMA": "stair"}],
                    [{"LEMMA": "deteriorate"}, {"LEMMA": "mobile"}], [{"LEMMA": "frail"}],
                    [{"LEMMA": "downstair"}, {"LEMMA": "living"}], [{"LOWER": "cga"}],
                    [{"LOWER": "comprehensive geriatric assessment"}], [{"LOWER": "chairbound"}],
                    [{"LEMMA": "decondition"}], [{"LEMMA": "mobile"}, {"LEMMA": "decline"}],
                    [{"LOWER": "has"}, {"LEMMA": "care"}], [{"LOWER": "respite care"}],  
                    [{"LEMMA": "loss"}, {"IS_ALPHA": True, "OP": "?"}, {"LEMMA": "function"}],
                    [{"LEMMA": "difficult"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "adl"}],
                    [{"LEMMA": "depend"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "care"}],
                    [{"LEMMA": "fragile"}], [{"LOWER": "immobile"}], 
                    [{"LEMMA": "reduce"}, {"LEMMA": "strength"}], [{"LEMMA": "limit"}, {"LEMMA": "strength"}], 
                    [{"LOWER": "stuck"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "toilet"}],
                    [{"LEMMA": "loss"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "power"}],
                    [{"LOWER": "can't"}, {"LOWER": "stand"}], [{"LOWER": "cannot"}, {"LOWER": "stand"}],
                    [{"LOWER": "unable"}, {"IS_ALPHA": True, "OP": "?"}, {"LOWER": "stand"}],
                    [{"LOWER": "no"}, {"LEMMA": "rehab"}, {"LOWER": "potential"}],
                    [{"LOWER": "limited"}, {"LEMMA": "rehab"}, {"LOWER": "potential"}],
                    [{"LOWER": "mechanical fall"}]
]

#Define list for loss of muscle bulk
muscle_loss_list = [
                    [{"LOWER": "gaunt"}],
                    [{"LEMMA": "muscle"}, {"LEMMA": "waste"}], [{"LEMMA": "cachexia"}],
                    [{"LEMMA": "muscle"}, {"LEMMA": "atrophy"}], [{"LOWER": "anorexia"}],
                    [{"LEMMA": "quad"}, {"LEMMA": "waste"}], [{"LOWER": "low"}, {"LEMMA": "muscle"}, {"LOWER": "bulk"}],
                    [{"LOWER": "loss"}, {"LOWER": "of"}, {"LEMMA": "quad", "OP": "?"}, {"LOWER": "bulk"}]
]

#Add patternS to matcher
rb_matcher.add("WEAKNESS TERM", weakness_pattern)
rb_matcher.add("MUSCLE BULK LOSS", muscle_loss_list)


# Entity-ruler for visualization

In [134]:
#Initialize NER ruler

ruler = nlp.add_pipe("entity_ruler", after = "lemmatizer")

In [135]:
#Add weakness patterns to NER ruler
for item in weakness_pattern:
  ruler.add_patterns([{"label": "WEAKNESS TERM", "pattern": item}])

#Add muscle bulk liss patterns to the NER ruler
for item in muscle_loss_list:
  ruler.add_patterns([{"label": "MUSCLE BULK LOSS", "pattern": item}])

#Add sarcopenia diagnosis to NER ruler
for item in sarcopenia_diagnosis:
  ruler.add_patterns([{"label": "SARCOPENIA", "pattern": item}])

#Add SARC-F to NER ruler
for item in sarcf_list:
  ruler.add_patterns([{"label": "SARC-F", "pattern": item}])


In [136]:
#Apply nlp pipeline to letter A
doc_A = nlp(letter_A)

In [137]:
#Visualise weakness entities in Letter A
#Gradient colors from css-gradient.com

def get_entity_options():
  entities = ["WEAKNESS TERM", "MUSCLE BULK LOSS", "SARCOPENIA", "SARC-F"]
  colors = {"WEAKNESS TERM": 'linear-gradient(90deg, #ffff66, #ff6600)', "SARCOPENIA": 'linear-gradient(90deg, #aa9cfc, #fc9ce7)',
            "SARC-F": 'linear-gradient(180deg, #66ffcc, #abf763)'}
  options = {"ents": entities, "colors": colors}
  return options
options = get_entity_options()

displacy.render(doc_A, style = 'ent', options=options, jupyter = True)

In [138]:
#Apply nlp pipeline to letter B
doc_B = nlp(letter_B)

In [139]:
#Visualise weakness entities in Letter B

def get_entity_options():
  entities = ["WEAKNESS TERM", "SARCOPENIA", "SARC-F"]
  colors = {"WEAKNESS TERM": 'linear-gradient(90deg, #ffff66, #ff6600)', "MUSCLE BULK LOSS": 'linear-gradient(90deg, #e28ac8, #c889dd)', "SARCOPENIA": 'linear-gradient(90deg, #aa9cfc, #fc9ce7)',
            "SARC-F": 'linear-gradient(180deg, #66ffcc, #abf763)'}
  options = {"ents": entities, "colors": colors}
  return options
options = get_entity_options()

displacy.render(doc_B, style = 'ent', options=options, jupyter = True)

# Negation detection

In [140]:
#Define termset as clinical
ts = termset("en_clinical_sensitive")

#Add negex to nlp pipeline
nlp.add_pipe("negex", config={
    "ent_types":["SARCOPENIA", "MUSCLE BULK LOSS", "WEAKNESS TERM","SARC-F"],
    "neg_termset":ts.get_patterns()
})

<negspacy.negation.Negex at 0x7f9fa96459d0>

In [141]:
#View termset patterns in use
print(ts.get_patterns())

{'pseudo_negations': ['not able to be', 'not certain if', 'not certain whether', 'not necessarily', 'without any further', 'without difficulty', 'without further', 'might not', 'not only', 'no increase', 'no significant change', 'no change', 'no definite change', 'not extend', 'not cause', 'gram negative', 'not rule out', 'not ruled out', 'not been ruled out', 'not drain', 'no suspicious change', 'no interval change', 'no significant interval change'], 'preceding_negations': ['monitor for', "don't", "isn't", 'rule out', 'no sign of', 'symptoms atypical', 'free of', 'not demonstrate', 'ruled patient out', 'test for', 'without sign of', 'arent', 'werent', 'without any reactions or signs of', 'never developed', 'did not exhibit', 'rule the patient out', 'rules out', 'doubt', "couldn't", 'absence of', 'negative for', 'without', 'not', 'rule patient out', "can't", 'if you get', 'no further', 'no evidence of', "weren't", 'couldnt', 'tested for', 'never', 'denies', 'r/o', 'without indication 

In [142]:
#Negation termset needs some modification as it is not fully accurate
ts.remove_patterns({"pseudo_negations": ["no further"]})
ts.add_patterns({"preceding_negations": ["no further"]})
ts.remove_patterns({"preceding_negations": ["educating the patient", "concern for", "history of", "taught the patient", "teach the patient", "h/o", "teaching the patient", "educated the patient", "versus", "supposed", "leads to", "educate the patient"]})

#Check that termset has been modified
print(ts.get_patterns())

{'pseudo_negations': ['not able to be', 'not certain if', 'not certain whether', 'not necessarily', 'without any further', 'without difficulty', 'without further', 'might not', 'not only', 'no increase', 'no significant change', 'no change', 'no definite change', 'not extend', 'not cause', 'gram negative', 'not rule out', 'not ruled out', 'not been ruled out', 'not drain', 'no suspicious change', 'no interval change', 'no significant interval change'], 'preceding_negations': ['absence of', 'cant', 'monitor for', 'negative for', 'without', "don't", "isn't", 'not', 'rule him out', 'rule patient out', 'denies', 'rule out', 'no complaints of', 'r/o', 'no sign of', 'if you experience', 'without indication of', 'no signs of', "didn't", "can't", 'ro', 'monitored for', 'if you get', 'declined', 'symptoms atypical', 'free of', 'not demonstrate', 'ruled patient out', "wasn't", 'ruled him out', 'ruled the patient out', 'test for', 'no further', 'without sign of', 'isnt', 'ruled out', "aren't", 'a

In [143]:
# View any negations in letter A, True indicates a negation
for e in doc_A.ents:
  print(e.text, e._.negex)

Poor mobility False
poor mobility False
poor mobility False
lack of mobility False
housebound False
lacks strength False
SARC-F False
lack of strength False


In [144]:
# View any negations in letter B, True indicates a negation
for e in doc_B.ents:
  print(e.text, e._.negex)

Sarcopenia False
Sarc-F False
sarcopenia False
sarcopenia False


As there are no negative entities available in letters A and B, I will create a modified letter with negations, called Letter nA. 

In [145]:
#Create letter with some negative entities that have been added
letter_nA = '''Diagnoses: Poor mobility due to chronic pain, low confidence and no previous falls. Does not have sarcopenia.
The patient report no chronic pain,
Weight loss, anaemia and raised inflammatory markers of unknown aetiology 
Low mood secondary to poor mobility 
Breathlessness and elevated BNP awaiting echo 
Chronic back pain with degenerative changes on MRI 
Urinary frequency and incontinence 

The patient shows no difficulty walking.

The patient has low muscle bulk.

Thank you for referring Mr Smith who attended for a face-to-face assessment at the Belsay Clinic accompanied by his niece today. He gives a history of poor mobility and falls; both these problems are longstanding and indeed he was assessed at the Belsay clinic by my colleague Dr Boyle back in 2019 and had a course of physiotherapy at the time. More recently he has lost confidence, his balance is worse and he has fallen more. He is clear that the lack of mobility is his greatest frustration; he is not housebound unless he can be accompanied out of the house by his niece and even then he gets out only in a wheelchair. He thinks that a combination of things are stopping him being more mobile: he feels that he lacks strength, has a fear of falling and also has pain in the front and back of his legs. This is worse at night and keeps him awake at times. His last fall was two months ago. He notes feeling unsteady on standing but on close questioning this sensation did not appear to be consistent with vertigo. 
He also complains of pain across her shoulders starting in the right arm and going across the shoulders to the left arm. This has been present for about six months and is not related to exertion. It is no worse in the morning and he does not feel particularly stiff. He also describes stabbing pain over his right eye that then migrates over the top of his head. This does not seem to be related to stressful events. He has lost a considerable amount of weight - 8kg from March to August this year, and another five kg since as his weight in clinic today was 71.5 kg. His appetite is not as good as it usually is but he denies nausea or vomiting; he avoids constipation by taking laxatives. He does not complain of toothache, does not choke on food or drink and says that he eats reasonably well. He has seen a dietitian. He complains of occasional breathlessness and cough at night and brings up a small quantity of phlegm but this is little changed. He has not noticed any blood in his stools. He complains of a rash on his legs, more on the right than the left, and that her legs are often cool. He notes that this rash has been present for at least 10 years and I also note recent vascular duplex studies that suggest good arterial flow in the legs. He has urinary frequency and is often not aware of when he needs to go to the loo; he also complains of a few minutes of crampy lower abdominal pain after micturition. Perhaps unsurprisingly he is somewhat low in mood but still enjoys going out and seeing people. He denies suicidal ideation or early morning waking. He admits that he tends to live in the past more nowadays. He does not complain of any subjective memory problems. 
On examination, he was alert and engaged well with the consultation. There was no jaundice, anaemia, cyanosis, clubbing or lymphadenopathy. There was a confluent discolouration on both lower legs which was cool to the touch, not raised or tender and was not blanching. Similar changes were seen above the knees but looked more petechial in nature. Heart sounds were normal, JVP was not raised and the chest was clear. Abdominal examination revealed a 2cm liver edge but no ascites or masses. There was an old scar noted on his left upper arm where a basal cell carcinoma has been removed previously. He was mildly tender across the shoulders and in the thoracic paraspinal muscles; tenderness was not confined to the spine. He was able to raise her arms above her head. Neurological examination revealed normal tone, power and coordination. Cranial nerve examination was normal with no nystagmus. He was able to rise unaided from a chair quite quickly but was very reluctant to take a step forward. However he could walk a few paces with support from one person albeit unsteadily. There was no bradykinesia or tremor noted. 
His SarcScreen revealed a SARC-F score of 10/10, and Fried frailty score of 2/5 denoting prefrailty. He was unable to attempt the 3m walk but his maximum grip strength was 22 kg. GDS was 10/15 and MMSE was 28/30. An active stand showed a blood pressure of 141/85 with no significant drop although he was unsteady on standing. His 12 lead electrocardiogram showed atrial fibrillation at 80 per minute with nil else of note. 
There are clearly a complex set of interlocking problems here. He is rather stronger than he thinks he is on although his balance is poor, I think it is a lack of confidence rather than a lack of strength that is preventing him from mobilising more. He accepts this and is happy for us to refer to physiotherapy for strength and balance training which I think will help build his confidence. His mood is low but not sufficiently low to denote depression and I think if we can improve his mobility, his mood will improve. Some of his chronic back pain is undoubtedly due to the degenerative changes seen on MRI and I understand that surgery is not going to be an option for this. I would be reluctant to change her painkillers much at the moment though given that he is already on antiepileptic medications and so adding in other agents for chronic pain may not help much but might interact with these medications. 
We discussed the fine balance between benefits and side effects of all of these medications today. What would perhaps be helpful though is to reduce the codeine to 15 mg so that he can try and take this more regularly; he finds that the 30 mg dose makes him feel very unsteady on her feet. Having said all of this, it is clear that the blood tests suggest some issues that require further investigation. His breathlessness on exertion and raised BNP suggest that investigation for LV systolic dysfunction causing heart failure would be beneficial and I note that he has already had an echo requested. His weight loss is quite dramatic and this, together with the admittedly longstanding rash on his legs, anaemia, raised ESR and high platelet counts suggest that there is an inflammatory disorder present. Whether this is autoimmune or due to another aetiology is unclear. I have requested bloods today including repeat U&Es, liver function tests, calcium, myeloma screen, CRP and ESR, autoimmune screen, creatine kinase and iron studies. I have also requested an ultrasound of the abdomen to investigate the liver edge that I could feel; I note that a recent chest x-ray showed normal heart size and clear lung fields. Despite the possible finding of horizontal nystagmus by Community Nursing colleagues I could not find any evidence of this today and I don’t think we need to progress to brain scanning at present. 
Once we have got the blood and ultrasound results, I will phone Mr Smith again and discuss the best way forward. It may be that if we do not find another cause for his inflammatory condition, a trial of steroids might be warranted as some of the features including the shoulder discomfort could be consistent with polymyalgia rheumatica. I have explained this today but also explained that I think we need to investigate further before diving in with treatment in case this is not the diagnosis. "
'''

#apply nlp pipeline to letter_nA
doc_nA = nlp(letter_nA)

In [146]:
print(doc_nA)

Diagnoses: Poor mobility due to chronic pain, low confidence and no previous falls. Does not have sarcopenia.
The patient report no chronic pain,
Weight loss, anaemia and raised inflammatory markers of unknown aetiology 
Low mood secondary to poor mobility 
Breathlessness and elevated BNP awaiting echo 
Chronic back pain with degenerative changes on MRI 
Urinary frequency and incontinence 

The patient shows no difficulty walking.

The patient has low muscle bulk.

Thank you for referring Mr Smith who attended for a face-to-face assessment at the Belsay Clinic accompanied by his niece today. He gives a history of poor mobility and falls; both these problems are longstanding and indeed he was assessed at the Belsay clinic by my colleague Dr Boyle back in 2019 and had a course of physiotherapy at the time. More recently he has lost confidence, his balance is worse and he has fallen more. He is clear that the lack of mobility is his greatest frustration; he is not housebound unless he can

In [147]:
#Visualise weakness entities in doc_nA

def get_entity_options():
  entities = ["WEAKNESS TERM", "MUSCLE BULK LOSS", "SARCOPENIA", "SARC-F"]
  colors = {"WEAKNESS TERM": 'linear-gradient(90deg, #ffff66, #ff6600)', "MUSCLE BULK LOSS": 'linear-gradient(90deg, #b6daf9, #95b5e2)', "SARCOPENIA": 'linear-gradient(90deg, #aa9cfc, #fc9ce7)',
            "SARC-F": 'linear-gradient(180deg, #66ffcc, #abf763)'}
  options = {"ents": entities, "colors": colors}
  return options
options = get_entity_options()

displacy.render(doc_nA, style = 'ent', options=options, jupyter = True)

In [148]:
# View any negations in letter_nA, True indicates a negation
for e in doc_nA.ents:
  print(e.text, e._.negex)

Poor mobility False
sarcopenia True
poor mobility False
low muscle bulk False
poor mobility False
lack of mobility False
housebound True
lacks strength False
SARC-F False
lack of strength False


Now highlight negative entities

In [149]:
def add_neg_entities(doc):
    new_ents = []
    for ent in doc.ents:
        # Only check for entity if negex is true
        if ent._.negex:
            #print(ent.label_)
            #new_ent = Span(doc, ent.start, ent.end, label=ent.label, label_="NEG_ENTITY")
            ent.label_="NEG_ENTITY"
            
        new_ents.append(ent)
    
    doc.ents = new_ents
    return doc
    
doc_nAA = add_neg_entities(doc_nA)

In [150]:
#Visualise entities in Letter nA - weakness related phrases are recongized
#function to modify options for displacy NER visualization
def get_entity_options():
    entities = ["WEAKNESS TERM", "SARCOPENIA", "SARC-F","NEG_ENTITY"]
    colors = {"WEAKNESS TERM": 'linear-gradient(90deg, #ffff66, #ff6600)', "MUSCLE BULK LOSS": 'linear-gradient(90deg, #e28ac8, #c889dd)', "SARCOPENIA": 'linear-gradient(90deg, #aa9cfc, #fc9ce7)',
            "SARC-F": 'linear-gradient(180deg, #66ffcc, #abf763)', "NEG_ENTITY":'linear-gradient(0deg, rgba(255,0,0,0), rgba(255,0,0,1))'}
    options = {"ents": entities, "colors": colors}    
    return options
options = get_entity_options()

displacy.render(doc_nAA, style='ent', options=options, jupyter = True)

In [151]:
#Create list of entities in doc_nA
ent_nA_list = []

for ent in doc_nA.ents:
  print(ent.text, ent.label_)

Poor mobility WEAKNESS TERM
sarcopenia NEG_ENTITY
poor mobility WEAKNESS TERM
low muscle bulk MUSCLE BULK LOSS
poor mobility WEAKNESS TERM
lack of mobility WEAKNESS TERM
housebound NEG_ENTITY
lacks strength WEAKNESS TERM
SARC-F SARC-F
lack of strength WEAKNESS TERM


# Create dataframe with document features

In [153]:
#Append lists for feature text and labels
feature_text = []
feature_label = []

for ent in doc_nA.ents:
  feature_text.append(ent.text)
  feature_label.append(ent.label_)

#Zip these lists together and use to create a dataframe
features_list = list(zip(feature_text, feature_label))

document_features = pd.DataFrame(features_list, columns=["Feature text", "Feature label"])

#View head of dataframe
document_features

Unnamed: 0,Feature text,Feature label
0,Poor mobility,WEAKNESS TERM
1,sarcopenia,NEG_ENTITY
2,poor mobility,WEAKNESS TERM
3,low muscle bulk,MUSCLE BULK LOSS
4,poor mobility,WEAKNESS TERM
5,lack of mobility,WEAKNESS TERM
6,housebound,NEG_ENTITY
7,lacks strength,WEAKNESS TERM
8,SARC-F,SARC-F
9,lack of strength,WEAKNESS TERM
