## Task : Automatically Identify Numerical and Nominal Entities,Age from free text (clinical EHR)


### NLP(Natural Language Processing)technique can be used to identify Entities from free text.SpaCy is an open-source python library used for Natural Language Processing(NLP). 
Named Entity Recognition(NER) is the NLP task that recognizes entities in a given text. NER is a model which performs two tasks: Detect and Categorize

### Build a Custom NER model using spaCy 3.0  

Followed the blog and the link to it is [https://turbolab.in/build-a-custom-ner-model-using-spacy-3-0/]


### Load a spaCy model and check if it has ner pipeline

#### Install Spacy

In [4]:
#!pip install spacy
#!python -m spacy download en

In [11]:
## import the library
#import spacy
#import collections
#from collections import Counter

#### Check the spaCy Version

In [46]:
!python -m spacy info

[1m

spaCy version    3.1.4                         
Location         /home/sobha/anaconda3/lib/python3.7/site-packages/spacy
Platform         Linux-5.11.0-44-generic-x86_64-with-debian-bullseye-sid
Python version   3.7.6                         
Pipelines        en_core_web_sm (3.1.0)        



SpaCy provides the following four pre-trained models with MIT license for the English language:

en_core_web_sm(12 mb)
en_core_web_md(43 mb)
en_core_web_lg(741 mb)
en_core_web_trf(438 mb)

### Load spaCy's en_core_web_sm' model

In [2]:
#!python -m spacy download en_core_web_sm


In [48]:
# import the language training model
import spacy 
nlp = spacy.load('en_core_web_sm')
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

## ner is in the pipeline, let’s test how the entity detection will work on a sentence.

In [49]:
sentence = 'Aged 20 or older, myocardial ischemia, able to undergo PTCA, stenting and CABG'
doc = nlp(sentence)

from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

Currently the spacy 'en_core_web_sm' model recognizes the phrase' 20 or older' as DATE but the current goal is to recognise 'Aged 20 or older ' as Numeric Entity and 'myocardial ischemia' as Nominal Entity.So inorder to achieve this a custom NER model has to be built.

### Observing doc to see how entities are being identified/tagged by the model.

In [50]:
[(X, X.ent_iob_, X.ent_type_) for X in doc if X.ent_type_]

[(20, 'B', 'DATE'),
 (or, 'I', 'DATE'),
 (older, 'I', 'DATE'),
 (PTCA, 'B', 'ORG')]

## Steps to build the custom NER model for detecting the Numeric and Nominal Entities in spaCy 3.0:

### 1. Annotate the data to train the model.

#### In order to do annotate the data i have used SpaCy NER annotation tool by agateteam- http://agateteam.org/spacynerannotate/ 

#### Asssigned the annotated data to "trainData"

In [66]:
trainData=[
("Aged 22 and older, undergoing 1 or 2 level spinal decompression",{"entities":[(0,18,"Numeric")]}),
("Age 50 or over Diagnosed Giant Cell Arthritis Headache, jaw pain, vision loss Shoulder and/or hip pain",{"entities":[(0,15,"Numeric"),(25,46,"Nominal")]}),
("Aged between 50-80.Have been diagnosed with Polymyalgia Rheumatica (PMR).Have experienced improvement in PMR symptoms with Prednisone.",{"entities":[(0,18,"Numeric"),(44,72,"Nominal")]}),
("Aged 18 years or older Diagnosis of Ankylosing Spondylitis or Axial Spondyloarthritis",{"entities":[(0,23,"Numeric"),(36,85,"Nominal")]}),
("Aged over 18 years Confirmed diagnosis of bronchiectasis within 5 years Pulmonary exacerbation requiring antibiotics in the past 12 months",{"entities":[(0,18,"Numeric"),(42,56,"Nominal")]}),
("Aged between 40 and 85 years Diagnosed with COPD A history of cardiovascular disease, including heart failure, ischaemic heart disease, tachyarrhythmias, and hypertension",{"entities":[(0,29,"Numeric"),(44,48,"Nominal")]}),
("Aged 18-85, diagnosed non-cystic fibrosis bronchiectasis, two or more exacerbations in the past 12 months.",{"entities":[(0,10,"Numeric"),(22,56,"Nominal")]}),
("Aged 18 or older, diagnosis of IgAN, on a stable dose of an ACE inhibitor or ARB / unable to tolerate this therapy.",{"entities":[(0,16,"Numeric"),(31,35,"Nominal")]}),
("Aged 18 years or older, hypertensive, no more than 3 current blood-pressure lowering medications.",{"entities":[(0,23,"Numeric"),(24,36,"Nominal")]}),
(" 18 years of age, undergoing a surgical procedure resulting in a closed approximately linear incision",{"entities":[(1,16,"Numeric")]}),
("Are 18-50 years of age. Have a current/recent skin infection.No ongoing illness or history of serious chronic/progressive disease.",{"entities":[(0,22,"Numeric"),(24,60,"Nominal")]}),
("Healthy woman aged between 18 and 49 years of age Pregnant with no known increased risk for complications Singleton pregnancy",{"entities":[(14,49,"Numeric")]}),
("Aged between 5 - 14 years Newly diagnosed with Acute rheumatic fever",{"entities":[(0,25,"Numeric"),(47,68,"Nominal")]}),
("Viral respiratory disease Acute presentation to Middlemore Emergency Department or Intensive care unit",{"entities":[(0,26,"Nominal")]}),
("Age>=18 Diagnosis of Essential Thrombocythemia Failure of standard therapy, e.g. Hydroxyurea",{"entities":[(0,7,"Numeric"),(31,46,"Nominal")]}),
("Age >= 18 Diagnosed with Relapsed or Refractory Multiple Myeloma.",{"entities":[(0,10,"Numeric"),(25,65,"Nominal")]}),
("Age >= 18 Diagnosed with Myelodysplastic Syndrome (MDS)Anemia",{"entities":[(0,9,"Numeric"),(25,61,"Nominal")]}),
("age >= 18 Newly diagnosed multiple myeloma Ineligible for autologous stem cell transplant No prior treatment for multiple myeloma except localised radiotherapy or a short course of steroids",{"entities":[(0,9,"Numeric"),(26,43,"Nominal")]}),
("Age>=18 Diagnosed with Relapsed or Refractory Multiple Myeloma No long-term use of Prednisone",{"entities":[(0,7,"Numeric"),(23,63,"Nominal")]}),
("Aged 18 to 65 Chronic Hepatitis B infection On stable NUC treatment (e.g Entecavir orTenofovir)",{"entities":[(0,13,"Numeric"),(13,43,"Nominal")]}),
("Age less than 65 years, have diagnosed ulcerative colitis and experiencing increase in daily stool frequency and/or rectal bleeding, no allergy to antibiotic vancomycin, willing to undergo two colonoscopy procedures as part of the study (3 months apart)",{"entities":[(0,22,"Numeric"),(39,57,"Nominal")]}),
("Aged between 18-70 years old Diagnosis of Non-alcoholic Fatty Liver Disease Have a BMI between 25 to 50 kg/m2,",{"entities":[(0,28,"Numeric"),(56,75,"Nominal"),(83,109,"Numeric")]}),
("Male or postmenopausal female, aged 18-80 years old, diagnosed Nonalcoholic Steatohepatitis, no other chronic liver disease",{"entities":[(31,91,"Nominal")]}),
 ("Type 1 Myocardial infarction Have at least two coronary artery territories of disease > 50% Be on treatment for Diabetes",{"entities":[(0,29,"Nominal"),(47,91,"Nominal"),(112,120,"Nominal")]}),
("Aged 20 or older, myocardial ischemia, able to undergo PTCA, stenting and CABG,recent travel history, able to speak English",{"entities":[(0,16,"Numeric"),(18,37,"Nominal")]}),
("Patients with biopsy-proven metastatic carcinoid tumors or other neuroendocrine tumors (Islet cell, Gastrinomas and VIPomas) with at least one measurable lesion (other than bone) that has either not been previously irradiated or if previously irradiated has demonstrated progression since the radiation therapy ",{"entities":[(14,20,"Nominal"),(28,55,"Nominal"),(65,87,"Nominal"),(87,124,"Nominal"),]}),
("Female patients must have a negative serum pregnancy test at screening. (Not applicable to patients with bilateral oophorectomy and/or hysterectomy or to those patients who are postmenopausal.) ",{"entities":[(37,52,"Nominal"),(105,127,"Nominal"),(135,147,"Nominal"),(177,191,"Nominal")]}),
("Must have a life expectancy of greater than three (3) months ",{"entities":[(11,60,"Numeric")]}),
("Patients on Sandostatin Lar (long acting somatostatin analogue) must be on a stable dose for 30 days prior to study entry and short acting somatostatin analogues must be judged to be on a clinically stable dose by the investigator prior to study entry ",{"entities":[(12,27,"Nominal"),(29,62,"Nominal"),(139,151,"Nominal")]}),
("The patient has no major impairment of renal or hepatic function, as defined by the following laboratory parameters: total bilirubin <1.5 X ULN; AST, ALT<2.5X ULN (<5 X ULN if liver metastases are present) ",{"entities":[(39,44,"Nominal"),(48,64,"Nominal"),(117,172,"Numeric")]}),
("Patients with biopsy-proven metastatic carcinoid tumors or other neuroendocrine tumors (Islet cell, Gastrinomas and VIPomas) with at least one measurable lesion (other than bone) that has either not been previously irradiated or if previously irradiated has demonstrated progression since the radiation therapy ",{"entities":[(14,20,"Nominal"),(28,55,"Nominal"),(65,87,"Nominal"),(87,124,"Nominal")]}),
("Have had one prior platinum-based chemotherapy regimen for the treatment of primary disease. ",{"entities":[(19,54,"Nominal")]}),
("Females of childbearing potential: negative serum or urine pregnancy test ",{"entities":[(0,49,"Numeric"),(72,153,"Numeric"),(181,244,"Numeric")]}),
("Serum creatinine less than or equal to 2.0 mg/dL (Note: Patients with a serum creatinine greater than or equal to 1.4 and less than or equal to 2.0 mg/dL must demonstrate a 24-hour urinary creatinine clearance greater than or equal to 50 mL/min) ",{"entities":[(0,49,"Numeric")]}),
("Serum bilirubin less than or equal to 1.5 x institutional upper limit of normal (ULN) ",{"entities":[(0,85,"Numeric")]}),
("Platelet count greater than or equal to 100 x 10^9/L ",{"entities":[(0,52,"Numeric")]}),
("Absolute neutrophil count (ANC) greater than or equal to 1.5 x 10^9/L without growth factor use in the 2 weeks before study randomization ",{"entities":[(0,70,"Numeric"),(103,110,"Numeric")]}),
("18 years of age or older ",{"entities":[(0,24,"Numeric")]}),
("Life expectancy greater than or equal to 6 months ",{"entities":[(0,49,"Numeric")]}),
("Unresectable (locally advanced) stage IIIa or IIIb disease ",{"entities":[(32,58,"Nominal")]}),
("Patients with a histologically or cytologically proven diagnosis of NSCLC ",{"entities":[(68,73,"Nominal")]}),
("Meet DSM-IV criteria for BPD as assessed by the Structured Clinical Interview for DSM-IV Personality Disorders (SCID-II). ",{"entities":[(5,20,"Nominal"),(25,28,"Nominal"),(82,121,"Nominal")]}),
("Be between age 18 and 55 years ",{"entities":[(3,30,"Numeric")]}),
("Completion of a 14-week open label trial of one the following SRI's: fluoxetine 80 mg/day, paroxetine 60 mg/day, fluvoxamine 300 mg/day, clomipramine 250 mg/day, sertraline 200 mg/day, citalopram 60 mg/day, escitalopram 30 mg/day and demonstrating a non or partial responses to SRI treatment (CGI-I of 3 or 4, Y-BOCS reduction of < 35%) ",{"entities":[(69,89,"Numeric"),(91,111,"Numeric"),(113,135,"Numeric"),(137,160,"Numeric"),(162,183,"Numeric"),(185,205,"Numeric"),(207,229,"Numeric")]}),
("Outpatient with primary DSM- IV OCD ",{"entities":[(24,35,"Nominal")]}),
("Patients must have a predicted life expectancy of at least 12 weeks. ",{"entities":[(30,68,"Numeric")]}),
("Tissue from tumor must be available. This may be paraffin embedded tissue from previous biopsy/resection or if it is not available, a repeat biopsy must be performed. The requirement for biopsy may be waived if alpha-fetoprotein is greater than 500 ng/mL and in the investigators, opinion not explained by a concurrent hepatic inflammatory process. ",{"entities":[(12,18,"Nominal"),(211,255,"Numeric")]}),
("Patients must have a pre-treatment granulocyte count (i.e., segmented neutrophils + bands) of greater than or equal to 1,500/mm3, a hemoglobin level of greater than or equal to 9 gm/dl, and platelet count greater than or equal to 50,000/mm3. The granulocyte requirement may be waived if in the investigator's opinion the lower count reflects hypersplenism with adequate bone marrow reserves. ",{"entities":[(35,128,"Numeric"),(132,184,"Numeric"),(190,240,"Numeric")]}),
("Patients must have a predicted life expectancy of at least 12 weeks. ",{"entities":[(30,68,"Numeric")]}),
("Tissue from tumor must be available. This may be paraffin embedded tissue from previous biopsy/resection or if it is not available, a repeat biopsy must be performed. The requirement for biopsy may be waived if alpha-fetoprotein is greater than 500 ng/mL and in the investigators opinion not explained by a concurrent hepatic inflammatory process. ",{"entities":[(12,18,"Nominal"),(211,255,"Numeric")]}),
("7. Significant stenosis has been defined as a stenosis of more than 50% in luminal diameter (in at least one view, on visual interpretation or preferably by QCA); ",{"entities":[(15,23,"Nominal")]}),
("6. Total occluded vessels. One total occluded major epicardial vessel or side branch can be included and targeted as long as one other major vessel has a significant stenosis amenable for SA, provided the age of occlusion is less than one month e.g. recent instability, infarction with ECG changes in the area subtended by the occluded vessel. Patients with total occluded vessels of unknown duration or existing longer than one month and a reference over 1.50 mm should not be included, not even as a third or fourth vessel to be dilated; ",{"entities":[(0,24,"Nominal"),(26,42,"Nominal"),(47,67,"Nominal")]}),
("Single or twin pregnancies ",{"entities":[(0,26,"Nominal")]}),
("Pregnant women with abdomen discumfort and ultrasound diagnosis of polyhydramnios (AFI>25cm) ",{"entities":[(0,8,"Nominal")]}),
("No prior treatment with Ventavis or other active treatments for primary pulmonary hypertension within 6 weeks of date of study inclusion (unless otherwise advised by Bayer Schering Pharma) ",{"entities":[(24,32,"Nominal")]}),
("Patient with primary pulmonary hypertension (i.e. Idiopathic Pulmonary Arterial Hypertension or Familial Pulmonary Arterial Hypertension) and classified as NYHA functional class III (NYHA = New York Heart Association) ",{"entities":[(20,43,"Nominal"),(50,92,"Nominal"),(95,136,"Nominal")]}),
("The treating physician has chosen Ventavis as a suitable long-term treatment for the patient ",{"entities":[(34,42,"Nominal"),(43,92,"Nominal")]}),
("Patients suspected to have vitamin B12 deficiency defined as a plasma vitamin B12 below the reference interval (<200 pmol/L). ",{"entities":[(27,49,"Nominal"),(50,124,"Nominal")]}),
("Body mass index 25-35 kg/m2 ",{"entities":[(0,27,"Numeric")]}),
("Aged at least 18 years with an ability and willingness to give written informed consent. ",{"entities":[(0,22,"Numeric")]}),
("Sufficient number of umbilical cord blood units available for transplantation ",{"entities":[(16,26,"Nominal"),(30,40,"Nominal")]}),
("Women between 40 to 70 years of age. ",{"entities":[(6,35,"Numeric")]}),
("Histologically proven recurrent or persistent endometrial cancer that is not amenable to curative treatment with surgery and/or radiation therapy AND has failed 2 previous treatment regimens ",{"entities":[(22,64,"Nominal")]}),
("Measurable metastatic disease ",{"entities":[(11,29,"Nominal")]}),
("Primary tumor must have been diagnosed histologically as either epithelial ovarian cancer, fallopian tube cancer, or primary peritoneal cancer (not borderline or low malignant potential epithelial carcinoma). ",{"entities":[(0,14,"Nominal"),(64,89,"Nominal"),(91,112,"Nominal"),(117,142,"Nominal"),(186,206,"Nominal")]}),
("Measurable metastatic disease as defined by Response Evaluation Criteria in Solid Tumors (RECIST) ",{"entities":[(11,29,"Nominal")]}),
("Progression on prior therapy with a hormonal agent if estrogen receptor or progesterone receptor positive, and/or with trastuzumab if HER2-neu positive. If patient has progressed through hormone or trastuzumab therapy only, must have received one chemotherapy regimen. ",{"entities":[(119,130,"Nominal")]}),
("Metastatic cervical cancer (CX) ",{"entities":[(0,31,"Nominal")]}),
("Metastatic endometrial cancer (EM) ",{"entities":[(0,34,"Nominal")]}),
("Metastatic ovarian cancer (OV) ",{"entities":[(0,30,"Nominal")]}),
("Metastatic breast cancer (BR)",{"entities":[(0,29,"Nominal")]}),
("Diagnosis of one of the following malignancies: ",{"entities":[(34,46,"Nominal")]}),
("Body Mass Index (BMI) >21 kg/m^2 and <35 kg/m^2. ",{"entities":[(0,48,"Numeric")]}),
("Treated with a stable dose of one of the following for at least 3 months prior to screening: * >=1000 mg/day immediate-release metformin; or metformin >=1000 mg/day and sulfonylurea; or sulfonylurea/metformin combination therapy. ",{"entities":[(94,136,"Numeric"),(141,164,"Numeric"),(169,181,"Nominal"),(186,198,"Nominal"),(199,228,"Nominal")]}),
(" Aged 20 or older,",{"entities":[(1,17,"Numeric")]}),
(" AGE>18 years",{"entities":[(1,13,"Numeric")]}),
(" Age>18 years age ,",{"entities":[(1,18,"Numeric")]}),
(" age>18 years age ,",{"entities":[(1,18,"Numeric")]}),
(" Aged above 18 years and under 65 years",{"entities":[(1,39,"Numeric")]}),
(" Aged over 18 years and under 65 years",{"entities":[]}),
(" Aged above 18 years and below 65 years",{"entities":[(1,39,"Numeric")]}),
(" aged above 18 years and below 65 years",{"entities":[(1,39,"Numeric")]}),
(" aged above 18 and under 65",{"entities":[(1,27,"Numeric")]}),
(" aged above 18 and below 65",{"entities":[(1,27,"Numeric")]}),
(" aged above 18",{"entities":[(1,14,"Numeric")]}),
(" AGE above 10",{"entities":[(1,13,"Numeric")]}),
(" age above 10",{"entities":[(1,13,"Numeric")]}),
(" AGE below 10",{"entities":[(1,13,"Numeric")]}),
(" age below 10",{"entities":[(1,13,"Numeric")]}),
(" aged below 20",{"entities":[(1,14,"Numeric")]}),
(" aged 60 and under",{"entities":[(1,18,"Numeric")]}),
(" aged 20 or younger",{"entities":[(1,19,"Numeric")]}),
("aged 20",{"entities":[(0,7,"Numeric")]}),
(" Age<65",{"entities":[(0,6,"Numeric")]}),
(" age less than 65 years old",{"entities":[(1,27,"Numeric")]}),
(" age less than 65 yrs",{"entities":[(1,21,"Numeric")]}),
(" Age less than 65 years,",{"entities":[(1,24,"Numeric")]}),
(" age 18 to 80 years old",{"entities":[(1,23,"Numeric")]}),
(" Aged 18-80 years",{"entities":[(1,17,"Numeric")]}),
(" aged 18-80 years old,",{"entities":[(1,21,"Numeric")]}),
(" age under 66 yrs",{"entities":[(1,17,"Numeric")]}),
(" AGE under 66",{"entities":[(1,13,"Numeric")]}),
(" Age under 66",{"entities":[(1,13,"Numeric")]}),
(" age under 66",{"entities":[(1,13,"Numeric")]}),
(" age under 66 years",{"entities":[(1,19,"Numeric")]}),
(" aged over 18 yrs",{"entities":[(1,17,"Numeric")]}),
(" aged over 18 years",{"entities":[(1,19,"Numeric")]}),
(" aged over 18",{"entities":[(1,13,"Numeric")]}),
(" Aged over 18",{"entities":[(1,13,"Numeric")]}),
(" age over 18",{"entities":[(1,12,"Numeric")]}),
(" Age over 18",{"entities":[(1,12,"Numeric")]}),
(" age <=65 yrs",{"entities":[(1,13,"Numeric")]}),
(" age<=65yrs",{"entities":[(1,11,"Numeric")]}),
(" age<18yrs",{"entities":[(1,10,"Numeric")]}),
(" age>=18yrs",{"entities":[(1,11,"Numeric")]}),
(" age>=18 yrs",{"entities":[(1,12,"Numeric")]}),
(" age >18 yrs",{"entities":[(1,12,"Numeric")]}),
(" age greater than 18 years and less than 65 years",{"entities":[(1,49,"Numeric")]}),
(" age greater than 18 and less than 65 years",{"entities":[(1,43,"Numeric")]}),
(" age less than 65 years",{"entities":[(1,23,"Numeric")]}),
(" age lesser than 45 years",{"entities":[(1,25,"Numeric")]}),
(" age greater than 18 years",{"entities":[(1,26,"Numeric")]}),
(" age greater than 18",{"entities":[(1,20,"Numeric")]}),
(" aged greater than 18",{"entities":[(1,21,"Numeric")]}),
(" Age greater than 18",{"entities":[(1,20,"Numeric")]}),
(" Aged above 18 years and under 65 years",{"entities":[(1,39,"Numeric")]}),
("Aged over 18 years and under 65 years",{"entities":[(0,37,"Numeric")]}),
(" aged between 18 and 65 years",{"entities":[(1,29,"Numeric")]}),
(" aged between 18 and 65",{"entities":[(1,23,"Numeric")]}),
(" Aged between 18 to 65",{"entities":[(1,22,"Numeric")]}),
(" aged 18 to 65",{"entities":[(1,14,"Numeric")]}),
(" Aged 18 to 65,",{"entities":[(1,14,"Numeric")]}),
(" AGE<=65",{"entities":[(1,8,"Numeric")]}),
(" age<=65",{"entities":[(1,8,"Numeric")]}),
(" age<65",{"entities":[(1,7,"Numeric")]}),
(" Age<=65",{"entities":[(1,8,"Numeric")]}),
(" Age<65",{"entities":[(1,7,"Numeric")]}),
(" aged between 5-65",{"entities":[(1,18,"Numeric")]}),
(" age between 5-65",{"entities":[(1,17,"Numeric")]}),
(" aged between 5-64 years",{"entities":[(1,24,"Numeric")]}),
(" Aged between 5-64 years,",{"entities":[(1,24,"Numeric")]}),
(" age > 18 < 65 years",{"entities":[(1,20,"Numeric")]}),
(" age>18<65years",{"entities":[(1,15,"Numeric")]}),
(" age>18<65",{"entities":[(1,10,"Numeric")]}),
(" age < = 65 years",{"entities":[(1,17,"Numeric")]}),
(" age < = 18 years",{"entities":[(1,17,"Numeric")]}),
(" age > = 18 years old",{"entities":[(1,21,"Numeric")]}),
(" age > = 18 years",{"entities":[(1,17,"Numeric")]}),
(" age > = 18",{"entities":[(1,11,"Numeric")]}),
(" AGE>=18",{"entities":[(1,8,"Numeric")]}),
(" age >= 18",{"entities":[(1,10,"Numeric")]}),
(" age>=18",{"entities":[(1,8,"Numeric")]}),
(" Age 18,",{"entities":[(2,8,"Numeric")]}),
(" age from 18-65 years",{"entities":[(1,21,"Numeric")]}),
(" age from 18-65",{"entities":[(1,15,"Numeric")]}),
(" aged from 18 to 65",{"entities":[(1,19,"Numeric")]}),
("AGED from 18 to 65",{"entities":[(0,18,"Numeric")]}),
(" Aged 18 years or older, ",{"entities":[(1,23,"Numeric")]}),
(" Aged between 5 - 14 years,",{"entities":[(1,26,"Numeric")]}),
(" aged from 18 to 65 years",{"entities":[(1,25,"Numeric")]}),
(" ages from 18 to 65 years",{"entities":[(1,25,"Numeric")]}),
(" ages from 18 to 65",{"entities":[(1,19,"Numeric")]}),
(" ages 18-65",{"entities":[(1,11,"Numeric")]}),
(" age 50-65",{"entities":[(1,10,"Numeric")]}),
("age 10-15",{"entities":[(0,9,"Numeric")]}),
(" ages 18 or younger",{"entities":[(1,19,"Numeric")]}),
(" Aged 18 years or younger, ",{"entities":[(1,25,"Numeric")]}),
("between ages 18-85 years of age",{"entities":[(0,31,"Numeric")]}),
("Male and females between ages 18-85 years of age",{"entities":[(17,47,"Numeric")]}),   
("type 2 diabetic, age 18 and over, informed consent, ",{"entities":[(0,15,"Nominal"),(17,32,"Numeric")]}),
("Body Mass Index (BMI) >21 kg/m^2 and <35 kg/m^2.",{"entities":[(0,47,"Numeric")]}),
("Estimated life expectancy of more than 3 months ",{"entities":[(0,47,"Numeric")]}),
("Age 18 to 70 years old",{"entities":[(0,22,"Numeric")]}),
("Pathologically proven unresectable adenocarcinoma of stomach ",{"entities":[(34,60,"Nominal")]}),
("Adequate bone marrow function(absolute neutrophil count [ANC] 1,500/L, hemoglobin 9.0 g/dL,and platelets 100,000/L)Adequate kidney function (serum creatinine < 1.5 mg/dL)",{"entities":[(39,69,"Numeric"),(71,90,"Numeric"),(94,115,"Numeric"),(141,169,"Numeric"),(0,29,"Nominal"),(115,140,"Nominal")]}),
("Estimated life expectancy of more than 3 months ",{"entities":[(0,47,"Numeric")]}),
("Age 18 to 70 years old",{"entities":[(0,22,"Numeric")]}),
("Pathologically proven unresectable adenocarcinoma of stomach ",{"entities":[(34,60,"Nominal")]}),
("No prior radiation therapy for at least 4 weeks before enrollment in the study ",{"entities":[(9,27,"Nominal")]}),
("No prior chemotherapy but prior adjuvant chemotherapy finished at least 6 months before enrollment was allowed. (but, prior adjuvant chemotherapy with capecitabine or S-1 or camptothecin analogues was excluded) ",{"entities":[(9,21,"Nominal"),(32,53,"Nominal")]}),
("Adequate liver function (serum total bilirubin < 2 times the upper normal limit (UNL); serum transaminases levels <3 times [<5 times for patients with liver metastasis] UNL) ",{"entities":[(0,23,"Nominal"),(25,85,"Numeric"),(87,123,"Numeric"),(151,167,"Nominal")]}),
("Adequate bone marrow function(absolute neutrophil count [ANC] 1,500/L, hemoglobin 9.0 g/dL,and platelets 100,000/L)Adequate kidney function (serum creatinine < 1.5 mg/dL)",{"entities":[(39,69,"Numeric"),(71,90,"Numeric"),(94,115,"Numeric"),(141,169,"Numeric"),(0,29,"Nominal"),(115,140,"Nominal")]}),
("Estimated life expectancy of more than 3 months ",{"entities":[(0,47,"Numeric")]}),
("Age 18 to 70 years old",{"entities":[(0,22,"Numeric")]}),
("Pathologically proven unresectable adenocarcinoma of stomach ",{"entities":[(34,60,"Nominal")]}),
("Male or female between, and including, 6-12 weeks (42 to 90 days) of age at the time of the first vaccination. ",{"entities":[(0,5,"Nominal"),(8,14,"Nominal"),(38,72,"Numeric")]}),
("Born after a gestation period between 36 and 42 weeks",{"entities":[(0,54,"Numeric")]}),
("(Born after a gestation period between 36 and 42 weeks.,{entities:[]}),",{"entities":[(0,53,"Numeric")]}),
("Subjects for whom the investigator believes that their parents/guardians can and will comply with the requirements of the protocol ",{"entities":[(0,53,"Numeric")]}),
("Male or female between, and including, 6-12 weeks (42 to 90 days) of age at the time of the first vaccination. ",{"entities":[(0,5,"Nominal"),(8,14,"Nominal"),(38,72,"Numeric")]}),
("Women at any age with early stage breast cancer (stage I-II) and American Society of Anesthesiologists (ASA) score of I-II. ",{"entities":[(0,5,"Nominal"),(9,17,"Numeric"),(34,47,"Nominal"),(65,123,"Numeric")]}),
("over 18 years ",{"entities":[(0,13,"Numeric")]}),
("Age: 18 years ",{"entities":[(1,14,"Numeric")]}),
("Adequate liver function - serum total bilirubin concentration less than 1.5 x upper limit of normal value ",{"entities":[(2,63,"Numeric"),(131,155,"Nominal"),(158,236,"Numeric")]}),
("Eastern Cooperative Oncology Group (ECOG) score of 0, 1 or 2 (patients that spend less than 50% of time in bed during the day) ",{"entities":[(0,52,"Numeric")]}),
("Previous chemotherapy or radiotherapy must have been performed 8 weeks prior to study entry. ",{"entities":[(9,22,"Nominal"),(25,38,"Nominal")]}),
("Patients without prostatectomy: 2 consecutive rises in PSA levels relative to a previous reference value, separated by one month. The first measurement must occur one month after the reference value and must be above the reference value. The second confirmatory measurement taken one month after the first measurement must be greater than the first measurement. ",{"entities":[(17,30,"Nominal")]}),
("Patients who have undergone prostatectomy: any rise in PSA or ",{"entities":[(28,41,"Nominal"),(55,58,"Nominal")]}),
("PSA criteria: ",{"entities":[(0,3,"Nominal")]}),
("Prostate cancer patients with a rise in PSA under hormone therapy. ",{"entities":[(0,15,"Nominal"),(50,65,"Nominal")]}),
("Patients with histologically confirmed diagnosis of prostate cancer who have not yet developed bone metastases ",{"entities":[(52,68,"Nominal"),(95,110,"Nominal")]}),("Patient has given written informed consent prior to any study-specific procedures. Patients with psychiatric or addictive disorders which prevent them from giving their informed consent must not enter the study. ",{"entities":[(97,109,"Nominal"),(112,132,"Nominal")]}),
("Age: 18 years ",{"entities":[(1,14,"Numeric")]}),
("Adequate liver function - serum total bilirubin concentration less than 1.5 x upper limit of normal value ",{"entities":[(2,63,"Numeric"),(131,155,"Nominal"),(158,236,"Numeric")]}),
("Eastern Cooperative Oncology Group (ECOG) score of 0, 1 or 2 (patients that spend less than 50% of time in bed during the day) ",{"entities":[(0,52,"Numeric")]}),
("Previous chemotherapy or radiotherapy must have been performed 8 weeks prior to study entry. ",{"entities":[(9,22,"Nominal"),(25,38,"Nominal")]}),
("Patients without prostatectomy: 2 consecutive rises in PSA levels relative to a previous reference value, separated by one month. The first measurement must occur one month after the reference value and must be above the reference value. The second confirmatory measurement taken one month after the first measurement must be greater than the first measurement. ",{"entities":[(17,30,"Nominal")]}),
("Patients who have undergone prostatectomy: any rise in PSA or ",{"entities":[(28,41,"Nominal"),(55,58,"Nominal")]}),
("PSA criteria: ",{"entities":[(0,3,"Nominal")]}),
("Prostate cancer patients with a rise in PSA under hormone therapy. ",{"entities":[(0,15,"Nominal"),(50,65,"Nominal")]}),
("Patients with histologically confirmed diagnosis of prostate cancer who have not yet developed bone metastases ",{"entities":[(52,68,"Nominal"),(95,110,"Nominal")]}),
("Life expectancy 3 months ",{"entities":[(1,25,"Numeric")]}),
("Life expectancy 3 months ",{"entities":[(1,25,"Numeric")]}),
("Cohort 2: Newly-diagnosed high-grade glioma (World Health Organization [WHO] grade 3 or 4) ",{"entities":[(26,43,"Nominal")]}),
 
    
("Pregnant or lactating females. ",{"entities":[(0,8,"Nominal"),(12,21,"Nominal"),(22,29,"Nominal")]}),
("HIV+ patients ",{"entities":[(0,4,"Nominal")]}),
("Patients with a medical or psychiatric illness that would preclude study or informed consent and/or history of noncompliance to medical regimens or inability or unwillingness to return for all scheduled visits ",{"entities":[(16,46,"Nominal")]}),
("Patients with active or suspected acute or chronic uncontrolled infection including abcesses or fistulae ",{"entities":[(34,73,"Nominal"),(84,93,"Nominal"),(95,104,"Nominal")]}),
("Patients taking any experimental therapies history of another malignancy within 5 years prior to study entry except curatively treated non-melanoma skin cancer, prostate cancer, or cervical cancer in situ ",{"entities":[(62,72,"Nominal"),(148,159,"Nominal"),(161,176,"Nominal"),(181,204,"Nominal"),(139,147,"Nominal")]}),
("Patients with severe cardiac insufficiency patients taking Coumadin or other warfarin-containing agents with the exception of low dose warfarin (1 mg or less) for the maintenance of in-dwelling lines or ports ",{"entities":[(21,42,"Nominal")]}),
("Patients with any peripheral neuropathy or unresolved diarrhea greater than Grade 1 ",{"entities":[(18,39,"Nominal"),(43,63,"Nominal")]}),
("Patients who have been previously treated with epothilone ",{"entities":[]}),
("Patients who have been previously treated with radioactive directed therapies ",{"entities":[(47,77,"Nominal")]}),
("Patients with hepatic artery chemoembolization within the last 6 months (one month if there are other sites of measurable disease) ",{"entities":[(14,46,"Nominal")]}),
("Patients with bone metastases as the only site(s) of measurable disease ",{"entities":[(14,29,"Nominal")]}),
("Patients with known brain metastases, unless these metastases have been treated and/or have been stable for at least six months prior to study start. Subjects with a history of brain metastases must have a head CT with contrast to document either response or progression. ",{"entities":[(20,36,"Nominal"),(51,62,"Nominal"),(177,193,"Nominal")]}),
("Patients with symptomatic CNS metastases or leptomeningeal involvement ",{"entities":[(26,40,"Nominal"),(44,70,"Nominal")]}),
("Concurrent severe medical problems unrelated to the malignancy which would limit full compliance with the study. ",{"entities":[(11,34,"Nominal"),(52,63,"Nominal")]}),
("Active uncontrolled infection requiring antibiotics. ",{"entities":[(6,29,"Nominal")]}),
("Concomitant or previous malignancies with the exception of adequately treated basal cell or squamous cell skin cancer, in situ cervical cancer, incidental carcinoid, or other cancer from which the patient has been disease free for 5 years. ",{"entities":[(0,11,"Nominal"),(24,36,"Nominal"),(78,88,"Nominal"),(92,117,"Nominal"),(122,142,"Nominal"),(144,164,"Nominal"),(169,182,"Nominal")]}),
("Received more than one primary chemotherapy regimen. ",{"entities":[(30,43,"Nominal")]}),
("Pregnant or lactating. ",{"entities":[(0,8,"Nominal"),(12,21,"Nominal")]}),
("Women of child-bearing potential that do not practice adequate contraception. ",{"entities":[(0,5,"Nominal"),(63,76,"Nominal"),(9,32,"Nominal")]}),
("(History of pancreatitis",{"entities":[(11,24,"Nominal")]}),
("Previous treatment on this study or with a fibroblast growth factor ,",{"entities":[(42,67,"Nominal")]}),
("Psychological, social, familial, or geographical reasons that would prevent regular follow-up ",{"entities":[(0,13,"Nominal"),(15,56,"Nominal")]}),
("Pregnant or breastfeeding women ",{"entities":[(0,8,"Nominal"),(12,25,"Nominal"),(26,31,"Nominal")]}),
("Known to be sero-positive for human immunodeficiency virus (HIV), hepatitis C virus (HCV), or hepatitis B virus (HBV) ",{"entities":[(30,64,"Nominal"),(66,89,"Nominal"),(94,117,"Nominal")]}),
("Four weeks or less since completion of treatment using an investigational product/device in another clinical study or presence of any unresolved toxicity from previous treatment ",{"entities":[(0,23,"Nominal")]}),
("Presence or history of dysphagia or conditions predisposing to dysphagia (eg, uncontrolled gastroesophageal reflux disease [GERD], dyspepsia, etc) ",{"entities":[(12,32,"Nominal"),(63,73,"Nominal"),(78,122,"Nominal"),(124,128,"Nominal"),(131,140,"Nominal")]}),
("Prior invasive malignancy during the past 3 years other than non-melanomatous skin cancer. Note: Patients with prior surgically-cured malignancies [eg, stage I breast cancer or prostate cancer, in-situ carcinoma of the cervix, etc] are not excluded; however, sponsor approval must be obtained before patient is randomized. ",{"entities":[(15,25,"Nominal"),(61,89,"Nominal"),(134,147,"Nominal"),(153,173,"Nominal"),(177,192,"Nominal"),(194,225,"Nominal")]}),
("Prior chemotherapy, radiotherapy, or surgery for NSCLC ",{"entities":[(6,18,"Nominal"),(20,32,"Nominal"),(49,54,"Nominal")]}),
("Shielding of any part of the esophagus during radiotherapy (including posterior spinal cord shielding) ",{"entities":[(70,101,"Nominal")]}),
("Plan to remove the tumor surgically before completing the protocol chemo/radiotherapy course ",{"entities":[(18,24,"Nominal"),(67,85,"Nominal")]}),
("Pleural or pericardial effusion greater than 100 ml in volume as documented by appropriate imaging (positron emission tomography [PET], computed tomography [CT] scan or ultrasound). If an effusion greater than 100 ml is documented by cytology to be free from malignancy and the investigator feels the patient is capable of receiving chemo/radiotherapy for their primary disease/ NSCLC, the investigator should discuss the patient with the study physician at Amgen. Effusions smaller than 100 ml would be acceptable, unless the investigator suspects that the effusion is malignant, in which case the effusions should be evaluated by cytology. Sponsor approval must be obtained before patient is randomized. ",{"entities":[(0,61,"Numeric"),(188,217,"Numeric"),(259,269,"Nominal")]}),
("Metastatic disease (M1)/stage 4 NSCLC ",{"entities":[(0,37,"Nominal")]}),
("Have an Axis I diagnosis of Schizophrenia, Schizoaffective Disorder, Schizophreniform Disorder or Bipolar I Disorder as diagnosed by the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I), and pertinent subsequent for ruling out exclusionary diagnoses. ",{"entities":[(8,41,"Nominal"),(43,67,"Nominal"),(69,95,"Nominal"),(98,116,"Nominal"),(137,203,"Nominal")]}),
("Are undergoing an acute withdrawal syndrome from drugs or alcohol.",{"entities":[(18,44,"Nominal")]}),
("Are pregnant or lactating. ",{"entities":[(3,13,"Nominal"),(16,25,"Nominal")]}),
("Patients who are currently receiving quetiapine therapy may not undergo a washout period and then restart quetiapine in the study. ",{"entities":[(37,55,"Nominal")]}),
("Patients cannot begin psychotherapy during the study period, but may continue if started prior to the study. ",{"entities":[(22,35,"Nominal")]}),
("Had an unsatisfactory response to a previous adequate trial of quetiapine as judged by a study investigator. ",{"entities":[]}),
("Have an unstable medical disorder as determined by physical examination or laboratory testing. The primary investigator will be responsible for making this judgment based on the above. ",{"entities":[(17,33,"Nominal")]}),
("History of kidney stones",{"entities":[(10,24,"Nominal")]}),
("BMI < 20 ",{"entities":[(0,8,"Numeric")]}),
("Allergy or hypersensitivity to topiramate ",{"entities":[(0,7,"Nominal")]}),
("Cognitive behavioural therapy or additional psychotherapy in past four months ",{"entities":[(0,30,"Nominal"),(44,57,"Nominal")]}),
("Comorbid major depressive disorder diagnosis which predates OCD diagnosis ",{"entities":[(0,35,"Nominal"),(60,73,"Nominal")]}),
("A previous adequate trial of topiramate ",{"entities":[(29,39,"Nominal")]}),
("Any other primary DSM-IV diagnosis; DSM-IV criteria for body dysmorphic disorder, bipolar affective disorder, schizophrenia, psychotic disorder, current alcohol/substance abuse. ",{"entities":[(18,34,"Nominal"),(56,80,"Nominal"),(82,108,"Nominal"),(110,123,"Nominal"),(125,143,"Nominal"),(153,176,"Nominal"),(36,42,"Nominal")]}),
("Patients with any other severe concurrent disease, which in the judgment of the investigator, would make the patient inappropriate for entry into this study. ",{"entities":[(25,49,"Nominal")]}),
("Patients with psychiatric disorders that would interfere with consent or follow-up. Pregnant or lactating women. Men and women of reproductive potential may not participate unless they have agreed to use an effective contraceptive method. ",{"entities":[(14,35,"Nominal"),(84,92,"Nominal"),(96,105,"Nominal"),(106,111,"Nominal"),(113,116,"Nominal"),(121,126,"Nominal"),(130,153,"Nominal"),(217,230,"Nominal")]}),
("Patients with any active or uncontrolled infection, including known HIV infection. (Patients with active hepatitis B will be placed on lamivudine. Patients with active hepatitis C will be eligible if liver tests qualify (5.1.9) ",{"entities":[(18,50,"Nominal"),(68,81,"Nominal"),(97,117,"Nominal"),(161,179,"Nominal"),(200,227,"Numeric")]}),
("tients who have received prior chemotherapy for unresectable disease ",{"entities":[(31,43,"Nominal"),(48,68,"Nominal")]}),
("History of kidney stones",{"entities":[(10,24,"Nominal")]}),
("BMI < 20 ",{"entities":[(0,8,"Numeric")]}),
("Allergy or hypersensitivity to topiramate ",{"entities":[(0,7,"Nominal")]}),
("Cognitive behavioural therapy or additional psychotherapy in past four months ",{"entities":[(0,30,"Nominal"),(44,57,"Nominal")]}),
("Comorbid major depressive disorder diagnosis which predates OCD diagnosis ",{"entities":[(0,35,"Nominal"),(60,73,"Nominal")]}),
("A previous adequate trial of topiramate ",{"entities":[(29,39,"Nominal")]}),
("Any other primary DSM-IV diagnosis; DSM-IV criteria for body dysmorphic disorder, bipolar affective disorder, schizophrenia, psychotic disorder, current alcohol/substance abuse. ",{"entities":[(18,34,"Nominal"),(56,80,"Nominal"),(82,108,"Nominal"),(110,123,"Nominal"),(125,143,"Nominal"),(153,176,"Nominal"),(36,42,"Nominal")]}),
("type 1 diabetic or non-diabetic ",{"entities":[(0,15,"Nominal"),(19,31,"Nominal")]}),
("9. Intention to treat more than 1 totally occluded major epicardial vessel; ",{"entities":[(4,44,"Nominal")]}),
("8. Left main stenosis of 50% or more; ",{"entities":[(3,36,"Numeric")]}),
("7. History of any cerebrovascular accident; ",{"entities":[(18,42,"Nominal")]}),
("6. Chest pain lasting longer than 30 minutes within 12 hours pre-procedure, if CK enzymes positive ( 2x the normal upper limit). ",{"entities":[(2,13,"Nominal"),(76,128,"Numeric")]}),
("5. Transmural myocardial infarction within the previous seven days and CK has not returned to normal; ",{"entities":[(3,35,"Nominal")]}),
("4. Congenital heart disease;",{"entities":[(3,27,"Nominal")]}),
("2. CABG or Percutaneous Coronary Intervention (PCI) procedure; ",{"entities":[(3,7,"Nominal"),(11,51,"Nominal")]}),
("1. Congestive heart failure; ",{"entities":[(3,27,"Nominal")]}),
("type 1 diabetic or non-diabetic ",{"entities":[(0,15,"Nominal"),(19,31,"Nominal")]}),
("Pregnancy complicated with pre-eclampsia ",{"entities":[(0,10,"Nominal"),(27,40,"Nominal")]}),
("Maternal history of placental abruptio ",{"entities":[(0,38,"Nominal")]}),
("Multiple pregnancy (more than 3 fetuses) ",{"entities":[(0,18,"Nominal")]}),
("9. Intention to treat more than 1 totally occluded major epicardial vessel; ",{"entities":[(4,44,"Nominal")]}),
("8. Left main stenosis of 50% or more; ",{"entities":[(3,36,"Numeric")]}),
("7. History of any cerebrovascular accident; ",{"entities":[(18,42,"Nominal")]}),
("6. Chest pain lasting longer than 30 minutes within 12 hours pre-procedure, if CK enzymes positive ( 2x the normal upper limit). ",{"entities":[(2,13,"Nominal"),(76,128,"Numeric")]}),
("5. Transmural myocardial infarction within the previous seven days and CK has not returned to normal; ",{"entities":[(3,35,"Nominal")]}),
("4. Congenital heart disease;",{"entities":[(3,27,"Nominal")]}),
("2. CABG or Percutaneous Coronary Intervention (PCI) procedure; ",{"entities":[(3,7,"Nominal"),(11,51,"Nominal")]}),
("1. Congestive heart failure; ",{"entities":[(3,27,"Nominal")]}),
("type 1 diabetic or non-diabetic ",{"entities":[(0,15,"Nominal"),(19,31,"Nominal")]}),
("Any condition that prevents participation in the study, including pregnancy and other contraindications for Ventavis treatment (as listed in the current Ventavis Summary of Product Characteristics and patient package insert) ",{"entities":[(66,75,"Nominal")]}),
("Pregnancy complicated with pre-eclampsia ",{"entities":[(0,10,"Nominal"),(27,40,"Nominal")]}),
("Maternal history of placental abruptio ",{"entities":[(0,38,"Nominal")]}),
("Multiple pregnancy (more than 3 fetuses) ",{"entities":[(0,18,"Nominal")]}),
("9. Intention to treat more than 1 totally occluded major epicardial vessel; ",{"entities":[(4,44,"Nominal")]}),
("8. Left main stenosis of 50% or more; ",{"entities":[(3,36,"Numeric")]}),
("7. History of any cerebrovascular accident; ",{"entities":[(18,42,"Nominal")]}),
("6. Chest pain lasting longer than 30 minutes within 12 hours pre-procedure, if CK enzymes positive ( 2x the normal upper limit). ",{"entities":[(2,13,"Nominal"),(76,128,"Numeric")]}),
("5. Transmural myocardial infarction within the previous seven days and CK has not returned to normal; ",{"entities":[(3,35,"Nominal")]}),
("4. Congenital heart disease;",{"entities":[(3,27,"Nominal")]}),
("2. CABG or Percutaneous Coronary Intervention (PCI) procedure; ",{"entities":[(3,7,"Nominal"),(11,51,"Nominal")]}),
("1. Congestive heart failure; ",{"entities":[(3,27,"Nominal")]}),
("type 1 diabetic or non-diabetic ",{"entities":[(0,15,"Nominal"),(19,31,"Nominal")]}),
("Patients who were pregnant, nursing or not able to give written informed consent were excluded. ",{"entities":[(18,26,"Nominal"),(28,35,"Nominal")]}),
("Present alcoholism or drug abuse or use of medications that could interfere with the treatment including bronchodilators, quinolone antibiotics, monoamine oxidase inhibitors, anxiolytics, ranitidine, corticosteroids, growth hormone, antihypertensives. ",{"entities":[(8,18,"Nominal"),(22,32,"Nominal"),(105,120,"Nominal"),(122,143,"Nominal"),(145,173,"Nominal"),(175,198,"Nominal"),(200,215,"Nominal"),(217,231,"Nominal"),(233,250,"Nominal")]}),
("Abnormal hepatic function (liver function test > twice the normal range), abnormal renal function (creatinine > 1.1 mg/dl), fasting plasma glucose in the diabetic range (>/= 126 mg/dl), or blood pressure > 140/90 mmHg. ",{"entities":[(0,26,"Nominal"),(74,98,"Nominal"),(124,184,"Numeric"),(99,122,"Numeric"),(189,217,"Numeric")]}),
("Any condition/illness that may affect the study outcomes or would make participation potentially harmful such as pregnancy or breastfeeding, diabetes mellitus, heart disease, stroke, hypertension, malabsorption syndromes, GERD, a history of ulcer, according to a detailed medical history. ",{"entities":[(113,122,"Nominal"),(141,158,"Nominal"),(160,173,"Nominal"),(175,181,"Nominal"),(183,195,"Nominal"),(197,220,"Nominal"),(222,226,"Nominal"),(241,246,"Nominal"),(126,139,"Nominal")]}),
("Pregnant ",{"entities":[(0,8,"Nominal")]}),
("Diagnosed with a medical or psychiatric illness that may interfere with study participation ",{"entities":[(17,48,"Nominal")]}),
("Undergoing Interleukin-2 (IL-2) therapy within 8 weeks of study entry ",{"entities":[(11,39,"Nominal")]}),
("Cigarette smoking ",{"entities":[(0,17,"Nominal")]}),
("Palpable fibroids or uterine prolapse: Grade 2 or 3. ",{"entities":[(0,17,"Nominal"),(20,51,"Nominal")]}),
("Use of alternative therapies or natural products to treat postmenopausal symptoms in the four weeks prior to randomization. ",{"entities":[]}),
("Use of any systemic estrogen, progestin, or DHEA in the eight weeks prior to randomization. ",{"entities":[]}),
("Use of steroids or drugs that interfere with the metabolism of estrogen. ",{"entities":[]}),
("Diagnosis of cancer. ",{"entities":[(13,19,"Nominal")]}),
("Significant metabolic and endocrine diseases. ",{"entities":[(12,44,"Nominal")]}),
("Body mass index (BMI) of 35 kg/m2 or more. ",{"entities":[(0,41,"Numeric")]}),
("Any history of brain metastases or any other active central nervous system (CNS) disease ",{"entities":[(15,31,"Nominal"),(52,88,"Nominal")]}),
("Concurrent malignancy (if in remission, at least 5 years disease free) except for localized (in-situ) disease, basal carcinomas and cutaneous squamous cell carcinomas that have been adequately treated ",{"entities":[(11,21,"Nominal"),(92,109,"Nominal"),(111,128,"Nominal"),(132,166,"Nominal")]}),
("Pregnant or lactating ",{"entities":[(0,8,"Nominal"),(11,21,"Nominal")]}),
("Active coagulation disorder not controlled with medication ",{"entities":[(7,27,"Nominal")]}),
("Active autoimmune disease requiring immunosuppressive therapy within 30 days ",{"entities":[(6,25,"Nominal"),(36,61,"Nominal")]}),
("Uncontrolled intercurrent or chronic illness ",{"entities":[(0,44,"Nominal")]}),
("History of, or clinical evidence of, a condition which, in the opinion of the investigator, could confound the results of the study or put the subject at undue risk ",{"entities":[]}),
("Cardiac ischemia, cardiac arrhythmias or congestive heart failure uncontrolled by medication ",{"entities":[(0,16,"Nominal"),(18,37,"Nominal"),(41,66,"Nominal")]}),
("Active fungal infection or pulmonary infiltrates (prior treated disease stable for 2 weeks is allowable) ",{"entities":[(7,24,"Nominal"),(27,48,"Nominal")]}),
("i. Warfarin, phenprocoumon: increase bleeding tendency ii. Increase blood concentration of phenytoin iii. sorivudine: inhibit DPD -> increase toxicity according to fluoropyrimidine iv. allopurinol : decrease activity of S-1 ",{"entities":[(3,11,"Nominal"),(13,26,"Nominal"),(37,54,"Nominal")]}),
("concomitant drug medication; The following drugs cause drug interaction with S-1. ",{"entities":[]}),
("Concomitant administration of any other experimental drug under investigation, or concomitant chemotherapy, hormonal therapy, or immunotherapy ",{"entities":[(82,106,"Nominal"),(108,124,"Nominal"),(129,142,"Nominal")]}),
("Other serious underlying medical conditions which could impair the ability of the patient to participate in the study ",{"entities":[]}),
("Active uncontrolled infection ",{"entities":[(7,29,"Nominal")]}),
("History of significant neurologic or psychiatric disorders including dementia or seizures ",{"entities":[(23,33,"Nominal"),(37,58,"Nominal"),(0,0,"Nominal"),(69,77,"Nominal"),(80,89,"Nominal")]}),
("Unstable cardiac disease despite treatment, myocardial infarction within 6 months prior to study entry ",{"entities":[(8,25,"Nominal"),(44,65,"Nominal")]}),
("Other serious illness or medical conditions ",{"entities":[(6,21,"Nominal"),(25,43,"Nominal")]}),
("Pregnant or lactating women, women of childbearing potential not employing adequate contraception ",{"entities":[(0,8,"Nominal"),(12,21,"Nominal"),(22,27,"Nominal"),(29,34,"Nominal"),(38,50,"Nominal"),(0,0,"Nominal"),(75,97,"Nominal")]}),
("Past or concurrent history of neoplasm other than stomach cancer, except for curatively treated non-melanoma skin cancer or in situ carcinoma of the cervix uteri ",{"entities":[(8,38,"Nominal"),(50,64,"Nominal"),(96,120,"Nominal"),(124,161,"Nominal")]}),
("The patient has bony lesions as the sole evaluable disease. ",{"entities":[(16,28,"Nominal")]}),
("Evidence of gastrointestinal bleeding ",{"entities":[(12,37,"Nominal")]}),
("Gastric outlet obstruction or intestinal obstruction ",{"entities":[(0,26,"Nominal"),(31,52,"Nominal")]}),
("Central nervous system (CNS) metastases or prior radiation for CNS metastases ",{"entities":[(0,22,"Nominal"),(22,39,"Nominal"),(63,77,"Nominal")]}),
("Other tumor type than adenocarcinoma ",{"entities":[(0,16,"Nominal"),(22,36,"Nominal")]}),
("Major congenital defects or serious chronic illness. ",{"entities":[(6,24,"Nominal"),(28,51,"Nominal")]}),
("A family history of congenital or hereditary immunodeficiency. ",{"entities":[(19,30,"Nominal"),(34,62,"Nominal")]}),
("Any confirmed or suspected immunosuppressive or immunodeficient condition based on medical history and physical ",{"entities":[(17,44,"Nominal"),(48,73,"Nominal")]}),
("Acute disease at the time of enrolment ",{"entities":[(0,14,"Nominal")]}),
("History of seizures (this criterion does not apply to subjects who have had a single, uncomplicated febrile convulsion in the past) or neurological disease. ",{"entities":[(0,19,"Nominal")]}),
("History of allergic disease or reactions likely to be exacerbated by any component of the vaccines. ",{"entities":[(11,27,"Nominal")]}),
("History of or intercurrent diphtheria, tetanus, pertussis, hepatitis B, polio, and Haemophilus influenzae type b diseases. ",{"entities":[(27,37,"Nominal"),(39,46,"Nominal"),(48,57,"Nominal"),(59,70,"Nominal"),(72,77,"Nominal"),(83,94,"Nominal"),(95,112,"Nominal")]}),
("Previous vaccination against diphtheria, tetanus, pertussis, polio, hepatitis B, Haemophilus influenzae type b, and/or S. pneumoniae with the exception of vaccines where the first dose can be given within the first two weeks of life according to the national recommendations",{"entities":[(29,39,"Nominal"),(41,48,"Nominal"),(50,59,"Nominal"),(61,66,"Nominal"),(68,79,"Nominal"),(81,110,"Nominal"),(122,133,"Nominal")]}),
("Insufficient response to pregabalin in the treatment of partial seizure, or patients currently receiving pregabalin treatment.",{"entities":[(56,71,"Nominal"),(105,125,"Nominal"),(25,35,"Nominal")]}),
("Pre-existing eye diseases (glaucoma). ",{"entities":[(13,25,"Nominal"),(27,35,"Nominal")]}),
("Having a remote infection, ",{"entities":[(8,25,"Nominal")]}),
("History of immunodeficiency, ",{"entities":[(11,28,"Nominal")]}),
("History of receiving any antibiotics within prior 3 months, ",{"entities":[]}),
("Receiving any neoadjuvant therapy, ",{"entities":[(14,33,"Nominal")]}),
("Advanced or distant metastatic stage, ",{"entities":[(20,36,"Nominal")]}),
("Ductal carcinoma in situ (DCIS; stage 0 cancer), ",{"entities":[(0,25,"Nominal"),(26,30,"Nominal"),(40,46,"Nominal")]}),
("angioplasty with stenting ",{"entities":[(0,11,"Nominal")]}),
("contra-indications of radiotherapy ",{"entities":[(22,34,"Nominal")]}),
("Current active dental problems including infection of the teeth or jawbone (maxilla or mandibular); dental or fixture trauma, or a current or prior diagnosis of osteonecrosis of the jaw (ONJ), of exposed bone in the mouth, or of slow healing after dental procedures. ",{"entities":[(15,30,"Nominal"),(40,63,"Nominal"),(68,98,"Nominal"),(100,124,"Nominal"),(161,191,"Nominal")]}),
("Subjects who, in the opinion of the investigator, are unlikely to cooperate fully during the study ",{"entities":[]}),
("Known history or present abuse of alcohol or drugs ",{"entities":[(25,50,"Nominal")]}),
("Use of other investigational drugs 30 days prior to the date of randomization ",{"entities":[]}),
("Known hypersensitivity to zoledronic acid or other bisphosphonates ",{"entities":[(5,22,"Nominal")]}),
("Severe physical or psychological concomitant diseases that might impair compliance with the provisions of the study protocol or that might impair the assessment of drug or patient safety, e.g. clinically significant ascites, cardiac failure, NYHA III or IV, clinically relevant pathologic findings in ECG ",{"entities":[(7,53,"Nominal"),(225,240,"Nominal"),(258,304,"Nominal")]}),
("History of diseases with influence on bone metabolism such as Paget's disease and primary hyperparathyroidism ",{"entities":[(62,77,"Nominal"),(82,109,"Nominal"),(38,53,"Nominal")]}),
("Patients with clinically symptomatic brain metastases ",{"entities":[(37,53,"Nominal")]}),
("Corrected (adjusted for serum albumin) serum calcium concentration < 8.0 mg/dl (2.00 mmol/L) or 12.0 mg/dl (3.00 mmol/L). ",{"entities":[(39,93,"Numeric"),(97,122,"Numeric")]}),
("Abnormal renal function as evidenced by a calculated creatinine clearance < 30 ml/minute. ",{"entities":[(0,24,"Nominal"),(42,89,"Numeric")]}),
("Prior treatment with a bisphosphonate ",{"entities":[(0,24,"Nominal"),(42,89,"Numeric")]}),
("Subjects who are diagnosed as suffering from psychotic illness according to DSM-IV (Axis 1)22, or with a history of CNS disease, a history of infection that might affect CNS (HIV, syphilis, cytomegalovirus, herpes), or a history of head injury with loss of consciousness,pregnant women. ",{"entities":[(75,93,"Nominal"),(45,62,"Nominal"),(116,127,"Nominal"),(142,151,"Nominal"),(170,173,"Nominal"),(175,178,"Nominal"),(180,188,"Nominal"),(190,205,"Nominal"),(207,213,"Nominal"),(271,279,"Nominal"),(280,285,"Nominal")]}),
("use more than 2g a day; 5 times a week to everyday ",{"entities":[(0,22,"Numeric"),(24,38,"Numeric")]}),
("Current active dental problems including infection of the teeth or jawbone (maxilla or mandibular); dental or fixture trauma, or a current or prior diagnosis of osteonecrosis of the jaw (ONJ), of exposed bone in the mouth, or of slow healing after dental procedures. ",{"entities":[(15,30,"Nominal"),(40,63,"Nominal"),(68,98,"Nominal"),(100,124,"Nominal"),(161,191,"Nominal")]}),
("Subjects who, in the opinion of the investigator, are unlikely to cooperate fully during the study ",{"entities":[]}),
("Known history or present abuse of alcohol or drugs ",{"entities":[(25,50,"Nominal")]}),
("Use of other investigational drugs 30 days prior to the date of randomization ",{"entities":[]}),
("Known hypersensitivity to zoledronic acid or other bisphosphonates ",{"entities":[(5,22,"Nominal")]}),
("Severe physical or psychological concomitant diseases that might impair compliance with the provisions of the study protocol or that might impair the assessment of drug or patient safety, e.g. clinically significant ascites, cardiac failure, NYHA III or IV, clinically relevant pathologic findings in ECG ",{"entities":[(7,53,"Nominal"),(225,240,"Nominal"),(258,304,"Nominal")]}),
("History of diseases with influence on bone metabolism such as Paget's disease and primary hyperparathyroidism ",{"entities":[(62,77,"Nominal"),(82,109,"Nominal"),(38,53,"Nominal")]}),
("Patients with clinically symptomatic brain metastases ",{"entities":[(37,53,"Nominal")]}),
("Corrected (adjusted for serum albumin) serum calcium concentration < 8.0 mg/dl (2.00 mmol/L) or 12.0 mg/dl (3.00 mmol/L). ",{"entities":[(39,93,"Numeric"),(97,122,"Numeric")]}),
("Abnormal renal function as evidenced by a calculated creatinine clearance < 30 ml/minute. ",{"entities":[(0,24,"Nominal"),(42,89,"Numeric")]}),
("Prior treatment with a bisphosphonate ",{"entities":[(0,24,"Nominal"),(42,89,"Numeric")]})
         
]


### 2. Convert the annotated data into the spaCy bin object

#### in spaCy 3.x, annotated should be converted to a doc bin 

In [67]:
import spacy
from spacy.tokens import DocBin
from tqdm import tqdm
nlp = spacy.blank('en') # load a new spacy model
db = DocBin() # create a DocBin object
for text, annot in tqdm(trainData): # data in previous format
    doc = nlp.make_doc(text) # create doc object from text
    ents = []
    for start, end, label in annot['entities']: # add character indexes
        span = doc.char_span(start, end, label=label, alignment_mode='contract')
        if span is None:
            print('Skipping entity')
        else:
            ents.append(span)
    try:
        doc.ents = ents # label the text with the ents
        db.add(doc)
    except:
        print(text, annot)
db.to_disk('./train.spacy') # save the docbin object

 45%|████▌     | 175/387 [00:00<00:00, 712.23it/s]

Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity
Skipping entity


100%|██████████| 387/387 [00:00<00:00, 800.18it/s]

Skipping entity
Skipping entity
Skipping entity
Skipping entity





### 3. Generate the config file from the [spaCy website][https://spacy.io/usage/training]

Select the preferred language and component as ner. As per system requirement, choose CPU/GPU and save this configuration as base_config.cfg

To fill the remaining system defaults, run this command on the command line to generate the config.cfg file.

#### Generate the config file to train via Command line 

In [68]:
!python -m spacy init fill-config /home/sobha/Orion-CustomNER/base_config.cfg /home/sobha/Orion-CustomNER/config.cfg

[38;5;2m✔ Auto-filled config with all values[0m
[38;5;2m✔ Saved config[0m
/home/sobha/Orion-CustomNER/config.cfg
You can now add your data and train your pipeline:
python -m spacy train config.cfg --paths.train ./train.spacy --paths.dev ./dev.spacy


### 4. Train the model in the command line.

In [8]:
#in base_config.cfg file edited paths as below
#train = ./train.spacy
#dev = ./dev.spacy

#### To save the  model output to a folder named "Output" as an argument at the command line.

In [69]:
import os
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
!python -m spacy train /home/sobha/Orion-CustomNER/config.cfg --paths.train /home/sobha/Orion-CustomNER/train.spacy --paths.dev /home/sobha/Orion-CustomNER/dev.spacy --output /home/sobha/Orion-CustomNER/output

[38;5;4mℹ Saving to output directory: /home/sobha/Orion-CustomNER/output[0m
[38;5;4mℹ Using CPU[0m
[1m
[2022-01-20 15:03:24,395] [INFO] Set up nlp object from config
[2022-01-20 15:03:24,412] [INFO] Pipeline: ['tok2vec', 'ner']
[2022-01-20 15:03:24,421] [INFO] Created vocabulary
[2022-01-20 15:03:24,422] [INFO] Finished initializing nlp object
[2022-01-20 15:03:25,165] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
[38;5;2m✔ Initialized pipeline[0m
[1m
[38;5;4mℹ Pipeline: ['tok2vec', 'ner'][0m
[38;5;4mℹ Initial learn rate: 0.001[0m
E    #       LOSS TOK2VEC  LOSS NER  ENTS_F  ENTS_P  ENTS_R  SCORE 
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00     34.70    0.00    0.00    0.00    0.00
  3     200        357.19   4007.50   40.74   51.16   33.85    0.41
  7     400        580.98   1765.60   67.27   82.22   56.92    0.67
 12     600        967.82    795.81   70.91   86.67   60.00    0.71
 18     800        344.88    4

In [10]:
# !spacy train /home/sobha/Orion-CustomNER/config.cfg --output /home/sobha/Orion-CustomNER/output --paths.train /home/sobha/Orion-CustomNER/train.spacy --paths.dev /home/sobha/Orion-CustomNER/train.spacy

### 5. Load & Test the model

### Load the model.

In [112]:
import spacy

nlp = spacy.load('/home/sobha/Orion-CustomNER/output/model-last') #load the model-last

### Take the unseen data to test the model prediction.

In [113]:
sentence = 'Aged 10 or older, myocardial ischemia, able to undergo PTCA, stenting and CABG'

doc = nlp(sentence)

from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)


In [89]:

nlp2 = spacy.load('/home/sobha/Orion-CustomNER/output/model-best') #load the model-best

In [114]:
sentence = 'Aged 10 or older, myocardial ischemia, able to undergo PTCA, stenting and CABG'

doc = nlp(sentence)

from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [91]:
sentence2='Type 1 Myocardial infarction Have at least two coronary artery territories of disease > 50% Be on treatment for Diabetes'
doc2 = nlp(sentence2)

from spacy import displacy
displacy.render(doc2, style='ent', jupyter=True)

In [115]:
sentence ='Age >= 18 Newly diagnosed multiple myeloma Ineligible for autologous stem cell transplant '
doc = nlp(sentence)

from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [116]:
sentence ='Age>=18'
doc = nlp(sentence)

from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [117]:
sentence ='age less than 18'
doc = nlp(sentence)

from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [118]:
sentence ='age greater than 18'
doc = nlp(sentence)

from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [119]:
sentence ='Life expectancy greater than or equal to 6 months' 
doc = nlp(sentence)

from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [96]:
sentence ='Metastatic breast cancer (BR) Metastatic ovarian cancer (OV) Metastatic endometrial cancer (EM) Metastatic cervical cancer (CX)'
doc = nlp(sentence)
from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [120]:
sentence='type 2 diabetic, age 18 and over, informed consent'
doc = nlp(sentence)
from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [121]:
sentence='Aged between 18-70 years old ,Diagnosis of Fatty Liver Disease ,Have a BMI between 25 to 50 kg/m2'
doc = nlp(sentence)
from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [24]:
# observed  age<=18 is not tretaed as numeric and havea bmi between  is treated as nominal

In [123]:
sentence='''Aged between 18-70 years old ,Diagnosis of Fatty Liver Disease ,Have a BMI between 25 to 50 kg/m2,
            type 2 diabetic,age 18 and over, informed consent
            Metastatic breast cancer (BR) Metastatic ovarian cancer (OV) Metastatic endometrial cancer (EM) Metastatic cervical cancer (CX)
            Age >= 18 Newly diagnosed multiple myeloma 
            Type 1 Myocardial infarction Have at least two coronary artery territories of disease > 50% Be on treatment for Diabetes
            Aged 18 or older, myocardial ischemia, able to undergo PTCA, stenting and CABG
            Pre-existing eye diseases (glaucoma). Having a remote infection
            History of immunodeficiency, 

          '''
doc = nlp(sentence)
from spacy import displacy
displacy.render(doc, style='ent', jupyter=True)

In [52]:
# Testing Spacy NER model
#spacy evaluate ./dev.spacy

## Age information Extraction

In [109]:
import spacy
from spacy.lang.en import English
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

patterns = [
   [{"LOWER" : "age>="},{"LIKE_NUM" : True},{"LEMMA" : "year"}],
   [{"LOWER" : "age<="},{"LIKE_NUM" : True},{"LEMMA" : "year"}], 
   [{"LOWER" : "age"},{"ORTH" : "<"},{"LIKE_NUM" : True},{"LEMMA": "year"}], 
   [{"LOWER" : "age"},{"ORTH" : ">"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "age"},{"ORTH" : "<"},{"ORTH" : "="},{"LIKE_NUM" : True},{"LEMMA": "year"}], 
   [{"LOWER" : "age"},{"ORTH" : ">"},{"ORTH" : "="},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "or"},{"LEMMA" : "old"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "and"},{"LEMMA" : "old"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "and"},{"LOWER" : "under"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "or"},{"LOWER" : "under"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "and"},{"LOWER" : "over"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "or"},{"LOWER" : "over"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LEMMA": "year"},{"LOWER" : "or"},{"LEMMA" : "old"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "and"},{"LEMMA": "year"},{"LEMMA" : "old"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "and"},{"LEMMA": "year"},{"LOWER" : "under"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "or"},{"LEMMA": "year"},{"LOWER" : "under"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "and"},{"LEMMA": "year"},{"LOWER" : "over"}],
   [{"LOWER" : "aged"},{"LIKE_NUM" : True},{"LOWER" : "or"},{"LEMMA": "year"},{"LOWER" : "over"}],
   [{"LOWER" : "age"},{"LIKE_NUM" : True},{"LEMMA": "year"},{"LOWER" : "or"},{"LEMMA" : "old"}],
   [{"LOWER" : "age"},{"LIKE_NUM" : True},{"LOWER" : "and"},{"LEMMA": "year"},{"LEMMA" : "old"}],
   [{"LOWER" : "age"},{"LIKE_NUM" : True},{"LOWER" : "and"},{"LEMMA": "year"},{"LOWER" : "under"}],
   [{"LOWER" : "age"},{"LIKE_NUM" : True},{"LOWER" : "or"},{"LEMMA": "year"},{"LOWER" : "under"}],
   [{"LOWER" : "age"},{"LIKE_NUM" : True},{"LOWER" : "and"},{"LEMMA": "year"},{"LOWER" : "over"}],
   [{"LOWER" : "age"},{"LIKE_NUM" : True},{"LOWER" : "or"},{"LEMMA": "year"},{"LOWER" : "over"}],
   [{"LOWER" : "aged"},{"TEXT" : "between"},{"LIKE_NUM" : True},{"ORTH": "-"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "age"},{"ORTH" : "<"},{"LIKE_NUM" : True}], 
   [{"LOWER" : "age"},{"ORTH" : ">"},{"LIKE_NUM" : True}],  
   [{"TEXT" : "age <="},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"TEXT" : "age >="},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "age"},{"LOWER" : "greater"},{"LOWER" : "than"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "age"},{"LOWER" : "less"},{"LOWER" : "than"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "age"},{"LOWER" : "over"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "age"},{"LOWER" : "under"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "age"},{"LOWER" : "above"},{"LIKE_NUM" : True},{"LEMMA": "year"}], 
   [{"LOWER" : "age"},{"LOWER" : "below"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "aged"},{"LOWER" : "over"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "aged"},{"LOWER" : "under"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "aged"},{"LOWER" : "above"},{"LIKE_NUM" : True},{"LEMMA": "year"}], 
   [{"LOWER" : "aged"},{"LOWER" : "below"},{"LIKE_NUM" : True},{"LEMMA": "year"}],
   [{"LOWER" : "aged"},{"LOWER": "between"},{"LIKE_NUM" : True},{"TEXT" : "and"},{"LIKE_NUM" : True}], 
   [{"LOWER" : "age"},{"LOWER": "between"},{"LIKE_NUM" : True},{"TEXT" : "and"},{"LIKE_NUM" : True}],
   [{"LIKE_NUM": True}, {"LEMMA": "year"}, {"LEMMA": "old"}],
   [{"LEMMA" : "age"}, {"LIKE_NUM" : True}, {"LEMMA": "year"}],
   [{"LOWER" : "age"}, {"LIKE_NUM" : True}, {"LEMMA": "year"}],
    [{"LOWER": "between"},{"LIKE_NUM" : True},{"TEXT" : "to"},{"LIKE_NUM" : True},{"LEMMA": "year"},{"TEXT" : "of"},{"LOWER" : "age"},],
    [{"LOWER" : "aged"},{"LOWER": "between"},{"LIKE_NUM" : True},{"TEXT" : "to"},{"LIKE_NUM" : True},{"LEMMA": "year"}], 
  # [{"Text" : "Age"}, {"LIKE_NUM" : True}, {"TEXT" : "or older"}]
  # [{"Text" : "Aged"}, {"LIKE_NUM" : True}, {"TEXT" : "or older"}]
  #  [{"TEXT" IN{'age','ages','aged'},{"LIKE_NUM" : True}]
  ] 

matcher.add("age_rule", patterns)



In [110]:
text = '''Aged 22 and older, undergoing 1 or 2 level spinal decompression.Age over 18 years.Age less than 65 years, have diagnosed ulcerative colitis 
          Aged 20 or older, myocardial ischemia, 
          Aged over 18 years Confirmed diagnosis of bronchiectasis within 5 years
          Aged between 18-70 years old Diagnosis of Non-alcoholic Fatty Liver Disease
          Age >= 10 years Diagnosed with Relapsed or Refractory Multiple Myeloma
          Aged between 40 and 85 years Diagnosed with COPD 
          Women between 40 to 70 years of age.
          Aged 18 years or older Diagnosis of Ankylosing Spondylitis or Axial '''
doc = nlp(text)

matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
   # print(f'Matching ID ,  Age Criteria ,  Texy Matched is ')
    print(f'{match_id},      {string_id},    {start},   {end},     {span.text}')
   # data.frame(match_id,string_id,start, end, span.text)

3741705598786462532,      age_rule,    0,   4,     Aged 22 and older
3741705598786462532,      age_rule,    13,   17,     Age over 18 years
3741705598786462532,      age_rule,    18,   23,     Age less than 65 years
3741705598786462532,      age_rule,    29,   33,     Aged 20 or older
3741705598786462532,      age_rule,    38,   42,     Aged over 18 years
3741705598786462532,      age_rule,    50,   56,     Aged between 18-70 years
3741705598786462532,      age_rule,    54,   57,     70 years old
3741705598786462532,      age_rule,    66,   71,     Age >= 10 years
3741705598786462532,      age_rule,    79,   84,     Aged between 40 and 85
3741705598786462532,      age_rule,    90,   97,     between 40 to 70 years of age
3741705598786462532,      age_rule,    99,   104,     Aged 18 years or older
