# eHOST Reader for Overlapping Annotated Spans

spaCy needs to be upgraded to `v3.4`.

`medspacy_io/reader/base_reader.py` has been modified so that overlapping spans are stored in `spacy.tokens.SpanGroup` objects. Each `SpanGroup` is named after its concept.
The concept names for each `Doc` are saved in the list `Doc._.concepts`, which can be used as the keys for retrieving a particular `SpanGroup` from `doc.spans`.

The `SpanGroup` documentation is at https://spacy.io/api/spangroup.

***Remark:*** Uninstall medspacy-io first to avoid a conflict. Otherwise, the older installed version of medspacy-io will be loaded instead of the package code under test.
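
To make the data model concrete, here is a minimal sketch that builds the same kind of structure by hand with plain spaCy, without medspacy-io. The sentence, the concept names `Symptom` and `Section`, and the token offsets are made up for illustration; only the `concepts` extension name and the use of `doc.spans` follow the description above.

```python
import spacy
from spacy.tokens import Doc, Span

# Register a custom attribute like the one the reader fills in.
# force=True lets the cell be re-run without an "already exists" error.
Doc.set_extension("concepts", default=None, force=True)

nlp = spacy.blank("en")
doc = nlp("Patient reports chest pain and shortness of breath.")

# Toy annotations: the second span overlaps the first one.
doc.spans["Symptom"] = [
    Span(doc, 2, 4, label="Symptom"),  # "chest pain"
    Span(doc, 3, 8, label="Symptom"),  # "pain and shortness of breath"
]
doc.spans["Section"] = [Span(doc, 0, 9, label="Section")]

# The concept names double as the keys of doc.spans.
doc._.concepts = list(doc.spans.keys())

for concept in doc._.concepts:
    group = doc.spans[concept]
    print(concept, group.has_overlap, [span.text for span in group])
```

Assigning a list of spans to `doc.spans[name]` creates a `SpanGroup` under that name, so overlapping spans of the same concept can live side by side.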

In [1]:
import sys
sys.path.append("../")  # medspacy-io must be uninstalled so that the local package code is tested
sys.path.append("../medspacy")
print(sys.path)
from spacy.lang.en import English
from medspacy_io.reader import EhostDocReader

['/Users/u6022257/Documents/medspacy_io/dev_notebooks', '/Users/u6022257/opt/anaconda3/lib/python39.zip', '/Users/u6022257/opt/anaconda3/lib/python3.9', '/Users/u6022257/opt/anaconda3/lib/python3.9/lib-dynload', '', '/Users/u6022257/opt/anaconda3/lib/python3.9/site-packages', '/Users/u6022257/opt/anaconda3/lib/python3.9/site-packages/aeosa', '/Users/u6022257/opt/anaconda3/lib/python3.9/site-packages/medspacy-0.2.0.0-py3.9.egg', '/Users/u6022257/opt/anaconda3/lib/python3.9/site-packages/medspacy_quickumls-2.3-py3.9.egg', '/Users/u6022257/opt/anaconda3/lib/python3.9/site-packages/quickumls_simstring-1.1.5.post1-py3.9-macosx-10.9-x86_64.egg', '../', '../medspacy']


## Remarks
1. In our case, `support_overlap=True` and `new_version=True`.
2. `EhostDocReader` looks for the schema and annotations in default folders relative to the data directory. The data and annotations have to be organized as an eHOST project:
    1. schema file: `../config/projectschema.xml`
    2. data folder: `../corpus/*.txt`
    3. annotation folder: `../saved/*.txt.knowtator.xml`
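
The folder layout above can be sketched with `pathlib`. This only illustrates the expected project structure; it is not the reader's actual lookup code, and the helper name `ehost_project_paths` is hypothetical.

```python
from pathlib import Path


def ehost_project_paths(txt_file):
    """Return the schema and annotation paths expected for one corpus file.

    Hypothetical helper: it mirrors the eHOST project layout described above
    and does not call into medspacy_io.
    """
    txt_path = Path(txt_file)
    project_dir = txt_path.parent.parent  # <project>/corpus/<name>.txt
    return {
        "schema": project_dir / "config" / "projectschema.xml",
        "annotation": project_dir / "saved" / (txt_path.name + ".knowtator.xml"),
    }


paths = ehost_project_paths("ehost_test_corpus3_overlap/corpus/18305.txt")
print(paths["schema"])
print(paths["annotation"])
```

Checking that both returned paths exist before calling the reader is a quick way to confirm that a corpus follows this layout.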

In [2]:
ereader = EhostDocReader(
    nlp=English(),
    schema_file='../tests/data/ehost_test_corpus3_overlap/config/projectschema.xml',
    support_overlap=True,
    new_version=True,
)
doc = ereader.read('../tests/data/ehost_test_corpus3_overlap/corpus/18305.txt')

check if doc has extension concepts: False
True False
setting the type of concept...
getting concept...
type of existing_concepts: <class 'list'> True


All the concept names are now stored in the list `doc._.concepts`:

In [3]:
doc._.concepts

['Symptom_Section', 'SectionHeader_HasSymptom', 'Symptom']

Now extract the `SpanGroup` named `Symptom_Section`:

In [4]:
doc.spans['Symptom_Section']

[HISTORY OF PRESENT ILLNESS:  Ms. [**Known lastname 50463**] was an 83-year-old
female with a history of polymyalgia rheumatica,
hypercholesterolemia, hypothyroidism, vertigo, postural
hypotension, and a history of syncope in the past who now is
on Florinef and presented to [**Hospital1 **] [**Location (un) 620**]
emergency department on [**2180-9-22**] after a syncopal episode
at home.  She reported that she passed out after urinating
while on the toilet.  She awoke and called her primary care
physician, [**Name10 (NameIs) 1023**] evaluated her in the office and suspected
dehydration, rehydrated her with fluids, and sent her home.
At home she continued to feel poorly, and her primary care
physician told her to return to the Emergency Department.  At
[**Location (un) 620**] emergency department on [**2180-9-23**] EKG revealed
polymorphic ventricular tachycardia with rate in the 130s and
blood pressure in the 140s to 150s/60s.  She was afebrile at
this time.

Labs in the Emergency Room 

You can check whether the spans in a `SpanGroup` overlap, and you can also extract individual spans from a `SpanGroup`:

In [5]:
print(doc.spans['Symptom_Section'].has_overlap)

span1 = doc.spans['Symptom_Section'][0]
print(span1.label_)
print(span1.start_char, span1.end_char)

False
Symptom_Section
128 2616
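
As a rough illustration of what an overlap check involves, the sketch below works on plain `(start, end)` offset pairs rather than spaCy spans. It is not spaCy's implementation of `has_overlap`; the function name `any_overlap` and all sample offsets other than `(128, 2616)` are made up for the example.

```python
def any_overlap(spans):
    """Return True if any two half-open (start, end) intervals overlap."""
    last_end = None
    for start, end in sorted(spans):
        # After sorting by start, an overlap exists if a span begins
        # before the furthest end seen so far.
        if last_end is not None and start < last_end:
            return True
        last_end = end if last_end is None else max(last_end, end)
    return False


print(any_overlap([(128, 2616)]))              # False: a single span
print(any_overlap([(128, 2616), (128, 156)]))  # True: nested spans
print(any_overlap([(0, 10), (10, 20)]))        # False: adjacent spans
```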


`SpanGroup` objects can also be added together in spaCy 3.4:

In [6]:
import spacy
print(spacy.__version__)
# SpanGroup addition is left commented out because the spaCy version printed below does not support it.
# entire_SpanGroup = doc.spans['Symptom_Section'] + doc.spans['SectionHeader_HasSymptom']  # + doc.spans['Symptom']
# print(entire_SpanGroup.has_overlap)
# print('length of each spanGroup:', len(entire_SpanGroup), len(doc.spans['Symptom_Section']), len(doc.spans['SectionHeader_HasSymptom']), len(doc.spans['Symptom']))

3.1.6


***spaCy 3.1*** supports `SpanGroup` but lacks some methods, such as `SpanGroup` addition, which are included in ***spaCy 3.4***.
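
Because the addition depends on the installed version, a notebook can guard the call with a version check. The sketch below is a minimal, hypothetical helper (`supports_spangroup_addition` is not part of spaCy or medspacy-io); it takes the 3.4 threshold from the statement above and assumes a plain `major.minor.patch` version string.

```python
def supports_spangroup_addition(version, minimum=(3, 4)):
    """Return True if a 'major.minor.patch' version string is at least `minimum`."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= minimum


print(supports_spangroup_addition("3.1.6"))  # False
print(supports_spangroup_addition("3.4.0"))  # True
```

In the notebook, `spacy.__version__` could be passed to the helper before running the commented-out addition.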