# Exporting Extracted Medical Records to MongoDB

For storage, we export the data extracted from the reports to MongoDB.

## Import pymongo, a MongoDB client for Python

In [1]:
from pymongo import MongoClient

## Connect to a MongoDB cluster

After creating a cluster in MongoDB Atlas, we can connect to it using the credentials provided.

In [3]:
# Replace <username> and <password> with your Atlas database user credentials
cluster = "mongodb+srv://<username>:<password>@cluster0.eesjih9.mongodb.net/?retryWrites=true&w=majority"

client = MongoClient(cluster)
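
Credentials written directly into a notebook are easy to leak when the notebook is shared. As a minimal sketch of one alternative, the connection string can be assembled from environment variables, with the username and password percent-encoded so that characters such as `@` or `/` do not break the URI. The variable names `MONGO_USER`, `MONGO_PASSWORD`, and `MONGO_HOST`, the fallback values, and the helper `build_mongo_uri` are assumptions made for illustration, not part of the original notebook.

```python
import os
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host):
    """Build a mongodb+srv URI, percent-encoding the credentials."""
    return (
        f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )


# Hypothetical environment variable names; the fallbacks are placeholders only.
uri = build_mongo_uri(
    os.environ.get("MONGO_USER", "user"),
    os.environ.get("MONGO_PASSWORD", "password"),
    os.environ.get("MONGO_HOST", "cluster0.example.mongodb.net"),
)

# The resulting string is passed to the client exactly as before:
# client = MongoClient(uri)
```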

## Check MongoDB databases

In [4]:
client.list_database_names()

['ai_project', 'admin', 'local']

A database named 'ai_project' was already created from the Atlas console.

## Select a database to perform actions

From the list above, we choose a database.

In [5]:
db = client["ai_project"]

## List collections in the database

MongoDB stores data records in collections. We created a collection named "extracted_medical_records" in the database.

In [21]:
db.list_collection_names()

['extracted_medical_records']

## Test adding a record into the collection

Let's create a dummy record to add to the collection.

In [7]:
record = {
    'DATE': ['age 20'],
    'GPE': ['Banepa', 'Dhulikhel'],
    'ORG': ['Kathmandu University'],
    'PERSON': ['John Doe'],
    'RAILA': 'Solti'
}

In [9]:
# Select a collection
extracted_medical_records = db["extracted_medical_records"]

# Add record
extracted_medical_records.insert_one(record)
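
One detail worth knowing when testing: `insert_one` adds the generated `_id` to the dictionary that was passed in, so running the same cell twice with the same `record` object raises a duplicate key error. The sketch below inserts a copy instead, leaving the original dictionary untouched. The helper names `fresh_copy` and `insert_copy` are hypothetical, and `collection` is assumed to be a pymongo collection such as `extracted_medical_records`.

```python
import copy


def fresh_copy(record):
    """Return a deep copy of `record` with any existing '_id' removed."""
    clone = copy.deepcopy(record)
    clone.pop("_id", None)
    return clone


def insert_copy(collection, record):
    """Insert a copy of `record` and return the id MongoDB assigned to it."""
    result = collection.insert_one(fresh_copy(record))
    return result.inserted_id


# Usage in the notebook (safe to re-run, each run stores a new document):
# new_id = insert_copy(extracted_medical_records, record)
```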

## Extract NER data from records and store it in MongoDB

We then perform NER on sample data and store the result in MongoDB.

In [10]:
# Import the NER function
from ner.medical_ner import extract_data

### Apply NER to only the first 10 records, then store the results

In [12]:
records = []

for i in range(10):
    records.append(extract_data(f"assets/txt_records/report_{i}.txt"))

2022-12-11 18:15:01 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES
Downloading https://raw.githubusercontent.com/stanfordnlp/stanza-resources/main/resources_1.4.1.json: 193kB [00:00, 10.3MB/s]                                
2022-12-11 18:15:03 INFO: Loading these models for language: en (English):
| Processor | Package                  |
----------------------------------------
| tokenize  | combined                 |
| ner       | i2b2;radiology;ontonotes |

2022-12-11 18:15:03 INFO: Use device: gpu
2022-12-11 18:15:03 INFO: Loading: tokenize
2022-12-11 18:15:06 INFO: Loading: ner
2022-12-11 18:15:08 INFO: Done loading processors!
2022-12-11 18:15:14 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCE
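
The loop above assumes that every `report_{i}.txt` file exists. A slightly more defensive sketch is shown below: it skips missing files and tags each result with the file it came from, which makes stored documents easier to trace back to a report. The helper `collect_records` and the `source_file` field are hypothetical additions, and the sketch assumes `extract_data` returns a dictionary, as the output in this notebook suggests.

```python
from pathlib import Path


def collect_records(extract, directory, count):
    """Run `extract` on report_0.txt .. report_{count-1}.txt, skipping missing files."""
    records = []
    for i in range(count):
        path = Path(directory) / f"report_{i}.txt"
        if not path.is_file():
            print(f"Skipping missing file: {path}")
            continue
        record = extract(str(path))
        record["source_file"] = path.name  # hypothetical field, not in the original data
        records.append(record)
    return records


# Usage in the notebook (extract_data is the function imported above):
# records = collect_records(extract_data, "assets/txt_records", 10)
```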

### After applying NER, we get the following data from the report

In [16]:
records[1]

{'TREATMENT': ['Left Heart Catheterization',
  'the procedure',
  '2% lidocaine',
  'A 6 French sheath',
  'modified Seldinger technique',
  'A 6 French angled pigtail',
  'aortic valve pullback',
  '6 French JL4 and JR4 catheters',
  'a previously placed stent',
  'the stent',
  'a previous stent',
  'a previously placed stent in the mid vessel',
  'The previously placed stent in the proximal and middle vessel',
  'a previously placed stent in the distal vessel'],
 'ANATOMY_MODIFIER': ['right',
  'interventricular',
  'groove',
  'apex',
  'proximal',
  'mid',
  'diagonal',
  'branch',
  'distal',
  'portion',
  'left',
  'anterior',
  'descending',
  'marginal',
  'obtuse'],
 'ANATOMY': ['groin',
  'right femoral artery',
  'coronary',
  'main artery',
  'vessel',
  'anterior descending artery',
  'circumflex artery',
  'coronary artery',
  'ventricle'],
 'TEST': ['left ventriculography',
  'selective coronary angiography',
  'Multiple views',
  'the prior angiogram',
  'Ejection fra

### Insert records to MongoDB

We can insert the records iteratively with '*insert_one*', or all at once with '*insert_many*'.

In [17]:
for record in records:
    extracted_medical_records.insert_one(record)
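
The loop above sends one request per record. As a rough sketch of the `insert_many` alternative, the whole list can be sent in a single call. The wrapper `insert_records` is a hypothetical helper; it guards against an empty list because `insert_many` does not accept one.

```python
def insert_records(collection, records):
    """Insert all records with a single insert_many call and return the new ids."""
    if not records:
        return []  # nothing to insert; insert_many rejects an empty list
    result = collection.insert_many(records)
    return result.inserted_ids


# Usage in the notebook:
# ids = insert_records(extracted_medical_records, records)
# print(f"Inserted {len(ids)} documents")
```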

### Display data from the database

We can display the data stored in the MongoDB collection as follows:

In [20]:
import pprint

cursor = extracted_medical_records.find({})
for document in cursor:
    pprint.pprint(document)

{'DATE': ['age 20'],
 'GPE': ['Banepa', 'Dhulikhel'],
 'ORG': ['Kathmandu University'],
 'PERSON': ['John Doe'],
 'RAILA': 'Solti',
 '_id': ObjectId('6395c92f4fe5e6d8f7437c72')}
{'ANATOMY': ['fascia',
             'midline',
             'rectus muscle',
             'peritoneum',
             'bladder',
             'uterus',
             'bowel',
             'peritoneal',
             'ligament',
             'ovaries',
             'broad ligament',
             'uterine arteries',
             'uterosacral ligaments',
             'cervix',
             'ovary',
             'ureter',
             'ovarian',
             'peritoneal cavity',
             'pelvis'],
 'ANATOMY_MODIFIER': ['dorsal',
                      'layer',
                      'superiorly',
                      'left',
                      'lateral',
                      'wall',
                      'anterior',
                      'bilaterally',
                      'right'],
 'OBSERVATION': ['incised'