## Processing Raw BRAT Files
Reads BRAT-format files and converts them to a dictionary containing the text, an entity list and a role list (each item with id, offsets, category, and span text), an events list that maps each entity id to its related role ids, and an attributes list.
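
For orientation, the sketch below parses a small hand-written BRAT standoff sample into text-bound spans (`T` lines), events (`E` lines), and attributes (`A` lines). The sample annotation, the attribute value, and the helper name `parse_brat_lines` are made up for illustration and are not taken from the dataset; the sketch also assumes contiguous spans (a single start and end offset per `T` line).

```python
# Hypothetical four-line .ann sample (tab-separated fields), not from the dataset.
SAMPLE_ANN = (
    "T1\tTobacco 35 42\ttobacco\n"
    "T2\tStatus 28 34\tdenies\n"
    "E1\tTobacco:T1 Status:T2\n"
    "A1\tValue T2 none\n"
)

def parse_brat_lines(ann_text):
    """Split BRAT standoff lines into text-bound spans, events and attributes."""
    spans, events, attributes = [], [], []
    for line in ann_text.splitlines():
        parts = line.strip().split("\t")
        tag = parts[0]
        if tag.startswith("T"):
            # "Category start end" followed by the covered text.
            category, start, end = parts[1].split(" ")
            spans.append({"id": int(tag[1:]), "category": category,
                          "start": int(start), "end": int(end), "text": parts[2]})
        elif tag.startswith("E"):
            # The first "Label:Tn" pair is the trigger; the rest are its arguments.
            refs = [int(item.split(":T")[1]) for item in parts[1].split(" ")]
            events.append({"id": int(tag[1:]), "trigger": refs[0], "arguments": refs[1:]})
        elif tag.startswith("A"):
            # "Name Tn value": an attribute attached to a text-bound span.
            name, target, value = parts[1].split(" ")
            attributes.append({"id": int(tag[1:]), "name": name,
                               "target": int(target[1:]), "value": value})
    return spans, events, attributes

spans, events, attributes = parse_brat_lines(SAMPLE_ANN)
print(events)
print(attributes)
```

Here the event `E1` links the trigger `T1` to the argument `T2`, which is the same entity-to-role mapping the notebook stores in `events_list`.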

## Dataset Exploratory Analysis


1. Each entity is associated with one or more roles.
2. In this dataset, an entity is annotated only if at least one role is attached to it.
3. A role may be associated with zero or more entities.
4. A word or span can be both an entity and a role, and can be related to itself.
5. Some stopwords and punctuation marks are included in the spans of entities and roles.
6. Significant overlap is seen among the roles Status, Method, Type, and Amount.
7. Significant overlap is seen among the entities LivingSituation, Family, MaritalStatus, and Residence.
8. Some spans overlap between entities and roles with a nesting depth >= 2.
9. 27 files have empty annotation files.
10. There are 364 files in total; 337 were processed after leaving out the 27 with empty annotations.
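
To make the nesting-depth observation concrete, here is a minimal sketch that assumes "depth" means the number of other spans that fully enclose a given span. The helper name `nesting_depths` is hypothetical. The offsets come from one of the overlapping-entity records in this notebook's output (a LivingSituation span containing a Residence span and a MaritalStatus span, which in turn contains a Family span), cast to integers.

```python
def nesting_depths(spans):
    """For each (start, end) span, count how many other spans fully enclose it."""
    depths = []
    for start, end in spans:
        enclosing = [
            (o_start, o_end)
            for o_start, o_end in spans
            # An enclosing span starts no later and ends no earlier;
            # identical spans are not counted as enclosing each other.
            if o_start <= start and end <= o_end and (o_start, o_end) != (start, end)
        ]
        depths.append(len(enclosing))
    return depths

# LivingSituation, Residence, MaritalStatus, Family
spans = [(44, 89), (44, 49), (55, 74), (55, 62)]
print(nesting_depths(spans))  # [0, 1, 1, 2]
```

The Family span at (55, 62) sits inside both the MaritalStatus and the LivingSituation spans, which gives it a depth of 2 under this definition.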

Running procedure:
Set `annotation_file_dir` to the directory that contains the raw annotation (`.ann`) and text (`.txt`) files, set `generic_format_file_path` to the path where the processed data should be saved, and then run all cells. For example:


annotation_file_dir = '/content/drive/MyDrive/PHD_assessment_gmu/SocialHistoryMTSamples/'
generic_format_file_path='/content/drive/MyDrive/PHD_assessment_gmu/SocialHistoryMTSamples.json'

In [None]:
import os
import pandas as pd
from itertools import chain
import nltk
from nltk.tokenize import word_tokenize
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [None]:
import collections
import json


In [None]:
annotation_file_dir = '/content/drive/MyDrive/PHD_assessment_gmu/data/raw_files/SocialHistoryMTSamples/'
generic_format_file_path='/content/drive/MyDrive/PHD_assessment_gmu/data/proceSocialHistoryMTSamples.json'

In [None]:
# Given the file paths of the uploaded files, let's read and structure the data.

def read_brat_text_file(file_path):
    """
    Reads a text file and returns its content.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        return file.read()

def read_brat_annotation_file(file_dir,file_name):
    """
    Reads a BRAT standoff annotation file and returns its content in a structured format.
    """
    entity_list=[]
    role_list=[]
    events_list=[]
    attributes_list=[]
    #Checking if the file is empty or not:
    file_path=os.path.join(file_dir,file_name)
    if os.stat(file_path).st_size == 0:
      print("Empty file")
      return [],[],[],[],"Empty file"
    #Reading the file:
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            parts = line.strip().split('\t')
            if parts[0].startswith('T'):
              #Entity_type and role extraction:
              entity_info=parts[1].split(' ')
              entity_category=entity_info[0]
              if entity_category in ["Status", "Type", "Temporal", "Method", "Amount", "Frequency", "History", "ExposureHistory", "QuitHistory", "LivingStatus", "MedicalCondition", "Location", "Extent","Other"]:
                role_list.append({'role_id':int(parts[0][1:]),'entity_type':"Role",'entity_category':entity_info[0],'entity_strt_pos':entity_info[1],'entity_end_pos':entity_info[2],'entity_text':parts[2]})
              else:
                entity_list.append({'entity_id':int(parts[0][1:]),'entity_type':"Entity",'entity_category':entity_info[0],'entity_strt_pos':entity_info[1],'entity_end_pos':entity_info[2],'entity_text':parts[2]})
            elif parts[0].startswith('E'):
              #Relations extraction:
              event_level=[]
              relations_lst_raw=parts[1].split(' ')
              primary_entity_id=int(relations_lst_raw[0].split(':T')[1])
              for i in range(1,len(relations_lst_raw)):
                event_level.append(int(relations_lst_raw[i].split(':T')[1]))
              events_list.append({'Event_id':int(parts[0][1:]),'entity_id':primary_entity_id,'Related_roles':event_level})

            elif parts[0].startswith('A'):
              #Attributes extraction:
              if parts[1].split(' ')[0]=='Value':
                attributes_list.append({'Attribute_id':int(parts[0][1:]),'Attribute_type':parts[1].split(' ')[0],'Role_id':int(parts[1].split(' ')[1][1:]),'Attribute_value':parts[1].split(' ')[2]})
              else:
                print('Invalid Attribute')
                print('\n')
                print(file_name)
                print('\n')
                print(parts)
                print('\n')
            else:
              print('Invalid Annotation')
              print(file_name)
              print('\n')
              print(parts)
              print('\n')

    return entity_list, role_list, events_list, attributes_list, "Success"

def read_brat_annotation_folder(annotation_file_dir,ignore_missing_annotations=True):
  ann_files = [f[:-4] for f in os.listdir(annotation_file_dir) if f.endswith('.ann')]
  txt_files = [f[:-4] for f in os.listdir(annotation_file_dir) if f.endswith('.txt')]
  #Check whether any text file in the directory lacks a corresponding annotation file.
  file_difference=set(txt_files)-set(ann_files)
  if len(file_difference)>0:
    print("The following files are missing annotations:")
    print(file_difference)
  else:
    print("All files have annotations")
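  # Create the result containers before the flag check so the return
  # statement also works when ignore_missing_annotations is False.
  data_inp_lst=[]
  empty_files=[]
  empty_entities=[]
  empty_roles=[]
  empty_events=[]
  empty_attributes=[]
  if ignore_missing_annotations: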
    for ann_file_name in ann_files:
      text_file_path = os.path.join(annotation_file_dir, ann_file_name+'.txt')
      annotation_file_name = ann_file_name+'.ann'
      ann_entity_list,ann_role_list,ann_events_list, ann_attributes_list,status = read_brat_annotation_file(annotation_file_dir,annotation_file_name)
      if status == "Empty file":
        empty_files.append(ann_file_name)
      else:
        if len(ann_entity_list)==0:
          empty_entities.append(ann_file_name)
        if len(ann_role_list)==0:
          empty_roles.append(ann_file_name)
        if len(ann_events_list)==0:
          empty_events.append(ann_file_name)
        if len(ann_attributes_list)==0:
          empty_attributes.append(ann_file_name)
        text_content = read_brat_text_file(text_file_path)
        data_inp_lst.append({
                  'text': text_content,
                  'entity_list': ann_entity_list,
                  'role_list': ann_role_list,
                  'events_list': ann_events_list,
                  'attributes_list': ann_attributes_list,
                  'file_name': ann_file_name
                  })
  return data_inp_lst, empty_files, empty_entities, empty_roles, empty_events, empty_attributes


annotated_data, empty_files, empty_entities, empty_roles, empty_events, empty_attributes=read_brat_annotation_folder(annotation_file_dir)



All files have annotations
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Empty file
Invalid Annotation
174_Consult-HistoryandPhy.-FlankPain-Consult_8.ann


['#1', 'AnnotatorNotes T5', 'Denies "abuse" vs. Denies "use" - no way to specify this in status...']
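
The line reported above starts with `#`, which the BRAT standoff format uses for note annotations, so it is an annotator comment, not a malformed record. A minimal sketch of how such a line could be parsed instead of being reported as invalid is shown below. The helper name `parse_note_line` and the returned keys are hypothetical and are not part of the notebook's parser.

```python
def parse_note_line(line):
    """Parse a BRAT note line such as '#1<TAB>AnnotatorNotes T5<TAB>free text'.

    Returns None when the line is not a well-formed note.
    """
    parts = line.rstrip("\n").split("\t")
    if not parts[0].startswith("#") or len(parts) < 3:
        return None
    # The middle field holds the note type and the id of the annotated item.
    note_type, target = parts[1].split(" ", 1)
    return {"note_id": parts[0][1:], "note_type": note_type,
            "target": target, "text": parts[2]}

line = '#1\tAnnotatorNotes T5\tDenies "abuse" vs. Denies "use" - no way to specify this in status...'
note = parse_note_line(line)
print(note["note_type"], note["target"])  # AnnotatorNotes T5
```

In the notebook's parser this would correspond to one more `elif parts[0].startswith('#')` branch ahead of the final `else`.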




In [None]:
len(empty_files)

27

In [None]:
empty_entities

[]

In [None]:
empty_roles

[]

In [None]:
empty_events

[]

In [None]:
empty_attributes

['402_Consult-HistoryandPhy.-OtitisMedia-H&P_7',
 '478_Consult-HistoryandPhy.-StatusEpilepticus_7',
 '18_Consult-HistoryandPhy.-AnklePain-Consult_10',
 '335_Consult-HistoryandPhy.-Neuroblastoma-Consult_8',
 '286_Consult-HistoryandPhy.-ItchyRash-ERVisit_7',
 '511_Consult-HistoryandPhy.-Well-ChildCheck-6_2',
 '430_Consult-HistoryandPhy.-PsychConsult-Depression-1_10',
 '179_Consult-HistoryandPhy.-ForeignBody-RightNose_8']


### Aspects of Data
1.   27 files have no annotations. They are excluded from training and testing, but they can serve as a small sample of unlabeled data for manual testing.
2.   Attributes have only one type, `Value`. Their values mostly indicate temporal status and whether the mention is negated.
3. A `Value` attribute appears only when a `Status` role is present.
4. Whenever an entity is present, there is always a role and an event.
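
Observation 3 can be checked programmatically. The sketch below assumes the dictionary layout produced by `read_brat_annotation_file` (`role_list` items with `role_id` and `entity_category`, `attributes_list` items with `Role_id`). The helper name `attributes_not_on_status` and the toy record, including its attribute value, are invented for illustration.

```python
def attributes_not_on_status(record):
    """Return the attributes whose Role_id does not point at a Status role."""
    category_by_id = {
        role["role_id"]: role["entity_category"] for role in record["role_list"]
    }
    return [
        attribute
        for attribute in record["attributes_list"]
        if category_by_id.get(attribute["Role_id"]) != "Status"
    ]

# Toy record in the notebook's layout; the values are invented.
toy = {
    "role_list": [
        {"role_id": 2, "entity_category": "Status"},
        {"role_id": 3, "entity_category": "Type"},
    ],
    "attributes_list": [
        {"Attribute_id": 1, "Attribute_type": "Value",
         "Role_id": 2, "Attribute_value": "current"},
    ],
}
print(attributes_not_on_status(toy))  # []
```

Applied to the real data, `sum(len(attributes_not_on_status(r)) for r in annotated_data)` returning 0 would be consistent with the observation.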




In [None]:
with open(generic_format_file_path, 'w', encoding='utf-8') as file:
  json.dump(annotated_data, file)

# Data Analysis:


1.   Explore whether entities and roles are annotated at the word level or the span level.
2.   Compute descriptive statistics of sentences, entities, and roles.
3.   Find the cardinality of the relationship between entities and roles.
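
One possible way to approach the cardinality question is to count links in both directions from an `events_list`. This is a sketch under the assumption that each event has the `entity_id` and `Related_roles` keys built earlier in this notebook. The function names and the toy events are hypothetical.

```python
import collections

def roles_per_entity(events):
    """Map each trigger entity_id to the number of roles linked through its events."""
    counts = collections.Counter()
    for event in events:
        counts[event["entity_id"]] += len(event["Related_roles"])
    return dict(counts)

def entities_per_role(events):
    """Map each role_id to the number of distinct entities that reference it."""
    linked = collections.defaultdict(set)
    for event in events:
        for role_id in event["Related_roles"]:
            linked[role_id].add(event["entity_id"])
    return {role_id: len(entities) for role_id, entities in linked.items()}

# Toy events: role 4 is referenced by two different entities.
toy_events = [
    {"Event_id": 1, "entity_id": 1, "Related_roles": [4, 5]},
    {"Event_id": 2, "entity_id": 2, "Related_roles": [4]},
]
print(roles_per_entity(toy_events))   # {1: 2, 2: 1}
print(entities_per_role(toy_events))  # {4: 2, 5: 1}
```

Values above 1 in both mappings would indicate a many-to-many relationship between entities and roles.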





In [None]:
def tokenize_sentence(sentence):
    tokens = word_tokenize(sentence)
    return tokens

In [None]:
annotated_df = pd.DataFrame(annotated_data)

In [None]:
annotated_df.shape

(337, 6)

### Entity analysis

In [None]:
def role_overlap(role_lst):
  # Sort the roles based on entity_strt_pos
  sorted_roles = sorted(role_lst, key=lambda x: int(x['entity_strt_pos']))

  # Find overlapping and encompassing roles
  overlapping_roles = []
  encompassing_roles_dict = {}

  for i in range(len(sorted_roles) - 1):
      current_role = sorted_roles[i]

      for j in range(i + 1, len(sorted_roles)):
          next_role = sorted_roles[j]

          if (
              int(current_role['entity_strt_pos']) <= int(next_role['entity_end_pos']) and
              int(next_role['entity_strt_pos']) <= int(current_role['entity_end_pos'])
          ):
              # Roles are overlapping
              overlapping_roles.append((current_role, next_role))

              # Determine which role is encompassing the other
              if (
                  int(current_role['entity_strt_pos']) <= int(next_role['entity_strt_pos']) and
                  int(current_role['entity_end_pos']) >= int(next_role['entity_end_pos'])
              ):
                  # current_role encompasses next_role
                  current_category = (current_role['entity_category'],(current_role['entity_strt_pos'],current_role['entity_end_pos']))
                  next_category = (next_role['entity_category'],(next_role['entity_strt_pos'],next_role['entity_end_pos']))

                  if current_category not in encompassing_roles_dict:
                      encompassing_roles_dict[current_category] = []

                  encompassing_roles_dict[current_category].append(next_category)

              elif (
                  int(next_role['entity_strt_pos']) <= int(current_role['entity_strt_pos']) and
                  int(next_role['entity_end_pos']) >= int(current_role['entity_end_pos'])
              ):
                  # next_role encompasses current_role

                  next_category=(next_role['entity_category'],(next_role['entity_strt_pos'],next_role['entity_end_pos']))
                  current_category=(current_role['entity_category'],(current_role['entity_strt_pos'],current_role['entity_end_pos']))


                  if next_category not in encompassing_roles_dict:
                      encompassing_roles_dict[next_category] = []

                  encompassing_roles_dict[next_category].append(current_category)
  return encompassing_roles_dict

In [None]:
entity_df=pd.DataFrame(list(chain(*(annotated_df['entity_list'].tolist()))))


In [None]:
entity_df['entity_category'].unique()

array(['Tobacco', 'Alcohol', 'Drug', 'MaritalStatus', 'Family',
       'LivingSituation', 'Residence', 'SexualHistory',
       'PhysicalActivity', 'Occupation', 'InfectiousDiseases',
       'EnvironmentalExposure'], dtype=object)

In [None]:
entity_df

| | entity_id | entity_type | entity_category | entity_strt_pos | entity_end_pos | entity_text |
|---|---|---|---|---|---|---|
| 0 | 1 | Entity | Tobacco | 35 | 42 | tobacco |
| 1 | 2 | Entity | Alcohol | 44 | 51 | ethanol |
| 2 | 3 | Entity | Drug | 56 | 60 | drug |
| 3 | 5 | Entity | MaritalStatus | 83 | 92 | separated |
| 4 | 8 | Entity | Family | 148 | 156 | daughter |
| ... | ... | ... | ... | ... | ... | ... |
| 1340 | 13 | Entity | Tobacco | 158 | 163 | smoke |
| 1341 | 3 | Entity | Occupation | 64 | 69 | works |
| 1342 | 6 | Entity | Family | 259 | 263 | baby |
| 1343 | 10 | Entity | Tobacco | 278 | 283 | smoke |


In [None]:
entity_df['entity_category'].value_counts()

Tobacco                  278
Alcohol                  254
Family                   183
Drug                     154
Occupation               135
MaritalStatus            122
LivingSituation           98
Residence                 51
PhysicalActivity          31
EnvironmentalExposure     20
InfectiousDiseases        11
SexualHistory              8
Name: entity_category, dtype: int64

In [None]:
entity_df['tokenized_text']= entity_df['entity_text'].apply(tokenize_sentence)

In [None]:

entity_df['len_of_enity']=entity_df['tokenized_text'].str.len()


In [None]:


entity_df['len_of_enity'].describe()


count    1345.000000
mean        1.894424
std         2.110244
min         1.000000
25%         1.000000
50%         1.000000
75%         2.000000
max        24.000000
Name: len_of_enity, dtype: float64

In [None]:
entity_df[entity_df['len_of_enity']>1]['len_of_enity'].value_counts()

2     255
3      65
4      61
5      32
6      20
7       9
8       5
11      5
14      4
10      3
9       2
15      2
19      2
12      2
18      1
24      1
16      1
21      1
13      1
Name: len_of_enity, dtype: int64

Check: are there any overlapping entities?

In [None]:
overlapping_entities_list=[]
entity_encompassing_list=[]
entity_compassed_by_list=[]
for ele in annotated_data:
  overlaps=role_overlap(ele['entity_list'])
  if len(overlaps)>0:
    overlapping_entities_list.append(overlaps)
    for key, value in overlaps.items():
      entity_encompassing_list.append(key)
      entity_compassed_by_list.extend(value)
overlapping_entities_list

[{('LivingSituation', ('172', '194')): [('Residence', ('172', '177'))]},
 {('LivingSituation', ('29', '48')): [('Family', ('44', '48'))]},
 {('LivingSituation', ('28', '50')): [('Family', ('43', '50'))]},
 {('LivingSituation', ('47', '69')): [('Family', ('62', '69'))]},
 {('LivingSituation', ('20', '57')): [('Family', ('31', '34')),
   ('Family', ('36', '39')),
   ('Family', ('41', '48')),
   ('Family', ('50', '57'))]},
 {('LivingSituation', ('126', '163')): [('Family', ('141', '145')),
   ('Family', ('150', '163'))]},
 {('LivingSituation', ('44', '89')): [('Residence', ('44', '49')),
   ('MaritalStatus', ('55', '74')),
   ('Family', ('55', '62'))],
  ('MaritalStatus', ('55', '74')): [('Family', ('55', '62'))]},
 {('LivingSituation', ('20', '52')): [('Family', ('31', '38')),
   ('Family', ('43', '51'))]},
 {('LivingSituation', ('43', '62')): [('Family', ('58', '62'))]},
 {('LivingSituation', ('29', '65')): [('Family', ('44', '48')),
   ('Family', ('53', '65'))]},
 {('LivingSituation', 

In [None]:
len(overlapping_entities_list)

45

### Role Analysis

In [None]:
role_df=pd.DataFrame(list(chain(*(annotated_df['role_list'].tolist()))))
role_df['entity_category'].value_counts()

Status              958
Type                655
Amount              269
Method              167
Frequency           139
Location             80
Temporal             48
ExposureHistory      46
QuitHistory          45
LivingStatus         19
Other                 9
MedicalCondition      9
Extent                3
History               1
Name: entity_category, dtype: int64

In [None]:
role_df['entity_category'].unique()

array(['Status', 'Type', 'Amount', 'Method', 'Location', 'Frequency',
       'Temporal', 'LivingStatus', 'Extent', 'Other', 'MedicalCondition',
       'QuitHistory', 'ExposureHistory', 'History'], dtype=object)

In [None]:
role_df['tokenized_text']= role_df['entity_text'].apply(tokenize_sentence)
role_df['len_of_enity']=role_df['tokenized_text'].str.len()
role_df['len_of_enity'].describe()


count    2448.00000
mean        1.83415
std         1.63491
min         1.00000
25%         1.00000
50%         1.00000
75%         2.00000
max        23.00000
Name: len_of_enity, dtype: float64

In [None]:
role_df['len_of_enity'].value_counts()

1     1467
2      523
3      217
4      123
5       49
6       28
7       15
8        6
9        5
10       3
14       3
15       2
19       1
16       1
23       1
13       1
11       1
18       1
21       1
Name: len_of_enity, dtype: int64

Check: are there any overlapping roles?

In [None]:
overlapping_roles_list=[]
encompassing_list=[]
compassed_by_list=[]
for ele in annotated_data:
  overlaps=role_overlap(ele['role_list'])
  if len(overlaps)>0:
    overlapping_roles_list.append(overlaps)
    for key, value in overlaps.items():
      encompassing_list.append(key)
      compassed_by_list.extend(value)
overlapping_roles_list

[{('Status', ('180', '190')): [('Type', ('183', '190'))]},
 {('Status', ('25', '30')): [('Type', ('25', '30'))],
  ('Method', ('32', '43')): [('Status', ('32', '37'))]},
 {('Method', ('29', '48')): [('Status', ('29', '34')),
   ('Type', ('44', '48'))]},
 {('Type', ('64', '71')): [('Status', ('64', '71'))],
  ('Method', ('90', '101')): [('Status', ('90', '95'))],
  ('Type', ('105', '109')): [('Status', ('105', '109'))],
  ('Status', ('118', '125')): [('Method', ('118', '125'))]},
 {('Method', ('28', '50')): [('Status', ('28', '33')),
   ('Type', ('43', '50'))]},
 {('Method', ('28', '53')): [('Status', ('28', '33'))]},
 {('Method', ('37', '66')): [('Status', ('37', '50'))]},
 {('Method', ('47', '69')): [('Status', ('47', '52')),
   ('Type', ('62', '69'))]},
 {('Method', ('20', '56')): [('Status', ('20', '25')),
   ('Type', ('31', '34')),
   ('Type', ('36', '40')),
   ('Type', ('41', '48')),
   ('Type', ('50', '56'))]},
 {('Status', ('62', '72')): [('Type', ('65', '72'))],
  ('Status', ('

In [None]:
roles_overlap_freq=[]
role_freq_dict = {}
for role_dict in overlapping_roles_list:
  for key_role in role_dict.keys():
    roles_overlap_freq.append(key_role[0])
  for value_list in role_dict.values():
    for value_role in value_list:
      roles_overlap_freq.append(value_role[0])

collections.Counter(roles_overlap_freq)


Counter({'Status': 187,
         'Type': 130,
         'Method': 114,
         'Amount': 29,
         'Temporal': 8,
         'Frequency': 12,
         'LivingStatus': 5,
         'MedicalCondition': 4,
         'Other': 1,
         'Location': 3,
         'QuitHistory': 3,
         'ExposureHistory': 1})

In [None]:
overlapping_roles_list[0]

{('Status', ('180', '190')): [('Type', ('183', '190'))]}

In [None]:
len(overlapping_roles_list)

145

In [None]:
collections.Counter(encompassing_list)

Counter({('Status', ('180', '190')): 1,
         ('Status', ('25', '30')): 1,
         ('Method', ('32', '43')): 1,
         ('Method', ('29', '48')): 2,
         ('Type', ('64', '71')): 1,
         ('Method', ('90', '101')): 1,
         ('Type', ('105', '109')): 1,
         ('Status', ('118', '125')): 1,
         ('Method', ('28', '50')): 1,
         ('Method', ('28', '53')): 1,
         ('Method', ('37', '66')): 1,
         ('Method', ('47', '69')): 1,
         ('Method', ('20', '56')): 1,
         ('Status', ('62', '72')): 1,
         ('Status', ('74', '90')): 1,
         ('Method', ('126', '164')): 1,
         ('Method', ('44', '74')): 1,
         ('Status', ('62', '76')): 1,
         ('Method', ('106', '128')): 1,
         ('Method', ('20', '51')): 1,
         ('Status', ('105', '119')): 2,
         ('Method', ('23', '46')): 1,
         ('Status', ('28', '37')): 1,
         ('Method', ('43', '82')): 1,
         ('Status', ('28', '38')): 2,
         ('Method', ('43', '63')): 1,
   

In [None]:
collections.Counter(compassed_by_list)

Counter({('Type', ('183', '190')): 1,
         ('Type', ('25', '30')): 1,
         ('Status', ('32', '37')): 1,
         ('Status', ('29', '34')): 9,
         ('Type', ('44', '48')): 2,
         ('Status', ('64', '71')): 1,
         ('Status', ('90', '95')): 1,
         ('Status', ('105', '109')): 1,
         ('Method', ('118', '125')): 1,
         ('Status', ('28', '33')): 4,
         ('Type', ('43', '50')): 1,
         ('Status', ('37', '50')): 1,
         ('Status', ('47', '52')): 1,
         ('Type', ('62', '69')): 1,
         ('Status', ('20', '25')): 6,
         ('Type', ('31', '34')): 1,
         ('Type', ('36', '40')): 1,
         ('Type', ('41', '48')): 1,
         ('Type', ('50', '56')): 1,
         ('Type', ('65', '72')): 1,
         ('Method', ('77', '90')): 1,
         ('Status', ('126', '131')): 1,
         ('Type', ('141', '145')): 1,
         ('Amount', ('150', '154')): 1,
         ('Type', ('155', '164')): 1,
         ('Type', ('55', '62')): 2,
         ('Temporal', ('

### Event analysis

Check: is a role shared between entities?

In [None]:
def check_role_overlap_entities(data):
  all_roles = [role for d in data for role in d['Related_roles']]
  role_counts = collections.Counter(all_roles)
  duplicates = {role: count for role, count in role_counts.items() if count > 1}
  return sum(duplicates.values())

In [None]:
total_overlap_roles = 0
for ele in annotated_data:
  total_overlap_roles=total_overlap_roles+check_role_overlap_entities(ele['events_list'])
total_overlap_roles

391
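
Note that `check_role_overlap_entities` sums the occurrence counts of shared roles, so a role referenced by three events contributes 3 to the total, not 1. The sketch below separates the two quantities. The helper name `shared_role_stats` and the toy events are hypothetical.

```python
import collections

def shared_role_stats(events):
    """Return (number of distinct shared roles, total references to them)."""
    counts = collections.Counter(
        role_id for event in events for role_id in event["Related_roles"]
    )
    # A role is "shared" when more than one event references it.
    shared = {role_id: n for role_id, n in counts.items() if n > 1}
    return len(shared), sum(shared.values())

# Toy events: only role 4 is shared, and it is referenced three times.
toy_events = [
    {"entity_id": 1, "Related_roles": [4, 5]},
    {"entity_id": 2, "Related_roles": [4]},
    {"entity_id": 3, "Related_roles": [4, 6]},
]
print(shared_role_stats(toy_events))  # (1, 3)
```

Under this reading, the total reported above corresponds to the second number (references to shared roles), summed over all files.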



1.   Is every role in a file linked to at least one entity?
2.  Is each entity in a file linked to at least one event?
3.  Does an entity occur in only one event, or in more than one?

In [None]:
def validation(ev_lst,ent_lst,rls_lst):
  # Check if each role_id in role_list is present in related_roles in events_list
  missing_roles = [[role['role_id'],role['entity_category']] for role in rls_lst if role['role_id'] not in [r_id for event in ev_lst for r_id in event['Related_roles']]]

  # Check if each entity_id in entity_list is present in entity_id in events_list
  missing_entities = [entity['entity_id'] for entity in ent_lst if entity['entity_id'] not in [event['entity_id'] for event in ev_lst]]

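  # Find any entity_id that occurs more than once.
  # Note: this counts ids in ent_lst (the entity list), not ev_lst, so it does
  # not detect an entity that triggers more than one event.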
  duplicate_entities = [item for item, count in collections.Counter([event['entity_id'] for event in ent_lst]).items() if count > 1]
  return missing_roles,missing_entities,duplicate_entities

missing_roles_cnt=0
missing_entities_cnt=0
duplicate_entities_cnt=0
missing_roles_files=[]
missing_entities_files=[]
for ele in annotated_data:
  ev_lst=ele['events_list']
  ent_lst=ele['entity_list']
  rls_lst=ele['role_list']
  filen_nm=ele['file_name']
  missing_roles,missing_entities,duplicate_entities=validation(ev_lst,ent_lst,rls_lst)
  missing_roles_cnt=missing_roles_cnt+len(missing_roles)
  missing_entities_cnt=missing_entities_cnt+len(missing_entities)
  duplicate_entities_cnt=duplicate_entities_cnt+len(duplicate_entities)
  if len(missing_roles)>0:
    missing_roles_files.append({'file_name':filen_nm,'missing_roles':missing_roles})
  if len(missing_entities)>0:
    missing_entities_files.append({'file_name':filen_nm,'missing_entities':missing_entities})

print(f"Role_ids not present in related_roles: {missing_roles_cnt}")
print(f"Entity_ids not present in events_list: {missing_entities_cnt}")
print(f"Duplicate entity_ids in events_list: {duplicate_entities_cnt}")
print(f"Files with missing roles: {missing_roles_files}")
print(f"Files with missing entities: {missing_entities_files}")

Role_ids not present in related_roles: 18
Entity_ids not present in events_list: 0
Duplicate entity_ids in events_list: 0
Files with missing roles: [{'file_name': '39_Consult-HistoryandPhy.-BipolarAffectiveDisorder-Consult_10', 'missing_roles': [[2, 'Method'], [3, 'Status']]}, {'file_name': '407_Consult-HistoryandPhy.-PediatricRheumatologyConsult_8', 'missing_roles': [[2, 'Status'], [3, 'Method']]}, {'file_name': '328_Consult-HistoryandPhy.-NeonatalConsult_9', 'missing_roles': [[17, 'Type'], [18, 'Amount']]}, {'file_name': '406_Consult-HistoryandPhy.-ParoxysmalAtrialFibrillation_10', 'missing_roles': [[2, 'Status'], [3, 'Type']]}, {'file_name': '470_Consult-HistoryandPhy.-Sleepiness-Consult_9', 'missing_roles': [[10, 'Type']]}, {'file_name': '81_Consult-HistoryandPhy.-Congestion-21-day-old_11', 'missing_roles': [[6, 'MedicalCondition']]}, {'file_name': '440_Consult-HistoryandPhy.-PsychH&P-2_8', 'missing_roles': [[35, 'Method']]}, {'file_name': '471_Consult-HistoryandPhy.-SmallBowelObst

In [None]:
missing_roles_files

[{'file_name': '39_Consult-HistoryandPhy.-BipolarAffectiveDisorder-Consult_10',
  'missing_roles': [[2, 'Method'], [3, 'Status']]},
 {'file_name': '407_Consult-HistoryandPhy.-PediatricRheumatologyConsult_8',
  'missing_roles': [[2, 'Status'], [3, 'Method']]},
 {'file_name': '328_Consult-HistoryandPhy.-NeonatalConsult_9',
  'missing_roles': [[17, 'Type'], [18, 'Amount']]},
 {'file_name': '406_Consult-HistoryandPhy.-ParoxysmalAtrialFibrillation_10',
  'missing_roles': [[2, 'Status'], [3, 'Type']]},
 {'file_name': '470_Consult-HistoryandPhy.-Sleepiness-Consult_9',
  'missing_roles': [[10, 'Type']]},
 {'file_name': '81_Consult-HistoryandPhy.-Congestion-21-day-old_11',
  'missing_roles': [[6, 'MedicalCondition']]},
 {'file_name': '440_Consult-HistoryandPhy.-PsychH&P-2_8',
  'missing_roles': [[35, 'Method']]},
 {'file_name': '471_Consult-HistoryandPhy.-SmallBowelObstruction_10',
  'missing_roles': [[7, 'Status']]},
 {'file_name': '70_Consult-HistoryandPhy.-CholangiocarcinomaConsult_9',
  'mi