# Code for interacting with the openFDA API

## Plan
In this project, we are looking at device adverse events. We are particularly interested in:
- Cause of failure
- Date of failure (i.e. age of the device)

We want to see which cause of failure is most common for a given device and at which stage of use it occurs.
Relevant fields could be (for full reference see https://open.fda.gov/device/event/reference/):
### Event
device_date_of_manufacturer
date_of_event, date_report, date_received
previous_use_code, remedial_action
single_use_flag
### Source
reprocessed_and_reused_flag
### Device
device.generic_name
device.expiration_date_of_device, device.device_age_text
device.implant_flag, device.date_removed_flag
device.manufacturer_d_name, device.manufacturer_d_state, device.manufacturer_d_country
### Patient
patient.sequence_number_outcome, patient.sequence_number_treatment
### Report Text
mdr_text.text, mdr_text.text_type_code
### Reporter Dependent Fields
#### By user facility / importer
report_date
event_location
manufacturer_name, manufacturer_country
manufacturer_g1_name, manufacturer_g1_state
### OpenFDA fields
device_class
### Further interesting fields:
Source: reporter_occupation_code
Device: device.device_operator

In [1]:
import numpy as np
import pandas as pd
import json
import requests

baseurl = 'https://api.fda.gov/device/event.json?'
apikey = ''
with open('apikey.txt', 'r') as myfile:
    apikey = myfile.read().replace('\n', '')

In [105]:
# Example of querying, for complete guide go to: https://open.fda.gov/api/
# query = 'search=(device.generic_name:"stent"+AND+reprocessed_and_reused_flag:"Y"+AND+date_of_event:' + \
# '["20150324"+TO+"20170324"]+AND+(_exists_:device_date_of_manufacturer+AND+_exists_:date_of_event))&limit=10'
# # query2 = '_exists_:search=()'
# q1 = baseurl + 'api_key=' + apikey + '&' + query

In [2]:
# Example of querying, for complete guide go to: https://open.fda.gov/api/
start_date = '20150324'
end_date = '20170324'
query = 'search='
limit = 50

In [3]:
list_features = ['device_date_of_manufacturer', # features to check for existence
                 'date_of_event',
                 #'date_report',
                 #'date_received',
                 'previous_use_code',
                 #'remedial_action',
                 'single_use_flag',
                 'reprocessed_and_reused_flag',
                 #'reporter_occupation_code',
                 #'device.date_received',
                 #'device.generic_name' # this allows for empty string! 
                ]

list_features_specific = ['device.openfda.device_name:"sensor"',
                  #'device.implant_flag:"Y"',
                  #'previous_use_code:"I"', # I - initial use, R - reuse, U - unknown, * - invalid data
                  #'device.manufacturer_d_country:"US"' # SZ - Switzerland
                 ]

list_device_names = ["pump",
                    "sensor",
                    "prosthesis",
                    "defibrillator",
                    "pacemaker",
                    "catheter",
                    "electrode",
                    #"wearable",
                     "stent",
                     "ray",
                     "ventilator",
                     "bed",
                     "implant",
                     "lens",
                     #"mds" # https://www.cancer.org/cancer/myelodysplastic-syndrome/about/what-is-mds.html
                     "dialysis",
                     "graft",
                    ]
                  
    
# adding date range
query = query+"date_of_event:[\""+start_date+"\""+"+TO+"+"\""+end_date+"\"]"
for x in list_features:
    query = query + "+AND+_exists_:" + x
# for y in list_features_specific:
#     query = query + "+AND+" + y
device_name = list_device_names[7]
name_query = '+AND+device.openfda.device_name:' + device_name

q1 = baseurl + 'api_key=' + apikey + '&' + query + '&' + 'limit=' + str(limit)
q2 = baseurl + 'api_key=' + apikey + '&' + query + name_query + '&' +'limit=' + str(limit)
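
The string concatenation above can be wrapped in a small helper so that the date range, the `_exists_` filters and the device name are assembled in one place. This is only a sketch: `build_query` and its parameter names are made up for illustration, and it simply reproduces the URL layout of `q2` rather than any official client.

```python
BASE_URL = 'https://api.fda.gov/device/event.json?'

def build_query(start_date, end_date, exists_fields=(), device_name=None,
                limit=50, api_key=''):
    """Assemble an openFDA device/event URL (hypothetical helper)."""
    # Date range on date_of_event, in the same format as the cell above
    terms = ['date_of_event:["{}"+TO+"{}"]'.format(start_date, end_date)]
    # Require that each listed field is present in the record
    terms += ['_exists_:' + field for field in exists_fields]
    # Optionally restrict the search to one device name
    if device_name:
        terms.append('device.openfda.device_name:' + device_name)
    search = '+AND+'.join(terms)
    return '{}api_key={}&search={}&limit={}'.format(BASE_URL, api_key, search, limit)

url = build_query('20150324', '20170324',
                  exists_fields=['device_date_of_manufacturer', 'date_of_event'],
                  device_name='stent', limit=10)
print(url)
```

Keeping the terms in a list and joining them with `+AND+` avoids having to remember which fragment needs a leading separator.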

In [5]:
dq = requests.get(q2)
# dq1.json()['results']
data = json.loads(dq.text)
number = data['meta']['results']['total'] # check number of matching entries
results = data['results']
number

14269
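
The query matches far more records than a single request returns with `limit = 50`. The openFDA API accepts a `skip` parameter next to `limit`, so the remaining records could be fetched page by page. The sketch below only computes the offsets and makes no request; `page_offsets` is a hypothetical helper, and the API enforces its own upper bounds on `limit` and `skip`, which should be checked in the API guide.

```python
def page_offsets(total, page_size, max_records=None):
    """Yield (skip, limit) pairs covering `total` records (hypothetical helper)."""
    if max_records is not None:
        total = min(total, max_records)
    for skip in range(0, total, page_size):
        # The last page may be shorter than page_size
        yield skip, min(page_size, total - skip)

# Offsets for the first 200 of the 14269 matching records
pages = list(page_offsets(total=14269, page_size=50, max_records=200))
print(pages)
# Each pair would replace the 'limit=...' part of the query
# with 'limit=<limit>&skip=<skip>'
```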

In [5]:
# pandas can also flatten the nested JSON directly, which saves some restructuring effort
# dftest = pd.json_normalize(results)  # pd.io.json.json_normalize in older pandas versions
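
As a rough illustration of what the normalization call does with nested lists such as `device`, here is a sketch on a hand-written record. The sample keeps only a few keys as a stand-in for the real API response, and it assumes a pandas version that exposes `pd.json_normalize` at the top level.

```python
import pandas as pd

# Trimmed-down stand-in for one element of `results` (assumption: only a few
# keys are kept, the real records carry many more)
sample_results = [
    {'date_of_event': '20150326',
     'device': [{'generic_name': 'CAROTID STENT SYSTEM',
                 'manufacturer_d_name': 'AV-TEMECULA-CT'}]},
]

# One row per entry of the nested 'device' list, with the event date carried along
flat = pd.json_normalize(sample_results, record_path='device', meta=['date_of_event'])
print(flat.columns.tolist())
```

With `record_path` the reports that list several devices would yield several rows, which differs from the `x['device'][0]` approach used further down.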

In [6]:
results[3]

{'adverse_event_flag': 'N',
 'date_manufacturer_received': '20150326',
 'date_of_event': '20150326',
 'date_received': '20150406',
 'date_report': '20150326',
 'device': [{'brand_name': 'XACT CAROTID STENT SYSTEM',
   'catalog_number': '82097-01',
   'date_received': '20150406',
   'date_removed_flag': '',
   'device_age_text': '',
   'device_availability': 'No',
   'device_evaluated_by_manufacturer': 'R',
   'device_event_key': '',
   'device_operator': 'HEALTH PROFESSIONAL',
   'device_report_product_code': 'NIM',
   'device_sequence_number': ' 1.0',
   'generic_name': 'CAROTID STENT SYSTEM',
   'implant_flag': '',
   'lot_number': '4101061',
   'manufacturer_d_address_1': 'ABBOTT VASCULAR',
   'manufacturer_d_address_2': '26531 YNEZ ROAD',
   'manufacturer_d_city': 'TEMECULA',
   'manufacturer_d_country': 'US',
   'manufacturer_d_name': 'AV-TEMECULA-CT',
   'manufacturer_d_postal_code': '92591 4628',
   'manufacturer_d_state': 'CA',
   'manufacturer_d_zip_code': '92591',
   'manufac

In [7]:
# Fields of Interest
fois_result = ['device_date_of_manufacturer',
               'date_of_event']
fois_device = [#'generic_name', 
               'expiration_date_of_device', 
               #'device_age_text', 
               #'implant_flag', 
               #'date_removed_flag',
               'manufacturer_d_name', 
               #'manufacturer_d_state',
               #'manufacturer_d_country'
              ]
fois_patient = [#'sequence_number_outcome',
                #'sequence_number_treatment'
              ]
fois_mdrText = ['text',
                'text_type_code']
fois_openfda = ['device_name',
                #'device_class',
                'medical_specialty_description']

# device = data['results'][0]['device'][0]
device = [x['device'][0] for x in data['results']]
# patient = data['results'][0]['patient'][0]
patient = [x['patient'][0] for x in data['results']]
# mdrText = data['results'][0]['mdr_text'][0] # there may be more items in the list! 
mdrText = [x['mdr_text'] for x in data['results']]
#mdrText = [y['text'] for y in [x['mdr_text'][0] for x in data['results']]]
# openfda = data['results'][0]['device'][0]['openfda']
openfda = [x['device'][0]['openfda'] for x in data['results']]

In [8]:
fillDic = {'mdr_text_key': '', 'patient_sequence_number': '', 'text': np.nan, 'text_type_code': np.nan}
a = [x[0] if len(x) > 0 else fillDic for x in mdrText]
b = [x[1] if len(x) > 1 else fillDic for x in mdrText] # some reports even have three entries; those beyond the second are ignored here
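
Picking entries by position means that the same column can hold a different kind of narrative from one report to the next, and a third entry is dropped. One possible alternative is to key each entry on its `text_type_code`. The sketch below uses made-up sample data, and `texts_by_type` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def texts_by_type(mdr_entries):
    """Collapse one report's mdr_text list into {text_type_code: text}."""
    out = {}
    for entry in mdr_entries:
        code = entry.get('text_type_code', 'Unknown')
        text = entry.get('text', '')
        # Several entries may share a type code; join them instead of dropping any
        out[code] = out[code] + ' ' + text if code in out else text
    return out

# Hypothetical stand-in for `mdrText` (one list of entries per report)
sample_mdr = [
    [{'text': 'A', 'text_type_code': 'Description of Event or Problem'},
     {'text': 'B', 'text_type_code': 'Additional Manufacturer Narrative'}],
    [{'text': 'C', 'text_type_code': 'Additional Manufacturer Narrative'}],
    [],
]

# One column per type code; reports lacking a type get NaN in that column
df_text = pd.DataFrame([texts_by_type(x) for x in sample_mdr])
print(df_text)
```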

In [59]:
df_results = pd.DataFrame(results, index = range(len(results)), columns = fois_result)
df_openfda = pd.DataFrame(openfda, index = range(len(results)),columns = fois_openfda)
df_device = pd.DataFrame(device, index = range(len(results)),columns = fois_device)
df_patient = pd.DataFrame(patient, index = range(len(results)),columns = fois_patient)
# df_mdrText = pd.DataFrame(mdrText, index = range(len(results)),columns = fois_mdrText)

# df = pd.concat([df_device, df_patient, df_mdrText, df_openfda], axis = 1)

In [60]:
a = pd.DataFrame(a, index = range(len(results)),columns = fois_mdrText)
b = pd.DataFrame(b, index = range(len(results)),columns = fois_mdrText)
df_mdrText = pd.concat([a, b], axis = 1)
df = pd.concat([df_results, df_device, df_patient, df_mdrText, df_openfda], axis = 1)

In [61]:
df['age_of_device_days'] = pd.to_datetime(df['date_of_event'], format='%Y%m%d') \
- pd.to_datetime(df['device_date_of_manufacturer'], format='%Y%m%d')
df = df.drop(['date_of_event','device_date_of_manufacturer'], axis = 1)
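
The subtraction above produces a Timedelta column. For later numeric analysis a plain day count may be handier, and `errors='coerce'` keeps a malformed date from raising. A minimal sketch on made-up dates:

```python
import pandas as pd

# Made-up dates in the API's YYYYMMDD string format; the last event date is missing
sample = pd.DataFrame({
    'date_of_event': ['20150326', '20160101', ''],
    'device_date_of_manufacturer': ['20140919', '20151201', '20150101'],
})

event = pd.to_datetime(sample['date_of_event'], format='%Y%m%d', errors='coerce')
made = pd.to_datetime(sample['device_date_of_manufacturer'], format='%Y%m%d', errors='coerce')

# .dt.days converts the Timedelta column to a numeric day count
# (NaN where either date could not be parsed)
sample['age_of_device_days'] = (event - made).dt.days
print(sample['age_of_device_days'].tolist())
```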


In [62]:
df

|  | expiration_date_of_device | manufacturer_d_name | text | text_type_code | text.1 | text_type_code.1 | device_name | medical_specialty_description | age_of_device_days |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 20160218.0 | MEDTRONIC IRELAND | RESULTS: RELATED TO OPERATIONAL CONTEXT (THE 3... | Additional Manufacturer Narrative | IT IS REPORTED THAT THE PATIENT HAD A STENT PR... | Description of Event or Problem | Coronary Drug-Eluting Stent | Unknown | 250 days |
| 1 | 20160130.0 | BOSTON SCIENTIFIC - GALWAY | SAME CASE AS MDR ID: 2134265-2015-02289, 21342... | Description of Event or Problem | DATE OF BIRTH: 1975. (B)(6). DEVICE IS A COMBI... | Additional Manufacturer Narrative | Coronary Drug-Eluting Stent | Unknown | 187 days |
| 2 | 20160103.0 | BOSTON SCIENTIFIC - GALWAY | DATE OF BIRTH: 1975. (B)(6). DEVICE IS A COMBI... | Additional Manufacturer Narrative | SAME CASE AS MDR ID: 2134265-2015-02270, 21342... | Description of Event or Problem | Coronary Drug-Eluting Stent | Unknown | 211 days |
| 3 |  | AV-TEMECULA-CT | IT WAS REPORTED THAT DURING A PROCEDURE TO TRE... | Description of Event or Problem | (B)(4). IT IS INDICATED THAT THE DEVICE IS NOT... | Additional Manufacturer Narrative | Stent, Carotid | Unknown | 176 days |
| 4 |  | AV-TEMECULA-CT | (B)(4). THERE WAS NO REPORTED DEVICE MALFUNCTI... | Additional Manufacturer Narrative | IT WAS REPORTED THAT THE PROCEDURE WAS TO TREA... | Description of Event or Problem | Coronary Drug-Eluting Stent | Unknown | 207 days |
| 5 |  | AV-TEMECULA-CT | (B)(4). THERE WAS NO REPORTED DEVICE MALFUNCTI... | Additional Manufacturer Narrative | IT WAS REPORTED THAT THE PROCEDURE WAS TO TREA... | Description of Event or Problem | Coronary Drug-Eluting Stent | Unknown | 147 days |
| 6 | 20161007.0 | AV-TEMECULA-CT | (B)(4). CONCOMITANT PRODUCTS: GUIDE WIRE: SION... | Additional Manufacturer Narrative | IT WAS REPORTED THAT THE PROCEDURE WAS TO TREA... | Description of Event or Problem | Coronary Drug-Eluting Stent | Unknown | 181 days |
| 7 | 20141110.0 | BOSTON SCIENTIFIC - GALWAY | SAME CASE AS MDR ID: 2134265-2015-02007 AND 21... | Description of Event or Problem | DEVICE IS A COMBINATION PRODUCT. THE COMPLAINT... | Additional Manufacturer Narrative | Coronary Drug-Eluting Stent | Unknown | 483 days |
| 8 | 20121221.0 | BOSTON SCIENTIFIC - MAPLE GROVE | DEVICE IS A COMBINATION PRODUCT. THE COMPLAINT... | Additional Manufacturer Narrative | SAME CASE AS MDR ID: 2134265-2015-02008 AND 21... | Description of Event or Problem | Coronary Drug-Eluting Stent | Unknown | 1172 days |
| 9 |  | ATRIUM MEDICAL CORP. | WE ARE AWAITING THE RETURN OF THE DEVICE FOR I... | Additional Manufacturer Narrative | STENT DISLODGED FROM BALLOON UPON ACCESS THROU... | Description of Event or Problem | Stent, Renal | Unknown | 116 days |


In [13]:
# file_name = device_name + '_mdrTextClasses.csv'
# # cols_to_write = ['text', 'text_type_code', 'device_name',
# #        'medical_specialty_description', 'age_of_device_days']
# df.to_csv(file_name, mode = 'w', encoding='utf-8')

### Factorize selected columns
This converts nominal (string) entries into categorical values.
Later we may want to rename the duplicated columns.

In [63]:
# Columns that we want to translate into categories
factCols = ['manufacturer_d_name',
       'text_type_code', 'device_name',
       'medical_specialty_description']

# This works but will not assign consistent labeling across multiple columns
# df2 = df[factCols].apply(lambda x: pd.factorize(x)[0])

In [64]:
# Columns that we want to translate into categories, including duplicates
factCols2 = ['manufacturer_d_name',
       'text_type_code', 'text_type_code', 'device_name',
       'medical_specialty_description', ]

In [65]:
# http://stackoverflow.com/questions/39390160/pandas-factorize-on-an-entire-data-frame
def categorise(df):
    categories = {k: v for v, k in enumerate(df.stack().unique())}
    return df.replace(categories)

In [66]:
df[factCols2] = categorise(df[factCols])

In [72]:
# http://stackoverflow.com/questions/28910851/python-pandas-changing-some-column-types-to-categories
# http://pandas.pydata.org/pandas-docs/stable/categorical.html

# df[factCols2].astype('category')
df[factCols2] = df[factCols].apply(lambda x: x.astype('category'))

Later we may want to address the issue of duplicated column names and the rather arbitrary numbering of categories. Neither is currently crucial.
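
Should both points need attention later, one possible approach is to make the column names unique first and then encode every column against a single, sorted category list. This is a sketch on made-up data, not the notebook's method, and it assumes the columns contain no missing values (sorting a mix of strings and NaN would fail).

```python
import pandas as pd

# Made-up frame with the same duplicated column names as `df_mdrText`
sample = pd.DataFrame(
    [['Description of Event or Problem', 'Additional Manufacturer Narrative'],
     ['Additional Manufacturer Narrative', 'Description of Event or Problem']],
    columns=['text_type_code', 'text_type_code'])

# 1. Make the column names unique by suffixing them with their position
sample.columns = ['{}_{}'.format(name, i) for i, name in enumerate(sample.columns)]

# 2. Build one sorted category list from all columns, so every column shares
#    the same label-to-code mapping and the numbering is reproducible
shared = sorted(pd.unique(sample.values.ravel()))
coded = sample.apply(lambda col: pd.Categorical(col, categories=shared).codes)
print(coded)
```

Sorting the labels makes the codes independent of the order in which records happen to arrive from the API.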