## 1. Introduction
Medicine has made great strides in the last century. Advances in materials science have made implantable medical devices so common that many people depend on them in their daily lives.

The United States is the world's largest medical device market, with sales making up 40% of worldwide revenue. Roughly 32 million Americans, about 10 percent of the population, have an implanted medical device. However, several patients have reported harmful effects caused by implantable medical devices in the last few years.

The U.S. Food & Drug Administration (FDA) protects public health by assuring the safety, efficacy, and security of these devices. According to FDA data, medical devices have been linked to more than 80,000 deaths and 1.7 million injuries in the last decade. The FDA relies on voluntary reports from manufacturers, doctors, and patients, which often leads to incomplete reporting. The American Database for Medical Implant Transparency (ADMIT), https://dataverse.harvard.edu/dataverse/admit, was created to close the gap between the different government sources that contain information about FDA-approved implantable medical devices.


## 2. Methodology
This project analyzes a dataset from the American Database for Medical Implant Transparency (ADMIT). With over 300,000 records of implantable devices, the dataset includes each device's type and classification, along with counts of recalls, malfunctions, injuries, and deaths, among other fields.

This analysis aims to answer these questions:

- What FDA panel reviews the most implantable medical device applications?


- What companies produce the most devices?
    - What medical industry do they target?
    
    
- How does the FDA classify these devices?
    - How does that affect approval?    
    
    
- Which device class has the most recalls, malfunctions, injuries, deaths, and adverse events reported?
 
 
- How does a clinical trial impact study sponsorship?  
    - Is there a correlation between clinical trials and total adverse events reported?

## 3. Analysis

In [54]:
#import libs
import pandas as pd
import numpy as np
import chart_studio.plotly as py
import plotly.express as px
# %matplotlib inline?


In [4]:
#read data set
df=pd.read_excel('ADMIT.xlsx')

In [172]:
# subset data set by compelling features
df1 = df[['company_name','brand_name','model_number','med_specialty','premarket_submissions_number', \
          'device_class','recall','malfunction','injury','death','totalAE','has_clinicalTrial',\
          'Study Sponsor','c_TotalAE']].copy()
# Names have a mix of upper and lower cases. Change all names to lower case before analysis
df1['company_name'] = df1['company_name'].str.lower()
df1.head()

Unnamed: 0,company_name,brand_name,model_number,med_specialty,premarket_submissions_number,device_class,recall,malfunction,injury,death,totalAE,has_clinicalTrial,Study Sponsor,c_TotalAE
0,"alphatec spine, inc.",IdentiTi,122-11122505-S,Orthopedic,K183705,2,0,0,0,0,0,N,,
1,"alphatec spine, inc.",IdentiTi,121-10092215,Orthopedic,K183705,2,0,0,1,0,1,N,,
2,"encore medical, l.p.",DJO SURGICAL,801-05-735,Orthopedic,K170573,2,70,2980,494,19,513,N,,
3,"life spine, inc.",Solstice Occipito-Cervico-Thoracic System,9235-16,Orthopedic,K090343,2,264,3102,1563,344,1907,N,,
4,engage uni llc,Engage Partial Knee System,1-50040-003,Orthopedic,K190439,2,628,34533,9604,92,9696,N,,
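
Lower-casing removes case differences, but names that differ only by stray whitespace would still be counted as separate companies. The following is a minimal sketch on made-up names (not taken from ADMIT), assuming one also wants to trim and collapse whitespace. `normalize_names` is a hypothetical helper, not part of the original analysis.

```python
import pandas as pd

def normalize_names(names: pd.Series) -> pd.Series:
    """Lower-case, trim, and collapse repeated whitespace in company names."""
    return (names.str.lower()
                 .str.strip()
                 .str.replace(r"\s+", " ", regex=True))

# Toy names for illustration only (assumption: not real ADMIT records)
toy_names = pd.Series(["Acme Spine, Inc.", "ACME SPINE, INC. ", "acme  spine, inc."])
cleaned = normalize_names(toy_names)

# Number of distinct spellings left after normalization
print(cleaned.nunique())
```

Whether this extra step changes any company totals in ADMIT would have to be checked against the real data.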


#### What FDA panel receives the most implantable medical device applications?

Before a company can market a medical device, it must submit an FDA application for approval. The device should be classified under one of the 16 medical specialty "panels" when submitting the application. The FDA provides specific regulations and requirements for each panel on the FDA website. 
All the device classification panels can be found at: https://www.fda.gov/medical-devices/classify-your-medical-device/device-classification-panels


In [235]:
#Create df1_panel. Group by med_specialty, count devices, sort by number count
df1_panel = df1.groupby('med_specialty').agg(device_count = ('model_number','count'))\
                                        .sort_values('device_count', ascending=False)
#calculate percentage
df1_panel['percentage'] = round(100*df1_panel['device_count']/sum(df1_panel['device_count']),2)
df1_panel

Unnamed: 0_level_0,device_count,percentage
med_specialty,Unnamed: 1_level_1,Unnamed: 2_level_1
Orthopedic,318264,99.47
"General, Plastic Surgery",1130,0.35
Cardiovascular,428,0.13
"Gastroenterology, Urology",78,0.02
Obstetrics/Gynecology,28,0.01
Unknown,20,0.01
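
The count-and-percentage step can be checked on a small example. The sketch below uses toy rows that stand in for `df1` (the values are illustrative, not ADMIT data) and cross-checks the grouped percentage against `value_counts(normalize=True)`.

```python
import pandas as pd

# Toy data standing in for df1; values are illustrative only
toy = pd.DataFrame({
    "med_specialty": ["Orthopedic"] * 8 + ["Cardiovascular"] * 2,
    "model_number": [f"M{i}" for i in range(10)],
})

# Same pattern as the cell above: count devices per panel, then compute shares
panel = (toy.groupby("med_specialty")
            .agg(device_count=("model_number", "count"))
            .sort_values("device_count", ascending=False))
panel["percentage"] = (100 * panel["device_count"] / panel["device_count"].sum()).round(2)

# Cross-check: value_counts counts rows, so it agrees when model_number has no missing values
share = (toy["med_specialty"].value_counts(normalize=True) * 100).round(2)

print(panel)
print(share)
```

The two results can diverge when `model_number` contains missing values, because `count` skips them while `value_counts` on `med_specialty` does not.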


In [234]:
#bar plot
fig_panel = px.bar(df1_panel.sort_values('device_count'), 
                   x='device_count',log_x=True,
                   labels={'device_count': 'Device count (log scale)', 'med_specialty':'Medical Specialty Panel'},
                   title='Applications per Medical Specialty',
                   text='device_count')

fig_panel.show()
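
A logarithmic x-axis is used because the panel counts span several orders of magnitude. The sketch below takes the counts printed in the panel table and compares each bar's length relative to the longest bar on a linear axis with its position on a base-10 log axis. It is only an illustration of the scale choice, not part of the plotting code.

```python
import math

# Device counts copied from the panel table above
counts = {
    "Orthopedic": 318264,
    "General, Plastic Surgery": 1130,
    "Cardiovascular": 428,
    "Gastroenterology, Urology": 78,
    "Obstetrics/Gynecology": 28,
    "Unknown": 20,
}

largest = max(counts.values())
for panel_name, n in counts.items():
    linear_fraction = n / largest      # bar length relative to the longest bar, linear axis
    log_position = math.log10(n)       # position on a base-10 log axis
    print(f"{panel_name:28s} linear={linear_fraction:.5f}  log10={log_position:.2f}")
```

On a linear axis every bar except Orthopedic would be shorter than half a percent of the longest one, which is why the smaller panels would be invisible without the log scale.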

#### What companies produce the most devices and what medical industry do they target?

The analysis below shows that the 10 companies that produce the most implantable devices all target the Orthopedic market. Moreover, the product counts of these top 10 players are orders of magnitude greater than those of other companies.

In [238]:
#Group by company_name, med_specialty, count number of devices per company_name
df1_company = df1.groupby(['company_name','med_specialty']).agg(number_of_products = ('model_number','count'))\
                                                            .sort_values('number_of_products', ascending=False)
df1_company.head(10)

Unnamed: 0_level_0,Unnamed: 1_level_0,number_of_products
company_name,med_specialty,Unnamed: 2_level_1
"nuvasive, inc.",Orthopedic,24737
"gbs commonwealth co.,ltd.",Orthopedic,24616
"medtronic sofamor danek, inc.",Orthopedic,19318
"globus medical, inc.",Orthopedic,17850
"biomet orthopedics, llc",Orthopedic,12213
"zimmer, inc.",Orthopedic,12207
"alphatec spine, inc.",Orthopedic,12117
"smith & nephew, inc.",Orthopedic,10971
"l&k biomed co. ,ltd.",Orthopedic,9292
seaspine orthopedics corporation,Orthopedic,8948
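
To put the top 10 in context, the short calculation below adds up the product counts printed above and divides by the total number of devices from the panel table. It is a back-of-the-envelope sketch that reuses the printed figures rather than recomputing them from `df1`, so it assumes both tables count the same set of rows.

```python
# Product counts of the top 10 companies, copied from the table above
top10 = [24737, 24616, 19318, 17850, 12213, 12207, 12117, 10971, 9292, 8948]

# Device counts per panel, copied from the panel table
panel_counts = [318264, 1130, 428, 78, 28, 20]

top10_total = sum(top10)
all_devices = sum(panel_counts)
top10_share = 100 * top10_total / all_devices

print(f"Top 10 companies: {top10_total} of {all_devices} devices ({top10_share:.2f}%)")
```

Under that assumption, the ten largest producers account for a little under half of all records in the dataset.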


In [239]:
#treemap plot shows difference in manufactured products per company per specialty
fig_company = px.treemap(df1_company.reset_index(), 
                         path=['med_specialty','company_name'], 
                         values='number_of_products',
                         color='number_of_products',
                         title='Major Producers of Implantable Medical Devices')
fig_company.show()

- To compare the top 5 players in each specialty area, I grouped by med_specialty, kept the top 5 companies in each group, and sorted by specialty name. The resulting data frame shows the striking difference between the number of implantable devices produced for the Orthopedic panel and for the other panels.

In [266]:
#top 5 players per specialty area                                                     
df1_company_sort = df1_company.groupby('med_specialty').head(5)\
                                  .sort_values(['med_specialty', 'number_of_products'], ascending=[False,False])\
                                  .reset_index()
df1_company_sort = df1_company_sort[(df1_company_sort['med_specialty'] != 'Unknown')].reset_index(drop=True)
df1_company_sort

Unnamed: 0,company_name,med_specialty,number_of_products
0,"nuvasive, inc.",Orthopedic,24737
1,"gbs commonwealth co.,ltd.",Orthopedic,24616
2,"medtronic sofamor danek, inc.",Orthopedic,19318
3,"globus medical, inc.",Orthopedic,17850
4,"biomet orthopedics, llc",Orthopedic,12213
5,femcare limited,Obstetrics/Gynecology,18
6,"gyrus acmi, inc.",Obstetrics/Gynecology,5
7,bayer healthcare llc,Obstetrics/Gynecology,3
8,richard wolf medical instruments corp.,Obstetrics/Gynecology,1
9,pop medical solutions ltd,Obstetrics/Gynecology,1
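
The top-N-per-group pattern relies on the rows already being sorted by count before `groupby(...).head(N)` is called, since `head` simply keeps the first N rows of each group in their current order. A minimal sketch on toy data (company names and counts are made up) with N = 2:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "company_name": ["a", "b", "c", "d", "e", "f"],
    "med_specialty": ["Orthopedic", "Orthopedic", "Orthopedic",
                      "Cardiovascular", "Cardiovascular", "Cardiovascular"],
    "number_of_products": [50, 30, 10, 7, 9, 2],
})

top2 = (toy.sort_values("number_of_products", ascending=False)  # sort first
           .groupby("med_specialty")
           .head(2)                                             # keep first 2 rows per group
           .sort_values(["med_specialty", "number_of_products"],
                        ascending=[False, False])
           .reset_index(drop=True))

print(top2)
```

In the notebook, `df1_company` is already sorted by `number_of_products` in descending order, which is what makes `head(5)` return the five largest producers per specialty.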


In [267]:
#treemap
fig_company = px.treemap(df1_company_sort, 
                         path=['med_specialty','company_name'], 
                         values='number_of_products',
                         color='number_of_products',
                         title='Top 5 Producers of Implantable Medical Devices per Specialty')
fig_company.show()

- What is the medical industry that accounts for the most adverse events reported?
    - What companies have the most recalls?

In [8]:
#Group by company_name, count devices, sum adverse-event columns, sort by number of recalls
df1.groupby('company_name').agg({'model_number':'count','recall': 'sum','malfunction':'sum',\
                                 'injury':'sum','death':'sum','totalAE':'sum'})\
                           .sort_values('recall', ascending=False)

Unnamed: 0_level_0,model_number,recall,malfunction,injury,death,totalAE
company_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"life spine, inc.",4348,392512,14216607,2949146,216088,3165234
"bk meditech co.,ltd.",3114,391738,19281188,4832406,250928,5083334
"nexxt spine, llc",1672,354437,34344856,4156891,200312,4357203
signature orthopaedics pty ltd,2598,328793,5114073,4171897,210270,4382167
"pioneer surgical technology, inc.",2396,305830,42349341,8073379,254757,8328136
...,...,...,...,...,...,...
g21 srl,2,0,0,0,0,0
"onkos surgical, inc.",158,0,40,307,29,336
"nvision biomedical technologies, inc.",454,0,0,0,15,15
"nuvasive, inc.",24737,0,233,51,0,51



- How does the FDA classify these devices?
    - How does that affect approval?


In [10]:
df1.groupby('device_class').agg({'model_number':'count','injury':'sum','death':'sum'})

Unnamed: 0_level_0,model_number,injury,death
device_class,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2,318342,109445737,3574871
3,1606,616853,17082
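
Because class 2 has roughly 200 times as many records as class 3, the raw sums are hard to compare. The sketch below divides the printed sums by the printed device counts to get per-record ratios. These are simple ratios of the figures in the table above, not validated incidence rates, and the column names `injury_per_device` and `death_per_device` are introduced here for illustration.

```python
import pandas as pd

# Figures copied from the device_class table above
by_class = pd.DataFrame(
    {
        "model_number": [318342, 1606],
        "injury": [109445737, 616853],
        "death": [3574871, 17082],
    },
    index=pd.Index([2, 3], name="device_class"),
)

# Ratio of summed reports to number of device records in each class
by_class["injury_per_device"] = (by_class["injury"] / by_class["model_number"]).round(2)
by_class["death_per_device"] = (by_class["death"] / by_class["model_number"]).round(2)

print(by_class[["injury_per_device", "death_per_device"]])
```

Seen this way, the two classes are much closer to each other than the raw sums suggest.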


- Is there a correlation between clinical trials and total adverse events reported?

In [24]:
# Among devices with a clinical trial for which information about the study
# sponsor was available (N = 842), every single clinical trial (100%; N = 842)
# was sponsored by the manufacturer and/or company in charge of marketing
# the device.
df[(df['has_clinicalTrial']=='Y') & (df['Study Sponsor'] == 'Y')].count()

company_name                    842
brand_name                      838
product_code                    842
model_number                    842
med_specialty                   842
device_gender                   842
premarket_submissions_number    842
device_class                    842
recall                          842
malfunction                     842
injury                          842
death                           842
totalAE                         842
has_clinicalTrial               842
Study Sponsor                   842
n                               842
nwomen                          827
rwomen                          827
c_SeriousAE                     617
c_OtherAE                       143
c_TotalAE                       759
dtype: int64

In [27]:
# The average sample size of clinical
# trials was 1,311 participants.
df[df['has_clinicalTrial']=='Y']['n'].mean()

1311.1622846781504

In [29]:
# Regarding adverse effects reported in clinical trials, the total
# rate of adverse effects could be assessed for 954
# (86.49%) devices, for an average of 22.65% of
# participants reporting at least one adverse effect.
df[df['has_clinicalTrial']=='Y']['c_TotalAE'].mean()

22.651446540880467

In [35]:
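# However, further examination of this data reveals that 2.54% (N = 28) of these
# devices' clinical trials had a total adverse effect rate of 100%.
# Note: c_SeriousAE + c_OtherAE is NaN whenever either value is missing, so the
# check below only counts trials that report both rates (about 1.81% here).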
df[(df['has_clinicalTrial']=='Y') & (df['c_SeriousAE']+df['c_OtherAE']==100)].count()/(df[df['has_clinicalTrial']=='Y']).count()

company_name                    0.018132
brand_name                      0.018215
product_code                    0.018132
model_number                    0.018132
med_specialty                   0.018132
device_gender                   0.018132
premarket_submissions_number    0.018132
device_class                    0.018132
recall                          0.018132
malfunction                     0.018132
injury                          0.018132
death                           0.018132
totalAE                         0.018132
has_clinicalTrial               0.018132
Study Sponsor                   0.023753
n                               0.018132
nwomen                          0.018382
rwomen                          0.018382
c_SeriousAE                     0.032206
c_OtherAE                       0.136054
c_TotalAE                       0.020964
dtype: float64
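
How missing values are handled affects the share of trials counted as having a 100% adverse effect rate. The sketch below uses made-up rates (not ADMIT data) to show three ways of testing for a 100% rate and how each treats `NaN`. Which variant matches the quoted figure would need to be checked on the real dataset.

```python
import numpy as np
import pandas as pd

# Toy rates for illustration only
toy = pd.DataFrame({
    "c_SeriousAE": [40.0, np.nan, 100.0, 10.0],
    "c_OtherAE":   [60.0, np.nan, np.nan, 5.0],
    "c_TotalAE":   [100.0, 100.0, 100.0, 15.0],
})

# 1) Plain sum: NaN + x is NaN, and NaN == 100 is False
by_sum = int((toy["c_SeriousAE"] + toy["c_OtherAE"] == 100).sum())

# 2) Sum that treats a single missing component as 0 (still NaN if both are missing)
by_sum_filled = int((toy["c_SeriousAE"].add(toy["c_OtherAE"], fill_value=0) == 100).sum())

# 3) Direct comparison on the total column
by_total = int((toy["c_TotalAE"] == 100).sum())

# mean() skips NaN, so the denominator is the number of non-missing values
serious_mean = toy["c_SeriousAE"].mean()

print(by_sum, by_sum_filled, by_total, serious_mean)
```

The same row can therefore be counted or dropped depending only on which columns happen to be filled in.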

In [46]:
# For a smaller proportion of devices (13.33%, N = 147), it was possible to
# assess the severity of adverse effects reported in clinical trials. The
# average rate of "serious" adverse effects reported by participants was 6.09%,
# while the average rate of "other" adverse effects (i.e. "complications" and/or
# "observations", excluding death and serious adverse effects) was 10.39%.
# Note: the mean below is taken over every trial with a c_SeriousAE value, not
# only those 147 devices, which may explain why it comes out higher (15.56%).
df[(df['has_clinicalTrial']=='Y') ]['c_SeriousAE'].mean()

15.56499194847039

In [47]:
df[(df['has_clinicalTrial']=='Y') ]['c_OtherAE'].mean()

10.393061224489799
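
The correlation question itself could be approached as sketched below. This is a minimal example on made-up numbers, assuming the comparison of interest is between the trial adverse effect rate (`c_TotalAE`) and the reported adverse event count (`totalAE`). It computes a Pearson coefficient and a rank-based (Spearman) coefficient after dropping incomplete pairs.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only (assumption: not real ADMIT records)
toy = pd.DataFrame({
    "c_TotalAE": [5.0, 10.0, np.nan, 40.0, 80.0],
    "totalAE":   [1, 3, 7, 20, 500],
})

# Keep only rows where both values are present
pairs = toy.dropna(subset=["c_TotalAE", "totalAE"])

# Pearson: linear association
pearson = pairs["c_TotalAE"].corr(pairs["totalAE"])

# Spearman: Pearson correlation of the ranks, less sensitive to extreme counts
spearman = pairs["c_TotalAE"].rank().corr(pairs["totalAE"].rank())

print(f"pearson={pearson:.3f}  spearman={spearman:.3f}")
```

Given how skewed the adverse event counts are in the tables above, a rank-based coefficient may be the more robust choice, though that is a judgment call rather than a result of this analysis.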

## 4. Summary

## 5. Future Work