# D212 - Data Mining II Performance Assessment Task 3

Assignment Completed by Favio Conde and Submitted August 15, 2023 for WGU - MSDA
</br>

### Table of Contents

#### Part I: Research Question
<ul>
    <li><a href='#a1'>A1: Proposal of Question</a></li>
    <li><a href='#a2'>A2: Defined Goal</a></li>
</ul>
 
#### Part II: Market Basket Justification
<ul>
    <li><a href='#b1'>B1. Explanation of Market Basket</a></li>
    <li><a href='#b2'>B2. Transaction Example</a></li>
    <li><a href='#b3'>B3. Market Basket Assumption</a></li>
</ul> 

#### Part III: Data Preparation and Analysis
<ul>
    <li><a href='#c1'>C1: Transforming The Dataset</a></li>
    <li><a href='#c2'>C2: Code Execution</a></li>
    <li><a href='#c3'>C3: Association Rules Table</a></li>
    <li><a href='#c4'>C4: Top Three Rules</a></li>
</ul>

#### Part IV: Data Summary and Implications
<ul>
    <li><a href='#d1'>D1: Significance of Support, Lift, and Confidence Summary</a></li>
    <li><a href='#d2'>D2: Practical Significance of Findings</a></li>
    <li><a href='#d3'>D3: Course of Action</a></li>
</ul>

#### Part V: Attachments
<ul>
    <li><a href='#e'>E: Panopto Recording</a></li>
    <li><a href='#f'>F. Third-Party Code Reference</a></li>
    <li><a href='#g'>G. Sources</a></li>
</ul>

### PART I: RESEARCH QUESTION

#### A1. Proposal of Question<a id='a1'></a>

Which medications are most likely to be prescribed alongside Abilify?

#### A2. Defined Goal<a id='a2'></a>

My analysis aims to determine which prescription medications are likely to be prescribed when a patient is on Abilify.  The hospital wants to predict patient readmissions more accurately.  Analyzing prescription data to find which combinations of medications readmitted patients are likely to take may therefore provide insight into when a patient is more likely to be readmitted.

### PART II: MARKET BASKET JUSTIFICATION

#### B1. Explanation of Market Basket<a id='b1'></a>

A Market Basket analysis examines a dataset of transactions and looks for groups of items that occur together.  An example would be going to the grocery store and purchasing hotdog buns.  More likely than not, you will also buy ketchup, mustard, and other condiments for the hotdog (Sivek, 2020).  The analysis determines which items are often grouped (or, in the case of the medical dataset, which prescriptions are grouped together).  It works by building association rules, which estimate how likely one item (the consequent) is to appear in a transaction that contains another item (the antecedent).  Three metrics describe each rule: support is the share of all transactions that contain both the antecedent and the consequent; confidence is the share of transactions containing the antecedent that also contain the consequent; and lift is the confidence divided by the consequent's overall support.
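To make the support and confidence measures described above concrete, together with lift (which is used later in this analysis), here is a minimal sketch that computes them by hand. The transactions and the helper functions `support`, `confidence`, and `lift` are hypothetical illustrations, not part of the medical dataset or of `mlxtend`.

```python
# Toy transactions (hypothetical data, not taken from the medical dataset)
transactions = [
    {"buns", "ketchup", "mustard"},
    {"buns", "ketchup"},
    {"buns"},
    {"ketchup"},
    {"milk"},
]


def support(itemset, transactions):
    """Share of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of antecedent transactions that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's overall support."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


# Metrics for the illustrative rule {buns} -> {ketchup}
rule_support = support({"buns", "ketchup"}, transactions)
rule_confidence = confidence({"buns"}, {"ketchup"}, transactions)
rule_lift = lift({"buns"}, {"ketchup"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

A lift above 1 for this toy rule means buns and ketchup appear together more often than they would if they were bought independently.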

The hospital aims to reduce the readmission rate, so analyzing patients' prescriptions may provide insight into medications readmitted patients are likely to take.  The analysis findings could point to medical conditions that readmitted patients are likely to suffer from, which may help the hospital identify patients with a high probability of readmission.

#### B2. Transaction Example<a id='b2'></a>

In [1]:
#importing libraries and packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

In [2]:
#importing CSV file
df = pd.read_csv('Files/medical_market_basket.csv')
df.head()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
0,,,,,,,,,,,,,,,,,,,,
1,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
2,,,,,,,,,,,,,,,,,,,,
3,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,


In [3]:
#showing example of transaction
df.iloc[1]

Presc01                 amlodipine
Presc02          albuterol aerosol
Presc03                allopurinol
Presc04               pantoprazole
Presc05                  lorazepam
Presc06                 omeprazole
Presc07                 mometasone
Presc08                fluconozole
Presc09                 gabapentin
Presc10                pravastatin
Presc11                     cialis
Presc12                   losartan
Presc13    metoprolol succinate XL
Presc14           sulfamethoxazole
Presc15                    abilify
Presc16             spironolactone
Presc17              albuterol HFA
Presc18               levofloxacin
Presc19               promethazine
Presc20                  glipizide
Name: 1, dtype: object

#### B3. Market Basket Assumption<a id='b3'></a>

An assumption of Market Basket analysis is that items appearing together do so for a meaningful reason, that purchasing one thing complements another. For example, using the medical market basket dataset, if prescriptionA and prescriptionB often occur together, a patient who takes prescriptionA is likely to take prescriptionB as well.

### PART III: DATA PREPARATION AND ANALYSIS

#### C1. Transforming the Dataset<a id='c1'></a>

In [4]:
#inspecting df
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15002 entries, 0 to 15001
Data columns (total 20 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Presc01  7501 non-null   object
 1   Presc02  5747 non-null   object
 2   Presc03  4389 non-null   object
 3   Presc04  3345 non-null   object
 4   Presc05  2529 non-null   object
 5   Presc06  1864 non-null   object
 6   Presc07  1369 non-null   object
 7   Presc08  981 non-null    object
 8   Presc09  654 non-null    object
 9   Presc10  395 non-null    object
 10  Presc11  256 non-null    object
 11  Presc12  154 non-null    object
 12  Presc13  87 non-null     object
 13  Presc14  47 non-null     object
 14  Presc15  25 non-null     object
 15  Presc16  8 non-null      object
 16  Presc17  4 non-null      object
 17  Presc18  4 non-null      object
 18  Presc19  3 non-null      object
 19  Presc20  1 non-null      object
dtypes: object(20)
memory usage: 2.3+ MB


In [5]:
#checking dimensions of dataset
df.shape

(15002, 20)

In [6]:
#dropping every other row (the even-indexed rows are entirely null)
df = df.iloc[1::2].reset_index(drop=True)

In [7]:
#checking that every other row dropped
df.head(10)

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
0,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
1,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
2,enalapril,,,,,,,,,,,,,,,,,,,
3,paroxetine,allopurinol,,,,,,,,,,,,,,,,,,
4,abilify,atorvastatin,folic acid,naproxen,losartan,,,,,,,,,,,,,,,
5,cialis,,,,,,,,,,,,,,,,,,,
6,hydrochlorothiazide,glyburide,,,,,,,,,,,,,,,,,,
7,metformin,salmeterol inhaler,sertraline HCI,,,,,,,,,,,,,,,,,
8,metoprolol,carvedilol,losartan,,,,,,,,,,,,,,,,,
9,glyburide,,,,,,,,,,,,,,,,,,,


In [8]:
#checking dataset dimensions after dropping every other row
df.shape

(7501, 20)
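Slicing with `iloc[1::2]` relies on the blank rows sitting at every even position. As a sketch of an alternative, `dropna(how='all')` removes rows by content rather than by position. The miniature frame `raw` below is hypothetical and only mimics the alternating layout seen in `df.head()`.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the raw file: blank rows alternate with data rows
raw = pd.DataFrame(
    {
        "Presc01": [np.nan, "amlodipine", np.nan, "citalopram"],
        "Presc02": [np.nan, "albuterol aerosol", np.nan, "benicar"],
    }
)

# Positional approach used in the notebook: keep rows 1, 3, 5, ...
by_position = raw.iloc[1::2].reset_index(drop=True)

# Content-based approach: drop any row in which every column is missing
by_content = raw.dropna(how="all").reset_index(drop=True)

print(by_content)
print(by_position.equals(by_content))
```

On data with this strict alternating pattern both approaches give the same frame; the content-based version would also tolerate blank rows in irregular positions.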

In [9]:
#creating list of lists; str() turns each missing value into the string 'nan'
trans = []
for i in range(len(df)):
    trans.append([str(df.values[i,j]) for j in range(len(df.columns))])

In [10]:
#set transaction encoder
TE = TransactionEncoder()
array = TE.fit(trans).transform(trans)

In [11]:
#converting dataset back to dataframe
df_cleaned = pd.DataFrame(array, columns = TE.columns_)
df_cleaned

Unnamed: 0,Duloxetine,Premarin,Yaz,abilify,acetaminophen,actonel,albuterol HFA,albuterol aerosol,alendronate,allopurinol,...,trazodone HCI,triamcinolone Ace topical,triamterene,trimethoprim DS,valaciclovir,valsartan,venlafaxine XR,verapamil SR,viagra,zolpidem
0,False,False,False,True,False,False,True,True,False,True,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,True,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7497,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7498,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7499,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [12]:
#printing columns
for col in df_cleaned.columns:
    print(col)

Duloxetine
Premarin
Yaz
abilify
acetaminophen
actonel
albuterol HFA
albuterol aerosol
alendronate
allopurinol
alprazolam
amitriptyline
amlodipine
amoxicillin
amphetamine
amphetamine salt combo
amphetamine salt combo xr
atenolol
atorvastatin
azithromycin
benazepril
benicar
boniva
bupropion sr
carisoprodol
carvedilol
cefdinir
celebrex
celecoxib
cephalexin
cialis
ciprofloxacin
citalopram
clavulanate K+
clonazepam
clonidine HCI
clopidogrel
clotrimazole
codeine
crestor
cyclobenzaprine
cymbalta
dextroamphetamine XR
diazepam
diclofenac sodium
doxycycline hyclate
enalapril
escitalopram
esomeprazole
ezetimibe
fenofibrate
fexofenadine
finasteride
flovent hfa 110mcg inhaler
fluconozole
fluoxetine HCI
fluticasone
fluticasone nasal spray
folic acid
furosemide
gabapentin
glimepiride
glipizide
glyburide
hydrochlorothiazide
hydrocodone
hydrocortisone 2.5% cream
ibuprophen
isosorbide mononitrate
lansoprazole
lantus
levofloxacin
levothyroxine sodium
lisinopril
lorazepam
losartan
lovastatin
meloxicam
met

In [13]:
#checking dimensions before drop
df_cleaned.shape

(7501, 120)

In [14]:
#dropping nan column
df_cleaned = df_cleaned.drop(['nan'], axis=1)

In [15]:
#confirming 'nan' column dropped
df_cleaned.shape

(7501, 119)
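The `nan` column exists because `str()` converts every missing cell into the literal string `'nan'`, which the encoder then treats as an item. A minimal sketch of one way to avoid creating it, assuming a frame shaped like the prescription columns (the two-row `sample` below is hypothetical), is to drop the missing cells while building each transaction:

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample shaped like the prescription columns
sample = pd.DataFrame(
    {
        "Presc01": ["enalapril", "paroxetine"],
        "Presc02": [np.nan, "allopurinol"],
    }
)

# Keep only the non-missing cells of each row, so no 'nan' item is created
transactions = [row.dropna().tolist() for _, row in sample.iterrows()]
print(transactions)
```

A list built this way could be passed to the encoder in place of `trans`, which would make the later `drop(['nan'], axis=1)` step unnecessary.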

In [16]:
#exporting cleaned data to CSV
df_cleaned.to_csv('medical_cleaned.csv', index=False)

#### C2. Code Execution<a id='c2'></a>

In [17]:
#generating frequent itemsets with Apriori (minimum support of 5%)
a_rules = apriori(df_cleaned, min_support = 0.05, use_colnames = True)
a_rules.head(10)

Unnamed: 0,support,itemsets
0,0.238368,(abilify)
1,0.079323,(alprazolam)
2,0.071457,(amlodipine)
3,0.068391,(amphetamine salt combo)
4,0.179709,(amphetamine salt combo xr)
5,0.129583,(atorvastatin)
6,0.17411,(carvedilol)
7,0.076523,(cialis)
8,0.087188,(citalopram)
9,0.059992,(clopidogrel)
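For single-item itemsets, the `support` column is simply the share of rows in which that item's column is `True`. The sketch below illustrates this on a hypothetical one-hot frame, using a threshold of 0.5 chosen only for the toy data (the notebook uses 0.05). Apriori additionally evaluates multi-item sets, which this sketch does not attempt.

```python
import pandas as pd

# Hypothetical one-hot frame: one row per transaction, one column per drug
onehot = pd.DataFrame(
    {
        "abilify": [True, False, True, False],
        "cialis": [False, False, True, False],
        "lantus": [False, False, False, False],
    }
)

# The mean of a boolean column is the fraction of transactions containing it
item_support = onehot.mean()

# Keep only items that meet the illustrative minimum support threshold
frequent = item_support[item_support >= 0.5]
print(frequent)
```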


#### C3. Association Rules Table<a id='c3'></a>

In [18]:
#generating association rules from the frequent itemsets, keeping rules with lift >= 1
ass_r = association_rules(a_rules, metric = 'lift', min_threshold = 1)
ass_r

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(abilify),(amphetamine salt combo xr),0.238368,0.179709,0.050927,0.213647,1.188845,0.00809,1.043158,0.208562
1,(amphetamine salt combo xr),(abilify),0.179709,0.238368,0.050927,0.283383,1.188845,0.00809,1.062815,0.193648
2,(abilify),(carvedilol),0.238368,0.17411,0.059725,0.250559,1.439085,0.018223,1.102008,0.400606
3,(carvedilol),(abilify),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314,0.369437
4,(abilify),(diazepam),0.238368,0.163845,0.05266,0.220917,1.348332,0.013604,1.073256,0.339197
5,(diazepam),(abilify),0.163845,0.238368,0.05266,0.3214,1.348332,0.013604,1.122357,0.308965


#### C4. Top Three Rules<a id='c4'></a>

To get the top three rules, I kept rules with lift greater than or equal to 1.3 and confidence greater than or equal to 0.25.  Lift indicates how much more often the prescriptions under `antecedents` and `consequents` occur together than would be expected if they were prescribed independently.  Confidence is the percentage of transactions containing the prescription under `antecedents` that also contain the prescription under `consequents`.  I sorted the three rules in descending order by lift.

In [19]:
#filtering rules by lift and confidence, then sorting by lift in descending order
ass_r[(ass_r['lift'] >= 1.3) & (ass_r['confidence'] >= 0.25)].sort_values(by=['lift'], ascending=False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
2,(abilify),(carvedilol),0.238368,0.17411,0.059725,0.250559,1.439085,0.018223,1.102008,0.400606
3,(carvedilol),(abilify),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314,0.369437
5,(diazepam),(abilify),0.163845,0.238368,0.05266,0.3214,1.348332,0.013604,1.122357,0.308965


### PART IV: DATA SUMMARY AND IMPLICATIONS

#### D1. Significance of Support, Lift and Confidence Summary<a id='d1'></a>

In my analysis, I sought to identify which medications are likely prescribed to patients when they are on Abilify.  To investigate, I will retrieve the rules where Abilify is either the antecedent or the consequent.

In [20]:
#selecting rules where Abilify is the antecedent or the consequent
analysis = ass_r[(ass_r['antecedents'] == {'abilify'}) | (ass_r['consequents'] == {'abilify'})]
analysis

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(abilify),(amphetamine salt combo xr),0.238368,0.179709,0.050927,0.213647,1.188845,0.00809,1.043158,0.208562
1,(amphetamine salt combo xr),(abilify),0.179709,0.238368,0.050927,0.283383,1.188845,0.00809,1.062815,0.193648
2,(abilify),(carvedilol),0.238368,0.17411,0.059725,0.250559,1.439085,0.018223,1.102008,0.400606
3,(carvedilol),(abilify),0.17411,0.238368,0.059725,0.343032,1.439085,0.018223,1.159314,0.369437
4,(abilify),(diazepam),0.238368,0.163845,0.05266,0.220917,1.348332,0.013604,1.073256,0.339197
5,(diazepam),(abilify),0.163845,0.238368,0.05266,0.3214,1.348332,0.013604,1.122357,0.308965


Based on the results, when a patient is prescribed Abilify, they are more likely than average to also be taking one of the following:
<ul>
    <li>Amphetamine Salt Combo XR</li>
    <li>Carvedilol</li>
    <li>Diazepam</li>
</ul>

Support is the frequency with which both the antecedent and consequent appear, as a percentage of the total number of transactions in the dataset.  Lift compares how often the antecedent and consequent actually appear together with how often they would be expected to appear together if there were no relationship between them.  A number greater than 1 indicates that the consequent is more likely to occur in a transaction when the antecedent is present.  Confidence is the number of transactions containing both the antecedent and the consequent, as a percentage of the transactions in which the antecedent occurs.
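As a quick check of these definitions, the sketch below recomputes confidence and lift for the (abilify) → (carvedilol) rule from the support values reported in the rules table. The variable names are illustrative, and because the inputs are already rounded, the results agree with the table only to about five decimal places.

```python
# Values copied from the (abilify) -> (carvedilol) row of the rules table
antecedent_support = 0.238368  # share of transactions containing abilify
consequent_support = 0.17411   # share of transactions containing carvedilol
pair_support = 0.059725        # share of transactions containing both

# Confidence: pair support as a fraction of the antecedent's support
confidence = pair_support / antecedent_support

# Lift: observed pair support relative to what independence would predict
lift = pair_support / (antecedent_support * consequent_support)

print(round(confidence, 4), round(lift, 4))
```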


Below is a summary of Abilify and each of the prescriptions in the list above.

<u>Abilify and Amphetamine Salt Combo XR Results:</u>
<ul>
    <li>Support: about 5.1% of all transactions contain both of these medications.</li>
    <li>Lift: 1.2 tells us that the presence of Abilify increases the likelihood that Amphetamine Salt Combo XR will appear in the same transaction.</li>
    <li>Confidence: 21.4% of the transactions that contain Abilify also contain Amphetamine Salt Combo XR, and 28.3% of the transactions that contain Amphetamine Salt Combo XR also contain Abilify.</li>
</ul>

<u>Abilify and Carvedilol Results:</u>
<ul>
    <li>Support: about 6.0% of all transactions contain both of these medications.</li>
    <li>Lift: 1.4 tells us that the presence of Abilify increases the likelihood that Carvedilol will appear in the same transaction.</li>
    <li>Confidence: 25.1% of the transactions that contain Abilify also contain Carvedilol, and 34.3% of the transactions that contain Carvedilol also contain Abilify.</li>
</ul>

<u>Abilify and Diazepam Results:</u>
<ul>
    <li>Support: about 5.3% of all transactions contain both of these medications.</li>
    <li>Lift: 1.3 tells us that the presence of Abilify increases the likelihood that Diazepam will appear in the same transaction.</li>
    <li>Confidence: 22.1% of the transactions that contain Abilify also contain Diazepam, and 32.1% of the transactions that contain Diazepam also contain Abilify.</li>
</ul>

#### D2. Practical Significance of Findings<a id='d2'></a>

Given that Abilify appears in about 24% of the transactions in the dataset, it is plausible that many readmitted patients are prescribed this medication, along with one of the other three prescriptions identified in the market basket analysis.  However, the dataset does not record readmission, so more investigation is needed to determine whether the hospital can use the results of this analysis to predict readmission.

Hypertension has been a likely predictor of readmission in my past analysis, and doctors prescribe Carvedilol to treat heart failure and hypertension, so this medication is likely common amongst readmitted patients.

#### D3. Course of Action<a id='d3'></a>

I recommend further analysis to determine what combination of prescription medications readmitted patients are likely to take.  By identifying groups of patients with a higher probability of readmission, the hospital can explore ways to decrease the likelihood that the patient is readmitted.

### PART V: ATTACHMENTS

#### E. Panopto Recording<a id='e'></a>

<a href='https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=0c425e77-b404-4ce3-9c70-b05f003b0eaa'>Video Link</a>

#### F. Third-Party Code References<a id='f'></a>

Kamara, Dr. Kesselly (n.d.). <i>Data Mining II - D212</i> [Webinar].
    </br>&emsp;&emsp;Western Governors University. https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=dbe89ddb-e92f-4d40-a87a-af030178abf1

#### G. Sources<a id='g'></a>

Abilify (n.d.).  <i>Abilify</i>.  Abilify.
    </br>&emsp;&emsp;Retrieved August 14, 2023, from https://www.abilify.com/

Sinha, MD, Sanjaii (2022, November 24).  <i>Carvedilol</i>.  Drugs.
    </br>&emsp;&emsp;Retrieved August 14, 2023, from https://www.drugs.com/carvedilol.html

Sivek, Ph.D, Susan Currie (2020, November 16).  <i>Market Basket Analysis 101: Key Concepts</i>.  Towards Data Science.
    </br>&emsp;&emsp;Retrieved August 14, 2023, from https://towardsdatascience.com/market-basket-analysis-101-key-concepts-1ddc6876cd00