### Part I: Research Question

**A. Describe the purpose of this data mining report by doing the following:**

**1. Propose one question relevant to a real-world organizational situation that you will answer using market basket analysis.**

A critical component of patient relationship management is understanding patients and the services they receive. When a hospital better understands its patients’ characteristics, it can target treatment more precisely, resulting in more cost-effective care for the hospital in the long term.

In this analysis, the research question is: **are there any medications that are frequently prescribed together?** This analysis will use market basket analysis on patient data to identify associated prescriptions among the hospital's patients, ultimately supporting better business and strategic decision-making for the hospital.

**2. Define one goal of the data analysis. Ensure that your goal is reasonable within the scope of the scenario and is represented in the available data.**

The goal of this analysis is to discover associations between medications that frequently appear together in the same prescription record, the equivalent of a "shopping basket".

### Part II: Market Basket Justification

**B. Explain the reasons for using market basket analysis by doing the following:**

**1. Explain how market basket analyzes the selected dataset. Include expected outcomes.**

Market basket analysis can be applied to the hospital data to gain insight into patient medication patterns and to inform clinical decision-making. Its primary goal here is to identify whether any association rules exist among the medications in the prescription dataset.

The market basket analysis uses the Apriori algorithm to analyze the dataset. The Apriori algorithm works by scanning a transaction dataset to discover frequent itemsets, which are subsets of items that appear together in at least a minimum share of transactions. In our case, the algorithm scans the prescription dataset to discover which medications are frequently prescribed together.
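
The level-wise search described above can be sketched in plain Python. This is a minimal illustration, not the mlxtend implementation used later in this report; the function name `frequent_itemsets` and the four toy transactions are hypothetical, with medication names borrowed from the dataset.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_len=2):
    """Return {itemset: support} for itemsets meeting min_support (toy sketch)."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        # Share of transactions that contain every item in the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: score every single item and keep the frequent ones.
    items = sorted({item for basket in baskets for item in basket})
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {s: v for s, v in current.items() if v >= min_support}
    result = dict(current)

    k = 2
    while current and k <= max_len:
        survivors = sorted({item for itemset in current for item in itemset})
        # Pruning step: a candidate is scored only if all of its
        # (k-1)-item subsets were frequent at the previous level.
        candidates = [
            frozenset(c)
            for c in combinations(survivors, k)
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        ]
        current = {c: support(c) for c in candidates}
        current = {s: v for s, v in current.items() if v >= min_support}
        result.update(current)
        k += 1
    return result


# Hypothetical transactions for illustration only.
toy = [
    ["abilify", "metformin"],
    ["abilify", "metformin", "lisinopril"],
    ["abilify", "lisinopril"],
    ["enalapril"],
]
found = frequent_itemsets(toy, min_support=0.5)
for itemset, value in found.items():
    print(sorted(itemset), value)
```

In this toy data, enalapril falls below the threshold at the first level, so no pair containing it is ever scored. That pruning is what keeps the search tractable on a dataset with over a hundred distinct medications.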

An expected outcome of this analysis is a set of association rules that describe relationships between medications. We also expect measurable values such as support, confidence, and lift, which indicate how strong each rule is.

**2. Provide one example of transactions in the dataset.**

An example of a transaction from the dataset is valsartan and abilify.

**3. Summarize one assumption of market basket analysis.**

One assumption of market basket analysis is independent purchase behavior: the analysis assumes that the items purchased in a transaction are independent of each other (McColl, 2022).
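
One way to read this assumption is as a baseline: if two prescriptions really were independent, their joint support would equal the product of their individual supports. The sketch below compares that baseline with the observed joint support, using the values this report lists for the rule (valsartan) -> (abilify) in the association rules table of Part III. The variable names are illustrative.

```python
# Values copied from the association rules table in this report
# for the rule (valsartan) -> (abilify).
support_valsartan = 0.010399
support_abilify = 0.238368
support_both = 0.005066

# Under independence, joint support is the product of the single supports.
expected_if_independent = support_valsartan * support_abilify

# Leverage: observed joint support minus the independence baseline.
leverage = support_both - expected_if_independent

# Lift: observed joint support divided by the independence baseline.
lift = support_both / expected_if_independent

print(f"expected: {expected_if_independent:.6f}")
print(f"leverage: {leverage:.6f}")
print(f"lift:     {lift:.2f}")
```

The results agree, up to rounding of the inputs, with the leverage and lift columns reported for this rule. A lift near 1 would mean the pair behaves as the independence baseline predicts.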

### Part III: Data Preparation and Analysis

**C. Prepare and perform market basket analysis by doing the following:**

**1. Transform the dataset to make it suitable for market basket analysis. Include a copy of the cleaned dataset.**

See below for the data transformation required for the market basket analysis. A copy of the cleaned data is attached with the performance assessment.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

#import the medical df

df = pd.read_csv(r"C:\Users\shabn\Documents\WGU - MSDA\D212\medical_market_basket.csv")

In [2]:
df.shape

(15002, 20)

In [3]:
df.isnull().sum()

Presc01     7501
Presc02     9255
Presc03    10613
Presc04    11657
Presc05    12473
Presc06    13138
Presc07    13633
Presc08    14021
Presc09    14348
Presc10    14607
Presc11    14746
Presc12    14848
Presc13    14915
Presc14    14955
Presc15    14977
Presc16    14994
Presc17    14998
Presc18    14998
Presc19    14999
Presc20    15001
dtype: int64

In [4]:
# let's see what the data looks like
df.head(10)

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
0,,,,,,,,,,,,,,,,,,,,
1,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
2,,,,,,,,,,,,,,,,,,,,
3,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
4,,,,,,,,,,,,,,,,,,,,
5,enalapril,,,,,,,,,,,,,,,,,,,
6,,,,,,,,,,,,,,,,,,,,
7,paroxetine,allopurinol,,,,,,,,,,,,,,,,,,
8,,,,,,,,,,,,,,,,,,,,
9,abilify,atorvastatin,folic acid,naproxen,losartan,,,,,,,,,,,,,,,


In [5]:
# find na values presc 01
df.Presc01.isna().sum()

7501

In [6]:
len(df)

15002

In [7]:
df = df[df.Presc01.notna()]
df.head()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
1,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
3,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
5,enalapril,,,,,,,,,,,,,,,,,,,
7,paroxetine,allopurinol,,,,,,,,,,,,,,,,,,
9,abilify,atorvastatin,folic acid,naproxen,losartan,,,,,,,,,,,,,,,


In [8]:
df.shape

(7501, 20)

In [9]:
# we will need to create a list of lists for this analysis

trans = []
for i in range(len(df)):
    # str() turns missing values into the string 'nan'; that column is dropped later
    trans.append([str(df.values[i, j]) for j in range(df.shape[1])])

In [10]:
# transactionalize dataset to prepare data

TE = TransactionEncoder()
array = TE.fit(trans).transform(trans)
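
The encoding step can be illustrated without mlxtend. The sketch below is not `TransactionEncoder`'s implementation, only a standard-library picture of the Boolean matrix it yields. The rows are adapted from the first records shown above and truncated to three slots, with `None` standing in for an empty slot. It also filters empty slots while building each transaction, which is an alternative to converting missing values to the string 'nan' and dropping that column afterwards, as this notebook does.

```python
# Rows adapted from the head of the prescription file, truncated to
# three slots for illustration; None marks an empty slot.
rows = [
    ["citalopram", "benicar", None],
    ["enalapril", None, None],
    ["paroxetine", "allopurinol", None],
]

# Drop empty slots while building each transaction,
# so that no 'nan' item is ever created.
transactions = [[item for item in row if item is not None] for row in rows]

# One column per distinct item, in sorted order.
columns = sorted({item for t in transactions for item in t})

# One Boolean row per transaction: True where the item is present.
encoded = [[col in t for col in columns] for t in transactions]

print(columns)
for row in encoded:
    print(row)
```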

In [11]:
# prepare cleaned data

cleaned_df = pd.DataFrame(array, columns=TE.columns_)
cleaned_df

Unnamed: 0,Duloxetine,Premarin,Yaz,abilify,acetaminophen,actonel,albuterol HFA,albuterol aerosol,alendronate,allopurinol,...,trazodone HCI,triamcinolone Ace topical,triamterene,trimethoprim DS,valaciclovir,valsartan,venlafaxine XR,verapamil SR,viagra,zolpidem
0,False,False,False,True,False,False,True,True,False,True,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,True,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7497,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7498,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
7499,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [12]:
# print columns
for col in cleaned_df.columns:
    print(col)

Duloxetine
Premarin
Yaz
abilify
acetaminophen
actonel
albuterol HFA
albuterol aerosol
alendronate
allopurinol
alprazolam
amitriptyline
amlodipine
amoxicillin
amphetamine
amphetamine salt combo
amphetamine salt combo xr
atenolol
atorvastatin
azithromycin
benazepril
benicar
boniva
bupropion sr
carisoprodol
carvedilol
cefdinir
celebrex
celecoxib
cephalexin
cialis
ciprofloxacin
citalopram
clavulanate K+
clonazepam
clonidine HCI
clopidogrel
clotrimazole
codeine
crestor
cyclobenzaprine
cymbalta
dextroamphetamine XR
diazepam
diclofenac sodium
doxycycline hyclate
enalapril
escitalopram
esomeprazole
ezetimibe
fenofibrate
fexofenadine
finasteride
flovent hfa 110mcg inhaler
fluconozole
fluoxetine HCI
fluticasone
fluticasone nasal spray
folic acid
furosemide
gabapentin
glimepiride
glipizide
glyburide
hydrochlorothiazide
hydrocodone
hydrocortisone 2.5% cream
ibuprophen
isosorbide mononitrate
lansoprazole
lantus
levofloxacin
levothyroxine sodium
lisinopril
lorazepam
losartan
lovastatin
meloxicam
met

In [13]:
# drop the 'nan' column created when missing values were converted to strings

cleaned_df = cleaned_df.drop(['nan'], axis=1)

In [14]:
for col in cleaned_df.columns:
    print(col)

Duloxetine
Premarin
Yaz
abilify
acetaminophen
actonel
albuterol HFA
albuterol aerosol
alendronate
allopurinol
alprazolam
amitriptyline
amlodipine
amoxicillin
amphetamine
amphetamine salt combo
amphetamine salt combo xr
atenolol
atorvastatin
azithromycin
benazepril
benicar
boniva
bupropion sr
carisoprodol
carvedilol
cefdinir
celebrex
celecoxib
cephalexin
cialis
ciprofloxacin
citalopram
clavulanate K+
clonazepam
clonidine HCI
clopidogrel
clotrimazole
codeine
crestor
cyclobenzaprine
cymbalta
dextroamphetamine XR
diazepam
diclofenac sodium
doxycycline hyclate
enalapril
escitalopram
esomeprazole
ezetimibe
fenofibrate
fexofenadine
finasteride
flovent hfa 110mcg inhaler
fluconozole
fluoxetine HCI
fluticasone
fluticasone nasal spray
folic acid
furosemide
gabapentin
glimepiride
glipizide
glyburide
hydrochlorothiazide
hydrocodone
hydrocortisone 2.5% cream
ibuprophen
isosorbide mononitrate
lansoprazole
lantus
levofloxacin
levothyroxine sodium
lisinopril
lorazepam
losartan
lovastatin
meloxicam
met

In [15]:
cleaned_df.shape

(7501, 119)

In [16]:
cleaned_df.to_csv('market_basket_cleaned.csv', index = False)

**2. Execute the code used to generate association rules with the Apriori algorithm. Provide screenshots that demonstrate the error-free functionality of the code.**

See the code below for the Apriori algorithm; it runs without errors.

In [17]:
len(cleaned_df)

7501

In [18]:
frequent_med = apriori(cleaned_df, min_support = 0.005, max_len = 2, use_colnames = True)

len(frequent_med)

552
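
To make the `min_support = 0.005` threshold concrete, it can be converted into a transaction count. This sketch assumes an itemset is kept when its support is greater than or equal to the threshold; the variable names are illustrative, and the abilify support is the value reported elsewhere in this notebook.

```python
import math

n_transactions = 7501  # len(cleaned_df) in this notebook
min_support = 0.005    # threshold passed to apriori

# Smallest whole number of transactions whose share reaches min_support.
min_count = math.ceil(min_support * n_transactions)

# Approximate count behind a reported support value,
# here the 0.238368 reported for abilify.
abilify_count = round(0.238368 * n_transactions)

print(min_count, abilify_count)
```

Under that assumption, a medication or pair must appear in at least 38 of the 7,501 transactions to be kept as a frequent itemset.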

In [19]:
frequent_med['length'] = frequent_med['itemsets'].apply(len)
frequent_med.head()

Unnamed: 0,support,itemsets,length
0,0.011998,(Duloxetine),1
1,0.046794,(Premarin),1
2,0.238368,(abilify),1
3,0.015731,(acetaminophen),1
4,0.011998,(actonel),1


In [20]:
rules = association_rules(frequent_med, metric="confidence", min_threshold=0.40)
rules = rules.sort_values(by = 'confidence', ascending = False)
rules.describe()

Unnamed: 0,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
count,11.0,11.0,11.0,11.0,11.0,11.0,11.0,11.0
mean,0.032796,0.232527,0.014022,0.43691,1.895332,0.006287,1.366972,0.480622
std,0.028483,0.019375,0.011836,0.031215,0.252447,0.005023,0.086197,0.057737
min,0.010399,0.17411,0.005066,0.401254,1.683336,0.002364,1.272045,0.418306
25%,0.014131,0.238368,0.005999,0.413951,1.736601,0.003062,1.299629,0.432704
50%,0.018531,0.238368,0.007732,0.419028,1.757904,0.003614,1.310962,0.474369
75%,0.046527,0.238368,0.020064,0.463275,1.988233,0.008973,1.447858,0.505198
max,0.098254,0.238368,0.040928,0.487179,2.546642,0.017507,1.485182,0.616032
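
The confidence threshold also explains why rules are directional. The sketch below is a plain-Python illustration, not mlxtend's `association_rules` code. It takes the supports this report lists for abilify, metformin, and the pair, and keeps only the directions whose confidence reaches 0.40. The helper name `rules_from_pair` is hypothetical.

```python
# Supports copied from the outputs in this report.
supports = {
    frozenset(["abilify"]): 0.238368,
    frozenset(["metformin"]): 0.050527,
    frozenset(["abilify", "metformin"]): 0.023064,
}


def rules_from_pair(pair, supports, min_confidence):
    """Return (antecedent, consequent, confidence) for each direction kept."""
    kept = []
    for antecedent in pair:
        consequent = next(item for item in pair if item != antecedent)
        # confidence(X -> Y) = support(X and Y) / support(X)
        confidence = supports[pair] / supports[frozenset([antecedent])]
        if confidence >= min_confidence:
            kept.append((antecedent, consequent, round(confidence, 4)))
    return kept


pair = frozenset(["abilify", "metformin"])
kept = rules_from_pair(pair, supports, min_confidence=0.40)
print(kept)
```

Only metformin -> abilify survives. In the reverse direction the same joint support is divided by the much larger support of abilify, giving a confidence of roughly 0.10, well under the threshold.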


**3. Provide values for the support, lift, and confidence of the association rules table.**

See the output below for the support, lift, and confidence of each association rule.

In [21]:
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
9,(valsartan),(abilify),0.010399,0.238368,0.005066,0.487179,2.043811,0.002587,1.485182,0.516084
0,(Duloxetine),(abilify),0.011998,0.238368,0.005733,0.477778,2.004369,0.002873,1.458444,0.507175
5,(salmeterol inhaler),(abilify),0.015598,0.238368,0.007332,0.470085,1.972098,0.003614,1.437273,0.500736
3,(metformin),(abilify),0.050527,0.238368,0.023064,0.456464,1.914955,0.01102,1.401255,0.503221
10,(potassium Chloride),(carvedilol),0.014131,0.17411,0.006266,0.443396,2.546642,0.003805,1.483802,0.616032
1,(glipizide),(abilify),0.065858,0.238368,0.027596,0.419028,1.757904,0.011898,1.310962,0.461536
7,(temezepam),(abilify),0.018531,0.238368,0.007732,0.417266,1.750511,0.003315,1.306998,0.436833
2,(lisinopril),(abilify),0.098254,0.238368,0.040928,0.416554,1.747522,0.017507,1.305401,0.474369
8,(trimethoprim DS),(abilify),0.018797,0.238368,0.007732,0.411348,1.725681,0.003252,1.293856,0.428575
4,(potassium Chloride),(abilify),0.014131,0.238368,0.005733,0.40566,1.701822,0.002364,1.281476,0.418306


**4. Identify the top three rules generated by the Apriori algorithm. Include a screenshot of the top rules along with their summaries.**

The top three rules, ranked by confidence, and their metrics are shown below:

In [22]:
rules.head(3)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
9,(valsartan),(abilify),0.010399,0.238368,0.005066,0.487179,2.043811,0.002587,1.485182,0.516084
0,(Duloxetine),(abilify),0.011998,0.238368,0.005733,0.477778,2.004369,0.002873,1.458444,0.507175
5,(salmeterol inhaler),(abilify),0.015598,0.238368,0.007332,0.470085,1.972098,0.003614,1.437273,0.500736


### Part IV: Data Summary and Implications

**D. Summarize your data analysis by doing the following:**

**1. Summarize the significance of support, lift, and confidence from the results of the analysis.**

* **Support**: Support measures how frequently an itemset occurs across all transactions. It indicates how popular a particular combination of items is. In our analysis, the top three rules have support values below 0.01, which is low: each pairing appears in fewer than 1% of transactions. This means that the medications are not "shopped", or paired together, frequently.

* **Confidence**: Confidence measures the conditional probability that item Y is purchased given that item X is purchased. It tells us how likely item Y is to be bought when item X is already in the basket. Confidence ranges from 0 to 1, with 1 indicating the strongest confidence. In our analysis, the top three rules have confidence values of 0.49, 0.48, and 0.47 respectively. These values are low (below 0.5), indicating that there is not a high probability of the items being prescribed together.

* **Lift**: Lift measures the strength of the association between items X and Y relative to how often they would occur together if they were independent. A lift greater than 1 indicates a positive association, meaning the items are purchased together more often than expected by chance. The lift values in this analysis are 2.04, 2.00, and 1.97 respectively. By this definition, the top three rules have lifts greater than 1, indicating a positive association (the medications are prescribed together about twice as often as chance alone would predict).

(McColl, 2022)
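
The three definitions above can be checked against the reported numbers. The sketch below recomputes confidence and lift for the top three rules from the support columns of the rules table; it is a worked check under the standard formulas, and the variable names are illustrative.

```python
# (antecedent, antecedent support, joint support with abilify),
# copied from the rules table in this report.
top_rules = [
    ("valsartan", 0.010399, 0.005066),
    ("Duloxetine", 0.011998, 0.005733),
    ("salmeterol inhaler", 0.015598, 0.007332),
]
abilify_support = 0.238368  # consequent support for all three rules

results = []
for antecedent, antecedent_support, joint_support in top_rules:
    # confidence(X -> Y) = support(X and Y) / support(X)
    confidence = joint_support / antecedent_support
    # lift(X -> Y) = confidence(X -> Y) / support(Y)
    lift = confidence / abilify_support
    results.append((antecedent, round(confidence, 2), round(lift, 2)))

for antecedent, confidence, lift in results:
    print(f"{antecedent} -> abilify: confidence={confidence}, lift={lift}")
```

Because the inputs are already rounded to six decimals, the recomputed values match the table only to about four decimal places, which is enough to confirm the figures quoted in the bullets.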

**2. Discuss the practical significance of the findings from the analysis.**

As mentioned above, the results of our analysis are the following:

* The top three rules have support values below 0.01, which is low. This means that the medications are not "shopped", or paired together, frequently.
* The top three rules have confidence values of 0.49, 0.48, and 0.47 respectively. These values are low (below 0.5), indicating that there is not a high probability of the items being prescribed together.
* The top three rules have lifts greater than 1, indicating a positive association (meaning that the medications are prescribed together more often than expected by chance).


**3. Recommend a course of action for the real-world organizational situation from part A1 based on your results from part D1.**

From our analysis above, we see that there is not a strong relationship between the medications in the prescription dataset. The support values are weak (the top three medication combinations do not occur frequently), and the confidence values are weak (less than a 50% chance of the medications being prescribed together).

This is not necessarily a negative result for the hospital. Infrequent co-prescription may indicate that the hospital tends to prescribe single medications that alleviate the patient's symptoms. I recommend that the hospital continue its current prescribing practices, as there seem to be no negative consequences of medications not being "shopped" together. This also benefits patients both financially and in terms of health, as fewer medications can reduce medication side effects.

### Part V: Attachments

**E. Provide a Panopto video recording that includes a demonstration of the functionality of the code used for the analysis and a summary of the programming environment.**
 
Panopto recording: https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=a4cc5b32-c2d8-4c9e-89a2-b0590045628b

**F. Record all web sources used to acquire data or segments of third-party code to support the application. Ensure the web sources are reliable.**
 

**G. Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.**

McColl, Lynsey. “Market Basket Analysis: Understanding Customer Behaviour.” Select Statistical Consultants, 1 Mar. 2022, select-statistics.co.uk/blog/market-basket-analysis-understanding-customer-behaviour/. 

**H. Demonstrate professional communication in the content and presentation of your submission.**