# D212: Data Mining II - Task 3
***

### By: Leng Yang
### Student ID: 012298452
### Date: October 8, 2024
***
<br>
<br>
<br>

## Table of Contents
* [A1. Proposal of Question](#A1)
* [A2. Defined Goal](#A2)
* [B1. Explanation of Market Basket](#B1)
* [B2. Transaction Example](#B2)
* [B3. Market Basket Assumption](#B3)
* [C1. Transforming the Data Set](#C1)
* [C2. Code Execution](#C2)
* [C3. Association Rules Table](#C3)
* [C4. Top Three Rules](#C4)
* [D1. Significance of Support, Lift, and Confidence Summary](#D1)
* [D2. Practical Significance of Findings](#D2)
* [D3. Course of Action](#D3)
* [E - E1. Panopto Video of Code/Programs](#E)
* [F. Sources for Third-Party Code](#F)
* [G. Sources](#G)

<BR>

<BR>

<BR>

<BR>

<BR>

<BR>

## A1. Proposal of Question <a class="anchor" id="A1"></a>

The research question of focus is: Can market basket analysis (MBA) be used to identify which prescriptions are commonly given to patients together?

<BR>

## A2. Defined Goal <a class="anchor" id="A2"></a>

This analysis aims to use market basket analysis to identify medications that are typically prescribed together. The technique suits this data set because each row is a transaction recording one patient's prescriptions. Knowing which medications are frequently prescribed together gives a starting point for investigating the underlying reasons for those prescriptions.

<BR>

## B1. Explanation of Market Basket <a class="anchor" id="B1"></a>

Market basket analysis is a technique for analyzing transaction data to determine which products are frequently bought together. It works by looking at the items within each transaction, or "basket," and deriving rules from how often certain items occur together, resulting in what are known as "association rules." Several statistical measures, such as support, confidence, and lift, are then calculated to judge how useful each association rule is.

The expected outcome of this analysis is to determine the top three association rules. These rules will identify which medications are frequently prescribed together.
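
As a rough illustration of the counting idea behind market basket analysis, the sketch below tallies how often each pair of items shares a basket. The baskets and item names (`drug_a`, `drug_b`, `drug_c`) are hypothetical and are not taken from the data set; the function name `pair_counts` is likewise only illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets; these are NOT rows from the course data set.
baskets = [
    {"drug_a", "drug_b", "drug_c"},
    {"drug_a", "drug_b"},
    {"drug_a", "drug_c"},
    {"drug_b"},
]


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair one canonical order, e.g. ("drug_a", "drug_b")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
for pair, n in counts.most_common():
    # Support of a pair = share of all baskets that contain both items
    print(pair, n, f"support={n / len(baskets):.2f}")
```

Pairs that co-occur often are the raw material for association rules; the analysis below applies the same idea at scale through `mlxtend`.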

<BR>

## B2. Transaction Example <a class="anchor" id="B2"></a>

An example of a transaction from the data set is a patient whose prescription history contains three medications: citalopram, benicar, and amphetamine salt combo xr.

<BR>

## B3. Market Basket Assumption <a class="anchor" id="B3"></a>

One assumption of market basket analysis is that items appearing together in a basket do so because they complement each other within the transaction, rather than by coincidence. That is to say, purchasing one item leads to purchasing other items (Tian, 2015).

<BR>

## C1. Transforming the Data Set <a class="anchor" id="C1"></a>

The cleaned data set is submitted alongside the report as "D212_Task3_Data.csv."

In [28]:
#Import necessary libraries
import pandas as pd
import numpy as np
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

import warnings
warnings.filterwarnings('ignore')

In [29]:
#Load in data and drop all rows containing only NaN values
df = pd.read_csv('medical_market_basket.csv')
df.dropna(axis=0, how='all', ignore_index=True, inplace=True)
df.head()

Unnamed: 0,Presc01,Presc02,Presc03,Presc04,Presc05,Presc06,Presc07,Presc08,Presc09,Presc10,Presc11,Presc12,Presc13,Presc14,Presc15,Presc16,Presc17,Presc18,Presc19,Presc20
0,amlodipine,albuterol aerosol,allopurinol,pantoprazole,lorazepam,omeprazole,mometasone,fluconozole,gabapentin,pravastatin,cialis,losartan,metoprolol succinate XL,sulfamethoxazole,abilify,spironolactone,albuterol HFA,levofloxacin,promethazine,glipizide
1,citalopram,benicar,amphetamine salt combo xr,,,,,,,,,,,,,,,,,
2,enalapril,,,,,,,,,,,,,,,,,,,
3,paroxetine,allopurinol,,,,,,,,,,,,,,,,,,
4,abilify,atorvastatin,folic acid,naproxen,losartan,,,,,,,,,,,,,,,


In [30]:
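#Create list of lists of items, one list per transaction
#Missing values become the string 'nan'; the row and column counts come from the dataframe instead of being hard-coded
rows = [[str(item) for item in row] for row in df.values]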
    
#Instantiate encoder and fit and transform data and then place into dataframe, dropping column that contains NaN values
encoder = TransactionEncoder()
te = encoder.fit(rows).transform(rows)
items = pd.DataFrame(te, columns=encoder.columns_)
items.drop(columns='nan', inplace=True)

In [31]:
#Check final shape of dataframe
items.shape

(7501, 119)

In [32]:
#Preview of items
items.head()

Unnamed: 0,Duloxetine,Premarin,Yaz,abilify,acetaminophen,actonel,albuterol HFA,albuterol aerosol,alendronate,allopurinol,...,trazodone HCI,triamcinolone Ace topical,triamterene,trimethoprim DS,valaciclovir,valsartan,venlafaxine XR,verapamil SR,viagra,zolpidem
0,False,False,False,True,False,False,True,True,False,True,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,True,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


In [33]:
#Export cleaned and prepared data
items.to_csv('D212_Task3_Data.csv', index=False)
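
To make the encoding step more concrete, the sketch below shows in plain Python roughly what a one-hot transaction encoding produces: one boolean column per distinct item. It is an illustration, not the `mlxtend` implementation. The three rows are copied from the preview above, with `None` standing in for the missing values; the function name `one_hot_encode` is hypothetical. Filtering out the padding before encoding is an alternative to dropping a `'nan'` column afterward.

```python
def one_hot_encode(transactions):
    """Return (columns, matrix); matrix[i][j] is True if item j is in transaction i."""
    # One column per distinct item, in sorted order
    columns = sorted({item for basket in transactions for item in basket})
    matrix = [[col in basket for col in columns] for basket in transactions]
    return columns, matrix


# Rows 1-3 from the preview above; None stands in for the NaN padding.
raw_rows = [
    ["citalopram", "benicar", "amphetamine salt combo xr"],
    ["enalapril", None, None],
    ["paroxetine", "allopurinol", None],
]

# Drop the padding up front so no placeholder column is ever created.
transactions = [{item for item in row if item is not None} for row in raw_rows]

columns, matrix = one_hot_encode(transactions)
print(columns)
print(matrix[1])  # the row for the transaction containing only enalapril
```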

<BR>

## C2. Code Execution <a class="anchor" id="C2"></a>

The Apriori algorithm was used to keep only the frequent itemsets out of all possible itemsets. A minimum support value of 0.02 was used, meaning an itemset must appear in at least 2% of all transactions, or about 150 of the 7,501 transactions.

Once the frequent itemsets table was created, an association rules table was generated from it using 'lift' as the metric, with a minimum threshold of 1.0. This threshold discards rules with a lift below 1.0, which would indicate a negative association. Rules with a lift above 1.0 show a positive association between the antecedent and consequent itemsets, whereby the presence of the antecedent increases the likelihood that the consequent is also prescribed.
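
For intuition about how a minimum support threshold prunes the search, here is a minimal level-wise sketch of the Apriori idea in plain Python. It is a simplified illustration under the assumption that each transaction is a Python set, and it is not how `mlxtend` implements `apriori`. The function name `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        # Share of transactions that contain every item in the itemset
        return sum(itemset <= basket for basket in transactions) / n

    # Level 1: frequent single items
    items = {item for basket in transactions for item in basket}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: size-k candidates built from unions of frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical toy baskets, not taken from the data set
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
result = apriori_sketch(toy, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Because the pair `{b, c}` falls below the threshold, the triple `{a, b, c}` is never counted; that pruning is what keeps the search tractable on the full set of 119 medications.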

In [37]:
#Apriori algorithm used to generate only frequent itemsets
frequent_itemsets = apriori(items, min_support=0.02, use_colnames=True)
frequent_itemsets

| | support | itemsets |
|---|---|---|
| 0 | 0.046794 | (Premarin) |
| 1 | 0.238368 | (abilify) |
| 2 | 0.020397 | (albuterol aerosol) |
| 3 | 0.033329 | (allopurinol) |
| 4 | 0.079323 | (alprazolam) |
| ... | ... | ... |
| 98 | 0.023064 | (diazepam, lisinopril) |
| 99 | 0.023464 | (diazepam, losartan) |
| 100 | 0.022930 | (diazepam, metoprolol) |
| 101 | 0.020131 | (glyburide, doxycycline hyclate) |


In [38]:
# Create association rules using minimum lift metric of 1.0
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1.0)
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (amlodipine) | (abilify) | 0.071457 | 0.238368 | 0.023597 | 0.330224 | 1.385352 | 0.006564 | 1.137144 | 0.299568 |
| 1 | (abilify) | (amlodipine) | 0.238368 | 0.071457 | 0.023597 | 0.098993 | 1.385352 | 0.006564 | 1.030562 | 0.365218 |
| 2 | (amphetamine salt combo) | (abilify) | 0.068391 | 0.238368 | 0.024397 | 0.356725 | 1.496530 | 0.008095 | 1.183991 | 0.356144 |
| 3 | (abilify) | (amphetamine salt combo) | 0.238368 | 0.068391 | 0.024397 | 0.102349 | 1.496530 | 0.008095 | 1.037830 | 0.435627 |
| 4 | (amphetamine salt combo xr) | (abilify) | 0.179709 | 0.238368 | 0.050927 | 0.283383 | 1.188845 | 0.008090 | 1.062815 | 0.193648 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 89 | (metoprolol) | (diazepam) | 0.095321 | 0.163845 | 0.022930 | 0.240559 | 1.468215 | 0.007312 | 1.101015 | 0.352502 |
| 90 | (glyburide) | (doxycycline hyclate) | 0.170911 | 0.095054 | 0.020131 | 0.117785 | 1.239135 | 0.003885 | 1.025766 | 0.232768 |
| 91 | (doxycycline hyclate) | (glyburide) | 0.095054 | 0.170911 | 0.020131 | 0.211781 | 1.239135 | 0.003885 | 1.051852 | 0.213256 |
| 92 | (glyburide) | (losartan) | 0.170911 | 0.132116 | 0.028530 | 0.166927 | 1.263488 | 0.005950 | 1.041786 | 0.251529 |
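
As a sanity check on how the columns relate, the sketch below recomputes confidence, lift, leverage, and conviction for rule 0 (amlodipine → abilify) from the three support values shown in the table. The function name `rule_metrics` is hypothetical. Because the inputs are the rounded values from the table, the results agree with the table only to about four decimal places.

```python
def rule_metrics(support_ab, support_a, support_b):
    """Derive rule metrics from joint, antecedent, and consequent support."""
    confidence = support_ab / support_a
    lift = confidence / support_b
    leverage = support_ab - support_a * support_b
    # Conviction is undefined when confidence is exactly 1; treat it as infinite
    conviction = float("inf") if confidence == 1 else (1 - support_b) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Rule 0 in the table: (amlodipine) -> (abilify)
m = rule_metrics(support_ab=0.023597, support_a=0.071457, support_b=0.238368)
for name, value in m.items():
    print(f"{name}: {value:.6f}")
```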


<BR>

## C3. Association Rules Table <a class="anchor" id="C3"></a>

Below is the association rules table containing the appropriate metrics for support, lift and confidence of each rule.

In [42]:
# Display the association rules table generated in C2
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (amlodipine) | (abilify) | 0.071457 | 0.238368 | 0.023597 | 0.330224 | 1.385352 | 0.006564 | 1.137144 | 0.299568 |
| 1 | (abilify) | (amlodipine) | 0.238368 | 0.071457 | 0.023597 | 0.098993 | 1.385352 | 0.006564 | 1.030562 | 0.365218 |
| 2 | (amphetamine salt combo) | (abilify) | 0.068391 | 0.238368 | 0.024397 | 0.356725 | 1.496530 | 0.008095 | 1.183991 | 0.356144 |
| 3 | (abilify) | (amphetamine salt combo) | 0.238368 | 0.068391 | 0.024397 | 0.102349 | 1.496530 | 0.008095 | 1.037830 | 0.435627 |
| 4 | (amphetamine salt combo xr) | (abilify) | 0.179709 | 0.238368 | 0.050927 | 0.283383 | 1.188845 | 0.008090 | 1.062815 | 0.193648 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 89 | (metoprolol) | (diazepam) | 0.095321 | 0.163845 | 0.022930 | 0.240559 | 1.468215 | 0.007312 | 1.101015 | 0.352502 |
| 90 | (glyburide) | (doxycycline hyclate) | 0.170911 | 0.095054 | 0.020131 | 0.117785 | 1.239135 | 0.003885 | 1.025766 | 0.232768 |
| 91 | (doxycycline hyclate) | (glyburide) | 0.095054 | 0.170911 | 0.020131 | 0.211781 | 1.239135 | 0.003885 | 1.051852 | 0.213256 |
| 92 | (glyburide) | (losartan) | 0.170911 | 0.132116 | 0.028530 | 0.166927 | 1.263488 | 0.005950 | 1.041786 | 0.251529 |


<BR>

## C4. Top Three Rules <a class="anchor" id="C4"></a>

The top three rules were determined using 'confidence' as the primary metric. Confidence ranges from zero to one, with one implying the most confidence, and expresses how likely the consequent is to be prescribed given that the antecedent was prescribed (Sivek, 2020). With confidence scores between about 0.42 and 0.46, it is reasonably probable that Abilify will also be prescribed, given a prescription of metformin, glipizide, or lisinopril. This is further supported by lift values of roughly 1.75 to 1.91, all greater than one, indicating that the items are positively associated with one another.

In [46]:
rules.sort_values('confidence', ascending=False).head(3)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 31 | (metformin) | (abilify) | 0.050527 | 0.238368 | 0.023064 | 0.456464 | 1.914955 | 0.01102 | 1.401255 | 0.503221 |
| 25 | (glipizide) | (abilify) | 0.065858 | 0.238368 | 0.027596 | 0.419028 | 1.757904 | 0.011898 | 1.310962 | 0.461536 |
| 29 | (lisinopril) | (abilify) | 0.098254 | 0.238368 | 0.040928 | 0.416554 | 1.747522 | 0.017507 | 1.305401 | 0.474369 |
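
One way to read the lift values for these three rules is to compare the observed number of joint transactions with the number expected if the two medications were prescribed independently. The sketch below does this using the support values from the table and the 7,501 transactions in the data set. The function name `independence_check` is hypothetical, and the inputs are the rounded table values, so the ratios match the reported lift only approximately.

```python
# (antecedent, antecedent support, consequent support, joint support), from the table above
TOP_RULES = [
    ("metformin", 0.050527, 0.238368, 0.023064),
    ("glipizide", 0.065858, 0.238368, 0.027596),
    ("lisinopril", 0.098254, 0.238368, 0.040928),
]


def independence_check(rules, n_transactions):
    """Compare observed joint counts with counts expected under independence."""
    results = []
    for name, s_a, s_b, s_ab in rules:
        expected_support = s_a * s_b  # joint support if A and B were independent
        results.append({
            "antecedent": name,
            "observed_count": s_ab * n_transactions,
            "expected_count": expected_support * n_transactions,
            "ratio": s_ab / expected_support,  # this ratio is the lift
        })
    return results


results = independence_check(TOP_RULES, n_transactions=7501)
for r in results:
    print(
        f"{r['antecedent']} + abilify: observed {r['observed_count']:.0f}, "
        f"expected {r['expected_count']:.0f}, ratio {r['ratio']:.2f}"
    )
```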


<BR>

## D1. Significance of Support, Lift, and Confidence Summary <a class="anchor" id="D1"></a>

Support measures how frequently an itemset appears relative to the total number of transactions, expressed as a ratio (IUYasik, 2023). The top three rules show that metformin and Abilify appear together in 2.3% of transactions, glipizide and Abilify in 2.8% of transactions, and lisinopril and Abilify in 4.1% of transactions.

Lift measures whether, and how strongly, two itemsets are associated beyond what would be expected by chance. Values greater than one indicate a positive association, in that the antecedent increases the likelihood of the consequent being prescribed. Values equal to one indicate no association beyond what would be expected by chance. Lastly, values less than one indicate a negative association, suggesting the antecedent reduces the likelihood of the consequent being prescribed (IUYasik, 2023). All of the top three rules have lift values greater than one (about 1.75 to 1.91), indicating that the items are positively associated and suggesting that the initial prescriptions increase the likelihood of an Abilify prescription.

Confidence measures how likely the consequent is to be prescribed, given that the antecedent was prescribed. The value ranges from zero to one, with one implying the most confidence (Sivek, 2020). With confidence scores between about 0.42 and 0.46, it is reasonably probable that Abilify will also be prescribed, given a prescription of metformin, glipizide, or lisinopril.
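
For reference, the three measures can be written compactly for a rule $A \Rightarrow B$. These are the standard definitions; the worked values use the rounded figures for the metformin → Abilify rule from the table in C4, so they are approximate.

```latex
\begin{align*}
\operatorname{support}(A \Rightarrow B)
  &= \frac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}} \\[4pt]
\operatorname{confidence}(A \Rightarrow B)
  &= \frac{\operatorname{support}(A \Rightarrow B)}{\operatorname{support}(A)}
   \approx \frac{0.023064}{0.050527} \approx 0.4565 \\[4pt]
\operatorname{lift}(A \Rightarrow B)
  &= \frac{\operatorname{confidence}(A \Rightarrow B)}{\operatorname{support}(B)}
   \approx \frac{0.4565}{0.238368} \approx 1.915
\end{align*}
```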

<BR>

## D2. Practical Significance of Findings <a class="anchor" id="D2"></a>

Although the top three rules together account for at most about 9% of all transactions (2.3% + 2.8% + 4.1%, an upper bound because one transaction can satisfy more than one rule), the more significant finding is that Abilify on its own appears in 24% of transactions. The antecedent prescriptions are metformin and glipizide, which are medications used to treat diabetes (Mayo Foundation for Medical Education and Research, 2024), and lisinopril, which is used to treat high blood pressure (Drugs.com, n.d.). The consequent medication, Abilify, is used in the treatment of mental health disorders such as schizophrenia and bipolar disorder (Cleveland Clinic, 2024). While there is no clear clinical pattern linking the antecedent and consequent prescriptions, the frequency of Abilify prescriptions is striking. This finding could indicate that the patient population is experiencing a mental health crisis.

<BR>

## D3. Course of Action <a class="anchor" id="D3"></a>

An underlying reason for Abilify being prescribed alongside metformin, glipizide, or lisinopril is not apparent from the data alone, so further research is recommended in this regard. Furthermore, the number of transactions containing an Abilify prescription could indicate a mental health crisis within the patient population. Further research into the causes of these mental health issues could prove helpful, and directing more resources toward mental health care could also be beneficial.

<BR>

## E - E1. Panopto Video of Code/Programs <a class="anchor" id="E"></a>

A recording is submitted alongside the report and can also be found at: https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=19e8e9e8-bbc1-4ede-ab21-b20301420c9e

<BR>

## F. Sources for Third-Party Code <a class="anchor" id="F"></a>

No sources of third-party code were used for this assessment.

<BR>

## G. Sources <a class="anchor" id="G"></a>

Cleveland Clinic. (2024, May 1). _Abilify (Aripiprazole): Uses & side effects_. Cleveland Clinic. https://my.clevelandclinic.org/health/drugs/19695-aripiprazole-tablets

IUYasik. (2023, October 4). _Market basket analysis & apriori algorithm using Zhang’s metric_. Medium. https://medium.com/@iuyasik/market-basket-analysis-apriori-algorithm-using-zhangs-metric-708406fc5dfc#:~:text=Lift%20is%20a%20metric%20that,would%20be%20expected%20by%20chance. 

_Lisinopril uses, dosage, side effects & warnings_. Drugs.com. (n.d.). https://www.drugs.com/lisinopril.html 

Mayo Foundation for Medical Education and Research. (2024, October 1). _Glipizide and metformin (oral route) description and brand names_. Mayo Clinic. https://www.mayoclinic.org/drugs-supplements/glipizide-and-metformin-oral-route/description/drg-20061984#:~:text=Glipizide%20and%20Metformin%20combination%20is,excess%20sugar%20for%20later%20use. 

Sivek, S. C. (2020, November 17). _Market basket analysis 101: Key concepts_. Medium. https://towardsdatascience.com/market-basket-analysis-101-key-concepts-1ddc6876cd00 

Tian, H. (2015, September 1). _Market basket analysis_. HUA’S Analysis. https://sarahtianhua.wordpress.com/portfolio/market-basket-analysis/ 