# D212 Data Mining-II 
## Performance Assessment
## Task-3

Submitted by Muhammad Ilyas, Student ID 011143032, for WGU's MSDA program

## A1: Proposal of Question

What are the frequent co-occurring prescriptions or medications among the hospital's patients?

This question aims to uncover associations between different prescriptions or medications that patients frequently use together. Understanding these associations can provide insights into potential relationships between medical conditions or treatment approaches, aiding the hospital in tailoring treatments more effectively.

## A2: Defined Goal

Identify common patterns of prescription combinations among patients to improve treatment strategies and patient care pathways.

This goal aligns with the available dataset of patient prescription histories. By identifying frequent co-occurring prescriptions, the hospital aims to develop a better understanding of patient needs, potentially leading to improved treatment plans, reduced readmissions, and more targeted care approaches.

The question and goal both rely on market basket analysis to uncover associations within the prescription data, allowing the hospital to enhance its patient care strategies and optimize cost-effectiveness in the long term.

## B1: Explanation of Market Basket

Market basket analysis looks for items that are frequently purchased or used together within a set of transactions. Applied to patient prescription data, it identifies which medications tend to be prescribed together and quantifies each association from how often the combination occurs (support), how often the consequent appears when the antecedent is present (confidence), and how much more often the two co-occur than chance alone would predict (lift). The expected outcome is a set of prescription combinations that commonly occur within patient histories. For instance, the analysis might reveal that patients prescribed medication A are often also prescribed medication B or C, indicating potential co-treatment patterns or related medical conditions (How to Perform Market Basket Analysis, 2022).

## B2: Transaction Example

In the context of patient prescriptions, a transaction might be represented as follows:

Transaction ID: 001

- Prescription 1: Amlodipine
- Prescription 2: Albuterol aerosol
- Prescription 3: Pantoprazole

This represents a transaction in which a patient has been prescribed these three medications as part of their treatment.
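
As a minimal sketch of how such a transaction could be held in code, the example above can be stored as a list of item names and turned into one row of True/False flags per transaction. The second transaction and the variable names `items` and `encoded` are invented here for illustration and do not come from the dataset.

```python
# Hypothetical mini-dataset: the first transaction is the example above,
# the second is invented purely for illustration.
transactions = [
    ["amlodipine", "albuterol aerosol", "pantoprazole"],
    ["amlodipine", "lisinopril"],
]

# Every distinct medication becomes one column.
items = sorted({item for t in transactions for item in t})

# One row per transaction: True if the medication was prescribed.
encoded = [[item in t for item in items] for t in transactions]

print(items)
for row in encoded:
    print(row)
```

Each row has the same length regardless of how many medications the patient received, which is the shape that frequent-itemset algorithms generally expect.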

## B3: Assumption of Market Basket Analysis

One assumption of market basket analysis is the Apriori principle (also called downward closure): if an itemset (a collection of items) is frequent, then all of its subsets must also be frequent. In the context of patient prescriptions, this means that if a combination of medications meets the minimum support threshold, every individual medication and every smaller combination within it is prescribed at least as often and therefore also meets that threshold. The reverse is what makes the algorithm efficient: any combination containing an infrequent medication can be discarded without being counted. This principle forms the basis for identifying frequent itemsets and association rules in market basket analysis.
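
The principle can be checked on a small example. The sketch below uses four invented transactions (not taken from the dataset) and a hypothetical helper `support` to show that no subset of an itemset has lower support than the itemset itself.

```python
from itertools import combinations

# Hypothetical transactions, invented for illustration only.
transactions = [
    {"abilify", "carvedilol", "lisinopril"},
    {"abilify", "carvedilol"},
    {"abilify", "lisinopril"},
    {"carvedilol"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


full = ("abilify", "carvedilol", "lisinopril")
full_support = support(full, transactions)

# Every non-empty proper subset is at least as frequent as the full itemset.
for size in range(1, len(full)):
    for subset in combinations(full, size):
        assert support(subset, transactions) >= full_support
        print(subset, support(subset, transactions))
```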

### C1: Transforming the Dataset

In [None]:
import pandas as pd

# Read the dataset
df = pd.read_csv('medical_market_basket.csv')

pd.set_option("display.max_columns", None)
df.head()

In [None]:
# Every other row appears to be empty, so drop any row
# where the first prescription column (Presc01) is NaN:
df = df[df['Presc01'].notna()]
df.reset_index(drop=True, inplace=True)
df.head()

In [None]:
# Convert the data into a list of transactions
transactions = []
for index, row in df.iterrows():
    transaction = [str(row.iloc[i]) for i in range(20) if pd.notnull(row.iloc[i])]
    transactions.append(transaction)

# Example of the first few transactions
print(transactions[:5])  # Displaying the first 5 transactions

In [4]:
# Convert the transaction list into a one-hot encoded DataFrame
df = pd.DataFrame(transactions)
encoded_df = pd.get_dummies(df.stack()).groupby(level=0).sum()
# Cap counts at 1 so the result is a proper binary-encoded DataFrame
# (clip avoids DataFrame.applymap, which is deprecated in recent pandas versions)
encoded_df = encoded_df.clip(upper=1)

In [None]:
# Convert DataFrame to boolean type
cleaned_df = encoded_df.astype(bool)
cleaned_df

In [6]:
cleaned_df.to_csv('task3_full_clean.csv', index=False)

### C2: Code Execution

In [None]:
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# Generate frequent item sets
frequent_itemsets = apriori(cleaned_df, min_support=0.01, use_colnames=True)
# Display frequent itemsets
print(frequent_itemsets)

In [8]:
# Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.2)

### C3: Association Rules Table

In [None]:
# Display the association rules
rules

### C4: Top Three Rules

In [10]:
top_3_rules = rules.sort_values(by='lift', ascending=False).head(3)
top_3_rules

| index | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 243 | (methylprednisone) | (lisinopril) | 0.04946 | 0.098254 | 0.015998 | 0.32345 | 3.291994 | 0.011138 | 1.33286 | 0.73246 |
| 242 | (lisinopril) | (methylprednisone) | 0.098254 | 0.04946 | 0.015998 | 0.162822 | 3.291994 | 0.011138 | 1.13541 | 0.772094 |
| 322 | (lisinopril) | (abilify, carvedilol) | 0.098254 | 0.059725 | 0.017064 | 0.173677 | 2.907928 | 0.011196 | 1.137902 | 0.727602 |


### D1: Significance of Support, Lift and Confidence Summary

#### Support: 
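Indicates how frequently the antecedent and consequent items appear together, expressed as a fraction of all transactions. Support measures how common a combination is, not how strongly its items are associated. In the top rules, support ranges from 0.015998 to 0.017064, meaning each combination appears in roughly 1.6% to 1.7% of transactions, which is above the 1% minimum support used to generate the itemsets.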

#### Lift: 
Represents the strength of association between antecedent and consequent beyond what would be expected by chance. A lift greater than 1 implies a positive relationship between the items, and a lift of 1 implies independence. The lift values in the top rules range from approximately 2.91 to 3.29, meaning these combinations occur about three times as often as they would if the medications were prescribed independently.

#### Confidence: 
Reflects the probability of the consequent occurring given the antecedent. Higher confidence values indicate a higher likelihood of the consequent given the antecedent. The top rules have confidence ranging from 0.162822 to 0.323450, so the consequent appears in roughly 16% to 32% of the transactions that contain the antecedent. This is modest predictive power in absolute terms, although it is well above the baseline rate of each consequent.
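
The three metrics are linked by simple arithmetic. As an illustrative check, the sketch below recomputes confidence, lift, and leverage for the top rule, (methylprednisone) → (lisinopril), from the support values reported in the table in C4. The variable names are chosen here for readability and are not part of the original notebook.

```python
# Values copied from the top rule in C4: (methylprednisone) -> (lisinopril)
antecedent_support = 0.04946   # share of transactions with methylprednisone
consequent_support = 0.098254  # share of transactions with lisinopril
support = 0.015998             # share of transactions with both

# confidence = P(consequent | antecedent)
confidence = support / antecedent_support

# lift = confidence relative to the consequent's baseline rate
lift = confidence / consequent_support

# leverage = observed joint support minus the support expected under independence
leverage = support - antecedent_support * consequent_support

print(f"confidence={confidence:.4f} lift={lift:.4f} leverage={leverage:.6f}")
```

The recomputed values agree with the table up to the rounding of the published inputs.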

### D2: Practical Significance of Findings

#### Association Strength: 
The high lift values suggest robust associations between certain medications. For instance, the association between methylprednisone and lisinopril has a lift of about 3.29, indicating that patients prescribed one medication are roughly three times as likely to be prescribed the other as would be expected if the two were independent.

#### Treatment Insights: 
These associations can provide valuable insights into potential treatment pathways. For example, if patients are frequently prescribed both methylprednisone and lisinopril, it might suggest a common condition or an effective combined treatment strategy.

### D3: Course of Action

#### Treatment Bundling: 
Consider creating treatment bundles or standardized care pathways that incorporate these frequently associated medications. For instance, for patients prescribed methylprednisone, closely monitor and consider the potential need for lisinopril based on the association strength.

#### Physician Awareness: 
Educate physicians about these strong medication associations to assist in making more informed prescription decisions. This awareness can help in tailoring treatments effectively.

#### Patient Care Optimization: 
Tailor patient care plans based on the identified associations to provide more personalized and potentially more effective treatment strategies.

These actions based on the identified associations can contribute to optimized patient care, potentially reducing readmissions, improving outcomes, and enhancing cost-effectiveness for the hospital.

### E & E1: Panopto Video
The video has been recorded and is being uploaded to the designated folder.

### G: Sources
How to Perform Market Basket Analysis. (2022, November 4). 365 Data Science. https://365datascience.com/tutorials/python-tutorials/market-basket-analysis/

### F: Sources for Third Party Code
No third-party code was used (only DataCamp).