# D212 Task 3

## Part I: Research Question

### A.  Describe the purpose of this data mining report by doing the following:

#### 1.  Propose one question relevant to a real-world organizational situation that you will answer using market basket analysis.

Can we identify relationships between the items purchased together in our customer transaction data, so that we can recommend additional items to customers based on what they are currently purchasing?

#### 2.  Define one goal of the data analysis. Ensure that your goal is reasonable within the scope of the scenario and is represented in the available data.

The goal of this analysis is to identify relationships between the items customers purchase and to determine whether buying one item makes the purchase of another more likely. The results will be used to pair, market, and recommend items based on a customer's current purchases.

## Part II: Market Basket Justification

### B.  Explain the reasons for using market basket analysis by doing the following:

#### 1.  Explain how market basket analyzes the selected dataset. Include expected outcomes.

Market basket analysis examines how frequently items are purchased together in the same transaction in order to identify relationships between them.

The expected outcome of this analysis is a set of association rules, each described by:
* The **antecedent**, which is the item or items already in the transaction.
* The **consequent**, which is the item or items that tend to be purchased together with the antecedent.
* The **support**, which is the proportion of all transactions that contain both the antecedent and the consequent.
* The **confidence**, which is the ratio of the rule's support to the support of the antecedent, i.e. the proportion of transactions containing the antecedent that also contain the consequent.
* The **lift**, which is the ratio of the rule's confidence to the support of the consequent, i.e. how much more often the items occur together than would be expected if they were purchased independently.
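
As a rough illustration of how these three measures are computed, the sketch below counts them by hand for a single rule. The five baskets and the helper `support` are made up for this example and are not taken from the dataset or from mlxtend.

```python
# Five made-up baskets (hypothetical data, not from teleco_market_basket.csv)
baskets = [
    {"ink", "paper"},
    {"ink", "paper", "cable"},
    {"ink"},
    {"paper", "cable"},
    {"cable"},
]


def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


# Rule under test: {ink} -> {paper}
antecedent, consequent = {"ink"}, {"paper"}

rule_support = support(antecedent | consequent, baskets)  # 2 of 5 baskets
confidence = rule_support / support(antecedent, baskets)  # (2/5) / (3/5)
lift = confidence / support(consequent, baskets)          # (2/3) / (3/5)

print(f"support = {rule_support:.2f}")   # support = 0.40
print(f"confidence = {confidence:.2f}")  # confidence = 0.67
print(f"lift = {lift:.2f}")              # lift = 1.11
```

A lift slightly above 1, as here, means the two items appear together only a little more often than independent purchases would produce.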

#### 2.  Provide one example of transactions in the dataset.

The first transaction in the dataset is:

['Logitech M510 Wireless mouse',
 'HP 63 Ink',
 'HP 65 ink',
 'nonda USB C to USB Adapter',
 '10ft iPHone Charger Cable',
 'HP 902XL ink',
 'Creative Pebble 2.0 Speakers',
 'Cleaning Gel Universal Dust Cleaner',
 'Micro Center 32GB Memory card',
 'YUNSONG 3pack 6ft Nylon Lightning Cable',
 'TopMate C5 Laptop Cooler pad',
 'Apple USB-C Charger cable',
 'HyperX Cloud Stinger Headset',
 'TONOR USB Gaming Microphone',
 'Dust-Off Compressed Gas 2 pack',
 '3A USB Type C Cable 3 pack 6FT',
 'HOVAMP iPhone charger',
 'SanDisk Ultra 128GB card',
 'FEEL2NICE 5 pack 10ft Lighning cable',
 'FEIYOLD Blue light Blocking Glasses']

#### 3.  Summarize one assumption of market basket analysis.
 
Market basket analysis assumes that items purchased together in the same transaction are related, and that these co-occurrences happen often enough to be detected. If no items are purchased together frequently enough, no meaningful conclusions can be drawn from the analysis.

## Part III: Data Preparation and Analysis

### C.  Prepare and perform market basket analysis by doing the following:

#### 1.  Transform the dataset to make it suitable for market basket analysis. Include a copy of the cleaned dataset.

In [1]:
# import necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from mlxtend.preprocessing import TransactionEncoder

In [2]:
# import data from csv
df = pd.read_csv('teleco_market_basket.csv')

# set it so we can see all columns
pd.set_option('display.max_columns', None)

In [3]:
# collapse each row into a list of its items; stack() drops the empty (NaN) cells
trans_list = df.stack().groupby(level=0).apply(list).tolist()

In [4]:
# one-hot encode the transactions: one boolean column per distinct item
encoder = TransactionEncoder().fit(trans_list)
oh = encoder.transform(trans_list)
oh_df = pd.DataFrame(oh, columns=encoder.columns_)

In [11]:
# save the cleaned, one-hot encoded dataset
oh_df.to_csv('t3_data.csv')
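
To illustrate what the encoding step produces, here is a minimal pure-Python sketch of one-hot encoding. The three baskets are made up for illustration and the helper `one_hot_encode` is hypothetical; it only mirrors the idea behind `TransactionEncoder` and is not its implementation.

```python
# Three made-up transactions (hypothetical, for illustration only)
transactions = [
    ["HP 61 ink", "Apple Pencil"],
    ["Apple Pencil", "Dust-Off Compressed Gas 2 pack"],
    ["HP 61 ink"],
]


def one_hot_encode(transactions):
    """Return (columns, rows): one boolean per distinct item per transaction."""
    # every distinct item becomes a column, in sorted order
    columns = sorted({item for basket in transactions for item in basket})
    # each row marks which of those items the transaction contains
    rows = [[item in basket for item in columns] for basket in transactions]
    return columns, rows


columns, rows = one_hot_encode(transactions)
print(columns)
for row in rows:
    print(row)
```

Each row has the same width regardless of how many items the original transaction held, which is the shape a frequent-itemset search over boolean columns needs.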

#### 2.  Execute the code used to generate association rules with the Apriori algorithm. Provide screenshots that demonstrate the error-free functionality of the code.

In [5]:
from mlxtend.frequent_patterns import apriori, association_rules

# find every itemset that appears in at least 0.34% of transactions
freq_items = apriori(oh_df, min_support=.0034, use_colnames=True)

In [6]:
freq_items

| | support | itemsets |
|---|---|---|
| 0 | 0.009065 | (10ft iPHone Charger Cable) |
| 1 | 0.050527 | (10ft iPHone Charger Cable 2 Pack) |
| 2 | 0.005199 | (3 pack Nylon Braided Lightning Cable) |
| 3 | 0.042528 | (3A USB Type C Cable 3 pack 6FT) |
| 4 | 0.019064 | (5pack Nylon Braided USB C cables) |
| ... | ... | ... |
| 1218 | 0.003733 | (Dust-Off Compressed Gas 2 pack, Screen Mom Sc... |
| 1219 | 0.004399 | (Dust-Off Compressed Gas 2 pack, VIVO Dual LCD... |
| 1220 | 0.004533 | (Dust-Off Compressed Gas 2 pack, VIVO Dual LCD... |
| 1221 | 0.004399 | (Dust-Off Compressed Gas 2 pack, VIVO Dual LCD... |


In [7]:
# build rules from the frequent itemsets, keeping those with confidence of at least 0.1
rules = association_rules(freq_items, metric='confidence', min_threshold=0.1)

#### 3.  Provide values for the support, lift, and confidence of the association rules table.

In [235]:
rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 0 | (10ft iPHone Charger Cable 2 Pack) | (Anker USB C to HDMI Adapter) | 0.050527 | 0.068391 | 0.006932 | 0.137203 | 2.006162 | 0.003477 | 1.079755 |
| 1 | (Anker USB C to HDMI Adapter) | (10ft iPHone Charger Cable 2 Pack) | 0.068391 | 0.050527 | 0.006932 | 0.101365 | 2.006162 | 0.003477 | 1.056572 |
| 2 | (10ft iPHone Charger Cable 2 Pack) | (Apple Lightning to Digital AV Adapter) | 0.050527 | 0.087188 | 0.006266 | 0.124011 | 1.422329 | 0.001860 | 1.042035 |
| 3 | (10ft iPHone Charger Cable 2 Pack) | (Apple Pencil) | 0.050527 | 0.179709 | 0.009065 | 0.179420 | 0.998387 | -0.000015 | 0.999647 |
| 4 | (10ft iPHone Charger Cable 2 Pack) | (Apple USB-C Charger cable) | 0.050527 | 0.132116 | 0.007066 | 0.139842 | 1.058479 | 0.000390 | 1.008982 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2038 | (HP 61 ink, VIVO Dual LCD Monitor Desk mount, ... | (Nylon Braided Lightning to USB cable) | 0.010932 | 0.095321 | 0.003466 | 0.317073 | 3.326386 | 0.002424 | 1.324709 |
| 2039 | (Nylon Braided Lightning to USB cable, HP 61 ink) | (VIVO Dual LCD Monitor Desk mount, Screen Mom ... | 0.022930 | 0.035462 | 0.003466 | 0.151163 | 4.262677 | 0.002653 | 1.136305 |
| 2040 | (Nylon Braided Lightning to USB cable, VIVO Du... | (HP 61 ink, Screen Mom Screen Cleaner kit) | 0.027863 | 0.032129 | 0.003466 | 0.124402 | 3.871945 | 0.002571 | 1.105383 |
| 2041 | (Nylon Braided Lightning to USB cable, Screen ... | (HP 61 ink, VIVO Dual LCD Monitor Desk mount) | 0.023597 | 0.039195 | 0.003466 | 0.146893 | 3.747761 | 0.002541 | 1.126242 |


#### 4.  Identify the top three rules generated by the Apriori algorithm. Include a screenshot of the top rules along with their summaries.

##### Support

In [8]:
rules.sort_values('support', ascending=False)[0:3]

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 361 | (Dust-Off Compressed Gas 2 pack) | (VIVO Dual LCD Monitor Desk mount) | 0.238368 | 0.17411 | 0.059725 | 0.250559 | 1.439085 | 0.018223 | 1.102008 |
| 362 | (VIVO Dual LCD Monitor Desk mount) | (Dust-Off Compressed Gas 2 pack) | 0.17411 | 0.238368 | 0.059725 | 0.343032 | 1.439085 | 0.018223 | 1.159314 |
| 312 | (HP 61 ink) | (Dust-Off Compressed Gas 2 pack) | 0.163845 | 0.238368 | 0.05266 | 0.3214 | 1.348332 | 0.013604 | 1.122357 |


##### Confidence

In [9]:
rules.sort_values('confidence', ascending=False)[0:3]

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 2005 | (Screen Mom Screen Cleaner kit, SanDisk Ultra ... | (Dust-Off Compressed Gas 2 pack) | 0.005733 | 0.238368 | 0.003733 | 0.651163 | 2.731752 | 0.002366 | 2.183344 |
| 752 | (10ft iPHone Charger Cable 2 Pack, Nylon Braid... | (Dust-Off Compressed Gas 2 pack) | 0.007999 | 0.238368 | 0.005066 | 0.633333 | 2.656954 | 0.003159 | 2.077178 |
| 764 | (10ft iPHone Charger Cable 2 Pack, Stylus Pen ... | (Dust-Off Compressed Gas 2 pack) | 0.006799 | 0.238368 | 0.004266 | 0.627451 | 2.632276 | 0.002645 | 2.04438 |


##### Lift

In [10]:
rules.sort_values('lift', ascending=False)[0:3]

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 810 | (Dust-Off Compressed Gas 2 pack, Anker 2-in-1 ... | (FEIYOLD Blue light Blocking Glasses) | 0.009599 | 0.065858 | 0.003866 | 0.402778 | 6.115863 | 0.003234 | 1.564145 |
| 139 | (Apple Lightning to USB cable) | (Falcon Dust Off Compressed Gas) | 0.015598 | 0.059992 | 0.004533 | 0.290598 | 4.843951 | 0.003597 | 1.325072 |
| 813 | (Anker 2-in-1 USB Card Reader) | (Dust-Off Compressed Gas 2 pack, FEIYOLD Blue ... | 0.029463 | 0.027596 | 0.003866 | 0.131222 | 4.755044 | 0.003053 | 1.119277 |


## Part IV: Data Summary and Implications

### D.  Summarize your data analysis by doing the following:

#### 1.  Summarize the significance of support, lift, and confidence from the results of the analysis.

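* The **support** is the proportion of transactions that contain an item or set of items, i.e. the probability of finding them together in the dataset. It shows how often a rule applies and therefore how much evidence stands behind it. The highest-support rule, from (Dust-Off Compressed Gas 2 pack) to (VIVO Dual LCD Monitor Desk mount), has a support of 0.059725, so the two items appear together in about 6% of transactions.
* The **confidence** is the ratio of the rule's support to the support of the antecedent. This is the probability of the consequent, given the antecedent, and it shows how often the rule holds when the antecedent is present. The highest-confidence rule has a confidence of 0.651163, so about 65% of transactions containing its antecedent also contain Dust-Off Compressed Gas 2 pack.
* The **lift** is the ratio of the rule's confidence to the support of the consequent, or the ratio of the observed to the expected confidence if the items were purchased independently. It tells us how many times more likely the consequent is to be purchased when the antecedent is purchased. A lift close to 1 is what independent purchases would produce, while a lift well above 1 indicates an association that is unlikely to be coincidental. The highest-lift rule has a lift of 6.115863, so its consequent is about six times as likely to be purchased when its antecedent is present.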
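
As a quick arithmetic check, the metrics reported for the top-support rule can be recomputed from its three support values. The sketch below uses the standard formulas for confidence, lift, leverage and conviction. The inputs are copied from the results table, so the outputs agree with the table only up to rounding, and the variable names are chosen here for illustration.

```python
# Values copied from the results table for the rule
# (Dust-Off Compressed Gas 2 pack) -> (VIVO Dual LCD Monitor Desk mount)
antecedent_support = 0.238368
consequent_support = 0.17411
rule_support = 0.059725

# P(consequent | antecedent)
confidence = rule_support / antecedent_support
# observed confidence relative to the consequent's baseline frequency
lift = confidence / consequent_support
# observed co-occurrence minus the co-occurrence expected under independence
leverage = rule_support - antecedent_support * consequent_support
# how much more often the rule would fail if the items were independent
conviction = (1 - consequent_support) / (1 - confidence)

print(f"confidence = {confidence:.4f}")  # table: 0.250559
print(f"lift       = {lift:.4f}")        # table: 1.439085
print(f"leverage   = {leverage:.4f}")    # table: 0.018223
print(f"conviction = {conviction:.4f}")  # table: 1.102008
```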

#### 2.  Discuss the practical significance of the findings from the analysis.

This analysis can be used to decide which items to recommend based on the other items being purchased. In practice, it could drive recommender software that looks at what a customer has bought before, or what is currently in their cart, and makes recommendations at checkout or through targeted marketing. For example, the top rule by lift has the antecedents 'Anker 2-in-1 USB Card Reader' and 'Dust-Off Compressed Gas 2 pack' and the consequent 'FEIYOLD Blue light Blocking Glasses'. With a lift of about 6.1 and a confidence of about 0.40, customers who buy those two items are far more likely than average to also buy the glasses, so it would make sense to recommend the glasses at checkout. The rule's support is only 0.003866, however, so it applies to a small share of transactions.

#### 3.  Recommend a course of action for the real-world organizational situation from part A1 based on your results from part D1.

To use the results of this analysis, the organization should build an automated recommendation system that uses the relationships between items customers usually buy together to bring those items into the customer's view. This could be done at online checkout, or through a related-items section on the product page of an antecedent item that lists its most common consequent purchases.
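
One possible shape for such a system is sketched below. It is only an illustration: the function `recommend`, its parameters, and the choice to rank by lift are assumptions made here, and the three hard-coded rules are copied from the results above rather than loaded from the full `rules` table.

```python
# Three rules copied from the results: (antecedents, consequents, lift)
RULES = [
    (frozenset({"Dust-Off Compressed Gas 2 pack", "Anker 2-in-1 USB Card Reader"}),
     frozenset({"FEIYOLD Blue light Blocking Glasses"}), 6.115863),
    (frozenset({"Apple Lightning to USB cable"}),
     frozenset({"Falcon Dust Off Compressed Gas"}), 4.843951),
    (frozenset({"Dust-Off Compressed Gas 2 pack"}),
     frozenset({"VIVO Dual LCD Monitor Desk mount"}), 1.439085),
]


def recommend(cart, rules=RULES, min_lift=1.0, top_n=3):
    """Suggest items not in the cart, using rules whose antecedents the cart covers."""
    cart = set(cart)
    best = {}  # item -> highest lift of any rule that suggests it
    for antecedents, consequents, lift in rules:
        # skip weak rules and rules the cart does not fully satisfy
        if lift <= min_lift or not antecedents <= cart:
            continue
        for item in consequents - cart:
            if lift > best.get(item, 0.0):
                best[item] = lift
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]


cart = ["Dust-Off Compressed Gas 2 pack", "Anker 2-in-1 USB Card Reader"]
print(recommend(cart))
# ['FEIYOLD Blue light Blocking Glasses', 'VIVO Dual LCD Monitor Desk mount']
```

Ranking by lift favors items that are unusually likely given the cart. A production system might also weigh confidence or support, which this sketch ignores.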

## Part V: Attachments

### E.  Provide a Panopto video recording that includes a demonstration of the functionality of the code used for the analysis and a summary of the programming environment.

https://wgu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=b35d5520-20a6-4dbb-acc3-ade400339e7d
 
### F.  Record all web sources used to acquire data or segments of third-party code to support the application. Ensure the web sources are reliable.

N/A

### G.  Acknowledge sources, using in-text citations and references, for content that is quoted, paraphrased, or summarized.

N/A