# Create Next Best Offering to Drive Revenue and Loyalty

DQLab.id Fashion is a fashion store that sells products such as jeans, shirts, cosmetics, and more. Although the business is doing fairly well, the growing number of competitors and the many products still sitting in stock worry Mr. Agus, the manager of DQLab.id Fashion.

One solution is to create innovative packages, in which products that sell poorly on their own but still have a market are bundled and sold together.

As a data scientist, you are assigned to help Mr. Agus identify attractive product packages, with the goal of increasing both the profits of DQLab.id Fashion and the loyalty of its customers. To accomplish this, we'll use R and the Apriori algorithm from the arules package throughout this project.

Dataset from DQLAB Data Science Course

# DQLab.id Fashion Sales Transaction Dataset

Our data, transaksi_dqlab_retail.tsv, is a TSV (Tab-Separated Values) file that contains 3 months of transactions: 33,669 lines of data covering 3,450 transaction codes.
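To make the file layout concrete, here is a minimal sketch of how one-item-per-line rows roll up into baskets. The column names `kode_transaksi` and `nama_barang` and the transaction codes are hypothetical placeholders, not taken from the actual file; the item names are borrowed from the results later in this project.

```r
# Toy rows in "single" layout: one (transaction code, item) pair per line.
# Column names and transaction codes are hypothetical.
rows <- data.frame(
  kode_transaksi = c("T1", "T1", "T2", "T3", "T3", "T3"),
  nama_barang    = c("Shampo Biasa", "Serum Vitamin", "Shampo Biasa",
                     "Cover Koper", "Serum Vitamin", "Wedges Hitam"),
  stringsAsFactors = FALSE
)

# Group the item column by transaction code to get one basket per transaction.
baskets <- split(rows$nama_barang, rows$kode_transaksi)

n_rows    <- nrow(rows)       # number of data lines
n_baskets <- length(baskets)  # number of distinct transaction codes
print(baskets)
```

This is why the dataset has far more lines than transaction codes: each transaction spans several lines. In arules, `read.transactions()` with `format="single"` and `cols=c(1,2)` performs this grouping, using the first column as the transaction ID and the second as the item.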

# Project Completion Instructions

To complete this project we will go through 4 main steps:

1. Get the top 10 products sold

2. Get the bottom 10 products sold

3. Get a list of all product package combinations with strong associations

4. Get a list of all product package combinations with specific items

In [3]:
#install.packages("arules")

library(arules)
transaksi_tabular <- read.transactions(file="transaksi_dqlab_retail.tsv", format="single", sep="\t", cols=c(1,2), skip=1)
write(transaksi_tabular, file="test_project_retail_1.txt", sep=",")

# Getting the top 10 products sold

Create an R script that generates the list of the top 10 products sold and saves the result in the file top10_item_retail.txt.
Use the dataset transaksi_dqlab_retail.tsv when reading the data.

In [28]:
library(arules)
transaksi <- read.transactions(file="transaksi_dqlab_retail.tsv", format="single", sep="\t", cols=c(1,2), skip=1)
data_item <- itemFrequency(transaksi, type="absolute")
data_item <- sort(data_item, decreasing = TRUE)
data_item <- data_item[1:10]
data_item <- data.frame("Nama Produk"=names(data_item), "Jumlah"=data_item, row.names=NULL)
print(data_item)
write.csv(data_item, file="top10_item_retail.txt")

                 Nama.Produk Jumlah
1               Shampo Biasa   2075
2              Serum Vitamin   1685
3          Baju Batik Wanita   1312
4          Baju Kemeja Putih   1255
5       Celana Jogger Casual   1136
6                Cover Koper   1086
7         Sepatu Sandal Anak   1062
8  Tali Pinggang Gesper Pria   1003
9        Sepatu Sport merk Z    888
10              Wedges Hitam    849


# Getting the bottom 10 products sold

The next step is to generate the list of the 10 least-sold products from the same transaction dataset and save the result in the file bottom10_item_retail.txt.
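The only difference from the top 10 list is the sort direction. As a small illustration, assuming a hypothetical named vector of item counts (the item names and numbers below are made up):

```r
# Hypothetical item counts, named by product.
item_counts <- c("Item A" = 50, "Item B" = 5, "Item C" = 20, "Item D" = 1)

# Ascending sort puts the least-sold products first.
bottom2 <- head(sort(item_counts, decreasing = FALSE), n = 2)

# Descending sort puts the best sellers first.
top2 <- head(sort(item_counts, decreasing = TRUE), n = 2)

print(names(bottom2))
print(names(top2))
```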



In [25]:
library(arules)
transaksi <- read.transactions(file="transaksi_dqlab_retail.tsv", format="single", sep="\t", cols=c(1,2), skip=1)
data_item <- itemFrequency(transaksi, type="absolute")
# Sort ascending so the least-sold products come first
data_item <- sort(data_item, decreasing = FALSE)
data_item <- data_item[1:10]
data_item <- data.frame("Nama Produk"=names(data_item), "Jumlah"=data_item, row.names=NULL)
print(data_item)
write.csv(data_item, file="bottom10_item_retail.txt")


# Getting interesting product combinations

Once he is confident in your work, Mr. Agus would like you to send a file containing a list of the 10 most "interesting" product combination packages.

According to Mr. Agus, an interesting product combination has the following characteristics:

Its items have a close association or relationship, it consists of a minimum of 2 and a maximum of 3 items, it appears in at least 10 transactions, and it has a confidence level of at least 50 percent.


To find good product combinations, we will use one of the best-known unsupervised algorithms, the Apriori algorithm. Three important terms in the Apriori algorithm are:

1. Support
This says how popular an itemset is, measured as the proportion of transactions in which the itemset appears.

2. Confidence
This says how likely item Y is to be purchased when item X is purchased.

3. Lift
This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is.
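The three measures can be worked through by hand on a tiny example. The sketch below uses base R only, with made-up baskets and hypothetical items X, Y, and Z; the helper functions `support`, `confidence`, and `lift` are written here for illustration and are not part of arules.

```r
# Toy baskets with hypothetical items X, Y, Z.
baskets <- list(
  c("X", "Y"),
  c("X", "Y", "Z"),
  c("X"),
  c("Y", "Z"),
  c("Z")
)

# Support: share of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of X => Y: support of {X, Y} divided by support of {X}.
confidence <- function(lhs, rhs, baskets) {
  support(c(lhs, rhs), baskets) / support(lhs, baskets)
}

# Lift of X => Y: confidence divided by support of {Y}.
lift <- function(lhs, rhs, baskets) {
  confidence(lhs, rhs, baskets) / support(rhs, baskets)
}

supp_xy <- support(c("X", "Y"), baskets)   # 2 of 5 baskets
conf_xy <- confidence("X", "Y", baskets)   # 2 of the 3 baskets with X
lift_xy <- lift("X", "Y", baskets)
```

A lift above 1 means X and Y appear together more often than would be expected if they were bought independently; a lift below 1 means less often.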


Because Mr. Agus requires an interesting combination to appear at least 10 times across all transactions, the minimum support must be 10 / the number of transactions. He also requires a confidence of at least 50 percent, so we use a conf value of 0.5.
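As a quick check of the arithmetic, using the 3,450 transaction codes stated in the dataset description (the variable names are illustrative only):

```r
min_count      <- 10     # minimum number of transactions a combination must appear in
n_transactions <- 3450   # number of transaction codes in the dataset

# Relative support threshold passed to apriori() as `supp`.
min_support <- min_count / n_transactions
print(round(min_support, 9))
```

In the script, `length(transaksi_tabular)` supplies the number of transactions, so the threshold adapts if the dataset changes.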



In [27]:
library(arules)
nama_file <- "transaksi_dqlab_retail.tsv"
transaksi_tabular <- read.transactions(file=nama_file, format="single", sep="\t", cols=c(1,2), skip=1)
apriori_rules <- apriori(transaksi_tabular, parameter= list(supp=10/length(transaksi_tabular), conf=0.5, minlen=2, maxlen=3))
apriori_rules <- head(sort(apriori_rules, by='lift', decreasing = T),n=10)
inspect(apriori_rules)
write(apriori_rules, file="kombinasi_retail.txt")

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime     support minlen
        0.5    0.1    1 none FALSE            TRUE       5 0.002898551      2
 maxlen target  ext
      3  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 10 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[69 item(s), 3450 transaction(s)] done [0.00s].
sorting and recoding items ... [68 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3

"Mining stopped (maxlen reached). Only patterns up to a length of 3 returned!"

 done [0.01s].
writing ... [4637 rule(s)] done [0.00s].
creating S4 object  ... done [0.00s].
     lhs                             rhs                              support confidence    coverage     lift count
[1]  {Tas Makeup,                                                                                                  
      Tas Pinggang Wanita}        => {Baju Renang Anak Perempuan} 0.010434783  0.8780488 0.011884058 24.42958    36
[2]  {Tas Makeup,                                                                                                  
      Tas Travel}                 => {Baju Renang Anak Perempuan} 0.010144928  0.8139535 0.012463768 22.64629    35
[3]  {Tas Makeup,                                                                                                  
      Tas Ransel Mini}            => {Baju Renang Anak Perempuan} 0.011304348  0.7358491 0.015362319 20.47322    39
[4]  {Sunblock Cream,                                                                         

# Looking for Product Packages that can be paired with Slow-Moving Items

Slow-moving items are products that sell slowly. This becomes a problem when stock of these items keeps piling up.

Because these items are difficult to sell individually, we need to look for strong associations between them and other products, so that packaging them together makes them more attractive.

Mr. Agus shares this view and wants your help with two product items that he thinks are still heavily stocked and need partner items for packaging.

The two product items are "Tas Makeup" and "Baju Renang Pria Anak-anak". Mr. Agus asks for combinations that can be bundled with these two products.

For each product, take the 3 rules with the strongest association, so there are 6 rules in total. The requirements for a strong association are the same as those Mr. Agus stated earlier, except that the minimum confidence level is lowered to 0.1.
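The selection logic (filter by item, rank by lift, keep three, then combine) can be sketched in base R on a plain data frame. This assumes, as the script in this section does, that the slow-moving item sits on the right-hand side of the rule. The `lhs` labels, the lift values, and the helper `top_rules_for` are hypothetical and only illustrate the steps.

```r
# Hypothetical rules table; the lhs labels and lift values are made up.
rules <- data.frame(
  lhs  = c("{A}", "{B}", "{C}", "{D}", "{E}", "{F}", "{G}", "{H}"),
  rhs  = c("Tas Makeup", "Tas Makeup", "Tas Makeup", "Tas Makeup",
           "Baju Renang Pria Anak-anak", "Baju Renang Pria Anak-anak",
           "Baju Renang Pria Anak-anak", "Baju Renang Pria Anak-anak"),
  lift = c(1.2, 3.4, 2.1, 0.9, 5.0, 1.1, 2.2, 4.3),
  stringsAsFactors = FALSE
)

# Keep the n rules with the highest lift whose rhs is the given item.
top_rules_for <- function(rules, item, n = 3) {
  matching <- rules[rules$rhs == item, ]
  head(matching[order(matching$lift, decreasing = TRUE), ], n = n)
}

# Three rules per slow-moving item, six in total.
selected <- rbind(
  top_rules_for(rules, "Tas Makeup"),
  top_rules_for(rules, "Baju Renang Pria Anak-anak")
)
print(selected)
```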

Create an R script that generates this list and saves the result in the file kombinasi_retail_slow_moving.txt. To generate this file, the rules don't need to be converted into a data frame; they can be written directly using the following syntax.

In [11]:
library(arules)
transaksi_tabular <- read.transactions(file="transaksi_dqlab_retail.tsv", format = "single", sep= "\t", cols=c(1,2), skip=1)
mba2 <- apriori(transaksi_tabular, parameter = list(supp = 10/length(transaksi_tabular), confidence = 0.1, minlen = 2, maxlen = 3))

#subset tas makeup
mba_1 <- subset(mba2, rhs %in% "Tas Makeup")
mba_1 <- head(sort(mba_1, by="lift", decreasing=TRUE), n=3)

#subset baju renang pria anak-anak
mba_2 <- subset(mba2, rhs %in% "Baju Renang Pria Anak-anak")
mba_2 <- head(sort(mba_2, by="lift", decreasing=TRUE), n=3)

mba2 <- c(mba_1,mba_2)
write(mba2, file="kombinasi_retail_slow_moving.txt")

Apriori

Parameter specification:
 confidence minval smax arem  aval originalSupport maxtime     support minlen
        0.1    0.1    1 none FALSE            TRUE       5 0.002898551      2
 maxlen target  ext
      3  rules TRUE

Algorithmic control:
 filter tree heap memopt load sort verbose
    0.1 TRUE TRUE  FALSE TRUE    2    TRUE

Absolute minimum support count: 10 

set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[69 item(s), 3450 transaction(s)] done [0.00s].
sorting and recoding items ... [68 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 3

"Mining stopped (maxlen reached). Only patterns up to a length of 3 returned!"

 done [0.01s].
writing ... [39832 rule(s)] done [0.00s].
creating S4 object  ... done [0.01s].
