
# Market Basket Analysis | Ticket Sales Data

The purpose of this study is to analyze ticket sales data from a cultural institution that organizes festivals and one-time events throughout the year, and to use market basket analysis to establish association rules between festivals and their venues. These association rules can support direct marketing activities such as event recommendations on websites, personalized e-bulletins, and customized event suggestions in mobile applications. For instance, when an individual purchases a ticket for an event at a specific venue, a triggered email can recommend further events and venues based on the established rules. The study also aims to gain insight into audience behavior. Python is used for data cleansing, applying the algorithm, and presenting the results. Once established, the rules can be reused in future recommendation projects.

## Data
The data consists of ticket sales transactions from a cultural institution.

The transactional data includes detailed information for each ticket sales transaction, including:

- Customer ID
- Purchase date and time
- Performance name
- Performance venue

The festivals and their corresponding venues are as follows:

- Festival F: A film festival with seven venues (Venue names: F1, F2, F3, ..., F7)
- Festival M: A music festival with eleven venues (Venue names: M1, M2, M3, ..., M11)
- Festival C: A jazz festival with fifteen venues (Venue names: C1, C2, ..., C15)
- Event Group S (listed as `EventGroup S` in the data): Concerts and special events with two venues (Venue names: S1, S2)

The original customer IDs have been replaced with sequential numbers from 1 to 23,739 for the purpose of this study. In the original data, certain transactions lack customer IDs due to the ticket holders being listed as anonymous. These transactions have been excluded from the study, and only transactions with valid customer IDs are considered.

## Pre-Processing for Apriori Algorithm

During data cleaning, the data is simplified by excluding the following transactions:

- Returned-ticket transactions: it is assumed that no purchased tickets are returned. This assumption may not be entirely realistic, but given the limited timeframe of the available transactions, returned tickets cannot be identified accurately.
- Exclusive events within the festivals.
- Voucher transactions: only purchased tickets are considered.
- Anonymous transactions: transactions without a customer ID are omitted.

Once cleaning is complete, the dataset goes through the pre-processing steps below to prepare it for the Apriori algorithm.

In [None]:
!pip install apyori

#importing the required libraries
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from apyori import apriori





In [None]:
from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))

Saving CustVenue.csv to CustVenue.csv
User uploaded file "CustVenue.csv" with length 1554118 bytes


In [None]:
#Read File
myd = pd.read_csv('CustVenue.csv', sep=";")
myd.head()

   CustomerID VenueID   EventName
0           1      M6  Festival M
1           1      M1  Festival M
2           1      M6  Festival M
3           1     M11  Festival M
4           2      M6  Festival M


In [None]:
print(myd.VenueID.unique())
print(myd.CustomerID.unique())
print(myd.EventName.unique())

['M6' 'M1' 'M11' 'F1' 'F3' 'F2' 'M4' 'M7' 'F4' 'C8' 'C3' 'S1' 'C11' 'C5'
 'C13' 'C1' 'M8' 'M10' 'S2' 'F7' 'F5' 'C14' 'C4' 'C10' 'M2' 'C12' 'M9'
 'C6' 'M3' 'F6' 'M5' 'C9' 'C2' 'C15' 'C7']
[    1     2     3 ... 23737 23738 23739]
['Festival M' 'Festival F' 'Festival C' 'EventGroup S']


In [None]:
#changing the variable types to "category"

myd[['VenueID','CustomerID']] = myd[['VenueID','CustomerID']].astype('category')

In [None]:
#dropping the duplicates and EventName column

myd=myd.drop_duplicates()
myd1 = myd.drop("EventName", axis=1)

In [None]:
# Build nmyl for Apriori: one transaction per customer, holding the venues that customer bought tickets for

myl = myd1.values.tolist()
values = set(map(lambda x:x[0],myl))
nmyl = [[y[1] for y in myl if y[0] == x] for x in values]
len(nmyl)

23614
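
The list comprehension above rescans every row once per customer, so its cost grows with rows times customers. As a minimal sketch of a single-pass alternative, the grouping can be done with `collections.defaultdict`. The function name `build_transactions` and the toy rows are illustrative assumptions and not part of the original notebook.

```python
from collections import defaultdict

def build_transactions(rows):
    """Group (customer_id, venue_id) pairs into one venue list per customer."""
    baskets = defaultdict(list)
    for customer_id, venue_id in rows:
        baskets[customer_id].append(venue_id)
    return list(baskets.values())

# Toy rows shaped like myd1.values.tolist() (made-up values, not the real data)
toy_rows = [[1, "M6"], [1, "M1"], [2, "M6"], [3, "F1"], [3, "F2"]]
toy_transactions = build_transactions(toy_rows)
print(len(toy_transactions))
```

Apriori does not depend on the order of the transactions, so the different ordering compared with the `set`-based version should not affect the rules.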

In [None]:
#training Apriori algorithm

rules = apriori(nmyl, min_support=0.01, min_confidence=0.7, min_lift=3, min_length=2)

# The results
results = list(rules)
print(results)

[RelationRecord(items=frozenset({'F3', 'F1', 'F2'}), support=0.04772592529855171, ordered_statistics=[OrderedStatistic(items_base=frozenset({'F3', 'F2'}), items_add=frozenset({'F1'}), confidence=0.820830298616169, lift=3.587467457250086)]), RelationRecord(items=frozenset({'F1', 'F4', 'F2'}), support=0.023164224612518, ordered_statistics=[OrderedStatistic(items_base=frozenset({'F4', 'F2'}), items_add=frozenset({'F1'}), confidence=0.8794212218649518, lift=3.8435411314304964)]), RelationRecord(items=frozenset({'F1', 'F2', 'F5'}), support=0.027441348352672142, ordered_statistics=[OrderedStatistic(items_base=frozenset({'F2', 'F5'}), items_add=frozenset({'F1'}), confidence=0.7641509433962265, lift=3.3397483578305556)]), RelationRecord(items=frozenset({'F1', 'F6', 'F2'}), support=0.019818751588040993, ordered_statistics=[OrderedStatistic(items_base=frozenset({'F6', 'F2'}), items_add=frozenset({'F1'}), confidence=0.8524590163934426, lift=3.725701871759162)]), RelationRecord(items=frozenset({'F
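
To help read the `support`, `confidence`, and `lift` fields in the output above, here is a small sketch that computes the three measures by hand using their standard definitions. The helper names and the toy baskets are assumptions made for illustration only. The last lines use the relation lift = confidence / support(consequent) to back out the share of baskets containing F1 from the first printed rule, which comes to roughly 0.229.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def rule_metrics(transactions, base, add):
    """Return (support, confidence, lift) for the rule base -> add."""
    both = support(transactions, set(base) | set(add))
    confidence = both / support(transactions, base)
    lift = confidence / support(transactions, add)
    return both, confidence, lift

# Made-up baskets, only to illustrate the arithmetic
toy = [["F1", "F2"], ["F1", "F2", "F3"], ["F2", "F3"], ["F1"], ["M1"]]
rule_support, rule_confidence, rule_lift = rule_metrics(toy, base={"F2"}, add={"F1"})
print(rule_support, rule_confidence, rule_lift)

# Values copied from the printed rule {F3, F2} -> {F1}:
# confidence / lift gives the implied support of the consequent F1.
implied_f1_support = 0.820830298616169 / 3.587467457250086
print(round(implied_f1_support, 3))
```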

In [None]:
#Since the results are mostly related to Festival F, we exclude Festival F
myd2 = myd[myd.EventName != 'Festival F']
print(myd2.VenueID.unique())

['M6', 'M1', 'M11', 'M4', 'M7', ..., 'M5', 'C9', 'C2', 'C15', 'C7']
Length: 28
Categories (35, object): ['C1', 'C10', 'C11', 'C12', ..., 'M8', 'M9', 'S1', 'S2']


In [None]:
#dropping EventName column for myd2(without Festival F)
myd2 = myd2.drop("EventName", axis=1)

In [None]:
# Build nmyl2 for Apriori: one venue list per customer, Festival F excluded
myl2 = myd2.values.tolist()
values2 = set(map(lambda x:x[0], myl2))
nmyl2 = [[y[1] for y in myl2 if y[0]==x] for x in values2]
len(nmyl2)

14453

In [None]:
#training nmyl2 Apriori
rules2 = apriori(nmyl2, min_support=0.005, min_confidence=0.7, min_lift=3, min_length=2)

In [None]:
#the results
results2 = list(rules2)
print(results2)

[RelationRecord(items=frozenset({'M1', 'M7'}), support=0.011900643465024563, ordered_statistics=[OrderedStatistic(items_base=frozenset({'M7'}), items_add=frozenset({'M1'}), confidence=0.7078189300411523, lift=8.51090432269948)]), RelationRecord(items=frozenset({'M1', 'C11', 'M6'}), support=0.005604372794575521, ordered_statistics=[OrderedStatistic(items_base=frozenset({'M1', 'C11'}), items_add=frozenset({'M6'}), confidence=0.7297297297297297, lift=7.4168662333219295)]), RelationRecord(items=frozenset({'M1', 'M6', 'M4'}), support=0.011554694527087801, ordered_statistics=[OrderedStatistic(items_base=frozenset({'M1', 'M4'}), items_add=frozenset({'M6'}), confidence=0.8067632850241545, lift=8.19982402141639), OrderedStatistic(items_base=frozenset({'M4', 'M6'}), items_add=frozenset({'M1'}), confidence=0.7660550458715596, lift=9.211142743745134)]), RelationRecord(items=frozenset({'M1', 'M5', 'M6'}), support=0.0059503217325122815, ordered_statistics=[OrderedStatistic(items_base=frozenset({'M5'
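
The raw `print` output is hard to scan. One possible way to make it readable is to flatten each record into one row per rule and sort by lift. The sketch below uses `namedtuple` stand-ins whose field names mirror those visible in the printed output, so it runs without the library. `flatten_rules` is a hypothetical helper, and the two sample records are copied from the output above.

```python
from collections import namedtuple

# Stand-ins with the field names shown in the printed output (assumption:
# the real apyori records can be read through the same attribute names).
RelationRecord = namedtuple("RelationRecord", "items support ordered_statistics")
OrderedStatistic = namedtuple("OrderedStatistic", "items_base items_add confidence lift")

def flatten_rules(records):
    """Return one dict per rule, sorted by lift in descending order."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "base": sorted(stat.items_base),
                "add": sorted(stat.items_add),
                "support": record.support,
                "confidence": stat.confidence,
                "lift": stat.lift,
            })
    return sorted(rows, key=lambda r: r["lift"], reverse=True)

sample = [
    RelationRecord(frozenset({"M1", "M7"}), 0.011900643465024563, [
        OrderedStatistic(frozenset({"M7"}), frozenset({"M1"}),
                         0.7078189300411523, 8.51090432269948)]),
    RelationRecord(frozenset({"M1", "M6", "M4"}), 0.011554694527087801, [
        OrderedStatistic(frozenset({"M1", "M4"}), frozenset({"M6"}),
                         0.8067632850241545, 8.19982402141639),
        OrderedStatistic(frozenset({"M4", "M6"}), frozenset({"M1"}),
                         0.7660550458715596, 9.211142743745134)]),
]
rule_rows = flatten_rules(sample)
for row in rule_rows:
    print(row["base"], "->", row["add"],
          round(row["confidence"], 3), round(row["lift"], 2))
```

With the real results, `flatten_rules(results2)` would presumably work the same way, and the rows could be passed to `pd.DataFrame` for display.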

In [None]:
#Apriori Algorithm only with EventNames
myd3 = myd.drop("VenueID", axis=1)
myd3 = myd3.drop_duplicates()

In [None]:
# Build nmyl3 for Apriori: one transaction per customer, holding the event names that customer attended
myl3 = myd3.values.tolist()
values3 = set(map(lambda x:x[0], myl3))
nmyl3 = [[y[1] for y in myl3 if y[0]==x] for x in values3]
len(nmyl3)

23614

In [None]:
#training nmyl3 for Apriori Algorithm
rules3 = apriori(nmyl3, min_support = 0.001, min_confidence=0.1, min_lift=2, min_length=2)

In [None]:
# The Results
results3 = list(rules3)
print(results3)

[RelationRecord(items=frozenset({'Festival F', 'Festival M', 'Festival C'}), support=0.010502244431269585, ordered_statistics=[OrderedStatistic(items_base=frozenset({'Festival M'}), items_add=frozenset({'Festival F', 'Festival C'}), confidence=0.10205761316872428, lift=2.792570657434826), OrderedStatistic(items_base=frozenset({'Festival F', 'Festival C'}), items_add=frozenset({'Festival M'}), confidence=0.28736964078794897, lift=2.792570657434826)]), RelationRecord(items=frozenset({'Festival F', 'EventGroup S', 'Festival M', 'Festival C'}), support=0.001185737274498179, ordered_statistics=[OrderedStatistic(items_base=frozenset({'EventGroup S', 'Festival M'}), items_add=frozenset({'Festival F', 'Festival C'}), confidence=0.32941176470588235, lift=9.013591438893053), OrderedStatistic(items_base=frozenset({'Festival F', 'EventGroup S', 'Festival C'}), items_add=frozenset({'Festival M'}), confidence=0.21052631578947367, lift=2.045830625947585), OrderedStatistic(items_base=frozenset({'Festi

## Results

### Case 1

The associations are observed primarily among the Festival F venues. F1 occurs most frequently among them, which is consistent with its being one of the oldest, most central, and most popular Festival F locations. Although F5 and F7 are not close to the other Festival F venues, they still appear in the frequent itemsets and rules. Notably, some rules cross festival boundaries: venues C5, M1, and M6 appear in rules together with Festival F venues.


### Case 2


After excluding Festival F from the dataset, the rules mainly involve the Festival M venues. All Festival M venues except M11 are situated in the European part of Istanbul, and all of them appear in the rules; M11 appears in none. Additionally, C11 is present in a rule along with M1 and M6.

### Case 3


When the venue variable is dropped and only the event names are considered, the support, lift, and confidence values are generally lower than in the previous runs; this run uses relaxed thresholds (`min_support=0.001`, `min_confidence=0.1`, `min_lift=2`). The results reveal association rules among Festival F, Festival M, Festival C, and Event Group S.

These results provide valuable insights into the associations between festival venues and event names in different scenarios.
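
As an illustration of how such rules might feed the recommendation use case described in the introduction, the sketch below suggests venues whose rule antecedent is fully covered by what a customer has already purchased. The `recommend` function and its ranking by confidence are assumptions made for this example, not the study's method. The rules are taken from the Case 2 output with confidence rounded to three decimals.

```python
def recommend(purchased, rules, top_n=3):
    """Suggest venues from rules whose antecedent is a subset of purchased."""
    purchased = set(purchased)
    scores = {}
    for base, add, confidence in rules:
        if set(base) <= purchased:
            # Only suggest venues the customer has not already visited
            for venue in set(add) - purchased:
                scores[venue] = max(scores.get(venue, 0.0), confidence)
    # Highest confidence first; ties broken alphabetically
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [venue for venue, _ in ranked[:top_n]]

# Rules as (base, add, confidence)
rules = [
    ({"M7"}, {"M1"}, 0.708),
    ({"M1", "C11"}, {"M6"}, 0.730),
    ({"M1", "M4"}, {"M6"}, 0.807),
    ({"M4", "M6"}, {"M1"}, 0.766),
]
print(recommend({"M1", "M4"}, rules))
```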