# Introduction
The Apriori algorithm identifies frequent individual items, discovers patterns between variables, and learns association rules from the data.
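
The outputs later in this notebook report support, confidence, and lift for each itemset. As a rough illustration of what those numbers mean, the sketch below computes them by hand on a few toy transactions. The data and the helper names (`support`, `confidence`, `lift`) are assumptions made for this example and are not part of apyori.

```python
# Toy transactions (hypothetical data, not taken from the input file).
transactions = [
    {"Restaurants", "Italian"},
    {"Restaurants", "Sandwiches"},
    {"Food", "Coffee & Tea"},
    {"Restaurants", "Italian", "Pizza"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(base, add, transactions):
    """Support of base and add together, divided by the support of base."""
    return support(set(base) | set(add), transactions) / support(base, transactions)

def lift(base, add, transactions):
    """Confidence of the rule divided by the support of the added items."""
    return confidence(base, add, transactions) / support(add, transactions)

print(support({"Restaurants"}, transactions))                  # 0.75
print(confidence({"Italian"}, {"Restaurants"}, transactions))  # 1.0
print(lift(set(), {"Restaurants"}, transactions))              # 1.0
```

With an empty base the confidence equals the support and the lift is 1.0, which is the pattern visible in the single-item records shown further down.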

In [1]:
!pip install apyori



# Data
The provided input file ("categories.txt") contains the category lists of 77,185 places in the US. Each line is the category list of one place: a number of category instances (e.g., hotels, restaurants) separated by semicolons. An example line is provided below:

Local Services;IT Services & Computer Repair 

In the example above, the corresponding place has two category instances: 

"Local Services" and "IT Services & Computer Repair".

In [2]:
import pandas as pd

data = pd.read_csv('categories.csv')
data.head()

Unnamed: 0,Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8
0,Breakfast & Brunch,American (Traditional),Restaurants,,,,,
1,Sandwiches,Restaurants,,,,,,
2,Local Services,IT Services & Computer Repair,,,,,,
3,Restaurants,Italian,,,,,,
4,Food,Coffee & Tea,,,,,,
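
The semicolon-separated line format described above can also be split by hand. The following is a minimal sketch, assuming one place per line; `parse_category_line` is a hypothetical helper introduced here for illustration and is not used elsewhere in the notebook.

```python
def parse_category_line(line):
    """Split one semicolon-separated line into a list of category names."""
    # strip() removes the trailing newline and stray spaces around each name;
    # empty fragments (e.g. from a blank line) are skipped.
    return [part.strip() for part in line.split(";") if part.strip()]

example = "Local Services;IT Services & Computer Repair \n"
print(parse_category_line(example))
# ['Local Services', 'IT Services & Computer Repair']
```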


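# Data Transformation
The Apriori implementation used here cannot take a dataframe as input. The data must first be transformed into a list of lists, one list per place, with the NaN values removed. `stack()` drops the NaN padding, and grouping by the original row index collects each place's categories back into a single list.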

In [3]:
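# stack() drops NaN cells; grouping by the original row index (level 0)
# rebuilds one list of categories per place.
main_list = data.stack().groupby(level=0).apply(list).tolist()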
main_list[:5]

[['Breakfast & Brunch', 'American (Traditional)', 'Restaurants'],
 ['Sandwiches', 'Restaurants'],
 ['Local Services', 'IT Services & Computer Repair'],
 ['Restaurants', 'Italian'],
 ['Food', 'Coffee & Tea']]
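
To make the goal of this transformation concrete without pandas, the sketch below removes missing values from padded rows in plain Python. It illustrates the target shape only and is not how the notebook produces `main_list`; `drop_missing` and the sample rows are assumptions made for this example.

```python
import math

def drop_missing(row):
    """Keep only real values from one row, skipping None and float NaN."""
    return [
        value for value in row
        if value is not None
        and not (isinstance(value, float) and math.isnan(value))
    ]

nan = float("nan")
rows = [
    ["Sandwiches", "Restaurants", nan],   # padded with NaN, as in the dataframe
    ["Food", "Coffee & Tea", None],
]
transactions = [drop_missing(row) for row in rows]
print(transactions)
# [['Sandwiches', 'Restaurants'], ['Food', 'Coffee & Tea']]
```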

# Frequent Single Item Mining
Frequent single categories are mined at a minimum relative support of 0.01. Setting `max_length=1` restricts the result to itemsets that contain exactly one category.

In [4]:
from apyori import apriori

rules1 = apriori(main_list, min_support=0.01, max_length=1)
results1 = list(rules1)

results1[:5]

[RelationRecord(items=frozenset({'Active Life'}), support=0.04020211180928937, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Active Life'}), confidence=0.04020211180928937, lift=1.0)]),
 RelationRecord(items=frozenset({'American (New)'}), support=0.02063872514089525, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'American (New)'}), confidence=0.02063872514089525, lift=1.0)]),
 RelationRecord(items=frozenset({'American (Traditional)'}), support=0.03130141866943059, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'American (Traditional)'}), confidence=0.03130141866943059, lift=1.0)]),
 RelationRecord(items=frozenset({'Arts & Entertainment'}), support=0.02942281531385632, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Arts & Entertainment'}), confidence=0.02942281531385632, lift=1.0)]),
 RelationRecord(items=frozenset({'Auto Repair'}), support=0.02
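
For single items, the relative support is simply the share of transactions that contain the item. A minimal cross-check in plain Python might look like the sketch below; `frequent_single_items` and the toy transactions are hypothetical and only illustrate the calculation, they are not apyori's implementation.

```python
from collections import Counter

def frequent_single_items(transactions, min_support):
    """Return {item: relative support} for items meeting min_support."""
    n = len(transactions)
    # set(t) ensures an item repeated within one transaction counts once.
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.items() if c / n >= min_support}

toy = [
    ["Restaurants", "Italian"],
    ["Restaurants"],
    ["Food"],
    ["Restaurants", "Food"],
]
print(frequent_single_items(toy, 0.5))
# {'Restaurants': 0.75, 'Food': 0.5}
```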

The apriori function reports relative support. Multiplying it by the number of category lists (77,185) gives the absolute support.

In [5]:
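part1 = []

for record in results1:
    # Relative support times the number of category lists gives the count;
    # round() avoids losing one to floating-point error, which int() would truncate.
    abs_support = round(record.support * 77185)
    # With max_length=1 every record holds exactly one item.
    part1.append(str(abs_support) + ':' + next(iter(record.items)))

part1[:5]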

['3103:Active Life',
 '1593:American (New)',
 '2416:American (Traditional)',
 '2271:Arts & Entertainment',
 '1716:Auto Repair']
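
Converting a relative support back to a count multiplies two floating-point numbers, so the way the product is turned into an integer matters. The sketch below uses hypothetical numbers, not values from this dataset, to show that truncating with `int()` can undercount by one where `round()` recovers the intended count.

```python
n_transactions = 100       # hypothetical total, chosen to expose the issue
relative_support = 0.29    # hypothetical relative support

product = relative_support * n_transactions
print(product)         # 28.999999999999996
print(int(product))    # 28
print(round(product))  # 29
```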

# Frequent Itemset Mining
Frequent itemsets of any size are mined at the same minimum relative support of 0.01. Without `max_length`, the result includes single categories as well as combinations of categories that occur together.

In [6]:
rules2 = apriori(main_list, min_support=0.01)
results2 = list(rules2)

results2[:5]

[RelationRecord(items=frozenset({'Active Life'}), support=0.04020211180928937, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Active Life'}), confidence=0.04020211180928937, lift=1.0)]),
 RelationRecord(items=frozenset({'American (New)'}), support=0.02063872514089525, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'American (New)'}), confidence=0.02063872514089525, lift=1.0)]),
 RelationRecord(items=frozenset({'American (Traditional)'}), support=0.03130141866943059, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'American (Traditional)'}), confidence=0.03130141866943059, lift=1.0)]),
 RelationRecord(items=frozenset({'Arts & Entertainment'}), support=0.02942281531385632, ordered_statistics=[OrderedStatistic(items_base=frozenset(), items_add=frozenset({'Arts & Entertainment'}), confidence=0.02942281531385632, lift=1.0)]),
 RelationRecord(items=frozenset({'Auto Repair'}), support=0.02

In [7]:
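part2 = []

for record in results2:
    # A record with several items carries one ordered statistic per rule,
    # so the same line is appended more than once; duplicates are removed below.
    for _ in record.ordered_statistics:
        items_str = ';'.join(record.items)
        # round() avoids losing one to floating-point error.
        abs_support = round(record.support * 77185)
        part2.append(str(abs_support) + ':' + items_str)

# Remove the duplicate lines (this also discards the original ordering).
part2 = list(set(part2))
part2[:5]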

['874:Pubs',
 '1442:Fitness & Instruction',
 '2415:American (Traditional);Restaurants',
 '823:General Dentistry;Health & Medical',
 '2421:Nightlife;Bars;Restaurants']
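
To show how frequent itemsets of growing size can be found, here is a compact level-wise sketch of the Apriori idea in plain Python. It is an illustration under simplifying assumptions (small in-memory data, no rule generation) and is not the apyori implementation; `apriori_sketch` and the toy data are hypothetical.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: relative support} for all frequent itemsets."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(candidate):
        return sum(candidate <= s for s in sets) / n

    # Level 1 candidates: every distinct item on its own.
    current = {frozenset([item]) for s in sets for item in s}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent

toy = [
    ["Bars", "Nightlife"],
    ["Bars", "Nightlife", "Restaurants"],
    ["Restaurants"],
    ["Bars", "Nightlife"],
]
result = apriori_sketch(toy, 0.5)
print(result[frozenset({"Bars", "Nightlife"})])  # 0.75
```

The prune step is what keeps the search tractable: a combination is only counted if all of its smaller subsets already passed the support threshold.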

# Saving Files

In [0]:
with open('patterns1.txt', 'w') as f:
    for i in part1:
        f.write("%s\n" % i)

with open('patterns2.txt', 'w') as f:
    for i in part2:
        f.write("%s\n" % i)
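
Each saved line has the form `support:item1;item2`. If the files need to be read back later, a line can be split as in the sketch below; `parse_pattern_line` is a hypothetical helper, and it assumes that the first colon on the line always separates the count from the items.

```python
def parse_pattern_line(line):
    """Split 'support:item1;item2' into (support, [item1, item2])."""
    # partition() splits on the first colon only, so a colon inside a
    # category name would stay with the items.
    support_text, _, items_text = line.rstrip("\n").partition(":")
    return int(support_text), items_text.split(";")

print(parse_pattern_line("2415:American (Traditional);Restaurants\n"))
# (2415, ['American (Traditional)', 'Restaurants'])
```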