# Association analysis
## Data analysis techniques and methods
### Jovan Juric 1206

Dataset downloaded from: https://www.kaggle.com/akalyasubramanian/dataset-for-apriori-algorithm#__sid=js0


In [195]:
from csv import reader

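# Read the transactions from a CSV file (one transaction per row).
# Returns the list of transactions and the set of all distinct items,
# each wrapped in a frozenset so it can serve as a 1-item candidate.
def read_csv(file_path):
    data = []
    item_set = set()
    with open(file_path, 'r', newline='') as file:
        csv_reader = reader(file)
        for items in csv_reader:
            data.append(items)
            for item in set(items):
                item_set.add(frozenset([item]))
    return data, item_set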

In [196]:
data,item_set = read_csv('store_data.csv')

This function computes the *support* of each itemset, i.e. the fraction of transactions that contain it.

In [197]:
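def calc_support(data, item_set, item_supp_dict):
    # support(X) = (number of transactions containing X) / (number of transactions)
    supp = []
    N = len(data)
    for item in item_set:
        counter = 0
        for item_list in data:
            if item.issubset(item_list):
                counter += 1
        supp_val = counter / N
        supp.append(supp_val)            # same order as iteration over item_set
        item_supp_dict[item] = supp_val  # remembered for the confidence step
    return supp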
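
As a quick sanity check of the definition, here is a minimal sketch on a hypothetical four-basket dataset. The `toy_data` baskets and the helper name `support_of` are made up for illustration and are not part of the notebook.

```python
# Hypothetical toy transactions (not taken from store_data.csv)
toy_data = [
    ['milk', 'bread'],
    ['milk', 'eggs'],
    ['bread', 'eggs', 'milk'],
    ['eggs'],
]

def support_of(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)

# 'milk' appears in 3 of 4 baskets, 'milk' together with 'bread' in 2 of 4
milk_support = support_of(frozenset(['milk']), toy_data)
milk_bread_support = support_of(frozenset(['milk', 'bread']), toy_data)
print(milk_support, milk_bread_support)
```

Adding an item to a set can never raise its support, which is the property the later pruning step relies on.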

This function filters the itemsets by their *support* value, keeping only those above the minimum.

In [None]:
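def filter_above_min_support(item_set, support, min_val):
    # support[i] must belong to the i-th element yielded by item_set,
    # i.e. the list returned by calc_support for this same collection
    new_item_set = set()
    for item, supp_val in zip(item_set, support):
        if supp_val > min_val:
            new_item_set.add(item)
    return new_item_set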
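
The function above matches each support value to an itemset by position, so it depends on the collection being iterated in the same order as in `calc_support`. A possible alternative, sketched here under the assumption that supports are already stored in a dictionary keyed by itemset (as `item_supp_dict` is), looks the value up directly. The name `filter_by_dict` and the toy values are hypothetical.

```python
def filter_by_dict(item_set, supp_dict, min_val):
    """Keep the itemsets whose recorded support is strictly above min_val."""
    return {item for item in item_set if supp_dict[item] > min_val}

# Hypothetical supports for three single-item sets
toy_supports = {
    frozenset(['milk']): 0.75,
    frozenset(['bread']): 0.50,
    frozenset(['caviar']): 0.005,
}
frequent = filter_by_dict(toy_supports.keys(), toy_supports, 0.01)
print(frequent)
```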

This function forms new candidates by adding one element to the existing itemsets.

In [199]:
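def add_item_to_sets(item_set, k):
    # Union every pair of current itemsets and keep the unions of size k;
    # the set comprehension drops the duplicates produced by symmetric pairs
    new_item_set = list({s1.union(s2) for s1 in item_set for s2 in item_set if len(s1.union(s2)) == k})
    return new_item_set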
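
A small worked example of this join step on hypothetical data may help. The helper name `join_step` and the three items are illustrative only.

```python
# Hypothetical frequent 1-itemsets
singles = {frozenset(['milk']), frozenset(['bread']), frozenset(['eggs'])}

def join_step(item_set, k):
    """Union every pair of sets and keep the unions that have exactly k items."""
    return {s1 | s2 for s1 in item_set for s2 in item_set if len(s1 | s2) == k}

# Three single items give three distinct 2-item candidates
pairs = join_step(singles, 2)
print(sorted(sorted(p) for p in pairs))
```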

The *prune* function discards candidate itemsets that cannot be frequent, i.e. those with a subset of the previous size that is not frequent.

The *powerset* function returns, for a given set, all of its non-empty proper subsets.

In [200]:
from itertools import combinations, chain

def prune(prev_item_set, curr_item_set, k):
    # Keep a candidate only if every k-element subset of it was frequent in
    # the previous pass. Building a new list avoids removing elements from
    # the collection that is being iterated, which would skip candidates.
    return [item for item in curr_item_set
            if all(frozenset(s) in prev_item_set for s in combinations(item, k))]

def powerset(iterable):
    "powerset([1,2,3]) --> (1,) (2,) (3,) (1,2) (1,3) (2,3)"
    # The empty set and the full set are excluded: range(1, len(s))
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s)))
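
To illustrate both helpers, here is a minimal sketch on hypothetical data. The names `has_only_frequent_subsets` and `proper_subsets` are made up for this example and mirror the pruning test and the subset enumeration, respectively.

```python
from itertools import chain, combinations

def has_only_frequent_subsets(candidate, prev_frequent, k):
    """True if every k-element subset of `candidate` is in `prev_frequent`."""
    return all(frozenset(s) in prev_frequent for s in combinations(candidate, k))

# Hypothetical frequent 2-itemsets: {bread, eggs} is missing on purpose
frequent_pairs = {frozenset(['milk', 'bread']), frozenset(['milk', 'eggs'])}
candidate = frozenset(['milk', 'bread', 'eggs'])
keep = has_only_frequent_subsets(candidate, frequent_pairs, 2)

def proper_subsets(iterable):
    """Non-empty proper subsets, i.e. sizes 1 .. len(s) - 1."""
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(1, len(s)))

subsets = sorted(proper_subsets(['a', 'b', 'c']))
print(keep, subsets)
```

The candidate is rejected because one of its 2-item subsets is not frequent, so its support never has to be counted.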

The *conclusions* function derives association rules from the computed *support* values. For every frequent itemset it computes the *confidence* of the itemset relative to each of its subsets, and the rules whose confidence exceeds the threshold are kept as conclusions.

In [201]:
def conclusions(item_set_dict, item_supp_dict, min_val):
    # Returns (confidence, antecedent, consequent) tuples for the rules
    # antecedent -> consequent whose confidence is above min_val
    deduction_list = []
    for item_set in item_set_dict.values():
        for item in item_set:
            for subset in powerset(item):
                # confidence = support(item) / support(subset)
                confidence = item_supp_dict[item] / item_supp_dict[frozenset(subset)]
                if confidence > min_val:
                    deduction_list.append((confidence, set(subset), set(item.difference(subset))))
    return deduction_list
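
The confidence of a rule A -> B is support(A union B) / support(A). The sketch below works this out for hypothetical support values; the `confidence` helper and the numbers are assumptions for illustration, not results from the dataset.

```python
# Hypothetical support values (not computed from store_data.csv)
toy_supp = {
    frozenset(['milk']): 0.75,
    frozenset(['bread']): 0.50,
    frozenset(['milk', 'bread']): 0.50,
}

def confidence(antecedent, consequent, supp):
    """confidence(A -> B) = support(A union B) / support(A)."""
    return supp[antecedent | consequent] / supp[antecedent]

milk_to_bread = confidence(frozenset(['milk']), frozenset(['bread']), toy_supp)
bread_to_milk = confidence(frozenset(['bread']), frozenset(['milk']), toy_supp)
print(milk_to_bread, bread_to_milk)
```

The two directions of the same itemset generally give different confidences, which is why every subset is tried as an antecedent.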
            

In [202]:
item_set_dict = dict()   # k -> frequent itemsets of size k
item_supp_dict = dict()  # itemset -> support
# Pass 1: frequent single items
support = calc_support(data, item_set, item_supp_dict)
item_set = filter_above_min_support(item_set, support, 0.01)
k = 2
# Grow the itemsets one element at a time until no frequent set remains
while item_set:
    item_set_dict[k-1] = item_set
    new_item_set = add_item_to_sets(item_set, k)
    new_item_set = prune(item_set, new_item_set, k-1)
    support = calc_support(data, new_item_set, item_supp_dict)
    item_set = filter_above_min_support(new_item_set, support, 0.01)
    k += 1



Below are the results of the *apriori* algorithm, as (confidence, antecedent, consequent) tuples sorted by decreasing confidence. Mineral water is bought very often, so it may be worth excluding it from the calculation.
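
One way to try such an exclusion, sketched here as an assumption rather than something the notebook does, is to strip the item from every basket before the first support pass. The helper name `drop_item` and the toy baskets are hypothetical.

```python
def drop_item(transactions, item_to_drop):
    """Return a copy of the transactions with `item_to_drop` removed from each basket."""
    return [[item for item in basket if item != item_to_drop] for basket in transactions]

# Hypothetical baskets (not taken from store_data.csv)
toy_baskets = [['mineral water', 'eggs'], ['mineral water'], ['milk', 'eggs']]
filtered = drop_item(toy_baskets, 'mineral water')
print(filtered)
```

Baskets that become empty are kept in this sketch, so they would still count toward the number of transactions when support is computed.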

In [208]:
deduction_list = conclusions(item_set_dict,item_supp_dict,0.1)
deduction_list.sort(key = lambda x: x[0],reverse=True)
deduction_list

[(0.5066666666666667, {'eggs', 'ground beef'}, {'mineral water'}),
 (0.503030303030303, {'ground beef', 'milk'}, {'mineral water'}),
 (0.47398843930635837, {'chocolate', 'ground beef'}, {'mineral water'}),
 (0.46892655367231634, {'frozen vegetables', 'milk'}, {'mineral water'}),
 (0.45646437994722955, {'soup'}, {'mineral water'}),
 (0.455026455026455, {'pancakes', 'spaghetti'}, {'mineral water'}),
 (0.4476744186046512, {'olive oil', 'spaghetti'}, {'mineral water'}),
 (0.44360902255639095, {'milk', 'spaghetti'}, {'mineral water'}),
 (0.43568464730290457, {'chocolate', 'milk'}, {'mineral water'}),
 (0.4353741496598639, {'ground beef', 'spaghetti'}, {'mineral water'}),
 (0.43062200956937796, {'frozen vegetables', 'spaghetti'}, {'mineral water'}),
 (0.42424242424242425, {'eggs', 'milk'}, {'mineral water'}),
 (0.4190283400809717, {'olive oil'}, {'mineral water'}),
 (0.41693811074918563, {'ground beef', 'mineral water'}, {'spaghetti'}),
 (0.41655359565807326, {'ground beef'}, {'mineral water