# Proto1_3

**This "sub-prototype" brings in more companies.**

Data:
- interactions from all groups except those of Alma's developers
- interactions are per group, not per user
- company data: the region code and industry code of the companies found in the database


Questions:

1. Does the model produce recommendations that make any sense?
2. WARP or BPR?
3. What happens if the item features are left out entirely?
4. `item_identity_features` and `user_identity_features`: on or off? (This may belong in proto2 or a later tuning phase after all.)


## Import the required libraries

In [2]:
from lightfm import LightFM
from lightfm.cross_validation import random_train_test_split
from lightfm.evaluation import auc_score
from lightfm.evaluation import precision_at_k
from lightfm.evaluation import recall_at_k
from lightfm.evaluation import reciprocal_rank
from lightfm.data import Dataset

import numpy as np
import pandas as pd

import statistics

## Define the desired company features

In [3]:
SELECTED_COMPANY_FEATURES = ['location_region_code', 'industry_code']

## Load the raw company data

In [None]:
COMPANIES_RAW = pd \
        .read_csv('data/prod_data_companies_more_data_2021_09_16.csv',
                  delimiter='\t',
                  na_values='(null)',
                  dtype={
                      'business_id': 'string',
                      'company_name': 'string',
                      'company_form': 'string',
                      'company_form_code': 'string',
                      'location_region': 'string',
                      'location_region_code': 'string',
                      'location_municipality': 'string',
                      'location_municipality_code': 'string',
                      'industry_code': 'string',
                      'company_status': 'string',
                      'company_status_code': 'string',
                      'personnel_class': 'string'
                  }
                  )

print(COMPANIES_RAW)

## Process and extract the desired company data

In [4]:
ITEM_IDS = list(COMPANIES_RAW['business_id'].values)

# The codes should probably be prefixed so they stay unique as more features are added
item_feature_labels_tmp = [COMPANIES_RAW[feature].dropna().unique() for feature in SELECTED_COMPANY_FEATURES]

ITEM_FEATURE_LABELS = [item for sublist in item_feature_labels_tmp for item in sublist]

# Attach to each company only the selected features that are not missing
ITEM_FEATURES = [(company['business_id'],
                  [company[feature] for feature in SELECTED_COMPANY_FEATURES if pd.notna(company[feature])])
                     for company in COMPANIES_RAW.to_dict(orient='records')]

print(ITEM_FEATURES[0:10])

[('01423486', ['02', '68201']), ('15697971', ['01', '87302']), ('02373820', ['01', '88999']), ('02105471', ['19', '68201']), ('01556668', ['06', '68201']), ('01556668', ['06', '68201']), ('15697971', ['01', '87302']), ('02026351', ['01', '47730']), ('01165149', ['01', '68201']), ('01497530', ['07', '87301'])]
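
The code comment about prefixing the codes could be realized roughly as follows. This is a minimal sketch in plain Python: the helper `prefixed_features` and the sample records are hypothetical, and missing values are represented as `None` here rather than pandas' `<NA>`. Prefixing each code with its column name keeps, for example, a region code `01` distinct from any other feature that happens to use the same code.

```python
SELECTED_COMPANY_FEATURES = ['location_region_code', 'industry_code']


def prefixed_features(company, feature_names):
    """Return 'column:value' labels for the non-missing features of one company record."""
    return [f"{name}:{company[name]}"
            for name in feature_names
            if company.get(name) is not None]


# Hypothetical sample records; the second one has no industry code.
sample_companies = [
    {'business_id': '01423486', 'location_region_code': '02', 'industry_code': '68201'},
    {'business_id': '15697971', 'location_region_code': '01', 'industry_code': None},
]

item_features = [(c['business_id'], prefixed_features(c, SELECTED_COMPANY_FEATURES))
                 for c in sample_companies]

# The label list passed to Dataset.fit would then be the set of all prefixed labels.
item_feature_labels = sorted({label for _, labels in item_features for label in labels})
```

The same prefixed labels would have to be used both when fitting the `Dataset` and when building the item features, otherwise the labels would not match.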


## Load the interaction data

In [5]:
interactions_tmp = pd \
    .read_csv('data/interactions_2021_08_19.csv',
             delimiter='\t',
             dtype={
                 'group_id': 'string',
                 'business_id': 'string',
                 'owner': 'string'
             })

# Drop interactions whose business ID is not found among the items
INTERACTIONS_RAW = interactions_tmp[interactions_tmp.business_id.isin(ITEM_IDS)]

print(interactions_tmp.shape)
print(INTERACTIONS_RAW.shape)

(548198, 3)
(346309, 3)


## Process the interaction data

For now, at least, I treat a group as the "user". My assumption is that a group of companies is the level at which we want to generate recommendations.

In [21]:
group_sizes = INTERACTIONS_RAW['group_id'].value_counts()
group_sizes_df = pd.DataFrame({'group_id': group_sizes.index, 'group_size': group_sizes.values})

INTERACTIONS_WITH_GROUP_SIZES = INTERACTIONS_RAW.merge(group_sizes_df, on='group_id')

INTERACTIONS_10_RAW = INTERACTIONS_WITH_GROUP_SIZES[INTERACTIONS_WITH_GROUP_SIZES.group_size >= 10]

INTERACTIONS_50_RAW = INTERACTIONS_WITH_GROUP_SIZES[INTERACTIONS_WITH_GROUP_SIZES.group_size >= 50]

INTERACTIONS_10 = [(interaction['group_id'], interaction['business_id']) 
                for interaction in INTERACTIONS_10_RAW.to_dict(orient='records')]

INTERACTIONS_50 = [(interaction['group_id'], interaction['business_id']) 
                for interaction in INTERACTIONS_50_RAW.to_dict(orient='records')]

USER_IDS_10 = list(set(INTERACTIONS_10_RAW['group_id'].values))

USER_IDS_50 = list(set(INTERACTIONS_50_RAW['group_id'].values))

print(len(INTERACTIONS_50))
print(INTERACTIONS_10[0:10])

337217
[('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '01681709'), ('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '15055514'), ('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '01876143'), ('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '01863991'), ('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '05363070'), ('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '01387534'), ('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '01372818'), ('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '18348689'), ('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '01421229'), ('c2626398-faac-4ff3-b02d-cdc64b50cdaa', '01446661')]


## Create a Dataset object that LightFM understands

Questions:
1. Are the results better with identity features on or off?
2. Better with or without normalization?

In [29]:
DATASET = Dataset(user_identity_features=False, item_identity_features=False)

# There are no user features, at least not yet
DATASET.fit(users=USER_IDS_10, items=ITEM_IDS, item_features=ITEM_FEATURE_LABELS)

ITEM_FEATURES_DS = DATASET.build_item_features(ITEM_FEATURES, normalize=False)

(INTERACTIONS_10_DS, WEIGHTS_10_DS) = DATASET.build_interactions(INTERACTIONS_10)

#(INTERACTIONS_50_DS, WEIGHTS_50_DS) = DATASET.build_interactions(INTERACTIONS_50)

USER_MAP_DS = DATASET.mapping()[0]
ITEM_MAP_DS = DATASET.mapping()[2]
ITEM_FEATURE_MAP_DS = DATASET.mapping()[3]

# print(ITEM_FEATURE_MAP_DS)
print(USER_MAP_DS)

{'b2c9e5ef-f0b1-4e97-b8de-9af9e8c63fe5': 0, 'ac014670-d3ba-46a8-9a83-5751976d8548': 1, '684b16e5-f715-4201-8cfc-1c9e143c3929': 2, 'c758ba19-d23d-4ad7-a51c-adcf64706c2b': 3, 'bc3e7785-9146-4951-b8b6-a56465e50312': 4, 'f98ec7f4-bf15-4323-b1df-f80c0491e5ae': 5, '2c1edebf-5c17-463b-b105-cead765d859c': 6, '008beda4-3bb9-46ae-a93f-b6b7fba8cc51': 7, '346011e4-a37e-49b1-8655-10fe8195786f': 8, '9ed6b079-c616-4cc0-aa81-c7063f0ab47f': 9, 'f54d7379-ba6f-463a-8e41-77b8cdad7f97': 10, '9af1acea-ddc9-4e0a-a9d1-ec1194c46ac6': 11, 'ef299ebd-8938-472c-8585-0b4332376cde': 12, '6ee59765-c3fc-4e31-89bf-731f74e87906': 13, '3cc3ae59-1131-47f1-81e7-cd4eb86b6cdc': 14, 'f396bf2d-4cc9-40f8-94ba-6d56425bd2ec': 15, 'd5fa64bb-9ed0-4d0b-ac3c-08dbae4e7f99': 16, 'e00a9d22-6908-4b6b-b41f-980ef896f2b8': 17, 'e8bc7898-2baa-4719-81a1-891d308b046e': 18, 'b6780eac-446f-4a48-92a3-0de3e8654d96': 19, 'aec1682e-91f1-4148-bcab-a198475d0939': 20, 'a2c13abd-3ce3-4bb3-9a6c-3bafd138fa09': 21, '95307440-c9c5-4cee-9273-5043372380c4': 2
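
To make the two questions at the start of this section more concrete, here is a plain-Python sketch of what the item feature rows look like under each setting, as I understand LightFM's `Dataset`: identity features add one indicator column per item, and `normalize=True` scales each row so that it sums to 1. `build_feature_rows` and the toy data are hypothetical and only for illustration; this is not LightFM's implementation.

```python
def build_feature_rows(item_features, feature_labels, identity=False, normalize=False):
    """Build one dense feature row per item as a plain list of floats."""
    item_ids = [item_id for item_id, _ in item_features]
    # Tag columns so that an item ID can never collide with a feature label.
    columns = [('item', i) for i in item_ids] if identity else []
    columns += [('feature', label) for label in feature_labels]
    col_index = {col: pos for pos, col in enumerate(columns)}

    rows = {}
    for item_id, labels in item_features:
        row = [0.0] * len(columns)
        if identity:
            row[col_index[('item', item_id)]] = 1.0
        for label in labels:
            row[col_index[('feature', label)]] = 1.0
        if normalize and sum(row) > 0:
            total = sum(row)
            row = [value / total for value in row]
        rows[item_id] = row
    return columns, rows


toy_items = [('A', ['x', 'y']), ('B', ['x'])]
toy_labels = ['x', 'y']

# Setting used in this notebook: no identity features, no normalization.
plain_cols, plain_rows = build_feature_rows(toy_items, toy_labels)

# The alternative: identity features on, rows normalized.
full_cols, full_rows = build_feature_rows(toy_items, toy_labels, identity=True, normalize=True)
```

Without identity features, two companies with the same region and industry code get identical rows, so the model cannot tell them apart. That may be worth keeping in mind when comparing the runs with and without item features.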

## Evaluate model quality

### Run the evaluations and record the results from 5 runs

In [30]:
def run_evaluation(model, train, test, evaluation_function, name, item_features=None):
    """Return the per-user mean of `evaluation_function` for the train and test sets."""
    print("Calculating %s for train dataset..." % (name))
    train_metric = evaluation_function(model, train, item_features=item_features).mean()

    print("Calculating %s for test dataset..." % (name))
    test_metric = evaluation_function(model, test, item_features=item_features).mean()

    print('%s: train %.2f, test %.2f.' % (name, train_metric, test_metric))
    print("\n")
    return (train_metric, test_metric)

WARP_AUC = []
WARP_PRECISION = []
WARP_RECALL = []
WARP_RECIPROCAL = []

BPR_AUC = []
BPR_PRECISION = []
BPR_RECALL = []
BPR_RECIPROCAL = []

WARP_NO_ITEM_AUC = []
WARP_NO_ITEM_PRECISION = []
WARP_NO_ITEM_RECALL = []
WARP_NO_ITEM_RECIPROCAL = []

BPR_NO_ITEM_AUC = []
BPR_NO_ITEM_PRECISION = []
BPR_NO_ITEM_RECALL = []
BPR_NO_ITEM_RECIPROCAL = []

for i in range(1, 6):
    print("Starting iteration %d" % i)
    
    (TRAIN, TEST) = random_train_test_split(INTERACTIONS_10_DS, test_percentage=0.2)

    MODEL_WARP = LightFM(loss='warp')
    MODEL_WARP.fit(TRAIN, item_features=ITEM_FEATURES_DS, epochs=10, verbose=True)

    MODEL_BPR = LightFM(loss='bpr')
    MODEL_BPR.fit(TRAIN, item_features=ITEM_FEATURES_DS, epochs=10, verbose=True)

    MODEL_WARP_NO_ITEM = LightFM(loss='warp')
    MODEL_WARP_NO_ITEM.fit(TRAIN, epochs=10, verbose=True)

    MODEL_BPR_NO_ITEM = LightFM(loss='bpr')
    MODEL_BPR_NO_ITEM.fit(TRAIN, epochs=10, verbose=True)
    
    
    
    
    WARP_AUC.append(run_evaluation(MODEL_WARP, TRAIN, TEST, auc_score, "AUC_WARP", ITEM_FEATURES_DS))
    BPR_AUC.append(run_evaluation(MODEL_BPR, TRAIN, TEST, auc_score, "AUC_BPR", ITEM_FEATURES_DS))
    
    WARP_PRECISION.append(run_evaluation(MODEL_WARP, TRAIN, TEST, precision_at_k, "PRECISION_WARP", ITEM_FEATURES_DS))
    BPR_PRECISION.append(run_evaluation(MODEL_BPR, TRAIN, TEST, precision_at_k, "PRECISION_BPR", ITEM_FEATURES_DS))
    
    WARP_RECALL.append(run_evaluation(MODEL_WARP, TRAIN, TEST, recall_at_k, "RECALL_WARP", ITEM_FEATURES_DS))
    BPR_RECALL.append(run_evaluation(MODEL_BPR, TRAIN, TEST, recall_at_k, "RECALL_BPR", ITEM_FEATURES_DS))
    
    WARP_RECIPROCAL.append(run_evaluation(MODEL_WARP, TRAIN, TEST, reciprocal_rank, "RECIPROCAL_WARP", ITEM_FEATURES_DS))
    BPR_RECIPROCAL.append(run_evaluation(MODEL_BPR, TRAIN, TEST, reciprocal_rank, "RECIPROCAL_BPR", ITEM_FEATURES_DS))
    
    
    
    
    WARP_NO_ITEM_AUC.append(run_evaluation(MODEL_WARP_NO_ITEM, TRAIN, TEST, auc_score, "AUC_WARP_NO_ITEM"))
    BPR_NO_ITEM_AUC.append(run_evaluation(MODEL_BPR_NO_ITEM, TRAIN, TEST, auc_score, "AUC_BPR_NO_ITEM"))
    
    WARP_NO_ITEM_PRECISION.append(run_evaluation(MODEL_WARP_NO_ITEM, TRAIN, TEST, precision_at_k, "PRECISION_WARP_NO_ITEM"))
    BPR_NO_ITEM_PRECISION.append(run_evaluation(MODEL_BPR_NO_ITEM, TRAIN, TEST, precision_at_k, "PRECISION_BPR_NO_ITEM"))
    
    WARP_NO_ITEM_RECALL.append(run_evaluation(MODEL_WARP_NO_ITEM, TRAIN, TEST, recall_at_k, "RECALL_WARP_NO_ITEM"))
    BPR_NO_ITEM_RECALL.append(run_evaluation(MODEL_BPR_NO_ITEM, TRAIN, TEST, recall_at_k, "RECALL_BPR_NO_ITEM"))
    
    WARP_NO_ITEM_RECIPROCAL.append(run_evaluation(MODEL_WARP_NO_ITEM, TRAIN, TEST, reciprocal_rank, "RECIPROCAL_WARP_NO_ITEM"))
    BPR_NO_ITEM_RECIPROCAL.append(run_evaluation(MODEL_BPR_NO_ITEM, TRAIN, TEST, reciprocal_rank, "RECIPROCAL_BPR_NO_ITEM"))


Starting iteration 1


Epoch: 100%|██████████| 10/10 [00:02<00:00,  3.82it/s]
Epoch: 100%|██████████| 10/10 [00:02<00:00,  4.04it/s]
Epoch: 100%|██████████| 10/10 [00:03<00:00,  2.55it/s]
Epoch: 100%|██████████| 10/10 [00:03<00:00,  3.22it/s]


Calculating AUC_WARP for train dataset...
Calculating AUC_WARP for test dataset...
AUC_WARP: train 0.94, test 0.93.


Calculating AUC_BPR for train dataset...
Calculating AUC_BPR for test dataset...
AUC_BPR: train 0.92, test 0.91.


Calculating PRECISION_WARP for train dataset...
Calculating PRECISION_WARP for test dataset...
PRECISION_WARP: train 0.15, test 0.04.


Calculating PRECISION_BPR for train dataset...
Calculating PRECISION_BPR for test dataset...
PRECISION_BPR: train 0.15, test 0.04.


Calculating RECALL_WARP for train dataset...
Calculating RECALL_WARP for test dataset...
RECALL_WARP: train 0.01, test 0.01.


Calculating RECALL_BPR for train dataset...
Calculating RECALL_BPR for test dataset...
RECALL_BPR: train 0.02, test 0.01.


Calculating RECIPROCAL_WARP for train dataset...
Calculating RECIPROCAL_WARP for test dataset...
RECIPROCAL_WARP: train 0.28, test 0.11.


Calculating RECIPROCAL_BPR for train dataset...
Calculating RECIPROCAL_BPR for test dataset...
RECIPROCAL_BP

Epoch: 100%|██████████| 10/10 [00:02<00:00,  4.30it/s]
Epoch: 100%|██████████| 10/10 [00:02<00:00,  4.58it/s]
Epoch: 100%|██████████| 10/10 [00:03<00:00,  2.71it/s]
Epoch: 100%|██████████| 10/10 [00:02<00:00,  3.38it/s]


Calculating AUC_WARP for train dataset...
Calculating AUC_WARP for test dataset...
AUC_WARP: train 0.94, test 0.93.


Calculating AUC_BPR for train dataset...
Calculating AUC_BPR for test dataset...
AUC_BPR: train 0.92, test 0.91.


Calculating PRECISION_WARP for train dataset...
Calculating PRECISION_WARP for test dataset...
PRECISION_WARP: train 0.14, test 0.04.


Calculating PRECISION_BPR for train dataset...
Calculating PRECISION_BPR for test dataset...
PRECISION_BPR: train 0.14, test 0.03.


Calculating RECALL_WARP for train dataset...
Calculating RECALL_WARP for test dataset...
RECALL_WARP: train 0.01, test 0.01.


Calculating RECALL_BPR for train dataset...
Calculating RECALL_BPR for test dataset...
RECALL_BPR: train 0.02, test 0.01.


Calculating RECIPROCAL_WARP for train dataset...
Calculating RECIPROCAL_WARP for test dataset...
RECIPROCAL_WARP: train 0.30, test 0.11.


Calculating RECIPROCAL_BPR for train dataset...
Calculating RECIPROCAL_BPR for test dataset...
RECIPROCAL_BP

Epoch: 100%|██████████| 10/10 [00:02<00:00,  4.28it/s]
Epoch: 100%|██████████| 10/10 [00:02<00:00,  4.59it/s]
Epoch: 100%|██████████| 10/10 [00:03<00:00,  2.73it/s]
Epoch: 100%|██████████| 10/10 [00:02<00:00,  3.36it/s]


Calculating AUC_WARP for train dataset...
Calculating AUC_WARP for test dataset...
AUC_WARP: train 0.94, test 0.93.


Calculating AUC_BPR for train dataset...
Calculating AUC_BPR for test dataset...
AUC_BPR: train 0.92, test 0.90.


Calculating PRECISION_WARP for train dataset...
Calculating PRECISION_WARP for test dataset...
PRECISION_WARP: train 0.15, test 0.04.


Calculating PRECISION_BPR for train dataset...
Calculating PRECISION_BPR for test dataset...
PRECISION_BPR: train 0.14, test 0.03.


Calculating RECALL_WARP for train dataset...
Calculating RECALL_WARP for test dataset...
RECALL_WARP: train 0.01, test 0.01.


Calculating RECALL_BPR for train dataset...
Calculating RECALL_BPR for test dataset...
RECALL_BPR: train 0.02, test 0.01.


Calculating RECIPROCAL_WARP for train dataset...
Calculating RECIPROCAL_WARP for test dataset...
RECIPROCAL_WARP: train 0.30, test 0.10.


Calculating RECIPROCAL_BPR for train dataset...
Calculating RECIPROCAL_BPR for test dataset...
RECIPROCAL_BP

Epoch: 100%|██████████| 10/10 [00:02<00:00,  4.33it/s]
Epoch: 100%|██████████| 10/10 [00:02<00:00,  4.58it/s]
Epoch: 100%|██████████| 10/10 [00:03<00:00,  2.75it/s]
Epoch: 100%|██████████| 10/10 [00:02<00:00,  3.34it/s]


Calculating AUC_WARP for train dataset...
Calculating AUC_WARP for test dataset...
AUC_WARP: train 0.94, test 0.93.


Calculating AUC_BPR for train dataset...
Calculating AUC_BPR for test dataset...
AUC_BPR: train 0.92, test 0.90.


Calculating PRECISION_WARP for train dataset...
Calculating PRECISION_WARP for test dataset...
PRECISION_WARP: train 0.15, test 0.04.


Calculating PRECISION_BPR for train dataset...
Calculating PRECISION_BPR for test dataset...
PRECISION_BPR: train 0.14, test 0.04.


Calculating RECALL_WARP for train dataset...
Calculating RECALL_WARP for test dataset...
RECALL_WARP: train 0.01, test 0.02.


Calculating RECALL_BPR for train dataset...
Calculating RECALL_BPR for test dataset...
RECALL_BPR: train 0.02, test 0.02.


Calculating RECIPROCAL_WARP for train dataset...
Calculating RECIPROCAL_WARP for test dataset...
RECIPROCAL_WARP: train 0.29, test 0.12.


Calculating RECIPROCAL_BPR for train dataset...
Calculating RECIPROCAL_BPR for test dataset...
RECIPROCAL_BP

Epoch: 100%|██████████| 10/10 [00:02<00:00,  4.30it/s]
Epoch: 100%|██████████| 10/10 [00:02<00:00,  4.45it/s]
Epoch: 100%|██████████| 10/10 [00:03<00:00,  2.69it/s]
Epoch: 100%|██████████| 10/10 [00:02<00:00,  3.38it/s]


Calculating AUC_WARP for train dataset...
Calculating AUC_WARP for test dataset...
AUC_WARP: train 0.94, test 0.93.


Calculating AUC_BPR for train dataset...
Calculating AUC_BPR for test dataset...
AUC_BPR: train 0.92, test 0.90.


Calculating PRECISION_WARP for train dataset...
Calculating PRECISION_WARP for test dataset...
PRECISION_WARP: train 0.15, test 0.04.


Calculating PRECISION_BPR for train dataset...
Calculating PRECISION_BPR for test dataset...
PRECISION_BPR: train 0.15, test 0.04.


Calculating RECALL_WARP for train dataset...
Calculating RECALL_WARP for test dataset...
RECALL_WARP: train 0.02, test 0.01.


Calculating RECALL_BPR for train dataset...
Calculating RECALL_BPR for test dataset...
RECALL_BPR: train 0.02, test 0.01.


Calculating RECIPROCAL_WARP for train dataset...
Calculating RECIPROCAL_WARP for test dataset...
RECIPROCAL_WARP: train 0.29, test 0.12.


Calculating RECIPROCAL_BPR for train dataset...
Calculating RECIPROCAL_BPR for test dataset...
RECIPROCAL_BP
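
One thing that may contribute to the gap between train and test precision: the evaluation calls above do not pass `train_interactions`, so companies a group already interacted with in the training set can still occupy the top-k slots when the test set is scored. LightFM's evaluation functions accept a `train_interactions` argument for excluding them. The toy example below uses made-up scores and a hypothetical pure-Python `toy_precision_at_k` to illustrate the effect; it is not LightFM's implementation.

```python
def toy_precision_at_k(scores, test_positives, k, exclude=()):
    """Fraction of the top-k ranked items (after dropping `exclude`) that are test positives."""
    ranked = sorted((item for item in scores if item not in exclude),
                    key=lambda item: scores[item], reverse=True)
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in test_positives) / k


# Made-up scores for one group: the two training positives rank highest.
scores = {'a': 0.9, 'b': 0.8, 'c': 0.7, 'd': 0.6, 'e': 0.1}
train_positives = {'a', 'b'}
test_positives = {'c'}

# Training positives fill the top 2, so the test positive is never counted.
raw = toy_precision_at_k(scores, test_positives, k=2)

# With training positives excluded, the test positive moves into the top 2.
filtered = toy_precision_at_k(scores, test_positives, k=2, exclude=train_positives)
```

In the notebook this would presumably mean passing `train_interactions=train` when evaluating on `test` inside `run_evaluation`; I have not verified how much it changes the numbers on this data.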

In [15]:
def print_result(result_arr, model_name):
    train_results = [x[0] for x in result_arr]
    test_results = [x[1] for x in result_arr]
    
    print('{name}:\n train mean {train_mean:.2f} ({train_arr})\n test mean {test_mean:.2f} ({test_arr})\n'
          .format(train_mean=statistics.mean(train_results),
                 test_mean=statistics.mean(test_results),
                 train_arr=['%.2f' % x for x in train_results],
                 test_arr=['%.2f' % x for x in test_results],
                 name=model_name))

### Print the results (min_group_size=10)

In [16]:
print('-----AUC-----')
print_result(WARP_AUC, "WARP")
print_result(BPR_AUC, "BPR")
print_result(WARP_NO_ITEM_AUC, "WARP_NO_ITEM")
print_result(BPR_NO_ITEM_AUC, "BPR_NO_ITEM")

print('\n-----PRECISION-----')
print_result(WARP_PRECISION, "WARP")
print_result(BPR_PRECISION, "BPR")
print_result(WARP_NO_ITEM_PRECISION, "WARP_NO_ITEM")
print_result(BPR_NO_ITEM_PRECISION, "BPR_NO_ITEM")
 
print('\n-----RECALL-----')
print_result(WARP_RECALL, "WARP")
print_result(BPR_RECALL, "BPR")
print_result(WARP_NO_ITEM_RECALL, "WARP_NO_ITEM")
print_result(BPR_NO_ITEM_RECALL, "BPR_NO_ITEM")
    
print('\n-----RECIPROCAL-----')
print_result(WARP_RECIPROCAL, "WARP")
print_result(BPR_RECIPROCAL, "BPR")
print_result(WARP_NO_ITEM_RECIPROCAL, "WARP_NO_ITEM")
print_result(BPR_NO_ITEM_RECIPROCAL, "BPR_NO_ITEM")

-----AUC-----
WARP:
 train mean 0.94 (['0.94'])
 test mean 0.93 (['0.93'])

BPR:
 train mean 0.92 (['0.92'])
 test mean 0.90 (['0.90'])

WARP_NO_ITEM:
 train mean 0.98 (['0.98'])
 test mean 0.92 (['0.92'])

BPR_NO_ITEM:
 train mean 0.92 (['0.92'])
 test mean 0.87 (['0.87'])


-----PRECISION-----
WARP:
 train mean 0.15 (['0.15'])
 test mean 0.04 (['0.04'])

BPR:
 train mean 0.15 (['0.15'])
 test mean 0.04 (['0.04'])

WARP_NO_ITEM:
 train mean 0.29 (['0.29'])
 test mean 0.04 (['0.04'])

BPR_NO_ITEM:
 train mean 0.26 (['0.26'])
 test mean 0.04 (['0.04'])


-----RECALL-----
WARP:
 train mean 0.02 (['0.02'])
 test mean 0.01 (['0.01'])

BPR:
 train mean 0.02 (['0.02'])
 test mean 0.01 (['0.01'])

WARP_NO_ITEM:
 train mean 0.03 (['0.03'])
 test mean 0.02 (['0.02'])

BPR_NO_ITEM:
 train mean 0.02 (['0.02'])
 test mean 0.02 (['0.02'])


-----RECIPROCAL-----
WARP:
 train mean 0.29 (['0.29'])
 test mean 0.11 (['0.11'])

BPR:
 train mean 0.27 (['0.27'])
 test mean 0.11 (['0.11'])

WARP_NO_ITEM:
 t

### Print the results (min_group_size=50)

In [27]:
print('-----AUC-----')
print_result(WARP_AUC, "WARP")
print_result(BPR_AUC, "BPR")
print_result(WARP_NO_ITEM_AUC, "WARP_NO_ITEM")
print_result(BPR_NO_ITEM_AUC, "BPR_NO_ITEM")

print('\n-----PRECISION-----')
print_result(WARP_PRECISION, "WARP")
print_result(BPR_PRECISION, "BPR")
print_result(WARP_NO_ITEM_PRECISION, "WARP_NO_ITEM")
print_result(BPR_NO_ITEM_PRECISION, "BPR_NO_ITEM")
 
print('\n-----RECALL-----')
print_result(WARP_RECALL, "WARP")
print_result(BPR_RECALL, "BPR")
print_result(WARP_NO_ITEM_RECALL, "WARP_NO_ITEM")
print_result(BPR_NO_ITEM_RECALL, "BPR_NO_ITEM")
    
print('\n-----RECIPROCAL-----')
print_result(WARP_RECIPROCAL, "WARP")
print_result(BPR_RECIPROCAL, "BPR")
print_result(WARP_NO_ITEM_RECIPROCAL, "WARP_NO_ITEM")
print_result(BPR_NO_ITEM_RECIPROCAL, "BPR_NO_ITEM")

-----AUC-----
WARP:
 train mean 0.93 (['0.93', '0.93', '0.93', '0.93', '0.93'])
 test mean 0.93 (['0.93', '0.93', '0.93', '0.93', '0.93'])

BPR:
 train mean 0.91 (['0.91', '0.91', '0.91', '0.91', '0.91'])
 test mean 0.90 (['0.90', '0.90', '0.90', '0.90', '0.90'])

WARP_NO_ITEM:
 train mean 0.98 (['0.98', '0.98', '0.98', '0.98', '0.98'])
 test mean 0.92 (['0.92', '0.92', '0.91', '0.92', '0.92'])

BPR_NO_ITEM:
 train mean 0.92 (['0.92', '0.92', '0.92', '0.93', '0.92'])
 test mean 0.87 (['0.87', '0.87', '0.86', '0.87', '0.86'])


-----PRECISION-----
WARP:
 train mean 0.19 (['0.19', '0.19', '0.20', '0.19', '0.20'])
 test mean 0.05 (['0.05', '0.05', '0.05', '0.05', '0.05'])

BPR:
 train mean 0.19 (['0.19', '0.18', '0.19', '0.19', '0.19'])
 test mean 0.05 (['0.05', '0.04', '0.05', '0.05', '0.05'])

WARP_NO_ITEM:
 train mean 0.38 (['0.39', '0.38', '0.38', '0.39', '0.37'])
 test mean 0.06 (['0.06', '0.05', '0.05', '0.06', '0.06'])

BPR_NO_ITEM:
 train mean 0.36 (['0.37', '0.36', '0.36', '0.37'

## Lessons from the prototype

- The AUC suggests that the model may actually be able to infer something
    - Or is it overfitting after all?
- Is the WARP model better?
    - The difference shows especially in AUC, but precision and recall are identical... -> perhaps caused by some property of the dataset
    - -> BPR probably can't be dropped entirely yet
- Removing the item features even raises the evaluation metrics...
    - Perhaps overfitting?
    - Perhaps the selected features are not enough to describe the internal cohesion of the groups?
    - Perhaps the selected features just confuse the CF model?
    - Perhaps the groups have no internal logic that could be captured with features (or even with recommendations)
- It would also be interesting to see the model's results when different users' interactions do not affect each other. Is that even possible?
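
The overfitting question can be looked at a little more systematically by reporting the spread across runs and the train-test gap alongside the mean. The sketch below does this for two of the AUC series printed in the min_group_size=50 output; `summarize` is a hypothetical helper, and the input values are the rounded figures copied from that output, so the results are only as precise as two decimals allow.

```python
import statistics


def summarize(train_scores, test_scores):
    """Return the mean test score, its sample standard deviation, and the mean train-test gap."""
    test_mean = statistics.mean(test_scores)
    test_std = statistics.stdev(test_scores)
    gap = statistics.mean(train_scores) - test_mean
    return test_mean, test_std, gap


# AUC values copied from the min_group_size=50 output (rounded to two decimals there).
warp = summarize([0.93] * 5, [0.93] * 5)
warp_no_item = summarize([0.98] * 5, [0.92, 0.92, 0.91, 0.92, 0.92])

for name, (mean, std, gap) in [("WARP", warp), ("WARP_NO_ITEM", warp_no_item)]:
    print(f"{name}: test mean {mean:.3f}, std {std:.3f}, train-test gap {gap:.3f}")
```

On these rounded numbers the model without item features shows a clearly larger train-test gap in AUC than the one with them, which is at least consistent with the overfitting suspicion.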