Splitting Approaches #11

MatthiasKirsch · 2017-10-18T08:20:34Z

Hi,

I'm trying to use BPR with UISplitting. Every time I run with my settings.conf, the log tells me that 0 items/users have been split. Is this normal?

My procedure: I created two settings.conf files. One transforms the testset into binary format; the other runs the actual experiment, with my trainset as the training set and the testset (converted to binary format) as the test set.
I also tried using my testset directly as a non-binary CSV file together with the trainset, but this gives me an error, so I think it should be fine to convert the testset in a first step and then use the output as the new testset.

1. create binary format from testset
  Code snippet:

dataset.ratings.lins=/home/[...]/test_carskit_Bundesland.csv
dataset.social.lins=-1
ratings.setup=-threshold -1 -datatransformation 1 -fullstat -1
[...]
evaluation.setup=test-set -f /home/[...]/train_carskit_Bundesland.csv

After this step I extracted the converted testset (now: ratings_binary.txt) from the created CARSKit.Workspace folder and put it next to my trainset. Then I deleted the CARSKit.Workspace folder, the debug.log file and the results.txt file.

2. run normal approach:
  Code snippet:

dataset.ratings.lins=/home/[...]/train_carskit_Bundesland.csv
dataset.social.lins=-1
ratings.setup=-threshold -1 -datatransformation 1 -fullstat -1
recommender= uisplitting -traditional bpr
evaluation.setup=test-set -f /home/[...]/ratings_binary.txt

When I run this, it first converts the trainset into binary format, which is fine. It then starts the UISplit and tells me that 0 items and 0 users have been split. I don't know whether this is okay, because the process continues with BPR afterwards and doesn't give me an error. But once it has finished and I compare the context-split results with those from BPR alone, the curves look very similar. So I thought I might be doing something wrong here.

Can you help me please :)
Thank you very much!

This is my output:

/**********************************************************************************************************
 *
 * Dataset: /home/[...]/CARSKit.Workspace/ratings_binary.txt
 * 
 * Statistics of U-I-C Matrix:
 * User amount: 508769
 * Item amount: 93689
 * Rate amount: 4118854
 * Context dimensions: 1 (bundesland)
 * Context conditions: 12 (bundesland: 12)
 * Context situations: 11
 * Data density: 0.0007%
 * Scale distribution: [1.0 x 4118854]
 * Average value of all ratings: 1.000000
 * Standard deviation of all ratings: 0.000000
 * Mode of all rating values: 1.000000
 * Median of all rating values: 1.000000
 *
 **********************************************************************************************************/
With Setup: test-set -f /home/jannis/Fileshares/matti/CROSS_VALIDATION/1/Bundesland/ratings_binary.txt
Dataset: ...ION/1/Bundesland/ratings_binary.txt
DataPath: /home/jannis/Fileshares/matti/CROSS_VALIDATION/1/Bundesland/ratings_binary.txt
Rating data set has been successfully loaded.
0 items have been splitted.
0 users have been splitted.
UI Splitting is done... Algorithm 'bpr' will be applied to the transformed data set.
Density of transformed 2D rating matrix ============================== 0.0075777913744593155
Final Results by UISplitting-BPR, Pre1: 0.012440,Pre2: 0.010356,Pre3: 0.008427,Pre4: 0.007442,Pre5: 0.006712,Pre6: 0.006134,Pre7: 0.005674,Pre8: 0.005283,Pre9: 0.004979,Pre10: 0.004729,Pre11: 0.004500,Pre12: 0.004297,Pre13: 0.004124,Pre14: 0.003963,Pre15: 0.003813,Pre16: 0.003681,Pre17: 0.003572,Pre18: 0.003469,Pre19: 0.003379,Pre20: 0.003295, Rec1: 0.005252,Rec2: 0.008719,Rec3: 0.010527,Rec4: 0.012352,Rec5: 0.013880,Rec6: 0.015140,Rec7: 0.016400,Rec8: 0.017554,Rec9: 0.018630, Rec10: 0.019651, Rec11: 0.020667, Rec12: 0.021555, Rec13: 0.022390, Rec14: 0.023233, Rec15: 0.023917, Rec16: 0.024644, Rec17: 0.025447, Rec18: 0.026170, Rec19: 0.026991, Rec20: 0.027826, AUC: 0.531280, MAP: 0.009775, NDCG: 0.017080, MRR: 0.022582, -1.0,10,0.02,-1.0,1.0E-4,1.0E-4,100, Time: '02:11:02','01:02:48'


MatthiasKirsch · 2017-10-20T06:57:05Z

I created a small dataset with the following data and tried to use ItemSplitting. It does not work; maybe the code is broken?

[image: grafik] <https://user-images.githubusercontent.com/18574614/31808666-7e01b49a-b574-11e7-8813-930743e08de0.png>

Here the items 1, 2 and 6 are rated in different contexts, so ItemSplitting should split them, but it doesn't. Still "0 items splitted".

irecsys · 2017-10-20T11:12:17Z

What are your ratings? Binary ones?


MatthiasKirsch · 2017-10-20T11:36:29Z

My ratings all have the positive value 1 because this is purchase data (no zeros available). That is why I use BPR: I think this algorithm can handle this kind of data. BPR works fine, but the ItemSplitting step in front of it does not.

irecsys · 2017-10-20T13:16:13Z

Splitting doesn't work if you only have 1 as ratings in your data.
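
Why an all-ones dataset gives "0 items splitted" can be illustrated with a minimal sketch. It assumes the split criterion is a two-sample t-test on the ratings an item receives in two context conditions, as a later comment in this thread describes; the function `welch_t` and the sample ratings are hypothetical and are not taken from the CARSKit source.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two samples (each needs at least two values).

    Returns None when neither sample has any spread, because the
    statistic is then undefined (division by zero).
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return None
    return (mean(a) - mean(b)) / se

# Hypothetical ratings of one item in two context conditions.
purchases_ctx_a = [1.0, 1.0, 1.0, 1.0]   # purchase data: every rating is 1
purchases_ctx_b = [1.0, 1.0, 1.0]
graded_ctx_a = [5.0, 4.0, 5.0, 4.0]      # graded ratings that differ by context
graded_ctx_b = [2.0, 1.0, 2.0]

print(welch_t(purchases_ctx_a, purchases_ctx_b))  # no statistic can be computed
print(welch_t(graded_ctx_a, graded_ctx_b))        # large value: candidate split
```

With identical ratings in every context there is no difference in means and no variance, so a criterion of this kind can never find a significant split.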


hjanh · 2017-10-20T19:47:44Z

Is it possible to change the implementation of ItemSplitting to make it work with positive-only data? For example, by using chi-square or entropy instead of a t-test on rating deviation as the splitting criterion? Can the BPR algorithm then also work properly without negative ratings? If not, what would you recommend as best practice for working with positive-only data? Creating dummy zeros for the very large number of unseen products? I'm concerned that this is impossible with large datasets (e.g. tens of thousands of products and hundreds of thousands of users). Thanks for any advice.
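
The chi-square idea raised above could look roughly like the following sketch. It is an assumption about how such a criterion might be built, not something CARSKit provides: instead of comparing rating values, it compares how often an item is purchased inside and outside one context condition. The function `chi2_2x2` and all counts are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)

# Hypothetical purchase counts:
#                  this item   all other items
# in context c        30             70
# elsewhere           10            190
stat = chi2_2x2(30, 70, 10, 190)

# 3.841 is the 5% critical value of chi-square with 1 degree of freedom.
CRITICAL_1DF_5PCT = 3.841
print(stat > CRITICAL_1DF_5PCT)
```

A criterion like this only needs purchase counts, so it stays informative when every rating is 1. Whether it yields useful splits on real data would have to be checked empirically.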

irecsys · 2017-10-20T20:35:26Z

You can download the source code and modify it yourself. Or you can assign some zero ratings.


hjanh · 2017-10-20T22:21:23Z

Thanks for your reply. I don't get the point of "inventing" some zero ratings. Wouldn't this distort the results if I start assigning zero values randomly? And if I assigned a "0" to each possible user-item combination, the rating file would not fit into memory.
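
The memory concern can be made concrete with the user, item and rating counts from the log in the first post. The sketch below also shows one common alternative to writing out every zero: drawing a few unobserved items per observed purchase on the fly, which is the pairwise style of sampling BPR is usually trained with. The helper `sample_negatives` and the toy data are hypothetical, and this is not how CARSKit handles the data internally.

```python
import random

def sample_negatives(user_items, n_items, per_positive=1, seed=0):
    """Yield (user, purchased_item, unobserved_item) triples.

    Only the observed purchases are held in memory; the unobserved
    item is drawn at random and redrawn if the user purchased it.
    """
    rng = random.Random(seed)
    for user, positives in user_items.items():
        if len(positives) >= n_items:
            continue  # this user has purchased every item
        for pos in positives:
            for _ in range(per_positive):
                neg = rng.randrange(n_items)
                while neg in positives:
                    neg = rng.randrange(n_items)
                yield user, pos, neg

# Sizes reported in the log of the first post.
n_users, n_items, n_ratings = 508769, 93689, 4118854
full_cells = n_users * n_items
print(full_cells)               # cells in a fully materialized user-item matrix
print(full_cells // n_ratings)  # unobserved cells per observed purchase, roughly

# Toy usage: two users, five items, two sampled negatives per purchase.
toy = {0: {1, 2}, 1: {0}}
for triple in sample_negatives(toy, n_items=5, per_positive=2):
    print(triple)
```

Sampling keeps the data size proportional to the number of purchases, but sampled zeros are "unknown" rather than confirmed dislikes, so the distortion concern still applies to any method that treats them as real negative ratings.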

irecsys · 2017-10-20T22:40:48Z

In this case, I suggest obtaining external or additional feedback information, such as implicit feedback, so that you can distinguish positive from negative ratings. To use the splitting-based approaches, you need both positive and negative feedback so that they can perform the 'split'.


zeyboukli · 2018-07-08T20:35:22Z

Hello, I am interested in the splitting approach.
In result.txt I find only:
Final Results by UserSplitting-BiasedMF, Pre5: 0,086689,Pre10: 0,056011, Rec5: 0,221093, Rec10: 0,259981, AUC: 0,661162, MAP: 0,161856, NDCG: 0,214975, MRR: 0,272366, numFactors: 10, numIter: 100, lrate: 0.02, maxlrate: -1.0, regB: 1.0E-4, regU: 1.0E-4, regI: 1.0E-4, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
But I would like to evaluate this approach by displaying the root mean square error (RMSE).
Can you help me, please?
Thank you

irecsys · 2018-10-18T20:27:05Z

In the configuration file, there is an option:
item.ranking=on -topN 10
If you set it to "off", you will get the results of the rating predictions.

irecsys closed this as completed Oct 18, 2018
