Splitting Approaches #11

Closed
MatthiasKirsch opened this issue Oct 18, 2017 · 10 comments

@MatthiasKirsch

MatthiasKirsch commented Oct 18, 2017

Hi,

I'm trying to use BPR with UISplitting. Every time I run CARSKit with my settings.conf, the log tells me that 0 items/users have been split. Is this normal?

My procedure: I created two settings.conf files. The first transforms the test set into binary format; the second runs the actual experiment, with my training set as the training set and the binary-format test set as the test set.
I also tried using my test set directly as a non-binary CSV file together with the training set, but this gives me an error, so I think it should be fine to convert the test set in a first step and then use the output as the new test set.

    1. create binary format from testset
      Code snippet:
dataset.ratings.lins=/home/[...]/test_carskit_Bundesland.csv
dataset.social.lins=-1
ratings.setup=-threshold -1 -datatransformation 1 -fullstat -1
[...]
evaluation.setup=test-set -f /home/[...]/train_carskit_Bundesland.csv

After this step I extracted the converted test set (now: ratings_binary.txt) from the created CARSKit.Workspace folder and put it next to my training set. Then I deleted the CARSKit.Workspace folder, the debug.log and the results.txt file.

    2. run normal approach:
      Code snippet:
dataset.ratings.lins=/home/[...]/train_carskit_Bundesland.csv
dataset.social.lins=-1
ratings.setup=-threshold -1 -datatransformation 1 -fullstat -1
recommender= uisplitting -traditional bpr
evaluation.setup=test-set -f /home/[...]/ratings_binary.txt

When I run this, it first converts the training set into binary format, which is fine. It then starts the UI splitting and tells me that 0 items and 0 users have been split. I don't know if this is okay, because the process continues with BPR afterwards and doesn't give me an error. But once it has finished and I compare the context-split results with the ones from plain BPR, the curves look very similar. So I suspect I am doing something wrong here.

Can you help me please :)
Thank you very much!

This is my output:

/**********************************************************************************************************
 *
 * Dataset: /home/[...]/CARSKit.Workspace/ratings_binary.txt
 * 
 * Statistics of U-I-C Matrix:
 * User amount: 508769
 * Item amount: 93689
 * Rate amount: 4118854
 * Context dimensions: 1 (bundesland)
 * Context conditions: 12 (bundesland: 12)
 * Context situations: 11
 * Data density: 0.0007%
 * Scale distribution: [1.0 x 4118854]
 * Average value of all ratings: 1.000000
 * Standard deviation of all ratings: 0.000000
 * Mode of all rating values: 1.000000
 * Median of all rating values: 1.000000
 *
 **********************************************************************************************************/
With Setup: test-set -f /home/jannis/Fileshares/matti/CROSS_VALIDATION/1/Bundesland/ratings_binary.txt
Dataset: ...ION/1/Bundesland/ratings_binary.txt
DataPath: /home/jannis/Fileshares/matti/CROSS_VALIDATION/1/Bundesland/ratings_binary.txt
Rating data set has been successfully loaded.
0 items have been splitted.
0 users have been splitted.
UI Splitting is done... Algorithm 'bpr' will be applied to the transformed data set.
Density of transformed 2D rating matrix ============================== 0.0075777913744593155
Final Results by UISplitting-BPR, Pre1: 0.012440,Pre2: 0.010356,Pre3: 0.008427,Pre4: 0.007442,Pre5: 0.006712,Pre6: 0.006134,Pre7: 0.005674,Pre8: 0.005283,Pre9: 0.004979,Pre10: 0.004729,Pre11: 0.004500,Pre12: 0.004297,Pre13: 0.004124,Pre14: 0.003963,Pre15: 0.003813,Pre16: 0.003681,Pre17: 0.003572,Pre18: 0.003469,Pre19: 0.003379,Pre20: 0.003295, Rec1: 0.005252,Rec2: 0.008719,Rec3: 0.010527,Rec4: 0.012352,Rec5: 0.013880,Rec6: 0.015140,Rec7: 0.016400,Rec8: 0.017554,Rec9: 0.018630, Rec10: 0.019651, Rec11: 0.020667, Rec12: 0.021555, Rec13: 0.022390, Rec14: 0.023233, Rec15: 0.023917, Rec16: 0.024644, Rec17: 0.025447, Rec18: 0.026170, Rec19: 0.026991, Rec20: 0.027826, AUC: 0.531280, MAP: 0.009775, NDCG: 0.017080, MRR: 0.022582, -1.0,10,0.02,-1.0,1.0E-4,1.0E-4,100, Time: '02:11:02','01:02:48'
@MatthiasKirsch
Author

I created a small dataset with the following data and tried to use ItemSplitting. It does not work; maybe the code is broken?

[image: the small example dataset]

Here, items 1, 2 and 6 are rated in different contexts, so ItemSplitting should divide them, but it doesn't. The log still says "0 items have been splitted".

@irecsys
Owner

irecsys commented Oct 20, 2017 via email

@MatthiasKirsch
Author

MatthiasKirsch commented Oct 20, 2017

My ratings have only the positive value 1 because they are purchase data (no zeros available). This is why I use BPR: I think this algorithm can handle this kind of data. BPR works fine, but the ItemSplitting step before it does not.
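
One plausible explanation, assuming ItemSplitting decides whether to split by comparing an item's ratings across context conditions with a two-sample t-test (an assumption here, not something confirmed from the CARSKit source): when every rating is 1, both groups have the same mean and zero variance, so no test can ever come out significant. A minimal Python sketch of that effect, with a hypothetical `welch_t` helper and made-up ratings:

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples, or None when it is undefined."""
    if len(a) < 2 or len(b) < 2:
        return None
    # Squared standard error of the difference between the two means.
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:
        # Both samples are constant: there is no spread to test against.
        return None
    return (mean(a) - mean(b)) / math.sqrt(se2)


# Hypothetical ratings of one item under two context conditions.
explicit_c1 = [5, 4, 5, 4]
explicit_c2 = [2, 1, 2, 3]
binary_c1 = [1, 1, 1, 1]
binary_c2 = [1, 1, 1, 1]

print(welch_t(explicit_c1, explicit_c2))  # a clearly non-zero statistic
print(welch_t(binary_c1, binary_c2))      # None: nothing to split on
```

With explicit ratings the statistic is well defined and can exceed a significance threshold; with all-ones purchase data it is undefined for every item and every context condition, which would be consistent with the "0 items have been splitted" message.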

@irecsys
Owner

irecsys commented Oct 20, 2017 via email

@hjanh

hjanh commented Oct 20, 2017

Is it possible to change the implementation of ItemSplitting to make it work with positive-only data? For example, by using chi-square or entropy instead of a t-test on rating deviation as the splitting criterion? Can the BPR algorithm then also work properly without negative ratings? If not, what would you recommend as best practice for working with positive-only data? Creating dummy zeros for the very large number of unseen products? I'm concerned that this is impossible with large datasets (e.g. tens of thousands of products and hundreds of thousands of users). Thanks for any advice.
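
To make the chi-square idea concrete, here is a minimal Python sketch of what such a criterion could look like for purchase counts. It is not CARSKit functionality: the function names, the 2x2 table layout and the example counts are all hypothetical. The table compares how often one item is bought inside and outside a context condition against all other items.

```python
# Critical value of the chi-square distribution, 1 degree of freedom, alpha = 0.05.
CHI2_CRITICAL_DF1_P05 = 3.841


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    a: purchases of the item inside the context condition
    b: purchases of the item outside the context condition
    c: purchases of all other items inside the context condition
    d: purchases of all other items outside the context condition
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column carries no evidence
    return n * (a * d - b * c) ** 2 / denom


def should_split(a, b, c, d, critical=CHI2_CRITICAL_DF1_P05):
    """Split the item on this condition if purchase counts depend on it."""
    return chi2_2x2(a, b, c, d) > critical


# Hypothetical counts: the item is bought far more often inside the condition.
print(should_split(30, 10, 70, 90))
# Hypothetical counts: the item follows the overall purchase pattern.
print(should_split(20, 20, 80, 80))
```

Because this criterion only needs counts of observed purchases, it does not require negative or zero ratings, unlike a test on rating differences.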

@irecsys
Owner

irecsys commented Oct 20, 2017 via email

@hjanh

hjanh commented Oct 20, 2017

Thanks for your reply. I don't get the point of "inventing" some zero ratings. Wouldn't it distort the results if I start assigning zero values randomly? And if I assigned a "0" to each possible user-item combination, the rating file would not fit into memory.
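
For context on the memory concern: pairwise methods in the BPR family are usually trained on sampled (user, purchased item, unseen item) triples rather than on a fully materialised matrix of zeros. The Python sketch below illustrates that sampling idea only; `sample_negatives` and its parameters are hypothetical, and whether CARSKit does something equivalent internally is not confirmed in this thread.

```python
import random


def sample_negatives(positives, all_items, per_positive=1, seed=0):
    """Draw unseen items for each observed (user, item) purchase.

    Returns (user, positive_item, negative_item) triples. Memory grows with
    the number of purchases, not with users x items.
    """
    rng = random.Random(seed)
    items = list(all_items)

    # Items each user has already interacted with.
    seen_by_user = {}
    for user, item in positives:
        seen_by_user.setdefault(user, set()).add(item)

    triples = []
    for user, pos_item in positives:
        seen = seen_by_user[user]
        if len(seen) >= len(items):
            continue  # this user has bought every item; nothing to sample
        for _ in range(per_positive):
            neg_item = rng.choice(items)
            while neg_item in seen:  # resample until the item is unseen
                neg_item = rng.choice(items)
            triples.append((user, pos_item, neg_item))
    return triples


# Hypothetical purchase log and catalogue.
purchases = [("u1", "a"), ("u1", "b"), ("u2", "c")]
catalogue = ["a", "b", "c", "d"]
for triple in sample_negatives(purchases, catalogue):
    print(triple)
```

The sampled items are treated as "less preferred than the purchased item", not as true zero ratings, which is a weaker assumption than writing a 0 for every unseen product.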

@irecsys
Owner

irecsys commented Oct 20, 2017 via email

@zeyboukli

Please, I am interested in the splitting approach.
In result.txt I find only:
Final Results by UserSplitting-BiasedMF, Pre5: 0,086689,Pre10: 0,056011, Rec5: 0,221093, Rec10: 0,259981, AUC: 0,661162, MAP: 0,161856, NDCG: 0,214975, MRR: 0,272366, numFactors: 10, numIter: 100, lrate: 0.02, maxlrate: -1.0, regB: 1.0E-4, regU: 1.0E-4, regI: 1.0E-4, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
but I would like to evaluate this approach using the root mean square error (RMSE).
Can you help me, please?
Thank you

@irecsys
Owner

irecsys commented Oct 18, 2018

> please , i interest to splitting approach;
> in result.txt i find just :
> Final Results by UserSplitting-BiasedMF, Pre5: 0,086689,Pre10: 0,056011, Rec5: 0,221093, Rec10: 0,259981, AUC: 0,661162, MAP: 0,161856, NDCG: 0,214975, MRR: 0,272366, numFactors: 10, numIter: 100, lrate: 0.02, maxlrate: -1.0, regB: 1.0E-4, regU: 1.0E-4, regI: 1.0E-4, regC: 0.001, isBoldDriver: true, Time: '00:00','00:00'
> but i would like display the evaluation of this approach by displaying root mean square error (RMSE)
> Can you help me please
> Thank you

In the configuration file, there is one option:
item.ranking=on -topN 10
If you set it to "off", you will get the results of the rating predictions.

@irecsys irecsys closed this as completed Oct 18, 2018