Factorization Machines Query #29

ChandraLingam · 2018-05-24T11:58:06Z

I have a very basic query: are factorization machines designed to work only with binary-valued features? Do we need to one-hot encode all features? How are real-valued features handled?

Thank you!

chihming · 2018-05-24T12:32:39Z

You can find the data format in Section 2 of the libFM 1.4.2 manual.
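The following is a rough sketch of that format. It assumes the libSVM-style layout the manual describes: a target value followed by space-separated index:value pairs. One-hot features are simply entries whose value is 1, and a real-valued feature is written with its actual number. The helper name to_libfm_line is made up for this sketch and is not part of libFM.

```python
def to_libfm_line(target, features):
    """Format one example as a sparse libFM-style line (hypothetical helper).

    features maps a feature index (int) to its value (a real number).
    Zero values are skipped because the format only lists non-zero entries.
    """
    parts = [str(target)]
    for index in sorted(features):  # sorted only for readability
        value = features[index]
        if value != 0:
            # .10g prints 1 as "1" and keeps up to 10 significant digits
            parts.append(f"{index}:{value:.10g}")
    return " ".join(parts)


# User and movie as one-hot indicators (value 1) plus one real-valued
# feature, here placed at index 2 with value 0.5.
print(to_libfm_line(2.5, {0: 1, 1: 1, 2: 0.5}))
# prints: 2.5 0:1 1:1 2:0.5
```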

ChandraLingam · 2018-05-24T19:13:46Z

Thank you. Yes, I did review the manual, and I was attempting to use the Perl script for CSV-to-libFM conversion.

I created a small CSV file using 16 rows from the MovieLens ratings dataset, and the script produced ratings_small.csv.libfm. The output does not seem to match the input (or at least I am not able to interpret what the script did):

triple_format_to_libfm.pl -in ratings_small.csv -target 2 -delete_column 3 -separator ","

transforming file ratings_small.csv to ratings_small.csv.libfm...

Input (ratings_small.csv):

userId,movieId,rating,timestamp
1,31,2.5,1260759144
2,10,4.0,835355493
2,17,5.0,835355681
2,39,5.0,835355604
2,47,4.0,835355552
2,50,4.0,835355586
2,52,3.0,835356031
2,62,3.0,835355749
2,110,4.0,835355532
2,144,3.0,835356016
2,150,5.0,835355395
3,60,3.0,1298861675
3,110,4.0,1298922049
3,247,3.5,1298861637
3,267,3.0,1298861761
3,7153,2.5,1298921787

Output (ratings_small.csv.libfm):

rating 0:1 1:1
2.5 2:1 3:1
4.0 4:1 5:1
5.0 4:1 6:1
5.0 4:1 7:1
4.0 4:1 8:1
4.0 4:1 9:1
3.0 4:1 10:1
3.0 4:1 11:1
4.0 4:1 12:1
3.0 4:1 13:1
5.0 4:1 14:1
3.0 15:1 16:1
4.0 15:1 12:1
3.5 15:1 17:1
3.0 15:1 18:1
2.5 15:1 19:1

chihming · 2018-05-25T02:13:20Z

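Please remove the first (header) line from ratings_small.csv and run the same command. The script treats the header as an ordinary data row, which is where the "rating 0:1 1:1" line came from. You will get: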

2.5 0:1 1:1
4.0 2:1 3:1
5.0 2:1 4:1
5.0 2:1 5:1
4.0 2:1 6:1
4.0 2:1 7:1
3.0 2:1 8:1
3.0 2:1 9:1
4.0 2:1 10:1
3.0 2:1 11:1
5.0 2:1 12:1
3.0 13:1 14:1
4.0 13:1 10:1
3.5 13:1 15:1
3.0 13:1 16:1
2.5 13:1 17:1

In this case,
the feature index 0 represents userId 1,
the feature index 1 represents movieId 31,
the feature index 2 represents userId 2,
the feature index 3 represents movieId 10,
and so on.
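The index assignment can be reproduced with a short Python sketch. This is not the Perl script's code. It assumes that each (column, value) pair outside the target and deleted columns gets the next free index the first time it appears, and that repeats reuse that index. This matches the outputs above: movieId 110 maps to index 10 for both user 2 and user 3. The function name triples_to_libfm is hypothetical. Column numbers are 0-based, as the -target 2 and -delete_column 3 flags appear to be.

```python
import csv
import io


def triples_to_libfm(rows, target_col, delete_cols=()):
    """Hypothetical re-creation of the index assignment shown above.

    Assumption: each (column, value) pair outside the target and deleted
    columns gets the next free index the first time it is seen, later
    occurrences reuse it, and every feature is written with value 1.
    """
    index_of = {}  # (column number, raw cell text) -> feature index
    lines = []
    for row in rows:
        parts = [row[target_col]]  # the target is copied through verbatim
        for col, value in enumerate(row):
            if col == target_col or col in delete_cols:
                continue
            key = (col, value)
            if key not in index_of:
                index_of[key] = len(index_of)
            parts.append(f"{index_of[key]}:1")
        lines.append(" ".join(parts))
    return lines


sample = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
2,10,4.0,835355493
2,17,5.0,835355681
"""
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row, the same effect as deleting the first line
for line in triples_to_libfm(list(reader), target_col=2, delete_cols={3}):
    print(line)
# 2.5 0:1 1:1
# 4.0 2:1 3:1
# 5.0 2:1 4:1
```

If the header row is passed in as data, it consumes indices 0 and 1 and produces "rating 0:1 1:1", which reproduces the first output in this thread.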

ChandraLingam · 2018-05-25T09:24:24Z

Thank you very much. One more follow-up question: does this script also handle real-valued features?
I added another feature at the end with random values. It appears that the script is one-hot encoding this column as well. Is there a way to preserve real-valued features as-is?

Input

1,31,2.5,1260759144,0.074345836
2,31,4,835355493,0.428518244
2,10,4,835355493,0.144215787
2,17,5,835355681,0.018740053
2,39,5,835355604,0.793609723
2,47,4,835355552,0.62908026
2,50,4,835355586,0.923838115
2,52,3,835356031,0.920521599
2,62,3,835355749,0.549236466
2,110,4,835355532,0.648895353
2,144,3,835356016,0.697152954
2,150,5,835355395,0.752723242
3,60,3,1298861675,0.803889224
3,110,4,1298922049,0.815850633
3,150,4,835355493,0.08505613
3,247,3.5,1298861637,0.268696775
3,267,3,1298861761,0.235652997
3,7153,2.5,1298921787,0.433312402

Output

2.5 0:1 1:1 2:1
4 3:1 1:1 4:1
4 3:1 5:1 6:1
5 3:1 7:1 8:1
5 3:1 9:1 10:1
4 3:1 11:1 12:1
4 3:1 13:1 14:1
3 3:1 15:1 16:1
3 3:1 17:1 18:1
4 3:1 19:1 20:1
3 3:1 21:1 22:1
5 3:1 23:1 24:1
3 25:1 26:1 27:1
4 25:1 19:1 28:1
4 25:1 23:1 29:1
3.5 25:1 30:1 31:1
3 25:1 32:1 33:1
2.5 25:1 34:1 35:1

chihming · 2018-05-25T09:35:15Z

I guess it doesn't support real-valued features, so it would be better to write your own transformation tool.

If you're not sure how to do that, you could try this Python code:
https://github.com/chihming/DataTransformer
and the instructions for converting the data to your required format:
https://github.com/chihming/DataTransformer/wiki/data2sparse
Note that this project has been abandoned, but it can still meet your requirements.
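If you do write your own tool, the sketch below shows one way to do it. It does not use DataTransformer's API, and the function name csv_to_libfm and its parameters are hypothetical. Categorical columns are still one-hot encoded per distinct value. Each column listed in real_cols gets a single index, and the cell's number is written as the feature value. Zero values are left out of the sparse line.

```python
import csv
import io


def csv_to_libfm(rows, target_col, real_cols=(), delete_cols=()):
    """Sketch of a CSV-to-libFM converter that keeps real-valued columns.

    Categorical columns are one-hot encoded: each distinct (column, value)
    pair gets its own index with value 1. Each column in real_cols gets a
    single index, and the cell's number is written as the feature value.
    """
    index_of = {}  # feature key -> index, shared across all rows

    def index_for(key):
        if key not in index_of:
            index_of[key] = len(index_of)
        return index_of[key]

    lines = []
    for row in rows:
        parts = [row[target_col]]
        for col, value in enumerate(row):
            if col == target_col or col in delete_cols:
                continue
            if col in real_cols:
                if float(value) != 0.0:  # zeros can be left out of a sparse line
                    parts.append(f"{index_for((col, None))}:{value}")
            else:
                parts.append(f"{index_for((col, value))}:1")
        lines.append(" ".join(parts))
    return lines


sample = """1,31,2.5,1260759144,0.074345836
2,31,4,835355493,0.428518244
"""
rows = list(csv.reader(io.StringIO(sample)))
for line in csv_to_libfm(rows, target_col=2, real_cols={4}, delete_cols={3}):
    print(line)
# 2.5 0:1 1:1 2:0.074345836
# 4 3:1 1:1 2:0.428518244
```

The Perl script allocated a new index for every random value. Here, by contrast, the extra column keeps index 2 on every line. Because index_of is built only from the rows passed in, converting a training file and a test file in separate calls would give them inconsistent indices. You would need to convert them together or reuse the same mapping.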

ChandraLingam · 2018-05-25T09:58:08Z

Thank you for the prompt response and clarification; I appreciate it. I will close this issue for now.

ChandraLingam closed this as completed May 25, 2018
