Building datasets #318

Closed
ghost opened this issue Jun 22, 2018 · 16 comments

Comments

@ghost

ghost commented Jun 22, 2018

Hello !

Thank you for this open source package, it helps a lot and your work is amazing.

I just have a silly question about dataset construction. I followed the example for my data:
user (160,000 x 300) and item (4,000 x 4).

dataset = Dataset()
dataset.fit(users=(x['id_user'] for x in user),
            items=(x['id_item'] for x in item),
            user_features=((x['id_user'], [[x[col] for col in list_columns_user]]) for x in user),
            item_features=((x['id_item'], [[x[col] for col in list_columns_item]]) for x in item))

But when I call dataset.user_features_shape() I get (160000, 160000). Shouldn't I get (160000, 300) instead?

Indeed, we can read in the documentation :

Returns
-------
(num user ids, num user features): tuple of ints

and my number of user features is 300. So is there an error in what I did?

Sorry for the stupid question!

@maciejkula
Collaborator

This is the expected result. By default, LightFM adds a feature per every user and item. You can disable that in the constructor.

@maciejkula
Collaborator

(Well, you should get 160000 x 160300 or something like that. Are your feature names the same as some of your user ids?)

@ghost
Author

ghost commented Jun 22, 2018

Oh ok.

No, none of my feature names are the same as user ids.

@maciejkula
Collaborator

maciejkula commented Jun 22, 2018

What data are you passing as user_features and item_features? It should just be a list (or other iterable) of the names of user/item features.

(So if you have 300 user features it should be 300 long).
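A minimal sketch of the shape arithmetic described above, assuming the lightfm package is installed for the second function. The helper names `expected_user_features_shape` and `fit_with_feature_names` are hypothetical and only serve the illustration. With the default identity features, the column count is the number of users plus the number of feature names, so an empty `user_features` iterable would be consistent with the (160000, 160000) reported earlier.

```python
def expected_user_features_shape(n_users, n_feature_names, identity=True):
    """Rows are users; columns are identity features (if enabled) plus named features."""
    n_columns = n_feature_names + (n_users if identity else 0)
    return (n_users, n_columns)


def fit_with_feature_names(user_ids, item_ids, user_feature_names):
    """Fit a Dataset, passing only the *names* of the user features."""
    # Lazy import: this part needs the lightfm package.
    from lightfm.data import Dataset

    dataset = Dataset()
    dataset.fit(users=user_ids, items=item_ids, user_features=user_feature_names)
    return dataset


print(expected_user_features_shape(160000, 300))  # (160000, 160300)
print(expected_user_features_shape(160000, 0))    # (160000, 160000)
```

Calling `fit_with_feature_names(ids, item_ids, list_columns_user).user_features_shape()` is one way to check the result against the arithmetic.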

@ghost
Author

ghost commented Jun 22, 2018

I misunderstood this line in the example : dataset.fit_partial(items=(x['ISBN'] for x in get_book_features()), item_features=(x['Book-Author'] for x in get_book_features()))

I was trying to do the same thing, obviously the wrong way.

After changing it to pass just the feature names, I get the right result. Thank you!

@ghost ghost closed this as completed Jun 22, 2018
@ghost
Author

ghost commented Jun 22, 2018

Sorry to re-open this, but after that I continued by building the users/items features (always following the example and the documentation):

# Create user matrix
user_features = dataset.build_user_features(((x['id_user'],list_columns_user) for x in user), True)
print(repr(user_features))

Like in the documentation :

data (iterable of the form) – (user id, [list of feature names])

x['id_user'] being my user id and list_columns_user being my list of feature names. But when I look at one row of user_features, I get 0 everywhere except at the index of that row. In other words, user_features is just the identity matrix.

Example:

user_features[1, :].todense()
Out[31]: matrix([[0., 1., 0., ..., 0., 0., 0.]], dtype=float32)

Is this the expected result? If yes, I think I don't really understand how the user features matrix is built and how it differs from collaborative filtering.

@ghost ghost reopened this Jun 22, 2018
@maciejkula
Collaborator

You need to pass an iterable of tuples of (id, [list of features for that id]) into build_features. It looks like at the moment you're passing the same features for every user?
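A minimal sketch of the per-user tuples described above, assuming the feature columns hold numeric values. It uses the (user id, {feature name: feature weight}) form that `build_user_features` also accepts. The CSV content, the column names `age` and `n_orders`, and the helper `user_feature_tuples` are made up for the illustration.

```python
import csv
import io

# Hypothetical stand-in for the user CSV file.
CSV_TEXT = "id_user,age,n_orders\nu1,31,4\nu2,27,9\n"
list_columns_user = ["age", "n_orders"]


def user_feature_tuples(rows, columns):
    """Yield (user id, {feature name: weight}) with this user's own values."""
    for row in rows:
        yield row["id_user"], {col: float(row[col]) for col in columns}


# Materialize the rows so they can be reused for fit() and build_user_features().
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
tuples = list(user_feature_tuples(rows, list_columns_user))
print(tuples[0])  # ('u1', {'age': 31.0, 'n_orders': 4.0})

# With lightfm installed and the column names already passed to
# fit(..., user_features=list_columns_user), the tuples would be used as:
# user_features = dataset.build_user_features(tuples, normalize=True)
```

Each tuple carries values read from that user's row, rather than the same list of column names repeated for every user.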

@ghost
Author

ghost commented Jun 22, 2018

At the moment I pass the user feature names, in fact; that's what I read in the doc.
I tried with the actual list of features for each id, like that:

user_features = dataset.build_user_features(((x['id_user'], x[list_columns_user]) for x in user), True)
(user is a csv.DictReader like in your example)

but it didn't change a thing :(

@maciejkula
Collaborator

maciejkula commented Jun 22, 2018

Are you passing features for the second user? Is the resulting matrix an identity matrix? Can you post a short gist that reproduces this?

It may be useful to print some of the elements in your iterator and make sure that they are what you think they are. Is x[list_columns_user] an iterable of things?

@maciejkula
Collaborator

(If you think the docs are unclear on this point please make a PR with improvements.)

@maciejkula
Collaborator

maciejkula commented Jun 22, 2018

One more pointer: if you are using generators, you can only iterate over a generator once: subsequent iterations will yield zero elements. Maybe you are creating one csv reader, using it for fit, then trying to use it again for build_features?
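A short sketch of the pitfall described above, using only the standard library. The CSV content is made up; the point is that a `csv.DictReader` is exhausted after one pass, and that reading it into a list is one simple way around that.

```python
import csv
import io

# Hypothetical stand-in for the user CSV file.
CSV_TEXT = "id_user,age\nu1,31\nu2,27\n"

reader = csv.DictReader(io.StringIO(CSV_TEXT))
first_pass = [row["id_user"] for row in reader]
second_pass = [row["id_user"] for row in reader]  # the reader is already exhausted
print(first_pass)   # ['u1', 'u2']
print(second_pass)  # []

# One fix: materialize the rows once, then iterate over the list as often as needed.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
ids_for_fit = [row["id_user"] for row in rows]
ids_for_build = [row["id_user"] for row in rows]
print(ids_for_fit == ids_for_build)  # True
```

For files too large to hold in memory, re-opening the file and creating a fresh reader for each pass would be the alternative.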

@ghost
Author

ghost commented Jun 25, 2018

Sorry for the late answer, I no longer had access to the data.
You nailed it! It was the generator problem; I didn't know that we can't iterate over one more than once. I don't really know how to deal with this type, so I redid all the processing with pandas (I know it better), and I think it works!
I built a new user_features and it is a diagonal matrix with 0.0454 everywhere on the diagonal. Is it expected that the value is the same across the whole diagonal?

I didn't normalize my values, I just used the parameter normalize=True while building the user/item features (with build_item_features and build_user_features). Could this lead to a mistake?

@maciejkula
Collaborator

If all users have the same number of features the value on the diagonal will be the same for all of them.
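A small arithmetic sketch of why that happens, assuming non-negative weights and that normalize=True rescales each row so its weights sum to 1. The helper `normalize_row` is hypothetical and is not the library's code. The count of 22 nonzero entries is only an illustration; the thread does not state how many features each user had.

```python
def normalize_row(weights):
    """Scale a row of non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical user with 22 nonzero entries, all with weight 1.0
# (for example one identity feature plus 21 named features).
row = normalize_row([1.0] * 22)
print(row[0])  # 1/22, roughly 0.0455

# A user with a different number of features ends up with a different value.
other = normalize_row([1.0] * 10)
print(other[0])  # 0.1
```

Every user with the same number of equally weighted features therefore gets the same value, including on the identity column.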

@ghost
Author

ghost commented Jun 25, 2018

Got it. Thanks a lot, I really appreciate your help.

@ghost ghost closed this as completed Jun 25, 2018
@kewlcoder

kewlcoder commented Oct 3, 2018

This is the expected result. By default, LightFM adds a feature per every user and item. You can disable that in the constructor.

Hi Maciej,
I couldn't find anything in the constructor that can disable the addition of a feature per every user & item. Can you please help?
Documentation link - [https://lyst.github.io/lightfm/docs/lightfm.html]

class lightfm.LightFM(no_components=10, k=5, n=10, learning_schedule='adagrad', loss='logistic', learning_rate=0.05, rho=0.95, epsilon=1e-06, item_alpha=0.0, user_alpha=0.0, max_sampled=10, random_state=None)
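A hedged sketch of where the switch appears to live: the per-user and per-item identity features are options of the `lightfm.data.Dataset` constructor (`user_identity_features`, `item_identity_features`), not of the `LightFM` model class quoted above. The helper names `make_dataset` and `demo_shape` are hypothetical, and the code assumes lightfm is installed.

```python
def make_dataset(identity_features=False):
    """Create a Dataset with the per-user/per-item identity features on or off."""
    # Lazy import: this part needs the lightfm package.
    from lightfm.data import Dataset

    return Dataset(
        user_identity_features=identity_features,
        item_identity_features=identity_features,
    )


def demo_shape():
    """Fit a tiny dataset without identity features and report the shape."""
    dataset = make_dataset(identity_features=False)
    dataset.fit(users=[1, 2, 3], items=[10, 20], user_features=["a", "b"])
    return dataset.user_features_shape()
```

With identity features disabled, the second dimension of `demo_shape()` should count only the named features passed to `fit`.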

@igorkf

igorkf commented Nov 15, 2020

Is it possible to pass more than one user feature for each user?
Like:

[
  (0, {'category': 'horror', 'sex': 'male'}),
  (1, {'category': 'romantic', 'sex': 'female'}),
   ...
]
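One possible approach, sketched under assumptions: because the dict form accepted by `build_user_features` maps feature names to numeric weights, categorical attributes like the ones above can be turned into "key:value" tokens and passed as a list per user. The token format and the helper `to_tokens` are a convention chosen for this sketch, not something the library prescribes.

```python
users = [
    (0, {"category": "horror", "sex": "male"}),
    (1, {"category": "romantic", "sex": "female"}),
]


def to_tokens(attributes):
    """Turn {'category': 'horror'} into ['category:horror'] (hypothetical convention)."""
    return [f"{key}:{value}" for key, value in attributes.items()]


# Every distinct token, to be passed as user_features when fitting the dataset.
feature_vocabulary = sorted({tok for _, attrs in users for tok in to_tokens(attrs)})

# (user id, [list of feature names]) tuples for build_user_features.
user_feature_tuples = [(uid, to_tokens(attrs)) for uid, attrs in users]

print(feature_vocabulary)
print(user_feature_tuples[0])  # (0, ['category:horror', 'sex:male'])

# With lightfm installed, these would be used roughly as:
# dataset.fit(users=[uid for uid, _ in users], items=item_ids,
#             user_features=feature_vocabulary)   # item_ids is a placeholder
# user_features = dataset.build_user_features(user_feature_tuples)
```

Each user can carry as many tokens as needed, as long as every token was included in the vocabulary given to `fit`.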

This issue was closed.