Is it suitable for a big amount of data? What is mean_max_target? #2

Open
A-khedri opened this issue Jul 30, 2019 · 2 comments

Comments

@A-khedri

A-khedri commented Jul 30, 2019

Hi, thank you for providing the paper and the code.

I wanted to ask whether you recommend this approach for a large amount of data, say around 2000 features. I tried the code with that many features, setting k=600 as the number of selected features, but it took too much time: the number of epochs kept doubling on every tryout, apparently because training was not converging. Does this approach usually take this long?

I also wanted to ask about convergence. What I understood from the code is that the selection is considered done once we reach 'mean_max_target=0.998'; otherwise it keeps doubling the number of epochs until it reaches that value, right? I set the tryout limit to only 4 and the epoch number to 50, so the training stopped after 400 epochs. However, the selected features were redundant, i.e. I got an array of indexes such as [ 305, 310, 822, 310, 310, 310, 310, 305, 305, 310, 310, 310, 310, 305, 305, 310, 305, 310, 305, 305, 310, 305, 310, 310, 310, 771, …. ]
The size was indeed 600 features though. Is there any guidance on how to set the tryout limit and the epoch number for this many features, and what does mean_max_target mean?
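To quantify the redundancy described above, a minimal sketch that counts distinct features in a selection may help. The helper name `summarize_selection` is hypothetical and not part of the repository, and the list below is only the first ten indices quoted in the post; the rest of the array is elided there and is not reconstructed here.

```python
from collections import Counter

def summarize_selection(indices):
    """Return the number of distinct features and a dict of the repeated ones."""
    counts = Counter(indices)
    # Keep only the feature indices that were chosen by more than one node.
    repeated = {idx: n for idx, n in counts.items() if n > 1}
    return len(counts), repeated

# First ten entries of the index array quoted in the question (illustrative only).
selected = [305, 310, 822, 310, 310, 310, 310, 305, 305, 310]

num_distinct, repeated = summarize_selection(selected)
print(f"{num_distinct} distinct features out of {len(selected)} selections")
print("repeated:", repeated)
```

A large gap between the number of selections and the number of distinct features suggests that many selector nodes ended up on the same few inputs.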

@A-khedri A-khedri changed the title Is it suitable for a big amount of data Is it suitable for a big amount of data? What is mean_max_target? Aug 5, 2019
@mfbalin
Owner

mfbalin commented Aug 25, 2019

Yes, it takes considerably longer on large data. You have to increase the epoch number until the algorithm converges. In our paper, we had a dataset with 100000 points and 10000 features. For the algorithm to converge, we had to set the epoch number to around 5000 for a decoder with no hidden layers.
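As a rough illustration of the epoch budget, the sketch below assumes the doubling behaviour described in the question (the epoch count doubles after each tryout that does not converge). The function names are hypothetical, and the 5000-epoch figure applies to the dataset from the paper, not necessarily to other data.

```python
def epoch_schedule(num_epochs, tryout_limit):
    """Epochs used on each tryout, assuming the count doubles after every failed tryout."""
    return [num_epochs * 2 ** i for i in range(tryout_limit)]

def tryouts_to_reach(start_epochs, target_epochs):
    """Number of tryouts before a single tryout runs for at least target_epochs."""
    tryouts, epochs = 1, start_epochs
    while epochs < target_epochs:
        epochs *= 2
        tryouts += 1
    return tryouts

# Settings from the question: 50 starting epochs, tryout limit of 4.
schedule = epoch_schedule(num_epochs=50, tryout_limit=4)
print("epochs per tryout:", schedule)
print("longest single tryout:", schedule[-1])

# Tryouts needed, starting from 50 epochs, before one tryout covers ~5000 epochs.
print("tryouts to reach 5000 epochs:", tryouts_to_reach(50, 5000))
```

Under these assumptions, starting directly from a larger epoch number is cheaper than relying on many doublings, because every failed tryout also has to be paid for.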

@mfbalin
Owner

mfbalin commented Aug 25, 2019

mean_max_target is used as follows: the nodes in the concrete select layer select features with some probability according to the current weights. We want those weights to become one-hot so that each node always selects a single feature. mean_max_target is a target value that ensures the feature selection has finished.
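One way to picture this, as a sketch only: assume the monitored statistic is the mean, over selector nodes, of each node's largest selection probability, where the probabilities are a softmax over that node's weights. That reading is an assumption based on the parameter name and the explanation above; the functions below are illustrative and are not the repository's API.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one node's weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_max(logits_per_node):
    """Mean over nodes of the largest selection probability of each node."""
    return sum(max(softmax(row)) for row in logits_per_node) / len(logits_per_node)

# Two selector nodes over four input features (toy values).
undecided = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]  # uniform probabilities
decided = [[20.0, 0.0, 0.0, 0.0], [0.0, 0.0, 20.0, 0.0]]  # nearly one-hot

for name, logits in [("undecided", undecided), ("decided", decided)]:
    value = mean_max(logits)
    print(name, round(value, 4), "finished" if value >= 0.998 else "not finished")
```

With uniform weights each node's largest probability is only 1 divided by the number of features, so the statistic approaches 1 only when every node has committed to a single feature.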

2 participants