
Verbose Optimizer Output #58

Closed · felixmaximilian opened this issue Jul 8, 2016 · 5 comments

felixmaximilian commented Jul 8, 2016

Is there any way to see the details of the optimization process, either while it runs or as a summary afterwards?
I am currently struggling with a dying kernel when I increase n_iter, and it's impossible to find the cause without any debug output.
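
fastFM's BPR solver does not appear to expose a verbose flag, so one crude workaround is to refit with a growing n_iter and evaluate held-out pairs after each stage. A minimal sketch; X, train_pairs, and test_pairs are hypothetical stand-ins for a feature matrix and (positive, negative) pair index arrays:

import numpy as np
from fastFM import bpr

def pairwise_accuracy(scores, pairs):
    # fraction of pairs where the positive row outscores the negative one
    return np.mean(scores[pairs[:, 0]] > scores[pairs[:, 1]])

for n_iter in (1000, 5000, 10000, 50000):
    fm = bpr.FMRecommender(n_iter=n_iter, init_stdev=0.01, step_size=0.1,
                           rank=100, random_state=11)
    # full refit per stage: slow, but needs no changes to the solver
    fm.fit(X, train_pairs)
    acc = pairwise_accuracy(fm.predict(X), test_pairs)
    print("n_iter=%d pairwise accuracy=%.4f" % (n_iter, acc))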

felixmaximilian (author) commented

I managed to get one more line of output by using the pyspark console instead of the notebook:
a segmentation fault, which you can see on the very last line.

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Python version 2.7.10 (default, Dec  8 2015 18:25:23)
SparkContext available as sc, HiveContext available as sqlContext.
>>> import cPickle as pickle
>>> from scipy import io, sparse
>>> preferencesLocalArray = pickle.load(open("preferences.pickle", "rb"))
>>> features = io.mmread(open("sparse_features.mmw", "rb"))
>>> features = features.tocsc()
>>> # shuffle pairwise preferences for train and test split
... import numpy as np
>>> np.random.seed(123L)
>>> random_indices = np.random.randint(preferencesLocalArray.shape[0], size=preferencesLocalArray.shape[0])
>>> preferencesLocalArrayShuffled = np.array(preferencesLocalArray[random_indices])
>>>
>>> train_percentage = 95
>>> trainIdx = range(int(preferencesLocalArray.shape[0] / 100.0 * train_percentage))
>>> testIdx = range(int(preferencesLocalArray.shape[0] / 100.0 * train_percentage), preferencesLocalArray.shape[0])
>>> posExamples = preferencesLocalArrayShuffled[testIdx, 0]
>>> negExamples = preferencesLocalArrayShuffled[testIdx, 1]
>>> from fastFM import bpr
>>> import numpy as np
>>>
>>> fm = bpr.FMRecommender(n_iter=70000, init_stdev=0.01, l2_reg_w=.2, l2_reg_V=1., step_size=.1, rank=100, random_state=11)
>>>
>>> fm.fit(features, preferencesLocalArrayShuffled)
Segmentation fault

ibayer (owner) commented Jul 10, 2016

@felixmaximilian
It looks like the error occurs in the solver, which is implemented as a C extension using Cython.
There are basically two ways to debug this:

  1. Use the Cython debugger: http://docs.cython.org/src/userguide/debugging.html
  2. Run your data through the C cli (https://github.com/ibayer/fastFM-core) with gdb.

I usually go with the second option. Unfortunately, the BPR SGD implementation is very sensitive to the hyperparameter settings, especially step_size. Bad settings can lead to vanishing or exploding gradients that crash fastFM. This should definitely be improved at some point.
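
For the gdb route, one common shortcut (not specific to fastFM) is to run the Python reproduction itself under gdb rather than the C cli; crash_repro.py is a hypothetical script that loads the data and calls fit:

$ gdb --args python crash_repro.py
(gdb) run
...
Program received signal SIGSEGV, Segmentation fault.
(gdb) bt

bt then prints the C-level backtrace, which should point at the offending spot in the Cython/C solver.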

felixmaximilian (author) commented

Thanks for your assistance. What would be your suggestion for improving the BPR SGD? An Adam-like dynamic step_size?
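
For reference, a minimal sketch of the Adam update (Kingma & Ba, 2014), i.e. the kind of per-parameter dynamic step size suggested above; illustrative only, not fastFM code:

import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # running estimates of the gradient mean and uncentered variance
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad ** 2
    # bias-correct the estimates (t is the 1-based step count)
    m_hat = m / (1.0 - b1 ** t)
    v_hat = v / (1.0 - b2 ** t)
    # the effective step size shrinks where gradients are large or noisy,
    # which also guards against the exploding updates mentioned above
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v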

felixmaximilian (author) commented

Thanks to your help I found the bug in my own code. It turns out the solver doesn't like duplicate training samples, which isn't surprising. The gdb output was very helpful here!
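
Worth noting for readers with the same symptom: np.random.randint(n, size=n) samples with replacement, so the shuffle in the session above is a likely source of the duplicates. A sketch of two ways to avoid them:

import numpy as np

# shuffle without replacement instead of randint (which can repeat indices)
idx = np.random.permutation(preferencesLocalArray.shape[0])
shuffled = preferencesLocalArray[idx]

# or drop exact duplicate (pos, neg) rows; axis= needs numpy >= 1.13
deduped = np.unique(shuffled, axis=0)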

ibayer (owner) commented Jul 12, 2016

@felixmaximilian
Glad to hear that you fixed your problem.

"solver doesn't like duplicate training samples"

Can you expand on this? It could be good to add a check for this on the Python side.
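
A sketch of what such a check could look like; pairs is the (n, 2) array of (positive, negative) row indices passed to fit, and the helper name is hypothetical:

import numpy as np

def check_no_duplicate_pairs(pairs):
    # np.unique with axis=0 treats each (pos, neg) row as one item
    # (the axis= argument needs numpy >= 1.13)
    n_unique = np.unique(pairs, axis=0).shape[0]
    if n_unique != pairs.shape[0]:
        raise ValueError("found %d duplicate preference pairs"
                         % (pairs.shape[0] - n_unique))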
