Cannot import name 'RGFClassifier' #7

Closed
JoshuaC3 opened this issue Oct 12, 2016 · 15 comments

@JoshuaC3

JoshuaC3 commented Oct 12, 2016

I am getting the above error. I have built rgf1.2 and tested it using rgf1.2's own Perl test script, which works. I have installed rgf_python and run the Python setup as specified. I have changed the two folder locations to the rgf1.2..\rgf executable and a temp folder that exists.

In Python, when I try to import, I get the error Cannot import name 'RGFClassifier'. I tried running the exact code in the test.py script provided with rgf_python and the same error occurs.

Strangely, I have /usr/local/lib/python3.5/dist-packages/rgf_sklearn-0.0.0-py3.5.egg/rgf in my path when I run

import sys
sys.path

in Python. Also, in /usr/local/lib/python3.5/dist-packages I only have rgf_sklearn-0.0.0-py3.5.egg and no rgf-sklearn, which I would have expected because the following appeared towards the end of the setup.py output:

Extracting rgf_sklearn-0.0.0-py3.5.egg to /usr/local/lib/python3.5/dist-packages
Adding rgf-sklearn 0.0.0 to easy-install.pth file
@fukatani fukatani reopened this Oct 12, 2016
@fukatani
Member

Thank you for your report!

Unfortunately, rgf is the name of both the package directory and the module inside it.
So we should import from rgf.rgf instead of rgf.

In the old version, test.py was placed in the same directory as rgf.py,
so test.py could import RGFClassifier with from rgf import RGFClassifier.

I have moved test.py to another directory and fixed its import statement.
Could you try from rgf.rgf import RGFClassifier, as the current version of test.py does?

@JoshuaC3
Author

JoshuaC3 commented Oct 13, 2016

So I am getting some unusual behaviour now. In a Jupyter notebook from rgf.rgf import RGFClassifier seems to import the RGFClassifier class correctly. The test script ran until I got this error,

ERROR: /run/user/1000/jupyter/kernel-74918c1f-c829-4d4d-9d5b-0251eaa36408 (unittest.loader._FailedTest)
----------------------------------------------------------------------
AttributeError: module '__main__' has no attribute '/run/user/1000/jupyter/kernel-74918c1f-c829-4d4d-9d5b-0251eaa36408'

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)
An exception has occurred, use %tb to see the full traceback.

SystemExit: True


---------------------------------------------------------------------------
SystemExit                                Traceback (most recent call last)
/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py in run_code(self, code_obj, result)
   2868                 #rprint('Running code', repr(code_obj)) # dbg
-> 2869                 exec(code_obj, self.user_global_ns, self.user_ns)
   2870             finally:

<ipython-input-3-35dda55e8cd1> in <module>()
     79 if __name__ == '__main__':
---> 80     unittest.main()

Maybe this is a separate issue to be raised. This looked like a Jupyter notebook error so back to importing.

I tried to import rgf.rgf with python from the terminal using the following,

josh@josh-HP-ZBook-17-G2:~/rgf_python/rgf$ python3 test.py
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    from rgf.rgf import RGFClassifier, RGFRegressor
ImportError: No module named 'rgf.rgf'; 'rgf' is not a package

I then tried to run this from a python shell and got pretty much the same issue,

>>> from rgf.rgf import RGFClassifier
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'rgf.rgf'; 'rgf' is not a package

I am very confused but please ask for any extra information that might be of use

@fukatani
Member

fukatani commented Oct 13, 2016

Could you copy test.py to another directory and run it from there?

In the terminal, if test.py and rgf.py are located in the same directory,
test.py imports rgf.py directly.
That is, rgf resolves to the module, not to the package directory.
This happens because Python searches the script's own directory first when importing.

In the newest rgf_python, I moved test.py to test/test.py.
Sorry for the inconvenience.
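
The shadowing described here can be reproduced in isolation. The following is a minimal sketch, assuming a throwaway package called shadow_demo (a hypothetical name, chosen to avoid clashing with an installed rgf) that mimics the rgf/rgf.py layout. It runs the same import from two working directories in fresh interpreters.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical package name; it stands in for the rgf/rgf.py layout.
PKG = "shadow_demo"
IMPORT_LINE = "from {0}.{0} import Marker; print('ok')".format(PKG)


def try_import(cwd, extra_path):
    """Run the import in a fresh interpreter started from cwd."""
    env = dict(os.environ, PYTHONPATH=extra_path)
    # Make sure the current directory is still put first on sys.path.
    env.pop("PYTHONSAFEPATH", None)
    proc = subprocess.run(
        [sys.executable, "-c", IMPORT_LINE],
        cwd=cwd, env=env,
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return proc.returncode == 0, proc.stderr


def run_demo():
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, PKG)
        os.mkdir(pkg_dir)
        # A package marker plus a module with the same name as the package.
        open(os.path.join(pkg_dir, "__init__.py"), "w").close()
        with open(os.path.join(pkg_dir, PKG + ".py"), "w") as f:
            f.write("class Marker:\n    pass\n")

        # Started outside the package, shadow_demo resolves to the directory.
        outside_ok, _ = try_import(cwd=root, extra_path=root)
        # Started inside it, shadow_demo.py in the current directory is found
        # first, so shadow_demo is a plain module without submodules.
        inside_ok, inside_err = try_import(cwd=pkg_dir, extra_path=root)
        return outside_ok, inside_ok, inside_err


if __name__ == "__main__":
    outside_ok, inside_ok, inside_err = run_demo()
    print("outside:", outside_ok, "inside:", inside_ok)
    print(inside_err)
```

The second import fails with the same "is not a package" message shown in the terminal session above, which is why moving test.py out of the rgf directory resolves it.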

@fukatani
Member

In a Jupyter notebook, unittest does not work the same way as in the terminal.

Please see:
http://stackoverflow.com/questions/37895781/unable-to-run-unittests-main-function-in-ipython-jupyter-notebook

I think this is not an rgf_python problem.
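
For reference, unittest.main() reads the process arguments and then calls sys.exit(), which is what produces the AttributeError and SystemExit inside a notebook kernel. Below is a minimal sketch of one way to run tests in a notebook cell with the standard library only. ExampleTest is a hypothetical stand-in for the test class in test.py, and run_in_notebook is an illustrative helper name.

```python
import io
import unittest


class ExampleTest(unittest.TestCase):
    # Hypothetical stand-in for the tests defined in test.py.
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def run_in_notebook(test_case):
    """Run one TestCase class without reading sys.argv or calling sys.exit()."""
    suite = unittest.TestLoader().loadTestsFromTestCase(test_case)
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    # Show the usual unittest report in the cell output.
    print(stream.getvalue())
    return result


result = run_in_notebook(ExampleTest)
```

An alternative with a similar effect is unittest.main(argv=['ignored'], exit=False), which replaces the kernel's arguments and suppresses the exit.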

@JoshuaC3
Author

I have copied and pasted this into a different directory. I ran it and the package imported as expected. Thank you for your help.
However, on running the test another issue has arisen. There are 4 errors in test.py. They are all the same as this one, in that they raise IndexError: list index out of range, but for the 4 different model types: classifier, bin, regression and softmax. Here is regression,

ERROR: test_regressor (__main__.TestRGFClassfier)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 71, in test_regressor
    y_pred = reg.predict(X_test)
  File "/usr/local/lib/python3.5/dist-packages/rgf_sklearn-0.0.0-py3.5.egg/rgf/rgf.py", line 463, in predict
    latest_model_loc = sorted(glob(model_glob), reverse=True)[0]
IndexError: list index out of range

Many thanks for your help and for creating this Python wrapper!!

Yes, I think the Jupyter notebook issue is not an rgf_python issue.

@fukatani
Member

fukatani commented Oct 13, 2016

You're welcome! 😃

Your error is caused by a failure to load the trained model file.
(I should probably make the error message more informative.)

Could you list all the file names under loc_temp?
I think they do not include a trained model file.

If there is none, the likely reason is one of the following:

  1. Failed to call the rgf executable.
  2. Failed to save the trained model under loc_temp,
    e.g. loc_temp points to a directory writable only by the superuser.
  3. Some other error related to the rgf executable.

In my experience, once this error is resolved, rgf becomes usable.
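
A more informative failure than IndexError could look roughly like the sketch below. It assumes model files share a common prefix inside loc_temp. The function name find_latest_model and its arguments are hypothetical, and this is not the rgf_python implementation.

```python
import glob
import os
import tempfile


def find_latest_model(loc_temp, prefix):
    """Return the last model file by name, or raise a descriptive error.

    Hypothetical helper: loc_temp and prefix mirror names used in this
    thread, but the function itself is only an illustration.
    """
    # glob.escape keeps special characters in the directory name literal.
    pattern = os.path.join(glob.escape(loc_temp), prefix + "*")
    candidates = sorted(glob.glob(pattern), reverse=True)
    if not candidates:
        present = sorted(os.listdir(loc_temp))
        raise RuntimeError(
            "No model file matching {0!r}; files present: {1}. "
            "Check that the rgf executable ran and that the directory "
            "is writable.".format(pattern, present)
        )
    return candidates[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "train.data.x"), "w").close()
        try:
            find_latest_model(demo_dir, "model_c0")
        except RuntimeError as exc:
            print(exc)
```

Listing the files that are present in the message makes it clear at a glance whether training produced any output at all.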

@JoshuaC3
Author

JoshuaC3 commented Oct 14, 2016

No, there is no model file. Here is what I have in there,

~/Documents/Python Scripts/temp$ dir
test.data.x  train.data.x  train.data.y

What is the best way to check for each of the possible errors listed above?

Many thanks

@fukatani
Member

Thanks!
Good, so the data files were written successfully, but the trained model was not.
So cause 2 did not occur.

Now I suspect that cause 1 occurred.
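
Causes 1 and 2 can also be checked directly from Python. The following is a minimal sketch using only os.path and os.access. The function name diagnose and its two arguments are hypothetical and would be filled with the executable path and temp directory configured for rgf_python.

```python
import os
import sys
import tempfile


def diagnose(exe_path, loc_temp):
    """Return a list of readable problems; an empty list means both checks passed."""
    problems = []
    # Cause 1: the executable is missing or cannot be run by this user.
    if not os.path.isfile(exe_path):
        problems.append("executable not found: " + exe_path)
    elif not os.access(exe_path, os.X_OK):
        problems.append("executable is not runnable: " + exe_path)
    # Cause 2: the temp directory is missing or not writable by this user.
    if not os.path.isdir(loc_temp):
        problems.append("temp directory not found: " + loc_temp)
    elif not os.access(loc_temp, os.W_OK):
        problems.append("temp directory is not writable: " + loc_temp)
    return problems


if __name__ == "__main__":
    # Demo with paths that should exist on any machine running this script.
    print(diagnose(sys.executable, tempfile.gettempdir()))
```

These checks only cover whether the files are reachable. They do not catch an executable that starts but rejects its arguments.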

Could you run the following example?
https://github.com/fukatani/rgf_python/blob/master/example/cross_validation_for_iris.py

And please paste the console output. It is informative.

CAUTION! I updated the example today, so please use the newest version.
You only have to update your local repository, or copy and paste the example script into any directory.

The beginning of the output in my environment is shown here.

"train": 
   algorithm=RGF_Sib
   train_x_fn=temp/train.data.x
   train_y_fn=temp/train.data.y
   Log:ON
   model_fn_prefix=temp/model_c0
--------------------
Sat Oct 15 23:19:14 2016: Reading training data ... 
Sat Oct 15 23:19:14 2016: Start ... #train=99
--------------------
Forest-level: 
   loss=Log
   max_leaf_forest=400
   max_tree=200
   opt_interval=100
   test_interval=100
   num_tree_search=1
   Verbose:ON
   memory_policy=Generous
Turning on Force_to_refresh_all
-------------
Training data: 4x99, nonzero_ratio=1; managed as dense data.

@JoshuaC3
Author

JoshuaC3 commented Oct 16, 2016

I copied the new cross_validation_for_iris.py and saved it as new_iris.py in my Documents directory. I ran it from there and got this,

/usr/local/lib/python3.5/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
b'
Usage: /home/josh/rgf1.2/bin/rgf  train  parameters
   "train"         Train and save models to files.
   parameters:     keyword-value pairs (e.g., "algorithm=RGF") and options
                   (e.g., "NormalizeTarget") delimited by "," described below.
Example parameters:
       algorithm=RGF,train_x_fn=data.x,train_y_fn=data.y,reg_L2=0.1,...
   Below, "*" indicates the required parameters that cannot be omitted.
   [ Parameters for "train" ]
   "algorithm="         RGF|RGF_Opt|RGF_Sib (Default:RGF)
 * "train_x_fn="        Path to the feature file of training data.
 * "train_y_fn="        Path to the target file of training data.
 * "model_fn_prefix="   To save models, path names are generated by attaching
                        "-01", "-02",... to this value.
   To optionally specify the weights of individual data points:
   "train_w_fn="        Path to the file of user-defined weights assigned to
                        training data points.
   To optionally do warm-start with an existing model:
   "model_fn_for_warmstart="
                        Path to the input model file from which training
                        should do warm-start.

   [ Parameters for RGF (default algorithm) ]
 * "reg_L2="            lambda.  Regularization coefficient.
   "reg_sL2="           For node search, override lambda with this value.
   "loss="              Loss function (Default:LS)
                        LS|Log|Expo
                        LS: Square loss (p-y)^2/2
                        Log: Log loss log(1+exp(-py)) for y=1,-1
                        Expo: Exponential loss exp(-py) for y=1,-1
   "max_leaf_forest="   Stop training when the number of leaf nodes in the
                        forest reaches this number. (Default:10000)
   "opt_interval="      Weight optimization interval in terms of #leaf.
                        (Default:100)
   "test_interval="     Approximate test interval in terms of #leaf. Must be
                        multiple or divisor of the optimization interval for
                        efficiency; otherwise, it may be changed by the system
                        automatically. (Default:500)
   "num_tree_search="   Number of trees to be searched for the nodes to split.
                        The most recently-grown trees are searched first.
                        (Default:1)
   "reg_depth="         gamma>=1.  A larger value penalizes deeper nodes more
                        severely.  Used with lambda as in lambda*gamma^depth.
                        (Default:1)
   "NormalizeTarget"    For training, normalize training targets so that the
                        average becomes zero.  Intended for regression.
   "num_iteration_opt=" Used in the iterative optimization of weights. 
                        Maximum number of iterations. (Default:10 for square
                        loss; 5 for exponential loss and the likes)
   "opt_stepsize="      Used in the iterative optimization of weights.  Step
                        size of Newton updates. (Default:0.5)
   "min_pop="           Minimum number of training data points in each leaf
                        node. (Default:10)
   "Time"               Measure elapsed time for node search and weight
                        optimization.
   "Verbose"            Print information during training.
   "memory_policy="     Conservative|Generous. (Default:Generous)


   ---------------------------------------------------------
   To display parameters for other algorithms, enter: 
             /home/josh/rgf1.2/bin/rgf train  algorithm_name

      Example:  /home/josh/rgf1.2/bin/rgf train RGF_Sib
   ---------------------------------------------------------

   List of algorithm names: 
   "RGF"                Regularized greedy forest
   "RGF_Opt"            RGF w/min-penalty regularization
   "RGF_Sib"            RGF w/min-penalty regularization w/sum-to-zero sibling
                        constraints
'
None
b'
Usage: /home/josh/rgf1.2/bin/rgf  train  parameters

   "train"         Train and save models to files.

   parameters:     keyword-value pairs (e.g., "algorithm=RGF") and options
                   (e.g., "NormalizeTarget") delimited by "," described below.

     Example parameters:
       algorithm=RGF,train_x_fn=data.x,train_y_fn=data.y,reg_L2=0.1,...

   Below, "*" indicates the required parameters that cannot be omitted.

   [ Parameters for "train" ]
   "algorithm="         RGF|RGF_Opt|RGF_Sib (Default:RGF)
 * "train_x_fn="        Path to the feature file of training data.
 * "train_y_fn="        Path to the target file of training data.
 * "model_fn_prefix="   To save models, path names are generated by attaching
                        "-01", "-02",... to this value.

   To optionally specify the weights of individual data points:
   "train_w_fn="        Path to the file of user-defined weights assigned to
                        training data points.

   To optionally do warm-start with an existing model:
   "model_fn_for_warmstart="
                        Path to the input model file from which training
                        should do warm-start.


   [ Parameters for RGF (default algorithm) ]
 * "reg_L2="            lambda.  Regularization coefficient.
   "reg_sL2="           For node search, override lambda with this value.
   "loss="              Loss function (Default:LS)
                        LS|Log|Expo
                        LS: Square loss (p-y)^2/2
                        Log: Log loss log(1+exp(-py)) for y=1,-1
                        Expo: Exponential loss exp(-py) for y=1,-1
   "max_leaf_forest="   Stop training when the number of leaf nodes in the
                        forest reaches this number. (Default:10000)
   "opt_interval="      Weight optimization interval in terms of #leaf.
                        (Default:100)
   "test_interval="     Approximate test interval in terms of #leaf. Must be
                        multiple or divisor of the optimization interval for
                        efficiency; otherwise, it may be changed by the system
                        automatically. (Default:500)
   "num_tree_search="   Number of trees to be searched for the nodes to split.
                        The most recently-grown trees are searched first.
                        (Default:1)
   "reg_depth="         gamma>=1.  A larger value penalizes deeper nodes more
                        severely.  Used with lambda as in lambda*gamma^depth.
                        (Default:1)
   "NormalizeTarget"    For training, normalize training targets so that the
                        average becomes zero.  Intended for regression.
   "num_iteration_opt=" Used in the iterative optimization of weights. 
                        Maximum number of iterations. (Default:10 for square
                        loss; 5 for exponential loss and the likes)
   "opt_stepsize="      Used in the iterative optimization of weights.  Step
                        size of Newton updates. (Default:0.5)
   "min_pop="           Minimum number of training data points in each leaf
                        node. (Default:10)
   "Time"               Measure elapsed time for node search and weight
                        optimization.
   "Verbose"            Print information during training.
   "memory_policy="     Conservative|Generous. (Default:Generous)


   ---------------------------------------------------------
   To display parameters for other algorithms, enter: 
             /home/josh/rgf1.2/bin/rgf train  algorithm_name

      Example:  /home/josh/rgf1.2/bin/rgf train RGF_Sib
   ---------------------------------------------------------

   List of algorithm names: 
   "RGF"                Regularized greedy forest
   "RGF_Opt"            RGF w/min-penalty regularization
   "RGF_Sib"            RGF w/min-penalty regularization w/sum-to-zero sibling
                        constraints
'
None
b'
Usage: /home/josh/rgf1.2/bin/rgf  train  parameters

   "train"         Train and save models to files.

   parameters:     keyword-value pairs (e.g., "algorithm=RGF") and options
                   (e.g., "NormalizeTarget") delimited by "," described below.

     Example parameters:
       algorithm=RGF,train_x_fn=data.x,train_y_fn=data.y,reg_L2=0.1,...

   Below, "*" indicates the required parameters that cannot be omitted.

   [ Parameters for "train" ]
   "algorithm="         RGF|RGF_Opt|RGF_Sib (Default:RGF)
 * "train_x_fn="        Path to the feature file of training data.
 * "train_y_fn="        Path to the target file of training data.
 * "model_fn_prefix="   To save models, path names are generated by attaching
                        "-01", "-02",... to this value.

   To optionally specify the weights of individual data points:
   "train_w_fn="        Path to the file of user-defined weights assigned to
                        training data points.

   To optionally do warm-start with an existing model:
   "model_fn_for_warmstart="
                        Path to the input model file from which training
                        should do warm-start.


   [ Parameters for RGF (default algorithm) ]
 * "reg_L2="            lambda.  Regularization coefficient.
   "reg_sL2="           For node search, override lambda with this value.
   "loss="              Loss function (Default:LS)
                        LS|Log|Expo
                        LS: Square loss (p-y)^2/2
                        Log: Log loss log(1+exp(-py)) for y=1,-1
                        Expo: Exponential loss exp(-py) for y=1,-1
   "max_leaf_forest="   Stop training when the number of leaf nodes in the
                        forest reaches this number. (Default:10000)
   "opt_interval="      Weight optimization interval in terms of #leaf.
                        (Default:100)
   "test_interval="     Approximate test interval in terms of #leaf. Must be
                        multiple or divisor of the optimization interval for
                        efficiency; otherwise, it may be changed by the system
                        automatically. (Default:500)
   "num_tree_search="   Number of trees to be searched for the nodes to split.
                        The most recently-grown trees are searched first.
                        (Default:1)
   "reg_depth="         gamma>=1.  A larger value penalizes deeper nodes more
                        severely.  Used with lambda as in lambda*gamma^depth.
                        (Default:1)
   "NormalizeTarget"    For training, normalize training targets so that the
                        average becomes zero.  Intended for regression.
   "num_iteration_opt=" Used in the iterative optimization of weights. 
                        Maximum number of iterations. (Default:10 for square
                        loss; 5 for exponential loss and the likes)
   "opt_stepsize="      Used in the iterative optimization of weights.  Step
                        size of Newton updates. (Default:0.5)
   "min_pop="           Minimum number of training data points in each leaf
                        node. (Default:10)
   "Time"               Measure elapsed time for node search and weight
                        optimization.
   "Verbose"            Print information during training.
   "memory_policy="     Conservative|Generous. (Default:Generous)


   ---------------------------------------------------------
   To display parameters for other algorithms, enter: 
             /home/josh/rgf1.2/bin/rgf train  algorithm_name

      Example:  /home/josh/rgf1.2/bin/rgf train RGF_Sib
   ---------------------------------------------------------

   List of algorithm names: 
   "RGF"                Regularized greedy forest
   "RGF_Opt"            RGF w/min-penalty regularization
   "RGF_Sib"            RGF w/min-penalty regularization w/sum-to-zero sibling
                        constraints
'
None
Traceback (most recent call last):
  File "new_iris.py", line 34, in <module>
    rgf_score += rgf.score(xs_test, y_test)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/base.py", line 349, in score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
  File "/usr/local/lib/python3.5/dist-packages/rgf_sklearn-0.0.0-py3.5.egg/rgf/rgf.py", line 231, in predict
    proba = self.predict_proba(X)
  File "/usr/local/lib/python3.5/dist-packages/rgf_sklearn-0.0.0-py3.5.egg/rgf/rgf.py", line 201, in predict_proba
    class_proba = clf.predict_proba(X)
  File "/usr/local/lib/python3.5/dist-packages/rgf_sklearn-0.0.0-py3.5.egg/rgf/rgf.py", line 330, in predict_proba
    latest_model_loc = sorted(glob(model_glob), reverse=True)[0]
IndexError: list index out of range

Please note: The code above was not formatted 100% correctly and so I replaced all \n with actual newlines for readability.

If you need any more information, please ask :)

@fukatani
Member

Thanks a lot!
rgf_python succeeded in calling the executable, but the arguments seem to be invalid.
This is why the usage text was displayed in your environment.
Let's check the arguments.

To check them, we have to debug either by inserting print statements or by using an IDE.

If you want to use the former method, please rewrite platform_specific_Popen and insert print calls as follows.

def platform_specific_Popen(cmd, **kwargs):
    print("Output command arguments... {0}".format(cmd))
    print("Output kwargs... {0}".format(kwargs))
    print("Output sysname... {0}".format(sys_name))
    if sys_name == WINDOWS:
        return subprocess.Popen(cmd.split(), **kwargs)
    elif sys_name == LINUX:
        return subprocess.Popen(cmd, **kwargs)

Here is the result in my environment.

Output command arguments... C:\Users\rf\Documents\python\rgf1.2\bin\rgf.exe train Verbose,train_x_fn=temp/train.data.x,train_y_fn=temp/train.data.y,algorithm=RGF_Sib,loss=Log,max_leaf_forest=400,test_interval=100,reg_L2=0.1,reg_sL2=0.1,reg_depth=1,model_fn_prefix=temp/model_c2 2>&1
Output kwargs... {'shell': True, 'stdout': -1}
Output sysname... Windows

@JoshuaC3
Author

JoshuaC3 commented Oct 16, 2016

From the print/debug function I get

Output command arguments... /home/josh/rgf1.2/bin/rgf train Verbose,train_x_fn=/home/josh/Documents/Python Scripts/temp/train.data.x,train_y_fn=/home/josh/Documents/Python Scripts/temp/train.data.y,algorithm=RGF_Sib,loss=Log,max_leaf_forest=400,test_interval=100,reg_L2=0.1,reg_sL2=0.1,reg_depth=1,model_fn_prefix=/home/josh/Documents/Python Scripts/temp/model_c2 2>&1
Output kwargs... {'stdout': -1, 'shell': True}
Output sysname... Linux

@fukatani
Member

OK!
Could you change loc_temp to another directory whose name does not include a space?
A space is recognized as a delimiter character.
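
The effect of the space can be seen without running rgf at all. Below is a minimal sketch for POSIX shells, with a shortened parameter string built from the paths in this thread. It shows how shell-style splitting breaks the argument and two standard-library ways around it. It is not the fix applied in rgf_python.

```python
import shlex

exe = "/home/josh/rgf1.2/bin/rgf"
# Shortened, illustrative parameter string containing a path with a space.
params = "train_x_fn=/home/josh/Documents/Python Scripts/temp/train.data.x,reg_L2=0.1"

# Built by plain string formatting, the command is split at the space,
# so the parameter string arrives as two separate arguments.
unquoted = "{0} train {1}".format(exe, params)
print(shlex.split(unquoted))

# shlex.quote() protects each piece when a shell command string is needed.
quoted = "{0} train {1}".format(shlex.quote(exe), shlex.quote(params))
print(shlex.split(quoted))

# Alternatively, pass a list with shell=False so no shell parsing happens:
# subprocess.Popen(argv, stdout=subprocess.PIPE)
argv = [exe, "train", params]
```

shlex.quote targets POSIX shells only, so on Windows the list form is the safer of the two options.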

@JoshuaC3
Author

JoshuaC3 commented Oct 17, 2016

I have changed it. It all seems to be working now as the iris script ran and gave the 0.95.. answer. Thank you greatly for your help and sorry that it was such a silly error.

@JoshuaC3
Author

This is also working with my Jupyter notebook now! :)

@fukatani
Member

fukatani commented Oct 17, 2016

That's good!

Don't worry 😃
Thanks to the discussion here, I found a point where rgf_python can be improved.
On Windows, paths that include a space sometimes occur.
So fixing this will make rgf_python more useful and stable.

Please feel free to ask questions or send a PR.
