Make pysurvival work with scikit-learn #15

Open
pransito opened this issue Oct 24, 2019 · 42 comments

Comments

@pransito

I have noticed that PySurvival does not really follow the principles of scikit-learn, starting with the fact that you pass X, T, E instead of X, y. Furthermore, GridSearchCV cannot be used, both because of this signature and because the model objects have no set_params method. (The same goes for scikit-learn's Pipeline, which only works after extensive reworking of many scikit-learn classes and functions.) It is unfortunate that this great package stays outside the sklearn ecosystem. Is there any plan to fix this and make PySurvival connectable to scikit-learn? Or am I missing something?
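
To make the gap concrete, here is a minimal sketch of the kind of adapter that would be needed. It assumes a model fitted with `fit(X, T, E)`; `ToyModel` and `XyAdapter` are hypothetical names used for illustration only, not PySurvival or scikit-learn classes. The adapter accepts `y` as (time, event) pairs and exposes `get_params`/`set_params`, which are the methods GridSearchCV relies on.

```python
import inspect


class ToyModel:
    """Hypothetical stand-in for a model that is fitted with X, T, E."""

    def __init__(self, l2_reg=0.01):
        self.l2_reg = l2_reg

    def fit(self, X, T, E):
        # Record what was received so the example can be inspected.
        self.n_samples_ = len(X)
        self.n_events_ = sum(E)
        return self


class XyAdapter:
    """Hypothetical wrapper exposing fit(X, y), get_params and set_params."""

    def __init__(self, model_cls, **params):
        self.model_cls = model_cls
        # Start from the constructor defaults of the wrapped model.
        signature = inspect.signature(model_cls.__init__)
        self.params = {
            name: p.default
            for name, p in signature.parameters.items()
            if name != "self" and p.default is not inspect.Parameter.empty
        }
        self.set_params(**params)

    def get_params(self, deep=True):
        return dict(self.params)

    def set_params(self, **params):
        unknown = set(params) - set(self.params)
        if unknown:
            raise ValueError(f"Unknown parameters: {sorted(unknown)}")
        self.params.update(params)
        return self

    def fit(self, X, y):
        # y is a sequence of (time, event) pairs; split it into T and E.
        T = [time for time, _ in y]
        E = [event for _, event in y]
        self.model_ = self.model_cls(**self.params).fit(X, T, E)
        return self


adapter = XyAdapter(ToyModel).set_params(l2_reg=0.5)
adapter.fit([[1.0], [2.0], [3.0]], [(5, 1), (8, 0), (2, 1)])
```

A real integration would also have to satisfy scikit-learn's cloning rules, which this sketch does not attempt.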

@pransito pransito changed the title Make pyurvival work with scikit-learn Make pysurvival work with scikit-learn Oct 24, 2019
@bacalfa

bacalfa commented Jan 28, 2020

FYI, I'm working on a solution to this issue. I expect to have something in a few days.

@bacalfa

bacalfa commented Jan 29, 2020

I'm happy to announce that I think I have a clean solution to this issue. Please pull from master in my forked repository: https://github.com/bacalfa/pysurvival.

If you installed it with setup.py, first uninstall the current version with:

  • python -m pip uninstall pysurvival

Then reinstall it:

  • python setup.py build_ext --inplace (to rebuild the package)
  • python setup.py install --user (to install the files to your local directories)

Make sure to check out the new notebook explaining the new feature. Comments and feedback are welcome!

@camferna

Omg thank you so much! hahaha

@JCCKwong

This comment has been minimized.

@bacalfa

bacalfa commented Mar 21, 2020

@JCCKwong, can you give more details on the steps you're taking and what happens after you execute them? Also, did you clone my forked repository (https://github.com/bacalfa/pysurvival) instead of the one from the original author (https://github.com/square/pysurvival)?

@JCCKwong

This comment has been minimized.

@bacalfa

bacalfa commented Mar 21, 2020

First change the current directory to C:\Users\Jethro\pysurvival.

cd C:\Users\Jethro\pysurvival

Then run the python commands described above in #15 (comment).

@JCCKwong

This comment has been minimized.

@DashengSong

Can it work with StratifiedKFold?

@bacalfa

bacalfa commented Apr 17, 2020

@DashengSong, have you tried it? I don't think I have.
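
For anyone who wants to test this: scikit-learn's `StratifiedKFold.split(X, y)` stratifies on whatever labels are passed as `y`, so with survival data the event indicator is a natural choice. The sketch below is a hand-rolled, standard-library illustration of that idea, not the scikit-learn implementation, and `stratified_folds` is a hypothetical helper. Whether the adapter accepts such folds has not been confirmed in this thread.

```python
from collections import defaultdict


def stratified_folds(events, n_splits):
    """Assign sample indices to folds so each fold keeps a similar event rate."""
    by_label = defaultdict(list)
    for index, event in enumerate(events):
        by_label[event].append(index)

    folds = [[] for _ in range(n_splits)]
    # Deal the indices of each class round-robin across the folds.
    for label in sorted(by_label):
        for position, index in enumerate(by_label[label]):
            folds[position % n_splits].append(index)
    return [sorted(fold) for fold in folds]


# 1 = event observed, 0 = censored (illustrative data).
E = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
folds = stratified_folds(E, n_splits=2)
event_counts = [sum(E[i] for i in fold) for fold in folds]
```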

@KaranMehta21

Hi @bacalfa. Thanks for creating a package that can be installed on Windows. I'm trying to use the sklearn compatibility feature you've added. Does it work with the random survival forest estimator too?

@bacalfa

bacalfa commented Apr 28, 2020

@KaranMehta21, I think it does. But there may be a caveat: #17.

@KaranMehta21

@bacalfa OK, I'll try it out. Is the benefit of using it that I can run cross-validation and hyperparameter tuning, and will that lead to higher c-indices? Currently, the RSF model I've trained has a c-index of 0.71. I'm looking for ways to bring it closer to 0.80. Any suggestions?

@bacalfa

bacalfa commented Apr 28, 2020

Honestly, I haven't used this package that much, so I'm not sure what to suggest. There are simpler and more complex models. It's a good habit to evaluate performance on a validation set (as in CV) and to perform hyperparameter tuning. It's difficult to know a priori which algorithm will be best, so try (and tune) as many as you can, and make sure you make a fair comparison between them.
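
For reference, the c-index being compared here can be sketched in a few lines. This is a simplified version of Harrell's concordance index written for illustration (it ignores tied event times); it is not the implementation shipped with pysurvival, whose `concordance_index` may handle censoring and ties differently.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ranked in the right order.

    A pair (i, j) is comparable when the sample with the shorter time had an
    observed event. Higher risk should go with the shorter time; ties in
    risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("No comparable pairs")
    return concordant / comparable


T = [2, 4, 6, 8]
E = [1, 1, 0, 1]
perfect = concordance_index(T, E, [4.0, 3.0, 2.0, 1.0])
reversed_order = concordance_index(T, E, [1.0, 2.0, 3.0, 4.0])
```

Computing this on held-out folds for each candidate model is one way to make the fair comparison mentioned above.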

@SurajitTest

SurajitTest commented May 17, 2020

Hi all, I would really appreciate it if anyone could help me. I have downloaded the package to C:\Users\User\Downloads\pysurvival-master, and Anaconda is installed at C:\Users\User. These are the steps I think I need to follow; please tell me whether they are correct so that I can carry out the installation properly.

Step 1: Create a directory C:\Users\User\pysurvival (as Anaconda is installed in C:\Users\User).
Step 2: Copy all contents from C:\Users\User\Downloads\pysurvival-master to C:\Users\User\pysurvival (setup.py is now in this location).
Step 3: Navigate to C:\Users\User\pysurvival using the command prompt.
Step 4: Run the two commands below:
python setup.py build_ext --inplace (to rebuild the package)
python setup.py install --user (to install the files to your local directories)

@CoteDave

Hi @bacalfa ,

I've tried your fork with the setup.py.

Unfortunately, it is still not working for me because of this line: extra_compile_args = ["/O2"]

The error occurs at: building 'pysurvival.utils._functions' extension

Error:
gcc: error: /O2: No such file or directory
error: command 'C:\MinGW\bin\gcc.exe' failed with exit status 1

Thanks!

@bacalfa

bacalfa commented Sep 23, 2020

@CoteDave, I don't have MinGW installed on my Windows machine (and it's not easy to do so). The error seems to suggest that /O2 is an option for the MS C/C++ compiler, which isn't recognized by MinGW. If you change line 61 in setup.py to the same thing as in line 63, I think it'd work. Let me know.
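
As an illustration of the point about compiler-specific options, the sketch below picks flags by compiler family. The flags themselves (`/O2` and `-std=c++11 -O3`) are the ones that appear in this thread; the helper `extra_compile_args_for` is hypothetical and is not the actual content of setup.py.

```python
def extra_compile_args_for(compiler_type):
    """Return optimization flags suited to the given compiler family.

    compiler_type follows the names used by distutils/setuptools compilers,
    such as "msvc", "mingw32" or "unix".
    """
    if compiler_type == "msvc":
        # MSVC uses slash-style options.
        return ["/O2"]
    # GCC-style compilers (including MinGW) use dash-style options.
    return ["-std=c++11", "-O3"]


msvc_flags = extra_compile_args_for("msvc")
mingw_flags = extra_compile_args_for("mingw32")
```

In a custom `build_ext` command the active compiler family is available as `self.compiler.compiler_type`, which avoids guessing from the operating system alone.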

@CoteDave

Hi @bacalfa, I changed line 61.

No more /O2 error, but sadly, a new error occurs at the same place:

building 'pysurvival.utils._functions' extension
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: C:...\Anaconda3\libs/libpython38.dll.a: error adding symbols: file format not recognized
collect2.exe: error: ld returned 1 exit status
error: command 'C:\MinGW\bin\g++.exe' failed with exit status 1

@bacalfa

bacalfa commented Sep 24, 2020

@CoteDave, that error looks similar to this one. See the suggestion there.
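
One common cause of a "file format not recognized" error at link time is a bitness mismatch, for example a 32-bit MinGW toolchain linking against the import library of a 64-bit Python. Whether that applies here is an assumption: the `mingw32` path in the log hints at it but does not prove it. A quick check of the interpreter side:

```python
import platform
import struct


def python_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


print(f"Python {platform.python_version()} is a {python_bits()}-bit build")
```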

@elopezfune

I would also like to make a suggestion.
Could you please add a Lasso regularization term to the loss functions of the Linear Multi-Task Logistic Regression and Linear SVM models, so that Ridge, Lasso or ElasticNet regularization can be done as in sklearn?
It would amount to adding a new parameter called "penalizer", so that line 191 of multi_task.py reads:
loss += penalizer*( l2_reg*torch.sum(w*w)/2. + (1.0-l2_reg)*torch.sum(np.sqrt(w*w)))

Therefore, if l2_reg=1, one is doing Ridge regularization, if l2_reg=0 one is doing Lasso regularization, and when 0<l2_reg<1 one is doing ElasticNet.
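
A plain-Python sketch of the proposed penalty, for illustration only: `elastic_net_penalty` is a hypothetical helper, not code from pysurvival. It uses the absolute value for the Lasso term, which is what the square root of w*w evaluates to element-wise.

```python
def elastic_net_penalty(weights, penalizer, l2_reg):
    """Blend of ridge and lasso penalties, scaled by an overall penalizer."""
    ridge = sum(w * w for w in weights) / 2.0
    lasso = sum(abs(w) for w in weights)
    return penalizer * (l2_reg * ridge + (1.0 - l2_reg) * lasso)


w = [1.0, -2.0, 3.0]
ridge_only = elastic_net_penalty(w, penalizer=1.0, l2_reg=1.0)  # pure Ridge
lasso_only = elastic_net_penalty(w, penalizer=1.0, l2_reg=0.0)  # pure Lasso
mixed = elastic_net_penalty(w, penalizer=2.0, l2_reg=0.5)       # ElasticNet
```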

@bacalfa

bacalfa commented Oct 22, 2020

@elopezfune, regarding your error, see if this helps.

I'll see if I can help with the regularization request and will let you know.

@bacalfa

bacalfa commented Oct 23, 2020

@elopezfune, I'd prefer to create a branch for this request. Let's call it elastic_net_loss.

For MTLR, I'd do:

loss += l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(torch.abs(w))

For consistency, I should probably apply the same change to other models. For SVM, that would require modifying Cython code (file _svm.pyx). I'll need more time to make sure I understand what changes to make. Any help is welcome. I'm actually not a user of this package at the moment. Just trying to help maintain it for others. :)

@elopezfune

Thanks for the quick answer.
I believe ElasticNet will give the users more flexibility to optimize survival models.

@elopezfune

Yes, a line of code like this is perfect!
loss += l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(torch.abs(w))

I tried once to change it manually on the local files, but I didn't have access to the optimization code (Cython), therefore, it didn't work.

@elopezfune

Indeed, a new parameter is needed, named penalizer or something similar, to set the overall strength of the penalty. l2_reg would then select between Ridge, Lasso and ElasticNet.

@bacalfa

bacalfa commented Oct 23, 2020

The following packages contain unfulfilled dependencies:
  python3-dev: Depends: libpython3-dev (= 3.8.2-0ubuntu2) but will not be installed
                Depends: python3.8-dev (> = 3.8.2-1 ~) but will not be installed
E: Unable to correct problems, bad packets are in "keep as is" mode.

What Python version do you have installed? It seems to be suggesting that you should have at least 3.8 to be able to install libpython3-dev.

These errors you're experiencing are specific to your Ubuntu system, not really to pysurvival. Once you have all the dependencies installed, you should be able to build pysurvival.

@elopezfune

I have Python 3.8.6

@bacalfa

bacalfa commented Oct 23, 2020

You'll have to do some searching on the errors you're getting. I can't reproduce it because I currently don't have access to Ubuntu. See this.

@elopezfune

Thanks, I'm trying to solve this problem; it is driving me crazy.

@bacalfa

bacalfa commented Oct 24, 2020

Adding support for l1 regularization to SVM isn't trivial. It requires modifications to Cython code (doable), but I can't find the reference for the formulation. And I don't have a lot of time to spend on this. If anyone would like to contribute or help, that'd be appreciated. SVM in this package doesn't use PyTorch (loss, gradient, and Hessian are manually implemented in Cython, so it's important to know the full formulation in order to modify it).

@elopezfune

I could help with that!
I just need a way to access the Cython code.

@bacalfa

bacalfa commented Oct 24, 2020

They're in cpp_extensions. For example: https://github.com/bacalfa/pysurvival/blob/master/pysurvival/cpp_extensions/_svm.pyx.

@byronmamamoney

Hi,
I've created an Ubuntu 18.04 AWS EC2 instance.
Installed python 3.6, 3.7 and 3.8 (just to make sure it is not due to the Python version)
Followed the installation steps as per https://square.github.io/pysurvival/installation.html
When running the "pip install pysurvival" command I get:

urvival/cpp_extensions/_functions.o -std=c++11 -O3
pysurvival/cpp_extensions/_functions.cpp:4:10: fatal error: Python.h: No such file or directory
#include "Python.h"
^~~~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc-8' failed with exit status 1

ERROR: Failed building wheel for pysurvival

Please can you assist with this?

Kind Regards
Byron

@bacalfa

bacalfa commented Nov 10, 2020

@byronmamamoney, please see the discussion with @elopezfune above. You'll have to Google the errors that come up. I don't have a way to test it on Ubuntu.
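
As a general pointer, "Python.h: No such file or directory" usually means that the development headers for the interpreter running the build are not installed (on Ubuntu these typically come from the python3-dev family of packages). A small sketch to see where the running interpreter expects the header; `python_header_path` is a hypothetical helper name:

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects Python.h to live."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


header = python_header_path()
print(header, "found" if os.path.exists(header) else "MISSING")
```

Running it with each installed Python version shows which interpreter is missing its headers.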

@byronmamamoney

Thanks @bacalfa I've got it working after removing the various python versions (except 3.6), the default for the Ubuntu box. Did a reboot and reinstalled the libraries.

Great work on the documentation here: https://square.github.io/pysurvival/index.html

Regards
Byron

@elopezfune

elopezfune commented Nov 11, 2020

@elopezfune

If I understand correctly, it is the percentage of each feature importance. It could be good to include the explanation of this feature in the documentation.

@pransito
Author

Hi, the issue comments seem to have drifted a bit off topic. @bacalfa, thanks for your work. Any chance you could open a pull request to merge your version into the official repo, so that your sklearn add-on becomes generally available?

@bacalfa

bacalfa commented Nov 22, 2020

@pransito, I think the original author isn't maintaining this package anymore, unfortunately. That's why I forked it and fixed a few issues. But you can try to reach out to him.

@andreas-kaae

andreas-kaae commented Apr 19, 2021

Awesome job @bacalfa, this is exactly what I was looking for!!

For other less experienced coders like me who need a bit more clarification, here are slightly more detailed steps, based on the explanations above, that may save you some time.

  1. Download the zip from https://github.com/bacalfa/pysurvival and unpack it to get the folder "pysurvival-master".
  2. Copy the folder to your user path. For me, this is: C:\Users\Andreas
  3. Open "Anaconda Prompt". If you're not using Anaconda, I assume the normal command prompt might also work, but I'm not sure.
  4. If you have pysurvival installed, uninstall it with: pip uninstall pysurvival
  5. Change directory by typing: cd C:\Users\Andreas\pysurvival-master
  6. Then rebuild by running: python setup.py build_ext --inplace
  7. Lastly, install by running: python setup.py install --user

The sklearn-adapter should now work.
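
A quick way to confirm that the install step succeeded is to ask the interpreter whether it can locate the package. This is a small standard-library sketch; `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name):
    """Return True if the top-level module can be located by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


print("pysurvival importable:", is_importable("pysurvival"))
```

Run it from the same prompt (and the same environment) that was used for the install.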

@dadekandrew2010

dadekandrew2010 commented Feb 19, 2022

With your modified version, MultiTaskModel works fine with scikit-learn. With NeuralMultiTaskModel, however, the following code fails.

My code:

NMTLModel_skl = sklearn_adapter(NeuralMultiTaskModel, time_col='time', event_col='status', predict_method="predict_survival", scoring_method=concordance_index)
mystructure = [{'activation': 'ReLU', 'num_units': 150}]
nmtlr_model_skl = NMTLModel_skl(structure=mystructure)
nmtlr_model_skl.fit(cli_train, ysur_train, init_method='orthogonal', optimizer='rprop', lr=1e-3, num_epochs=500, bins=150)
nmtlr_model_score = nmtlr_model_skl.score(x_test, y_test)
from sklearn.model_selection import cross_val_score
scores_nmtlr_cli = cross_val_score(estimator=nmtlr_model_skl, fit_params={"l2_reg": 1E-1}, X=x_test, y=y_test, cv=5)

This raises the error below, and I do not know how to fix it.

The error:

Cannot clone object SkLearnNeuralMultiTaskModel(auto_scaler=True, bins=150, structure=[{'activation': 'ReLU', 'num_units': 150}]), as the constructor either does not set or modifies parameter structure
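
For context on this message: scikit-learn's `clone` rebuilds the estimator from its parameters and then checks that each parameter comes back as the very same object that was passed to the constructor. A constructor that copies or rewrites an argument such as a list fails that check. The sketch below is a simplified, standard-library emulation of the check, with hypothetical classes; whether the adapter or NeuralMultiTaskModel actually treats `structure` this way is an assumption that has not been verified here.

```python
import copy


class StoresAsGiven:
    """Constructor keeps the argument untouched, as scikit-learn expects."""

    def __init__(self, structure=None):
        self.structure = structure

    def get_params(self, deep=True):
        return {"structure": self.structure}


class CopiesInInit(StoresAsGiven):
    """Constructor stores a rebuilt copy instead of the object it was given."""

    def __init__(self, structure=None):
        self.structure = [dict(layer) for layer in (structure or [])]


def can_clone(estimator):
    """Simplified emulation of the identity check performed when cloning."""
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    rebuilt = type(estimator)(**params)
    rebuilt_params = rebuilt.get_params()
    # Each parameter must come back as the same object that was passed in.
    return all(params[name] is rebuilt_params[name] for name in params)


layers = [{"activation": "ReLU", "num_units": 150}]
ok = can_clone(StoresAsGiven(structure=layers))
broken = can_clone(CopiesInInit(structure=layers))
```

If this is the cause, the usual remedy is to store constructor arguments unchanged and do any copying or validation inside `fit`.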


@cynmasetto

Hey @bacalfa, thanks so much for all the answers. I recently came across the library, and after performing all the steps I am still not able to install it on my Windows computer. I get the following error:

...\pysurvival> python setup.py build_ext --inplace      
running build_ext
building 'pysurvival.models._non_parametric' extension
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.35.32215\bin\HostX86\x64\cl.exe" /c /nologo /O2 /W3 /GL /DNDEBUG /MD -IC:\Users\CMasetto\AppData\Local\anaconda3\include -IC:\Users\CMasetto\AppData\Local\anaconda3\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.35.32215\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.35.32215\ATLMFC\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" /EHsc /Tppysurvival/cpp_extensions/_non_parametric.cpp /Fobuild\temp.win-amd64-cpython-310\Release\pysurvival/cpp_extensions/_non_parametric.obj -std=c++11 -O3
cl : Command line warning D9002 : ignoring unknown option '-std=c++11'
cl : Command line warning D9002 : ignoring unknown option '-O3'  
_non_parametric.cpp
pysurvival/cpp_extensions/_non_parametric.cpp(8246): error C2105: '++' needs l-value
pysurvival/cpp_extensions/_non_parametric.cpp(8248): error C2105: '--' needs l-value
pysurvival/cpp_extensions/_non_parametric.cpp(8510): error C2105: '++' needs l-value
pysurvival/cpp_extensions/_non_parametric.cpp(8512): error C2105: '--' needs l-value
pysurvival/cpp_extensions/_non_parametric.cpp(8947): error C2039: 'tp_print': is not a member of '_typeobject'
C:\Users\CMasetto\AppData\Local\anaconda3\include\cpython/object.h(191): note: see declaration of '_typeobject'
pysurvival/cpp_extensions/_non_parametric.cpp(8971): error C2039: 'tp_print': is not a member of '_typeobject'
C:\Users\CMasetto\AppData\Local\anaconda3\include\cpython/object.h(191): note: see declaration of '_typeobject'
pysurvival/cpp_extensions/_non_parametric.cpp(9661): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
pysurvival/cpp_extensions/_non_parametric.cpp(9677): warning C4996: '_PyUnicode_get_wstr_length': deprecated in 3.3
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\BuildTools\\VC\\Tools\\MSVC\\14.35.32215\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2

Am I missing anything? I've installed all the suggested libraries and compilers, and I don't know what else to do. I hope you can help me install it. Thanks.
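
A note on the `tp_print` errors: they are typical of C++ sources generated by an older Cython being compiled against a newer CPython. The `tp_print` slot no longer exists in Python 3.9 and later, and the build directory name in the log (`cpython-310`) indicates Python 3.10. Common remedies are to build in an environment with an older Python, or to regenerate the .cpp files from the .pyx sources with a recent Cython; neither has been confirmed in this thread. A hypothetical helper to check which situation applies:

```python
import sys


def pregenerated_sources_at_risk(version_info=sys.version_info):
    """Return True when the interpreter is new enough to lack tp_print (3.9+)."""
    return tuple(version_info[:2]) >= (3, 9)


if pregenerated_sources_at_risk():
    print("Python >= 3.9: old Cython-generated sources may fail to compile")
```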
