Oracle issue - Yuchen #238

Closed
amva13 opened this issue Apr 6, 2024 · 16 comments

@amva13
Member

amva13 commented Apr 6, 2024

Describe the bug

Dear TDC Team,

I hope this message finds you well. I am writing to report some technical issues I encountered while utilizing the oracle provided by TDC. Below are the details of the problems:

Problem 1: I have encountered an error after downloading the Oracle with the name "JNK3". The error message is as follows:
ValueError: node array from the pickle has an incompatible dtype:

  • expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
  • got: [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]

This problem also occurs with the oracle named "GSK3", but only when I pass a list of SMILES strings. Passing a single SMILES string to GSK3 does not trigger the error.

Problem 2: As mentioned above, passing a single SMILES string to the "GSK3" oracle does not raise an error. However, I have tried multiple active molecules targeting GSK3beta from ChEMBL, and the oracle consistently returns 0. This suggests there may be an issue with the "GSK3" oracle that requires your attention.

I hope you can address these issues promptly. Please let me know if you need any further information or details from my end.

Best regards,

Yuchen

@amva13
Member Author

amva13 commented Apr 6, 2024

@abearab ^ you can have a look at this

@miguelgondu

miguelgondu commented Apr 9, 2024

I'm having the same issue with GSK3B. Moreover, the behavior differs depending on whether I evaluate a list of SMILES or a single SMILES string: a single string gives 0.0, while a list raises the same error @amva13 reported for JNK3.

I wonder whether something changed in sklearn's random forests and how they are serialized. For reference, I'm using sklearn==1.3.0, which is the version in this project's requirements.txt.

@miguelgondu

The culprit for the discrepancy between lists and individual SMILES is the try-except block at L656 of the oracles implementation.

In other words, the loading of the oracle is failing silently, and thus the oracle returns the default value.

So we could try to solve two problems:

  1. Calling oracles on smile_str and [smile_str] should have the same behavior.
  2. Fixing the loading of the oracles for GSK3B and JNK3.
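A minimal sketch of the two points above, assuming a simplified oracle in which a loader unpickles the model. All names here (`score_single_silent`, `score`, `broken_loader`, `working_loader`) are hypothetical stand-ins, not TDC's actual code: the first function mimics a blanket try-except that hides a load failure behind a default score, and the second routes `smile_str` and `[smile_str]` through the same path so that a broken pickle raises in both cases.

```python
from typing import Callable, List, Sequence, Union

Model = Callable[[str], float]
Loader = Callable[[], Model]


def score_single_silent(smiles: str, loader: Loader) -> float:
    """Anti-pattern: any loading error is swallowed and a default score is returned."""
    try:
        model = loader()
    except Exception:
        return 0.0  # the real ValueError never reaches the caller
    return model(smiles)


def score(smiles: Union[str, Sequence[str]], loader: Loader) -> Union[float, List[float]]:
    """Route a single SMILES string and a list of them through the same code path."""
    model = loader()  # no blanket try-except: a broken pickle raises here
    single = isinstance(smiles, str)
    batch = [smiles] if single else list(smiles)
    scores = [model(s) for s in batch]
    return scores[0] if single else scores


def broken_loader() -> Model:
    # Hypothetical stand-in for unpickling an incompatible scikit-learn model.
    raise ValueError("node array from the pickle has an incompatible dtype")


def working_loader() -> Model:
    # Hypothetical stand-in model that returns a fixed score for every molecule.
    return lambda s: 0.5


print(score_single_silent("CCO", broken_loader))   # 0.0 (error hidden)
print(score(["CCO", "c1ccccc1"], working_loader))  # [0.5, 0.5]
```

With this shape, `score("CCO", broken_loader)` raises the same `ValueError` as the list call instead of quietly returning 0.0.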

I'm happy to volunteer on any of those!

@amva13
Member Author

amva13 commented Apr 9, 2024

Hi @miguelgondu, thanks for the find! For clarity, changing the try-except block would only reveal the real error, not fix it. What version of the package are you using? Could you try 0.4.1?

@miguelgondu

Hi @amva13,

Yes! Changing the try-except block only reveals the error. Fixing it would involve checking what changed with the pkl files/their loading, I imagine.

I've tried with both 0.4.1 and 0.4.6. Both have the same issue.

@amva13
Member Author

amva13 commented Apr 9, 2024

OK. This was to confirm the error is not due to recent release changes. I will start inspecting this error personally now. There might be something to your claim that sklearn==1.3.0 introduced a breaking change, so here is one thing to try while I look into it:

Build package 0.4.1 in a virtual environment (e.g. conda). 0.4.1 does not pin versions in requirements.txt, so this might fix the behavior.

amva13 self-assigned this Apr 9, 2024
@amva13
Member Author

amva13 commented Apr 9, 2024

This error is indeed caused by a mismatch between the format of the pickled object and the format expected by scikit-learn. This is partly due to a scikit-learn version upgrade.

See the reverse issue here: yzhao062/pyod#519

I'm evaluating some fixes and will push a new version of the package ASAP.

EDIT: Downgrading scikit-learn fixes the dtype issue but does not solve the underlying problem.
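To see exactly what the mismatch is, the two layouts in the traceback can be compared field by field. The sketch below copies the field lists verbatim from the reported error message; `missing_fields` and `mismatched_formats` are illustrative helpers, not part of TDC or scikit-learn.

```python
# Field layouts copied from the ValueError in the original report.
EXPECTED = [
    ("left_child", "<i8"), ("right_child", "<i8"), ("feature", "<i8"),
    ("threshold", "<f8"), ("impurity", "<f8"), ("n_node_samples", "<i8"),
    ("weighted_n_node_samples", "<f8"), ("missing_go_to_left", "u1"),
]
GOT = [
    ("left_child", "<i8"), ("right_child", "<i8"), ("feature", "<i8"),
    ("threshold", "<f8"), ("impurity", "<f8"), ("n_node_samples", "<i8"),
    ("weighted_n_node_samples", "<f8"),
]


def missing_fields(expected, got):
    """Return (name, format) pairs the runtime expects but the pickle lacks."""
    got_names = {name for name, _ in got}
    return [(name, fmt) for name, fmt in expected if name not in got_names]


def mismatched_formats(expected, got):
    """Return (name, expected_fmt, got_fmt) for shared fields whose formats differ."""
    got_map = dict(got)
    return [(name, fmt, got_map[name]) for name, fmt in expected
            if name in got_map and got_map[name] != fmt]


print(missing_fields(EXPECTED, GOT))      # [('missing_go_to_left', 'u1')]
print(mismatched_formats(EXPECTED, GOT))  # []
```

Every shared field has the same format; the only difference is the trailing `missing_go_to_left` flag. This is consistent with the pickles having been written by an older scikit-learn than the one loading them, which matches the explanation above.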

@amva13
Member Author

amva13 commented Apr 9, 2024

Hi @miguelgondu I believe I've solved it. Would you mind sharing some of the input SMILES strings which produced a 0.0 value for these oracles for you?

@miguelgondu

Hi @amva13, I used the one from the docs, 'CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', which should have a GSK3B score of 0.03 (at least according to the minimal example provided there).

amva13 added a commit that referenced this issue Apr 9, 2024
amva13 closed this as completed in 30dd806 Apr 9, 2024
@amva13
Member Author

amva13 commented Apr 9, 2024

Hi @miguelgondu, I just pushed the fix and will release the new package now. I'll let you know when you can install it.

@miguelgondu

Thanks! Looking forward.

@miguelgondu

Just FYI: I'm getting a warning on Thiothixene_Rediscovery that is similar in spirit to this issue:

InconsistentVersionWarning: Trying to unpickle estimator DecisionTreeClassifier from version 0.23.0 when using version 1.3.0. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
  https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
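scikit-learn only warns here, so an oracle can keep running on a model pickled with 0.23.0. One hedged option, following the pattern described in the scikit-learn model persistence docs, is to escalate this warning to an error while loading. The sketch assumes no scikit-learn install: `InconsistentVersionWarning` below is a local stand-in mirroring `sklearn.exceptions.InconsistentVersionWarning` (available in scikit-learn 1.3 and later), and `load_strict` and `fake_unpickle` are hypothetical.

```python
import warnings
from typing import Callable, TypeVar

T = TypeVar("T")


class InconsistentVersionWarning(UserWarning):
    """Local stand-in for sklearn.exceptions.InconsistentVersionWarning."""

    def __init__(self, *, estimator_name: str, current_sklearn_version: str,
                 original_sklearn_version: str) -> None:
        super().__init__(
            f"Trying to unpickle estimator {estimator_name} from version "
            f"{original_sklearn_version} when using version {current_sklearn_version}."
        )
        self.estimator_name = estimator_name
        self.current_sklearn_version = current_sklearn_version
        self.original_sklearn_version = original_sklearn_version


def load_strict(load_fn: Callable[[], T]) -> T:
    """Run a model loader, turning version-mismatch warnings into exceptions."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", InconsistentVersionWarning)
        return load_fn()


def fake_unpickle() -> str:
    # Hypothetical loader emitting the warning reported for Thiothixene_Rediscovery.
    warnings.warn(InconsistentVersionWarning(
        estimator_name="DecisionTreeClassifier",
        current_sklearn_version="1.3.0",
        original_sklearn_version="0.23.0",
    ))
    return "model"


try:
    load_strict(fake_unpickle)
except InconsistentVersionWarning as w:
    print(w.original_sklearn_version)  # 0.23.0
```

With scikit-learn 1.3 or later installed, the real class would be imported from `sklearn.exceptions` instead of defining the stand-in.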

@amva13
Member Author

amva13 commented Apr 9, 2024

Got it, thanks for pointing it out. The best solution is to re-pickle these models with a more recent scikit-learn (or to invoke the models in a different way entirely, avoiding the dependency issues altogether). For now the downgrade seems to work, though that particular classifier came from version 0.23.0, so it's not ideal. I'll flag this as a longer-term issue to look at.

@amva13
Member Author

amva13 commented Apr 9, 2024

@miguelgondu It's all fixed. You can install 0.4.7 for the working version.

Example:
https://colab.research.google.com/drive/17mGlLaVkfA2-0sqhbZlQ4cUI0JnFBpRq?usp=sharing

@miguelgondu

Hi @amva13 , thanks for the fix!

Checking with the other oracles in that specific version, something seems to break in deco hop. In the first example of the documentation (the same one I provided above) I went from getting 0.5338... to getting 0.0. Weird!

The rest of the oracles seem to work as expected, except for the ones in the issue I raised recently (#244).

Thanks again for the hard work.

@amva13
Member Author

amva13 commented Apr 10, 2024

Acknowledged; issue opened.
