Add scikit-fingerprints #359

Closed
4 tasks done
Hrovatin opened this issue Sep 3, 2024 · 12 comments · Fixed by #364
Labels: new feature (New functionality)

Comments

@Hrovatin
Collaborator

Hrovatin commented Sep 3, 2024

Replace the mordred and rdkit fingerprints with scikit-fingerprints and enable the other fingerprints from the package. The aim is to remove the rdkit and mordred installs.

Dev in: https://github.com/Hrovatin/baybe/tree/feature/scikit_fingerprints

Notes/Discuss:

  • Functions that use RDKit but are not fingerprint-related: do we keep RDKit then?
    • is_valid_smiles: not used anywhere
    • get_canonical_smiles
  • New automatic fingerprint naming will not be backward-compatible
  • mordred check in edbo: can this be used for any fingerprint (before, it was mordred and rdkit)?
  • Consider making the Fingerprint enum a class to make the code prettier (see TODOs in the enum code). EDIT: no longer relevant
@Scienfitz Scienfitz added the new feature New functionality label Sep 3, 2024
@Scienfitz
Collaborator

Scienfitz commented Sep 3, 2024

Thank you for taking this on. Summarizing our earlier conversation:

  • Update the optional dependency group chem in pyproject.toml
  • Update _optional/info.py and _optional/chem.py, in particular CHEM_INSTALLED
  • Remove the smiles_to_*_features functions in baybe/utils/chemistry.py. You can probably replace them with a single function. It would likely also be possible to do without any such function and put the logic directly into the substance parameter; however, we group all chemistry logic into this file so it can be lazily imported, so I guess the best option is to have one new utility function here
  • Update the core logic and attributes/validators interfacing users via the SubstanceParameter in baybe/parameters/substance.py
  • Automatically generate the enum SubstanceEncoding with all available choices in scikit-fingerprints. The enum likely has to be moved from baybe/parameters/enum.py to baybe/parameters/substance.py so the lazy import is still done
  • Replace all usages of the old encodings as strings or enums with the new ones
  • It appears the tests are already generic and don't need to be updated, but double-check this
  • Update the userguide in docs/userguide/parameters.md with the new choices for the encoding
  • Update and/or retest examples, in particular examples/Backtesting/full_lookup.py
  • Add yourself to CONTRIBUTORS.md
  • Mention this change in the CHANGELOG
  • Double check and expand (if needed) the hypothesis strategies for substance parameters in tests/hypothesis_strategies/parameters.py
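The checklist items on a single utility function and an automatically generated SubstanceEncoding could fit together roughly as follows. This is a minimal, stdlib-only sketch, not the actual BayBE or scikit-fingerprints code: the stand-in module `fingerprints`, the toy `_DemoFingerprint` transformer, the naming rule, and `smiles_to_fingerprint_features` are all hypothetical. In BayBE the real package would be imported lazily and the result would be a DataFrame rather than column names plus rows.

```python
import enum
import inspect
import types

# Hypothetical stand-in for the scikit-fingerprints module holding the
# fingerprint classes; in BayBE this would be a lazy import of the real package.
fingerprints = types.ModuleType("fingerprints")


class _DemoFingerprint:
    """Toy transformer (not chemistry): counts 'C', counts 'O', and the length."""

    def transform(self, smiles_list):
        return [[s.count("C"), s.count("O"), len(s)] for s in smiles_list]


class ECFPFingerprint(_DemoFingerprint):
    """Placeholder named after a real fingerprint class."""


class MACCSFingerprint(_DemoFingerprint):
    """Placeholder named after a real fingerprint class."""


fingerprints.ECFPFingerprint = ECFPFingerprint
fingerprints.MACCSFingerprint = MACCSFingerprint

# Discover every exported class whose name ends in "Fingerprint".
FINGERPRINT_CLASSES = {
    name: cls
    for name, cls in inspect.getmembers(fingerprints, inspect.isclass)
    if name.endswith("Fingerprint")
}

# Build the enum from the discovered classes (hypothetical naming rule:
# drop the "Fingerprint" suffix and upper-case the rest).
SubstanceEncoding = enum.Enum(
    "SubstanceEncoding",
    {name.removesuffix("Fingerprint").upper(): name for name in FINGERPRINT_CLASSES},
)


def smiles_to_fingerprint_features(smiles_list, encoding, prefix=""):
    """Single replacement for the old smiles_to_*_features helpers (hypothetical).

    Returns the column names and one feature row per SMILES string.
    """
    transformer = FINGERPRINT_CLASSES[encoding.value]()
    rows = transformer.transform(smiles_list)
    n_features = len(rows[0]) if rows else 0
    columns = [f"{prefix}{encoding.name}_{i}" for i in range(n_features)]
    return columns, rows
```

Because the enum values are the class names, the utility function needs no per-encoding branches; adding a fingerprint to the package would add an enum member and make it usable without further code changes.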

@Scienfitz
Collaborator

Scienfitz commented Sep 3, 2024

It's strange that is_valid_smiles is not used anywhere; we definitely used to validate SMILES at some point.

But here I see that the values corresponding to SMILES are validated with different logic in @data.validator rather than with is_valid_smiles. @AdrianSosic any idea why?
[screenshot not preserved]

Wouldn't value_validator=is_valid_smiles make the most sense? Or was there an issue with lazy loading?

@AdrianSosic
Collaborator

It was refactored at some point to handle SMILES in canonical form, and the canonicalization also performs the validity check internally (see the validator method):
[screenshot of the validator method not preserved]
But the other function was kept because it's still useful in its own right.
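To illustrate the design described above, here is a minimal sketch in which one canonicalization helper serves both as the validator and as the basis for a standalone is_valid_smiles. The `get_canonical_smiles` below is a toy stand-in that only checks balanced branch parentheses (the real one uses RDKit), raising ValueError on bad input is an assumption, and `validate_and_canonicalize` is a hypothetical name for the validator logic.

```python
def get_canonical_smiles(smiles: str) -> str:
    """Toy stand-in for the RDKit-based canonicalizer (NOT real chemistry).

    It only checks that branch parentheses are balanced and returns the input
    unchanged; by assumption, it raises ValueError on invalid input.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"Invalid SMILES: {smiles!r}")
    if not smiles or depth != 0:
        raise ValueError(f"Invalid SMILES: {smiles!r}")
    return smiles


def is_valid_smiles(smiles: str) -> bool:
    """Standalone check, kept because it is useful on its own."""
    try:
        get_canonical_smiles(smiles)
    except ValueError:
        return False
    return True


def validate_and_canonicalize(data: dict[str, str]) -> dict[str, str]:
    """Validator-style logic: canonicalizing every value also validates it."""
    canonical = {}
    for label, smiles in data.items():
        try:
            canonical[label] = get_canonical_smiles(smiles)
        except ValueError as ex:
            raise ValueError(
                f"The SMILES for label {label!r} is invalid: {smiles!r}"
            ) from ex
    return canonical
```

Running the check through the canonicalizer means each SMILES is parsed once, and the stored values are already in canonical form.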

@Hrovatin
Collaborator Author

Hrovatin commented Sep 3, 2024

@Scienfitz I ran pytest -fast and there are two errors that I am not sure about. If you could provide some guidance, that would be great.

FAILED tests/test_searchspace.py::test_searchspace_memory_estimate[grid5-parameter_names0] - AssertionError: ('Comp: ', 699840, 563760)
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-AtomPairFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...

One more question about the tests: do I need to run them separately in an environment where CHEM is not installed?
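For context on the CHEM question: an optional-dependency flag such as CHEM_INSTALLED can be computed by checking whether the packages are importable without actually importing them. The sketch below assumes the flag is derived with `importlib.util.find_spec`; the package list and helper name are hypothetical, and BayBE's actual `_optional/info.py` may do this differently.

```python
from importlib.util import find_spec

# Hypothetical content of the "chem" extra after the switch to scikit-fingerprints.
CHEM_PACKAGES = ("skfp",)


def packages_installed(*names: str) -> bool:
    """Return True if every named top-level package can be found (not imported)."""
    return all(find_spec(name) is not None for name in names)


CHEM_INSTALLED = packages_installed(*CHEM_PACKAGES)
```

Tests that need the extra could then be marked with `pytest.mark.skipif(not CHEM_INSTALLED, reason=...)`, so a run in an environment without the extra skips them instead of failing.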

@Hrovatin
Collaborator Author

Hrovatin commented Sep 3, 2024

Also, do I need to do something to regenerate the documentation SVGs, or will that be done automatically? I guess examples/Backtesting/full_lookup.py creates some of them; should I run it?

@Scienfitz
Collaborator

Could you please run
tox -e fulltest-py310
tox -e coretest-py312
(and also
tox -e lint-py312
tox -e mypy-py312
for the other checks)

This will probably give you the same errors, but it rules out an environment misconfiguration.

I have a suspicion about the first error, but it is impossible to help without seeing the code. You can already open the PR in draft mode.

@Scienfitz
Collaborator

Don't worry about the pictures at this moment; they actually shouldn't change much if the fingerprints from the package are implemented identically.

@AVHopp
Collaborator

AVHopp commented Sep 3, 2024

Regarding pictures: once everything else is fixed, just ping me about the pictures, @Hrovatin. I can then give you a heads-up, or we can discuss how to update them, but as Martin says, this is not really relevant at the moment.

@Hrovatin
Collaborator Author

Hrovatin commented Sep 4, 2024

Test results. For mypy I need to do a few updates and will add once finished.

tox -p -e lint-py312
  lint-py312: OK (19.49=setup[1.84]+cmd[0.01,17.63] seconds)
  congratulations :) (19.81 seconds)

tox -p -e coretest-py312
  coretest-py312: OK (269.43=setup[72.26]+cmd[0.01,197.16] seconds)
  congratulations :) (269.77 seconds)

tox -p -e fulltest-py310
=================================================================================================== short test summary info ====================================================================================================
FAILED tests/docs/test_examples.py::test_example[examples/Serialization/basic_serialization.py] - subprocess.CalledProcessError: Command '['python', 'examples/Serialization/basic_serialization.py']' returned non-zero exit status 1.
FAILED tests/test_iterations.py::test_kernels[b3-grid5-i3-AdditiveKernel3] - torch._C._LinAlgError: linalg.eigh: (Batch element 0): The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated eigenvalues (error code: 2).
FAILED tests/test_searchspace.py::test_searchspace_memory_estimate[grid5-parameter_names0] - AssertionError: ('Comp: ', 699840, 563760)
FAILED tests/test_searchspace.py::test_searchspace_memory_estimate[grid8-parameter_names0] - AssertionError: ('Comp: ', 1119744, 902016)
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-AtomPairFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-AutocorrFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-AvalonFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-E3FPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-ECFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-ERGFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-EStateFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-FunctionalGroupsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-GETAWAYFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-GhoseCrippenFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-KlekotaRothFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-LaggnerFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-LayeredFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-LingoFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MACCSFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MAPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MHFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MORSEFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MQNsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-MordredFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-PatternFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-PharmacophoreFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-PhysiochemicalPropertiesFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-PubChemFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-RDFFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-RDKit2DDescriptorsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-RDKitFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-SECFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-TopologicalTorsionFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-USRCATFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-USRFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-WHIMFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid5-DefaultFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-AtomPairFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-AutocorrFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-AvalonFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-E3FPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-ECFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-ERGFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-EStateFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-FunctionalGroupsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-GETAWAYFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-GhoseCrippenFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-KlekotaRothFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-LaggnerFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-LayeredFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-LingoFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MACCSFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MAPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MHFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MORSEFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MQNsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-MordredFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-PatternFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-PharmacophoreFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-PhysiochemicalPropertiesFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-PubChemFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-RDFFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-RDKit2DDescriptorsFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-RDKitFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-SECFPFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-TopologicalTorsionFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-USRCATFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-USRFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-WHIMFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
FAILED tests/test_substance_parameter.py::test_run_iterations[b3-i2-grid8-DefaultFingerprint] - baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...
==================================================================================== 70 failed, 1561 passed, 4 skipped in 405.38s (0:06:45) ====================================================================================
fulltest-py310: exit 1 (410.36 seconds) /Users/karinhrovatin/Documents/code/baybe-Hrovatin> pytest -p no:warnings --cov=baybe --durations=5 pid=77919
  fulltest-py310: FAIL code 1 (413.63=setup[3.27]+cmd[0.00,410.36] seconds)
  evaluation failed :( (414.00 seconds)

@Hrovatin
Collaborator Author

Hrovatin commented Sep 4, 2024

For mypy, I have multiple issues with SubstanceEncoding, for which I would suggest changes anyway, as briefly mentioned above. So I have not resolved them for now.

baybe/parameters/enum.py:51: error: Unexpected keyword argument "names" for "ParameterEncoding"  [call-arg]
baybe/parameters/substance.py:60: error: Variable "baybe.parameters.enum.SubstanceEncoding" is not valid as a type  [valid-type]
baybe/parameters/substance.py:60: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
baybe/parameters/substance.py:60: error: No overload variant of "field" matches argument types "Any", "ParameterEncoding"  [call-overload]
baybe/parameters/substance.py:60: note: Possible overload variants:
baybe/parameters/substance.py:60: note:     def field(*, default: None = ..., validator: None = ..., repr: bool | Callable[[Any], str] = ..., hash: bool | None = ..., init: bool = ..., metadata: Mapping[Any, Any] | None = ..., converter: None = ..., factory: None = ..., kw_only: bool = ..., eq: bool | None = ..., order: bool | None = ..., on_setattr: Callable[[Any, Attribute[Any], Any], Any] | list[Callable[[Any, Attribute[Any], Any], Any]] | _NoOpType | None = ..., alias: str | None = ..., type: type | None = ...) -> Any
baybe/parameters/substance.py:60: note:     def [_T] field(*, default: None = ..., validator: Callable[[Any, Attribute[_T], _T], Any] | Sequence[Callable[[Any, Attribute[_T], _T], Any]] | None = ..., repr: bool | Callable[[Any], str] = ..., hash: bool | None = ..., init: bool = ..., metadata: Mapping[Any, Any] | None = ..., converter: Callable[[Any], Any] | Converter[Any, _T] | None = ..., factory: Callable[[], _T] | None = ..., kw_only: bool = ..., eq: bool | Callable[[Any], Any] | None = ..., order: bool | Callable[[Any], Any] | None = ..., on_setattr: Callable[[Any, Attribute[Any], Any], Any] | list[Callable[[Any, Attribute[Any], Any], Any]] | _NoOpType | None = ..., alias: str | None = ..., type: type | None = ...) -> _T
baybe/parameters/substance.py:60: note:     def [_T] field(*, default: _T, validator: Callable[[Any, Attribute[_T], _T], Any] | Sequence[Callable[[Any, Attribute[_T], _T], Any]] | None = ..., repr: bool | Callable[[Any], str] = ..., hash: bool | None = ..., init: bool = ..., metadata: Mapping[Any, Any] | None = ..., converter: Callable[[Any], Any] | Converter[Any, _T] | None = ..., factory: Callable[[], _T] | None = ..., kw_only: bool = ..., eq: bool | Callable[[Any], Any] | None = ..., order: bool | Callable[[Any], Any] | None = ..., on_setattr: Callable[[Any, Attribute[Any], Any], Any] | list[Callable[[Any, Attribute[Any], Any], Any]] | _NoOpType | None = ..., alias: str | None = ..., type: type | None = ...) -> _T
baybe/parameters/substance.py:60: note:     def [_T] field(*, default: _T | None = ..., validator: Callable[[Any, Attribute[_T], _T], Any] | Sequence[Callable[[Any, Attribute[_T], _T], Any]] | None = ..., repr: bool | Callable[[Any], str] = ..., hash: bool | None = ..., init: bool = ..., metadata: Mapping[Any, Any] | None = ..., converter: Callable[[Any], Any] | Converter[Any, _T] | None = ..., factory: Callable[[], _T] | None = ..., kw_only: bool = ..., eq: bool | Callable[[Any], Any] | None = ..., order: bool | Callable[[Any], Any] | None = ..., on_setattr: Callable[[Any, Attribute[Any], Any], Any] | list[Callable[[Any, Attribute[Any], Any], Any]] | _NoOpType | None = ..., alias: str | None = ..., type: type | None = ...) -> Any
baybe/parameters/substance.py:61: error: "ParameterEncoding" has no attribute "DefaultFingerprint"  [attr-defined]
baybe/parameters/substance.py:61: error: Unsupported converter, only named functions, types and lambdas are currently supported  [misc]
baybe/parameters/substance.py:123: error: SubstanceEncoding? has no attribute "name"  [attr-defined]
Found 6 errors in 2 files (checked 102 source files)
mypy-py312: exit 1 (2.70 seconds) /Users/karinhrovatin/Documents/code/baybe-Hrovatin> mypy pid=82438
  mypy-py312: FAIL code 1 (5.70=setup[2.99]+cmd[0.01,2.70] seconds)
  evaluation failed :( (5.99 seconds)

@Scienfitz
Collaborator

Functions that use RDKit but are not fingerprint related - do we keep RDKit then?

I think RDKit is a main dependency of skfp, so we do not have to decide and can keep all the other functions.

New automatic fingerprint naming will not be backward-compatible

It should ideally be designed to coincide with the naming scheme, i.e., dropping capitalization and "Fingerprint". We might need an alias/deprecation for the Morgan one.
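A sketch of such a naming rule plus a backward-compatible alias, assuming the rule is "strip the Fingerprint suffix and upper-case the rest" and that the legacy Morgan member is called MORGAN_FP (both assumptions). In Python's `Enum`, a second name bound to an already-used value becomes an alias of the first member, so old code referring to the legacy name keeps working; emitting an actual deprecation warning would need extra handling not shown here.

```python
import enum


def encoding_name(class_name: str) -> str:
    """Hypothetical naming rule: drop the "Fingerprint" suffix, then upper-case."""
    return class_name.removesuffix("Fingerprint").upper()


# A few class names as they appear in the test IDs above.
CLASS_NAMES = ["ECFPFingerprint", "MordredFingerprint", "RDKitFingerprint"]

members = {encoding_name(name): name for name in CLASS_NAMES}
# Reusing an existing value turns the legacy name into an alias of ECFP
# (assumes the old Morgan member was called MORGAN_FP).
members["MORGAN_FP"] = "ECFPFingerprint"

SubstanceEncoding = enum.Enum("SubstanceEncoding", members)
```

Iterating over the enum skips aliases, so listings such as the user guide table would only show the new names.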

mordred check in edbo: can this be used for any fingerprint (before, it was mordred and rdkit)?

Yes for now

Consider making Fingerprint enum a class to make code prettier (see TODOs in enum code)

Not sure what you mean, but Adrian raised one point: if we generate the enums automatically, would that destroy tab completion when I type SubstanceEncoding.<TAB>? Can you check? If so, we should not generate the encodings automatically in this PR and should leave that for a potential future solution.
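If automatic generation does break completion, one fallback is to spell out the members statically, which mypy (compare the valid-type errors above) and IDE completion can see, and to guard against drift with a test that compares the enum to the package's fingerprint classes. A dynamically created enum still lists its members in `dir()` at runtime, so completion that inspects live objects may keep working, but static tools only see members written in source. The member set and the `missing_encodings` helper below are hypothetical.

```python
import enum


class SubstanceEncoding(enum.Enum):
    """Members written out in source, so mypy and IDE completion can see them."""

    ECFP = "ECFPFingerprint"
    MACCS = "MACCSFingerprint"
    MORDRED = "MordredFingerprint"


def missing_encodings(available_classes: list[str]) -> set[str]:
    """Fingerprint classes offered by the package that the enum does not cover yet."""
    declared = {member.value for member in SubstanceEncoding}
    offered = {name for name in available_classes if name.endswith("Fingerprint")}
    return offered - declared
```

In the test suite, `missing_encodings` would be fed the class names actually exported by scikit-fingerprints and asserted to be empty, so a package update that adds fingerprints fails loudly instead of silently going unsupported.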

@Scienfitz
Collaborator

Regarding Errors

AssertionError: ('Comp: ', 699840, 563760)

I suspect the missing dtype cast messes with the size estimation versus the actual size in the memory test. E.g., if a fingerprint returns some of its columns as int, the estimate's assumption that all of them are float32 no longer holds.
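The suspected mismatch can be illustrated with the stdlib `array` module standing in for DataFrame columns: an estimate that assumes 4-byte float32 cells is off as soon as a column's item size differs (8-byte integers in this example), and an explicit cast makes the two agree again. The function names are hypothetical; in pandas, the analogous fix would be casting the fingerprint columns (e.g. with `DataFrame.astype`) to the dtype the estimator assumes.

```python
import array

FLOAT32_BYTES = 4


def estimated_bytes(n_rows: int, n_cols: int) -> int:
    """Size estimate that assumes every cell is a 4-byte float32."""
    return n_rows * n_cols * FLOAT32_BYTES


def actual_bytes(columns: list[array.array]) -> int:
    """Actual payload: per column, number of items times the item size."""
    return sum(len(col) * col.itemsize for col in columns)


def cast_to_float32(columns: list[array.array]) -> list[array.array]:
    """Cast every column to C float ('f'), which is 4 bytes on common platforms."""
    return [array.array("f", list(col)) for col in columns]
```

With one float32 column and one int64 column of three rows each, the estimate is 24 bytes while the actual payload is 36 bytes; after the cast, both are 24.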

The algorithm failed to converge because the input matrix is ill-conditioned or has too many repeated eigenvalues (error code: 2). (and all other errors re numericals like decomposion, ill defined matrix etc)

Ignore these; they appear at random about 40% of the time.

baybe.exceptions.NotEnoughPointsLeftError: Using the current settings, there are fewer than 3 possible data points left to recommend. This can be either because all data points have been measured at some point (while 'a...

No clear idea. It seems the overall construction of the parameter's computational representation (comp_df) is not correct. Did you look at some of those and compare them, e.g., with the one you get from a non-substance parameter?
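One hypothesis worth ruling out while comparing (an assumption, not a diagnosis): if distinct substances end up with identical rows in comp_df, for example because a fingerprint is all zeros for the small test molecules, the search space may contain fewer distinguishable points than expected. The stdlib sketch below reports duplicate rows and constant columns for a representation given as tuples; `summarize_comp_rep` is a hypothetical helper, and with pandas, `DataFrame.duplicated()` and `DataFrame.nunique()` give the same information directly.

```python
def summarize_comp_rep(rows: list[tuple[float, ...]], labels: list[str]) -> dict:
    """Report how many distinct points a computational representation really has.

    `rows` holds one tuple of feature values per substance and `labels` the
    matching substance labels (a stand-in for comp_df).
    """
    n_cols = len(rows[0]) if rows else 0
    # Columns with a single value carry no information about the substances.
    constant_cols = [j for j in range(n_cols) if len({row[j] for row in rows}) == 1]
    # Group labels by identical feature rows to find substances that collapse.
    groups: dict[tuple[float, ...], list[str]] = {}
    for label, row in zip(labels, rows):
        groups.setdefault(row, []).append(label)
    return {
        "n_rows": len(rows),
        "n_unique_rows": len(groups),
        "constant_columns": constant_cols,
        "colliding_labels": [g for g in groups.values() if len(g) > 1],
    }
```

If `n_unique_rows` is much smaller than `n_rows` for the failing fingerprints but not for other parameter types, that would point at the feature construction rather than at the recommender.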

Scienfitz added a commit that referenced this issue Nov 11, 2024
4 participants