In [2]:
import pandas as pd
import lazyqsar as lq

activities = ['antiinfective','antiinflammatory','antineoplastic','cardio',
              'cns','dermatologic','gastrointestinal','hematologic',
              'lipidregulating','reproductivecontrol','respiratorysystem','urological']
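
The next cell builds a one-vs-rest dataset for each activity: SMILES listed under the target activity are labelled 1, and SMILES listed only under other activities are labelled 0. Below is a minimal sketch of that labelling on toy data. The helper name `one_vs_rest` and the toy SMILES are illustrative assumptions, not part of the notebook or of lazyqsar.

```python
def one_vs_rest(smiles_by_activity, target):
    """Return (X, Y): target-class SMILES labelled 1, all other SMILES labelled 0."""
    actives = set(smiles_by_activity[target])
    others = set()
    for name, smiles in smiles_by_activity.items():
        if name != target:
            others.update(smiles)
    # A SMILES that is active for the target is never counted as inactive,
    # even if it also appears under another activity.
    inactives = others - actives
    X = sorted(actives) + sorted(inactives)
    Y = [1] * len(actives) + [0] * len(inactives)
    return X, Y


# Hypothetical toy input: "CCN" appears under both cns and cardio.
toy = {
    "cns": ["CCO", "CCN"],
    "cardio": ["CCN", "c1ccccc1"],
    "urological": ["CCC"],
}
X, Y = one_vs_rest(toy, "cns")
print(X, Y)
```

Because the inactives are the set difference against the actives, the two groups never overlap. In the logs below, the active and inactive counts for every activity add up to 6948.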

In [3]:
for activity in activities:

    # Get actives: unique SMILES listed for this activity
    actives = sorted(set(pd.read_csv(f"./{activity}_smiles_rmsaltol.csv", header=None)[0].tolist()))

    # Get inactives: SMILES from every other activity that are not actives here
    inactives = set()
    for a in activities:
        if a != activity:
            inactives.update(pd.read_csv(f"./{a}_smiles_rmsaltol.csv", header=None)[0].tolist())
    # Set difference avoids the quadratic list-membership check
    inactives = sorted(inactives - set(actives))

    print(activity, len(actives), len(inactives))
    
    # Fit a LazyQSAR binary classifier on the one-vs-rest labels
    X = actives + inactives
    Y = [1]*len(actives) + [0]*len(inactives)
    model = lq.LazyBinaryQSAR(descriptor_type='chemeleon',
                              model_type='random_forest',
                              num_trials=20,
                              pca=False)
    model.fit(X, Y)
    model.save_model(f"../../../checkpoints/{activity}")



antiinfective 2398 4550
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:   0%|          | 0/70 [00:00<?, ?it/s][15:37:53] Conflicting single bond directions around double bond at index 1.
[15:37:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:37:53] Conflicting single bond directions around double bond at index 1.
[15:37:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:37:53] Conflicting single bond directions around double bond at index 1.
[15:37:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:37:53] Conflicting single bond directions around double bond at index 1.
[15:37:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:37:53] Conflicting single bond directions around double bond at index 1.
[15:37:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:37:53] Conflicting single bond directions around double bond at index 1.
[15:37:53]   BondStereo set to STEREO

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 2398, negative samples: 4550
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.35
Original positive samples: 2398, total samples: 6948
Maximum samples: 10000
Sampling 2398 positive and 4550 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:17, 55.04it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  2.41it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 2398, negative 4550
Avg positive samples: 2398.0, avg negative samples: 4550.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 15:39:29,573] A new study created in memory with name: no-name-bde96cda-0281-4951-b19f-1feba3d78acd


Fitting model on 6948 samples, positive samples: 2398, negative samples: 4550, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 15:43:19,078] Trial 0 finished with value: 0.9821863090002932 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9821863090002932.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9821863090002932
Internal AUROC CV-0: 0.9811594840222809
Internal AUROC CV-1: 0.9787613603048959
Internal AUROC CV-2: 0.980156112576957
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9800256523013781




Model fitted.
Fitting completed in 543.60 seconds.
LazyQSAR Saving model to ../../../checkpoints/antiinfective
Model directory already exists: ../../../checkpoints/antiinfective, deleting it...
Creating model directory: ../../../checkpoints/antiinfective
Saving reducer to ../../../checkpoints/antiinfective/partition_000/reducer.joblib
Saving model to ../../../checkpoints/antiinfective/partition_000
Saving done!
antiinflammatory 373 6575
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:   6%|▌         | 4/70 [00:04<01:08,  1.03s/it][15:48:36] Conflicting single bond directions around double bond at index 1.
[15:48:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:48:36] Conflicting single bond directions around double bond at index 1.
[15:48:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:48:36] Conflicting single bond directions around double bond at index 1.
[15:48:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:48:36] Conflicting single bond directions around double bond at index 1.
[15:48:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:48:36] Conflicting single bond directions around double bond at index 1.
[15:48:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:48:36] Conflicting single bond directions around double bond at index 1.
[15:48:36]   BondStereo set t

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 373, negative samples: 6575
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.05
Original positive samples: 373, total samples: 6948
Maximum samples: 10000
Sampling 373 positive and 6575 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:16, 59.37it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  2.92it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 373, negative 6575
Avg positive samples: 373.0, avg negative samples: 6575.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 15:50:03,406] A new study created in memory with name: no-name-5b0c5820-7122-4647-af3b-13cb6e00d99b


Fitting model on 6948 samples, positive samples: 373, negative samples: 6575, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 15:52:46,165] Trial 0 finished with value: 0.9724315202888313 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9724315202888313.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9724315202888313
Internal AUROC CV-0: 0.9683273160139183
Internal AUROC CV-1: 0.967051905920519
Internal AUROC CV-2: 0.9586178478926302
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9646656899423558




Model fitted.
Fitting completed in 364.77 seconds.
LazyQSAR Saving model to ../../../checkpoints/antiinflammatory
Creating model directory: ../../../checkpoints/antiinflammatory
Saving reducer to ../../../checkpoints/antiinflammatory/partition_000/reducer.joblib
Saving model to ../../../checkpoints/antiinflammatory/partition_000
Saving done!
antineoplastic 1177 5771
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:  17%|█▋        | 12/70 [00:14<00:52,  1.11it/s][15:56:22] Conflicting single bond directions around double bond at index 1.
[15:56:22]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:56:22] Conflicting single bond directions around double bond at index 1.
[15:56:22]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:56:22] Conflicting single bond directions around double bond at index 1.
[15:56:22]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:56:22] Conflicting single bond directions around double bond at index 1.
[15:56:22]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:56:22] Conflicting single bond directions around double bond at index 1.
[15:56:22]   BondStereo set to STEREONONE and single bond directions set to NONE.
[15:56:22] Conflicting single bond directions around double bond at index 1.
[15:56:22]   BondStereo set 

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 1177, negative samples: 5771
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.17
Original positive samples: 1177, total samples: 6948
Maximum samples: 10000
Sampling 1177 positive and 5771 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:17, 56.23it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  2.98it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 1177, negative 5771
Avg positive samples: 1177.0, avg negative samples: 5771.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 15:57:32,850] A new study created in memory with name: no-name-f5ef7b41-0bea-4bb5-abe7-eea0deefb5fc


Fitting model on 6948 samples, positive samples: 1177, negative samples: 5771, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 16:01:53,930] Trial 0 finished with value: 0.9732931518645804 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9732931518645804.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9732931518645804
Internal AUROC CV-0: 0.9758816901674044
Internal AUROC CV-1: 0.9805688734260164
Internal AUROC CV-2: 0.9714325785754356
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9759610473896189




Model fitted.
Fitting completed in 587.14 seconds.
LazyQSAR Saving model to ../../../checkpoints/antineoplastic
Creating model directory: ../../../checkpoints/antineoplastic
Saving reducer to ../../../checkpoints/antineoplastic/partition_000/reducer.joblib
Saving model to ../../../checkpoints/antineoplastic/partition_000
Saving done!
cardio 788 6160
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:  11%|█▏        | 8/70 [00:09<01:12,  1.16s/it][16:07:29] Conflicting single bond directions around double bond at index 1.
[16:07:29]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:07:29] Conflicting single bond directions around double bond at index 1.
[16:07:29]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:07:29] Conflicting single bond directions around double bond at index 1.
[16:07:29]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:07:29] Conflicting single bond directions around double bond at index 1.
[16:07:29]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:07:29] Conflicting single bond directions around double bond at index 1.
[16:07:29]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:07:29] Conflicting single bond directions around double bond at index 1.
[16:07:29]   BondStereo set t

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 788, negative samples: 6160
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.11
Original positive samples: 788, total samples: 6948
Maximum samples: 10000
Sampling 788 positive and 6160 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:19, 49.59it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  2.23it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 788, negative 6160
Avg positive samples: 788.0, avg negative samples: 6160.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 16:08:51,247] A new study created in memory with name: no-name-c65053fd-2cc5-4cd9-b519-b287c023f724


Fitting model on 6948 samples, positive samples: 788, negative samples: 6160, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 16:12:14,819] Trial 0 finished with value: 0.9640923374425913 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9640923374425913.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9640923374425913
Internal AUROC CV-0: 0.9575927879227373
Internal AUROC CV-1: 0.9552739139033557
Internal AUROC CV-2: 0.9710824708286636
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9613163908849188




Model fitted.
Fitting completed in 514.07 seconds.
LazyQSAR Saving model to ../../../checkpoints/cardio
Creating model directory: ../../../checkpoints/cardio
Saving reducer to ../../../checkpoints/cardio/partition_000/reducer.joblib
Saving model to ../../../checkpoints/cardio/partition_000
Saving done!
cns 1139 5809
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:  17%|█▋        | 12/70 [00:11<00:55,  1.04it/s][16:17:36] Conflicting single bond directions around double bond at index 1.
[16:17:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:17:36] Conflicting single bond directions around double bond at index 1.
[16:17:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:17:36] Conflicting single bond directions around double bond at index 1.
[16:17:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:17:36] Conflicting single bond directions around double bond at index 1.
[16:17:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:17:36] Conflicting single bond directions around double bond at index 1.
[16:17:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:17:36] Conflicting single bond directions around double bond at index 1.
[16:17:36]   BondStereo set 

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 1139, negative samples: 5809
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.16
Original positive samples: 1139, total samples: 6948
Maximum samples: 10000
Sampling 1139 positive and 5809 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:34, 28.80it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  2.16it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 1139, negative 5809
Avg positive samples: 1139.0, avg negative samples: 5809.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 16:19:49,647] A new study created in memory with name: no-name-033ab2ac-8bb6-4ce4-b17a-1374b5a106a3


Fitting model on 6948 samples, positive samples: 1139, negative samples: 5809, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 16:22:48,461] Trial 0 finished with value: 0.9739089459185153 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9739089459185153.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9739089459185153
Internal AUROC CV-0: 0.9700352810400656
Internal AUROC CV-1: 0.9745179063360881
Internal AUROC CV-2: 0.9737567058141221
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9727699643967586




Model fitted.
Fitting completed in 438.93 seconds.
LazyQSAR Saving model to ../../../checkpoints/cns
Creating model directory: ../../../checkpoints/cns
Saving reducer to ../../../checkpoints/cns/partition_000/reducer.joblib
Saving model to ../../../checkpoints/cns/partition_000
Saving done!
dermatologic 116 6832
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:   3%|▎         | 2/70 [00:01<01:08,  1.01s/it][16:27:09] Conflicting single bond directions around double bond at index 1.
[16:27:09]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:27:09] Conflicting single bond directions around double bond at index 1.
[16:27:09]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:27:09] Conflicting single bond directions around double bond at index 1.
[16:27:09]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:27:09] Conflicting single bond directions around double bond at index 1.
[16:27:09]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:27:09] Conflicting single bond directions around double bond at index 1.
[16:27:09]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:27:09] Conflicting single bond directions around double bond at index 1.
[16:27:09]   BondStereo set t

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 116, negative samples: 6832
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.02
Original positive samples: 116, total samples: 6948
Maximum samples: 10000
Sampling 116 positive and 6832 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:18, 54.99it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  3.23it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 116, negative 6832
Avg positive samples: 116.0, avg negative samples: 6832.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 16:28:35,709] A new study created in memory with name: no-name-a93fe3ad-4474-47a2-9c3f-f2abcda7c1a9


Fitting model on 6948 samples, positive samples: 116, negative samples: 6832, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 16:30:57,494] Trial 0 finished with value: 0.9872775848609653 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9872775848609653.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9872775848609653
Internal AUROC CV-0: 0.9829504158927562
Internal AUROC CV-1: 0.9391504481951062
Internal AUROC CV-2: 0.9557558749899056
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9592855796925894




Model fitted.
Fitting completed in 307.92 seconds.
LazyQSAR Saving model to ../../../checkpoints/dermatologic
Creating model directory: ../../../checkpoints/dermatologic
Saving reducer to ../../../checkpoints/dermatologic/partition_000/reducer.joblib
Saving model to ../../../checkpoints/dermatologic/partition_000
Saving done!
gastrointestinal 258 6690
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:   4%|▍         | 3/70 [00:03<01:09,  1.04s/it][16:33:46] Conflicting single bond directions around double bond at index 1.
[16:33:46]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:33:46] Conflicting single bond directions around double bond at index 1.
[16:33:46]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:33:46] Conflicting single bond directions around double bond at index 1.
[16:33:46]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:33:46] Conflicting single bond directions around double bond at index 1.
[16:33:46]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:33:46] Conflicting single bond directions around double bond at index 1.
[16:33:46]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:33:46] Conflicting single bond directions around double bond at index 1.
[16:33:46]   BondStereo set t

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 258, negative samples: 6690
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.04
Original positive samples: 258, total samples: 6948
Maximum samples: 10000
Sampling 258 positive and 6690 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:17, 57.94it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  3.33it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 258, negative 6690
Avg positive samples: 258.0, avg negative samples: 6690.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 16:35:11,964] A new study created in memory with name: no-name-939956ad-6940-4ed9-93bd-301f13cac715


Fitting model on 6948 samples, positive samples: 258, negative samples: 6690, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 16:38:27,689] Trial 0 finished with value: 0.9502982416816099 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9502982416816099.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9502982416816099
Internal AUROC CV-0: 0.951266437537358
Internal AUROC CV-1: 0.9645892483562463
Internal AUROC CV-2: 0.9656492827256425
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9605016562064156




Model fitted.
Fitting completed in 397.50 seconds.
LazyQSAR Saving model to ../../../checkpoints/gastrointestinal
Creating model directory: ../../../checkpoints/gastrointestinal
Saving reducer to ../../../checkpoints/gastrointestinal/partition_000/reducer.joblib
Saving model to ../../../checkpoints/gastrointestinal/partition_000
Saving done!
hematologic 267 6681
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:   4%|▍         | 3/70 [00:04<01:26,  1.29s/it][16:41:53] Conflicting single bond directions around double bond at index 1.
[16:41:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:41:53] Conflicting single bond directions around double bond at index 1.
[16:41:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:41:53] Conflicting single bond directions around double bond at index 1.
[16:41:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:41:53] Conflicting single bond directions around double bond at index 1.
[16:41:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:41:53] Conflicting single bond directions around double bond at index 1.
[16:41:53]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:41:53] Conflicting single bond directions around double bond at index 1.
[16:41:53]   BondStereo set t

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 267, negative samples: 6681
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.04
Original positive samples: 267, total samples: 6948
Maximum samples: 10000
Sampling 267 positive and 6681 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:21, 46.25it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  3.14it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 267, negative 6681
Avg positive samples: 267.0, avg negative samples: 6681.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 16:43:21,210] A new study created in memory with name: no-name-daed9394-6cab-4ad1-ac96-f4a7c47ad7ee


Fitting model on 6948 samples, positive samples: 267, negative samples: 6681, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 16:46:01,825] Trial 0 finished with value: 0.9667545506002919 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9667545506002919.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9667545506002919
Internal AUROC CV-0: 0.9653364911967112
Internal AUROC CV-1: 0.9740727500223434
Internal AUROC CV-2: 0.9544061131468407
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9646051181219649




Model fitted.
Fitting completed in 373.30 seconds.
LazyQSAR Saving model to ../../../checkpoints/hematologic
Creating model directory: ../../../checkpoints/hematologic
Saving reducer to ../../../checkpoints/hematologic/partition_000/reducer.joblib
Saving model to ../../../checkpoints/hematologic/partition_000
Saving done!
lipidregulating 164 6784
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:   3%|▎         | 2/70 [00:02<01:12,  1.07s/it][16:49:35] Conflicting single bond directions around double bond at index 1.
[16:49:35]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:49:35] Conflicting single bond directions around double bond at index 1.
[16:49:35]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:49:35] Conflicting single bond directions around double bond at index 1.
[16:49:35]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:49:35] Conflicting single bond directions around double bond at index 1.
[16:49:35]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:49:36] Conflicting single bond directions around double bond at index 1.
[16:49:36]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:49:36] Conflicting single bond directions around double bond at index 1.
[16:49:36]   BondStereo set t

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 164, negative samples: 6784
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.02
Original positive samples: 164, total samples: 6948
Maximum samples: 10000
Sampling 164 positive and 6784 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:18, 52.54it/s]


Unique sampled indices matrix shape: (1, 6948)

100%|██████████| 1/1 [00:00<00:00,  3.19it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 164, negative 6784
Avg positive samples: 164.0, avg negative samples: 6784.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 16:51:02,855] A new study created in memory with name: no-name-8ecb1f2f-0ebc-4ea1-a402-8e1e6ad3ebc0


Fitting model on 6948 samples, positive samples: 164, negative samples: 6784, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 16:53:17,104] Trial 0 finished with value: 0.9714632228869459 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9714632228869459.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9714632228869459
Internal AUROC CV-0: 0.9739199838932352
Internal AUROC CV-1: 0.9745815117349287
Internal AUROC CV-2: 0.9893580303727565
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9792865086669735
Model fitted.
Fitting completed in 292.02 seconds.
LazyQSAR Saving model to ../../../checkpoints/lipidregulating
Creating model directory: ../../../checkpoints/lipidregulating
Saving reducer to ../../../checkpoints/lipidregulating/partition_000/reducer.joblib
Saving model to ../../../checkpoints/lipidregulating/partition_000
Saving done!
reproductivecontrol 148 6800
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:   1%|▏         | 1/70 [00:01<01:48,  1.57s/it]
[16:55:55] Conflicting single bond directions around double bond at index 1.
[16:55:55]   BondStereo set to STEREONONE and single bond directions set to NONE.
Transforming CheMeleon descriptors in chunks of 100:   3%|▎         | 2/70 [00:02<01:29,  1.32s/it]
[16:55:56] Conflicting single bond directions around double bond at index 1.
[16:55:56]   BondStereo set to STEREONONE and single bond directions set to NONE.
[16:55:

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 148, negative samples: 6800
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.02
Original positive samples: 148, total samples: 6948
Maximum samples: 10000
Sampling 148 positive and 6800 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:17, 55.42it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  3.23it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 148, negative 6800
Avg positive samples: 148.0, avg negative samples: 6800.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 16:57:25,654] A new study created in memory with name: no-name-cc1a8f83-ff8f-45a3-ae9a-8679838c0007


Fitting model on 6948 samples, positive samples: 148, negative samples: 6800, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 16:59:18,779] Trial 0 finished with value: 0.992284048754637 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.992284048754637.
[I 2025-07-28 17:02:21,525] Trial 1 finished with value: 0.9879464758876524 and parameters: {'n_estimators': 476, 'max_features': 0.4253823424614421, 'max_leaf_nodes': 2890, 'criterion': 'entropy'}. Best is trial 0 with value: 0.992284048754637.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.992284048754637
Internal AUROC CV-0: 0.9926788553259143
Internal AUROC CV-1: 0.9750158982511924
Internal AUROC CV-2: 0.9985691573926868
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9887546369899312
Model fitted.
Fitting completed in 426.79 seconds.
LazyQSAR Saving model to ../../../checkpoints/reproductivecontrol
Creating model directory: ../../../checkpoints/reproductivecontrol
Saving reducer to ../../../checkpoints/reproductivecontrol/partition_000/reducer.joblib
Saving model to ../../../checkpoints/reproductivecontrol/partition_000
Saving done!
respiratorysystem 101 6847
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:   1%|▏         | 1/70 [00:01<01:34,  1.36s/it]
[17:04:33] Conflicting single bond directions around double bond at index 1.
[17:04:33]   BondStereo set to STEREONONE and single bond directions set to NONE.
Transforming CheMeleon descriptors in chunks of 100:   3%|▎         | 2/70 [00:02<01:25,  1.26s/it]
[17:04:34] Conflicting single bond directions around double bond at index 1.
[17:04:34]   BondStereo set to STEREONONE and single bond directions set to NONE.
[17:04:

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 101, negative samples: 6847
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.01
Original positive samples: 101, total samples: 6948
Maximum samples: 10000
Sampling 101 positive and 6847 negative samples from 6948 total samples.


  1%|          | 10/1000 [00:00<00:19, 49.58it/s]


Unique sampled indices matrix shape: (1, 6948)


100%|██████████| 1/1 [00:00<00:00,  3.16it/s]


Indices matrix shape after redundancy removal: (1, 6948)
Original positive negative balance: positive 101, negative 6847
Avg positive samples: 101.0, avg negative samples: 6847.0


INFO:flaml.default.suggest:metafeature distance: 2.3931406147607404
[I 2025-07-28 17:06:00,300] A new study created in memory with name: no-name-960af00e-416f-47ae-b2e8-6a14c4a84e60


Fitting model on 6948 samples, positive samples: 101, negative samples: 6847, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 17:07:56,744] Trial 0 finished with value: 0.9852414330218068 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9852414330218068.
[I 2025-07-28 17:11:11,092] Trial 1 finished with value: 0.9816355140186915 and parameters: {'n_estimators': 476, 'max_features': 0.4253823424614421, 'max_leaf_nodes': 2890, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9852414330218068.


Best hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9852414330218068
Internal AUROC CV-0: 0.9649883177570093
Internal AUROC CV-1: 0.940233644859813
Internal AUROC CV-2: 0.8748364485981309
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.926686137071651
Model fitted.
Fitting completed in 459.44 seconds.
LazyQSAR Saving model to ../../../checkpoints/respiratorysystem
Creating model directory: ../../../checkpoints/respiratorysystem
Saving reducer to ../../../checkpoints/respiratorysystem/partition_000/reducer.joblib
Saving model to ../../../checkpoints/respiratorysystem/partition_000
Saving done!
urological 26 6922
Fitting inputs to feature descriptors using chemeleon
Transforming inputs to feature descriptors using chemeleon


Transforming CheMeleon descriptors in chunks of 100:   1%|▏         | 1/70 [00:00<01:03,  1.08it/s]
[17:13:40] Conflicting single bond directions around double bond at index 1.
[17:13:40]   BondStereo set to STEREONONE and single bond directions set to NONE.
[17:13:40] Conflicting single bond directions around double bond at index 1.
[17:13:40]   BondStereo set t

Performing predictions on input feature of shape: (6948, 2048)
Quickly deciding the max number of samples to use for the binary classifier.
Decided to use max samples: 10000
Total samples: 6948, positive samples: 26, negative samples: 6922
Maximum samples per partition: 10000, minimum samples per partition: 30
Positive proportion: 0.00
Original positive samples: 26, total samples: 6948
Maximum samples: 10000
Sampling 26 positive and 2574 negative samples from 2600 total samples.


  1%|          | 8/1000 [00:00<00:12, 81.02it/s]


All indices seen at least 1 times. Stopping sampling.
Unique sampled indices matrix shape: (9, 2600)


100%|██████████| 9/9 [00:01<00:00,  8.17it/s]


Indices matrix shape after redundancy removal: (3, 2600)
Original positive negative balance: positive 26, negative 6922
Avg positive samples: 26.0, avg negative samples: 2574.0


INFO:flaml.default.suggest:metafeature distance: 2.3932450513801595
[I 2025-07-28 17:15:08,743] A new study created in memory with name: no-name-4ab65e72-0b6c-4e59-b753-821cd6213d18


Fitting model on 2600 samples, positive samples: 26, negative samples: 2574, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 17:15:34,401] Trial 0 finished with value: 0.9899068322981366 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9899068322981366.
[I 2025-07-28 17:16:16,936] Trial 1 finished with value: 0.9886559696342306 and parameters: {'n_estimators': 476, 'max_features': 0.4253823424614421, 'max_leaf_nodes': 1408, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9899068322981366.
[I 2025-07-28 17:16:29,784] Trial 2 finished with value: 0.9908126293995858 and parameters: {'n_estimators': 521, 'max_features': 0.1116027783762826, 'max_leaf_nodes': 698, 'criterion': 'entropy'}. Best is trial 2 with value: 0.9908126293995858.
[I 2025-07-28 17:17:03,751] Trial 3 finished with value: 0.983652518978606 and parameters: {'n_estimators': 412, 'max_features': 0.39200309009026957, 'max_leaf_nodes': 1200, 'criterion': 'entropy'}. Best is trial 2 with value: 0.9908126293995858.
[I 2025-07-2

Best hyperparameters: {'n_estimators': 543, 'max_features': 0.058127631650270836, 'max_leaf_nodes': 1880, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9924516908212561
Internal AUROC CV-0: 0.9820134575569358
Internal AUROC CV-1: 0.9974120082815735
Internal AUROC CV-2: 0.8830227743271222
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9541494133885439
Model fitted.


INFO:flaml.default.suggest:metafeature distance: 2.3932450513801595
[I 2025-07-28 17:17:22,480] A new study created in memory with name: no-name-a410a0f0-383f-4c98-a644-a4970b27ad61


Fitting model on 2600 samples, positive samples: 26, negative samples: 2574, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 17:17:47,446] Trial 0 finished with value: 0.9892167011732229 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9892167011732229.
[I 2025-07-28 17:18:28,993] Trial 1 finished with value: 0.9884834368530021 and parameters: {'n_estimators': 476, 'max_features': 0.4253823424614421, 'max_leaf_nodes': 1408, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9892167011732229.
[I 2025-07-28 17:18:41,573] Trial 2 finished with value: 0.9922360248447205 and parameters: {'n_estimators': 521, 'max_features': 0.1116027783762826, 'max_leaf_nodes': 698, 'criterion': 'entropy'}. Best is trial 2 with value: 0.9922360248447205.
[I 2025-07-28 17:19:14,893] Trial 3 finished with value: 0.9901224982746722 and parameters: {'n_estimators': 412, 'max_features': 0.39200309009026957, 'max_leaf_nodes': 1200, 'criterion': 'entropy'}. Best is trial 2 with value: 0.9922360248447205.
[I 2025-07-

Early stopping: No significant improvement in the last 5 trials.
Best hyperparameters: {'n_estimators': 543, 'max_features': 0.058127631650270836, 'max_leaf_nodes': 1880, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9922791580400275
Internal AUROC CV-0: 0.9934006211180124
Internal AUROC CV-1: 1.0
Internal AUROC CV-2: 0.8891045548654244
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9608350586611456
Model fitted.


INFO:flaml.default.suggest:metafeature distance: 2.3932450513801595
[I 2025-07-28 17:19:48,991] A new study created in memory with name: no-name-ca87a393-7859-40fd-8b14-33bf8f6e8624


Fitting model on 2600 samples, positive samples: 26, negative samples: 2574, number of features 1947
Suggested zero-shot hyperparameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'criterion': 'entropy', 'max_leaf_nodes': 1156, 'random_state': 12032022, 'verbose': 0, 'class_weight': 'balanced_subsample'}
Fitting...


[I 2025-07-28 17:20:14,897] Trial 0 finished with value: 0.9957298136645963 and parameters: {'n_estimators': 501, 'max_features': 0.24484242524861066, 'max_leaf_nodes': 1156, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9957298136645963.
[I 2025-07-28 17:20:56,830] Trial 1 finished with value: 0.9927104899930987 and parameters: {'n_estimators': 476, 'max_features': 0.4253823424614421, 'max_leaf_nodes': 1408, 'criterion': 'entropy'}. Best is trial 0 with value: 0.9957298136645963.
[I 2025-07-28 17:21:09,578] Trial 2 finished with value: 0.9959454796411319 and parameters: {'n_estimators': 521, 'max_features': 0.1116027783762826, 'max_leaf_nodes': 698, 'criterion': 'entropy'}. Best is trial 2 with value: 0.9959454796411319.
[I 2025-07-28 17:21:43,424] Trial 3 finished with value: 0.9893029675638371 and parameters: {'n_estimators': 412, 'max_features': 0.39200309009026957, 'max_leaf_nodes': 1200, 'criterion': 'entropy'}. Best is trial 2 with value: 0.9959454796411319.
[I 2025-07-

Best hyperparameters: {'n_estimators': 521, 'max_features': 0.1116027783762826, 'max_leaf_nodes': 698, 'criterion': 'entropy', 'n_jobs': 8, 'random_state': 42, 'class_weight': 'balanced_subsample'}, Inner hyperparameter AUROC: 0.9959454796411319
Internal AUROC CV-0: 0.9965062111801242
Internal AUROC CV-1: 0.9992236024844721
Internal AUROC CV-2: 0.9235248447204969
Logistic regression for calibration...
Calibration based on logistic regression fit done.
Average AUROC: 0.9730848861283644
Model fitted.
Fitting completed in 420.42 seconds.
LazyQSAR Saving model to ../../../checkpoints/urological
Creating model directory: ../../../checkpoints/urological
Saving reducer to ../../../checkpoints/urological/partition_000/reducer.joblib
Saving model to ../../../checkpoints/urological/partition_000
Saving reducer to ../../../checkpoints/urological/partition_001/reducer.joblib
Saving model to ../../../checkpoints/urological/partition_001
Saving reducer to ../../../checkpoints/urological/partition_00