Releases: jvalegre/robert
v1.2.1
- The NN solver is now set to 'lbfgs' by default in MLPRegressor to work with small datasets
- thres_x is now set to 0.7 by default in the CURATE module
- Fixing a bug in the PREDICT module when using the EVALUATE module (the linear model equation was not shown)
- Adding the linear model equation to the REPORT module
- Changing the threshold for correlated features in predict_utils to match the new thres_x default
- Changing the way missing values are treated (previously filled with 0s, now imputed with a KNN imputer)
- Automatically appending the .csv extension to --csv_test in case the user forgets to add it
- Adding the numeric ROBERT score to the REPORT module
- Adding the --descp_lvl option to select which descriptors to use in the AQME-ROBERT workflow (interpret/denovo/full)
- The AQME-ROBERT workflow now uses interpretable descriptors by default (--descp_lvl interpret)
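v1.2.1 replaces zero filling of missing values with KNN imputation. The plain-Python sketch below illustrates the general technique: each missing entry becomes the mean of that column over the k most similar rows, with similarity measured by a Euclidean distance over the columns both rows observe (rescaled in the style of scikit-learn's NaN-aware Euclidean distance). It is not ROBERT's code; `knn_impute`, `nan_euclidean`, the `None` encoding of missing values and the default `k=2` are assumptions, and the notes do not say which implementation ROBERT uses (scikit-learn's `KNNImputer` is one common option).

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def nan_euclidean(a: Row, b: Row) -> float:
    """Euclidean distance over the columns observed in both rows, rescaled by
    (total columns / shared columns) so rows with gaps are not artificially close."""
    shared = [j for j in range(len(a)) if a[j] is not None and b[j] is not None]
    if not shared:
        return math.inf
    squared = sum((a[j] - b[j]) ** 2 for j in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Replace each None with the mean of that column over the k nearest donor rows."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows that actually observe column j.
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no other row observes this column
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


data = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [1.2, 2.4]]
print(knn_impute(data))
```

Here the missing value in the second row is filled from its two closest neighbours (the first and fourth rows), not pulled toward 0 or toward the distant third row.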
v1.2.0
- Changing cross-validation (CV) in VERIFY to LOOCV for datasets with fewer than 50 points
- Changing MAPIE in PREDICT to LOOCV for datasets with fewer than 50 points
- By default, RFECV uses LOOCV for small datasets and 5-fold CV for larger datasets
- The external test set is chosen more evenly along the range of y values (not fully random)
- Changing the format of the VERIFY plot from donut plots to bar plots
- Automatic KN data splitting for databases with fewer than 250 datapoints
- Changing CV_test from ShuffleSplit to KFold
- Predictions from CV are now represented in a graph and stored in a CSV
- Changing the ROBERT score to depend more heavily on results from CV
- Fixing auto_test (now it works as specified in the documentation)
- Adding classification predictions to the report PDF
- Adding new pytests that cover the ROBERT score section from the report PDF
- Adding the EVALUATE module to evaluate linear models with user-defined descriptors and partitions
- Adding Pearson heatmap in PREDICT for the two models, with individual variable correlation analysis
- Adding y-distribution graphs and analysis of uniformity
- Major changes to the report PDF file to include sections rather than modules
- Improving explanation of the ROBERT score on Read The Docs
- Printing the coefficients of MVL models in PREDICT.dat
- Fixing a bug in RFECV for classification problems (it now uses RandomForestClassifier())
- Automatic recognition of classification problems
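Several v1.2.0 entries choose the CV scheme from the dataset size: LOOCV below 50 points and 5-fold CV otherwise. A minimal stdlib-only sketch of that rule follows. `cv_splits` and its parameters are hypothetical names, and the folds are left unshuffled for simplicity; the notes do not say whether ROBERT shuffles. scikit-learn's `LeaveOneOut` and unshuffled `KFold` produce the same index splits.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def cv_splits(n_samples: int, loocv_threshold: int = 50, n_folds: int = 5) -> Iterator[Split]:
    """Yield (train_indices, test_indices) pairs.

    Leave-one-out CV is used when there are fewer than `loocv_threshold` samples;
    otherwise the data are cut into `n_folds` contiguous, unshuffled folds.
    """
    indices = list(range(n_samples))
    if n_samples < loocv_threshold:
        for i in indices:
            yield indices[:i] + indices[i + 1:], [i]
        return
    # Unshuffled k-fold sizing: the first (n_samples % n_folds) folds get one extra sample.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        yield indices[:start] + indices[start + size:], indices[start:start + size]
        start += size


for n in (30, 52):
    splits = list(cv_splits(n))
    print(n, "samples ->", len(splits), "splits, test sizes", sorted({len(t) for _, t in splits}))
```

With 30 points every sample is held out once; with 52 points the five folds hold out 11, 11, 10, 10 and 10 samples.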
v1.1.2
v1.1.1
v1.1.0
- Adding RFECV in CURATE to limit the number of descriptors to 1/3 of the number of datapoints
- Added support for more than one SMILES column in the AQME module
- Changing the scoring criterion in the PFI workflow (from R2 to RMSE)
- Fixing models where R2 in validation is much better than in training (if the validation set is very small or unrepresentative, the model may appear to perform excellently simply by chance)
- Fixing PFI_plot bug (now takes all the features into account)
- Fixing a memory allocation issue in GENERATE
- Fixing bug in classification models when more than 2 classes of the target variable are present
- Fixing reproducibility when using a specific seed in GENERATE module
- Changing CV_test from KFold to ShuffleSplit and adding a random_state to ensure reproducibility
- Allowing CSV inputs that use ; as the separator
- Fixing a CV_test bug in VERIFY (it now uses the same test size as the model tested)
- Adding prediction uncertainty estimates with the MAPIE Python library
- Adding the standard deviation (sd) to the predictions table when using an external test set
- Fixing an error_type bug for classification models
- MCC is now the default metric for classification models (better suited to assessing performance on imbalanced datasets)
- PFI workflow now uses the same metric as error_type
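v1.1.0 makes the Matthews correlation coefficient (MCC) the default classification metric and fixes models with more than two classes. The plain-Python sketch below implements the multiclass form of MCC (the formulation scikit-learn's `matthews_corrcoef` uses), which reduces to the familiar binary formula for two classes. The example shows why MCC suits imbalanced data: always predicting the majority class scores 90% accuracy but an MCC of 0. This is an illustration, not ROBERT's code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def matthews_corrcoef(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass MCC; reduces to the usual binary formula for two classes."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    s = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    classes = set(t_counts) | set(p_counts)
    # Covariances between one-hot true and predicted labels, scaled by s**2.
    cov_tp = correct * s - sum(t_counts[k] * p_counts[k] for k in classes)
    cov_pp = s * s - sum(p_counts[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_counts[k] ** 2 for k in classes)
    denom = math.sqrt(cov_pp * cov_tt)
    return 0.0 if denom == 0 else cov_tp / denom


# Imbalanced example: a model that always predicts the majority class.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}, MCC={matthews_corrcoef(y_true, y_pred):.2f}")  # prints accuracy=0.90, MCC=0.00
```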
v1.0.5
v1.0.4
- Fixing outlier bug for negative t-values
- csv_test is treated separately from the test set from GENERATE
- Adding a table of score thresholds to ROBERT_report.pdf
- Showing predictions at the end of the PREDICT section of ROBERT_report.pdf
- Adding --csv_test to AQME workflows
- Adding the --crest option to AQME workflows
- Automatically adjusting the convergence criteria and xTB accuracy of QDESCP based on the number of datapoints
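The v1.0.4 outlier fix concerns negative t-values. A common way to avoid that class of bug is to compare the absolute standardized residual with the threshold, so points predicted too high and too low are both caught. The sketch below assumes outliers are flagged when |(residual - mean residual) / SD| exceeds 2; the notes do not describe ROBERT's exact criterion, and `flag_outliers` and its `t_value` parameter are illustrative names.

```python
import statistics
from typing import List, Sequence


def flag_outliers(y_true: Sequence[float], y_pred: Sequence[float], t_value: float = 2.0) -> List[int]:
    """Return indices whose standardized residual exceeds t_value in absolute value."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)  # sample standard deviation; needs at least 2 points
    if sd == 0:
        return []
    # abs() is the important part: a one-sided check like `z > t_value` would miss
    # points whose residual lies far below the mean (negative t-values).
    return [i for i, r in enumerate(residuals) if abs((r - mean) / sd) > t_value]


y_true = [float(v) for v in range(1, 11)]
y_pred = y_true[:9] + [20.0]  # last point is overpredicted -> large negative residual
print(flag_outliers(y_true, y_pred))  # prints [9]
```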
v1.0.3
- Changing default split to RND
- Adding the scikit-learn-intelex accelerator (now compatible with scikit-learn 1.3)
- Changing the thres_test default value to 0.25 (before: 0.20)
- Automatic KN data splitting for databases with fewer than 100 datapoints
- Dropping the 90% and 80% training sizes for small databases (fewer than 50 and 30 datapoints)
- Improved printing of command lines (more reproducible commands)
- Adding more information in the --help option
- Introducing SCORE and REPRODUCIBILITY to ROBERT_report.pdf
- Added the auto_test option
- Fixed empty spaces in heatmaps from GENERATE
- Maintaining the ordering of GENERATE heatmaps across No_PFI and PFI
- Added pytests for full workflows with classification and tests
- Fixed " separators in command lines for options with multi-word values (e.g. --qdescp_keywords)
- Fixed length of outlier names for long words
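Two v1.0.3 entries concern printing reproducible command lines, including options whose values contain spaces (such as --qdescp_keywords). In Python, the standard library's `shlex.join` quotes each argument only when needed, so a printed command can be pasted back into a POSIX shell and parsed into the same arguments. The sketch below is not ROBERT's code: `format_command` is a hypothetical helper, and the argument list (including the `python -m robert` entry point and the keyword values) is an assumed placeholder.

```python
import shlex
from typing import List


def format_command(args: List[str]) -> str:
    """Render argv-style tokens as one shell-safe command line (POSIX quoting)."""
    return shlex.join(args)  # quotes only the tokens that need it (Python 3.8+)


# Placeholder arguments: a multi-word option value must stay a single argument.
args = ["python", "-m", "robert", "--qdescp_keywords", "KEYWORD_A KEYWORD_B"]
command = format_command(args)
print(command)  # prints: python -m robert --qdescp_keywords 'KEYWORD_A KEYWORD_B'

# Round trip: parsing the printed command gives back the original arguments.
assert shlex.split(command) == args
```

Using single quotes rather than hand-inserted double quotes also handles values that themselves contain `"` characters.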