The following are the steps for using AutoML for a regression task:

_Note: Setting `featurization=True` represents each molecule using the five representation techniques listed in step 2._ 

1. Takes as input a pandas dataframe with two columns:<br>
    * SMILES strings<br>
    * target property values<br>


2. Represents molecules as:<br>
    * Coulomb matrix<br>
    * RDKit Morgan fingerprints<br>
    * MACCS keys<br>
    * RDKit hashed topological torsion fingerprints<br>
    * RDKit molecular descriptors (all)<br>

3. Screens the following scikit-learn regressors:<br>
    * [MLPRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html)<br>
    * [GradientBoostingRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)<br>
    * [RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)<br>
    * [Ridge](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)<br>
    * [Lasso](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html)<br>
    * [SVR](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html)<br>
    * [ElasticNet](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html)<br>
    * [DecisionTreeRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html)<br>
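
Of the representations in step 2, the Coulomb matrix is the easiest to show from first principles: for atoms with nuclear charges `Z_i` at positions `R_i`, the diagonal entries are `0.5 * Z_i**2.4` and the off-diagonal entries are `Z_i * Z_j / |R_i - R_j|`. The sketch below computes it with NumPy for a molecule whose 3D coordinates are already known. It is not ChemML's implementation, which may also sort, pad or flatten the matrix, and it does not cover how 3D coordinates are obtained from SMILES during featurization. The helper name `coulomb_matrix` is hypothetical, and the coordinates are assumed to be in bohr.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Hypothetical helper: Coulomb matrix for nuclear charges Z and 3D coordinates R."""
    Z = np.asarray(Z, dtype=float)  # shape (n,)
    R = np.asarray(R, dtype=float)  # shape (n, 3)
    # pairwise interatomic distances |R_i - R_j|, shape (n, n)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    off_diag = ~np.eye(len(Z), dtype=bool)
    C = np.zeros((len(Z), len(Z)))
    # off-diagonal: Coulomb repulsion between nuclei i and j
    C[off_diag] = np.outer(Z, Z)[off_diag] / dist[off_diag]
    # diagonal: 0.5 * Z_i**2.4, a fit to the energy of the free atom
    np.fill_diagonal(C, 0.5 * Z ** 2.4)
    return C


# H2 with the two atoms 1.4 bohr apart (unit convention assumed)
C_h2 = coulomb_matrix([1, 1], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]])
print(C_h2)
```

Because the raw matrix depends on atom ordering and molecule size, practical featurizers usually post-process it (for example by sorting rows or padding to a fixed size) before passing it to a regressor.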


Yields the `n_best` top-performing models, each with optimized hyperparameters.

Returns a dataframe that lists, for each of these models, the error metrics, the fitted machine learning model, the algorithm, the tuned hyperparameter values and the featurization technique.
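
The screen-tune-rank idea behind steps 3 and 4 can be sketched with plain scikit-learn, assuming a numeric feature matrix has already been computed. This is not ChemML's code. `screen_regressors`, the hyperparameter grids and the column names are hypothetical. Grid search stands in for whatever tuning strategy ChemML actually uses, and only four of the eight listed regressors are included to keep it short. The data are synthetic and serve only to exercise the function.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV


def screen_regressors(X, y, n_best=2, cv=3, random_state=0):
    """Hypothetical helper: tune each candidate, rank by cross-validated MAE, keep n_best."""
    candidates = {
        "Ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
        "Lasso": (Lasso(max_iter=10000), {"alpha": [0.01, 0.1, 1.0]}),
        "RandomForestRegressor": (
            RandomForestRegressor(random_state=random_state),
            {"n_estimators": [50, 100]},
        ),
        "GradientBoostingRegressor": (
            GradientBoostingRegressor(random_state=random_state),
            {"learning_rate": [0.05, 0.1]},
        ),
    }
    rows = []
    for name, (model, grid) in candidates.items():
        # exhaustive search over the small grid, scored by (negated) mean absolute error
        search = GridSearchCV(model, grid, cv=cv, scoring="neg_mean_absolute_error")
        search.fit(X, y)
        rows.append({
            "algorithm": name,
            "best_params": search.best_params_,
            "cv_mae": -search.best_score_,      # flip sign back to a positive error
            "model": search.best_estimator_,   # refit on all data with the best params
        })
    # lowest error first, then keep the n_best rows
    ranked = pd.DataFrame(rows).sort_values("cv_mae").reset_index(drop=True)
    return ranked.head(n_best)


# synthetic linear data, only to exercise the function
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=60)
top = screen_regressors(X, y, n_best=2)
print(top[["algorithm", "cv_mae"]])
```

Tuning every candidate before ranking, rather than ranking untuned defaults, keeps the comparison fair, at the cost of runtime that grows with the number of models, grid points and folds.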

# Load your data 

In [1]:
import pandas as pd
import numpy as np
from chemml.chem import Molecule
from chemml.datasets import load_organic_density

In [2]:
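molecules, target, dragon_subset = load_organic_density()  # SMILES, densities, Dragon descriptor subset (unused here)
df = pd.concat([molecules, target], axis=1)  # two-column input: SMILES and target property
df = df.sample(25)  # random subset of 25 molecules to keep the run short
df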

|     | smiles | density_Kg/m3 |
|-----|--------|---------------|
| 188 | n1ccc(cc1)c1scnc1c1ncccc1c1cccc2c1cccc2 | 1203.16 |
| 111 | Cc1cc2ccccc2c(c1)c1sccc1c1cscc1 | 1199.41 |
| 0 | C1CSC(CS1)c1ncc(s1)CC1CCCC1 | 1184.64 |
| 30 | Cc1c(cc2c(c1c1cnccn1)cccc2)c1cscn1 | 1213.76 |
| 328 | c1ccc(nc1)c1nnc(s1)Sc1cccs1 | 1374.07 |
| 270 | Oc1ccc(c2c1cccc2c1ncsc1)c1ccco1 | 1290.56 |
| 68 | SC1CCC(C1c1cnccn1)C1CSCCS1 | 1238.45 |
| 293 | OC1NCN(CN1)c1ccc(cc1)c1scnn1 | 1366.07 |
| 379 | C1CSC(CS1)c1ccccc1c1ccc(cc1)c1ccco1 | 1193.12 |
| 253 | c1cnc(cn1)c1ccc2c(c1)cccc2c1csc(n1)c1ccco1 | 1258.63 |
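
`df.sample(25)` draws a different subset on every run, so the screening results will not be reproducible unless a seed is fixed. A minimal sketch of an input-preparation step, assuming the column names used in this notebook, is shown below. `prepare_input` is a hypothetical helper, not part of ChemML: it checks that the two required columns exist, drops incomplete rows, forces the target to be numeric and subsamples with a fixed `random_state`. The three SMILES/density pairs are copied from the output above; the fourth, empty row is added only to show the dropping step.

```python
import pandas as pd


def prepare_input(df, smiles_col="smiles", target_col="density_Kg/m3", n=None, seed=0):
    """Hypothetical helper: validate the two input columns and optionally subsample reproducibly."""
    missing = {smiles_col, target_col} - set(df.columns)
    if missing:
        raise KeyError(f"missing required columns: {sorted(missing)}")
    # keep only the two required columns and drop rows with a missing value
    out = df[[smiles_col, target_col]].dropna().copy()
    # fail early if the target cannot be interpreted as numbers
    out[target_col] = pd.to_numeric(out[target_col], errors="raise")
    if n is not None:
        out = out.sample(n, random_state=seed)  # fixed seed -> same subset every run
    return out


toy = pd.DataFrame({
    "smiles": ["c1ccc(nc1)c1nnc(s1)Sc1cccs1", "C1CSC(CS1)c1ncc(s1)CC1CCCC1",
               "Cc1cc2ccccc2c(c1)c1sccc1c1cscc1", None],
    "density_Kg/m3": [1374.07, 1184.64, 1199.41, float("nan")],
})
clean = prepare_input(toy, n=2, seed=0)
print(clean)
```

In the notebook this would replace the last two lines of the loading cell, e.g. `df = prepare_input(pd.concat([molecules, target], axis=1), n=25, seed=42)`.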


# Run autoML for a regression task

In [3]:
from chemml.autoML import ModelScreener
MS = ModelScreener(df, target="density_Kg/m3", featurization=True, smiles="smiles",
                   screener_type="regressor", output_file="testing.txt")
scores = MS.screen_models(n_best=4)  # keep the 4 best models with tuned hyperparameters

featurizing molecules in batches of 2 ...
25/25 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15s 614ms/step
Merging batch features ...    [DONE]

split done!
--- 1460.8564009666443 seconds ---
split done!
--- 1754.8138766288757 seconds ---
split done!
--- 556.0454123020172 seconds ---
split done!
--- 2021.2931699752808 seconds ---
split done!
--- 450.1350498199463 seconds ---

ValueError: No objects to concatenate

In [4]:
scores

NameError: name 'scores' is not defined
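
`ValueError: No objects to concatenate` is the message pandas raises when `pd.concat` receives an empty sequence, which suggests the screener finished without collecting any model results. Because `screen_models` raised, `scores` was never assigned, hence the `NameError` in the next cell. A minimal guard, assuming you simply want to skip the downstream cells when this happens, is sketched below. `run_screen` is a hypothetical helper, not part of ChemML, and `_failing_screen` is a stand-in that reproduces the same pandas error.

```python
import pandas as pd


def run_screen(screen_fn, **kwargs):
    """Hypothetical helper: call screen_fn, returning None if it collected no results."""
    try:
        return screen_fn(**kwargs)
    except ValueError as err:
        # only swallow pandas' empty-concat error; re-raise anything else
        if "No objects to concatenate" not in str(err):
            raise
        print(f"screening produced no results: {err}")
        return None


def _failing_screen(n_best):
    # stand-in reproducing the failure above: pd.concat on an empty list
    return pd.concat([])


scores = run_screen(_failing_screen, n_best=4)
if scores is not None:
    scores.to_csv("autoML_test.csv", index=False)
```

In the notebook the call would be `scores = run_screen(MS.screen_models, n_best=4)`. Matching on the message keeps genuine `ValueError`s from other causes visible instead of hiding them.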

# Save scores to CSV

In [5]:
scores.to_csv("autoML_test.csv",index=False)
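
Since the returned dataframe also holds the fitted model objects, `to_csv` writes only each model's text representation, and the models cannot be reloaded from the CSV. If you want to keep them, pickling the whole dataframe preserves the objects. A minimal sketch, using a hypothetical one-row table with scikit-learn's `Ridge` as a stand-in model:

```python
import os
import tempfile

import pandas as pd
from sklearn.linear_model import Ridge

# hypothetical one-row scores table holding a model object, like the returned dataframe
scores = pd.DataFrame({"algorithm": ["Ridge"], "model": [Ridge(alpha=0.5)]})

out_dir = tempfile.mkdtemp()
csv_path = os.path.join(out_dir, "scores.csv")
pkl_path = os.path.join(out_dir, "scores.pkl")

scores.to_csv(csv_path, index=False)  # model column is written as text
from_csv = pd.read_csv(csv_path)
scores.to_pickle(pkl_path)            # model column keeps the Python object
from_pkl = pd.read_pickle(pkl_path)
print(type(from_csv.loc[0, "model"]), type(from_pkl.loc[0, "model"]))
```

Only unpickle files from trusted sources, and expect pickles to need the same library versions to load reliably; the CSV remains the convenient format for the error metrics alone.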