<a href="https://colab.research.google.com/github/harikrishna-chem/opv_aem_2018/blob/main/predict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook loads the models published in the article "Toward predicting efficiency of organic solar cells via machine learning and improved descriptors" and uses them to predict the power conversion efficiency (PCE) of new donor-acceptor candidates. For details, see
[Adv. Energy Mater. 2018, 8, 1801032](https://doi.org/10.1002/aenm.201801032).

In [1]:
#@title Predict the PCE for a Donor-Acceptor Pair
#@markdown ***Please enter your parameters first, then press the run button on the left.***

# Import packages
import os
import pickle
import pandas as pd
import sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler

# Clone the repository with the pickled models if it is not already present
if not os.path.exists("/usr/local/opv_aem_2018"):
  %cd /usr/local
  !git clone -q https://github.com/harikrishna-chem/opv_aem_2018.git

# Load models
# Load the LR model
with open('/usr/local/opv_aem_2018/models/lr_aem.pkl', 'rb') as file:
    lr = pickle.load(file)

# Load the ANN model
with open('/usr/local/opv_aem_2018/models/ann_aem.pkl', 'rb') as file:
    ann = pickle.load(file)

# Load the KNN model
with open('/usr/local/opv_aem_2018/models/knn_aem.pkl', 'rb') as file:
    knn = pickle.load(file)

# Load the RF model
with open('/usr/local/opv_aem_2018/models/rf_aem.pkl', 'rb') as file:
    rf = pickle.load(file)

# Load the GBRT model
with open('/usr/local/opv_aem_2018/models/gbrt_aem.pkl', 'rb') as file:
    gbrt = pickle.load(file)

# @markdown **This cell predicts the PCE of a new donor-acceptor pair from molecular properties calculated at the M06-2X/6-31G(d) level:**

# @markdown 1. Polarizability of the donor molecule (Bohr$^3$):
polarizability = 1143.852667 # @param {type:"number"}
# @markdown 2. The energetic difference of LUMO and LUMO+1 of the acceptor, $\Delta_{\rm L}^{\rm A}$ (eV):
delLA = 0.033470 # @param {type:"number"}
# @markdown 3. The energetic difference of LUMO and LUMO+1 of the donor, $\Delta_{\rm L}$ (eV):
delLD = 0.047620 # @param {type:"number"}
# @markdown 4. Number of unsaturated atoms in the main conjugation path of the donor molecule, $N_{\rm atom}^{\rm D}$ :
N_atom = 58 # @param {type:"number"}
# @markdown 5. Energy of the electronic transition to a singlet excited state with the largest oscillator strength, $E_{\rm g}$ (nm):
Eg = 453.34 # @param {type:"number"}
# @markdown 6. Reorganization energy for holes in the donor molecule, $\lambda_{\rm h}$ (eV):
lambda_h = 0.384475 # @param {type:"number"}
# @markdown 7. Vertical ionization potential for the donor molecule, IP($v$) (eV):
DIP = 6.522940 # @param {type:"number"}
# @markdown 8. The energetic difference of HOMO of donor and LUMO of acceptor, $E_{\rm HL}^{\rm DA}$ (eV):
AL_DH = 3.650407 # @param {type:"number"}
# @markdown 9. The energetic difference of HOMO and HOMO−1 of the donor molecule, $\Delta_{\rm H}$ (eV):
delHD = 0.345585 # @param {type:"number"}
# @markdown 10. Hole–electron binding energy in the donor molecule, $E_{\rm bind}$ (eV):
E_bind = 2.080714 # @param {type:"number"}
# @markdown 11. The energetic difference of LUMO of donor and LUMO of acceptor, $E_{\rm LL}^{\rm DA}$ (eV):
DL_AL = 0.500962 # @param {type:"number"}
# @markdown 12. Change in dipole moment in going from the ground state to the first excited state for donor molecule, $\Delta_{\rm ge}$ (Debye):
delGE = 1.282700 # @param {type:"number"}
# @markdown 13. Energy of the electronic transition to the lowest-lying triplet state, E$_{\rm T_1}$ (eV):
E_T1 = 1.8899 # @param {type:"number"}

# Create a pandas DataFrame
column_names = ['polarizability', 'delLA', 'delLD', 'N_atom', 'Eg',
       'lamda_h', 'DIP', 'AL-DH', 'delHD', 'E_bind', 'DL-AL', 'delGE', 'E_T1']
input_data_list = [polarizability, delLA, delLD, N_atom, Eg, lambda_h, DIP,
                   AL_DH, delHD, E_bind, DL_AL, delGE, E_T1]
# Build a single-row DataFrame directly so that every column gets a numeric dtype
input_data = pd.DataFrame([input_data_list], columns=column_names)

### Descriptor scaling ###
train = pd.read_csv("/usr/local/opv_aem_2018/datasets/Train.csv")
trainX = train.to_numpy()[:,2:]
testX = input_data.to_numpy()
scalerS=StandardScaler()
scalerS.fit(trainX)
trainX_S = scalerS.transform(trainX)
testX_S = scalerS.transform(testX)

scalerM=MinMaxScaler()
scalerM.fit(trainX)
trainX_M = scalerM.transform(trainX)
testX_M = scalerM.transform(testX)

# Predict the PCE
list_pred_PCE = []

#LR
test_pred_LR = lr.predict(testX)
list_pred_PCE.extend(test_pred_LR)

#KNN
test_pred_KNN = knn.predict(testX_M)
list_pred_PCE.extend(test_pred_KNN)

#ANN
test_pred_ANN = ann.predict(testX_S)
list_pred_PCE.extend(test_pred_ANN)

#RF
test_pred_RF = rf.predict(testX)
list_pred_PCE.extend(test_pred_RF)

#GBRT
test_pred_GB = gbrt.predict(testX)
list_pred_PCE.extend(test_pred_GB)

# Collect the five predictions in a single-row table labelled 'D/A'
model_list = ["LR", "kNN", "ANN", "RF", "GBRT"]
predicted_data = pd.DataFrame([list_pred_PCE], columns=model_list, index=['D/A'])

print("Predicted PCEs (%) using linear regression (LR), k-nearest neighbor (kNN), artificial neural networks (ANN), random forest (RF), and gradient boosting regression tree (GBRT) models are provided below:")
print(predicted_data)

/usr/local
Predicted PCEs (%) using linear regression (LR), k-nearest neighbor (kNN), artificial neural networks (ANN), random forest (RF), and gradient boosting regression tree (GBRT) models are provided below:
           LR       kNN       ANN        RF      GBRT
D/A  5.957946  6.894793  6.657736  6.867779  7.537297
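
The cell above fits both scalers on the training descriptors and only transforms the new input with them; the kNN model then receives the min-max scaled descriptors, the ANN the standardized ones, and LR, RF, and GBRT the unscaled values. As a rough illustration of what the two transforms compute, here is a minimal NumPy sketch. The numbers are made up and stand in for two descriptor columns; they are not taken from `Train.csv`.

```python
import numpy as np

# Made-up values standing in for two descriptor columns of the training set.
train_X = np.array([[1.0, 100.0],
                    [2.0, 200.0],
                    [3.0, 400.0]])
new_X = np.array([[2.5, 300.0]])  # one hypothetical candidate

# StandardScaler: subtract the training mean and divide by the training
# standard deviation (population form, ddof=0).
mean = train_X.mean(axis=0)
std = train_X.std(axis=0)
new_standard = (new_X - mean) / std

# MinMaxScaler: map the training minimum to 0 and the training maximum to 1.
lo = train_X.min(axis=0)
hi = train_X.max(axis=0)
new_minmax = (new_X - lo) / (hi - lo)

print(new_standard)
print(new_minmax)
```

Because the statistics come from the training set alone, a candidate whose descriptor lies outside the training range gets a min-max value below 0 or above 1, which can serve as a hint that the models are being asked to extrapolate.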


### Batch Prediction for Multiple Donor-Acceptor Pairs
This section shows how to use the models to predict the PCE of several donor-acceptor (D/A) pairs at once. Follow the instructions on page 33 of the Supporting Information of the AEM paper to compile a CSV dataset ("candidate.csv") containing all required information for the donor-acceptor pairs you want to evaluate. Upload the file to Colab using the file icon in the left column, then run the cell below. If no "candidate.csv" has been uploaded, the cell falls back to the repository's Test.csv as a sample candidate dataset. The predicted results are saved to 'predicted_PCE.csv', which can be downloaded through the same file icon.
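
The batch cell below reads the descriptors with `test.to_numpy()[:,2:]`, so it appears to expect two leading columns followed by the 13 descriptors in the same order as in the single-pair cell. Assuming that layout, a small optional pre-check might look like the following sketch. The function name `check_candidate_csv`, the header names, and the sample rows are hypothetical and only serve as illustration; the Supporting Information remains the authority on the file format.

```python
import csv
import io

N_LEADING = 2       # columns skipped by test.to_numpy()[:, 2:]
N_DESCRIPTORS = 13  # descriptors listed in the single-pair cell


def check_candidate_csv(text):
    """Return a list of problems found in CSV text; an empty list means the layout looks usable."""
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < 2:
        return ["expected a header row plus at least one data row"]
    problems = []
    expected = N_LEADING + N_DESCRIPTORS
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != expected:
            problems.append(f"line {line_no}: {len(row)} columns, expected {expected}")
            continue
        # Every descriptor value must be parseable as a number.
        for col, value in enumerate(row[N_LEADING:], start=N_LEADING + 1):
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}, column {col}: {value!r} is not a number")
    return problems


# Hypothetical sample: header names and the second column are placeholders.
header = "SN,label," + ",".join(f"d{i}" for i in range(1, 14))
good = "1,x," + ",".join(["0.5"] * 13)
bad = "2,y," + ",".join(["0.5"] * 12 + ["n/a"])
sample = "\n".join([header, good, bad])
print(check_candidate_csv(sample))
```

To check an uploaded file, the text could be read with `open("/content/candidate.csv").read()` and passed to the function before running the prediction cell.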

In [3]:
# Import packages
import os
import pickle
import pandas as pd
import sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler

# Clone the repository with the pickled models if it is not already present
if not os.path.exists("/usr/local/opv_aem_2018"):
  %cd /usr/local
  !git clone -q https://github.com/harikrishna-chem/opv_aem_2018.git

# Load models
# Load the LR model
with open('/usr/local/opv_aem_2018/models/lr_aem.pkl', 'rb') as file:
    lr = pickle.load(file)

# Load the ANN model
with open('/usr/local/opv_aem_2018/models/ann_aem.pkl', 'rb') as file:
    ann = pickle.load(file)

# Load the KNN model
with open('/usr/local/opv_aem_2018/models/knn_aem.pkl', 'rb') as file:
    knn = pickle.load(file)

# Load the RF model
with open('/usr/local/opv_aem_2018/models/rf_aem.pkl', 'rb') as file:
    rf = pickle.load(file)

# Load the GBRT model
with open('/usr/local/opv_aem_2018/models/gbrt_aem.pkl', 'rb') as file:
    gbrt = pickle.load(file)

# Fall back to the repository's sample test set if no candidate.csv was uploaded
if not os.path.exists("/content/candidate.csv"):
    !cp /usr/local/opv_aem_2018/datasets/Test.csv /content/candidate.csv

### Read Train and Test data ###
train = pd.read_csv("/usr/local/opv_aem_2018/datasets/Train.csv")
test = pd.read_csv("/content/candidate.csv")
trainX = train.to_numpy()[:,2:]
testX = test.to_numpy()[:,2:]

### Descriptor scaling ###
scalerS=StandardScaler()
scalerS.fit(trainX)
trainX_S = scalerS.transform(trainX)
testX_S = scalerS.transform(testX)

scalerM=MinMaxScaler()
scalerM.fit(trainX)
trainX_M = scalerM.transform(trainX)
testX_M = scalerM.transform(testX)

# Predict the PCE
#LR
test_pred_LR = lr.predict(testX)

#KNN
test_pred_KNN = knn.predict(testX_M)

#ANN
test_pred_ANN = ann.predict(testX_S)

#RF
test_pred_RF = rf.predict(testX)

#GBRT
test_pred_GB = gbrt.predict(testX)

# Create a pandas DataFrame for predicted PCEs
pred_df = pd.DataFrame()
pred_df['SN'] = test.iloc[:, 0]  # first column of the candidate file (serial number)
pred_df['LR'] = test_pred_LR
pred_df['kNN'] = test_pred_KNN
pred_df['ANN'] = test_pred_ANN
pred_df['RF'] = test_pred_RF
pred_df['GBRT'] = test_pred_GB

# Save the data in a CSV file
pred_df.to_csv("/content/predicted_PCE.csv")

print("Predicted PCEs (%) using linear regression (LR), k-nearest neighbor (kNN), artificial neural networks (ANN), random forest (RF), and gradient boosting regression tree (GBRT) models are provided below:")
print(pred_df)

Predicted PCEs (%) using linear regression (LR), k-nearest neighbor (kNN), artificial neural networks (ANN), random forest (RF), and gradient boosting regression tree (GBRT) models are provided below:
    SN        LR       kNN       ANN        RF      GBRT
0   49  5.957947  6.894791  6.657738  6.863784  7.537297
1  200  3.713405  4.709338  3.869178  3.341831  3.368673
2  267  1.403432  1.178437  1.898073  1.822651  1.799663
3   16  7.295168  8.020000  7.735599  7.583088  7.951715
4    4  6.436520  7.676667  7.237478  7.663054  7.715267
5   27  7.741166  7.821983  8.010157  6.994241  7.106885


# References and further reading

- Harikrishna Sahu, Weining Rao, Alessandro Troisi, Haibo Ma, *Toward predicting efficiency of organic solar cells via machine learning and improved descriptors*, Adv. Energy Mater. 2018, 8, 1801032. DOI: [10.1002/aenm.201801032](https://doi.org/10.1002/aenm.201801032)