<a href="https://colab.research.google.com/github/kinranlau/COMSOL_colloid_interaction/blob/main/SVM_prediction/%5BGUI%5D_Predict_colloid_interaction_SVM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Predict whether the interaction is favorable

**By fitting the simulated data with an SVM, this notebook provides an interactive GUI to predict whether a particular interaction is favorable (below kT) or not (above kT) given a set of parameters.**

<br>

See the original paper:

*A Multi-Parameter Study of the Colloidal Interaction between Au and TiO<sub>2</sub>: The Role of Surface Potential, Concentration and Defects*

---
<p align="center">
  <img src="https://github.com/kinranlau/COMSOL_colloid_interaction/blob/main/SVM_prediction/Model%20configuration.gif?raw=true" width="600">
</p>

- this model considers the interaction between a **negative particle (e.g. Au)** and a **negative surface with positive defects (e.g. TiO<sub>2</sub>)** in water,
- with 80640 different combinations of these 7 parameters:
  - $V_{Part}:$ Particle potential
  - $V_{Surf}:$ Surface potential
  - $V_{Def}:$ Defect potential
  - $DD:$ Defect density
  - $Conc:$ Concentration
  - $R:$ Particle radius
  - $A_H:$ Hamaker constant

<br>

- the electrostatic interaction energy was computed with the software COMSOL
- and the van der Waals contribution was added subsequently

<br>

- this notebook takes the computed data and fits them with an SVM (Support Vector Machine)
- so you can input any combination of these 7 parameters
- and predict whether the interaction is energetically feasible or not (below or above kT)
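
The "below or above kT" criterion compares the interaction energy with the thermal energy, and the same quantity sets the 25.7 mV factor that the input form uses to convert potentials. A minimal sketch, assuming T = 298 K (the temperature quoted in the form) and the exact SI values of the Boltzmann constant and the elementary charge; the function names are illustrative and not part of the notebook:

```python
# Exact SI values of the physical constants
BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def thermal_energy_zJ(temperature_K=298.0):
    """Thermal energy kT in zeptojoules (1 zJ = 1e-21 J)."""
    return BOLTZMANN_J_PER_K * temperature_K / 1e-21


def thermal_voltage_mV(temperature_K=298.0):
    """kT/e in millivolts: the size of one unit of dimensionless potential."""
    return BOLTZMANN_J_PER_K * temperature_K / ELEMENTARY_CHARGE_C * 1e3


def mV_to_dimensionless(potential_mV, temperature_K=298.0):
    """Convert a potential in mV to dimensionless potential (e * V / kT)."""
    return potential_mV / thermal_voltage_mV(temperature_K)


print(f"kT   = {thermal_energy_zJ():.2f} zJ")
print(f"kT/e = {thermal_voltage_mV():.1f} mV")
print(f"-35 mV = {mV_to_dimensionless(-35):.2f} (dimensionless)")
```

At 298 K this gives kT of about 4.11 zJ and kT/e of about 25.7 mV, so a Hamaker constant of 100 zJ corresponds to roughly 24 kT.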


<br>

> This notebook is divided into various "*cells*", and you can run them one by one by clicking the "*Run cell*" button which looks like a play button.

In [None]:
#@title Import data and libraries { display-mode: "form" }

# import data
import pandas as pd
results = pd.read_csv('https://raw.githubusercontent.com/kinranlau/COMSOL_colloid_interaction/main/SVM_prediction/results_varyvdw_SVM.csv')

# select features and label
# split into training and test set
# scale the features
# run SVM
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

import numpy as np

# first select the 7 features that we want:
# 1. colloid radius
# 2. V_particle
# 3. V_defect
# 4. V_surface
# 5. defect density
# 6. conc.
# 7. hamaker constant

features = results[['R/l_D',
                    'Particle potential',
                    'Defect potential',
                    'Surface potential',
                    'Defect density',
                    'Concentration / mM',
                    'Hamaker constant / 10^-21 J']]

# our label is whether the final energy is below or above kT
# add another column to show below/above kT
# 1: below kT; 0: above kT
below_kT = lambda x: 1 if x <= 1.001 else 0
results['below kT'] = results['Energy / kT'].apply(below_kT)

label = results['below kT']

In [None]:
#@title Split data into training and test set (80:20) { display-mode: "form" }

# split into training and test set
training_data, validation_data, training_labels, validation_labels = train_test_split(features, label, test_size = 0.2)

# scale the feature data so it has mean = 0 and standard deviation = 1
scaler = StandardScaler()
training_data = scaler.fit_transform(training_data)
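# reuse the training statistics: refitting on the validation set would leak
# information and change the scaling applied to later predictions
validation_data = scaler.transform(validation_data)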
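
Standardization rescales each feature to zero mean and unit standard deviation. As a hand-rolled sketch of what `StandardScaler` computes for a single feature column (the helper names and toy numbers are hypothetical), the statistics are learned from one set of values and can then be reused on another:

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Return the mean and population standard deviation of one column."""
    return fmean(values), pstdev(values)


def standardize(values, mean, std):
    """Shift and rescale values with previously learned statistics."""
    return [(v - mean) / std for v in values]


train_radius = [2.0, 4.0, 6.0, 8.0]  # toy training values
new_radius = [5.0, 10.0]             # toy unseen values

mean, std = fit_standardizer(train_radius)
scaled_train = standardize(train_radius, mean, std)
scaled_new = standardize(new_radius, mean, std)

print(scaled_train)  # zero mean, unit standard deviation
print(scaled_new)    # expressed on the same scale as the training data
```

Reusing the training statistics keeps new inputs on the same scale the model was fitted on.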

In [None]:
#@title Fit data with SVM { display-mode: "form" }
#@markdown This might take ~30 seconds to run.

#@markdown You should get an accuracy of about 97%.

# fit with SVM
# using default C and gamma
SVM = SVC()
SVM.fit(training_data, training_labels) 

# score the SVM model
SVM_score = SVM.score(validation_data, validation_labels)
print(f'The model has an accuracy of {SVM_score*100:.2f}%.')

The model has an accuracy of 97.72%.


In [None]:
#@title Helper functions for predicting "below kT" or "above kT" { display-mode: "form" }

# predict below kT or not with SVM
def predict_below_kT(para):
    # predict "below kT" or not
    # 1: below kT; 0: above kT

    # scale input
    para = scaler.transform(para)
    
    # predict by SVM
    SVM_pred = SVM.predict(para)[0]

    # decision function:
    # larger absolute value = higher confidence
    # close to zero = very low confidence
    SVM_decision_func = SVM.decision_function(para)[0]

    if SVM_pred == 0:
        print(f'Prediction by SVM: Above kT! The decision function is {SVM_decision_func:.3f}.')
    else:
        print(f'Prediction by SVM: Below kT! The decision function is {SVM_decision_func:.3f}.')


# Compare with the original data if this parameter set exists there
def compare_data():
  # np.isclose avoids missing a match because of floating-point round-off,
  # e.g. after converting mV to dimensionless potential
  df = results[
      np.isclose(results['R/l_D'], radius) &
      np.isclose(results['Particle potential'], V_particle) &
      np.isclose(results['Defect potential'], V_defect) &
      np.isclose(results['Surface potential'], V_surface) &
      np.isclose(results['Defect density'], DD) &
      np.isclose(results['Concentration / mM'], conc) &
      np.isclose(results['Hamaker constant / 10^-21 J'], hamaker)
      ]

  if df.empty:
    print("NB: This set of parameters does not exist in the original dataset.")
  else:
    real_E = df['below kT'].iloc[0]

    if real_E == 0:
      print('NB: This set of parameters exists in the original dataset. Results: above kT!')
    else:
      print('NB: This set of parameters exists in the original dataset. Results: below kT!')
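
`SVC()` with its default settings uses an RBF kernel, and the value returned by `decision_function` is a weighted sum of kernel evaluations against the support vectors plus an intercept. Its sign selects the class and its magnitude indicates how far the point lies from the decision boundary. A toy sketch in plain Python, where the support vectors, coefficients, intercept and gamma are invented for illustration and do not come from the fitted model:

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, dual_coefs, intercept, gamma):
    """Weighted sum of kernel values against the support vectors, plus intercept."""
    total = sum(c * rbf_kernel(sv, x, gamma)
                for sv, c in zip(support_vectors, dual_coefs))
    return total + intercept


# Hypothetical two-feature model: one support vector per class
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
dual_coefs = [1.0, -1.0]  # signed weights (alpha_i * y_i)
intercept = 0.0
gamma = 0.5

for point in [(0.1, 0.0), (1.0, 1.0), (2.0, 1.9)]:
    score = decision_function(point, support_vectors, dual_coefs, intercept, gamma)
    label = 1 if score > 0 else 0  # 1: below kT; 0: above kT
    print(f"{point}: decision function = {score:+.3f}, predicted label = {label}")
```

The midpoint `(1.0, 1.0)` is equally far from both support vectors, so its score is zero: this is the low-confidence case described in the helper function.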

In [None]:
#@markdown <img src="https://github.com/kinranlau/COMSOL_colloid_interaction/blob/main/SVM_prediction/Parameters.png?raw=true" width="400">

#@title Input your parameters and run the prediction!  { display-mode: "form" }
#@markdown <br>

#@markdown ### 1. $V_{Part}:$ Particle potential (mV or dimensionless potential)
#@markdown - You can select the unit of the potentials ("mV" or "Dimensionless potential"). 
#@markdown - At 298 K, dimensionless potential can be converted to mV by simply multiplying by 25.7.
unit_of_potential = 'mV' #@param ["mV", "Dimensionless potential"] {allow-input: false}
V_particle = -35 #@param {type:"number"}
#@markdown ---

#@markdown ### 2. $V_{Surf}:$ Surface potential (mV or dimensionless potential)
V_surface =  -35#@param {type:"number"}
#@markdown ---

#@markdown ### 3. $V_{Def}:$ Defect potential (mV or dimensionless potential)
V_defect =  0#@param {type:"number"}
#@markdown ---

#@markdown ### 4. $DD:$ Defect density
#@markdown - Defects are arranged in a cubic primitive array, and defect density is defined by $\frac{\pi {l_D}^{2}}{a^2}$.
#@markdown - $l_D$ is 0.5 nm in our model.

#@markdown <img src="https://github.com/kinranlau/COMSOL_colloid_interaction/blob/main/SVM_prediction/Defect%20definition.png?raw=true" width="500">
DD =  0#@param {type:"number"}
#@markdown ---

#@markdown ### 5. $Conc:$ Concentration of salt (mM)
conc =  0.5#@param {type:"number"}
#@markdown ---

#@markdown ### 6. $R:$ Particle radius (nm)
radius =  2.5#@param {type:"number"}
#@markdown ---

#@markdown ### 7. $A_H:$ Hamaker constant (zJ)
#@markdown - TiO<sub>2</sub> - TiO<sub>2</sub>: 50 - 60 zJ
#@markdown - Au - Au: 90 - 300 zJ
#@markdown - The Hamaker constant for Au - TiO<sub>2</sub> can be estimated by taking the harmonic or geometric mean (100 zJ was used in the paper).



hamaker =  100#@param {type:"number"}
#@markdown ---

############################################################
# conversion of units if needed
# the model fitted the potentials in dimensionless potential
# so if the input is in mV, then conversion to dimensionless potential is needed
if unit_of_potential == 'mV':
  V_particle /= 25.7
  V_surface /= 25.7
  V_defect /= 25.7

# in our model, the particle radius is defined relative to l_D which is 0.5 nm
# so we need to convert nm back to units of l_D
radius /= 0.5


# concat parameters
para_array = np.array([radius, V_particle, V_defect, V_surface, DD, conc, hamaker])
para_array = para_array.reshape(1,7)

para_label = ['R/l_D',
              'Particle potential',
              'Defect potential',
              'Surface potential',
              'Defect density',
              'Concentration / mM',
              'Hamaker constant / 10^-21 J']

para = pd.DataFrame(para_array, columns =  para_label)

# predict above/below kT and report the decision function
predict_below_kT(para)
print('Decision function: a larger absolute value means higher confidence; closer to zero means less confidence.')

# compare with the original data if this parameter set exists there
print('\n')
compare_data()

Prediction by SVM: Above kT! The decision function is -1.142.
Decision function: a larger absolute value means higher confidence; closer to zero means less confidence.


NB: This set of parameters does not exist in the original dataset.
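
Two of the form inputs are derived quantities. The sketch below evaluates the defect density $\frac{\pi {l_D}^{2}}{a^2}$ and the harmonic and geometric means suggested for the Au - TiO<sub>2</sub> Hamaker constant. It assumes that $a$ is the spacing between neighbouring defects as drawn in the defect figure. The example values (a = 2 nm, 55 zJ for TiO<sub>2</sub>, 200 zJ for Au) are picked from within the quoted ranges purely for illustration, and the function names are not part of the notebook:

```python
import math

L_D_NM = 0.5  # l_D used in the model, as stated in the form


def defect_density(spacing_nm, l_d_nm=L_D_NM):
    """Defect density pi * l_D^2 / a^2 for a defect spacing a (assumed meaning of a)."""
    return math.pi * l_d_nm ** 2 / spacing_nm ** 2


def geometric_mean(a11, a22):
    """Geometric-mean estimate of a mixed Hamaker constant."""
    return math.sqrt(a11 * a22)


def harmonic_mean(a11, a22):
    """Harmonic-mean estimate of a mixed Hamaker constant."""
    return 2 * a11 * a22 / (a11 + a22)


A_TIO2_ZJ = 55.0  # illustrative value within the quoted 50 - 60 zJ
A_AU_ZJ = 200.0   # illustrative value within the quoted 90 - 300 zJ

print(f"DD for a = 2 nm: {defect_density(2.0):.3f}")
print(f"geometric mean:  {geometric_mean(A_TIO2_ZJ, A_AU_ZJ):.1f} zJ")
print(f"harmonic mean:   {harmonic_mean(A_TIO2_ZJ, A_AU_ZJ):.1f} zJ")
```

With these illustrative inputs the two estimates come out at roughly 105 zJ and 86 zJ, which bracket the 100 zJ used in the paper.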
