# Assignment 1

A. Consider again the potential energy problem of Lecture 1, this time for three systems: 1) $r0_{ij}$ = 0.10 nm, $K_{ij}$ = 200 kcal mol$^{-1}$ nm$^{-2}$; 2) $r0_{ij}$ = 0.16 nm, $K_{ij}$ = 150 kcal mol$^{-1}$ nm$^{-2}$; 3) $r0_{ij}$ = 0.22 nm, $K_{ij}$ = 500 kcal mol$^{-1}$ nm$^{-2}$:

1. For each system, calculate the potential energy for atomic distances in the range [$r0_{ij}$ - 0.10, $r0_{ij}$ + 0.10] nm with a step of 0.002 nm, recalling that
$$V_{ij}(r) = \frac{1}{2}K_{ij}(r - r0_{ij})^2$$
2. Create and save a pandas DataFrame with three distance columns and three potential energy columns (one pair per system).
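As a quick sanity check on item 1, the sketch below builds the three distance grids and evaluates $V_{ij}(r)$. It assumes both endpoints of the range are included (101 points per system); the names `systems`, `n_points`, `harmonic_potential` and `grids` are illustrative only. At the edges $r = r0_{ij} \pm 0.10$ nm the potential is $\frac{1}{2}K_{ij}(0.10)^2 = 0.005\,K_{ij}$, i.e. 1.0, 0.75 and 2.5 kcal/mol for systems 1 to 3, and at $r = r0_{ij}$ it is zero.

```python
import numpy as np

# (r0 in nm, K in kcal/(mol nm^2)) for systems 1-3, as given in the assignment
systems = {1: (0.10, 200.0), 2: (0.16, 150.0), 3: (0.22, 500.0)}

half_width = 0.10  # nm on each side of r0
step = 0.002       # nm
# 0.20 nm / 0.002 nm = 100 intervals, so 101 points including both endpoints
n_points = int(round(2 * half_width / step)) + 1


def harmonic_potential(r, r0, k):
    """V(r) = 1/2 * K * (r - r0)^2; returns kcal/mol for r in nm."""
    return 0.5 * k * (r - r0) ** 2


grids = {}
for label, (r0, k) in systems.items():
    # linspace fixes the number of points, so every system gets an equal-length array
    r = np.linspace(r0 - half_width, r0 + half_width, n_points)
    grids[label] = (r, harmonic_potential(r, r0, k))

for label, (r, v) in grids.items():
    # Edge values should equal 0.005 * K; the middle point sits at r0, where V = 0
    print(f"system {label}: {len(r)} points, "
          f"V(edges) = {v[0]:.4f}, {v[-1]:.4f}, V(r0) = {v[n_points // 2]:.4f}")
```

Fixing the point count with `np.linspace` is a deliberate choice here: with float bounds and a float step, `np.arange` can gain or drop the last point through rounding.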

B. Write Python code to process a CSV file of chemical data. We want to calculate the normalized Polar Surface Area and AlogP values for the molecules in the *sars_cov_2_fret.txt* dataset and store the normalized values in two new columns. Hint: you can apply a Python function to an entire pandas column, e.g.

`df['new value'] = function(df['old value'])`

The dataset is derived from the ChEMBL assay CHEMBL4495583 and contains physicochemical data for 109 compounds tested for SARS-CoV-2 3CL-Pro inhibition.
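One common choice of normalization, and the one used in the solution cell, is min-max scaling, $x' = (x - x_{\min})/(x_{\max} - x_{\min})$, which maps a column onto [0, 1]. The sketch below applies it to a small hypothetical Series containing a missing value; the function name `min_max_normalize` and the zero-span fallback are assumptions for illustration, not part of the assignment.

```python
import numpy as np
import pandas as pd


def min_max_normalize(column: pd.Series) -> pd.Series:
    """Rescale a numeric Series to [0, 1] via (x - min) / (max - min).

    pandas' min() and max() skip NaN by default, so missing values stay NaN
    in the result instead of turning the whole column into NaN.
    """
    col_min, col_max = column.min(), column.max()
    span = col_max - col_min
    if span == 0:
        # Constant column: the formula would divide by zero, so return 0.0
        # for present values (an arbitrary choice) and keep NaN where missing
        return pd.Series(np.where(column.isna(), np.nan, 0.0), index=column.index)
    return (column - col_min) / span


# Hypothetical toy values, not taken from the ChEMBL dataset
toy = pd.DataFrame({"psa": [20.0, 60.0, np.nan, 120.0]})
toy["psa_norm"] = min_max_normalize(toy["psa"])
print(toy)
```

Here the minimum is 20 and the maximum 120, so 60 maps to 0.4; the missing entry simply stays NaN, as it does for compound 105 in the real output.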

In [1]:
import numpy as np
import pandas as pd

# define constants for each system: r0_ij in nm, K_ij in kcal/(mol nm^2)
r0_ij_1, K_ij_1 = 0.10, 200  # system 1
r0_ij_2, K_ij_2 = 0.16, 150  # system 2
r0_ij_3, K_ij_3 = 0.22, 500  # system 3

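# create array of atomic distances, r (nm), for each system;
# 0.20 nm / 0.002 nm = 100 intervals, i.e. 101 points including both endpoints.
# np.linspace fixes the point count, whereas np.arange with a float step can gain
# or lose the last point through rounding (and unequal lengths would break the DataFrame)
n_points = int(round(0.20 / 0.002)) + 1
r_vals_1 = np.linspace(r0_ij_1 - 0.10, r0_ij_1 + 0.10, n_points)
r_vals_2 = np.linspace(r0_ij_2 - 0.10, r0_ij_2 + 0.10, n_points)
r_vals_3 = np.linspace(r0_ij_3 - 0.10, r0_ij_3 + 0.10, n_points)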

# define function to calculate potential energy & calculate for each system
def pot_energy(r, r0_ij, K_ij):
    return 0.5 * K_ij * (r - r0_ij) ** 2

v_1 = pot_energy(r_vals_1, r0_ij_1, K_ij_1)
v_2 = pot_energy(r_vals_2, r0_ij_2, K_ij_2)
v_3 = pot_energy(r_vals_3, r0_ij_3, K_ij_3)

# create dataframe w/ results
df_potential_energies = pd.DataFrame({
    'Distance (system 1)': r_vals_1,
    'Potential energy (system 1)': v_1,
    'Distance (system 2)': r_vals_2,
    'Potential energy (system 2)': v_2,
    'Distance (system 3)': r_vals_3,
    'Potential energy (system 3)': v_3
})

print(df_potential_energies)

# save dataframe to csv
df_potential_energies.to_csv('potential_energies.csv')

     Distance (system 1)  Potential energy (system 1)  Distance (system 2)  \
0                  0.000                       1.0000                0.060   
1                  0.002                       0.9604                0.062   
2                  0.004                       0.9216                0.064   
3                  0.006                       0.8836                0.066   
4                  0.008                       0.8464                0.068   
..                   ...                          ...                  ...   
96                 0.192                       0.8464                0.252   
97                 0.194                       0.8836                0.254   
98                 0.196                       0.9216                0.256   
99                 0.198                       0.9604                0.258   
100                0.200                       1.0000                0.260   

     Potential energy (system 2)  Distance (system 3)  \
0     
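Reading the saved file back shows a pandas default worth knowing for item 2: `to_csv` writes the row index as an extra, unnamed first column unless `index=False` is passed, so a plain `read_csv` returns an `Unnamed: 0` column. The sketch below uses a hypothetical two-row frame (`demo`, with values copied from the first rows printed above) and a temporary directory rather than the real output file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical two-row stand-in for df_potential_energies
demo = pd.DataFrame({
    "Distance (system 1)": [0.000, 0.002],
    "Potential energy (system 1)": [1.0000, 0.9604],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "potential_energies.csv")
    demo.to_csv(path)  # default index=True: the row index is written as the first column

    naive = pd.read_csv(path)                  # index comes back as an 'Unnamed: 0' column
    restored = pd.read_csv(path, index_col=0)  # treat the first column as the index again

print(naive.columns.tolist())
print(list(restored.columns) == list(demo.columns))
print(np.allclose(restored.to_numpy(), demo.to_numpy()))
```

Either pass `index_col=0` when reading, or write with `index=False` when the integer index carries no information.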

In [3]:
# the file is semicolon-delimited; adjust the path to the file's location on your machine
df = pd.read_csv('/Users/marwanbakr/docs/uOttawa/2024-25/CHM4390 - Machine Learning for Chemistry/sars_cov_2_mpro_fret.csv', delimiter=';')

# define normalization function (min-max)
def normalize(column):
    return (column - column.min()) / (column.max() - column.min())

# apply function to normalize the 'Polar Surface Area' and 'AlogP' columns
df['Normalized Polar Surface Area'] = normalize(df['Polar Surface Area'])
df['Normalized AlogP'] = normalize(df['AlogP'])

print(df[['Polar Surface Area', 'AlogP', 'Normalized Polar Surface Area', 'Normalized AlogP']])

     Polar Surface Area  AlogP  Normalized Polar Surface Area  \
0                 43.70   2.85                       0.117371   
1                 34.14   3.28                       0.089638   
2                 89.99   2.29                       0.251654   
3                147.68   0.89                       0.419007   
4                 80.92   3.57                       0.225342   
..                  ...    ...                            ...   
104               58.56   2.98                       0.160478   
105                 NaN    NaN                            NaN   
106              201.67  -0.63                       0.575627   
107              267.84   0.39                       0.767579   
108              112.51   1.39                       0.316982   

     Normalized AlogP  
0            0.565041  
1            0.608740  
2            0.508130  
3            0.365854  
4            0.638211  
..                ...  
104          0.578252  
105               NaN  
106