# Chapter 5: Machine Learning for Drug Discovery - Interactive Notebook

This notebook demonstrates code and concepts from [Chapter 5: Machine Learning for Drug Discovery](../chapters/chapter5-ml-for-drug-discovery.qmd) of the book.

You can run and modify the code cells below to explore ML for molecules hands-on.

## Install and Import Libraries

We use scikit-learn, RDKit, and pandas. If any of them is not installed, uncomment the pip command in the cell below and run it first.

In [None]:
# !pip install scikit-learn rdkit pandas  # RDKit is published on PyPI as `rdkit` (formerly `rdkit-pypi`)
from rdkit import Chem
from rdkit.Chem import Descriptors
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
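Before running the rest of the notebook, it can help to confirm that the three libraries are importable. The following is a minimal sketch using only the standard library; the `required` mapping and its pip package names are assumptions made for illustration, not part of the chapter.

```python
import importlib.util

# Map each importable module name to the pip package assumed to provide it.
required = {"rdkit": "rdkit", "sklearn": "scikit-learn", "pandas": "pandas"}

# find_spec returns None when a top-level module cannot be located.
missing = [pip_name for module, pip_name in required.items()
           if importlib.util.find_spec(module) is None]

if missing:
    print("Missing packages, install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```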

## Prepare Data and Features

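Let's build a three-molecule dataset (ethanol, aspirin, and chloroquine) from SMILES strings. The regression target, `property`, is each molecule's reference molecular weight, and the only feature is the molecular weight that RDKit computes from the structure. This makes it a toy problem meant to show the workflow, not a realistic prediction task.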

In [None]:
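# SMILES for ethanol, aspirin, and chloroquine, with reference molecular weights in g/mol
data = {'smiles': ['CCO', 'CC(=O)OC1=CC=CC=C1C(=O)O', 'CCN(CC)CCCC(C)NC1=C2C=CC(=CC2=NC=C1)Cl'],
        'property': [46.07, 180.16, 319.87]}  # chloroquine (C18H26ClN3) is 319.87 g/mol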
df = pd.DataFrame(data)
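# Parse each SMILES into an RDKit Mol object, then compute its molecular weight as the single feature
df['mol'] = df['smiles'].apply(Chem.MolFromSmiles)
df['mw'] = df['mol'].apply(Descriptors.MolWt)

X = df[['mw']]
y = df['property']
# With three molecules, test_size=0.33 holds out one molecule and trains on the other two
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)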
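A single descriptor is enough for this toy example, but the same pattern extends to several descriptors. Below is a minimal sketch, assuming a hypothetical helper named `featurize` (not from the chapter) that collects a few standard RDKit descriptors and skips SMILES strings that fail to parse. The choice of descriptors is illustrative only.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors
import pandas as pd

def featurize(smiles):
    """Return a dict of RDKit descriptors, or None if the SMILES cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for invalid input
        return None
    return {
        "mw": Descriptors.MolWt(mol),           # molecular weight
        "logp": Descriptors.MolLogP(mol),       # estimated octanol-water partition coefficient
        "tpsa": Descriptors.TPSA(mol),          # topological polar surface area
        "hbd": Descriptors.NumHDonors(mol),     # hydrogen-bond donors
        "hba": Descriptors.NumHAcceptors(mol),  # hydrogen-bond acceptors
    }

# The last entry is deliberately invalid to show how parse failures are dropped.
smiles_list = ["CCO", "CC(=O)OC1=CC=CC=C1C(=O)O", "not_a_smiles"]
rows = [(s, featurize(s)) for s in smiles_list]
features = pd.DataFrame([{"smiles": s, **f} for s, f in rows if f is not None])
print(features)
```

Checking for `None` matters because `Descriptors.MolWt` and the other descriptor functions raise an error when handed an unparsed molecule.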

## Train and Evaluate a Random Forest Model

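Let's train a Random Forest regressor on the two training molecules and report the mean squared error (MSE) on the one held-out molecule. With a single test point, the MSE is just one squared error, so treat the number as a check that the pipeline runs, not as a measure of model quality.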

In [None]:
model = RandomForestRegressor(random_state=42)  # fixed seed so reruns build the same forest
model.fit(X_train, y_train)
preds = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, preds))
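With so few molecules, one alternative to a single train/test split is leave-one-out cross-validation, where each molecule is predicted by a model trained on the others. The sketch below is self-contained and uses hard-coded approximate molecular weights for ethanol, aspirin, and chloroquine as an assumed stand-in for the notebook's `X` and `y`; `n_estimators=50` is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import mean_squared_error

# Assumed stand-in data: one feature (molecular weight) and the same
# weights as targets, in g/mol, for ethanol, aspirin, and chloroquine.
X = np.array([[46.07], [180.16], [319.87]])
y = np.array([46.07, 180.16, 319.87])

model = RandomForestRegressor(n_estimators=50, random_state=42)

# Each molecule is predicted by a forest trained on the other two.
preds = cross_val_predict(model, X, y, cv=LeaveOneOut())

mse = mean_squared_error(y, preds)
rmse = np.sqrt(mse)  # square root puts the error back in the target's units (g/mol)
print("Leave-one-out predictions:", preds)
print(f"MSE: {mse:.2f}  RMSE: {rmse:.2f}")
```

A regression forest averages target values it saw during training, so each held-out prediction falls within the range of the two training weights. The error for ethanol and chloroquine is therefore large, which illustrates that tree ensembles do not extrapolate beyond their training data.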

## Link to Book Chapter

For more details and explanations, see [Chapter 5: Machine Learning for Drug Discovery](../chapters/chapter5-ml-for-drug-discovery.qmd) in the book.