# Predictions Demo

This demo walks through end-to-end use of the product, retention time, lambda max, and mass spectrometry prediction utilities:

- Create a `ChemicalReaction` object with reactants, solvent, and LC-MS method parameters
- Fetch products from ASKCOS
- Predict retention times (ReTiNA_XGB1 model)
- Predict lambda max values (AMAX_XGB1 model)
- Predict mass spectrometry adducts

## Requirements

This notebook requires the `ppenv` conda environment to be activated. Run the following command before starting:

```bash
conda activate ppenv
```

If the environment doesn't exist, create it with:

```bash
conda env create -f ppenv.yml
conda activate ppenv
```
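It can help to fail fast if the notebook is opened under the wrong environment. This is a minimal sketch. It assumes the Jupyter kernel was launched from a shell where `conda activate ppenv` was run, because conda sets the `CONDA_DEFAULT_ENV` variable on activation. `active_conda_env` and `require_env` are hypothetical helpers and are not part of this project's `utils` package.

```python
import os

def active_conda_env():
    """Return the active conda environment name, or None if none is active."""
    # `conda activate <name>` exports CONDA_DEFAULT_ENV=<name> in that shell
    return os.environ.get("CONDA_DEFAULT_ENV")

def require_env(expected="ppenv"):
    """Raise early with a helpful message if the expected env is not active."""
    active = active_conda_env()
    if active != expected:
        raise RuntimeError(
            f"Expected conda env {expected!r} but found {active!r}; "
            f"run `conda activate {expected}` before starting Jupyter."
        )
```

Calling `require_env()` in the first cell stops the notebook before any prediction model is loaded.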

## Important Note

This notebook uses `async/await` syntax because Jupyter notebooks already run inside an asyncio event loop. Functions that drive a web browser (here, the ASKCOS scraper in Step 2) must be called through their async versions and awaited. A synchronous call that tries to start its own event loop would conflict with the notebook's loop.
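The difference between awaiting on the notebook's loop and starting a new loop appears in the small, self-contained sketch below. `fetch_products_stub` is a hypothetical stand-in for an async, browser-driven call such as `rxn.fetch_products_from_askcos()`. It only yields to the event loop and does not scrape anything.

```python
import asyncio

async def fetch_products_stub(reactants):
    """Hypothetical stand-in for an async, browser-driven call."""
    await asyncio.sleep(0)  # yield control to the event loop, as real browser I/O would
    return [f"product_of_{r}" for r in reactants]

def fetch_in_script(reactants):
    # In a plain Python script no loop is running yet, so asyncio.run() can create one.
    return asyncio.run(fetch_products_stub(reactants))

async def fetch_in_notebook(reactants):
    # In Jupyter a loop is already running. Calling asyncio.run() here raises
    # RuntimeError("asyncio.run() cannot be called from a running event loop"),
    # so the coroutine is awaited on the existing loop instead.
    return await fetch_products_stub(reactants)
```

In a notebook cell, use `await fetch_in_notebook([...])`. From a standalone script, use `fetch_in_script([...])`.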

In [1]:
"""
Step 1: Initiate reaction object with LC-MS method parameters
"""

from utils.rxn_classes import ChemicalReaction

# input reaction: acetic anhydride in ethanol; ethanol is both solvent and reactant (esterification to ethyl acetate)
reactants = ["CC(=O)OC(C)=O"]
solvent = "CCO"

# LC-MS method parameters (from enaminert setup)
lcms_solvents = {
    'A': [{'CC#N': 95, 'O': 5}, {'C(=O)O': 0.0265}],
    'B': [{'O': 100}, {'C(=O)O': 0.0265}]
}
lcms_gradient = [(0, 99), (0.01, 99), (1.5, 0), (1.73, 0), (1.74, 99), (2, 99)]
lcms_column = ('RP', 4.6, 30, 2.7)
lcms_flow_rate = 3.0
lcms_temp = 30.0

# creating reaction object with LC-MS method parameters
rxn = ChemicalReaction(
    reactants=reactants,
    solvent=solvent,
    lcms_solvents=lcms_solvents,
    lcms_gradient=lcms_gradient,
    lcms_column=lcms_column,
    lcms_flow_rate=lcms_flow_rate,
    lcms_temp=lcms_temp
)

print(rxn)
print("\nLC-MS Method:")
print(f"  Column: {lcms_column}")
print(f"  Flow rate: {lcms_flow_rate} mL/min")
print(f"  Temperature: {lcms_temp}°C")
print(f"  Gradient: {lcms_gradient}")

ChemicalReaction(reactants=['CC(=O)OC(C)=O'], solvent='CCO', num_products=0, lcms_method=RP)

LC-MS Method:
  Column: ('RP', 4.6, 30, 2.7)
  Flow rate: 3.0 mL/min
  Temperature: 30.0°C
  Gradient: [(0, 99), (0.01, 99), (1.5, 0), (1.73, 0), (1.74, 99), (2, 99)]
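The gradient is specified only as breakpoints, so the mobile-phase composition between them has to be interpolated. This is a minimal sketch. It assumes each tuple is (time in minutes, percentage of one mobile phase) and that composition changes linearly between points. The notebook does not state whether the percentage refers to solvent A or B, and `percent_at` is a hypothetical helper.

```python
from bisect import bisect_right

def percent_at(gradient, t):
    """Linearly interpolate the gradient composition at time t.

    Assumes `gradient` is a list of (time_min, percent) points with strictly
    increasing times, like `lcms_gradient`; times outside the programmed
    range are held at the first/last point.
    """
    times = [point[0] for point in gradient]
    if t <= times[0]:
        return gradient[0][1]
    if t >= times[-1]:
        return gradient[-1][1]
    i = bisect_right(times, t)  # index of the first breakpoint strictly after t
    (t0, p0), (t1, p1) = gradient[i - 1], gradient[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

lcms_gradient = [(0, 99), (0.01, 99), (1.5, 0), (1.73, 0), (1.74, 99), (2, 99)]
for t in (0.0, 0.5, 1.0, 1.6, 1.9):
    print(f"t={t:.2f} min: {percent_at(lcms_gradient, t):.1f}%")
```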


In [2]:
"""
Step 2: Fetch products from ASKCOS
"""

# Jupyter supports top-level `await`, so the async ASKCOS scraper can be awaited directly
products = await rxn.fetch_products_from_askcos()

print(f"Found {len(products)} Products From ASKCOS")
print("\nListing Top 10 Products:")

for product in products[:10]:
    print(f"{product.get_smiles()}: probability={product.get_probability()}, mol_weight={product.get_mol_weight()}")

Combined reactants: CC(=O)OC(C)=O
Navigating to ASKCOS forward page...
Navigating to Product Prediction tab...
Clicked Product Prediction tab
Navigating to Reactants tab
Entered Reactants
Navigating to Solvents tab
Entered Solvents
Navigating to Results button...
Clicked Get Results button
Navigating to Export button...
Clicked Export button
Found 34 Products From ASKCOS

Listing Top 10 Products:
CC(=O)OC(C)=O: probability=1.0, mol_weight=102.031694052
CCO: probability=1.0, mol_weight=46.041864812
CC(=O)O: probability=0.8295780936282661, mol_weight=60.021129368000004
CCOC(C)=O: probability=0.15112044300780572, mol_weight=88.052429496
CC(=O)CC(=O)O: probability=0.018900324420628507, mol_weight=102.031694052
CC: probability=0.00016634156724500148, mol_weight=30.046950192
CC(C)=O: probability=0.0001276938716719426, mol_weight=58.041864812
CC(O)CC(=O)O: probability=6.656509044533401e-05, mol_weight=104.047344116
CC(=O)OC(C)O: probability=2.7149543896150086e-05, mol_weight=104.047344116
CC(
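ASKCOS echoes the inputs back. The first two entries are the reactant (acetic anhydride) and the solvent (ethanol), each with probability 1.0. This is a minimal sketch for keeping only new species above a cutoff. It assumes the SMILES strings are already canonical, so exact string comparison is enough; otherwise a canonicalization step (for example with RDKit) would be needed first. `filter_new_products` and the 0.01 cutoff are illustrative choices, not part of `utils.rxn_classes`.

```python
def filter_new_products(predictions, reactants, solvent, min_prob=0.01):
    """Drop entries that merely echo the inputs, and those below min_prob.

    predictions: list of (smiles, probability) pairs.
    Assumes canonical SMILES, so exact string matching identifies the inputs.
    """
    inputs = set(reactants) | {solvent}
    return [(smi, p) for smi, p in predictions if smi not in inputs and p >= min_prob]

# values copied from the ASKCOS output above
top = [
    ("CC(=O)OC(C)=O", 1.0),
    ("CCO", 1.0),
    ("CC(=O)O", 0.8295780936282661),
    ("CCOC(C)=O", 0.15112044300780572),
    ("CC(=O)CC(=O)O", 0.018900324420628507),
    ("CC", 0.00016634156724500148),
]
print(filter_new_products(top, ["CC(=O)OC(C)=O"], "CCO"))
```

With the notebook's objects, the input list could be built as `[(p.get_smiles(), p.get_probability()) for p in products]`.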

In [3]:
"""
Step 3: Predict retention times using ReTiNA_XGB1

Note: Retention times are predicted in SECONDS.
Uses the LC-MS method parameters set in Step 1.
"""

method = rxn.get_lcms_method()
print(f"Predicting Retention Times for {len(rxn.get_products())} Products...")
print(f"Using LC-MS method: {method['column']} column, "
      f"{method['flow_rate']} mL/min, {method['temp']}°C\n")

predicted_rts = rxn.predict_products_retention_times()

print(f"Predicted Retention Times for {len(predicted_rts)} Products (in seconds):")

print("\nDisplaying Retention Times for First 10 Products:")

for product in predicted_rts[:10]:
    rt = product.get_retention_time()
    if rt is not None:
        print(f"{product.get_smiles()}: predicted_rt={rt:.2f}s ({rt/60:.2f} min)")
    else:
        print(f"{product.get_smiles()}: predicted_rt=None")

INFO:rt_pred.pred_rt:Loading ReTiNA_XGB1 model from /Users/nathanleung/Documents/Programming/Research Projects/peak_prophet/predictions/rt_pred/ReTiNA_XGB1/ReTINA_XGB1.json


Predicting Retention Times for 34 Products...
Using LC-MS method: ('RP', 4.6, 30, 2.7) column, 3.0 mL/min, 30.0°C



INFO:rt_pred.pred_rt:ReTiNA_XGB1 model loaded successfully
INFO:rt_pred.pred_rt:Making predictions for 34 compounds


Predicted Retention Times for 34 Products (in seconds):

Displaying Retention Times for First 10 Products:
CC(=O)OC(C)=O: predicted_rt=-7.10s (-0.12 min)
CCO: predicted_rt=-15.34s (-0.26 min)
CC(=O)O: predicted_rt=-29.11s (-0.49 min)
CCOC(C)=O: predicted_rt=4.46s (0.07 min)
CC(=O)CC(=O)O: predicted_rt=-35.71s (-0.60 min)
CC: predicted_rt=-11.89s (-0.20 min)
CC(C)=O: predicted_rt=6.55s (0.11 min)
CC(O)CC(=O)O: predicted_rt=0.66s (0.01 min)
CC(=O)OC(C)O: predicted_rt=32.59s (0.54 min)
CC(=O)OC=O: predicted_rt=-9.47s (-0.16 min)
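Several of these predictions are negative, which is not physically possible. A compound cannot elute before it is injected, let alone before the column dead time. The model output therefore needs screening before it is used for peak assignment. This minimal sketch works on plain (smiles, rt) pairs copied from the output above. The dead time `t0_s` and run end `run_end_s` are hypothetical parameters that the notebook does not define. The 120 s value assumes the gradient times are in minutes.

```python
def flag_retention_times(predictions, t0_s=0.0, run_end_s=None):
    """Split (smiles, rt_seconds) pairs into plausible and suspect predictions.

    A compound cannot elute before the column dead time t0_s (and never before
    injection at t = 0), and one predicted after run_end_s would not be seen
    within the run. Missing predictions (None) are treated as suspect.
    """
    plausible, suspect = [], []
    for smiles, rt in predictions:
        too_early = rt is None or rt < t0_s
        too_late = run_end_s is not None and rt is not None and rt > run_end_s
        if too_early or too_late:
            suspect.append((smiles, rt))
        else:
            plausible.append((smiles, rt))
    return plausible, suspect

preds = [
    ("CC(=O)OC(C)=O", -7.10),
    ("CCO", -15.34),
    ("CCOC(C)=O", 4.46),
    ("CC(C)=O", 6.55),
    ("CC(=O)OC(C)O", 32.59),
]
# 120 s corresponds to the 2-minute gradient, assuming its times are in minutes
ok, bad = flag_retention_times(preds, t0_s=0.0, run_end_s=120.0)
print("plausible:", ok)
print("suspect:", bad)
```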


In [4]:
"""
Step 4: Predict lambda max using AMAX_XGB1

Note: Lambda max values are predicted in nanometers.
"""

print(f"Predicting lambda max for {len(rxn.get_products())} products...")

pred_lmaxs = rxn.predict_products_lambda_max()

print(f"Predicted lambda max for {len(pred_lmaxs)} products (in nanometers):")

print("\nDisplaying Lambda Max for First 10 Products:")

for product in pred_lmaxs[:10]:
    lmax = product.get_lambda_max()
    if lmax is not None:
        print(f"{product.get_smiles()}: lambda_max={lmax:.2f} nm")
    else:
        print(f"{product.get_smiles()}: lambda_max=None")

INFO:lmax_pred.pred_lmax:Loading AMAX_XGB1 model from /Users/nathanleung/Documents/Programming/Research Projects/peak_prophet/predictions/lmax_pred/AMAX_XGB1/AMAX_XGB1.json


Predicting lambda max for 34 products...


INFO:lmax_pred.pred_lmax:AMAX_XGB1 model loaded successfully
INFO:lmax_pred.pred_lmax:Making predictions for 34 (compound, solvent) pairs


Predicted lambda max for 34 products (in nanometers):

Displaying Lambda Max for First 10 Products:
CC(=O)OC(C)=O: lambda_max=313.96 nm
CCO: lambda_max=221.25 nm
CC(=O)O: lambda_max=243.95 nm
CCOC(C)=O: lambda_max=264.58 nm
CC(=O)CC(=O)O: lambda_max=242.91 nm
CC: lambda_max=227.25 nm
CC(C)=O: lambda_max=263.04 nm
CC(O)CC(=O)O: lambda_max=260.94 nm
CC(=O)OC(C)O: lambda_max=271.22 nm
CC(=O)OC=O: lambda_max=314.81 nm
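Because lambda max is reported in nanometres, it can be convenient to express it as the photon energy of the corresponding absorption. The conversion is E = hc/λ, with hc ≈ 1239.84 eV·nm. The short sketch below applies it to a few of the values above. These are plain numbers copied from the output, not a call into the `lmax_pred` module, and `nm_to_ev` is a hypothetical helper.

```python
# h*c expressed in eV*nm (from the exact SI values of h, c and e: about 1239.84198)
HC_EV_NM = 1239.84198

def nm_to_ev(wavelength_nm):
    """Photon energy E = h*c / lambda, in electronvolts, for a wavelength in nm."""
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    return HC_EV_NM / wavelength_nm

for smiles, lmax in [("CC(=O)OC(C)=O", 313.96), ("CCO", 221.25), ("CC(C)=O", 263.04)]:
    print(f"{smiles}: {lmax:.2f} nm -> {nm_to_ev(lmax):.2f} eV")
```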


In [5]:
"""
Step 5: Predict mass spectrometry adducts

Predicts MS adducts for all products in positive ionization mode.
"""

print(f"Predicting MS adducts for {len(rxn.get_products())} products...")

products = rxn.predict_products_ms_adducts(mode="positive")

print(f"Predicted MS adducts for {len(products)} products:")

print("\nDisplaying Top 5 MS Adducts for First 10 Products:")

for product in products[:10]:
    ms_values = product.get_ms_values()
    if ms_values:
        print(f"\n{product.get_smiles()}: {len(ms_values)} adducts")
        # showing top 5 adducts by probability
        sorted_adducts = sorted(ms_values.items(), key=lambda x: x[1][1], reverse=True)[:5]
        for adduct_name, (mass, prob) in sorted_adducts:
            print(f"  {adduct_name}: {mass:.4f} Da (prob={prob:.3f})")
    else:
        print(f"\n{product.get_smiles()}: No MS adducts predicted")

Predicting MS adducts for 34 products...
Predicted MS adducts for 34 products:

Displaying Top 5 MS Adducts for First 10 Products:

CC(=O)OC(C)=O: 31 adducts
  [M+H]+: 103.0390 Da (prob=1.000)
  [M+Na]+: 125.0209 Da (prob=0.250)
  [M+NH4]+: 120.0655 Da (prob=0.200)
  [M+K]+: 140.9949 Da (prob=0.150)
  [M+2H]2+: 52.0231 Da (prob=0.120)

CCO: 31 adducts
  [M+H]+: 47.0491 Da (prob=1.000)
  [M+Na]+: 69.0311 Da (prob=0.250)
  [M+NH4]+: 64.0757 Da (prob=0.200)
  [M+K]+: 85.0050 Da (prob=0.150)
  [M+2H]2+: 24.0282 Da (prob=0.120)

CC(=O)O: 31 adducts
  [M+H]+: 61.0284 Da (prob=1.000)
  [M+Na]+: 83.0103 Da (prob=0.250)
  [M+NH4]+: 78.0550 Da (prob=0.200)
  [M+K]+: 98.9843 Da (prob=0.150)
  [M+2H]2+: 31.0178 Da (prob=0.120)

CCOC(C)=O: 31 adducts
  [M+H]+: 89.0597 Da (prob=1.000)
  [M+Na]+: 111.0416 Da (prob=0.250)
  [M+NH4]+: 106.0863 Da (prob=0.200)
  [M+K]+: 127.0156 Da (prob=0.150)
  [M+2H]2+: 45.0335 Da (prob=0.120)

CC(=O)CC(=O)O: 31 adducts
  [M+H]+: 103.0390 Da (prob=1.000)
  [M+Na]+: 1
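You can check the printed values by hand from the monoisotopic `mol_weight` reported in Step 2. The numbers are m/z values rather than masses in Da. For the doubly charged [M+2H]2+ ion, the sum is divided by the charge, which is why acetic anhydride (102.0317) appears near 52.02. This minimal sketch covers only the five adducts listed above. The ion masses are standard monoisotopic values (atom or molecule mass minus one electron). The lookup table is illustrative and not the one used internally by `predict_products_ms_adducts`.

```python
PROTON = 1.007276467  # proton mass, Da

# adduct name -> (ion mass in Da, number of ions added, charge z)
ADDUCTS = {
    "[M+H]+": (PROTON, 1, 1),
    "[M+Na]+": (22.989221, 1, 1),
    "[M+NH4]+": (18.033826, 1, 1),
    "[M+K]+": (38.963158, 1, 1),
    "[M+2H]2+": (PROTON, 2, 2),
}

def adduct_mz(monoisotopic_mass, adduct):
    """m/z of a positive-mode adduct: (M + n * ion_mass) / z."""
    ion_mass, n, z = ADDUCTS[adduct]
    return (monoisotopic_mass + n * ion_mass) / z

mass = 102.031694052  # acetic anhydride, mol_weight from Step 2
for name in ADDUCTS:
    print(f"{name}: m/z {adduct_mz(mass, name):.4f}")
```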

In [6]:
"""
Step 6: Display predicted products
"""

product_data = rxn.get_products()

print("Displaying Top 10 Products:\n")

for product in product_data[:10]:
    print(f"""{product.get_smiles()}: 
    probability={product.get_probability()}, 
    mol_weight={product.get_mol_weight()}, 
    retention_time={product.get_retention_time()}, 
    lambda_max={product.get_lambda_max()}""")
    print()

Displaying Top 10 Products:

CC(=O)OC(C)=O: 
    probability=1.0, 
    mol_weight=102.031694052, 
    retention_time=-7.097846508026123, 
    lambda_max=313.96240234375

CCO: 
    probability=1.0, 
    mol_weight=46.041864812, 
    retention_time=-15.343818664550781, 
    lambda_max=221.24668884277344

CC(=O)O: 
    probability=0.8295780936282661, 
    mol_weight=60.021129368000004, 
    retention_time=-29.113544464111328, 
    lambda_max=243.94625854492188

CCOC(C)=O: 
    probability=0.15112044300780572, 
    mol_weight=88.052429496, 
    retention_time=4.4556684494018555, 
    lambda_max=264.5806579589844

CC(=O)CC(=O)O: 
    probability=0.018900324420628507, 
    mol_weight=102.031694052, 
    retention_time=-35.71480941772461, 
    lambda_max=242.91366577148438

CC: 
    probability=0.00016634156724500148, 
    mol_weight=30.046950192, 
    retention_time=-11.8944673538208, 
    lambda_max=227.24819946289062

CC(C)=O: 
    probability=0.0001276938716719426, 
    mol_weight=58.0418
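To keep the combined predictions for later peak matching, you can write the fields printed above to a CSV file. This minimal sketch uses only the getters already called in this notebook. `products_to_csv` and the column names are hypothetical. `StubProduct` exists only so the snippet runs without the `utils` package.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

FIELDS = ["smiles", "probability", "mol_weight", "retention_time_s", "lambda_max_nm"]

def products_to_csv(products, stream):
    """Write one CSV row per product; None values become empty cells."""
    writer = csv.writer(stream)
    writer.writerow(FIELDS)
    for p in products:
        writer.writerow([
            p.get_smiles(),
            p.get_probability(),
            p.get_mol_weight(),
            p.get_retention_time(),
            p.get_lambda_max(),
        ])

@dataclass
class StubProduct:
    """Hypothetical stand-in exposing the same getters as the notebook's products."""
    smiles: str
    probability: float
    mol_weight: float
    retention_time: Optional[float]
    lambda_max: Optional[float]

    def get_smiles(self): return self.smiles
    def get_probability(self): return self.probability
    def get_mol_weight(self): return self.mol_weight
    def get_retention_time(self): return self.retention_time
    def get_lambda_max(self): return self.lambda_max

buf = io.StringIO()
products_to_csv([
    StubProduct("CCO", 1.0, 46.041864812, -15.34, 221.25),
    StubProduct("CC(=O)O", 0.83, 60.021129368, None, None),
], buf)
print(buf.getvalue())
```

In the notebook, `with open("products.csv", "w", newline="") as f: products_to_csv(rxn.get_products(), f)` would write all 34 products.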