# Predictions Demo

This demo walks through end-to-end use of the product, retention time, lambda max, and mass spectrometry prediction utilities:

- Create a `ChemicalReaction` with reactants and solvent
- Fetch products from ASKCOS
- Predict retention times (rtpred.ca)
- Predict lambda max values (chemprop, run in the `uvvismlenv` conda environment)
- Predict mass spectrometry adduct values

## Requirements

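This notebook must run in the `ppenv` conda environment. Lambda max prediction additionally calls chemprop in a separate `uvvismlenv` environment, which the first code cell checks for. Activate `ppenv` before starting: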

```bash
conda activate ppenv
```

If the environment doesn't exist, create it with:

```bash
conda env create -f ../ppenv.yml
conda activate ppenv
```

## Important Note

This notebook uses **async/await** syntax because the Jupyter kernel already runs an event loop. The functions that drive a web browser (ASKCOS scraping and retention time prediction) are therefore called through their async versions with top-level `await`; a blocking call would conflict with the notebook's running loop.
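
To make the conflict concrete, here is a minimal standard-library sketch. `slow_lookup` is a hypothetical stand-in for a browser-driven coroutine, not one of this project's functions. It shows that `asyncio.run()` refuses to start when a loop is already running, which is the situation inside a Jupyter kernel, while a plain `await` works.

```python
import asyncio


async def slow_lookup(smiles: str) -> str:
    """Hypothetical stand-in for a browser-driven coroutine."""
    await asyncio.sleep(0)  # yield control to the event loop once
    return f"result for {smiles}"


async def call_run_inside_loop() -> str:
    """Try to call asyncio.run() while a loop is already running."""
    coro = slow_lookup("CCO")
    try:
        asyncio.run(coro)  # raises RuntimeError: a loop is already running
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"
    return "no error"


async def call_with_await() -> str:
    """Awaiting directly is fine from inside a running loop."""
    return await slow_lookup("CCO")
```

In a notebook cell, `await call_with_await()` works directly. In a plain script, where no loop is running yet, the same coroutine would be started with `asyncio.run(call_with_await())`.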

In [1]:
# environment verification
import sys
import os

print("=== Environment Verification ===")
print(f"Python executable: {sys.executable}")
print(f"Current working directory: {os.getcwd()}")

# Check if we're in the correct environment
if 'ppenv' in sys.executable:
    print("✓ Running in ppenv environment")
else:
    print("⚠️  WARNING: Not running in ppenv environment!")
    print("   Please run: conda activate ppenv")
    print("   Then restart the kernel and run all cells again")

# Check required packages
packages_to_check = ['rdkit', 'playwright', 'pyppeteer', 'pandas']
for package in packages_to_check:
    try:
        if package == 'playwright':
            from playwright.async_api import async_playwright
        elif package == 'pyppeteer':
            import pyppeteer
        else:
            __import__(package)
        print(f"✓ {package} is available")
    except ImportError:
        print(f"✗ {package} is not available - please activate ppenv environment")

# Check uvvismlenv environment
import subprocess
try:
    result = subprocess.run(['conda', 'env', 'list'], capture_output=True, text=True)
    if 'uvvismlenv' in result.stdout:
        print("✓ uvvismlenv conda environment exists")
        
        # Check if chemprop is available in uvvismlenv
        try:
            result = subprocess.run(['conda', 'run', '-n', 'uvvismlenv', 'chemprop_predict', '--help'], 
                                 capture_output=True, text=True, timeout=10)
            if result.returncode == 0:
                print("✓ chemprop is available in uvvismlenv")
            else:
                print("✗ chemprop is not available in uvvismlenv")
        except Exception as e:
            print(f"✗ Error checking chemprop in uvvismlenv: {e}")
    else:
        print("✗ uvvismlenv conda environment does not exist")
except Exception as e:
    print(f"✗ Error checking conda environments: {e}")

print("=== End Verification ===")


=== Environment Verification ===
Python executable: /opt/anaconda3/envs/ppenv/bin/python
Current working directory: /Users/nathanleung/Documents/Programming/Research Projects/peak_prophet/predictions
✓ Running in ppenv environment
✓ rdkit is available
✓ playwright is available
✓ pyppeteer is available
✓ pandas is available
✓ uvvismlenv conda environment exists
✓ chemprop is available in uvvismlenv
=== End Verification ===
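
The substring test on `sys.executable` in the cell above also matches any path that merely contains `ppenv` (for example, an environment named `ppenv-old`). A slightly stricter sketch, assuming conda's usual `<root>/envs/<name>` layout, compares the last component of the environment prefix instead. `env_name_from_prefix` and `missing_packages` are hypothetical helper names, not part of this project.

```python
import importlib.util
import sys
from pathlib import Path


def env_name_from_prefix(prefix: str) -> str:
    """Return the last path component of an environment prefix."""
    return Path(prefix).name


def missing_packages(names):
    """Return the top-level package names that cannot be located.

    find_spec() looks a package up without importing it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# sys.prefix points at the active environment's root directory
print(f"Active environment: {env_name_from_prefix(sys.prefix)}")
```

With this, the check becomes `env_name_from_prefix(sys.prefix) == "ppenv"`, and `missing_packages(["rdkit", "pandas"])` lists whatever still needs installing.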


In [3]:
from rxn_classes import ChemicalReaction

# input reaction: acetic anhydride as the reactant, with ethanol supplied as the solvent
# (acetylation of ethanol, expected to give ethyl acetate and acetic acid)
reactants = ["CC(=O)OC(C)=O"]
solvent = "CCO"

# creating reaction object
rxn = ChemicalReaction(reactants=reactants, solvents=solvent)

print(rxn)


ChemicalReaction(reactants='['CC(=O)OC(C)=O']', solvents='CCO', products=[])


In [4]:
# fetching predicted products from ASKCOS
# note: this launches a headless browser and can take a while.
# top-level await works because the Jupyter kernel already runs an event loop
products = await rxn.fetch_products_from_askcos()

print(f"Found {len(products)} products from ASKCOS")

for product in products:
    print(f"{product.get_smiles()}: probability={product.get_probability()}, mol_weight={product.get_mol_weight()}")

Combined reactants: CC(=O)OC(C)=O
Navigating to ASKCOS forward page...
Navigating to Product Prediction tab...
Clicked Product Prediction tab
Navigating to Reactants tab
Entered Reactants
Navigating to Solvents tab
Entered Solvents
Navigating to Results button...
Clicked Get Results button
Navigating to Export button...
Clicked Export button
Downloaded CSV saved to /Users/nathanleung/Documents/Programming/Research Projects/peak_prophet/predictions/forward.csv
Deleted file: /Users/nathanleung/Documents/Programming/Research Projects/peak_prophet/predictions/forward.csv
Found 34 products from ASKCOS
CC(=O)OC(C)=O: probability=1.0, mol_weight=102.031694052
CCO: probability=1.0, mol_weight=46.041864812
CC(=O)O: probability=0.8295843300088681, mol_weight=60.021129368000004
CCOC(C)=O: probability=0.15111447322452107, mol_weight=88.052429496
CC(=O)CC(=O)O: probability=0.018900082975033396, mol_weight=102.031694052
CC: probability=0.00016632927518114913, mol_weight=30.046950192
CC(C)=O: probabi
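
ASKCOS returns many low-probability candidates, so it can help to keep only the likelier ones before running the slower predictors. The sketch below works on plain `(smiles, probability)` pairs copied from the output above. The `min_probability` cutoff of 0.01 is an arbitrary illustrative value, and `filter_by_probability` is a hypothetical helper; with the real product objects the same test would read `product.get_probability()`.

```python
# (smiles, probability) pairs copied from the ASKCOS output above
candidates = [
    ("CC(=O)OC(C)=O", 1.0),
    ("CCO", 1.0),
    ("CC(=O)O", 0.8295843300088681),
    ("CCOC(C)=O", 0.15111447322452107),
    ("CC(=O)CC(=O)O", 0.018900082975033396),
    ("CC", 0.00016632927518114913),
]


def filter_by_probability(pairs, min_probability=0.01):
    """Keep only the candidates at or above the probability cutoff."""
    return [(smiles, p) for smiles, p in pairs if p >= min_probability]


likely = filter_by_probability(candidates)
for smiles, probability in likely:
    print(f"{smiles}: {probability:.3f}")
```

Note that the reactant and solvent themselves appear in the list with probability 1.0, so a cutoff alone does not remove them.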

In [5]:
# predicting retention times for all predicted products
# note: this requires pyppeteer and launches a browser to scrape rtpred.ca

print(f"Predicting retention times for {len(rxn.get_products())} products...")

# top-level await works because the Jupyter kernel already runs an event loop
predicted_rts = await rxn.predict_products_retention_times()

print(f"Predicted retention times for {len(predicted_rts)} products:")

for product in predicted_rts:
    print(f"{product.get_smiles()}: predicted_rt={product.get_retention_time()}")

Predicting retention times for 34 products...


INFO:pyppeteer.launcher:Browser listening on: ws://127.0.0.1:54294/devtools/browser/08ddaef4-4e84-448c-bf93-4d52fa119602
INFO:rt_pred.rt_pred:Navigating to rtpred.ca...
INFO:rt_pred.rt_pred:Selecting CS22 method...
INFO:rt_pred.rt_pred:Uploading file: /var/folders/j0/nw42476j777dmyrtz9d_sjqh0000gn/T/tmp1pl_vmgc.csv
INFO:rt_pred.rt_pred:Submitting prediction request...
INFO:rt_pred.rt_pred:Submit button clicked successfully
INFO:rt_pred.rt_pred:Waiting for results...
INFO:rt_pred.rt_pred:Extracted 34 predictions
INFO:pyppeteer.launcher:terminate chrome process...


Predicted retention times for 34 products:
CC(=O)OC(C)=O: predicted_rt=663.9711
CCO: predicted_rt=176.24898
CC(=O)O: predicted_rt=146.31873
CCOC(C)=O: predicted_rt=5.027264
CC(=O)CC(=O)O: predicted_rt=69.173645
CC: predicted_rt=133.58699
CC(C)=O: predicted_rt=14.874266
CC(O)CC(=O)O: predicted_rt=66.53999
CC(=O)OC(C)O: predicted_rt=177.12035
CC(=O)OC=O: predicted_rt=643.475
CO: predicted_rt=27.473927
O=CCC(=O)O: predicted_rt=60.32571
CCOC(=O)OC(C)=O: predicted_rt=202.79634
CCOC(C)O: predicted_rt=77.67454
CCOC(=O)CC(C)=O: predicted_rt=137.46822
O=C(O)CCO: predicted_rt=43.46891
CCOC(=O)CC(=O)O: predicted_rt=102.84081
CC(C)O: predicted_rt=31.48149
CC1(O)CC(=O)O1: predicted_rt=202.38968
O=C1CC(=O)O1: predicted_rt=524.62744
CCOC=O: predicted_rt=2.9411273
CCOC(C)(O)CC(=O)O: predicted_rt=108.24716
CC=O: predicted_rt=57.143612
CC(=O)OCO: predicted_rt=1274.3223
CCOC(=O)O: predicted_rt=74.10894
CC(O)CC=O: predicted_rt=105.13137
CC(=O)CC=O: predicted_rt=88.65347
CCOC(=O)CC(C)O: predicted_rt=128.51
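
Sorting by predicted retention time gives the expected elution order, which is convenient when lining predictions up against a chromatogram. A minimal sketch using a few of the values printed above (the units are whatever rtpred.ca reports; none are assumed here):

```python
# predicted retention times copied from the output above
predicted_rt = {
    "CC(=O)OC(C)=O": 663.9711,
    "CCO": 176.24898,
    "CC(=O)O": 146.31873,
    "CCOC(C)=O": 5.027264,
    "CC(=O)CC(=O)O": 69.173645,
}

# earliest-eluting compound first
elution_order = sorted(predicted_rt, key=predicted_rt.get)

for rank, smiles in enumerate(elution_order, start=1):
    print(f"{rank}. {smiles} ({predicted_rt[smiles]})")
```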

In [6]:
# predicting lambda max for all predicted products

print(f"Getting UV peaks for {len(rxn.get_products())} molecules...")

pred_lmaxs = rxn.predict_products_lambda_max(conda_env="uvvismlenv")

for product in pred_lmaxs:
    print(f"{product.get_smiles()}: lambda_max={product.get_lambda_max()}")

INFO:lmax_pred.lmax_pred:Predicting lambda max for 34 molecules
INFO:lmax_pred.lmax_pred:Created input CSV: input.csv (34 rows)
INFO:lmax_pred.lmax_pred:Running chemprop in uvvismlenv...


Getting UV peaks for 34 molecules...


INFO:lmax_pred.lmax_pred:Prediction completed
INFO:lmax_pred.lmax_pred:Extracted 34 predictions


CC(=O)OC(C)=O: lambda_max=384.9128452710439
CCO: lambda_max=428.71184747818734
CC(=O)O: lambda_max=297.68356650905037
CCOC(C)=O: lambda_max=383.20130053637666
CC(=O)CC(=O)O: lambda_max=390.8554126817203
CC: lambda_max=278.6657265468516
CC(C)=O: lambda_max=314.45758448874557
CC(O)CC(=O)O: lambda_max=509.0056562932714
CC(=O)OC(C)O: lambda_max=445.1956062832576
CC(=O)OC=O: lambda_max=430.33779833320034
CO: lambda_max=262.83774399037577
O=CCC(=O)O: lambda_max=465.53683686895226
CCOC(=O)OC(C)=O: lambda_max=439.4226647395434
CCOC(C)O: lambda_max=446.3434917581933
CCOC(=O)CC(C)=O: lambda_max=398.21854830148806
O=C(O)CCO: lambda_max=506.99119586887446
CCOC(=O)CC(=O)O: lambda_max=487.36102241903734
CC(C)O: lambda_max=388.1242348900782
CC1(O)CC(=O)O1: lambda_max=538.2875371440858
O=C1CC(=O)O1: lambda_max=563.5719137386441
CCOC=O: lambda_max=340.7100434819081
CCOC(C)(O)CC(=O)O: lambda_max=492.49737933905834
CC=O: lambda_max=348.56136074076664
CC(=O)OCO: lambda_max=460.59468623782504
CCOC(=O)O: la
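
The log lines above show chemprop being run in a separate environment. One way to do that from Python, and the approach the verification cell already uses for `--help`, is `conda run -n <env>`. The sketch below only builds the command and does not execute it. The checkpoint directory and predictions file name are hypothetical placeholders, and the flags are those of the chemprop v1 `chemprop_predict` CLI, which may not match what `lmax_pred` actually invokes.

```python
import shlex


def build_chemprop_command(env, test_path, checkpoint_dir, preds_path):
    """Build a `conda run` command that calls chemprop_predict in another env."""
    return [
        "conda", "run", "-n", env,
        "chemprop_predict",
        "--test_path", test_path,            # CSV of SMILES to predict on
        "--checkpoint_dir", checkpoint_dir,  # hypothetical model location
        "--preds_path", preds_path,          # hypothetical output file
    ]


cmd = build_chemprop_command("uvvismlenv", "input.csv", "checkpoints/", "preds.csv")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`, which keeps the notebook's own environment free of chemprop's dependencies.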

In [10]:
# predicting mass spectrometry adducts for all products

print(f"Predicting MS adducts for {len(rxn.get_products())} products...")

products = rxn.predict_products_ms_adducts()

for product in products:
    print(f"{product.get_smiles()}: ms_values={product.get_ms_values()}")
    print()

Predicting MS adducts for 34 products...
CC(=O)OC(C)=O: ms_values={34.67773503716: 0.02, 42.00504903716: 0.015, 49.33291203716: 0.01, 56.65967703716: 0.005, 52.023123026: 0.6, 60.536397026: 0.1, 63.014094026: 0.08, 71.001064026: 0.06, 72.536397026: 0.05, 74.00506502600001: 0.04, 93.049670026: 0.03, 113.562944026: 0.02, 103.03897005200001: 1.0, 120.065517052: 0.75, 125.020912052: 0.85, 135.065183052: 0.3, 140.994852052: 0.65, 144.06551705200002: 0.25, 147.002854052: 0.12, 163.097034052: 0.15, 166.04745905200002: 0.2, 178.950734052: 0.08, 181.052914052: 0.06, 185.092064052: 0.04, 186.086804052: 0.03, 205.070664104: 0.08, 222.09721110400002: 0.06, 227.052606104: 0.05, 243.026546104: 0.04, 246.097211104: 0.03, 268.079153104: 0.02, 33.00328864998944: 0.01, 50.008571026000006: 0.08, 83.01330405200001: 0.55, 101.024418052: 0.95, 123.006360052: 0.5, 137.001096052: 0.7, 138.98030005200002: 0.45, 147.029895052: 0.4, 161.04554505200002: 0.35, 180.95057905200002: 0.15, 215.017280052: 0.1, 203.0561
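
Several of the m/z keys above can be reproduced by hand from the monoisotopic mass printed earlier (102.031694052 for acetic anhydride). The sketch below assumes common singly charged adducts and their standard mass shifts. Which adducts `predict_products_ms_adducts` actually enumerates, and what the dictionary values represent, is not stated in this notebook, so `ADDUCT_SHIFTS` and `adduct_mz` are illustrative names only.

```python
# mass shift added to the neutral monoisotopic mass M for each adduct (z = 1)
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
    "[M+K]+": 38.963158,
    "[M-H]-": -1.007276,
}


def adduct_mz(mono_mass, adduct):
    """Return the m/z of a singly charged adduct of a neutral molecule."""
    return mono_mass + ADDUCT_SHIFTS[adduct]


mass = 102.031694052  # acetic anhydride, from the ASKCOS output above
for adduct in ADDUCT_SHIFTS:
    print(f"{adduct}: {adduct_mz(mass, adduct):.6f}")
```

The five results agree with the keys 103.03897..., 120.06551..., 125.02091..., 140.99485..., and 101.02441... in the output above.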

In [11]:
# displaying predicted results

product_data = rxn.get_products()

print("Predicted Products:")

for product in product_data:
    print(f"""{product.get_smiles()}: 
    probability={product.get_probability()}, 
    mol_weight={product.get_mol_weight()}, 
    retention_time={product.get_retention_time()}, 
    lambda_max={product.get_lambda_max()}""")
    print()

Predicted Products:
CC(=O)OC(C)=O: 
    probability=1.0, 
    mol_weight=102.031694052, 
    retention_time=663.9711, 
    lambda_max=384.9128452710439

CCO: 
    probability=1.0, 
    mol_weight=46.041864812, 
    retention_time=176.24898, 
    lambda_max=428.71184747818734

CC(=O)O: 
    probability=0.8295843300088681, 
    mol_weight=60.021129368000004, 
    retention_time=146.31873, 
    lambda_max=297.68356650905037

CCOC(C)=O: 
    probability=0.15111447322452107, 
    mol_weight=88.052429496, 
    retention_time=5.027264, 
    lambda_max=383.20130053637666

CC(=O)CC(=O)O: 
    probability=0.018900082975033396, 
    mol_weight=102.031694052, 
    retention_time=69.173645, 
    lambda_max=390.8554126817203

CC: 
    probability=0.00016632927518114913, 
    mol_weight=30.046950192, 
    retention_time=133.58699, 
    lambda_max=278.6657265468516

CC(C)=O: 
    probability=0.00012769416938999284, 
    mol_weight=58.041864812, 
    retention_time=14.874266, 
    lambda_max=314.457584