# Prototype a Streamlit Web Application: LCIA QSAR Model

**Date:** October 20, 2023

Streamlit is a Python library for turning data scripts into simple web applications. Streamlit also offers a Community Cloud where developers can deploy web applications for free, directly from a GitHub repository, provided that the repository stays below a 1 GB resource limit.

Here is an example of a Streamlit application designed for cheminformatics: https://molecule-icon-generator.streamlit.app/#molecule-icon-generator

I've been prototyping a Streamlit application for the LCIA QSAR Model in this Jupyter notebook. I think it would be helpful to offer the user two options:
1. Look up pre-computed data for a chemical and effect category of interest
2. Predict the point of departure for a new chemical
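
The two options above could be laid out roughly as follows. This is only a minimal sketch, not the actual application: `lookup_pod`, `render_app`, and `TOY_PODS` are hypothetical names, the toy values are copied from the QSAR columns of the points-of-departure table shown later in this notebook, and the prediction branch is left as a placeholder.

```python
import pandas as pd

# Toy stand-in for the pre-computed data (hypothetical; the real app would
# load its data through the data_management module).
TOY_PODS = pd.DataFrame(
    {"POD": [-4.097466, -1.638169]},
    index=pd.Index(["DTXSID2021315", "DTXSID7037505"], name="DTXSID"),
)


def lookup_pod(pods: pd.DataFrame, dtxsid: str):
    """Return the row for `dtxsid`, or None if it is not in the table."""
    dtxsid = dtxsid.strip()
    if dtxsid not in pods.index:
        return None
    return pods.loc[dtxsid]


def render_app():
    """Draw the two-option interface (intended for `streamlit run`)."""
    # Imported lazily so lookup_pod can be used without Streamlit installed.
    import streamlit as st

    st.title("LCIA QSAR Model")
    mode = st.radio(
        "What would you like to do?",
        ["Look up a chemical", "Predict a new chemical"],
    )
    effect_label = st.selectbox(
        "Effect category",
        ["General Noncancer", "Reproductive/Developmental"],
    )
    st.caption(f"Effect category: {effect_label}")

    if mode == "Look up a chemical":
        dtxsid = st.text_input("DTXSID")
        if dtxsid:
            row = lookup_pod(TOY_PODS, dtxsid)
            if row is None:
                st.warning(f"{dtxsid} was not found in the pre-computed data.")
            else:
                st.write(row)
    else:
        smiles = st.text_input("SMILES")
        if smiles:
            # Placeholder: the model call is not part of this sketch.
            st.info("Prediction is not implemented in this sketch.")
```

To try it, one would save the block as a script, call `render_app()` at the bottom, and launch it with `streamlit run`. Keeping the lookup in a plain function means it can be tested in a notebook without a Streamlit runtime.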

In [1]:
%matplotlib notebook

import streamlit as st
import pandas as pd

import data_management as dm



In [2]:
config = dm.load_config()
config

{'data_dir': 'Data',
 'exposure_file_name': 'exposure-data.parquet',
 'pod_file_name': 'points-of-departure.parquet',
 'features_file_name': 'features.parquet',
 'moe_file_name': 'margins-of-exposure.parquet',
 'chem_ids_file_name': 'qsar-ready-smiles.parquet',
 'pod_fig_file_name': 'pod-figure.pkl',
 'moe_fig_file_name': 'moe-figure.pkl',
 'exposure_key_mapper': {'5th percentile (mg/kg/day)': 'Median_Exposure_5%ile',
  '50th percentile (mg/kg/day)': 'Median_Exposure_50%ile',
  '95th percentile (mg/kg/day)': 'Median_Exposure_95%ile'},
 'pod_key_mapper': {'lb': 'POD_5%ile',
  'moe': 'POD_50%ile',
  'ub': 'POD_95%ile',
  'cum_count': 'Cum_Count'},
 'effect_for_label': {'General Noncancer': 'general',
  'Reproductive/Developmental': 'repro_dev'}}

In [3]:
effect_label = 'General Noncancer'

pod_data = dm.load_points_of_departure(config, effect_label)

pod_data

2023-11-08 11:50:41.541 No runtime found, using MemoryCacheStorageManager


DTXSID,POD,Cum_Proportion,POD,Cum_Proportion,POD,Cum_Proportion
,Regulatory,Regulatory,ToxValDB,ToxValDB,QSAR,QSAR
DTXSID2021315,-7.112,0.001355,-4.097466,0.000559,-4.097466,0.000002
DTXSID7023801,-3.358,0.002710,,,,
DTXSID7037505,-2.690,0.004065,-1.638169,0.002793,-1.638169,0.000381
DTXSID5020100,-2.462,0.005420,,,,
DTXSID4023886,-2.444,0.006775,,,,
...,...,...,...,...,...,...
DTXSID10196367,,,,,2.725588,0.999973
DTXSID20503902,,,,,2.746328,0.999982
DTXSID10242277,,,,,2.746328,0.999984
DTXSID70152328,,,,,2.746328,0.999987
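
Because the columns of `pod_data` have two levels (measure on top, data source below), a lookup in the app would need to select on one of those levels. The sketch below rebuilds the first three displayed rows as a toy frame and shows one way to do this with `DataFrame.xs`. The names `toy`, `qsar`, and `pods_for_chemical` are illustrative only, and the level order is an assumption read off the displayed header.

```python
import numpy as np
import pandas as pd

# Rebuild the two-level column header: (measure, source)
sources = ["Regulatory", "ToxValDB", "QSAR"]
columns = pd.MultiIndex.from_tuples(
    [(measure, source) for source in sources for measure in ("POD", "Cum_Proportion")]
)

# First three rows copied from the output above
toy = pd.DataFrame(
    [
        [-7.112, 0.001355, -4.097466, 0.000559, -4.097466, 0.000002],
        [-3.358, 0.002710, np.nan, np.nan, np.nan, np.nan],
        [-2.690, 0.004065, -1.638169, 0.002793, -1.638169, 0.000381],
    ],
    index=pd.Index(["DTXSID2021315", "DTXSID7023801", "DTXSID7037505"], name="DTXSID"),
    columns=columns,
)

# All measures for one data source (second column level)
qsar = toy.xs("QSAR", axis=1, level=1)

# POD for one chemical from every source, dropping sources with no value
pods_for_chemical = toy.xs("POD", axis=1, level=0).loc["DTXSID7023801"].dropna()
```

Selecting by level with `xs` does not depend on the columns being sorted, which matters here because the top-level labels alternate.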


In [4]:
moe_data = dm.load_margins_of_exposure(config, effect_label)

moe_data

2023-11-08 11:50:41.743 No runtime found, using MemoryCacheStorageManager


DTXSID,POD_50%ile,Cum_Count,POD_5%ile,POD_95%ile,POD_50%ile,Cum_Count,POD_5%ile,POD_95%ile,POD_50%ile,Cum_Count,POD_5%ile,POD_95%ile
,Median_Exposure_95%ile,Median_Exposure_95%ile,Median_Exposure_95%ile,Median_Exposure_95%ile,Median_Exposure_50%ile,Median_Exposure_50%ile,Median_Exposure_50%ile,Median_Exposure_50%ile,Median_Exposure_5%ile,Median_Exposure_5%ile,Median_Exposure_5%ile,Median_Exposure_5%ile
DTXSID9047623,-3.199838,1,-4.334702,-2.064973,1.183485,1,0.048621,2.318350,8.957967,156900,7.823103,10.092832
DTXSID3024944,-2.180856,2,-3.315721,-1.045992,3.037223,6,1.902359,4.172088,8.177911,13097,7.043047,9.312776
DTXSID7027625,-1.922027,3,-3.056891,-0.787163,2.409988,2,1.275124,3.544853,9.642775,384006,8.507911,10.777639
DTXSID3036238,-1.855226,4,-2.990091,-0.720362,3.783795,43,2.648930,4.918659,10.450293,447450,9.315428,11.585157
DTXSID9020584,-1.835367,5,-2.970231,-0.700503,3.375676,18,2.240812,4.510541,9.281084,287176,8.146219,10.415948
...,...,...,...,...,...,...,...,...,...,...,...,...
DTXSID4032116,12.686215,450640,11.551350,13.821079,15.540227,450641,14.405362,16.675091,17.644966,450626,16.510102,18.779831
DTXSID8074158,12.926303,450641,11.791438,14.061167,15.400466,450636,14.265602,16.535330,17.498432,450622,16.363568,18.633297
DTXSID3038307,13.013690,450642,11.878826,14.148554,15.846850,450644,14.711986,16.981715,17.952654,450634,16.817790,19.087519
DTXSID8038300,13.111811,450643,11.976946,14.246675,15.624344,450642,14.489480,16.759208,17.808153,450632,16.673288,18.943017
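
The key mappers in the config look designed to translate user-facing labels into the column names of this table. The sketch below shows one way that translation might work for a single chemical. It assumes that the config keys `lb`, `moe`, and `ub` stand for the lower bound, central estimate, and upper bound; `moe_summary` and `toy_moe` are hypothetical names, and the toy frame holds only the first displayed row for two of the three exposure percentiles.

```python
import pandas as pd

# Mappers copied from the config shown earlier in this notebook
exposure_key_mapper = {
    "5th percentile (mg/kg/day)": "Median_Exposure_5%ile",
    "50th percentile (mg/kg/day)": "Median_Exposure_50%ile",
    "95th percentile (mg/kg/day)": "Median_Exposure_95%ile",
}
pod_key_mapper = {
    "lb": "POD_5%ile",
    "moe": "POD_50%ile",
    "ub": "POD_95%ile",
    "cum_count": "Cum_Count",
}

# Toy frame with the same two-level header as the output above
columns = pd.MultiIndex.from_tuples(
    [
        (pod_key, exposure_key)
        for exposure_key in ("Median_Exposure_95%ile", "Median_Exposure_50%ile")
        for pod_key in ("POD_50%ile", "Cum_Count", "POD_5%ile", "POD_95%ile")
    ]
)
toy_moe = pd.DataFrame(
    [[-3.199838, 1, -4.334702, -2.064973, 1.183485, 1, 0.048621, 2.318350]],
    index=pd.Index(["DTXSID9047623"], name="DTXSID"),
    columns=columns,
)


def moe_summary(moe_data, dtxsid, exposure_label):
    """Return the mapped values for one chemical at one exposure percentile."""
    exposure_key = exposure_key_mapper[exposure_label]
    # Keep only the columns for the chosen exposure percentile, then pick the row
    row = moe_data.xs(exposure_key, axis=1, level=1).loc[dtxsid]
    return {name: row[column] for name, column in pod_key_mapper.items()}


summary = moe_summary(toy_moe, "DTXSID9047623", "95th percentile (mg/kg/day)")
```

In a Streamlit app, the keys of `exposure_key_mapper` could serve directly as the options of a select box, so the user never sees the raw column names.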


In [4]:
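# Load the OPERA 2.9 predictions (pandas is already imported as pd above)
features_path = r"G:\My Drive\Repositories\LCIA-QSAR-Model\Input\Features\OPERA-2.9-predictions.csv"
X = pd.read_csv(features_path, index_col=0)

# List the column (feature) names
list(X)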

['CERAPP_Ago_pred_discrete',
 'CERAPP_Anta_pred_discrete',
 'CERAPP_Bind_pred_discrete',
 'CoMPARA_Ago_pred_discrete',
 'CoMPARA_Anta_pred_discrete',
 'CoMPARA_Bind_pred_discrete',
 'CATMoS_LD50_pred',
 'FUB_pred',
 'Clint_pred',
 'CACO2_pred',
 'OH_pred',
 'BCF_pred',
 'BioDeg_HalfLife_pred',
 'ReadyBiodeg_pred_discrete',
 'HL_pred',
 'KM_pred',
 'KOA_pred',
 'Koc_pred',
 'P_pred',
 'MP_pred',
 'MolWeight',
 'nbAtoms_discrete',
 'nbHeavyAtoms_discrete',
 'nbC_discrete',
 'nbO_discrete',
 'nbN_discrete',
 'nbAromAtom_discrete',
 'nbRing_discrete',
 'nbHeteroRing_discrete',
 'Sp3Sp2HybRatio',
 'nbRotBd_discrete',
 'nbHBdAcc_discrete',
 'ndHBdDon_discrete',
 'nbLipinskiFailures_discrete',
 'TopoPolSurfAir',
 'MolarRefract',
 'CombDipolPolariz',
 'VP_pred',
 'WS_pred']

In [7]:
X['CACO2_pred'].describe()

count    294058.000000
mean         -4.710065
std           0.282112
min          -7.530000
25%          -4.860000
50%          -4.680000
75%          -4.510000
max          -3.920000
Name: CACO2_pred, dtype: float64