# Notebook for editing and searching the compound table
First load the table by running the following code cell. Then you will be able to:
- Edit a compound
- Search by keyword (exact)
- Search by field and value
- Search by partial name

### Run the next cell to load the table and required library

In [33]:
# import required libraries and load table
import pandas as pd
compound_table = pd.read_parquet('compound_table.parquet')

## Get existing details by compound_id

In [28]:
#-------------! Input compound_id
compound_id = 14

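#---------------! code
# select every row with this id once, then report on it
matches = compound_table.loc[compound_table.id==compound_id, :]
if matches.empty:
    print(f"No compound found with id {compound_id}.")
else:
    # 'records' does not depend on the row index labels, so it also works when duplicates exist
    print(matches.to_dict('records')[0])
    if len(matches) > 1:
        print("Duplicate entries detected; the first entry was returned. Please drop duplicate entries, otherwise the edit compound code won't run.")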

{'id': 14, 'name': '2-(1,3-dimethyl-2H-benzimidazol-2-yl)phenol (HDMBI)', 'formula': 'C15H16N2O', 'molecular_weight': 293.0, 'CAS': '3652-93-5', 'source_dois': 'doi:10.1016/j.tet.2006.03.061, https://pubchem.ncbi.nlm.nih.gov/compound/789497, ', 'recyclable': 0, 'type': 'electron donor', 'compound_family': 'benzimidazole', 'used_in_photocat': 1, 'used_in_rfbs': 0, 'used_as_hcarrier': 0}
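
The duplicate check above looks at one id at a time. As a minimal sketch, assuming a small stand-in table (`toy_table` and its rows are made up for illustration and are not taken from `compound_table.parquet`), `DataFrame.duplicated` can list every id that occurs more than once:

```python
import pandas as pd

# Hypothetical stand-in for compound_table; the rows are illustrative only
toy_table = pd.DataFrame({
    'id': [1, 2, 2, 3],
    'name': ['compound A', 'compound B', 'compound B', 'compound C'],
})

# keep=False marks every member of a duplicated group, not just the repeats
duplicate_rows = toy_table[toy_table.duplicated(subset='id', keep=False)]
duplicate_ids = sorted(duplicate_rows['id'].unique().tolist())
print(duplicate_ids)
```

An empty list would mean that every id is unique.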


## Edit a compound
To edit a compound entry you must:
1. set the 'id' entry of the compound_details dictionary to the id of the compound that you want to edit (the compound_id variable is read from it)
2. edit the other entries in the compound_details dictionary by putting the new information next to the corresponding column names/keys
3. run the next cell
#### Please note that your changes will be saved when you run the next cell

In [25]:
compound_details = {'id': 5, 'name': '1,3-dimethyl-2-phenylbenzimidazoline (BIH)', 'formula': 'C15H16N2', 'molecular_weight': 224.0, 'CAS': '3652-92-4', 'source_dois': 'https://doi.org/10.1016/j.crci.2015.11.026, https://pubchem.ncbi.nlm.nih.gov/compound/199049#section=CAS, ', 'recyclable': 0, 'type': 'electron donor', 'compound_family': 'benzimidazole', 'used_in_photocat': 1, 'used_in_rfbs': 0, 'used_as_hcarrier': 1}
compound_id = compound_details['id']
for col_name, new_info in compound_details.items():
    old_info = compound_table.loc[compound_table.id==compound_id, col_name]
    if old_info.item() != new_info:
        compound_table.loc[compound_table.id==compound_id, col_name] = new_info
compound_table.to_parquet('compound_table.parquet', engine='pyarrow', compression=None)
print(f"Following changes saved to compound {compound_id}:")
display(compound_table.loc[compound_table.id==compound_id,:])

Following changes saved to compound 5:


Unnamed: 0,id,name,formula,molecular_weight,CAS,source_dois,recyclable,type,compound_family,used_in_photocat,used_in_rfbs,used_as_hcarrier
0,5,"1,3-dimethyl-2-phenylbenzimidazoline (BIH)",C15H16N2,224.0,3652-92-4,"https://doi.org/10.1016/j.crci.2015.11.026, ht...",0,electron donor,benzimidazole,1,0,1
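
Because the edit cell writes to the parquet file as soon as it runs, it can help to preview what would change first. The following is a minimal sketch under stated assumptions: `preview_changes` is a hypothetical helper that is not part of this notebook, and `toy_table` is a made-up stand-in for `compound_table`. It compares a details dictionary with the stored row and changes nothing.

```python
import pandas as pd

def preview_changes(table, details):
    """Return {column: (old, new)} for entries that differ from the stored row.

    Hypothetical helper for illustration; it does not modify or save the table.
    """
    row_mask = table['id'] == details['id']
    if row_mask.sum() != 1:
        raise ValueError(
            f"expected exactly one row with id {details['id']}, found {row_mask.sum()}"
        )
    row = table.loc[row_mask].iloc[0]
    changes = {}
    for col_name, new_info in details.items():
        if col_name not in table.columns:
            raise KeyError(f"unknown column: {col_name}")
        old_info = row[col_name]
        if old_info != new_info:
            changes[col_name] = (old_info, new_info)
    return changes

# Illustrative data only
toy_table = pd.DataFrame({
    'id': [1, 2],
    'name': ['compound A', 'compound B'],
    'recyclable': [0, 0],
})
changes = preview_changes(toy_table, {'id': 2, 'name': 'compound B', 'recyclable': 1})
print(changes)
```

Only the `recyclable` entry differs here, so it is the only key in the returned dictionary.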


## Search by Keyword (exact)
1. set the variable search_term to a string containing the exact keyword you wish to search for
2. run the following cell

In [30]:
# Set the following variable to the keyword you want to search for (don't forget the quotation marks, '' or "")
search_term = 'benzimidazole'

## --- The following code executes the search
check_mask = compound_table.isin([search_term])
display(compound_table.loc[check_mask.any(axis=1), :])

Unnamed: 0,id,name,formula,molecular_weight,CAS,source_dois,recyclable,type,compound_family,used_in_photocat,used_in_rfbs,used_as_hcarrier
0,5,"1,3-dimethyl-2-phenylbenzimidazoline (BIH)",C15H16N2,224.0,3652-92-4,"https://doi.org/10.1016/j.crci.2015.11.026, ht...",0,electron donor,benzimidazole,1,0,1
0,11,"2-(4-methoxyphenyl)-1,3-dimethyl-2H-benzimidaz...",C16H18N2O,254.0,54825-26-2,"doi:10.1016/j.tet.2006.03.061, https://pubchem...",0,electron donor,benzimidazole,1,0,0
0,12,"1,3-dimethyl-2-(4-methylphenyl)-2H-benzimidazo...",C16H18N2,238.0,100672-38-6,"doi:10.1016/j.tet.2006.03.061, https://pubchem...",0,electron donor,benzimidazole,1,0,0
0,13,"2-(3,5-dichlorophenyl)-1,3-dimethyl-2H-benzimi...",C15H14Cl2N2,293.0,,"doi:10.1016/j.tet.2006.03.061, https://pubchem...",0,electron donor,benzimidazole,1,0,0
0,14,"2-(1,3-dimethyl-2H-benzimidazol-2-yl)phenol (H...",C15H16N2O,293.0,3652-93-5,"doi:10.1016/j.tet.2006.03.061, https://pubchem...",0,electron donor,benzimidazole,1,0,0
0,19,"1,3-dimethyl-2-(2,4,6-trimethoxyphenyl)-2H-be...",C18H22N2O3,314.0,,"DOI:10.1021/acs.jpcc.2c03541, https://pubchem....",1,electron donor,benzimidazole,1,0,0
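
"Exact" here means that a cell must equal the keyword as a whole; a cell that merely contains it does not match. A minimal sketch on made-up data (`toy_table` is a hypothetical stand-in for `compound_table`) shows the difference:

```python
import pandas as pd

# Illustrative data only
toy_table = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['benzimidazole derivative', 'triethylamine', 'compound C'],
    'compound_family': ['benzimidazole', 'amine', 'amine'],
})

search_term = 'benzimidazole'
check_mask = toy_table.isin([search_term])          # True where a cell equals the term
matches = toy_table.loc[check_mask.any(axis=1), :]  # rows with at least one matching cell

# Which columns produced a match: compound_family does, name does not,
# because 'benzimidazole derivative' contains the term but is not equal to it.
matched_columns = check_mask.any(axis=0)
print(matches['id'].tolist())
```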


## Search by field (column name) and value
You can search the following fields for specific values by changing field_variable and search_value.
Fields and value types:
- id (integer >= 1)
- name (string, name of the compound)
- formula (string, formula of the compound, e.g. 'C2H4O')
- molecular_weight (number > 0)
- CAS (string, CAS number)
- source_dois (string, list of DOIs or links for references relevant to the compound)
- recyclable (boolean, True/False or 1/0)
- type (string describing the donor type, e.g. hydride donor)
- compound_family (string describing the type of compound, e.g. thiazine or amine)
- used_in_photocat (boolean, True/False or 1/0)
- used_in_rfbs (boolean, True/False or 1/0)
- used_as_hcarrier (boolean, True/False or 1/0)
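
The field list above can also be checked in code before running a search. This is only a sketch: `FIELD_TYPES`, `BOOLEAN_FIELDS` and `check_search_value` are hypothetical names, and the type mapping is transcribed from the list rather than read from the table itself.

```python
# Expected value types per field, transcribed from the list above.
# FIELD_TYPES, BOOLEAN_FIELDS and check_search_value are hypothetical names.
FIELD_TYPES = {
    'id': int,
    'name': str,
    'formula': str,
    'molecular_weight': (int, float),
    'CAS': str,
    'source_dois': str,
    'type': str,
    'compound_family': str,
}
BOOLEAN_FIELDS = {'recyclable', 'used_in_photocat', 'used_in_rfbs', 'used_as_hcarrier'}


def check_search_value(field, value):
    """Return None if the pair looks valid, otherwise a short message."""
    if field in BOOLEAN_FIELDS:
        # True == 1 and False == 0 in Python, so both spellings pass
        if isinstance(value, (bool, int)) and value in (0, 1):
            return None
        return f"{field} expects True/False or 1/0"
    if field not in FIELD_TYPES:
        return "Incorrect column name"
    # bool is a subclass of int, so reject it explicitly for non-boolean fields
    if isinstance(value, bool) or not isinstance(value, FIELD_TYPES[field]):
        return f"{field} expects a different value type"
    return None


print(check_search_value('id', 9))
print(check_search_value('id', '9'))
```

A check like this catches a common slip, such as passing the id as the string '9', which would otherwise return an empty result without any error.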

In [4]:
#----! Set the following 2 variables !----#
# don't forget that the field variable should be a string of a column name
field_variable = 'id'

search_value = 9

#----------------------
# search code
if isinstance(field_variable, str):
    if field_variable in compound_table.columns.to_list():
        display(compound_table.loc[compound_table[field_variable]== search_value, :])
    else:
        print("Incorrect column name")
else:
    print("Did you forget quotation marks around the field_variable value?")

Unnamed: 0,id,name,formula,molecular_weight,CAS,source_dois,recyclable,type,compound_family,used_in_photocat,used_in_rfbs,used_as_hcarrier
0,9,ethylenediaminetetraacetic acid (EDTA),(HO2CCH2)2NCH2CH2N(CH2CO2H)2,292,60-00-4,"https://doi.org/10.1039/c3cp55023k, https://ww...",0,electron donor,organic acid,1,0,0


## Search by partial name

In [26]:
# Set the following variable to the part of the name you want to search for (don't forget the quotation marks, '' or "")
partial_search_term = 'benzimidazole'

## --- The following code executes the search
# regex=False treats characters such as ( and ) in the term literally; na=False skips missing names
check_mask = compound_table.name.str.contains(partial_search_term, case=False, regex=False, na=False)

# check_mask is a Series with one True/False per row, so .any(axis=1) is not needed here
display(compound_table.loc[check_mask, :])

Unnamed: 0,id,name,formula,molecular_weight,CAS,source_dois,recyclable,type,compound_family,used_in_photocat,used_in_rfbs,used_as_hcarrier
0,11,"2-(4-methoxyphenyl)-1,3-dimethyl-2H-benzimidaz...",C16H18N2O,254.0,54825-26-2,"doi:10.1016/j.tet.2006.03.061, https://pubchem...",0,electron donor,benzimidazole,1,0,0
0,12,"1,3-dimethyl-2-(4-methylphenyl)-2H-benzimidazo...",C16H18N2,238.0,100672-38-6,"doi:10.1016/j.tet.2006.03.061, https://pubchem...",0,electron donor,benzimidazole,1,0,0
0,13,"2-(3,5-dichlorophenyl)-1,3-dimethyl-2H-benzimi...",C15H14Cl2N2,293.0,,"doi:10.1016/j.tet.2006.03.061, https://pubchem...",0,electron donor,benzimidazole,1,0,0
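
Compound names often contain parentheses, which have a special meaning when `str.contains` interprets the search term as a regular expression (its default). The following sketch, on a made-up `toy_names` Series, shows literal matching with `regex=False` and how `na=False` treats a missing name as a non-match:

```python
import pandas as pd

# Illustrative names only; the last entry stands for a missing name
toy_names = pd.Series([
    '1,3-dimethyl-2-phenylbenzimidazoline (BIH)',
    'triethylamine',
    None,
])

# An unbalanced '(' would be an invalid regular expression,
# but it is a perfectly good literal substring
partial_search_term = '(bih'
check_mask = toy_names.str.contains(
    partial_search_term, case=False, regex=False, na=False
)
print(check_mask.tolist())
```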


In [32]:
display(compound_table)

Unnamed: 0,id,name,formula,molecular_weight,CAS,source_dois,recyclable,type,compound_family,used_in_photocat,used_in_rfbs,used_as_hcarrier
0,1,triethylamine,(C2H5)3N,101.0,121-44-8,"https://doi.org/10.1016/j.crci.2015.11.026, si...",0,electron donor,amine,1,0,0
0,2,triethanolamine,(HOCH2CH2)3N,149.0,102-71-6,"https://doi.org/10.1016/j.crci.2015.11.026, si...",0,electron donor,amine,1,0,0
0,3,"N,N-dimethylaniline (DMA)",C6H5N(CH3)2,121.0,121-69-7,"https://doi.org/10.1016/j.crci.2015.11.026, si...",0,electron donor,amine,1,0,0
0,4,4-dimethylaminotoluene (DMT),CH3C6H4N(CH3)2,135.0,99-97-8,"https://doi.org/10.1016/j.crci.2015.11.026, si...",0,electron donor,amine,1,0,0
0,5,"1,3-dimethyl-2-phenylbenzimidazoline (BIH)",C15H16N2,224.0,3652-92-4,"https://doi.org/10.1016/j.crci.2015.11.026, ht...",0,electron donor,benzimidazole,1,0,1
0,6,L-ascorbic acid (vitamin C),C6H8O6,176.0,50-81-7,"https://doi.org/10.1016/j.crci.2015.11.026, ht...",0,electron donor,organic acid,1,0,0
0,7,oxalate,[C2O4]-2,88.0,338-70-5,"https://doi.org/10.1016/j.crci.2015.11.026, ht...",0,electron donor,organic anion,1,0,0
0,8,triphenylphosphine,(C6H5)P3,262.0,603-35-0,"https://doi.org/10.1016/j.crci.2015.11.026, ht...",0,electron donor,phosphine,1,0,0
0,9,ethylenediaminetetraacetic acid (EDTA),(HO2CCH2)2NCH2CH2N(CH2CO2H)2,292.0,60-00-4,"https://doi.org/10.1039/c3cp55023k, https://ww...",0,electron donor,organic acid,1,0,0
0,10,NADH,C21H29N7O14P2,665.0,58-68-4,"https://doi.org/10.1016/0302-4598(74)85011-7, ...",0,electron donor,nicotinamide biomimetic,1,0,0


In [20]:
compound_table.drop_duplicates(subset=['id', 'name', 'formula', 'CAS', 'source_dois', 'compound_family'], keep='first', inplace=True)
display(compound_table.loc[compound_table.id==5,:])

Unnamed: 0,id,name,formula,molecular_weight,CAS,source_dois,recyclable,type,compound_family,used_in_photocat,used_in_rfbs,used_as_hcarrier
0,5,"1,3-dimethyl-2-phenylbenzimidazoline (BIH)",C15H16N2,224.0,3652-92-4,"https://doi.org/10.1016/j.crci.2015.11.026, ht...",0,electron donor,benzimidazole,1,0,1
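
The `drop_duplicates` call above compares rows only on the columns named in `subset`, and `keep='first'` retains the first row of each group. A minimal sketch on made-up data (`toy_table` is hypothetical) shows the consequence: values in columns outside `subset` are ignored, so the first row wins even when later rows differ there.

```python
import pandas as pd

# Illustrative data only: three rows that agree on id and name
toy_table = pd.DataFrame({
    'id': [5, 5, 5],
    'name': ['BIH', 'BIH', 'BIH'],
    'used_as_hcarrier': [0, 1, 1],
})

# Rows are compared on id and name only; used_as_hcarrier is not considered
deduplicated = toy_table.drop_duplicates(subset=['id', 'name'], keep='first')
print(deduplicated)
```

Note that the cell above changes the table in memory only; nothing in it writes the result back to the parquet file.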
