# Tox21

The main goal of Tox21 is to develop better toxicity assessment methods that use high-throughput screening and computational methods to replace traditional animal testing.

Tox21 screens thousands of chemicals for interactions with a wide range of biological targets using high-throughput assays. The program focuses on understanding how chemicals affect biological pathways relevant to disease or adverse outcomes.

→ Common use of the dataset (task):  
- Focus on **understanding chemical toxicity** at a systemic level, including **effects on various biological pathways**.
- The data is often used for risk assessment and regulatory decision-making.
- Here we focus on one toxicity pathway, NR (nuclear receptor) signaling, by retrieving the assay readouts.

## The files

- *Tox21_assay_list.xltx*: list of the Tox21 assays with their target category; 38 of them target NRs.
- *tox21_10k_library_info.xls*: list of the compounds tested in the assays, with their PubChem CID and CAS identifiers, unified identifier for 
- *Tox21Assay_SLPs and Descriptoons_2016*: the 38 compressed assays. Each folder contains 4 files; we concatenate the __.aggregated.txt__ files into one CSV file.
    - This concatenation was done previously, so we load the saved result, *Tox21_assay_aggregrated.csv*.

- Known NR targets: AR, ERα, FXR, PPARδ, PPARγ, RXR, TRβ, VDR, GR, hPXR, AhR, rPXR, CAR, ERR

## Import Libraries

In [1]:
import glob
import os
import re
import sys
import warnings

import pandas as pd

In [2]:

def create_directory(path):
    """
    Checks if a directory exists at the given path. If it doesn't, the directory
    (including any missing parent directories) is created.
    
    Args:
        path (str): The path of the directory to check and create.
    """
    if not os.path.exists(path):
        os.makedirs(path)
        print(f"Directory '{path}' created.")
    else:
        print(f"Directory '{path}' already exists.")


## Load the files

In [3]:
assay_list = pd.read_excel('../Data/Annotations/Tox21_assay_list.xltx')
assay_list.head()

|   | Protocol Name | Assay Target | Target Category | Cell Line | Cell Type |
|---|---|---|---|---|---|
| 0 | tox21-ache-p3 | AChE (colormetric) | Neurotoxicity | SH-SY5Y | Neuroblast |
| 1 | tox21-ache-p4 | AChE (fluorescent) | Neurotoxicity | SH-SY5Y | Neuroblast |
| 2 | tox21-ache-p5 | AChE | Neurotoxicity |  | Biochemical |
| 3 | tox21-ahr-p1 | AhR | NR | HepG2 | Liver |
| 4 | tox21-ap1-agonist-p1 | AP-1 agonist | SR | ME-180 | Cervical Cancer |


In [4]:
assay_list_nr = assay_list[assay_list['Target Category'] == 'NR']
print(f'There are {len(assay_list_nr)} different NR assays in the Tox21 dataset')
assay_list_nr

There are 38 different NR assays in the Tox21 dataset


|   | Protocol Name | Assay Target | Target Category | Cell Line | Cell Type |
|---|---|---|---|---|---|
| 3 | tox21-ahr-p1 | AhR | NR | HepG2 | Liver |
| 5 | tox21-ar-bla-agonist-p1 | AR-BLA agonist | NR | HEK293 | Kidney |
| 6 | tox21-ar-bla-antagonist-p1 | AR-BLA antagonist | NR | HEK293 | Kidney |
| 7 | tox21-ar-mda-kb2-luc-agonist-p1 | AR-MDA agonist | NR | MDA-MB-453 | Breast Cancer |
| 8 | tox21-ar-mda-kb2-luc-agonist-p3 | AR-MDA agonist (with antagonist) | NR | MDA-MB-453 | Breast Cancer |
| 9 | tox21-ar-mda-kb2-luc-antagonist-p1 | AR-MDA antagonist | NR | MDA-MB-453 | Breast Cancer |
| 10 | tox21-ar-mda-kb2-luc-antagonist-p2 | AR-MDA antagonist (lower agonist) | NR | MDA-MB-453 | Breast Cancer |
| 13 | tox21-car-agonist-p1 | CAR agonist | NR | HepG2 | Liver |
| 14 | tox21-car-antagonist-p1 | CAR antagonist | NR | HepG2 | Liver |
| 19 | tox21-er-bla-agonist-p2 | ER-BLA agonist | NR | HEK293 | Kidney |


In [5]:
receptor =  assay_list_nr['Protocol Name'].str.replace(r'[\xa0]','', regex=True).str.split('-').str[1]
receptor = list(set(receptor))
print(f'There are {len(receptor)} different NR receptors tested in the Tox21 dataset')

There are 18 different NR receptors tested in the Tox21 dataset


We only retain protocols that share the same assay readout, namely the BLA (β-lactamase) reporter gene assay. The other readouts include:

- **Luminescence**: e.g., AR-MDA_TOX21_SLP_Version1.0 (non-receptor signaling)
- **Luciferase Reporter**: e.g., tox21-ahr-p1 (Aryl Hydrocarbon Receptor (AhR)); CAR1_TOX21_SLP_Version1.0

Please note that the assay version **Version1.0** is indicated as **-p1** in the assay name.
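The naming convention above (receptor as the second dash-separated token, readout and mode in the middle, `-pN` version suffix) can be parsed explicitly. This is a minimal sketch assuming every protocol name follows the pattern `tox21-<receptor>[-<readout tokens>][-agonist|-antagonist]-p<N>`; `parse_protocol_name` and `ProtocolInfo` are hypothetical helpers, not part of the original notebook.

```python
import re
from typing import NamedTuple, Optional


class ProtocolInfo(NamedTuple):
    receptor: str
    readout: Optional[str]  # e.g. 'bla' or 'mda-kb2-luc'; None when absent (tox21-ahr-p1)
    mode: Optional[str]     # 'agonist', 'antagonist' or None
    version: int            # N from the '-pN' suffix (p1 <-> SLP Version1.0)


def parse_protocol_name(name: str) -> ProtocolInfo:
    """Split a Tox21 protocol name into receptor, readout, mode and version (hypothetical helper)."""
    # Remove the non-breaking spaces found in some protocol names
    parts = name.replace('\xa0', '').split('-')
    version_match = re.fullmatch(r'p(\d+)', parts[-1])
    if parts[0] != 'tox21' or len(parts) < 3 or version_match is None:
        raise ValueError(f'Unexpected protocol name: {name!r}')
    middle = parts[2:-1]  # tokens between the receptor and the version suffix
    mode = None
    if middle and middle[-1] in ('agonist', 'antagonist'):
        mode = middle.pop()
    readout = '-'.join(middle) or None
    return ProtocolInfo(parts[1], readout, mode, int(version_match.group(1)))


print(parse_protocol_name('tox21-ar-bla-agonist-p1'))
print(parse_protocol_name('tox21-ahr-p1'))
```

Multi-token readouts such as `mda-kb2-luc` are kept together in `readout`, so a BLA filter built on this helper would need to look for the `bla` token inside that field.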

In [6]:
blactamase_assay = []

for assay in assay_list_nr['Protocol Name']:

    if assay.split('-')[2] == 'bla' or (len(assay.split('-')) > 3 and assay.split('-')[3] == 'bla'):
        blactamase_assay.append(assay)

unique_receptors = list(set(item.split('-')[1] for item in blactamase_assay))

# Print the results
print(f"There are {len(blactamase_assay)} BLA assays for {len(unique_receptors)} unique receptors, namely {unique_receptors}.")

There are 19 BLA assays for 10 unique receptors, namely ['pr', 'pparg', 'gr', 'vdr', 'erb', 'ar', 'rxr', 'fxr', 'er', 'ppard'].


In [7]:
blactamase_assay = [assay.replace('\xa0', '') for assay in blactamase_assay]
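The loop above keeps a protocol when `bla` is its third or fourth dash-separated token, and the next cell strips the non-breaking spaces (`\xa0`). A vectorized pandas sketch of the same positional test is shown below; unlike the loop, it strips `\xa0` *before* splitting. `select_bla_protocols` is a hypothetical helper, and the example names are illustrative only (the `gr-hela-bla` one exercises the fourth-token case).

```python
import pandas as pd


def select_bla_protocols(names: pd.Series) -> list:
    """Return cleaned protocol names whose 3rd or 4th dash-separated token is 'bla'."""
    # Remove non-breaking spaces first so that they cannot hide the 'bla' token
    cleaned = names.str.replace('\xa0', '', regex=False)
    tokens = cleaned.str.split('-')
    # Same positional rule as the loop: token index 2 or 3 must be 'bla'
    is_bla = tokens.apply(lambda parts: 'bla' in parts[2:4])
    return cleaned[is_bla].tolist()


example = pd.Series([
    'tox21-ahr-p1',
    'tox21-ar-bla-agonist-p1',
    'tox21-gr-hela-bla-antagonist-p1\xa0',
    'tox21-ar-mda-kb2-luc-agonist-p1',
])
print(select_bla_protocols(example))
```

Applied to `assay_list_nr['Protocol Name']`, such a helper would replace both the loop and the clean-up cell with a single call.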

We load the library of 10,000 compounds and the Tox21Assay_SLPs assay data from 2016, available for download [here](https://clowder.edap-cluster.com/datasets/63602c6de4b04f6bb13dc4d4).

In [8]:
# Set to True only to rebuild the aggregated file from the individual *.aggregated.txt files
REBUILD_AGGREGATE = False

if REBUILD_AGGREGATE:
    path = '../Data_reviewed/Tox21/Assay_aggregrated'
    all_assay = glob.glob(os.path.join(path, "*.txt"))
    # Read each file once and skip the empty tables
    assay_tables = (pd.read_table(f) for f in all_assay)
    non_empty_assays = [table for table in assay_tables if not table.empty]
    Tox21_assay = pd.concat(non_empty_assays, ignore_index=True)

    Tox21_assay.to_csv('../Data_reviewed/Tox21/Tox21_assay_aggregrated.csv', index=False)
    print(f"There are {Tox21_assay['CAS'].nunique()} unique compounds in the aggregated Tox21 dataset")

In [None]:
Tox21_assay = pd.read_csv('../Data/Annotations/Tox21_assay_aggregrated.csv',low_memory=False)
print(f"There are {Tox21_assay['CAS'].nunique()} unique compounds in the aggregated Tox21 dataset")
Tox21_assay.head(3)

In [None]:
library_info = pd.read_csv('../Data/Annotations/tox21_10k_library_info.tsv', delimiter='\t')
print(f"There are {library_info['CAS'].nunique()} unique compounds in the Tox21 chemical library dataset")

In [None]:
print(f"There are {Tox21_assay['PROTOCOL_NAME'].nunique()} different assays retrieved, not all of which target NRs.")

- We keep only the assays relevant to us, i.e. the NR assays with a BLA readout selected above (`blactamase_assay`).

In [None]:
Tox21_assay = Tox21_assay[Tox21_assay['PROTOCOL_NAME'].isin(blactamase_assay)]
print(f"There are {Tox21_assay['PROTOCOL_NAME'].nunique()} BLA readout assays targeting nuclear receptors.")

In [None]:
sorted_protocol_names = sorted(set(Tox21_assay['PROTOCOL_NAME']))
print(f'There are {len(sorted_protocol_names)} BLA readout assays targeting nuclear receptors: {sorted_protocol_names}')

In [None]:
create_directory('../Data/Output')

In [15]:
# Right merge: keep every compound of the 10K library, even those without assay data
Tox21_annotation = Tox21_assay.merge(library_info, on='SAMPLE_ID', how='right')
Tox21_annotation = Tox21_annotation.dropna(subset=['PUBCHEM_CID'])
Tox21_annotation['PUBCHEM_CID'] = Tox21_annotation['PUBCHEM_CID'].astype(int)
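Both tables carry some columns with the same name besides `SAMPLE_ID` (later cells use `CAS_x` and `SAMPLE_NAME_x`), so pandas disambiguates them with its default `_x` (left) / `_y` (right) suffixes. The toy frames below are placeholders, made up only to show this behaviour, the effect of `how='right'`, and the CID clean-up of the cell above.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables (all values are placeholders)
assay = pd.DataFrame({
    'SAMPLE_ID': ['S1', 'S1', 'S2'],
    'CAS': ['cas-1', 'cas-1', 'cas-2'],
    'PROTOCOL_NAME': ['assay_A', 'assay_B', 'assay_A'],
})
library = pd.DataFrame({
    'SAMPLE_ID': ['S1', 'S2', 'S3'],
    'CAS': ['cas-1', 'cas-2', 'cas-3'],
    'PUBCHEM_CID': [1, 2, None],
})

# Both frames have a CAS column, so pandas renames them CAS_x (left) and CAS_y (right)
merged = assay.merge(library, on='SAMPLE_ID', how='right')
print(merged.columns.tolist())
# how='right' keeps S3 even though it has no assay rows (its assay columns are NaN)
print(len(merged))

# Same clean-up as above: drop rows without a CID, then store CIDs as integers
annotated = merged.dropna(subset=['PUBCHEM_CID']).astype({'PUBCHEM_CID': int})
print(len(annotated))
```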


In [16]:
Tox21_annotation.to_csv('../Data/Output/Tox21_annotation.csv',index=False)

Load the morphology annotation file:
- Check how many compounds overlap between BBC047 and Tox21

In [17]:
#profiles = pd.read_table('../pubchem_annotation_morpho.csv')
#profiles.rename(columns={'CID': 'PUBCHEM_CID'}, inplace=True)
#print(f"There are {profiles['PUBCHEM_CID'].nunique()} unique compounds in the profiles dataset")
#print(f'There are {profiles["PUBCHEM_CID"].isin(Tox21_annotation["PUBCHEM_CID"]).sum()} compounds in the BBC047 dataset that are also in the Tox21 library')
#profiles.head(2)

We used PubChem to retrieve the CID for the 30,615 unique substances tested in BBC047. Since a CPD_NAME can correspond to several molecules (it does not encode stereochemistry), we queried PubChem with Metadata_broad_sample; when this was unsuccessful, we used the SMILES representation instead. Five molecules were not identified, resulting in 30,397 unique substances. The CID retrieval code in get_assays was last run on 27/10/2024 and took approximately 4 hours.
The output is saved as _pubchem_annotation1_october_2.pkl_.
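The retrieval strategy (query with `Metadata_broad_sample` first, fall back to the SMILES) can be sketched as below. The actual `get_assays` code is not shown here, so this is only an illustration: `resolve_cid` is hypothetical, and the PubChem query is abstracted as a `lookup(namespace, value)` callable, which in practice could wrap PubChem's PUG REST service (assuming the Broad sample ID is searchable there as a synonym). The fake lookup table stands in for network calls; its identifiers and CIDs are invented.

```python
from typing import Callable, Optional

Lookup = Callable[[str, str], Optional[int]]


def resolve_cid(broad_sample: Optional[str], smiles: Optional[str], lookup: Lookup) -> Optional[int]:
    """Return a PubChem CID, trying the Broad sample ID first and the SMILES second."""
    for namespace, value in (('name', broad_sample), ('smiles', smiles)):
        if not value:
            continue  # identifier missing for this compound
        try:
            cid = lookup(namespace, value)
        except Exception:
            cid = None  # treat a failed query like "not found" and fall back
        if cid is not None:
            return cid
    return None  # unresolved; such rows are dropped later with dropna


# Fake lookup standing in for PubChem (identifiers and CIDs are invented)
FAKE_PUBCHEM = {('name', 'BRD-A0001'): 101, ('smiles', 'CCO'): 202}


def fake_lookup(namespace: str, value: str) -> Optional[int]:
    return FAKE_PUBCHEM.get((namespace, value))


print(resolve_cid('BRD-A0001', 'CCO', fake_lookup))    # 101, found via the sample ID
print(resolve_cid('BRD-UNKNOWN', 'CCO', fake_lookup))  # 202, SMILES fallback
print(resolve_cid('BRD-UNKNOWN', 'XX', fake_lookup))   # None, not identified
```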


In [18]:
annotations_update = pd.read_pickle('../Data/Annotations/pubchem_annotation1_october_2.pkl') #pubchem_annotation1

In [None]:
annotations_update.rename(columns={'CID': 'PUBCHEM_CID'}, inplace=True)
# Drop rows without a CID
annotations_update = annotations_update.dropna(subset=['PUBCHEM_CID'])
# Store CIDs as integers
annotations_update['PUBCHEM_CID'] = annotations_update['PUBCHEM_CID'].astype(int)

annotations_update.head(5)

In [None]:
print(f"There are {annotations_update['PUBCHEM_CID'].nunique()} unique compounds in the unprocessed profiles dataset")

In [None]:
unique_matches = set(annotations_update["PUBCHEM_CID"]).intersection(set(Tox21_annotation["PUBCHEM_CID"]))
print(f'There are {len(unique_matches)} unique compounds in the BBC047 dataset that are also in the Tox21 library')

Next, we restrict the Tox21 annotations to the compounds present in the morphology (BBC047) file.

In [22]:
BBC047_mol = annotations_update['PUBCHEM_CID'].to_list() # profiles['PUBCHEM_CID'].to_list()
Tox = Tox21_annotation[Tox21_annotation['PUBCHEM_CID'].isin(BBC047_mol)]

In [None]:
print(f"There are {Tox['SAMPLE_DATA_TYPE'].nunique()} sample data types, namely {list(Tox['SAMPLE_DATA_TYPE'].unique())}")

We keep only the rows whose `SAMPLE_DATA_TYPE` is `activity`.

In [24]:
Tox_activity = Tox[Tox['SAMPLE_DATA_TYPE'] == 'activity'] #'signal'

In [None]:
print(f"There are {Tox_activity['SAMPLE_NAME_x'].nunique()} unique sample molecules in the BBC047 dataset tested in Tox21 assays")

In [None]:
create_directory('../Data/Output')
Tox_activity.to_csv('../Data/Output/Tox21_activity_BBC047.csv',index=False)

## Dataset curation: descriptive analysis

In [27]:
import plotly.express as px

We start by counting the number of unique compounds tested in each assay.

In [28]:
# Number of unique compounds (CAS) tested in each assay
assay_compound = Tox_activity.groupby('PROTOCOL_NAME')['CAS_x'].nunique().to_dict()

In [29]:
assay_compound_df = pd.DataFrame(list(assay_compound.items()), columns=['Assay', 'Compound'])
assay_compound_df = assay_compound_df.sort_values(by='Compound', ascending=True)

In [None]:
fig = px.bar(assay_compound_df, x='Assay', y='Compound', title='Number of compounds tested in each assay')
fig.show()

- Look at all combinations of how the compound sets intersect: using the upsetplot library, we show the overlap of compounds between the assays.
**To update**

In [31]:
from collections import defaultdict

import matplotlib.pyplot as plt
import upsetplot
from upsetplot import UpSet, from_memberships

In [32]:
assay_data = {}
for assay in blactamase_assay:
    assay_data[assay] = Tox_activity[Tox_activity['PROTOCOL_NAME'] == assay].CAS_x.unique().tolist()

In [33]:
#TO BE CONTINUED ...
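Until the UpSet plot is finished, the quantity it displays can be computed directly from a dict like `assay_data`: for each compound, the exact set of assays it appears in, counted across compounds. This is a minimal sketch with invented CAS placeholders; `membership_counts` is a hypothetical helper. If a plot is wanted, `upsetplot.from_contents` accepts the same kind of `{assay: members}` dict.

```python
from collections import Counter, defaultdict


def membership_counts(assay_data: dict) -> Counter:
    """Count compounds per exact combination of assays (the bars of an UpSet plot)."""
    memberships = defaultdict(set)
    for assay, compounds in assay_data.items():
        for compound in compounds:
            memberships[compound].add(assay)
    # Each compound contributes once, to the frozenset of assays it was tested in
    return Counter(frozenset(assays) for assays in memberships.values())


toy = {
    'assay_A': ['cas1', 'cas2', 'cas3', 'cas5'],
    'assay_B': ['cas2', 'cas3', 'cas4', 'cas5'],
    'assay_C': ['cas3'],
}
for combo, n in membership_counts(toy).most_common():
    print(sorted(combo), n)
```

Assays without any compound (possible for BLA protocols absent from the filtered data) simply contribute nothing.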

## Endocrine-active molecules

We aim to create a dataset of endocrine-active molecules. Each compound was tested several times (replicates) in each assay. If the replicate outcomes are inconsistent, we exclude that assay for that compound. Likewise, if an assay is inconclusive (i.e., the experimenters could not determine whether the molecule was an agonist, antagonist, inactive, or active), the assay is discarded. Finally, we evaluate the remaining assays: if at least one is active, the molecule is classified as endocrine-active; if all are inactive, the compound is classified as inactive.

We first consider the assays measuring *activity*; we therefore focus on the *CHANNEL_OUTCOME* column, which reflects the outcome of the assay under investigation.

Note: If an assay measures two channels (e.g., a receptor activity channel and a cell viability channel), the CHANNEL_OUTCOME might be “Active” for the receptor but “Inactive” for cell viability. The overall ASSAY_OUTCOME could then be “Inconclusive” or “Active” depending on how the results are interpreted.
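The curation rule above can also be expressed without explicit loops. This is a minimal pandas sketch, assuming a long table with one row per replicate and the columns `PUBCHEM_CID`, `PROTOCOL_NAME` and `CHANNEL_OUTCOME`; `classify_endocrine_activity` is a hypothetical helper and the toy outcomes are invented. Compounds with no assay left after filtering receive no label.

```python
import pandas as pd


def classify_endocrine_activity(df: pd.DataFrame) -> pd.Series:
    """Label each compound 'active'/'inactive' from its consistent, conclusive assays."""
    df = df.assign(outcome=df['CHANNEL_OUTCOME'].astype(str))
    df['inconclusive'] = df['outcome'].str.contains(r'\binconclusive\b', regex=True)
    # r'\bactive\b' matches 'active agonist' but not 'inactive'
    df['active'] = df['outcome'].str.contains(r'\bactive\b', regex=True)

    per_assay = df.groupby(['PUBCHEM_CID', 'PROTOCOL_NAME']).agg(
        n_outcomes=('outcome', 'nunique'),
        any_inconclusive=('inconclusive', 'any'),
        active=('active', 'any'),
    )
    # Keep an assay only if all replicates agree and none is inconclusive
    kept = per_assay[(per_assay['n_outcomes'] == 1) & ~per_assay['any_inconclusive']]
    # A compound is active if at least one remaining assay is active
    status = kept.groupby(level='PUBCHEM_CID')['active'].any()
    return status.map({True: 'active', False: 'inactive'})


toy = pd.DataFrame(
    [
        (1, 'assay_A', 'active agonist'), (1, 'assay_A', 'active agonist'),
        (1, 'assay_B', 'inactive'), (1, 'assay_B', 'inactive'),
        (2, 'assay_A', 'active agonist'), (2, 'assay_A', 'inactive'),  # inconsistent, dropped
        (2, 'assay_B', 'inactive'), (2, 'assay_B', 'inactive'),
        (3, 'assay_A', 'inconclusive agonist'), (3, 'assay_A', 'inconclusive agonist'),  # dropped
    ],
    columns=['PUBCHEM_CID', 'PROTOCOL_NAME', 'CHANNEL_OUTCOME'],
)
print(classify_endocrine_activity(toy).to_dict())
```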

In [None]:
Tox_activity.head(2)

In [35]:
def get_unique_molecules_and_assays(dataframe, molecule_column='PUBCHEM_CID', assay_column='PROTOCOL_NAME'):
    """
    Return the unique molecules, the unique assays and a deep copy of `dataframe`.
    """
    molecule_list = list(dataframe[molecule_column].unique())
    assay_list_endocrine = list(dataframe[assay_column].unique())
    
    # Create a deep copy of the original DataFrame
    copied_dataframe_activity = dataframe.copy(deep=True)
    
    return molecule_list, assay_list_endocrine, copied_dataframe_activity

# Separate the agonist and antagonist protocol names into two lists, using whole-word
# matches so that 'antagonist' is not picked up by the 'agonist' pattern
agonist_protocol_names = []
antagonist_protocol_names = []
for protocol_name in sorted_protocol_names:
    if re.search(r'\bagonist\b', protocol_name.lower()):
        agonist_protocol_names.append(protocol_name)
    elif re.search(r'\bantagonist\b', protocol_name.lower()):
        antagonist_protocol_names.append(protocol_name)
    else:
        warnings.warn(f"Protocol name '{protocol_name}' does not contain 'agonist' or 'antagonist'.")

In [36]:
# create two dataframes for agonist and antagonist data
Tox_activity_agonist = Tox_activity[Tox_activity['PROTOCOL_NAME'].isin(agonist_protocol_names)]
Tox_activity_antagonist = Tox_activity[Tox_activity['PROTOCOL_NAME'].isin(antagonist_protocol_names)]

In [37]:
molecule_list, assay_list_endocrine, endocrine_activity = get_unique_molecules_and_assays(Tox_activity)

In [38]:
molecule_list_agonist, assay_list_endocrine_agonist, endocrine_activity_agonist = get_unique_molecules_and_assays(Tox_activity_agonist)
molecule_list_antagonist, assay_list_endocrine_antagonist, endocrine_activity_antagonist = get_unique_molecules_and_assays(Tox_activity_antagonist)

In [41]:
def define_endocrine_activity(mol_list, assay_list, data, activity_dict):
    """
    Process the molecules overlapping with BBC047: drop inconsistent or inconclusive assays
    for each molecule, then define its endocrine activity status from the remaining assays.
    
    Parameters:
    mol_list (list): List of molecules to process.
    assay_list (list): List of assays to check for each molecule.
    data (DataFrame): DataFrame containing assay data for different molecules.
    activity_dict (dict): Dictionary to store activity status ('active' or 'inactive') for each molecule.
    
    Returns:
    DataFrame: Updated data after removing inconsistent and inconclusive assays.
    dict: Updated activity_dict with the activity status of molecules.
    """
    for mol in mol_list:
        view_mol = data[data['PUBCHEM_CID'] == mol]

        for assay in assay_list:
            view_assay = view_mol[view_mol['PROTOCOL_NAME'] == assay]
            outcomes = view_assay['CHANNEL_OUTCOME'].astype(str)

            # Replicates disagree, or at least one replicate is inconclusive: drop this assay for this molecule
            inconsistent = outcomes.nunique() > 1
            inconclusive = outcomes.str.contains(r'\binconclusive\b', regex=True).any()
            if inconsistent or inconclusive:
                data = data[~data.index.isin(view_assay.index)]

        # Classify from the assays that survived the filtering (not from the unfiltered view_mol)
        remaining = data.loc[data['PUBCHEM_CID'] == mol, 'CHANNEL_OUTCOME'].astype(str)
        if remaining.empty:
            continue  # no usable assay left for this molecule
        if remaining.str.contains(r'\bactive\b', regex=True).any():
            activity_dict[mol] = 'active'
        else:
            activity_dict[mol] = 'inactive'

    return data, activity_dict


In [42]:
# Count the 'active' and 'inactive' labels
def count_activie_inactive(activity_dict):
    """Print how many molecules are labelled 'active' and 'inactive'."""
    values = list(activity_dict.values())
    print(f"Number of active molecules: {values.count('active')}")
    print(f"Number of inactive molecules: {values.count('inactive')}")


In [None]:
activity_dict_agonist = {}
updated_data_agonist , activity_agonist = define_endocrine_activity(molecule_list_agonist, assay_list_endocrine_agonist, endocrine_activity_agonist, activity_dict_agonist)
count_activie_inactive(activity_agonist)

In [None]:
activity_dict_antagonist = {}
updated_data_antagonist , activity_antagonist = define_endocrine_activity(molecule_list_antagonist, assay_list_endocrine_antagonist, endocrine_activity_antagonist, activity_dict_antagonist)
count_activie_inactive(activity_antagonist)

In [None]:
activity_dict = {}
updated_data, activity = define_endocrine_activity(molecule_list, assay_list_endocrine, endocrine_activity, activity_dict)
count_activie_inactive(activity)

_Note_: Endocrine activity is defined here with respect to all NRs (nuclear receptors) tested in Tox21 with a BLA readout: a molecule active on any of them is classified as endocrine active. We do not consider GPCRs (G protein-coupled receptors) as an endocrine-activity pathway in this case.

Therefore, if a molecule is classified as active in at least one assay after removing inconsistent and inconclusive results, it will be annotated as active. For example, p,p’-DDT is inactive in 8 out of the 9 remaining assays; however, it is active in all 3 replicates of the *Tox21 ER-Bla Agonist P2* assay and will be annotated as endocrine active.



In [44]:
def view_mol(data, name="p,p'-DDT"):
    """
    Display the assay data for one molecule of the Tox21 dataset
    (default: p,p'-DDT, CAS 50-29-3), sorted by protocol name.
    """
    mol_rows = data[data['SAMPLE_NAME_x'] == name].copy()
    # sort_values returns a new DataFrame, so the result must be reassigned
    mol_rows = mol_rows.sort_values(by='PROTOCOL_NAME')
    print(data['CHANNEL_OUTCOME'].unique())
    return mol_rows


In [None]:
view_mol(updated_data_agonist)

In [None]:
view_mol(updated_data_antagonist)

In [None]:
view_mol(updated_data)

## Nuclear receptor activity


Similarly to the datasets created above, we create one dataset for each of the 8 receptors.

In [46]:

# Create a dictionary to store the nuclear receptor name and the corresponding assay name using the protocol name list and the list of nuclear receptors defined earlier
nuclear_receptor_dict = {}
for receptor in unique_receptors:
    matching_protocols = []
    for protocol in sorted_protocol_names:
        # Use re.search to match the receptor explicitly in the protocol
        if re.search(r'\b' + re.escape(receptor) + r'\b', protocol): 
            matching_protocols.append(protocol)
    if matching_protocols:
        nuclear_receptor_dict[receptor] = matching_protocols
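The regex loop above matches the receptor abbreviation as a whole word anywhere in the protocol name. Because the receptor is the second dash-separated token in these names, an alternative is to key directly on that token. This is a minimal sketch with a hypothetical `group_protocols_by_receptor` helper and a few illustrative names.

```python
from collections import defaultdict


def group_protocols_by_receptor(protocol_names, receptors):
    """Map each receptor to the protocols whose 2nd dash-separated token equals it."""
    wanted = set(receptors)
    mapping = defaultdict(list)
    for name in sorted(protocol_names):
        token = name.split('-')[1]
        if token in wanted:
            mapping[token].append(name)
    # Like the regex loop, receptors without any matching protocol are left out
    return dict(mapping)


names = ['tox21-er-bla-agonist-p2', 'tox21-erb-bla-p1', 'tox21-ar-bla-antagonist-p1']
print(group_protocols_by_receptor(names, ['er', 'erb', 'ar', 'vdr']))
```

Exact token equality keeps `er` and `erb` apart by construction, without relying on regex word boundaries.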


In [None]:
nuclear_receptor_dict.keys()

In [48]:
# Make a DataFrame for each receptor with the corresponding assays
def make_receptor_df(receptor, data):
    """
    Create a DataFrame containing assay data for a specific receptor.
    
    Parameters:
    receptor (str): The name of the receptor to filter the data.
    data (DataFrame): The DataFrame containing assay data for different molecules.
    
    Returns:
    DataFrame: The DataFrame containing assay data for the specified receptor.
    """
    receptor_df = pd.DataFrame()
    for assay in nuclear_receptor_dict[receptor]:
        receptor_df = pd.concat([receptor_df, data[data['PROTOCOL_NAME'] == assay]])
    return receptor_df


In [None]:
# Initialize a dictionary to store DataFrames for each receptor
receptor_dfs = {}

# Loop through each receptor to create a separate DataFrame
for receptor in nuclear_receptor_dict.keys():
    print(f"Creating DataFrame for {receptor}...")
    # Call the function and store the resulting DataFrame in the dictionary
    receptor_dfs[receptor] = make_receptor_df(receptor, Tox_activity)


In [None]:
activity_dict_receptor = {}
updated_data_receptor = {}


for receptor in nuclear_receptor_dict.keys():
    print(f"Defining activity for {receptor}...")


    molecule_list_receptor = list(receptor_dfs[receptor]['PUBCHEM_CID'].unique())

    assay_list_receptor = list(receptor_dfs[receptor]['PROTOCOL_NAME'].unique())

    updated_data_receptor[receptor], activity_dict_receptor[receptor] = define_endocrine_activity(
        molecule_list_receptor, assay_list_receptor, receptor_dfs[receptor], activity_dict_receptor.get(receptor, {})
    )

In [None]:
# Count the active and inactive molecules for each receptor by calling count_activie_inactive
for receptor in activity_dict_receptor.keys():
    print(f"Counting active and inactive molecules for {receptor}...")
    count_activie_inactive(activity_dict_receptor[receptor])
    

In [None]:
receptor_dfs['pparg'][(receptor_dfs['pparg']['PROTOCOL_NAME'] == 'tox21-pparg-bla-agonist-p1') & (receptor_dfs['pparg']['ASSAY_OUTCOME'] == 'inactive')]['CAS_x'].nunique()

## Save the data 

We save as CSV files the data needed to merge with the BBC047 dataset: the activity status, the compound name, the CAS number, the SMILES and the PubChem CID.


In [53]:
def get_data_ready(dataframe_activity, activity_dict):
    """
    Attach the endocrine activity status to each compound and keep one row per compound.
    
    Args:
        dataframe_activity (DataFrame): Filtered assay data returned by define_endocrine_activity.
        activity_dict (dict): Mapping PUBCHEM_CID -> 'active' / 'inactive'.
    
    Returns:
        DataFrame: Two-column table (PUBCHEM_CID, Endocrine_activity) built from activity_dict.
        DataFrame: One row per compound with CAS, SMILES, PUBCHEM_CID, CPD_NAME and Endocrine_activity.
    """
    data_cleaned = pd.DataFrame(list(activity_dict.items()), columns=['PUBCHEM_CID', 'Endocrine_activity'])

    # Work on a copy so the caller's DataFrame is not modified (and no SettingWithCopyWarning)
    dataframe_activity = dataframe_activity[['CAS_x', 'SMILES', 'PUBCHEM_CID', 'SAMPLE_NAME_x']].copy()
    dataframe_activity['PUBCHEM_CID'] = dataframe_activity['PUBCHEM_CID'].astype(int)
    dataframe_activity = dataframe_activity.rename(columns={'SAMPLE_NAME_x': 'CPD_NAME', 'CAS_x': 'CAS'})

    # Right join: keep every compound still present in the filtered assay data
    dataframe_activity = data_cleaned.merge(dataframe_activity, on='PUBCHEM_CID', how='right')
    # Collapse the per-assay rows to one row per compound
    dataframe_activity = dataframe_activity.groupby('PUBCHEM_CID').first().reset_index()
    print(f"{dataframe_activity['PUBCHEM_CID'].nunique()} unique compounds in the endocrine activity dataset")

    return data_cleaned, dataframe_activity



In [None]:

_,df = get_data_ready(updated_data_agonist,activity_agonist)
create_directory('../Data/Output')
df.to_csv('../Data/Output/Tox21_activity_agonist.csv',index=False)

In [None]:
_,df = get_data_ready(updated_data_antagonist,activity_antagonist)
create_directory('../Data/Output')
df.to_csv('../Data/Output/Tox21_activity_antagonist.csv',index=False)

In [None]:
_,df = get_data_ready(updated_data,activity)
create_directory('../Data/Output')
df.to_csv('../Data/Output/Tox21_activity.csv',index=False)


## Sanity check

We noticed that fewer data points remain when merging the pre-processed data from output_notebook_1 with this endocrine activity data. To see which molecules are concerned, we retrieved the CIDs of the molecules that were removed.
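Rather than listing the dropped CIDs by hand, they can be recovered with an anti-join. This is a minimal sketch using `merge(..., how='left', indicator=True)`, which adds a `_merge` column marking rows found only in the left table; `missing_cids` is a hypothetical helper, and the two toy frames (with invented CIDs) stand in for the pre-processed data and the endocrine activity table.

```python
import pandas as pd


def missing_cids(before: pd.DataFrame, after: pd.DataFrame, key: str = 'PUBCHEM_CID') -> list:
    """Return the key values present in `before` but absent from `after`."""
    merged = before[[key]].drop_duplicates().merge(
        after[[key]].drop_duplicates(), on=key, how='left', indicator=True
    )
    return sorted(merged.loc[merged['_merge'] == 'left_only', key].tolist())


preprocessed = pd.DataFrame({'PUBCHEM_CID': [10, 20, 30, 40]})
endocrine = pd.DataFrame({'PUBCHEM_CID': [10, 30], 'Endocrine_activity': ['active', 'inactive']})
print(missing_cids(preprocessed, endocrine))  # [20, 40]
```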

In [52]:
missing_cid = [2724385,
 2082,
 5154,
 26596,
 6758,
 3334,
 439501,
 10607,
 36314,
 30323,
 6197,
 15478,
 6167,
 4122,
 4030]

The removed endocrine-active molecules appear to correspond to molecules flagged with a 'DEAD CELL PATTERN', which is why they were removed.

In [None]:
annotations_update[annotations_update['PUBCHEM_CID'].isin(missing_cid)]

### Generate a label column for plotting in Datagrok

In [54]:
T = [
    'Bisphenol A diglycidyl ether', 'Dexamethasone acetate', 'Testosterone propionate',
    'Estrone', "p,p'-DDT"
]

In [55]:
#Endocrine_activity['Datagrok'] = Endocrine_activity['CPD_NAME'].apply(lambda x: x if x in T else '')
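The commented-out cell relies on an `Endocrine_activity` DataFrame that is not defined in this notebook. Assuming a table with a `CPD_NAME` column, such as the CSV files written in the *Save the data* section, the label column could be built with a vectorized `Series.where` instead of `apply`. The toy rows below are placeholders.

```python
import pandas as pd

# Compounds to highlight (same list as T above)
T = ['Bisphenol A diglycidyl ether', 'Dexamethasone acetate', 'Testosterone propionate',
     'Estrone', "p,p'-DDT"]

df = pd.DataFrame({'CPD_NAME': ['Estrone', 'Compound X', "p,p'-DDT"]})
# Keep the name for highlighted compounds, an empty string for all others
df['Datagrok'] = df['CPD_NAME'].where(df['CPD_NAME'].isin(T), '')
print(df)
```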

In [None]:
create_directory('../Data/Output')
#Endocrine_activity.to_csv('../Data/Output/Tox21_Endocrine_Datagrok.csv',index=False)