## 1. Finding cosmetic products with the potential to impact human hormones

<p>Many chemicals used in cosmetics are absorbed through the skin. These chemicals can enter the bloodstream and have the potential to affect our hormones.
</p> 

<p>
The purpose of this notebook is to rank and rate products based on the number of potential hormone disruptors (endocrine disruptors) they contain. I also want to determine which hormone disruptors are commonly found in cosmetic products and which types of products tend to contain them. For example, I expect sunscreen products to contain many endocrine disruptors.
</p>

In this notebook, I use the [Sephora cosmetic dataset from Kaggle](https://www.kaggle.com/datasets/kingabzpro/cosmetics-datasets) coupled with the list of endocrine disruptors found in the [Toxin and Toxin Target Database (T3DB)](http://www.t3db.ca/toxins/T3D4807).



# Import libraries and load Data

In [1]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.manifold import TSNE

In [2]:
# Load the cosmetic data
cosmetics_df = pd.read_csv("../input/cosmetics-datasets/cosmetics.csv")
cosmetics_df.head()

Unnamed: 0,Label,Brand,Name,Price,Rank,Ingredients,Combination,Dry,Normal,Oily,Sensitive
0,Moisturizer,LA MER,Crème de la Mer,175,4.1,"Algae (Seaweed) Extract, Mineral Oil, Petrolat...",1,1,1,1,1
1,Moisturizer,SK-II,Facial Treatment Essence,179,4.1,"Galactomyces Ferment Filtrate (Pitera), Butyle...",1,1,1,1,1
2,Moisturizer,DRUNK ELEPHANT,Protini™ Polypeptide Cream,68,4.4,"Water, Dicaprylyl Carbonate, Glycerin, Ceteary...",1,1,1,1,0
3,Moisturizer,LA MER,The Moisturizing Soft Cream,175,3.8,"Algae (Seaweed) Extract, Cyclopentasiloxane, P...",1,1,1,1,1
4,Moisturizer,IT COSMETICS,Your Skin But Better™ CC+™ Cream with SPF 50+,38,4.1,"Water, Snail Secretion Filtrate, Phenyl Trimet...",1,1,1,1,1


# Load toxin data and get endocrine disruptors


The toxin data used in this notebook is from the [Toxin and Toxin Target Database (T3DB)](http://www.t3db.ca/toxins/T3D4807).


<p>
toxins_df: contains columns with a basic description of each toxin. Here, the columns Name, Description and CAS Number are the most important.
</p>
    
<p>    
mos_df: contains all toxin-target mechanisms of action and references
</p>

<p>
I filtered the dataset for endocrine disruptors by selecting chemicals that interact with the estrogen receptor (UniProt ID P03372). The filter returns 837 matches, which were my starting point for potential hormone regulators. </p>
    
`['Target UniProt ID'] == 'P03372' `

<p>
endo_toxins_df: a filtered version of toxins_df that keeps only the chemicals that bind the estrogen receptor.
</p>

In [3]:
# Load toxin data and index it by T3DB ID
toxins_df = pd.read_csv("../input/toxin-datasets/toxins.csv")
toxins_df.set_index(['T3DB ID'], inplace=True)
toxins_df.head()

T3DB ID,Name,Class,Description,Categories,Types,Synonyms,CAS Number,Chemical Formula,Average Molecular Mass,Monoisotopic Mass,...,OMIM ID,ChEBI ID,BioCyc ID,CTD ID,Stitch ID,PDB ID,ACToR ID,Wikipedia Link,Creation Date,Update Date
T3D0001,Arsenic,SmallMolecule,Arsenic(As) is a ubiquitous metalloid found in...,"""Cigarette Toxin"", ""Pesticide"", ""Household Tox...","""Inorganic Compound"", ""Metalloid"", ""Arsenic Co...","""Arsenic ion"", ""Arsenic(3+)"", ""Arsenic(3+) ion...",7440-38-2,As,74.92,74.919951,...,,35828,CPD-763,D001151,Arsenic,,6367.0,http://en.wikipedia.org/wiki/Arsenic,2009-03-06 18:57:53 UTC,2014-12-24 20:20:50 UTC
T3D0002,Lead,SmallMolecule,Lead is a soft and malleable heavy and post-tr...,"""Cigarette Toxin"", ""Household Toxin"", ""Industr...","""Inorganic Compound"", ""Metal"", ""Lead Compound""...","""Lead (II) cation"", ""Lead ion"", ""Lead ion (Pb2...",7439-92-1,Pb,207.2,207.975539,...,150500.0,49807,CPD-527,D007854,Lead,PB,6472.0,http://en.wikipedia.org/wiki/Lead,2009-03-06 18:57:54 UTC,2014-12-24 20:20:50 UTC
T3D0003,Mercury,SmallMolecule,Mercury is a metal that is a liquid at room te...,"""Household Toxin"", ""Industrial/Workplace Toxin...","""Inorganic Compound"", ""Metal"", ""Mercury Compou...","""Hg(2+)"", ""Hg2+"", ""Mercuric ion"", ""Mercury ion...",7439-97-6,Hg,200.59,201.970626,...,,16793,CPD-29,D008628,Mercury,HG,6477.0,http://en.wikipedia.org/wiki/Mercury,2009-03-06 18:57:54 UTC,2014-12-24 20:20:50 UTC
T3D0004,Vinyl chloride,SmallMolecule,"Vinyl chloride is a man-made organic compound,...","""Cigarette Toxin"", ""Household Toxin"", ""Industr...","""Organic Compound"", ""Industrial Precursor/Inte...","""Chloroethene"", ""Chloroethylene"", ""Monochloroe...",75-01-4,C2H3Cl,62.498,61.992328,...,,28509,11-DCE,D014752,Vinyl chloride,,1466.0,,2009-03-06 18:57:54 UTC,2014-12-24 20:20:50 UTC
T3D0006,Benzene,SmallMolecule,"Benzene is a toxic, volatile, flammable liquid...","""Cigarette Toxin"", ""Pesticide"", ""Household Tox...","""Organic Compound"", ""Industrial Precursor/Inte...","""Annulene"", ""Aromatic alkane"", ""Benzeen"", ""Ben...",71-43-2,C6H6,78.1118,78.04695,...,111300.0,16716,BENZENE,D001554,Benzene,BNZ,136.0,http://en.wikipedia.org/wiki/benzene,2009-03-06 18:57:54 UTC,2014-12-24 20:20:50 UTC


In [4]:
mos_df = pd.read_csv("../input/toxin-datasets/moas.csv")
# find toxins targeting the estrogen receptor
bind_estrogen = mos_df.loc[mos_df['Target UniProt ID'] == 'P03372','Toxin T3DB ID'].values
print(len(bind_estrogen))

837


In [5]:
# filter toxins dataset for only those that bind the estrogen receptor
# (.copy() avoids modifying a view of toxins_df)
endo_toxins_df = toxins_df.filter(items=bind_estrogen, axis=0).copy()
# toxins without a PubChem ID get the placeholder 0
endo_toxins_df['PubChem Compound ID'] = endo_toxins_df['PubChem Compound ID'].fillna(0).astype(int)
endo_toxins_df_d = endo_toxins_df['PubChem Compound ID'].to_dict()

In [6]:
# dictionary mapping each T3DB toxin ID to its PubChem compound ID
endo_toxins_df_d

{'T3D0001': 104734,
 'T3D0007': 31193,
 'T3D0012': 3036,
 'T3D0013': 40470,
 'T3D0014': 38018,
 'T3D0021': 3035,
 'T3D0025': 6294,
 'T3D0027': 36400,
 'T3D0031': 5284469,
 'T3D0048': 0,
 'T3D0050': 13089,
 'T3D0051': 38037,
 'T3D0055': 13940,
 'T3D0067': 6434141,
 'T3D0144': 8268,
 'T3D0151': 4211,
 'T3D0223': 234,
 'T3D0224': 518740,
 'T3D0241': 24501,
 'T3D0243': 443495,
 'T3D0061': 4115,
 'T3D0075': 167250,
 'T3D0138': 299,
 'T3D0187': 5359268,
 'T3D0390': 249266,
 'T3D0391': 16322,
 'T3D0392': 16323,
 'T3D0393': 25622,
 'T3D0394': 27959,
 'T3D0395': 33100,
 'T3D0396': 36399,
 'T3D0397': 36982,
 'T3D0398': 36980,
 'T3D0399': 36342,
 'T3D0400': 16307,
 'T3D0401': 18102,
 'T3D0402': 18101,
 'T3D0403': 36981,
 'T3D0404': 16308,
 'T3D0405': 38032,
 'T3D0406': 37804,
 'T3D0407': 37803,
 'T3D0408': 38029,
 'T3D0409': 38034,
 'T3D0410': 41541,
 'T3D0411': 38035,
 'T3D0412': 41555,
 'T3D0413': 41540,
 'T3D0414': 41551,
 'T3D0415': 38033,
 'T3D0416': 38030,
 'T3D0417': 23448,
 'T3D0418': 275

# Make a cosmetic class and highlight endocrine disruptors


A major issue in connecting the two datasets is that the chemical name is not a unique key: each chemical can have many synonyms. I therefore labeled both datasets with PubChem compound IDs (CIDs) so they share a common key. The toxin dataset already has a CID for each row (chemical), while the ingredients in the cosmetic dataset do not. I used the pubchempy library to find a CID for each ingredient in the cosmetic dataset. The code for adding a CID to each ingredient is in the **add_cids** function in the cosmetic class below.
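
Common ingredients such as water and glycerin recur across products, so one way to cut down on repeated PubChem queries is to cache lookups by name. The sketch below is an assumption rather than part of the notebook: `resolve_cids`, `fake_lookup` and `shared_cache` are hypothetical names, and the stand-in lookup replaces the real PubChem query so the example runs offline.

```python
from typing import Callable, Dict, Iterable, List, Optional


def resolve_cids(
    ingredients: Iterable[str],
    lookup: Callable[[str], Optional[int]],
    cache: Optional[Dict[str, int]] = None,
) -> List[int]:
    """Map ingredient names to CIDs, storing 0 when no match is found."""
    if cache is None:
        cache = {}
    cids = []
    for ingredient in ingredients:
        name = ingredient.strip()
        key = name.lower()  # case-insensitive key so repeated names hit the cache
        if key not in cache:
            cache[key] = lookup(name) or 0
        cids.append(cache[key])
    return cids


# Hypothetical stand-in for a PubChem name query (illustration only).
# In the notebook, the lookup would wrap the pubchempy call used in add_cids.
_FAKE_PUBCHEM = {"water": 962, "glycerin": 753}
calls = []


def fake_lookup(name: str) -> Optional[int]:
    calls.append(name)  # record every "remote" query
    return _FAKE_PUBCHEM.get(name.lower())


shared_cache: Dict[str, int] = {}
print(resolve_cids(["Water", "Glycerin", "Fragrance"], fake_lookup, shared_cache))
print(resolve_cids(["Glycerin", "Water"], fake_lookup, shared_cache))
print(len(calls))  # the second product triggered no new queries
```

Sharing one cache across all products means each distinct name is queried at most once, including names that have no match.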



In [7]:
# TODO: connect each PubChem ID to a toxin class that holds the ID, name, synonyms and toxin information
# TODO: document the class and what each method does
class Cosmetic:
    def __init__(self,ingredients,name):
        """Take the ingredients string and the name from one row of the cosmetic dataset."""
        self.ingredients = ingredients.strip('.').split(', ')
        self.name = name
        self.cids = []
        self.toxic_cids = []
        self.toxins = []
        self.ihtml = None
        
        
    def add_cids(self):
        """Look up a PubChem CID for each ingredient; store 0 when no match is found. TODO: this step is slow."""
        import pubchempy as pcp
        cids = []
        for ingredient in self.ingredients:
            results = pcp.get_compounds(ingredient.strip(), 'name')
            if results:
                cids.append(results[0].cid)
            else:
                cids.append(0)
        self.cids = cids

    def add_toxins(self,endotoxins):
        """Flag ingredients whose CID is in endotoxins and build color-coded HTML (grey: no CID, black: not flagged, dark red: flagged)."""
        ihtml = "## <span style='color: black;'>Ingredients From Packaging:\n\n"
        colors = ['#999999','black','#8B0000']
        state = 1
        for i,cid in enumerate(self.cids):
            if cid == 0:
                new_state = 0
            elif cid in endotoxins:
                new_state = 2
                self.toxins.append(self.ingredients[i])
                self.toxic_cids.append(cid)
            else:
                new_state = 1
                
            if state == new_state:
                ihtml += "{0}, ".format(self.ingredients[i])
            else:
                ihtml += "</span><span style='color:{0};'>{1}, ".format(colors[new_state],self.ingredients[i])
            state = new_state
        
        self.ihtml = ihtml.rstrip(', ') + '</span>'
    
    
    def html_ingredients(self):
        """Build a per-ingredient HTML list with flagged ingredients underlined in red."""
        self.html_ingredients_list = []
        for i in self.ingredients:
            if i in self.toxins:
                self.html_ingredients_list.append("<p style='color: red;'><u>{0}</u></p>".format(i))
            else:
                self.html_ingredients_list.append(str(i))

In [8]:
# build Cosmetic objects for the first 10 products
cosmetics = []
for index, row in cosmetics_df.head(10).iterrows():
    cosmetic = Cosmetic(row['Ingredients'], row['Name'])
    cosmetics.append(cosmetic)

print(len(cosmetics))

10


In [9]:
cosmetics[0].add_cids()

In [10]:
cosmetics[0].add_toxins(list(endo_toxins_df_d.values()))
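
Since `add_toxins` only tests `cid in endotoxins`, any container that supports membership tests can be passed. A minimal sketch, assuming a set is acceptable here: building the set once makes each test constant-time, and dropping the `0` placeholder is a defensive step so that "no PubChem ID" can never count as a match. The names `sample_toxin_cids`, `endocrine_cids` and `product_cids` are hypothetical, and the three dictionary entries are copied from the output above.

```python
# Small subset of the toxin-ID -> CID dictionary printed earlier;
# T3D0048 has no PubChem ID and was filled with 0
sample_toxin_cids = {"T3D0012": 3036, "T3D0021": 3035, "T3D0048": 0}

# Build the set once and drop the 0 placeholder
endocrine_cids = {cid for cid in sample_toxin_cids.values() if cid != 0}

# Hypothetical CIDs for one product's ingredients (0 = not found in PubChem)
product_cids = [0, 3036, 753]
flagged = [cid for cid in product_cids if cid in endocrine_cids]
print(flagged)
```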

In [11]:
print("# <u>Product: {0}</u>\n\n{1}\n\n## The Following Ingredients Potentially Alter Estrogen In The Body:\n{2} ".format(cosmetics[0].name,cosmetics[0].ihtml, cosmetics[0].toxins))

# <u>Product: Crème de la Mer</u>

## <span style='color: black;'>Ingredients From Packaging:

</span><span style='color:#999999;'>Algae (Seaweed) Extract, </span><span style='color:black;'>Mineral Oil, </span><span style='color:#999999;'>Petrolatum, </span><span style='color:black;'>Glycerin, Isohexadecane, </span><span style='color:#999999;'>Microcrystalline Wax, Lanolin Alcohol, Citrus Aurantifolia (Lime) Extract, Sesamum Indicum (Sesame) Seed Oil, Eucalyptus Globulus (Eucalyptus) Leaf Oil, Sesamum Indicum (Sesame) Seed Powder, Medicago Sativa (Alfalfa) Seed Powder, Helianthus Annuus (Sunflower) Seedcake, Prunus Amygdalus Dulcis (Sweet Almond) Seed Meal, </span><span style='color:black;'>Sodium Gluconate, Copper Gluconate, Calcium Gluconate, Magnesium Gluconate, Zinc Gluconate, Magnesium Sulfate, Paraffin, Tocopheryl Succinate, Niacin, Water, Beta-Carotene, Decyl Oleate, </span><span style='color:#8B0000;'>Aluminum Distearate, </span><span style='color:black;'>Octyldodecanol, Citric Acid, Cyanocobalamin, Magnesium Stearate, Panthenol, Limonene, </span><span style='color:#8B0000;'>Geraniol, </span><span style='color:black;'>Linalool, Hydroxycitronellal, Citronellol, Benzyl Salicylate, Citral, Sodium Benzoate, Alcohol Denat., </span><span style='color:#999999;'>Fragrance</span>

## The Following Ingredients Potentially Alter Estrogen In The Body:
['Aluminum Distearate', 'Geraniol'] 
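
Once every product has been processed this way, the flagged-ingredient lists could be rolled up into the ranking described at the start of the notebook. The following is only a sketch under stated assumptions: the records are made-up placeholders (not results from the dataset), and `rank_products` and `mean_flags_by_label` are hypothetical helpers. In practice each record would come from a product's label together with `Cosmetic.name` and `Cosmetic.toxins`.

```python
from collections import defaultdict

# Made-up (label, product name, flagged ingredients) records for illustration
records = [
    ("Moisturizer", "Product A", ["Ingredient X", "Ingredient Y"]),
    ("Moisturizer", "Product B", []),
    ("Sunscreen", "Product C", ["Ingredient X", "Ingredient Y", "Ingredient Z"]),
]


def rank_products(records):
    """Sort products by number of flagged ingredients, highest first."""
    return sorted(records, key=lambda r: len(r[2]), reverse=True)


def mean_flags_by_label(records):
    """Average number of flagged ingredients per product type."""
    counts = defaultdict(list)
    for label, _name, flagged in records:
        counts[label].append(len(flagged))
    return {label: sum(c) / len(c) for label, c in counts.items()}


for label, name, flagged in rank_products(records):
    print(name, label, len(flagged))
print(mean_flags_by_label(records))
```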