## This notebook uses the interactive plotting library [Plotly](https://plotly.com/)

-> During website development, [Datapane](https://docs.datapane.com/) will be used to display the interactive plots on the website

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
import scipy as sp
# import Circos
import plotly.express as px
import plotly.graph_objs as go
import markdown

In [3]:
df = pd.read_csv('../data/processed/cleaned_data.csv')
df.head()

Unnamed: 0,Label,Brand,Name,Price,Rank,Ingredients
0,3.0,64.0,Crème de la Mer,175.0,4.1,"Algae (Seaweed) Extract, Mineral Oil, Petrolat..."
1,3.0,95.0,Facial Treatment Essence,179.0,4.1,"Galactomyces Ferment Filtrate (Pitera), Butyle..."
2,3.0,29.0,Protini™ Polypeptide Cream,68.0,4.4,"Water, Dicaprylyl Carbonate, Glycerin, Ceteary..."
3,3.0,64.0,The Moisturizing Soft Cream,175.0,3.8,"Algae (Seaweed) Extract, Cyclopentasiloxane, P..."
4,3.0,49.0,Your Skin But Better™ CC+™ Cream with SPF 50+,38.0,4.1,"Water, Snail Secretion Filtrate, Phenyl Trimet..."


### Circos graph used to visualize the relationship between the high end brand companies that utilize

In [4]:
c = circos.Circos()
c.track("data", "../references/clean_products.txt")
c.plot("circos.png", dpi=150)

NameError: name 'circos' is not defined

### First, create a small dataframe for each hazard category (low, moderate, high) from the generated txt [files](../references)

In [5]:
#LOW
df = pd.read_csv('../references/low_hazard_products.txt', header=None)
df['Toxin'] = df[0].str.extract(r'(No|Oxybenzone|Resorcinol|Formaldehyde|Diethanol|Silane|Siloxane|Octinaxate)', expand=True)
df['Product'] = df[0].str.extract(r'\d+ - (.*)', expand=True)
df = df.drop(0, axis=1)

# Replacing all "NaN" values with the correct Toxin name
df.loc[0, 'Toxin'] = "Ethanolamine"
df.loc[2:13,'Toxin'] = "Oxybenzone"
df.loc[15, 'Toxin'] = "Ethanolamine"
df.loc[16, 'Toxin'] = "Formaldehyde"
df.loc[17, 'Toxin'] = "Diethanol"
df.loc[19:67,'Toxin'] = "Silane"
df.loc[69:71,'Toxin'] = "Silane"
df.loc[72, 'Toxin'] = "Octinaxate"

# Removing all unrelated 'NaN' values
df = df.drop([2,14,18,68], axis=0)

# Assigning the empty cells in 'Product' column to 'No products found'
df['Product'] = df['Product'].fillna('No products found')

df.to_csv('../data/interim/low_hazard_data.csv')

df.head(10)

Unnamed: 0,Toxin,Product
0,Ethanolamine,No products found
1,Oxybenzone,No products found
3,Oxybenzone,Camera Ready CC Cream Broad Spectrum SPF 30 Da...
4,Oxybenzone,Lingerie de Peau BB Cream
5,Oxybenzone,Exfoliating Scrub
6,Oxybenzone,Hydra Life BB Creme Broad Spectrum SPF 30
7,Oxybenzone,The Broad Spectrum SPF 50 UV Protecting Fluid
8,Oxybenzone,Ultimate Sun Protection Spray Broad Spectrum S...
9,Oxybenzone,Broad Spectrum SPF 50 Sunscreen Face Cream
10,Oxybenzone,DayWear UV Base Advanced Anti-Oxidant & UV Def...
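
The hard-coded row ranges in the cell above depend on the exact line order of the txt file. A minimal alternative sketch follows, assuming each file consists of a header line that starts with the toxin name, followed by `<number> - <product>` lines. The sample text and the helper name `parse_hazard_lines` are hypothetical and only illustrate the idea; the real files may be laid out differently.

```python
import io
import re

import pandas as pd

# Hypothetical sample mimicking the assumed layout of a hazard txt file:
# a header line naming the toxin, then "<number> - <product>" lines.
sample = io.StringIO(
    "Oxybenzone products:\n"
    "1 - Lingerie de Peau BB Cream\n"
    "2 - Exfoliating Scrub\n"
    "Formaldehyde products:\n"
    "No products found\n"
)


def parse_hazard_lines(handle, toxins):
    """Return a Toxin/Product dataframe from header and product lines."""
    raw = pd.Series([line.strip() for line in handle if line.strip()])
    # Product lines carry the name after "<number> - "; other lines give NaN.
    product = raw.str.extract(r'^\d+ - (.*)$')[0]
    # Header lines start with one of the known toxin names.
    pattern = r'^(' + '|'.join(re.escape(t) for t in toxins) + r')\b'
    header = raw.str.extract(pattern)[0]
    # Forward-fill so every line inherits the toxin of the header above it.
    out = pd.DataFrame({'Toxin': header.ffill(), 'Product': product})
    found = out.dropna(subset=['Product'])
    # Toxins whose section has no product lines get a placeholder row.
    seen = set(found['Toxin'])
    missing = [t for t in header.dropna().unique() if t not in seen]
    placeholder = pd.DataFrame({'Toxin': missing, 'Product': 'No products found'})
    return pd.concat([found, placeholder], ignore_index=True)


result = parse_hazard_lines(sample, ['Oxybenzone', 'Formaldehyde'])
print(result)
```

With this approach the toxin labels follow the file contents, so adding or removing a product line would not shift any manually maintained index ranges.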


In [14]:
#MODERATE
df = pd.read_csv('../references/mod_hazard_products.txt', header=None, sep='\t')
df['Toxin'] = df[0].str.extract(r'(No|Fragrance|Octinoxates|Homosalate|Teflon)', expand=True)
df['Product'] = df[0].str.extract(r'\d+ - (.*)', expand=True)
df = df.drop(0, axis=1)

# Replacing all "NaN" values with the correct Toxin name
df.loc[0:132, 'Toxin'] = "Parfum"
df.loc[133:633, 'Toxin'] = "Fragrance"
df.loc[636:639, 'Toxin'] = "Triclosan"
df.loc[640:656, 'Toxin'] = "Homosalate"

# Removing all unrelated 'NaN' values
df = df.drop([1, 132, 634, 635, 653], axis=0)

# Assigning the empty cells in 'Product' column to 'No products found'
df['Product'] = df['Product'].fillna('No products found')

df.to_csv('../data/interim/mod_hazard_data.csv')

df.head(10)

Unnamed: 0,Toxin,Product
0,Parfum,No products found
2,Parfum,Benefiance WrinkleResist24 Night Cream
3,Parfum,Goodnight Glow Retin-ALT Sleeping Crème
4,Parfum,Beauty Elixir
5,Parfum,The Silk Cream
6,Parfum,Vinosource Moisturizing Sorbet
7,Parfum,Luminous Dewy Skin Night Concentrate
8,Parfum,Seaberry Moisturizing Face Oil
9,Parfum,Renewed Hope in A Jar Refreshing & Refining Mo...
10,Parfum,Cold Plasma Sub-D Firming Neck Treatment


In [22]:
#HIGH
df = pd.read_csv('../references/high_hazard_products.txt', header=None, sep='\t')
df['Toxin'] = df[0].str.extract(r'(No|Fragrance|Octinoxates|Homosalate|Teflon)', expand=True)
df['Product'] = df[0].str.extract(r'\d+ - (.*)', expand=True)
df = df.drop(0, axis=1)

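# Replacing all "NaN" values with the correct Toxin name
df.loc[0:29, 'Toxin'] = "Talc"
df.loc[32:188, 'Toxin'] = "Propylene Glycol"

# Removing all unrelated 'NaN' values
df = df.drop([0, 30, 31], axis=0)

# Assigning the empty cells in 'Product' column to 'No products found'
df['Product'] = df['Product'].fillna('No products found')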

df.to_csv('../data/interim/high_hazard_data.csv')

df.head(10)

Unnamed: 0,Toxin,Product
1,Talc,CC Crème High Definition Radiance Face Cream S...
2,Talc,Bio-Performance Advanced Super Restoring Cream
3,Talc,Skin Perfecting Lotion - Blemish Prone/Oily Skin
4,Talc,Bio-Performance Glow Revival Cream
5,Talc,BB Crème au Ginseng
6,Talc,Black Label Detox BB Beauty Balm
7,Talc,Bio-Performance LiftDynamic Cream
8,Talc,Ibuki Beauty Sleeping Mask
9,Talc,Benefiance Extra Creamy Cleansing Foam
10,Talc,Pureness Deep Cleansing Foam


## Toxic dataframe
Combine the three hazard tables into a single dataframe of low-, moderate-, and high-hazard products

### [Sunburst Plot](https://plotly.com/python/sunburst-charts/#basic-sunburst-plot-with-plotlyexpress) of the three hazard categories
- Low
- Moderate
- High

In [None]:
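# read in the three dataframes, using the saved index as the index column
low_hazard_data = pd.read_csv('../data/interim/low_hazard_data.csv', index_col=0)
mod_hazard_data = pd.read_csv('../data/interim/mod_hazard_data.csv', index_col=0)
high_hazard_data = pd.read_csv('../data/interim/high_hazard_data.csv', index_col=0)

# tag each dataframe with its hazard level, then concatenate into a single dataframe
df = pd.concat([
    low_hazard_data.assign(Hazard='Low'),
    mod_hazard_data.assign(Hazard='Moderate'),
    high_hazard_data.assign(Hazard='High'),
], ignore_index=True)

# create the sunburst chart: hazard level -> toxin, sized by the number of rows
# (the saved files only contain 'Toxin' and 'Product', so 'Hazard' is added above)
fig = px.sunburst(df, path=['Hazard', 'Toxin'])

fig.show()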

### [Bar Charts](https://plotly.com/python/bar-charts/) of the three hazard categories
- Low
- Moderate
- High

In [6]:
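# df was overwritten by the hazard tables, so reload the cleaned product data
products = pd.read_csv('../data/processed/cleaned_data.csv')
# Brand is label-encoded in this file; each bar stacks the Rank values for one Brand code
fig = px.bar(products, x='Brand', y='Rank')
fig.show()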

### [Line Chart](https://plotly.com/python/line-charts/) of the variety of the categories in the df
- Product Type
- Skin Type
- Brand

In [None]:


# count the products containing each toxin for every brand
# (assumes df has one row per product with 'Brand' and 'Toxin' columns;
# none of the dataframes built above has both, so they would need to be joined first)
df_toxins = df.groupby(['Brand', 'Toxin']).size().reset_index(name='count')

# create the heatmap
fig = go.Figure(data=go.Heatmap(
                   z=df_toxins['count'],
                   x=df_toxins['Brand'],
                   y=df_toxins['Toxin'],
                   colorscale='Viridis'))
fig.show()
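
The brand-by-toxin counts behind a heatmap can also be arranged as a 2D grid, which makes missing combinations explicit as zeros. A minimal sketch using `pd.crosstab` on toy rows follows; the brand and toxin values are made up for illustration and do not come from the project data.

```python
import pandas as pd

# Toy rows, one per product; the values are illustrative assumptions.
toy = pd.DataFrame({
    'Brand': ['A', 'A', 'B', 'B', 'B'],
    'Toxin': ['Talc', 'Fragrance', 'Talc', 'Talc', 'Parfum'],
})

# Rows are toxins, columns are brands, cells count the matching products.
counts = pd.crosstab(toy['Toxin'], toy['Brand'])
print(counts)
```

The resulting grid could then be passed to `go.Heatmap` as `z=counts.values`, `x=counts.columns`, and `y=counts.index`.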
