## Analysis of Harmful Ingredients in Skincare Products and Their Impact on Customer Sentiment and Ratings


### Introduction

The beauty industry is growing rapidly, and product formulations continue to evolve toward safer ingredients. Consumers are increasingly aware of potentially harmful ingredients in the products they use, so it is important to understand how these substances affect consumer sentiment and product ratings. This project analyzes the prevalence of harmful ingredients in skincare products and assesses their impact on consumer perceptions and ratings.

In this study, we will:
1. **Identify Harmful Ingredients**: Utilize a reliable database to classify ingredients based on their safety profiles.
2. **Determine Prevalence**: Calculate how common these harmful ingredients are in various skincare products.
3. **Analyze Consumer Sentiment**: Perform sentiment analysis on consumer reviews to gauge awareness and reactions to harmful ingredients.
4. **Evaluate Product Ratings**: Assess how the presence of harmful ingredients affects product ratings.
5. **Analyze Correlations**: Explore the relationship between harmful ingredients, consumer sentiment, and product ratings.
6. **Provide Insights and Recommendations**: Offer actionable recommendations for manufacturers and regulators based on our findings.

In [37]:
pip install pandas

Note: you may need to restart the kernel to use updated packages.


In [38]:
pip install numpy 

Note: you may need to restart the kernel to use updated packages.


In [39]:
pip install matplotlib

Note: you may need to restart the kernel to use updated packages.


In [40]:
pip install xlrd seaborn

Note: you may need to restart the kernel to use updated packages.


In [41]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 

### Identify Harmful Ingredients

In [42]:
# Load the product data (CSV) and the ingredient safety list (legacy .xls, read via xlrd) into DataFrames
df_skincare = pd.read_csv('Skincare.csv')
df_ingredients = pd.read_excel('Skincare_Ingredients.xls')
print("Skincare datasets attributes:",df_skincare.dtypes)
print("Skincare ingredients attributes:", df_ingredients.dtypes)


Skincare datasets attributes: Label           object
brand           object
name            object
price            int64
rank           float64
ingredients     object
Combination      int64
Dry              int64
Normal           int64
Oily             int64
Sensitive        int64
dtype: object
Skincare ingredients attributes: CAS                              object
List Name                        object
TSCA Chemical Name               object
List Call                        object
Caveat - Chemical Use            object
Edit Description                 object
Date of Edit             datetime64[ns]
dtype: object


In [43]:
df_skincare.head()

| | Label | brand | name | price | rank | ingredients | Combination | Dry | Normal | Oily | Sensitive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Moisturizer | LA MER | Crème de la Mer | 175 | 4.1 | Algae (Seaweed) Extract, Mineral Oil, Petrolat... | 1 | 1 | 1 | 1 | 1 |
| 1 | Moisturizer | SK-II | Facial Treatment Essence | 179 | 4.1 | Galactomyces Ferment Filtrate (Pitera), Butyle... | 1 | 1 | 1 | 1 | 1 |
| 2 | Moisturizer | DRUNK ELEPHANT | Protini™ Polypeptide Cream | 68 | 4.4 | Water, Dicaprylyl Carbonate, Glycerin, Ceteary... | 1 | 1 | 1 | 1 | 0 |
| 3 | Moisturizer | LA MER | The Moisturizing Soft Cream | 175 | 3.8 | Algae (Seaweed) Extract, Cyclopentasiloxane, P... | 1 | 1 | 1 | 1 | 1 |
| 4 | Moisturizer | IT COSMETICS | Your Skin But Better™ CC+™ Cream with SPF 50+ | 38 | 4.1 | Water, Snail Secretion Filtrate, Phenyl Trimet... | 1 | 1 | 1 | 1 | 1 |


In [44]:
df_ingredients.tail()

| | CAS | List Name | TSCA Chemical Name | List Call | Caveat - Chemical Use | Edit Description | Date of Edit |
|---|---|---|---|---|---|---|---|
| 752 | 68605-97-0 | Fatty acids, tallow, hydrogenated, compds. wit... | Fatty acids, tallow, hydrogenated, compds. wit... | Green [Circle] | | Chemical added to the list | 2012-12-21 |
| 753 | 26590-05-6 | 2-Propen-1-aminium, N,N-dimethyl-N-2-propenyl-... | 2-Propen-1-aminium, N,N-dimethyl-N-2-propen-1-... | | | Chemical removed from list | 2012-12-21 |
| 754 | 68989-22-0 | Zeolites, NaA | Zeolites, NaA | Green [Circle] | | Chemical added to the list | 2012-12-21 |
| 755 | 1318-02-1 | Zeolites | Zeolites | Green [Circle] | | Chemical added to the list | 2012-12-21 |
| 756 | 27593-14-2 | Octyldimethylbetaine | 1-Octanaminium, N-(carboxymethyl)-N,N-dimethyl... | Green [Circle] | | Chemical added to the list | 2012-12-21 |


In [45]:
# Preview the product ingredient strings lowercased and trimmed.
# Note: this only displays the result; assign it back to the column to persist it.
df_skincare['ingredients'].str.lower().str.strip()


0       algae (seaweed) extract, mineral oil, petrolat...
1       galactomyces ferment filtrate (pitera), butyle...
2       water, dicaprylyl carbonate, glycerin, ceteary...
3       algae (seaweed) extract, cyclopentasiloxane, p...
4       water, snail secretion filtrate, phenyl trimet...
                              ...                        
1467    water, alcohol denat., potassium cetyl phospha...
1468    water, isododecane, dimethicone, butyloctyl sa...
1469    water, dihydroxyacetone, glycerin, sclerocarya...
1470    water, dihydroxyacetone, propylene glycol, ppg...
1471                        visit the dermaflash boutique
Name: ingredients, Length: 1472, dtype: object
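
The expression above only displays the normalized strings; it does not change `df_skincare`. A minimal sketch of how the normalized values could be stored and split into per-product ingredient lists follows, run on a small made-up frame. The column names `ingredient_list` and `looks_like_placeholder` and the helper `split_ingredients` are hypothetical. Splitting on commas is an assumption and will mis-split names that themselves contain commas.

```python
import pandas as pd

# Toy stand-in for df_skincare; the rows are illustrative, not taken from Skincare.csv
toy = pd.DataFrame({
    "name": ["Cream A", "Serum B", "Device C"],
    "ingredients": [
        " Water, Glycerin , Phenoxyethanol",
        "Water, Alcohol Denat.",
        "Visit the Dermaflash boutique",
    ],
})

def split_ingredients(text):
    """Lowercase a comma-separated ingredient string and return a list of trimmed names."""
    return [part.strip() for part in str(text).lower().split(",") if part.strip()]

# Assign the result back so the normalization persists
toy["ingredients"] = toy["ingredients"].str.lower().str.strip()
toy["ingredient_list"] = toy["ingredients"].apply(split_ingredients)

# Rows with a single "ingredient" are likely placeholders rather than real lists
toy["looks_like_placeholder"] = toy["ingredient_list"].str.len() <= 1

print(toy[["name", "ingredient_list", "looks_like_placeholder"]])
```

Flagging single-entry rows is one way to catch entries such as the last row of the output above ("visit the dermaflash boutique"), which is not an ingredient list.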

In [80]:
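# Preview the reference ingredient names lowercased and trimmed ('List Name' itself is left unchanged)
df_ingredients['List Name'].str.lower().str.strip()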

0                                       chitosan acetate
1                                      sodium levulinate
2                             halogenated aliphatic acid
3      fats and glyceridic oils, vegetable, hydrogenated
4                                            tocopherols
                             ...                        
752    fatty acids, tallow, hydrogenated, compds. wit...
753    2-propen-1-aminium, n,n-dimethyl-n-2-propenyl-...
754                                        zeolites, naa
755                                             zeolites
756                                 octyldimethylbetaine
Name: List Name, Length: 757, dtype: object

In [81]:
# 'List Call' values treated as harmful; both spellings of grey are listed
harmful_labels = ['Yellow [Triangle]', 'Gray [Square]', 'Grey [Square]']
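
Before fixing the label list, it can help to check which values `List Call` actually takes, since a label that never occurs in the data (for example one of the two spellings of grey) silently matches nothing. The sketch below uses made-up values; `toy_calls` is illustrative and does not reflect the counts in `Skincare_Ingredients.xls`.

```python
import pandas as pd

# Illustrative 'List Call' values, including one missing entry
toy_calls = pd.Series(
    ["Green [Circle]", "Yellow [Triangle]", "Green [Circle]", None, "Grey [Square]"],
    name="List Call",
)
harmful_labels = ['Yellow [Triangle]', 'Gray [Square]', 'Grey [Square]']

# dropna=False keeps missing labels visible in the tally
counts = toy_calls.value_counts(dropna=False)
print(counts)

# Labels in harmful_labels that never occur in the data
observed = set(toy_calls.dropna())
unused = [label for label in harmful_labels if label not in observed]
print("Labels with no matching rows:", unused)
```

On the real data, the equivalent check would be `df_ingredients['List Call'].value_counts(dropna=False)`.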


In [91]:
# Collect the names of ingredients whose 'List Call' is one of the harmful labels (a set, so print order is arbitrary)
harmful_ingredients = set(df_ingredients[df_ingredients['List Call'].isin(harmful_labels)]['List Name'])
print(list(harmful_ingredients))


['L-Menthol', 'Hydroxycitronellal', 'Citronellol', 'Carvone', 'Diisobutyl carbinyl acetate', 'alpha-Isomethylionone', 'Tricyclodecenyl propionate', 'Diethylene glycol hexyl ether', 'Decaldehyde', '3-cis-Hexenyl methyl carbonate', 'Menthol (unspecified isomer)', 'Boron sodium oxide (B4Na2O7)', 'Sodium tripolyphosphate', 'Cyclohexanepropanol, 2,2,6-trimethyl-.alpha.-propyl-', 'C.I. Pigment Yellow 100', 'Boron, trifluoro(tetrahydrofuran)-, (T-4)-, polymer with 3-methyl-3-[(2,2,3,3,3-pentafluoropropoxy)methyl]oxetane, ether with 2,2-dimethyl-1,3-propanediol (2:1), bis(hydrogen sulfate), diammonium salt', '2-Heptylcyclopentan-1-one', 'Boric acid, sodium salt', '1,4-dimethoxybenzene', 'Eucalyptol', 'Methylionone', 'Undecanal', 'Ethyl butyrate', 'Nonanal', 'Phenoxyethanol', 'Dicyclopentadiene propionate', 'Halogenated aliphatic acid', 'Disulfurous acid, disodium salt', 'Siloxanes and Silicones, di-Me, 3-hydroxypropyl Me, ethoxylated propoxylated', 'Methyl undecylenate', 'Isoamyl salicylate', 
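
Because the names in `harmful_ingredients` keep their original capitalization (for example 'L-Menthol'), they will not equal lowercased product ingredients unless both sides are normalized the same way. The sketch below shows one way to do this on made-up rows; `harmful_lower` and `contains_harmful` are hypothetical names, and the `List Call` values in the toy frame are illustrative. Exact name matching is an assumption: it will miss synonyms and differently formatted names.

```python
import pandas as pd

# Toy stand-in for df_ingredients; rows and labels are illustrative
toy_ref = pd.DataFrame({
    "List Name": ["L-Menthol", "Phenoxyethanol ", "Zeolites", None],
    "List Call": ["Yellow [Triangle]", "Yellow [Triangle]", "Green [Circle]", "Grey [Square]"],
})
harmful_labels = ['Yellow [Triangle]', 'Gray [Square]', 'Grey [Square]']

mask = toy_ref["List Call"].isin(harmful_labels)
# Drop missing names, then lowercase and trim to match normalized product ingredients
harmful_lower = set(toy_ref.loc[mask, "List Name"].dropna().str.lower().str.strip())

def contains_harmful(ingredient_list):
    """Return the sorted flagged names found in one product's ingredient list."""
    return sorted(harmful_lower.intersection(ingredient_list))

print(contains_harmful(["water", "glycerin", "phenoxyethanol"]))
print(contains_harmful(["water", "zeolites"]))
```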

### Determine Prevalence of Harmful Ingredients in Skincare Products