
**Chemicals in Cosmetics**

*   **Data Description:** This dataset from the California Safe Cosmetics Program (CSCP) lists hazardous chemicals reported in cosmetics sold in California. It covers ingredients known to cause cancer, birth defects, or other reproductive harm. Any company whose annual sales exceed $1 million must report these products; some products may be missing because of non-compliance.

*   **Solution:** This project analyzes the dataset to identify trends in the hazardous chemicals reported in cosmetics. It presents visualizations of key risks and statistical summaries of the observed patterns. The findings are intended to raise public awareness and support better compliance with safety regulations.



In [20]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [21]:
# 1. Loading and Exploring Data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# Load datasets

# Chemicals dataset provides information on hazardous chemicals in cosmetics
chemicals_df = pd.read_csv('/content/drive/MyDrive/Freeform Assessment Solution/chemicals-in-cosmetics-.csv')

# Categories dataset describes product categories
categories_df = pd.read_excel('/content/drive/MyDrive/Freeform Assessment Solution/chemicalsincosmetics-dd-subcategories.xlsx')

In [22]:
# Summarizing the data (chemicals_df)
# Check the structure and data types, then display the DataFrame

print("Basic Summary of Chemicals Data:")
chemicals_df.info()
chemicals_df

Basic Summary of Chemicals Data:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 114635 entries, 0 to 114634
Data columns (total 22 columns):
 #   Column                  Non-Null Count   Dtype  
---  ------                  --------------   -----  
 0   CDPHId                  114635 non-null  int64  
 1   ProductName             114635 non-null  object 
 2   CSFId                   80662 non-null   float64
 3   CSF                     80237 non-null   object 
 4   CompanyId               114635 non-null  int64  
 5   CompanyName             114635 non-null  object 
 6   BrandName               114408 non-null  object 
 7   PrimaryCategoryId       114635 non-null  int64  
 8   PrimaryCategory         114635 non-null  object 
 9   SubCategoryId           114635 non-null  int64  
 10  SubCategory             114635 non-null  object 
 11  CasId                   114635 non-null  int64  
 12  CasNumber               108159 non-null  object 
 13  ChemicalId              114635 non-null  

Unnamed: 0,CDPHId,ProductName,CSFId,CSF,CompanyId,CompanyName,BrandName,PrimaryCategoryId,PrimaryCategory,SubCategoryId,...,CasNumber,ChemicalId,ChemicalName,InitialDateReported,MostRecentDateReported,DiscontinuedDate,ChemicalCreatedAt,ChemicalUpdatedAt,ChemicalDateRemoved,ChemicalCount
0,2,ULTRA COLOR RICH EXTRA PLUMP LIPSTICK-ALL SHADES,,,4,New Avon LLC,AVON,44,Makeup Products (non-permanent),53,...,13463-67-7,6,Titanium dioxide,06/17/2009,08/28/2013,02/01/2011,07/09/2009,07/09/2009,,1
1,3,Glover's Medicated Shampoo,,,338,J. Strickland & Co.,Glover's,18,Hair Care Products (non-coloring),25,...,65996-92-1,4,Distillates (coal tar),07/01/2009,07/01/2009,,07/01/2009,07/01/2009,,2
2,3,Glover's Medicated Shampoo,,,338,J. Strickland & Co.,Glover's,18,Hair Care Products (non-coloring),25,...,140-67-0,5,Estragole,07/01/2009,07/01/2009,,07/02/2009,07/02/2009,,2
3,4,PRECISION GLIMMER EYE LINER-ALL SHADES,,,4,New Avon LLC,AVON,44,Makeup Products (non-permanent),46,...,13463-67-7,7,Titanium dioxide,07/09/2009,08/28/2013,,07/09/2009,07/09/2009,,1
4,5,AVON BRILLIANT SHINE LIP GLOSS-ALL SHADES,,,4,New Avon LLC,AVON,44,Makeup Products (non-permanent),52,...,13463-67-7,8,Titanium dioxide,07/09/2009,08/28/2013,02/01/2011,07/09/2009,07/09/2009,,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
114630,41523,HYDRA-LIP TRANSLUCENT COLOR LIPSTICK,65001.0,Rosa Soft,1259,"Yanbal USA, Inc",YANBAL,44,Makeup Products (non-permanent),53,...,13463-67-7,68059,Titanium dioxide,06/19/2020,06/19/2020,,06/19/2020,06/19/2020,,1
114631,41523,HYDRA-LIP TRANSLUCENT COLOR LIPSTICK,65002.0,Malva Spirit,1259,"Yanbal USA, Inc",YANBAL,44,Makeup Products (non-permanent),53,...,13463-67-7,68060,Titanium dioxide,06/19/2020,06/19/2020,,06/19/2020,06/19/2020,,1
114632,41523,HYDRA-LIP TRANSLUCENT COLOR LIPSTICK,65003.0,Rojo Fashion,1259,"Yanbal USA, Inc",YANBAL,44,Makeup Products (non-permanent),53,...,13463-67-7,68061,Titanium dioxide,06/19/2020,06/19/2020,,06/19/2020,06/19/2020,,1
114633,41523,HYDRA-LIP TRANSLUCENT COLOR LIPSTICK,65004.0,Terra Mystic,1259,"Yanbal USA, Inc",YANBAL,44,Makeup Products (non-permanent),53,...,13463-67-7,68062,Titanium dioxide,06/19/2020,06/19/2020,,06/19/2020,06/19/2020,,1
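
The preview shows the date fields (for example `InitialDateReported` and `DiscontinuedDate`) as `MM/DD/YYYY` text, with blanks where no date was reported. The `info()` output above is cut off before those columns, so their dtype is not confirmed here. Assuming they load as strings, a minimal sketch of converting them to datetimes could look like this. The `sample` frame is a hypothetical stand-in built from values in the preview, not the full dataset.

```python
import pandas as pd

# Hypothetical toy frame; values are copied from the first preview rows.
sample = pd.DataFrame({
    "CDPHId": [2, 3],
    "InitialDateReported": ["06/17/2009", "07/01/2009"],
    "DiscontinuedDate": ["02/01/2011", None],
})

date_cols = ["InitialDateReported", "DiscontinuedDate"]
for col in date_cols:
    # errors="coerce" turns missing or unparseable values into NaT
    # instead of raising an exception.
    sample[col] = pd.to_datetime(sample[col], format="%m/%d/%Y", errors="coerce")

print(sample.dtypes)
```

On the real data the same loop would run over `chemicals_df` with whichever date columns are needed. Parsing dates up front makes later grouping by year or computing time spans straightforward.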


In [23]:
# Summarizing the data (categories_df)
# Check the structure and data types, then display the DataFrame

print("Basic Summary of Categories Data:")
categories_df.info()
categories_df

Basic Summary of Categories Data:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13 entries, 0 to 12
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Primary Category  13 non-null     object
 1   SubCategory       13 non-null     object
dtypes: object(2)
memory usage: 336.0+ bytes


Unnamed: 0,Primary Category,SubCategory
0,Baby Products,"Baby Shampoos, Baby Skin Care, Baby Wash/Soap,..."
1,Bath Products,"Bath Additives, Body Washes and Soaps, Bubble ..."
2,Fragrances,"Cologne, Perfumes - Oils and Lotions, Perfumes..."
3,Hair Care Products (non-coloring),"Hair Conditioners (leave-in), Hair Conditioner..."
4,Hair Coloring Products,"Hair Bleaches, Hair Color Sprays (aerosol), Ha..."
5,Makeup Products (non-permanent),"Blushes, Eye Shadow, Eyeliner/Eyebrow Pencils,..."
6,Nail Products,"Artificial Nails and Related Products, Basecoa..."
7,Oral Hygiene Products,"Mouthwashes and Breath Fresheners, Teeth Clean..."
8,Personal Care Products,"Antiperspirants (making a cosmetic claim), Dou..."
9,Shaving Products,"Aftershave Products, Shaving Cream and other B..."
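
Each `SubCategory` cell in the table above holds several subcategories in one comma-separated string. If one row per subcategory is more convenient, a sketch along the following lines may help. It assumes commas only ever separate subcategories and never appear inside a subcategory name, which should be checked against the spreadsheet. `toy_categories` is a hypothetical stand-in with shortened lists.

```python
import pandas as pd

# Hypothetical stand-in for categories_df; the subcategory lists are
# shortened examples, not the full contents of the spreadsheet.
toy_categories = pd.DataFrame({
    "Primary Category": ["Baby Products", "Oral Hygiene Products"],
    "SubCategory": [
        "Baby Shampoos, Baby Skin Care",
        "Mouthwashes and Breath Fresheners",
    ],
})

long_categories = (
    toy_categories
    # Split each string into a list of subcategory names.
    .assign(SubCategory=toy_categories["SubCategory"].str.split(","))
    # Expand each list element into its own row.
    .explode("SubCategory")
    .reset_index(drop=True)
)
# Remove the whitespace left over after splitting on commas.
long_categories["SubCategory"] = long_categories["SubCategory"].str.strip()

print(long_categories)
```

The long layout makes it easier to compare these names with the `SubCategory` column of `chemicals_df`.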


In [24]:
# Checking for missing values

print("Missing values in Chemicals Data:")
chemicals_df.isnull().sum()

Missing values in Chemicals Data:


Unnamed: 0,0
CDPHId,0
ProductName,0
CSFId,33973
CSF,34398
CompanyId,0
CompanyName,0
BrandName,227
PrimaryCategoryId,0
PrimaryCategory,0
SubCategoryId,0
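
Raw counts are easier to judge as a share of all rows. With 114635 rows, the counts above work out to roughly 29.6% missing for `CSFId`, 30.0% for `CSF`, and 0.2% for `BrandName`. A minimal sketch of producing that view is shown below on a small hypothetical frame (`toy`); on the real data the same steps would be applied to `chemicals_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing values.
toy = pd.DataFrame({
    "ProductName": ["A", "B", "C", "D"],
    "CSFId": [65001.0, np.nan, np.nan, 65004.0],
    "BrandName": ["X", "Y", None, "Z"],
})

missing = toy.isnull().sum()
summary = pd.DataFrame({
    "missing": missing,
    "percent": (missing / len(toy) * 100).round(2),
})
# Keep only columns that actually have gaps, largest first.
summary = summary[summary["missing"] > 0].sort_values("missing", ascending=False)

print(summary)
```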
