# Foundation Shade Diversity

## Table of Contents

1. [**Introduction**](#Introduction)
    - Project Description
    - Data Description
2. [**Acquiring and Loading Data**](#2)
    - Importing Libraries and Notebook Setup
    - Loading Data
    - Basic Data Exploration
    - Areas to Fix
3. [**Data Preprocessing**](#3)
4. [**Data Analysis**](#4)
5. [**Conclusion**](#5)
    - Insights
    - Suggestions
    - Possible Next Steps
6. [**Epilogue**](#6) 
    - References
    - Versioning

## Introduction

### Project Description

**Goal/Purpose:** 

In this EDA notebook, we aim to explore the diversity of foundation shades in makeup products. We will analyze a dataset of makeup products and their corresponding shades to gain insights into the range and inclusivity of shades offered by different brands. By visualizing the data and conducting statistical analyses, we seek to provide a comprehensive understanding of foundation shade diversity in the cosmetics industry.
<p>&nbsp;</p>

**Questions to be Answered:**

1. What is the distribution of foundation shades across different brands and product lines?
    - Are some brands more diverse in their shade offerings compared to others?
2. How does foundation shade diversity vary across price points and product types?
    - Do high-end brands offer a wider range of shades compared to budget-friendly brands?
3. Is there evidence of inclusivity in foundation shades for various skin tones?
    - What is the representation of shades for fair, medium, and dark skin tones?
    - Are there gaps in shade options for specific skin tones, and if so, which brands are addressing these gaps?

 

<p>&nbsp;</p>

**Assumptions/Methodology/Scope:** 

Briefly describe assumptions, processing steps, and the scope of this project.

<p>&nbsp;</p>

### Data Description

**Content:** 

This dataset is a CSV file (`allShades.csv`) of 6,816 data points, each describing a single foundation shade.

<p>&nbsp;</p>

**Description of Attributes:** 



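Only the six columns loaded in this notebook are described below.

| Column | Description |
| :------ | :---------- |
| brand | Brand name (e.g., Tarte) |
| product | Product name |
| description | Full shade description, often combining the shade code, shade name, and skin tone or undertone notes |
| name | Shade name (e.g., Fair Sand); missing for some shades |
| specific | Shade code or number (e.g., 14S); missing for some shades |
| hex | Hex color code of the shade (e.g., #EBCDAF) |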

<p>&nbsp;</p>
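
The `hex` column is what makes the fair/medium/dark question in the project description measurable. Below is a minimal sketch that assumes HLS lightness from Python's standard `colorsys` module is an acceptable rough proxy for skin depth. The helper names `hex_to_lightness` and `tone_bucket`, along with the `fair_min` and `dark_max` cutoffs, are hypothetical. The cutoffs would need calibrating against the data, and a perceptual measure such as CIELAB L* may work better.

```python
import colorsys


def hex_to_lightness(hex_code: str) -> float:
    """Return the HLS lightness (0.0 = black, 1.0 = white) of a '#RRGGBB' color."""
    hex_code = hex_code.lstrip('#')
    # Convert each two-digit hex channel to a float in [0, 1]
    r, g, b = (int(hex_code[i:i + 2], 16) / 255 for i in (0, 2, 4))
    _, lightness, _ = colorsys.rgb_to_hls(r, g, b)
    return lightness


def tone_bucket(hex_code: str, fair_min: float = 0.75, dark_max: float = 0.45) -> str:
    """Bucket a shade as 'fair', 'medium', or 'dark' using hypothetical lightness cutoffs."""
    lightness = hex_to_lightness(hex_code)
    if lightness >= fair_min:
        return 'fair'
    if lightness <= dark_max:
        return 'dark'
    return 'medium'


# Shades taken from the sample rows shown later in this notebook
for shade in ['#EBCDAF', '#C49777', '#452A21']:
    print(shade, round(hex_to_lightness(shade), 3), tone_bucket(shade))
```

Applied to the DataFrame, this could take the form `all_shades['hex'].apply(tone_bucket)`, which gives a categorical column to group by brand.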

**Acknowledgements:** 

This dataset is provided by _(. The original dataset was scraped by _) and the original source can be found on [website](https://website.link).

---

## Acquiring and Loading Data
### Importing Libraries and Notebook Setup

```python
# Data manipulation
import numpy as np
import pandas as pd
from unidecode import unidecode

# Visualizations
import plotly.express as px

# Helper module for preliminary data exploration
import exploration_helpers as eh
```


### Loading Data

```python
# Loading the All Shades dataset, keeping only the columns needed for this analysis
all_shades = pd.read_csv('../datasets/allShades.csv', usecols=['brand','product','description','name','specific','hex'])
```

## Basic Data Exploration

### All Shades

```python
# Retrieving general info and a sample of the 'all_shades' DataFrame
all_shades.info()
all_shades.sample(10)
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6816 entries, 0 to 6815
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   brand        6816 non-null   object
 1   product      6816 non-null   object
 2   description  6816 non-null   object
 3   name         4955 non-null   object
 4   specific     4905 non-null   object
 5   hex          6816 non-null   object
dtypes: object(6)
memory usage: 319.6+ KB
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 2872 | Tarte | Babassu Foundcealer Skincare Foundation Broad ... | 14S Fair Sand (fair skin w/ yellow undertones) | Fair Sand | 14S | #EBCDAF |
| 1345 | Too Faced | Born This Way Undetectable Medium-to-Full Cove... | Light Beige (light w/ neutral undertones) | Light Beige | NaN | #D7B586 |
| 3353 | Morphe | Morphe 2 Hint Hint Skin Tint | Hint of Toffee (medium tan with neutral pink u... | Hint of Toffee | NaN | #E7AC89 |
| 1653 | Makeup Revolution | Conceal & Define Full Coverage Foundation | F0.2 (for fairest skin tones w/ a light yellow... | NaN | F0.2 | #E1CDAA |
| 3216 | Milani | Screen Queen Foundation | Spiced Toffee | Spiced Toffee | NaN | #995E2D |
| 5069 | PAT McGRATH LABS | Sublime Perfection Foundation | Medium Deep 23 medium deep with yellow undertones | Medium Deep | 23 | #AF7A4F |
| 1934 | Clinique | Even Better Makeup Broad Spectrum SPF 15 | WN 76 Toasted Wheat (medium, warm-neutral unde... | Toasted Wheat | WN 76 | #C99567 |
| 5984 | TOM FORD | Traceless Foundation Stick | 3.5 Ivory Rose | Ivory Rose | 3.5 | #F6CDB5 |
| 1149 | L'Oréal | Infallible 24HR Fresh Wear Foundation In A Powder | Ivory Buff | Ivory Buff | NaN | #EBC99E |
| 1630 | e.l.f. Cosmetics | Flawless Finish Foundation | Tan ((tan with cool pink undertones) Online Only) | Tan Online Only) | NaN | #DFAB78 |


```python
# Similarity score per column: 100 / number of unique values, i.e. the share (%) of rows
# each unique value would hold if values were evenly spread. Higher means more repetition.
num_unique = all_shades.nunique().sort_values()
print('---- Percentage Similarity of Values (%) -----')
print(100/num_unique)
```

```
---- Percentage Similarity of Values (%) -----
brand          0.934579
product        0.304878
name           0.081900
specific       0.058038
description    0.017784
hex            0.015990
dtype: float64
```
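
To make the similarity score easier to read, the toy example below reproduces it on four hypothetical rows and compares it with a row-aware alternative: the percentage of rows that repeat a value already seen in that column, `100 * (1 - nunique / len)`. This alternative is an illustrative suggestion, not part of the original analysis. Note that `nunique()` ignores NaN by default, so in the alternative, rows with missing values count as repeats.

```python
import pandas as pd

# Hypothetical toy data: 4 rows, 2 unique brands, 4 unique hex codes
toy = pd.DataFrame({
    'brand': ['tarte', 'tarte', 'tarte', 'milani'],
    'hex': ['#EBCDAF', '#D7B586', '#E7AC89', '#995E2D'],
})

n_unique = toy.nunique()
# The notebook's score: 100 / number of unique values (independent of row count)
similarity = 100 / n_unique
# Row-aware alternative: % of rows repeating a value already seen in the column
repeat_pct = 100 * (1 - n_unique / len(toy))

print(similarity)  # brand 50.0, hex 25.0
print(repeat_pct)  # brand 50.0, hex 0.0
```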


```python
# Viewing summary statistics
all_shades.describe(include='all')
```

| | brand | product | description | name | specific | hex |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| count | 6816 | 6816 | 6816 | 4955 | 4905 | 6816 |
| unique | 107 | 328 | 5623 | 1221 | 1723 | 6254 |
| top | bareMinerals | Studio Fix Fluid SPF 15 Foundation | Light | Neutral | 2 | #F9F9F9 |
| freq | 370 | 63 | 16 | 137 | 60 | 6 |


#### Missing Data

```python
# Checking 'all_shades' for missing values with the helper eh.missing_values()
missing_shade_percent, missing_shade_total = eh.missing_values(all_shades)
```

```
---- Percentage of Missing Values (%) ----- 
specific    28.036972
name        27.303404
dtype: float64
---- Number of Missing Shade Values (%) ----- 
specific    1911
name        1861
dtype: int64
```
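
The `exploration_helpers` module is not shown in this notebook. The function below is a hypothetical stand-in for `eh.missing_values()`. It is a sketch, assuming the helper returns the percentage and the count of missing values for each column that has any, in that order, to match how its result is unpacked above. It is not the helper's actual implementation.

```python
import pandas as pd


def missing_values(df: pd.DataFrame):
    """Hypothetical stand-in for eh.missing_values(): percent and count of missing
    values for columns that have any, sorted from most to least missing."""
    total = df.isnull().sum()
    total = total[total > 0].sort_values(ascending=False)
    percent = 100 * total / len(df)
    print('---- Percentage of Missing Values (%) -----')
    print(percent)
    print('---- Number of Missing Values -----')
    print(total)
    return percent, total
```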


```python
# Investigating missing data in 'all_shades' DataFrame

# Viewing one random row with a missing 'name' per brand and product
missing_names = all_shades['name'].isnull()
all_shades[missing_names].groupby(['brand','product']).apply(lambda group: group.sample(1))
```

| brand | product | index | description | name | specific | hex |
| :--- | :--- | ---: | :--- | :--- | :--- | :--- |
| Anastasia Beverly Hills | Luminous Foundation | 47 | 570N (deep skin with a neutral red undertone) | NaN | 570N | #452A21 |
| Armani Beauty | Designer Lift Smoothing Firming Full Coverage Foundation with SPF 20 | 6188 | 5.5 for medium skin with neutral undertones | NaN | 5.5 | #DCA771 |
| Armani Beauty | Luminous Silk Compact Powder Foundation | 6161 | 4.5 light to medium skin with neutral undertone | NaN | 4.5 | #DAB59D |
| Armani Beauty | Luminous Silk Perfect Glow Flawless Oil-Free Foundation | 4122 | 6.25 medium to tan, golden | NaN | 6.25 | #D9A67D |
| Armani Beauty | Maestro Fusion Makeup SPF 15 Liquid Foundation | 6734 | 2 fair skin with warm undertone - Selected | NaN | 2 | #D8B28D |
| ... | ... | ... | ... | ... | ... | ... |
| Yves Saint Laurent | Touche Eclat Le Teint Radiant Liquid Foundation | 5799 | B45 Medium neutral undertone | NaN | B45 | #C0A18C |
| florence by mills | Like a Light Skin Tint | 3153 | F020 (fair w/ neutral undertones) | NaN | F020 | #F4B68A |
| jane iredale | Beyond Matte Liquid Foundation | 1787 | M8 (medium neutral) | NaN | M8 | #D7995D |
| rms beauty | "Un" Cover-Up Cream Foundation | 6531 | 00 a light shade for fair skin | NaN | 00 | #F9D8C5 |


```python
# Viewing one random row with a missing 'specific' per brand and product
missing_specifics = all_shades['specific'].isnull()
all_shades[missing_specifics].groupby(['brand','product']).apply(lambda group: group.sample(1))
```

| brand | product | index | description | name | specific | hex |
| :--- | :--- | ---: | :--- | :--- | :--- | :--- |
| Almay | Smart Shade Skintone Matching Makeup | 2519 | Medium | Medium | NaN | #D2A28C |
| Anastasia Beverly Hills | Stick Foundation | 6167 | Banana highlight pale yellow - Selected | Banana | NaN | #DDAE8F |
| Antonym | Certified Organic Baked Foundation | 6812 | Dark - Selected | Dark | NaN | #D39C7B |
| Antonym | Skin Esteem Organic Liquid Foundation | 6742 | Dark | Dark | NaN | #B0867B |
| Au Naturale | Creme Starter Kit | 3946 | Medium | Medium | NaN | #C29679 |
| ... | ... | ... | ... | ... | ... | ... |
| boscia | Skin Perfecting BB Cream Broad Spectrum SPF 30 | 6703 | Venice soft seashell with neutral undertons ... | Venice | NaN | #E6C4AF |
| e.l.f. Cosmetics | Flawless Finish Foundation | 1625 | Buff (light w/ peachy undertones) | Buff | NaN | #DDA978 |
| jane iredale | Liquid Minerals A Foundation | 3593 | Latte (medium dark w/ gold brown undertones) | Latte | NaN | #DFAB79 |
| jane iredale | PurePressed Base Mineral Foundation Refill | 1314 | Warm Silk (light w/ gold undertones) | Warm Silk | NaN | #E2BF9B |


#### Duplicate Rows

```python
# Normalizing brand names to make duplicate rows easier to identify:
# lowercase them, then transliterate accented characters to ASCII (e.g., "l'oréal" -> "l'oreal")
all_shades['brand'] = all_shades['brand'].str.lower().apply(unidecode)
# Verifying changes
all_shades['brand'].unique()
```

```
array(['anastasia beverly hills', 'becca cosmetics', 'benefit cosmetics',
       'it cosmetics', 'kvd vegan beauty', 'lancome', 'laura mercier',
       'nars', 'nudestix', 'perricone md', 'smashbox', 'tarte',
       'too faced', 'urban decay cosmetics', 'covergirl', 'ulta',
       'j.cat beauty', 'catrice', 'blk/opl', 'maybelline', 'bareminerals',
       'makeup revolution', 'cover fx', 'hourglass',
       'nyx professional makeup', 'w3ll people', 'pur', 'winky lux',
       'milani', "burt's bees", "l'oreal", 'mac', 'au naturale',
       'clinique', 'jane iredale', 'estee lauder', 'dermablend',
       'wet n wild', 'hynt beauty', 'e.l.f. cosmetics', 'kiko milano',
       'pacifica', 'beauty bakerie', 'l.a. girl', 'revlon', 'almay',
       'juice beauty', 'physicians formula', 'morphe', 'uoma beauty',
       'elcie cosmetics', "juvia's place", 'ofra cosmetics',
       'florence by mills', 'lorac', 'the ordinary', 'colourpop',
       'flower beauty', 'vdl', 'zoeva', 'essence', 'exa', 'sm
```
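
The brand normalization relies on the third-party `unidecode` package. If you want to avoid that dependency, the standard library's `unicodedata` module can handle the common case of accented Latin letters. The helper name `strip_accents` below is hypothetical. It decomposes characters with NFKD normalization and drops the combining accent marks. Unlike `unidecode`, it leaves characters that have no decomposition (for example `ø`) unchanged, so the two approaches are not fully equivalent.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Lowercase text and drop combining accent marks, e.g. "L'Oréal" -> "l'oreal"."""
    decomposed = unicodedata.normalize('NFKD', text.lower())
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


for brand in ["L'Oréal", 'Lancôme', 'Estée Lauder']:
    print(brand, '->', strip_accents(brand))
```

In the notebook, this would be applied as `all_shades['brand'].apply(strip_accents)`.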

```python
# Checking for and returning duplicate rows in 'all_shades' with the helper eh.duplicate_rows()
eh.duplicate_rows(all_shades)
```

```
No. of entirely duplicated rows: 20
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 4105 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 3 very fair, golden | NaN | 3.0 | #DCB79B |
| 4106 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 3.5 light to medium, olive | NaN | 3.5 | #D2AF94 |
| 4107 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 3.75 very fair, pink | NaN | 3.75 | #DCB098 |
| 4111 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 4.5 light to medium, neutral | NaN | 4.5 | #D7B08E |
| 4113 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 5 light, neutral | NaN | 5.0 | #C9A07F |
| 4116 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 5.25 medium, pink | NaN | 5.25 | #CFA385 |
| 4118 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 5.75 light to medium, golden | NaN | 5.75 | #CEA180 |
| 4144 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 3 very fair, golden | NaN | 3.0 | #DCB79B |
| 4145 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 3.5 light to medium, olive | NaN | 3.5 | #D2AF94 |
| 4146 | armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | 3.75 very fair, pink | NaN | 3.75 | #DCB098 |
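
The displayed rows include both copies of each duplicate (for example, rows 4105 and 4144). The function below is a hypothetical stand-in for `eh.duplicate_rows()`, written under that assumption; it is not the helper's actual code. It shows how `keep` changes the result. The default `keep='first'` counts only the extra copies, while `keep=False` flags every copy so they can be compared side by side.

```python
import pandas as pd


def duplicate_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for eh.duplicate_rows(): report how many rows repeat an
    earlier row, and return every copy of each duplicated row for inspection."""
    n_dupes = df.duplicated().sum()  # keep='first': counts only the extra copies
    print(f'No. of entirely duplicated rows: {n_dupes}')
    return df[df.duplicated(keep=False)]  # keep=False: flags every copy
```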


```python
# Rows that share a brand, product, and (non-missing) shade name
shade_name_dupes = all_shades.duplicated(subset=['brand','product','name'], keep=False) & all_shades['name'].notna()
shade_name_dupes = all_shades[shade_name_dupes]
print(shade_name_dupes.count())
# One random example per brand and product
shade_name_dupes.groupby(['brand','product']).apply(lambda group: group.sample(1))
```

```
brand          1139
product        1139
description    1139
name           1139
specific       1123
hex            1139
dtype: int64
```

| brand | product | index | description | name | specific | hex |
| :--- | :--- | ---: | :--- | :--- | :--- | :--- |
| bareminerals | I AM AN ORIGINAL GET STARTED® KIT Original Loose Mineral Foundation Customizable Set | 6653 | Neutral Tan 21 for tan warm skin with golden u... | bareMinerals | 21 | #A3623E |
| beauty bakerie | InstaBake Aqua Glass Foundation | 1758 | 315 Warm | Warm | 315 | #A87446 |
| beautyblender | Bounce™ Liquid Whip Long Wear Foundation | 4297 | 4.70 Cool very deep with cool undertones | Cool | 4.70 | #542911 |
| catrice | HD Liquid Coverage Foundation | 3727 | Caramel Beige 046 | Caramel Beige | 046 | #B68B61 |
| charlotte tilbury | Airbrush Flawless Longwear Foundation | 4473 | 7 Warm for medium skin with warm undertones | Warm | 7 | #F7CCA4 |
| charlotte tilbury | Light Wonder Foundation | 5724 | 5 Medium warm peach | Medium | 5 | #D9B198 |
| colourpop | Pretty Fresh Foundation | 3451 | Dark 180N (neutral) | Dark | 180N | #A67C6B |
| cover fx | Custom Cover Drops | 860 | N Medium 3 (medium deep skin w/ neutral undert... | N Medium | 3 | #C49777 |
| cover fx | Natural Finish Foundation | 2264 | G+40 (medium olive skin w/ golden undertones) | G+ | 40 | #E3B488 |
| cover fx | Power Play Foundation | 3895 | G+50 (medium to tan olive skin w/ golden under... | G+ | 50 | #DA9A6C |
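
The `& all_shades['name'].notna()` guard in the key-based duplicate check matters because `duplicated()` treats missing values as equal to each other. Without the guard, every pair of unnamed shades in the same product would be flagged as a "duplicate name". The toy data below is hypothetical and shows the difference.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'brand': ['cover fx'] * 4,
    'product': ['Custom Cover Drops'] * 4,
    'name': ['N Medium', 'N Medium', np.nan, np.nan],
})

# duplicated() treats the two NaN names as equal, so all four rows are flagged
raw = toy.duplicated(subset=['brand', 'product', 'name'], keep=False)
# The notna() guard keeps only genuine name collisions
guarded = raw & toy['name'].notna()

print(raw.tolist())      # [True, True, True, True]
print(guarded.tolist())  # [True, True, False, False]
```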


```python
# Rows that share a brand, product, and (non-missing) shade code
specific_dupes = all_shades.duplicated(subset=['brand','product','specific'], keep=False) & all_shades['specific'].notna()
specific_dupes = all_shades[specific_dupes]
print(specific_dupes.count())
# One random example per brand and product
specific_dupes.groupby(['brand','product']).apply(lambda group: group.sample(1))
```

```
brand          387
product        387
description    387
name           271
specific       387
hex            387
dtype: int64
```

| brand | product | index | description | name | specific | hex |
| :--- | :--- | ---: | :--- | :--- | :--- | :--- |
| armani beauty | Luminous Silk Perfect Glow Flawless Oil-Free Foundation | 4107 | 3.75 very fair, pink | NaN | 3.75 | #DCB098 |
| bareminerals | Blemish Rescue™ Salicylic Acid Loose Powder Foundation | 5780 | Medium Dark 5CN dark skin with cool to neutra... | Medium Dark | 5CN | #9C5A37 |
| charlotte tilbury | Airbrush Flawless Longwear Foundation | 4469 | 5.5 Warm for medium skin with warm undertones | Warm | 5.5 | #F5D2AF |
| clinique | Even Better Makeup Broad Spectrum SPF 15 | 1939 | WN 92 Toasted Almond (medium, warm-neutral und... | Toasted Almond | WN 92 | #DB9467 |
| cover fx | Custom Cover Drops | 860 | N Medium 3 (medium deep skin w/ neutral undert... | N Medium | 3 | #C49777 |
| covergirl | Full Spectrum Matte Ambition All Day Foundation | 3572 | Deep Cool 2 | Deep Cool | 2 | #8F6344 |
| dior | BACKSTAGE Face & Body Foundation | 4817 | 5 Neutral | Neutral | 5 | #D9A787 |
| dior | Dior Airflash Spray Foundation | 4698 | 3 Cool (304) Light medium skin, cool undertones | Cool | 3 | #C6A38D |
| dior | Dior Forever Matte Foundation | 5339 | 3 Warm Olive light medium skin, warm olive und... | Warm Olive | 3 | #D5AD91 |
| dior | Dior Forever Skin Glow Foundation | 5306 | 2 Cool Rosy light skin, cool pink undertones | Cool Rosy | 2 | #DCADA0 |


### Initial Observations 
**Data Types**
- All six columns (`brand`, `product`, `description`, `name`, `specific`, and `hex`) are **strings** (`object` dtype).
- The index is an **integer** `RangeIndex`.

All data types are properly assigned and don't need any changes.

**Missing Data**
- The `specific` and `name` columns have expected missing data, as not every foundation shade has both a name and a shade code.

**Duplicate Rows**
- The dataset contains 20 entirely duplicated rows.
- KVD Vegan Beauty: resolve duplicates by combining the letter in `name` with `specific`, and remove extra spaces in `name` for easier exploration.
- The Ordinary: combine the `name` and `specific` columns, as the `name` column is incorrect.
- CoverGirl: combine the `specific` and `name` columns, and discard the existing `specific` values, which are incorrect and don't reflect the product's shade numbers.
- Charlotte Tilbury: combine the `name` and `specific` columns.
- Laura Mercier: drop duplicates and set the correct shade names.
- Dior: combine the `name` and `specific` columns, and correct `specific` (e.g., `Cr 4` for Cool Rosy 4).
- Tarte: drop duplicates.
- bareMinerals: update `specific` to the correct numbers and separate the numbers from the `name` column.
- Kosas (Tone), Smashbox, SurratBeauty, bareMinerals, and Kevyn Aucoin: set `name` to NaN.
- Sephora: combine the `specific` and `name` columns.

**Uniqueness of Data**
- `brand` has the highest similarity score (about 0.93, i.e. 100 divided by its 107 unique values). This is expected, since each brand offers multiple products and shades.


---

# 3

## Data Preprocessing

Here you can add sections like:

- Renaming columns
- Drop Redundant Columns
- Changing Data Types
- Dropping Duplicates
- Handling Missing Values
- Handling Unreasonable Data Ranges
- Feature Engineering / Transformation

Use `assert` where possible to show that preprocessing is done.
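
Following the suggestion to use `assert`, the sketch below collects a few post-cleaning checks in one function. The function name and every check in it are hypothetical, chosen to mirror steps in this notebook: dropping full duplicates, lowercasing brands, stripping a stray "for" from shade names, and keeping well-formed hex codes. The function assumes the DataFrame has `brand`, `name`, and `hex` columns.

```python
import pandas as pd


def check_preprocessing(df: pd.DataFrame) -> None:
    """Hypothetical post-cleaning checks; each assert stops the notebook if a step regressed."""
    assert not df.duplicated().any(), 'entirely duplicated rows remain'
    assert (df['brand'] == df['brand'].str.lower()).all(), 'brand names are not lowercase'
    named = df['name'].dropna()
    # \bfor\b matches the standalone word only, so names such as "Comfort" pass
    assert not named.str.contains(r'\bfor\b', regex=True).any(), "leftover 'for' in shade names"
    assert df['hex'].str.fullmatch(r'#[0-9A-Fa-f]{6}').all(), 'malformed hex codes'
```

Calling `check_preprocessing(all_shades)` at the end of this section would then document that preprocessing is complete.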

### Fix Data

### Data Accuracy

`Name Column`

```python
# Identifying and replacing inappropriately filled shade name values with NaN
# Brands whose 'name' values will be set to NaN
target_brands = ['kosas', 'kevyn aucoin', 'bareminerals', 'smashbox', 'surratbeauty']
all_shades.loc[all_shades['brand'].isin(target_brands), 'name'] = np.nan
# Verifying changes
all_shades[all_shades['brand'].isin(target_brands)].sample(5)
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 5148 | bareminerals | Matte Loose Powder Mineral Foundation SPF 15 | Tan Nude 17 for medium-tan warm skin with gold... | NaN | 17 | #A46E51 |
| 6450 | kevyn aucoin | The Etherealist Foundation | Deep EF 13 deep with warm undertones | NaN | 13 | #AF815C |
| 5133 | bareminerals | Matte Loose Powder Mineral Foundation SPF 15 | Fair Ivory 02 for very fair neutral skin with ... | NaN | 2 | #D09B83 |
| 1243 | bareminerals | BAREPRO Performance Wear Powder Foundation | Sable 21 (for tan warm skin w/ golden undertones) | NaN | 21 | #C4885C |
| 2395 | bareminerals | BAREPRO Performance Wear Liquid Foundation Bro... | Warm Natural 12 (for light warm skin w/ yellow... | NaN | 12 | #DD9F7A |


```python
# Collapsing repeated whitespace in KVD Vegan Beauty shade names;
# the .str accessor leaves missing names as NaN instead of raising an error
kvd = all_shades['brand'] == 'kvd vegan beauty'
all_shades.loc[kvd, 'name'] = all_shades.loc[kvd, 'name'].str.split().str.join(' ')

# Verifying changes
display(all_shades[kvd])
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 180 | kvd vegan beauty | Lock-It Foundation | Medium 54 N (medium sesame with neutral undert... | Medium N | 54 | #D19264 |
| 181 | kvd vegan beauty | Lock-It Foundation | Light 40 N (near-white for mixing & artistry l... | Light N | 40 | #F9D3B9 |
| 182 | kvd vegan beauty | Lock-It Foundation | Light 41 N (fair ivory with neutral undertone) | Light N | 41 | #F1CCB6 |
| 183 | kvd vegan beauty | Lock-It Foundation | Light 42 N (fair porcelain with neutral undert... | Light N | 42 | #FBCCB2 |
| 184 | kvd vegan beauty | Lock-It Foundation | Light 42 W (fair porcelain with warm undertone) | Light W | 42 | #F8CEA3 |
| ... | ... | ... | ... | ... | ... | ... |
| 311 | kvd vegan beauty | True Portrait Medium Coverage Shake Foundation | Deep 090 (deep bronze with warm undertone) | Deep | 090 | #5E3C2C |
| 312 | kvd vegan beauty | True Portrait Medium Coverage Shake Foundation | Deep 092 (deep golden with neutral undertone) | Deep | 092 | #624432 |
| 313 | kvd vegan beauty | True Portrait Medium Coverage Shake Foundation | Deep 094 (rich-deep bronze with neutral undert... | Deep | 094 | #5D392C |
| 314 | kvd vegan beauty | True Portrait Medium Coverage Shake Foundation | Deep 096 (rich-deep golden with neutral undert... | Deep | 096 | #4F352B |


```python
# Rebuilding Sephora MicroSmooth shade names from the description:
# "25 Beige - Selected" -> words 2-3 ("Beige -") -> drop "-" and surrounding spaces -> "Beige"
sephora = all_shades['product'] == 'MicroSmooth Baked Powder Foundation'
all_shades.loc[sephora, 'name'] = (
    all_shades.loc[sephora, 'description']
    .str.split().str[1:3].str.join(' ')
    .str.replace('-', '', regex=False)
    .str.strip()
)

all_shades[sephora]
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 5401 | sephora collection | MicroSmooth Baked Powder Foundation | 05 Porcelain | Porcelain | 5 | #ECD6C7 |
| 5402 | sephora collection | MicroSmooth Baked Powder Foundation | 15 Nude | Nude | 15 | #D9B69F |
| 5403 | sephora collection | MicroSmooth Baked Powder Foundation | 25 Beige - Selected | Beige | 25 | #D9AC84 |
| 5404 | sephora collection | MicroSmooth Baked Powder Foundation | 30 Sand | Sand | 30 | #D5AA87 |
| 5405 | sephora collection | MicroSmooth Baked Powder Foundation | 35 Bronze | Bronze | 35 | #CFA076 |
| 5406 | sephora collection | MicroSmooth Baked Powder Foundation | 40 Tan | Tan | 40 | #C38F6A |
| 5407 | sephora collection | MicroSmooth Baked Powder Foundation | 56 Mahogany | Mahogany | 56 | #CB9977 |
| 5408 | sephora collection | MicroSmooth Baked Powder Foundation | 60 Deep Ebony | Deep Ebony | 60 | #9E6D50 |


### Handling Missing Values

```python
# Filling in missing shade names for the Lancôme brand from words 2-3 of the description
lancome_missing = all_shades[(all_shades['brand']=='lancome') & all_shades['name'].isna()]
all_shades.loc[lancome_missing.index, 'name'] = lancome_missing['description'].str.split().str[1:3].str.join(' ')
all_shades[all_shades['brand']=='lancome'].sample(5)
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 1598 | lancome | Dual Finish Multi-Tasking Lightweight Pressed ... | 230 Versatile Ecru II | Versatile Ecru II | 230 | #F2CFA7 |
| 1608 | lancome | Dual Finish Multi-Tasking Lightweight Pressed ... | 410 Bisque | Bisque | 410 | #F0B882 |
| 319 | lancome | Teint Idole Ultra Longwear Foundation Stick SP... | 210 Buff N OUT OF STOCK | Buff N | 210 | #DEB997 |
| 5839 | lancome | SKIN FEELS GOOD Skin Nourishing Foundation | 01N Nude Vanilla for light skin with neutral u... | Nude Vanilla | 01N | #D6AB96 |
| 5849 | lancome | SKIN FEELS GOOD Skin Nourishing Foundation | 12W Sunny Amber for deep skin with warm/yellow... | Sunny Amber | 12W | #AF7D6F |


```python
# Filling in missing shade names for Givenchy's Matissime Velvet foundation
# from words 2-3 of the description (e.g., "N00 Mat Ivory light w/ ..." -> "Mat Ivory")
givenchy_missing = all_shades[all_shades['product']=='Matissime Velvet Radiant Mattifying Fluid Foundation SPF 20']
all_shades.loc[givenchy_missing.index, 'name'] = givenchy_missing['description'].str.split().str[1:3].str.join(' ')
all_shades[all_shades['product']=='Matissime Velvet Radiant Mattifying Fluid Foundation SPF 20'].head()
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 6401 | givenchy | Matissime Velvet Radiant Mattifying Fluid Foun... | N00 Mat Ivory light w/ neutral balance of pink... | Mat Ivory | N00 | #F5D8B3 |
| 6402 | givenchy | Matissime Velvet Radiant Mattifying Fluid Foun... | N01 Mat Porcelain light with neutral balance | Mat Porcelain | N01 | #E0C3AD |
| 6403 | givenchy | Matissime Velvet Radiant Mattifying Fluid Foun... | N02 Mat Shell light with pink undertones | Mat Shell | N02 | #E8C1B6 |
| 6404 | givenchy | Matissime Velvet Radiant Mattifying Fluid Foun... | N03 Mat Sand light sand with neutral balance | Mat Sand | N03 | #E5C0B0 |
| 6405 | givenchy | Matissime Velvet Radiant Mattifying Fluid Foun... | N03.5 Mat Vanilla medium w/ neutral balance of... | Mat Vanilla | N03.5 | #EAC39C |


```python
# Filling in shade names for two Laura Mercier tinted moisturizers from the second word
# of the description (e.g., "4N1 Wheat olive neutral" -> "Wheat")
laura_products = ['Tinted Moisturizer Natural Skin Perfector Broad Spectrum SPF 30', 'Tinted Moisturizer Broad Spectrum SPF 20 - Oil Free']
laura_missing = all_shades[all_shades['product'].isin(laura_products)]
all_shades.loc[laura_missing.index, 'name'] = laura_missing['description'].str.split().str[1:2].str.join(' ')
all_shades[all_shades['product'].isin(laura_products)].sample(5)
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 4512 | laura mercier | Tinted Moisturizer Natural Skin Perfector Broa... | 4N1 Wheat olive neutral | Wheat | 4N1 | #E3B27E |
| 4726 | laura mercier | Tinted Moisturizer Broad Spectrum SPF 20 - Oil... | 5W1 Tan medium to deep with warm to golden und... | Tan | 5W1 | #D19676 |
| 4506 | laura mercier | Tinted Moisturizer Natural Skin Perfector Broa... | 2N1 Nude light neutral | Nude | 2N1 | #DFB18A |
| 4528 | laura mercier | Tinted Moisturizer Natural Skin Perfector Broa... | 3C1 Fawn medium cool | Fawn | 3C1 | #CEAF91 |
| 4530 | laura mercier | Tinted Moisturizer Natural Skin Perfector Broa... | 3W1 Bisque medium warm | Bisque | 3W1 | #D3B494 |


```python
# Filling in missing shade names for Make Up For Ever foundation products
muf_products = all_shades['brand'] == 'make up for ever'

# Water Blend: the shade name is the second word of the description
water_blend = all_shades['product'] == 'Water Blend Face & Body Foundation'

# Matte Velvet products: the shade name is words 2-3 of the description
matte_velvet_products = ['Matte Velvet Skin Blurring Powder Foundation', 'Matte Velvet Skin Full Coverage Foundation']
matte_velvet = all_shades['product'].isin(matte_velvet_products)

# Ultra HD / Reboot products: descriptions look like "Y445 - Amber for ...", so the name is words 3-4
ultra_products = ['Reboot Active Care Revitalizing Foundation', 'Ultra HD Invisible Cover Foundation', 'Ultra HD Invisible Cover Stick Foundation']
ultra = all_shades['product'].isin(ultra_products)

# Update 'name' for each product group
all_shades.loc[ultra, 'name'] = all_shades.loc[ultra, 'description'].str.split().str[2:4].str.join(' ')
all_shades.loc[water_blend, 'name'] = all_shades.loc[water_blend, 'description'].str.split().str[1:2].str.join('')
all_shades.loc[matte_velvet, 'name'] = all_shades.loc[matte_velvet, 'description'].str.split().str[1:3].str.join(' ')
# Remove the standalone word "for" left by slices such as "Amber for"; word boundaries keep names like "Comfort" intact
all_shades.loc[muf_products, 'name'] = all_shades.loc[muf_products, 'name'].str.replace(r'\bfor\b', '', regex=True).str.strip()

all_shades[muf_products].sample(5)
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 4418 | make up for ever | Reboot Active Care Revitalizing Foundation | Y445 - Amber for darker tan skin with golden u... | Amber | Y445 | #C28B67 |
| 4417 | make up for ever | Reboot Active Care Revitalizing Foundation | Y434 - Golden Caramel for lighter tan skin wit... | Golden Caramel | Y434 | #C68D5E |
| 4412 | make up for ever | Reboot Active Care Revitalizing Foundation | Y355 - Neutral Beige for medium skin with neut... | Neutral Beige | Y355 | #CDA98A |
| 4423 | make up for ever | Reboot Active Care Revitalizing Foundation | R540 - Dark Brown for deep skin with yellow un... | Dark Brown | R540 | #72493A |
| 4429 | make up for ever | Matte Velvet Skin Blurring Powder Foundation | Y225 Marble for light skin with golden undertones | Marble | Y225 | #FFDDC4 |


```python
# Filling in shade names for YSL Touche Eclat All-In-One Glow Foundation
# (e.g., "B10 Porcelain for very fair ..." -> "Porcelain for" -> "Porcelain")
ysl = all_shades['product'] == 'Touche Eclat All-In-One Glow Foundation'
all_shades.loc[ysl, 'name'] = (
    all_shades.loc[ysl, 'description']
    .str.split().str[1:3].str.join(' ')
    .str.replace(r'\bfor\b', '', regex=True)
    .str.strip()
)
all_shades[ysl].head()
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 5960 | yves saint laurent | Touche Eclat All-In-One Glow Foundation | B10 Porcelain for very fair to fair skin with... | Porcelain | B10 | #EEC3A1 |
| 5961 | yves saint laurent | Touche Eclat All-In-One Glow Foundation | B20 Ivory for fair to light skin | Ivory | B20 | #E8B392 |
| 5962 | yves saint laurent | Touche Eclat All-In-One Glow Foundation | B30 Almond for light skin with neutral to gold... | Almond | B30 | #E2AD85 |
| 5963 | yves saint laurent | Touche Eclat All-In-One Glow Foundation | BR30 Cool Almond for light skin with cool to ... | Cool Almond | BR30 | #E6C1B7 |
| 5964 | yves saint laurent | Touche Eclat All-In-One Glow Foundation | B40 Sand for light to medium skin with neutral... | Sand | B40 | #E3AE8C |
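
The cells above slice descriptions by word position, which needs a different slice for each product. An alternative is a single regular expression that reads the shade code and then captures the name up to a known tail such as "for ...", a parenthetical, "- Selected", or "OUT OF STOCK". The pattern and the helper name `parse_description` below are hypothetical and only a sketch. Descriptions where tone notes follow the name without a marker (for example, Givenchy's "N00 Mat Ivory light w/ ...") or where the name comes first (for example, Bobbi Brown's "Warm Walnut (W-096) ...") would still need per-product rules.

```python
import re

# Hypothetical pattern: shade code, optional " - ", shade name (lazy), then an optional tail
SHADE_RE = re.compile(
    r'^(?P<code>\S+)\s+(?:-\s+)?(?P<name>.+?)'
    r'(?:\s+(?:for\b|\(|-\s+Selected\b|OUT OF STOCK\b).*)?$'
)


def parse_description(description: str):
    """Return (shade_code, shade_name), or (None, None) if the pattern does not match."""
    match = SHADE_RE.match(description.strip())
    if match is None:
        return None, None
    return match.group('code'), match.group('name')


for text in ['Y445 - Amber for darker tan skin with golden undertones',
             'BR30 Cool Almond for light skin with cool to neutral undertones',
             '25 Beige - Selected',
             '210 Buff N OUT OF STOCK']:
    print(parse_description(text))
```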


```python
# Extracting Bobbi Brown shade codes from the parentheses in the description
# (e.g., "Warm Walnut (W-096) deep brown ..." -> "W-096")
bobbi_products = ['Skin Long-Wear Weightless Foundation SPF 15', 'Skin Foundation Stick']
bobbi = all_shades[all_shades['product'].isin(bobbi_products)]
all_shades.loc[bobbi.index, 'specific'] = bobbi['description'].str.findall(r'\(([^)]+)\)').str.join('')
all_shades[all_shades['product'].isin(bobbi_products)].sample(5)
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 5038 | bobbi brown | Skin Long-Wear Weightless Foundation SPF 15 | Warm Walnut (W-096) deep brown with yellow und... | Warm Walnut | W-096 | #A16247 |
| 5021 | bobbi brown | Skin Long-Wear Weightless Foundation SPF 15 | Warm Natural (W-056) medium beige with yellow ... | Warm Natural | W-056 | #CEA26E |
| 5035 | bobbi brown | Skin Long-Wear Weightless Foundation SPF 15 | Golden Almond (W-088) dark brown with golden u... | Golden Almond | W-088 | #AD6B3F |
| 5028 | bobbi brown | Skin Long-Wear Weightless Foundation SPF 15 | Golden Honey (W-068) medium dark beige with go... | Golden Honey | W-068 | #CE976A |
| 5025 | bobbi brown | Skin Long-Wear Weightless Foundation SPF 15 | Neutral Honey (N-060) medium dark beige with a... | Neutral Honey | N-060 | #D3A58E |
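
`str.findall(...).str.join('')` returns an empty string when a description has no parentheses, and it concatenates every parenthesized group when there is more than one. If only the first code is wanted, pandas' `Series.str.extract` with `expand=False` returns a Series in which non-matching rows become NaN, so they stay visible as missing data. The sample descriptions below are hypothetical.

```python
import pandas as pd

descriptions = pd.Series([
    'Warm Walnut (W-096) deep brown with yellow undertones',
    'Neutral Honey (N-060) medium dark beige',
    'Porcelain',  # no parenthesized code
])

# First parenthesized group only; rows without a match become NaN
codes = descriptions.str.extract(r'\(([^)]+)\)', expand=False)
# The notebook's approach, for comparison: unmatched rows become ''
joined = descriptions.str.findall(r'\(([^)]+)\)').str.join('')

print(codes.tolist())
print(joined.tolist())
```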


```python
# Extracting NARS Soft Matte shade codes from the description (e.g., "Fiji L5 - for ..." -> "L5")
nars = all_shades['product'] == 'Soft Matte Complete Foundation'
all_shades.loc[nars, 'specific'] = all_shades.loc[nars, 'description'].str.split().str[1:2].str.join('')

# Two-word shade names push the code to the third word (e.g., "New Caledonia D2 - ..." -> "D2");
# restricting to NARS rows avoids overwriting other brands' shades that share these names
two_word_names = ['Santa Fe', 'Mont Blanc', 'New Caledonia']
nars_two_word = nars & all_shades['name'].isin(two_word_names)
all_shades.loc[nars_two_word, 'specific'] = all_shades.loc[nars_two_word, 'description'].str.split().str[2:3].str.join('')

all_shades[nars].sample(5)
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 4200 | nars | Soft Matte Complete Foundation | Fiji L5 - for light to medium skin with neutra... | Fiji | L5 | #EDC192 |
| 4203 | nars | Soft Matte Complete Foundation | Salzburg L3.5 - for light skin with neutral un... | Salzburg | L3.5 | #EEC29F |
| 4207 | nars | Soft Matte Complete Foundation | Oslo L1 - for fair skin with pink undertones | Oslo | L1 | #F7D4C4 |
| 4199 | nars | Soft Matte Complete Foundation | Punjab M1 - for medium skin with yellow undert... | Punjab | M1 | #E4B78E |
| 4179 | nars | Soft Matte Complete Foundation | New Caledonia D2 - for dark skin with yellow u... | New Caledonia | D2 | #795036 |


### Handling Duplicate Values

```python
# Drop entirely duplicated rows and reset the index
all_shades.drop_duplicates(inplace=True, ignore_index=True)
# Verifying rows dropped
assert all_shades.duplicated().sum() == 0, "Removing duplicates failed: duplicate values still exist in the DataFrame"
```


```python
# Disambiguating Sephora Collection shades that share a brand, product, and name
# by appending the shade code to the name
sephora_dupes = all_shades.duplicated(subset=['brand','product','name'], keep=False) & all_shades['name'].notna()
sephora_dupes = all_shades[sephora_dupes]
sephora_dupes = sephora_dupes[sephora_dupes['brand']=='sephora collection']
all_shades.loc[sephora_dupes.index, 'name'] = sephora_dupes['name'] + ' ' + sephora_dupes['specific']

# Verifying changes
all_shades[all_shades['brand'] == 'sephora collection']
```

| | brand | product | description | name | specific | hex |
| ---: | :--- | :--- | :--- | :--- | :--- | :--- |
| 4583 | sephora collection | 10 Hour Wear Perfection Foundation | 68 Brownie for deep skin with neutral undertones | Brownie | 68 | #7B5E5B |
| 4584 | sephora collection | 10 Hour Wear Perfection Foundation | 67 Expresso (N) for deep skin with neutral und... | Expresso | 67 | #725647 |
| 4585 | sephora collection | 10 Hour Wear Perfection Foundation | 66.5 Light Espresso for deep skin with neutral... | Light Espresso | 66.5 | #8D665C |
| 4586 | sephora collection | 10 Hour Wear Perfection Foundation | 66 Sandalwood (Y) for deep skin with yellow un... | Sandalwood | 66 | #745947 |
| 4587 | sephora collection | 10 Hour Wear Perfection Foundation | 65.5 Chestnut for deep skin with neutral under... | Chestnut | 65.5 | #87625F |
| ... | ... | ... | ... | ... | ... | ... |
| 6796 | sephora collection | Best Skin Ever Liquid Foundation | 65 N for deep skin with neutral undertones | N 65 | 65 | #805C3A |
| 6797 | sephora collection | Best Skin Ever Liquid Foundation | 65.5 P for deep skin with pink undertones | P 65.5 | 65.5 | #715035 |
| 6798 | sephora collection | Best Skin Ever Liquid Foundation | 66.5 N for deep skin with neutral undertones | N 66.5 | 66.5 | #866645 |
| 6799 | sephora collection | Best Skin Ever Liquid Foundation | 67 P for deep skin with pink undertones | P 67 | 67 | #7B5739 |


In [26]:
# Apply the same fix to Pat McGrath Labs: append `specific` to duplicated names
pat_dupes = all_shades.duplicated(subset=['brand','product','name'], keep=False) & all_shades['name'].notna()
pat_dupes = all_shades[pat_dupes]
pat_dupes = pat_dupes[pat_dupes['brand']=='pat mcgrath labs']
all_shades.loc[pat_dupes.index, 'name'] = pat_dupes['name'] + ' ' + pat_dupes['specific']

# Verifying changes: an empty result means no duplicated names remain
all_shades[(all_shades['brand'] == 'pat mcgrath labs') & all_shades.duplicated(subset=['brand', 'product', 'name'], keep=False) & all_shades['name'].notna()]

Unnamed: 0,brand,product,description,name,specific,hex
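The Sephora Collection and Pat McGrath Labs cells repeat the same steps, and only the brand changes. A minimal sketch of a reusable version follows. `disambiguate_names` is a hypothetical helper, not part of pandas. Like the original cells, it assumes `specific` holds strings, which the outputs above suggest.

```python
import pandas as pd

def disambiguate_names(df: pd.DataFrame, brand: str) -> pd.DataFrame:
    """Append `specific` to `name` where a brand repeats a name within a product.

    Hypothetical helper; returns a modified copy instead of editing in place.
    """
    df = df.copy()
    # Rows whose brand/product/name collide, excluding missing names
    clash = (df.duplicated(subset=['brand', 'product', 'name'], keep=False)
             & df['name'].notna()
             & (df['brand'] == brand))
    df.loc[clash, 'name'] = df.loc[clash, 'name'] + ' ' + df.loc[clash, 'specific']
    return df

# Sample values modeled on the Sephora Collection output above
demo = pd.DataFrame({
    'brand': ['sephora collection'] * 3,
    'product': ['Best Skin Ever Liquid Foundation'] * 3,
    'name': ['N', 'N', 'P'],
    'specific': ['65', '66.5', '65.5'],
})
print(disambiguate_names(demo, 'sephora collection')['name'].tolist())
# ['N 65', 'N 66.5', 'P']
```

On the notebook's data this would be called as `all_shades = disambiguate_names(all_shades, 'pat mcgrath labs')`. Renamed rows no longer collide, so calling it a second time leaves the frame unchanged.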


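In [29]:
# Beauty Bakerie stores the undertone (e.g. 'Warm Neutral') in `name`;
# fold it into `specific` as a short code (e.g. '357' -> '357WN') and clear `name`.
# Rows whose `name` is NaN are skipped: NaN is a float, so `n[0]` would raise
# "TypeError: 'float' object is not subscriptable" (e.g. when the cell is re-run).
bakerie_mask = (all_shades['brand'] == 'beauty bakerie') & all_shades['name'].notna()
bakerie_specific = {'Warm': 'W', 'Neutral': 'N', 'Cool': 'C', 'Warm Neutral': 'WN', 'Cool Neutral': 'CN'}

# Undertones missing from the mapping fall back to their first letter
bakerie_suffix = all_shades.loc[bakerie_mask, 'name'].map(lambda n: bakerie_specific.get(n, n[0]))
all_shades.loc[bakerie_mask, 'specific'] = all_shades.loc[bakerie_mask, 'specific'] + bakerie_suffix
all_shades.loc[bakerie_mask, 'name'] = np.nan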

In [28]:
all_shades[all_shades['brand']=='beauty bakerie'].sample(5)


Unnamed: 0,brand,product,description,name,specific,hex
1778,beauty bakerie,InstaBake Aqua Glass Foundation,357 Warm Neutral OUT OF STOCK,Warm Neutral,357,#F6E6D9
1764,beauty bakerie,InstaBake Aqua Glass Foundation,329 Warm Neutral,Warm Neutral,329,#CF9C71
1761,beauty bakerie,InstaBake Aqua Glass Foundation,321 Warm,Warm,321,#BA845E
1766,beauty bakerie,InstaBake Aqua Glass Foundation,333 Neutral,Neutral,333,#CFA380
1771,beauty bakerie,InstaBake Aqua Glass Foundation,343 Neutral,Neutral,343,#E0BA9D
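pandas stores missing values in a string column as `NaN`, which is a float. Any code that indexes into `name`, such as `row['name'][0]`, therefore fails on those rows. For the same reason, a cell that clears `name` to `NaN` and then indexes it cannot be safely re-run. The short demonstration below uses made-up values.

```python
import numpy as np
import pandas as pd

# A string column with one missing entry (made-up values)
names = pd.Series(['Warm Neutral', np.nan, 'Cool'])

# The missing entry is a float, so indexing it raises TypeError
try:
    names[1][0]
except TypeError as err:
    print(err)  # 'float' object is not subscriptable

# Filtering with notna() first restricts the operation to real strings
first_letters = names[names.notna()].str[0]
print(first_letters.tolist())  # ['W', 'C']
```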


In [30]:
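# beautyblender also stores the undertone in `name`; fold it into `specific`
# (e.g. '2.20' + 'Neutral/Olive' -> '2.20N/O') and clear `name`.
# Restricting to non-null names keeps the cell safe to re-run.
bb_mask = (all_shades['brand'] == 'beautyblender') & all_shades['name'].notna()
bb_specific = {'Warm': 'W', 'Neutral': 'N', 'Cool': 'C', 'Neutral/Olive': 'N/O', 'Cool/Olive': 'C/O', 'Warm/Olive': 'W/O'}
bb_suffix = all_shades.loc[bb_mask, 'name'].map(lambda n: bb_specific.get(n, n[0]))
all_shades.loc[bb_mask, 'specific'] = all_shades.loc[bb_mask, 'specific'] + bb_suffix
all_shades.loc[bb_mask, 'name'] = np.nan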

In [31]:
all_shades[all_shades['brand']=='beautyblender']


Unnamed: 0,brand,product,description,name,specific,hex
4252,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,1.00 Cool very light with cool undertones,,1.00C,#FCF0E5
4253,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,1.10 Neutral very light with neutral undertones,,1.10N,#FCEDD8
4254,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,1.20 Cool light with cool undertones,,1.20C,#FBE6D6
4255,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,1.30 Warm light with warm undertones,,1.30W,#F5DBC1
4256,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,1.40 Neutral light with neutral undertones,,1.40N,#ECCCAF
4257,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,1.50 Cool light with cool undertones,,1.50C,#F0C3A6
4258,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,1.60 Warm light with warm undertones,,1.60W,#EEC5A2
4259,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,2.10 Cool medium with cool undertones,,2.10C,#E3C2A8
4260,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,2.20 Neutral/Olive medium with neutral olive ...,,2.20N/O,#ECC49B
4261,beautyblender,Bounce™ Liquid Whip Long Wear Foundation,2.30 Warm medium with warm undertones,,2.30W,#ECC49B


In [None]:
# # Get unique values of the columns of interest
# cols = []
# pd.unique(df[cols].values.ravel('F'))  # order 'F' flattens column by column, so values appear in the order of `cols`
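For the unique-values template, NumPy's column-major flattening (`order='F'`) reads the selected columns one after another. `pd.unique` then returns values in order of first appearance. Here is a small check on a made-up frame:

```python
import pandas as pd

# Made-up frame; a value can appear in more than one column
df = pd.DataFrame({'name': ['Ivory', 'Sand', 'Ivory'],
                   'specific': ['Sand', '1W', '2C']})
cols = ['name', 'specific']

# Column-major flattening: all of `name`, then all of `specific`
uniques = pd.unique(df[cols].values.ravel('F'))
print(list(uniques))  # ['Ivory', 'Sand', '1W', '2C']
```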

In [None]:
# # Create custom function
# # Google style docstrings
# # https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
# def custom_function(param1: int, param2: str) -> bool:
#     """Example function with PEP 484 type annotations.

#     Args:
#         param1: The first parameter.
#         param2: The second parameter.

#     Returns:
#         The return value. True for success, False otherwise.

#     """
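Here is one example that follows the Google-style docstring template above. The hypothetical helper converts a `hex` value into HSL lightness, which could be useful for the shade analysis. Using lightness as a stand-in for how fair or deep a shade is, is an assumption; the dataset does not define it.

```python
def hex_lightness(hex_color: str) -> float:
    """Return the HSL lightness of a hex color as a value between 0 and 1.

    Hypothetical helper; HSL lightness is only a rough proxy for how light
    a foundation shade looks.

    Args:
        hex_color: A color string such as '#E4B78E' (leading '#' optional).

    Returns:
        (max(R, G, B) + min(R, G, B)) / 2, with channels scaled to 0-1.

    Raises:
        ValueError: If `hex_color` is not a 6-digit hex string.
    """
    digits = hex_color.lstrip('#')
    if len(digits) != 6:
        raise ValueError(f'Expected a 6-digit hex color, got {hex_color!r}')
    # Parse each two-character channel as base 16 and scale to 0-1
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return (max(r, g, b) + min(r, g, b)) / 2

print(round(hex_lightness('#E4B78E'), 3))  # 0.725
print(round(hex_lightness('#795036'), 3))  # 0.343
```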

In [None]:
# # Apply function to multiple columns
# cols = []
# df_updated = df.copy()
# df_updated[cols] = df_updated[cols].applymap(custom_function)

# # Create new aggregated boolean column
# df_updated['bool'] = df_updated[cols].any(axis=1, skipna=False)
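The multi-column template uses `DataFrame.applymap`. pandas 2.1 renamed it to `DataFrame.map` and deprecated the old name. Mapping each column with `Series.map` works on either version. The sketch below uses a hypothetical `is_olive` check and made-up rows, then builds the aggregated boolean column with `any(axis=1)`.

```python
import pandas as pd

def is_olive(value) -> bool:
    """Hypothetical element-wise check: does the value mention an olive undertone?"""
    return isinstance(value, str) and 'olive' in value.lower()

# Made-up rows; None stands in for a missing name
df = pd.DataFrame({'name': ['Neutral/Olive', 'Warm', None],
                   'description': ['2.20 medium', '1.30 light', 'deep with olive undertones']})
cols = ['name', 'description']

# Apply the check to every cell, one column at a time
flags = df[cols].apply(lambda col: col.map(is_olive))

# Row-wise OR across the flag columns
df['any_olive'] = flags.any(axis=1)
print(df['any_olive'].tolist())  # [True, False, True]
```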

---

# 4

## Data Analysis

Here is where your analysis begins. You can add different sections based on your project goals.

### Exploring `Column Name`

In [None]:
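# Example plot: each bar is drawn in its own hex color.
# color_discrete_map='identity' makes Plotly Express use the values in
# `hex_colors` directly as colors instead of assigning palette colors.
df = pd.DataFrame({'values': [1, 2, 3, 4, 5],
                   'hex_colors': ['#FF0000', '#00FF00', '#0000FF', '#FFFF00', '#FF00FF']})
fig = px.histogram(df, x='values', color='hex_colors', color_discrete_map='identity')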
fig.show()
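Question 3 asks how fair, medium and dark shades are represented. One approach has three steps: compute HSL lightness from `hex`, bin it with `pd.cut`, and count shades per brand with `pd.crosstab`. The minimal sketch below uses a handful of hex codes from the outputs above. The cut-offs of 0.4 and 0.7 are arbitrary assumptions for illustration, not an industry or dermatological standard.

```python
import pandas as pd

# Sample rows using hex codes from the outputs above; column names follow `all_shades`
shades = pd.DataFrame({
    'brand': ['nars', 'nars', 'sephora collection', 'sephora collection', 'beautyblender'],
    'hex': ['#E4B78E', '#795036', '#7B5E5B', '#805C3A', '#FCF0E5'],
})

# RGB channels scaled to 0-1, then HSL lightness = (max + min) / 2
rgb = shades['hex'].str.lstrip('#').apply(lambda h: [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)])
shades['lightness'] = rgb.apply(lambda c: (max(c) + min(c)) / 2)

# Assumed cut-offs for illustration only
shades['tone'] = pd.cut(shades['lightness'], bins=[0, 0.4, 0.7, 1.0],
                        labels=['dark', 'medium', 'fair'], include_lowest=True)

# Number of shades per brand in each tone bin
counts = pd.crosstab(shades['brand'], shades['tone'])
print(counts)
```

Changing the bin edges can move shades between categories, so any conclusions drawn from such counts depend on the chosen cut-offs.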


**Observations**
- Ob 1
- Ob 2
- Ob 3

---

# 5

## Conclusion

### Insights 
State the insights/outcomes of your project or notebook.

### Suggestions

Make suggestions based on insights.

### Possible Next Steps
Areas to expand on:
- (if any)

---

# 6

## Epilogue

### References

This is how to add an inline citation[<sup id="fn1-back">[1]</sup>](#fn1).

[<span id="fn1">1.</span>](#fn1-back) _Author (date)._ Title. Available at: https://website.com (Accessed: Date). 

> Use [https://www.citethisforme.com/](https://www.citethisforme.com/) to create citations.

### Versioning
Notebook and insights by (author).
- Version: 1.5.0
- Date: 2023-05-15

<a style='text-decoration:none;line-height:16px;display:flex;color:#5B5B62;padding:10px;justify-content:end;' href='https://deepnote.com?utm_source=created-in-deepnote-cell&projectId=b689e16b-e36c-4f8c-b17a-b3e876352669' target="_blank">
Created in <span style='font-weight:600;margin-left:4px;'>Deepnote</span></a>