# Foundation Shade Diversity

## Table of Contents

1. [**Introduction**](#1)
    - Project Description
    - Data Description
2. [**Acquiring and Loading Data**](#2)
    - Importing Libraries and Notebook Setup
    - Loading Data
    - Basic Data Exploration
    - Areas to Fix
3. [**Data Preprocessing**](#3)
4. [**Data Analysis**](#4)
5. [**Conclusion**](#5)
    - Insights
    - Suggestions
    - Possible Next Steps
6. [**Epilogue**](#6) 
    - References
    - Versioning

<a id="1"></a>

## Introduction

### Project Description

**Goal/Purpose:** 

In this EDA notebook, we aim to explore the diversity of foundation shades in makeup products. We will analyze a dataset of makeup products and their corresponding shades to gain insights into the range and inclusivity of shades offered by different brands. By visualizing the data and conducting statistical analyses, we seek to provide a comprehensive understanding of foundation shade diversity in the cosmetics industry.
<p>&nbsp;</p>

**Questions to be Answered:**

1. What is the distribution of foundation shades across different brands and product lines?
    - Are some brands more diverse in their shade offerings compared to others?
2. How does foundation shade diversity vary across price points and product types?
    - Do high-end brands offer a wider range of shades compared to budget-friendly brands?
3. Is there evidence of inclusivity in foundation shades for various skin tones?
    - What is the representation of shades for fair, medium, and dark skin tones?
    - Are there gaps in shade options for specific skin tones, and if so, which brands are addressing these gaps?

 

<p>&nbsp;</p>

**Assumptions/Methodology/Scope:** 

Briefly describe assumptions, processing steps, and the scope of this project.

<p>&nbsp;</p>

### Data Description

**Content:** 

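This dataset is a CSV file (`allShades.csv`) of 6,816 data points, each describing one foundation shade: its brand and product, the retailer's product URL and shade description, a swatch image, and the shade's color as a hex code and HSL values.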

<p>&nbsp;</p>

**Description of Attributes:** 



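| Column  | Description |
| :------ | :---------- |
| `index` | Integer row identifier from the source file; in the sampled rows it matches the DataFrame row number |
| `brand` | Brand name (e.g. NARS, Tarte) |
| `product` | Product name |
| `url` | Retailer product page URL (Sephora or Ulta in the sampled rows) |
| `description` | Retailer's shade description, often including depth and undertone |
| `imgSrc` | Source of the shade swatch image (a full URL or a relative path) |
| `imgAlt` | Alt text of the swatch image |
| `name` | Shade name (e.g. Mahogany); missing for about 27% of rows |
| `specific` | Shade code or number (e.g. 48G, Y245); missing for about 28% of rows |
| `colorspace` | Color space of the swatch color (RGB in the sampled rows) |
| `hex` | Hex code of the swatch color |
| `hue` | HSL hue in degrees (0 to 360) |
| `sat` | HSL saturation (0 to 1) |
| `lightness` | HSL lightness (0 to 1) |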

<p>&nbsp;</p>

**Acknowledgements:** 

This dataset is provided by _(source). The original dataset was scraped by _(name), and the original source can be found on [website](https://website.link).

---

<a id="2"></a>

## Acquiring and Loading Data
### Importing Libraries and Notebook Setup

```python
# Data manipulation
import numpy as np
import pandas as pd

# Visualizations
import plotly.express as px
```


### Loading Data

```python
# Load the All Shades DataFrame
all_shades = pd.read_csv('../datasets/allShades.csv')
```

### Basic Data Exploration

### All Shades

#### Check Data Types

```python
# Retrieve general info and a random sample of the 'all_shades' DataFrame
all_shades.info()
all_shades.sample(10)
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6816 entries, 0 to 6815
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   index        6816 non-null   int64  
 1   brand        6816 non-null   object 
 2   product      6816 non-null   object 
 3   url          6816 non-null   object 
 4   description  6816 non-null   object 
 5   imgSrc       6816 non-null   object 
 6   imgAlt       6816 non-null   object 
 7   name         4955 non-null   object 
 8   specific     4905 non-null   object 
 9   colorspace   6816 non-null   object 
 10  hex          6816 non-null   object 
 11  hue          6816 non-null   float64
 12  sat          6816 non-null   float64
 13  lightness    6816 non-null   float64
dtypes: float64(3), int64(1), object(10)
memory usage: 745.6+ KB
```


|  | index | brand | product | url | description | imgSrc | imgAlt | name | specific | colorspace | hex | hue | sat | lightness |
| ---: | ---: | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | ---: | ---: | ---: |
| 3624 | 3624 | NYX Professional Makeup | Born To Glow Naturally Radiant Foundation | https://www.ulta.com/born-glow-naturally-radia... | Mahogany (medium deep w/ neutral undertone) | https://images.ulta.com/is/image/Ulta/2546612s... | Mahogany (medium deep w/ neutral undertone) | Mahogany | NaN | RGB | #A16948 | 22.247191 | 0.381974 | 0.456863 |
| 4132 | 4132 | Armani Beauty | Luminous Silk Perfect Glow Flawless Oil-Free F... | https://www.sephora.com/product/luminous-silk-... | 9 tan to deep, olive | /productimages/sku/s1359694+sw.jpg | 9 1 oz/ 30 mL | NaN | 9 | RGB | #C19370 | 25.925926 | 0.395122 | 0.598039 |
| 2710 | 2710 | L'Oréal | Infallible Fresh Wear 24HR Foundation | https://www.ulta.com/infallible-fresh-wear-24h... | Ivory Buff | https://images.ulta.com/is/image/Ulta/2538177s... | Ivory Buff | Ivory Buff | NaN | RGB | #E8C29F | 28.767123 | 0.613445 | 0.766667 |
| 2732 | 2732 | L'Oréal | Infallible Fresh Wear 24HR Foundation | https://www.ulta.com/infallible-fresh-wear-24h... | Maple | https://images.ulta.com/is/image/Ulta/2558065s... | Maple | Maple | NaN | RGB | #AF7041 | 25.636364 | 0.458333 | 0.470588 |
| 4863 | 4863 | Tarte | SEA Water Foundation Broad Spectrum SPF 15 | https://www.sephora.com/product/sea-water-foun... | 48G tan-deep golden | /productimages/sku/s2349439+sw.jpg | 48G 1 oz/ 30 mL Clean at Sephora | NaN | 48G | RGB | #AE7C61 | 21.038961 | 0.322176 | 0.531373 |
| 6613 | 6613 | Natasha Denona | Foundation X+ | https://www.sephora.com/product/foundation-x-P... | 92 WNDeep - Neutral warm neutral undertone | /productimages/sku/s2213288+sw.jpg | 92 WNDeep - Neutral 1.01 oz/ 30 mL | WNDeep | 92 | RGB | #C69981 | 20.869565 | 0.377049 | 0.641176 |
| 5473 | 5473 | Bobbi Brown | Skin Foundation Stick | https://www.sephora.com/product/foundation-sti... | Warm Walnut (W-096) deep brown with yellow und... | /productimages/sku/s1582261+sw.jpg | Warm Walnut (W-096) 0.31 oz/ 9 g | Warm Walnut | NaN | RGB | #B27845 | 28.073394 | 0.441296 | 0.484314 |
| 2598 | 2598 | NYX Professional Makeup | Can't Stop Won't Stop Foundation | https://www.ulta.com/cant-stop-wont-stop-found... | Neutral Tan (medium light w/ neutral undertone) | https://images.ulta.com/is/image/Ulta/2531989s... | Neutral Tan (medium light w/ neutral undertone) | Neutral Tan | NaN | RGB | #9E7654 | 27.567568 | 0.305785 | 0.47451 |
| 5238 | 5238 | MAKE UP FOR EVER | Ultra HD Invisible Cover Stick Foundation | https://www.sephora.com/product/ultra-hd-invis... | Y245 - Soft Sand for light skin with pink unde... | /productimages/sku/s1713171+sw.jpg | Y245 - Soft Sand 0.44 oz/ 12.5 g | NaN | Y245 | RGB | #D5A485 | 23.25 | 0.487805 | 0.678431 |
| 443 | 443 | NARS | Sheer Glow Foundation | https://www.ulta.com/sheer-glow-foundation?pro... | Tahoe (medium-deep skin w/ golden undertones) | https://images.ulta.com/is/image/Ulta/2505583s... | Tahoe (medium-deep skin w/ golden undertones) | Tahoe | NaN | RGB | #DF934C | 28.979592 | 0.696682 | 0.586275 |


##### Comments
- `brand`, `product`, `url`, `description`, `imgSrc`, `imgAlt`, `name`, `specific`, `colorspace` and `hex` are **strings** (pandas `object` dtype).
- `hue`, `sat` and `lightness` are **floats**.
- `index` is an **integer**.


#### Check Missing Data

```python
# Print the percentage of missing values for every column that has any
missing_percent_shade = all_shades.isna().mean().sort_values(ascending=False)
if missing_percent_shade.any():
    print(missing_percent_shade[missing_percent_shade > 0] * 100)
else:
    print('None')
```

```
specific    28.036972
name        27.303404
dtype: float64
```


```python
# Investigating missing data.
```
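One way to start that investigation is to check whether the missing `name` and `specific` values are concentrated in particular brands. The sketch below computes the share of missing values per brand; `missing_share_by_group` is a hypothetical helper, and the small DataFrame is made-up toy data standing in for `all_shades`.

```python
import pandas as pd


def missing_share_by_group(df: pd.DataFrame, group_col: str, cols: list) -> pd.DataFrame:
    """Return the percentage of missing values in `cols` for each value of `group_col`."""
    # isna() yields booleans; their mean within each group is the missing fraction
    shares = df[cols].isna().groupby(df[group_col]).mean().mul(100)
    return shares.sort_values(cols, ascending=False)


# Toy stand-in for all_shades (hypothetical values)
toy = pd.DataFrame({
    "brand": ["Tarte", "Tarte", "NARS", "NARS"],
    "name": [None, None, "Shade A", "Shade B"],
    "specific": ["A1", "A2", None, None],
})

missing_by_brand = missing_share_by_group(toy, "brand", ["name", "specific"])
print(missing_by_brand)
```

In the notebook the call would be `missing_share_by_group(all_shades, "brand", ["name", "specific"])`. The same function with `group_col` set to a retailer column (if you derive one from `url`) would show whether the gaps follow the retailer rather than the brand.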


#### Check for Duplicate Rows

```python
# Show number of entirely duplicated rows
print(f"No. of entirely duplicated rows: {all_shades.duplicated().sum()}")

# Show duplicated rows
all_shades[all_shades.duplicated()]
```
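If `index` is just a row counter, as the sample suggests (its values match the row labels), every row differs in that column, so `duplicated()` on the full DataFrame cannot find any repeats. A minimal sketch that ignores identifier columns before comparing rows; `count_duplicates` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd


def count_duplicates(df: pd.DataFrame, ignore=("index",)) -> int:
    """Count rows that repeat an earlier row once identifier columns are ignored."""
    # Compare on every column except the ignored identifier columns
    cols = [col for col in df.columns if col not in ignore]
    return int(df.duplicated(subset=cols).sum())


toy = pd.DataFrame({
    "index": [0, 1, 2],
    "brand": ["NARS", "NARS", "Tarte"],
    "hex": ["#DF934C", "#DF934C", "#AE7C61"],
})

print(toy.duplicated().sum())  # 0: the unique 'index' column hides the repeat
print(count_duplicates(toy))   # 1: row 1 repeats row 0 in every other column
```

On the real data, `count_duplicates(all_shades)` gives a more meaningful count; whether such repeats are true duplicates or the same shade listed by two retailers is a separate question.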

#### Check Uniqueness of Data

```python
# Print the percentage similarity of values: 100 / number of unique values (the lower %, the better)
num_unique = all_shades.nunique().sort_values()
print('---- Percentage Similarity of Values (%) -----')
print(100 / num_unique)
```

#### Check Data Range

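```python
# Print summary statistics for all columns
all_shades.describe(include='all')
# skim(all_shades)  # requires the third-party skimpy package: from skimpy import skim
```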
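Beyond min/max values, the three color columns can be cross-checked against `hex`: in the sampled rows, `hue` (in degrees), `sat` and `lightness` match the standard RGB-to-HSL conversion of the hex code. The sketch below recomputes HSL with Python's `colorsys` module and flags rows that disagree; `hex_to_hsl`, `hsl_mismatches` and the tolerance are illustrative choices, and the second toy row is deliberately wrong.

```python
import colorsys

import pandas as pd


def hex_to_hsl(hex_code: str) -> tuple:
    """Convert '#RRGGBB' to (hue in degrees, saturation 0-1, lightness 0-1)."""
    hex_code = hex_code.lstrip("#")
    r, g, b = (int(hex_code[i:i + 2], 16) / 255 for i in (0, 2, 4))
    # colorsys works in HLS order and returns hue as a fraction of a full turn
    h, light, s = colorsys.rgb_to_hls(r, g, b)
    return h * 360, s, light


def hsl_mismatches(df: pd.DataFrame, tol: float = 1e-4) -> pd.DataFrame:
    """Return rows whose stored hue/sat/lightness disagree with their hex code."""
    recomputed = pd.DataFrame(
        df["hex"].map(hex_to_hsl).tolist(),
        columns=["hue", "sat", "lightness"],
        index=df.index,
    )
    diff = (recomputed - df[["hue", "sat", "lightness"]]).abs()
    return df[(diff > tol).any(axis=1)]


toy = pd.DataFrame({
    "hex": ["#A16948", "#C19370"],
    "hue": [22.247191, 0.0],  # second hue deliberately wrong
    "sat": [0.381974, 0.395122],
    "lightness": [0.456863, 0.598039],
})

print(hsl_mismatches(toy))
```

Running `hsl_mismatches(all_shades)` would return an empty DataFrame if every row is internally consistent.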

### Areas to Fix
**Data Types**
- [ ] Issue 1

**Missing Data**
- [ ] `name` (27.3%) and `specific` (28.0%) have missing values

**Duplicate Rows**
- [ ]

**Uniqueness of Data**
- [ ]

**Data Range**
- [ ]

---

<a id="3"></a>

## Data Preprocessing

Here you can add sections like:

- Renaming columns
- Drop Redundant Columns
- Changing Data Types
- Dropping Duplicates
- Handling Missing Values
- Handling Unreasonable Data Ranges
- Feature Engineering / Transformation

Use `assert` where possible to show that preprocessing is done.

### Rename Columns

```python
# # Rename columns to snake_case (clean_columns comes from the third-party skimpy package)
# from skimpy import clean_columns
# df = clean_columns(df, replace={})
```

```python
# # Rename columns
# columns_to_rename = {}
# df.rename(columns=columns_to_rename, inplace=True)
```

```python
# # Verify columns are renamed
# df.columns
```
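If you would rather not add skimpy as a dependency just for `clean_columns`, a small regex helper covers the camelCase names in this dataset (`imgSrc`, `imgAlt`). `to_snake_case` is a hypothetical helper, not part of pandas; `DataFrame.rename` accepts any callable for `columns`.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Convert camelCase or space-separated column names to snake_case."""
    # Insert an underscore before each capital letter that follows a lowercase letter or digit
    name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
    # Replace runs of spaces or hyphens with a single underscore, then lowercase
    return re.sub(r"[\s\-]+", "_", name).lower()


toy = pd.DataFrame(columns=["brand", "imgSrc", "imgAlt", "hex"])
toy = toy.rename(columns=to_snake_case)
print(list(toy.columns))  # ['brand', 'img_src', 'img_alt', 'hex']
```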

### Drop Redundant Columns

```python
# # Check the proportion of the most frequent value in each column
# print('---- Frequency of the Mode (%) -----')
# mode_dict = {col: (df[col].value_counts(dropna=False).iat[0] / df[col].size * 100) for col in df.columns}
# mode_series = pd.Series(mode_dict)
# mode_series
```

```python
# # Show the value frequency of each column greater than the mode's threshold
# threshold = 80
# for col in mode_series[mode_series > threshold].index:
#     print(df[col].value_counts(dropna=False))
#     print()
```

```python
# # Drop columns (specify columns to drop)
# cols_to_drop = []
# df.drop(columns=cols_to_drop, inplace=True)
```

```python
# # Verify columns dropped
# assert all(col not in df.columns for col in cols_to_drop)
```

```python
# # Drop columns (specify column positions a:b); save the names before dropping
# cols_to_drop = df.columns[a:b]
# df.drop(columns=cols_to_drop, inplace=True)
```

```python
# # Verify columns dropped (check the saved names, not the shifted df.columns[a:b])
# assert all(col not in df.columns for col in cols_to_drop)
```

```python
# # Drop columns (specify columns to keep)
# cols_to_keep = []
# df = df[cols_to_keep]
```

```python
# # Verify only the kept columns remain, in the given order
# assert list(df.columns) == cols_to_keep
```
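The sample suggests two candidates for this dataset: `index`, which appears to repeat the row number, and `colorspace`, which is `RGB` in every sampled row. Whether either holds for all 6,816 rows is an assumption, so the sketch below checks before dropping (run it before any rows are removed, or the row-counter check no longer applies). `redundant_columns` is a hypothetical helper and the toy rows are made up.

```python
import pandas as pd


def redundant_columns(df: pd.DataFrame) -> list:
    """Return columns that add no information: a plain row counter or a constant."""
    redundant = []
    # 'index' is redundant if it simply counts rows 0, 1, 2, ...
    if "index" in df.columns and df["index"].tolist() == list(range(len(df))):
        redundant.append("index")
    # Any column holding a single distinct value (NaN counted as a value) is constant
    redundant += [
        col for col in df.columns
        if col not in redundant and df[col].nunique(dropna=False) == 1
    ]
    return redundant


toy = pd.DataFrame({
    "index": [0, 1, 2],
    "brand": ["NARS", "Tarte", "NARS"],
    "colorspace": ["RGB", "RGB", "RGB"],
    "hex": ["#DF934C", "#AE7C61", "#C69981"],
})

cols_to_drop = redundant_columns(toy)
toy = toy.drop(columns=cols_to_drop)
assert all(col not in toy.columns for col in cols_to_drop)
print(cols_to_drop)  # ['index', 'colorspace']
```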

### Changing Data Types

```python
# # Convert columns to the right data types
# import pandas.api.types as ptypes
# df[col] = df[col].astype('string')
# df[col] = df[col].astype('int')
# df[col] = pd.to_datetime(df[col])  # infer_datetime_format is deprecated since pandas 2.0

# # Convert to categorical datatype
# col_cat = ptypes.CategoricalDtype(categories=['A', 'B', 'C'], ordered=True)
# df['col_cat'] = df['col_cat'].astype(col_cat)
```

```python
# # Verify conversion
# assert ptypes.is_string_dtype(df[col])
# assert ptypes.is_numeric_dtype(df[col])
# cols_to_check = []
# assert all(ptypes.is_datetime64_any_dtype(df[col]) for col in cols_to_check)
```
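For this dataset, one reasonable choice (a judgment call, not something the notebook has settled) is to store free-text columns as pandas' nullable `string` dtype and `brand`, which repeats across many shades, as `category`. A sketch on toy rows, using `pandas.api.types` as `ptypes` as the template above assumes:

```python
import pandas as pd
import pandas.api.types as ptypes

toy = pd.DataFrame({
    "brand": ["NARS", "Tarte", "NARS"],
    "product": ["Sheer Glow Foundation", "SEA Water Foundation Broad Spectrum SPF 15", "Sheer Glow Foundation"],
    "name": ["Tahoe", None, "Tahoe"],
    "hue": [28.979592, 21.038961, 28.979592],
})

text_cols = ["product", "name"]
# Free text -> nullable 'string' dtype; repeated brand labels -> 'category'
toy = toy.astype({**{col: "string" for col in text_cols}, "brand": "category"})

# Verify the conversions, in the same spirit as the template's assert cell
assert all(ptypes.is_string_dtype(toy[col]) for col in text_cols)
assert isinstance(toy["brand"].dtype, pd.CategoricalDtype)
assert ptypes.is_float_dtype(toy["hue"])
```

The missing shade name stays missing (as `<NA>`) after conversion, so this step does not hide the gaps found earlier.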

### Dropping Duplicates

```python
# # Drop entirely duplicated rows
# df.drop_duplicates(inplace=True, ignore_index=True)
```

```python
# # Verify rows dropped
# assert df.duplicated().sum()==0
```

### Handling Missing Values
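Since `description` has no missing values, one option, assuming the analysis needs a single readable label per shade, is to fall back from `name` to `specific` to `description` instead of dropping roughly 28% of rows. `shade_label` and `used_description` are hypothetical column names, and the toy rows reuse values from the sample above.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["Mahogany", None, None],
    "specific": [None, "48G", None],
    "description": [
        "Mahogany (medium deep w/ neutral undertone)",
        "48G tan-deep golden",
        "Ivory Buff",
    ],
})

# Fall back from shade name, to shade code, to the free-text description
toy["shade_label"] = toy["name"].fillna(toy["specific"]).fillna(toy["description"])

# Flag rows that had neither a name nor a code, so the fallback stays visible
toy["used_description"] = toy["name"].isna() & toy["specific"].isna()

assert toy["shade_label"].notna().all()
print(toy[["shade_label", "used_description"]])
```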

### Handling Unreasonable Data Ranges

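```python
# # Drop affected rows; drop=True stops reset_index from inserting an 'index' column
# # (this dataset already has one, so reset_index() without it raises a ValueError)
# df = df.loc[~((df['A'] == 0) | (df['B'] > 100))].reset_index(drop=True)
```

```python
# # Verify rows dropped
# len(df)
```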
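For the color columns, the bounds follow from the HSL definition: hue in degrees in [0, 360), saturation and lightness in [0, 1]. A sketch of the drop step with those bounds; `drop_out_of_range` is a hypothetical helper and the toy values are made up, two of them deliberately out of range.

```python
import pandas as pd


def drop_out_of_range(df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with hue in [0, 360) and saturation/lightness in [0, 1]."""
    in_range = (
        df["hue"].between(0, 360, inclusive="left")
        & df["sat"].between(0, 1)
        & df["lightness"].between(0, 1)
    )
    # drop=True avoids inserting an 'index' column (this dataset already has one)
    return df.loc[in_range].reset_index(drop=True)


toy = pd.DataFrame({
    "hue": [22.2, 400.0, 28.8],
    "sat": [0.38, 0.40, 1.20],
    "lightness": [0.46, 0.60, 0.77],
})

cleaned = drop_out_of_range(toy)
assert len(cleaned) == 1
print(cleaned)
```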

### Feature Engineering / Transformation

```python
# # Get unique values of the columns of interest
# cols = []
# pd.unique(df[cols].values.ravel('F'))  # 'F' (column-major) lists values column by column, in the order of cols
```

```python
# # Create custom function
# # Google style docstrings
# # https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
# def custom_function(param1: int, param2: str) -> bool:
#     """Example function with PEP 484 type annotations.

#     Args:
#         param1: The first parameter.
#         param2: The second parameter.

#     Returns:
#         The return value. True for success, False otherwise.

#     """
```

```python
# # Apply function to multiple columns
# cols = []
# df_updated = df.copy()
# df_updated[cols] = df_updated[cols].map(custom_function)  # use .applymap on pandas < 2.1

# # Create new aggregated boolean column
# df_updated['bool'] = df_updated[cols].any(axis=1, skipna=False)
```
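Question 3 asks about shades for fair, medium and dark skin tones. One possible engineered feature is a tone tier binned from `lightness` with `pd.cut`. The cut-offs 0.45 and 0.65 below are made-up placeholders, not values from the notebook or any standard, and swatch lightness is only a rough proxy for skin depth; inspect the distribution before choosing real thresholds.

```python
import pandas as pd

# Hypothetical cut-offs on the 0-1 lightness scale
TIER_BINS = [0.0, 0.45, 0.65, 1.0]
TIER_LABELS = ["dark", "medium", "fair"]

toy = pd.DataFrame({"lightness": [0.40, 0.456863, 0.65, 0.766667]})

toy["tone_tier"] = pd.cut(
    toy["lightness"],
    bins=TIER_BINS,
    labels=TIER_LABELS,
    include_lowest=True,  # so a lightness of exactly 0.0 falls in 'dark'
)
print(toy)
```

Bins are right-closed by default, so a value exactly on a cut-off (0.65 here) goes to the lower tier.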

---

<a id="4"></a>

## Data Analysis

Here is where your analysis begins. You can add different sections based on your project goals.

### Exploring `Column Name`

```python
# Code and visualization
```
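As a starting point for question 1 (are some brands more diverse in their shade offerings?), a per-brand summary of shade counts and lightness span can be computed with a named `groupby` aggregation. `shade_range_by_brand` is a hypothetical helper, lightness span is only one rough measure of range, and the toy rows are made up.

```python
import pandas as pd


def shade_range_by_brand(df: pd.DataFrame) -> pd.DataFrame:
    """Summarise how many shades each brand lists and how wide its lightness range is."""
    summary = df.groupby("brand").agg(
        n_shades=("hex", "count"),
        n_products=("product", "nunique"),
        min_lightness=("lightness", "min"),
        max_lightness=("lightness", "max"),
    )
    # A wider span suggests coverage of both fairer and deeper shades
    summary["lightness_span"] = summary["max_lightness"] - summary["min_lightness"]
    return summary.sort_values("n_shades", ascending=False)


toy = pd.DataFrame({
    "brand": ["NARS", "NARS", "NARS", "Tarte"],
    "product": ["Product A", "Product A", "Product B", "Product C"],
    "hex": ["#DF934C", "#A16948", "#E8C29F", "#AE7C61"],
    "lightness": [0.586275, 0.40, 0.75, 0.531373],
})

summary = shade_range_by_brand(toy)
print(summary)
```

With the notebook's `plotly.express` import, something like `px.bar(summary.reset_index(), x="brand", y="n_shades")` would chart the counts. Note that a shade sold by both retailers is counted once per listing.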

**Observations**
- Ob 1
- Ob 2
- Ob 3

---

<a id="5"></a>

## Conclusion

### Insights 
State the insights/outcomes of your project or notebook.

### Suggestions

Make suggestions based on insights.

### Possible Next Steps
Areas to expand on:
- (if there is any)

---

<a id="6"></a>

## Epilogue

### References

This is how we use inline citation[<sup id="fn1-back">[1]</sup>](#fn1).

[<span id="fn1">1.</span>](#fn1-back) _Author (date)._ Title. Available at: https://website.com (Accessed: Date). 

> Use [https://www.citethisforme.com/](https://www.citethisforme.com/) to create citations.

### Versioning
Notebook and insights by (author).
- Version: 1.5.0
- Date: 2023-05-15
