# Foundation Shade Diversity

## Table of Contents

1. [**Introduction**](#Introduction)
    - Project Description
    - Data Description
2. [**Acquiring and Loading Data**](#2)
    - Importing Libraries and Notebook Setup
    - Loading Data
    - Basic Data Exploration
    - Areas to Fix
3. [**Data Preprocessing**](#3)
4. [**Data Analysis**](#4)
5. [**Conclusion**](#5)
    - Insights
    - Suggestions
    - Possible Next Steps
6. [**Epilogue**](#6) 
    - References
    - Versioning

## Introduction

### Project Description

**Goal/Purpose:** 

This EDA notebook explores the diversity of foundation shades in makeup products. We analyze a dataset of makeup products and their corresponding shades to understand the range and inclusivity of the shades that different brands offer. Visualizations and statistical summaries are used to build a picture of foundation shade diversity in the cosmetics industry.
<p>&nbsp;</p>

**Questions to be Answered:**

1. What is the distribution of foundation shades across different brands and product lines?
    - Are some brands more diverse in their shade offerings compared to others?
2. How does foundation shade diversity vary across price points and product types?
    - Do high-end brands offer a wider range of shades compared to budget-friendly brands?
3. Is there evidence of inclusivity in foundation shades for various skin tones?
    - What is the representation of shades for fair, medium, and dark skin tones?
    - Are there gaps in shade options for specific skin tones, and if so, which brands are addressing these gaps?

 

<p>&nbsp;</p>

**Assumptions/Methodology/Scope:** 

Briefly describe assumptions, processing steps, and the scope of this project.

<p>&nbsp;</p>

### Data Description

**Content:** 

This dataset is a _(filetype) file of _(how many) data points which contains ___. 

<p>&nbsp;</p>

**Description of Attributes:** 



| Column  | Description |
| :------ | :---------- |
| column1 | description1 |

<p>&nbsp;</p>

**Acknowledgements:** 

This dataset is provided by ___. The original dataset was scraped by ___ and the original source can be found on [website](https://website.link).

---

# 2

## Acquiring and Loading Data
### Importing Libraries and Notebook Setup

In [1]:
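# Data manipulation
import numpy as np
import pandas as pd
import pandas.api.types as ptypes  # dtype checks used in the preprocessing cells

# Optional helpers for the commented-out template cells (third-party `skimpy` package)
# from skimpy import clean_columns, skim

# Visualizations
import plotly.express as px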


### Loading Data

In [6]:
# Load the all-shades DataFrame
# Assumes the CSV lives in a `datasets/` folder next to the notebook
all_shades = pd.read_csv('datasets/allShades.csv')
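
A misspelled folder name is an easy mistake to make when loading data, so a small guard around `pd.read_csv` can make the failure easier to diagnose. The sketch below is only an illustration: the helper name `load_csv_checked` is hypothetical, and the demonstration writes a throwaway CSV with made-up columns rather than reading the real dataset.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_checked(path):
    """Read a CSV, raising a descriptive error if the file is missing."""
    path = Path(path)
    if not path.is_file():
        # List CSV files in the target folder (if it exists) to hint at typos
        parent = path.parent
        candidates = sorted(p.name for p in parent.glob("*.csv")) if parent.is_dir() else []
        raise FileNotFoundError(
            f"{path} not found. CSV files found in {parent}: {candidates}"
        )
    return pd.read_csv(path)


# Demonstration on a throwaway file with made-up columns
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.csv"
    demo_path.write_text("brand,shade\nA,Ivory\nB,Espresso\n")
    demo_df = load_csv_checked(demo_path)

print(demo_df.shape)
```

If the folder exists but the file name is wrong, the error message lists the CSV files that are actually there, which usually points straight at the typo.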

### Basic Data Exploration

In [4]:
# # Show rows and columns count
# print(f"Rows count: {df.shape[0]}\nColumns count: {df.shape[1]}")

In [5]:
# df.head()

In [6]:
# df.tail()

#### Check Data Types

In [7]:
# # Show data types
# df.info()

- `column1`, `column2`, `column3` are **strings**.
- `column4` and `column5` are **floats**.
- `column6` is an **integer**.

`column3` should be a **datetime** type instead.

#### Check Missing Data

In [8]:
# # Print percentage of missing values
# missing_percent = df.isna().mean().sort_values(ascending=False)
# print('---- Percentage of Missing Values (%) -----')
# if missing_percent.sum():
#     print(missing_percent[missing_percent > 0] * 100)
# else:
#     print('None')
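
As a minimal sketch of the same missing-value check on data that can actually run, the example below wraps the logic in a hypothetical helper, `missing_percentage`, and applies it to a toy DataFrame. The column names and values are made up for illustration and are not taken from the shades dataset.

```python
import numpy as np
import pandas as pd


def missing_percentage(df):
    """Return the percentage of missing values per column, largest first,
    keeping only columns with at least one missing value."""
    percent = df.isna().mean().sort_values(ascending=False) * 100
    return percent[percent > 0]


# Toy data: column names and values are made up for illustration
toy = pd.DataFrame({
    "brand": ["A", "B", None, "D"],
    "price": [10.0, np.nan, np.nan, 25.0],
    "shade": ["Ivory", "Sand", "Honey", "Espresso"],
})

result = missing_percentage(toy)
print("---- Percentage of Missing Values (%) -----")
print(result if not result.empty else "None")
```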

#### Check for Duplicate Rows

In [9]:
# # Show number of duplicated rows
# print(f"No. of entirely duplicated rows: {df.duplicated().sum()}")

# # Show duplicated rows
# df[df.duplicated()]

#### Check Uniqueness of Data

In [10]:
# # Print the percentage similarity of values (the lower %, the better)
# num_unique = df.nunique().sort_values()
# print('---- Percentage Similarity of Values (%) -----')
# print(100/num_unique)
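
To make the "percentage similarity" figure concrete, here is a small sketch on toy data (all column names are hypothetical). A column with a single distinct value scores 100%, while a column where every value is different scores `100 / n_rows`. Note that `nunique()` ignores missing values, so an entirely empty column would have zero distinct values and produce an infinite score.

```python
import pandas as pd


def value_similarity(df):
    """Return 100 / (number of distinct values) for each column."""
    num_unique = df.nunique().sort_values()
    return 100 / num_unique


# Toy data: column names and values are made up for illustration
toy = pd.DataFrame({
    "product_id": [1, 2, 3, 4],           # all distinct
    "brand": ["A", "A", "B", "B"],        # two distinct values
    "country": ["US", "US", "US", "US"],  # constant column
})

similarity = value_similarity(toy)
print(similarity)
```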

#### Check Data Range

In [11]:
# # Print summary statistics
# df.describe(include='all')
# skim(df)  # requires `from skimpy import skim`

### Areas to Fix
**Data Types**
- [ ] Issue 1

**Missing Data**
- [ ] 

**Duplicate Rows**
- [ ]

**Uniqueness of Data**
- [ ]

**Data Range**
- [ ]

---

# 3

## Data Preprocessing

Here you can add sections like:

- Renaming columns
- Drop Redundant Columns
- Changing Data Types
- Dropping Duplicates
- Handling Missing Values
- Handling Unreasonable Data Ranges
- Feature Engineering / Transformation

Use `assert` where possible to show that preprocessing is done.

### Rename Columns

In [12]:
# # Rename columns to snake_case
# df = clean_columns(df, replace={})

In [13]:
# # Rename columns
# columns_to_rename = {}
# df.rename(columns=columns_to_rename, inplace=True)

In [14]:
# # Verify columns are renamed
# df.columns
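
The `clean_columns` call in the template is not defined in this notebook; it is assumed here to come from the third-party `skimpy` package. As a dependency-free approximation, the sketch below renames columns to snake_case with a hypothetical helper, `to_snake_case`. It is not a reimplementation of `skimpy` and the example column names are made up.

```python
import re

import pandas as pd


def to_snake_case(name):
    """Convert a column label such as 'shadeHex' or 'Brand Name' to snake_case."""
    name = str(name).strip()
    # Insert an underscore at lowercase/digit -> uppercase boundaries (camelCase)
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name)
    return name.strip("_").lower()


# Toy frame: the column names are made up for illustration
toy = pd.DataFrame(columns=["Brand Name", "shadeHex", " Price (USD) "])
renamed = toy.rename(columns=to_snake_case)

print(list(renamed.columns))
```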

### Drop Redundant Columns

In [15]:
# # Check the proportion of the most frequent value in each column
# print('---- Frequency of the Mode (%) -----')
# mode_dict = {col: (df[col].value_counts().iat[0] / df[col].size * 100) for col in df.columns}
# mode_series = pd.Series(mode_dict)
# mode_series

In [16]:
# # Show the value frequency of each column greater than the mode's threshold
# threshold = 80
# for col in mode_series[mode_series > threshold].index:
#     print(df[col].value_counts(dropna=False))
#     print()
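
The two cells above look for near-constant columns, which carry little information and are candidates for dropping. A runnable sketch of the same idea on toy data follows; the helper name `mode_frequency` and the column names are hypothetical.

```python
import pandas as pd


def mode_frequency(df):
    """Return, per column, the share of rows (%) held by the most frequent value."""
    return pd.Series({
        col: df[col].value_counts().iat[0] / df[col].size * 100
        for col in df.columns
    })


# Toy data: column names and values are made up for illustration
toy = pd.DataFrame({
    "finish": ["matte", "matte", "matte", "matte"],  # constant column
    "brand": ["A", "A", "A", "B"],
    "shade": ["Ivory", "Sand", "Honey", "Mocha"],
})

mode_series = mode_frequency(toy)
threshold = 80
near_constant = list(mode_series[mode_series > threshold].index)
print(near_constant)
```

Only `finish` exceeds the 80% threshold here, so it would be the one column worth inspecting before deciding whether to drop it.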

In [17]:
# # Drop columns (specify columns to drop)
# cols_to_drop = []
# df.drop(columns=cols_to_drop, inplace=True)

In [18]:
# # Verify columns dropped
# assert all(col not in df.columns for col in cols_to_drop)

In [19]:
# # Drop columns (specify column indices to drop)
# dropped_cols = list(df.columns[a:b])  # capture the names before dropping
# df.drop(columns=dropped_cols, inplace=True)

In [20]:
# # Verify columns dropped
# assert all(col not in df.columns for col in dropped_cols)

In [21]:
# # Drop columns (specify columns to keep)
# cols_to_keep = []
# df = df[cols_to_keep]

In [22]:
# # Verify that only the selected columns remain
# assert list(df.columns) == cols_to_keep

### Changing Data Types

In [23]:
# # Convert columns to the right data types
# df[col] = df[col].astype('string')
# df[col] = df[col].astype('int')
# df[col] = pd.to_datetime(df[col])  # the format is inferred by default in pandas >= 2.0

# # Convert to categorical datatype
# col_cat = ptypes.CategoricalDtype(categories=['A', 'B', 'C'], ordered=True)
# df['col_cat'] = df['col_cat'].astype(col_cat)

In [24]:
# # Verify conversion
# assert ptypes.is_string_dtype(df[col])
# assert ptypes.is_numeric_dtype(df[col])
# cols_to_check = []
# assert all(ptypes.is_datetime64_any_dtype(df[col]) for col in cols_to_check)
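
The conversion and verification cells can be tried end to end on a toy frame, as sketched below. Everything about the data is an assumption made for illustration: the column names, the values, and the ordering of the `depth` categories.

```python
import pandas as pd
import pandas.api.types as ptypes

# Toy data: every column starts out as plain strings
toy = pd.DataFrame({
    "brand": ["A", "B", "C"],
    "n_shades": ["12", "40", "8"],
    "launched": ["2019-01-15", "2020-06-01", "2021-11-30"],
    "depth": ["light", "deep", "medium"],
})

# Convert each column to a more suitable dtype
toy["brand"] = toy["brand"].astype("string")
toy["n_shades"] = toy["n_shades"].astype("int")
toy["launched"] = pd.to_datetime(toy["launched"])

# Ordered categorical: light < medium < deep
depth_dtype = pd.CategoricalDtype(categories=["light", "medium", "deep"], ordered=True)
toy["depth"] = toy["depth"].astype(depth_dtype)

# Verify the conversions
assert ptypes.is_string_dtype(toy["brand"])
assert ptypes.is_numeric_dtype(toy["n_shades"])
assert ptypes.is_datetime64_any_dtype(toy["launched"])
assert toy["depth"].cat.ordered
```

Because the categorical is ordered, comparisons and `min()`/`max()` follow the declared category order rather than alphabetical order.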

### Dropping Duplicates

In [25]:
# # Drop entirely duplicated rows
# df.drop_duplicates(inplace=True, ignore_index=True)

In [26]:
# # Verify rows dropped
# assert df.duplicated().sum() == 0
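
A short sketch of the drop-and-verify pattern on toy data (made-up brand and shade values) shows what `ignore_index=True` does: the surviving rows are renumbered from zero instead of keeping gaps in the index.

```python
import pandas as pd

# Toy data: the second row is an exact copy of the first
toy = pd.DataFrame({
    "brand": ["A", "A", "B", "A"],
    "shade": ["Ivory", "Ivory", "Sand", "Honey"],
})

# Count rows that are identical to an earlier row
n_duplicates = int(toy.duplicated().sum())

# Drop them and renumber the index from zero
deduped = toy.drop_duplicates(ignore_index=True)

# Verify rows dropped
assert deduped.duplicated().sum() == 0
print(n_duplicates, len(deduped))
```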

### Handling Missing Values

### Handling Unreasonable Data Ranges

In [27]:
# # Drop affected rows
# df = df.loc[~((df['A'] == 0) | (df['B'] > 100))].reset_index(drop=True)

In [28]:
# # Verify rows dropped
# assert not ((df['A'] == 0) | (df['B'] > 100)).any()
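
The range filter can be sketched on toy data as follows. The columns `price` and `lightness` and the rule "price must be non-zero, lightness must not exceed 100" are assumptions chosen purely for illustration. Passing `drop=True` to `reset_index` keeps the old index from being added as a new column.

```python
import pandas as pd

# Toy data: column names, values, and validity rules are made up
toy = pd.DataFrame({
    "price": [0.0, 12.5, 30.0, 45.0],
    "lightness": [55, 120, 80, 20],
})

# Flag rows with a zero price or a lightness above 100
invalid = (toy["price"] == 0) | (toy["lightness"] > 100)
cleaned = toy.loc[~invalid].reset_index(drop=True)

# Verify that no affected rows remain
assert not ((cleaned["price"] == 0) | (cleaned["lightness"] > 100)).any()
print(len(toy) - len(cleaned), "rows dropped")
```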

### Feature Engineering / Transformation

In [29]:
# # Get unique values of the columns of interest
# cols = []
# pd.unique(df[cols].values.ravel('K'))  # order 'K' reads the values in the order they are stored in memory

In [30]:
# # Create custom function
# # Google style docstrings
# # https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
# def custom_function(param1: int, param2: str) -> bool:
#     """Example function with PEP 484 type annotations.
#
#     Args:
#         param1: The first parameter.
#         param2: The second parameter.
#
#     Returns:
#         The return value. True for success, False otherwise.
#
#     """
#     return True  # placeholder so the template returns a bool

In [31]:
# # Apply function to multiple columns
# cols = []
# df_updated = df.copy()
# df_updated[cols] = df_updated[cols].applymap(custom_function)  # on pandas >= 2.1 use `.map`; `applymap` is deprecated there

# # Create new aggregated boolean column
# df_updated['bool'] = df_updated[cols].any(axis=1, skipna=False)
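
As a hedged illustration of the pattern above, the sketch below applies an element-wise function to several text columns and then aggregates the results into one boolean column. The function `mentions_undertone`, the column names, and the data are all hypothetical. It uses `DataFrame.apply` with `Series.map` so that it does not depend on whether the installed pandas version offers `applymap` or `DataFrame.map`.

```python
import pandas as pd


def mentions_undertone(value):
    """Return True if the text mentions a warm, cool, or neutral undertone."""
    text = str(value).lower()
    return any(word in text for word in ("warm", "cool", "neutral"))


# Toy data: column names and values are made up for illustration
toy = pd.DataFrame({
    "name": ["Warm Ivory", "Sand", "Deep Espresso"],
    "description": ["light shade", "neutral beige", "rich brown"],
})

# Apply the function to every cell of the selected columns
cols = ["name", "description"]
flags = toy[cols].apply(lambda column: column.map(mentions_undertone))

# Aggregate: True if any of the selected columns matched
toy["mentions_undertone"] = flags.any(axis=1)
print(toy)
```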

---

# 4

## Data Analysis

Here is where your analysis begins. You can add different sections based on your project goals.
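
One way to start on the questions from the introduction is to count distinct shades per brand and to cross-tabulate brands against broad tone groups. The sketch below shows the mechanics on toy data only. The column names `brand`, `shade`, and `lightness`, the 0 to 100 lightness scale, and the cut points for the three groups are all assumptions, not properties of the actual dataset.

```python
import pandas as pd

# Toy data: brands, shade names, and lightness values are all made up
toy = pd.DataFrame({
    "brand": ["A", "A", "A", "B", "B", "C"],
    "shade": ["Ivory", "Sand", "Espresso", "Porcelain", "Beige", "Mocha"],
    "lightness": [88, 70, 25, 92, 75, 30],
})

# Number of distinct shades offered by each brand
shades_per_brand = (
    toy.groupby("brand")["shade"].nunique().sort_values(ascending=False)
)

# Bucket lightness into three illustrative groups; the cut points are arbitrary
toy["tone_group"] = pd.cut(
    toy["lightness"],
    bins=[0, 40, 80, 100],
    labels=["dark", "medium", "fair"],
)

# Shades per brand in each tone group; zeros reveal gaps in a brand's range
tone_table = pd.crosstab(toy["brand"], toy["tone_group"])

print(shades_per_brand)
print(tone_table)
```

In this toy table, a zero in a brand's row marks a tone group that the brand does not cover, which is the kind of gap the third question asks about.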

### Exploring `Column Name`

In [32]:
# Code and visualization

**Observations**
- Ob 1
- Ob 2
- Ob 3

---

# 5

## Conclusion

### Insights 
State the insights/outcomes of your project or notebook.

### Suggestions

Make suggestions based on insights.

### Possible Next Steps
Areas to expand on:
- (if there is any)

---

# 6

## Epilogue

### References

This is how we use inline citation[<sup id="fn1-back">[1]</sup>](#fn1).

[<span id="fn1">1.</span>](#fn1-back) _Author (date)._ Title. Available at: https://website.com (Accessed: Date). 

> Use [https://www.citethisforme.com/](https://www.citethisforme.com/) to create citations.

### Versioning
Notebook and insights by (author).
- Version: 1.5.0
- Date: 2023-05-15
