# DATA MANIPULATION

Data manipulation refers to the process of transforming and restructuring data to make it suitable for analysis or visualization. Pivot tables are a common technique used in data manipulation, especially for summarizing and aggregating data in a structured format.
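As a minimal sketch of the pivot-table idea, the snippet below summarizes a small, hypothetical table (the rows and brand names `A` and `B` are made up for illustration; only the column names mirror the cosmetics dataset loaded later). It assumes the mean price per label and brand is the summary of interest.

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the cosmetics dataset.
toy = pd.DataFrame({
    "Label": ["Moisturizer", "Moisturizer", "Cleanser", "Cleanser"],
    "Brand": ["A", "B", "A", "B"],
    "Price": [100, 50, 20, 30],
})

# One row per Label, one column per Brand, each cell holding the mean Price.
pivot = toy.pivot_table(index="Label", columns="Brand", values="Price", aggfunc="mean")
print(pivot)
```

Swapping `aggfunc` for `"sum"`, `"count"`, or `"median"` changes the summary without changing the layout.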

## Notes

Before manipulating data, check the dataset: is it still dirty (missing values or irregular entries that must be cleaned first), or is it already clean?
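A quick way to run such a check might look like the sketch below. The toy DataFrame is hypothetical and deliberately contains one missing value and one duplicated row; on real data the same calls would be applied to the loaded DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing price and one duplicated row.
toy = pd.DataFrame({
    "Name": ["Cream", "Serum", "Serum", "Toner"],
    "Price": [38.0, 80.0, 80.0, np.nan],
})

missing_per_column = toy.isna().sum()    # number of NaN values in each column
duplicate_rows = toy.duplicated().sum()  # rows identical to an earlier row

print(missing_per_column)
print(duplicate_rows)
```

Non-zero counts in either result suggest the data needs cleaning before any manipulation.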

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

In [2]:
#load new data

df = pd.read_csv('cosmetics.csv')

In [3]:
# view the first rows of the data (default: 5 rows)

df.head()

Unnamed: 0,Label,Brand,Name,Price,Rank,Ingredients,Combination,Dry,Normal,Oily,Sensitive
0,Moisturizer,LA MER,Crème de la Mer,175,4.1,"Algae (Seaweed) Extract, Mineral Oil, Petrolat...",1,1,1,1,1
1,Moisturizer,SK-II,Facial Treatment Essence,179,4.1,"Galactomyces Ferment Filtrate (Pitera), Butyle...",1,1,1,1,1
2,Moisturizer,DRUNK ELEPHANT,Protini™ Polypeptide Cream,68,4.4,"Water, Dicaprylyl Carbonate, Glycerin, Ceteary...",1,1,1,1,0
3,Moisturizer,LA MER,The Moisturizing Soft Cream,175,3.8,"Algae (Seaweed) Extract, Cyclopentasiloxane, P...",1,1,1,1,1
4,Moisturizer,IT COSMETICS,Your Skin But Better™ CC+™ Cream with SPF 50+,38,4.1,"Water, Snail Secretion Filtrate, Phenyl Trimet...",1,1,1,1,1


In [4]:
# dataset summary: column dtypes and non-null counts

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1472 entries, 0 to 1471
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Label        1472 non-null   object 
 1   Brand        1472 non-null   object 
 2   Name         1472 non-null   object 
 3   Price        1472 non-null   int64  
 4   Rank         1472 non-null   float64
 5   Ingredients  1472 non-null   object 
 6   Combination  1472 non-null   int64  
 7   Dry          1472 non-null   int64  
 8   Normal       1472 non-null   int64  
 9   Oily         1472 non-null   int64  
 10  Sensitive    1472 non-null   int64  
dtypes: float64(1), int64(6), object(4)
memory usage: 126.6+ KB


Every column has 1472 non-null values, so the dataset has no missing values and we can proceed with data manipulation.

## 1. Data Binning

Data binning is a data preprocessing technique, alongside steps such as handling missing values, formatting, normalization, and standardization. Binning groups values into a small number of intervals or categories ("bins"). It can be used to convert numerical values into categorical values or to quantize numerical values.
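As an illustration of turning numbers into categories, here is a minimal sketch using `pd.cut`. The prices are taken from rows shown in this notebook, but the bin edges and the labels `low`, `mid`, and `high` are assumptions chosen only for the example.

```python
import pandas as pd

# Prices that appear in rows shown in this notebook.
prices = pd.Series([15, 38, 68, 98, 175])

# Bin edges and labels are illustrative assumptions.
# By default each interval is closed on the right: (0, 30], (30, 100], (100, 200].
price_band = pd.cut(prices, bins=[0, 30, 100, 200], labels=["low", "mid", "high"])
print(price_band.tolist())
```

`pd.cut` uses fixed edges; `pd.qcut` is the alternative when bins should hold roughly equal numbers of rows.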

Bin the products into discount categories:

In [5]:
#special sale 30% on skincare for combination or sensitive skin types

df['discount']=df['Combination']+df['Sensitive']

In [6]:
# label every row in one step; chained indexing such as df['discount'][mask] = ...
# triggers pandas' SettingWithCopyWarning, see
# https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['discount'] = np.where(df['discount'] > 0, '30% off', 'normal price')


In [7]:
# inspect a slice of rows to check the discount column

df.iloc[573:583]

Unnamed: 0,Label,Brand,Name,Price,Rank,Ingredients,Combination,Dry,Normal,Oily,Sensitive,discount
573,Cleanser,CLARISONIC,Refreshing Gel Cleanser,19,5.0,"Water, Glycerin, Coco-Betaine, Sodium Cocoyl G...",1,1,1,1,1,30% off
574,Cleanser,FENTY BEAUTY BY RIHANNA,Invisimatte Blotting Paper,16,3.9,Visit the FENTY BEAUTY by Rihanna boutique,0,0,0,0,0,normal price
575,Cleanser,CLINIQUE,Facial Soap with Dish,15,4.4,"Sodium Palmate/Cocoate Or/Ou Palm Kernelate , ...",0,0,0,0,0,normal price
576,Cleanser,ALGENIST,Hydrating Essence Toner,25,4.4,"Water, Butylene Glycol, Sodium PCA, Hamamelis ...",0,0,0,0,0,normal price
577,Cleanser,REN CLEAN SKINCARE,Micro Polish Cleanser,32,4.5,Citrus Aurantium Bergamia (Bergamot) Leaf Extr...,0,0,0,0,0,normal price
578,Cleanser,DERMADOCTOR,Kakadu C™ Brightening Daily Cleanser with Vita...,36,3.9,"Aloe Barbadensis Leaf Extract, Decyl Glucoside...",0,0,0,0,0,normal price
579,Treatment,DRUNK ELEPHANT,C-Firma™ Day Serum,80,4.1,"Water, Ethoxydiglycol, Ascorbic Acid, Glycerin...",1,1,1,1,0,30% off
580,Treatment,SUNDAY RILEY,Good Genes All-In-One Lactic Acid Treatment,158,4.3,"Opuntia Tuna Fruit (Prickly Pear) Extract, Aga...",1,1,1,1,0,30% off
581,Treatment,ESTÉE LAUDER,Advanced Night Repair Synchronized Recovery Co...,98,4.3,Advanced Night Repr Sync Rec Cmp11 Division: E...,1,1,1,1,1,30% off
582,Treatment,OLEHENRIKSEN,Truth Serum®,48,4.3,"Water, Sodium Ascorbyl Phosphate, Calcium Asco...",1,1,1,1,1,30% off
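A slice only shows a handful of rows. One way to check the binning across a whole table is to count how many rows fall into each category. The sketch below does this on hypothetical toy rows, not on `cosmetics.csv`, and writes the discount rule as an explicit boolean OR, which is assumed to be equivalent to summing the two 0/1 flags and testing for a value above zero.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; Combination and Sensitive are 0/1 flags as in the dataset.
toy = pd.DataFrame({
    "Combination": [1, 0, 1, 0],
    "Sensitive":   [1, 0, 0, 1],
})

# A product qualifies when it suits combination OR sensitive skin.
eligible = (toy["Combination"] == 1) | (toy["Sensitive"] == 1)
toy["discount"] = np.where(eligible, "30% off", "normal price")

# Number of rows in each discount category.
counts = toy["discount"].value_counts()
print(counts)
```

On the real data the same `value_counts()` call would be applied to `df['discount']`.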
