## Survey for Skincare Startup

In this project, we analyze a survey of 10,000 Malaysians about their skincare usage and their opinions on AI technology for skincare.

In [1]:
import numpy as np
import pandas as pd

pd.options.display.max_columns = None # Display all columns

In [2]:
df = pd.read_csv('SkincareSurveyData.csv')

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.sort_values('Timestamp', ascending=True)

In [3]:
df.shape

(10000, 31)

In [4]:
df['form_id'] = ['F' + str(form).zfill(5) 
                 for form in np.arange(1, df.shape[0] + 1)]
id_col = df.pop('form_id')
df.insert(0, id_col.name, id_col)

In [5]:
df.shape

(10000, 32)

In [6]:
df

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,Do you agree that skincare is important ?,Have you ever used any skin care products?,"Which, if any, of the following statements applies to you?",Which of the following types of ingredients would make you more likely to buy a skin care product?,How do/did you choose your products?,Do you use samples before buying skincare products?,How often do you buy skincare products?,How willing are you to try different skin care products?,Where do you purchase your skin care products?,"On average, how much do you spend on skincare products each month?",I wasting to much time to find out skincare and routine that suits my skin.,I being doubtful about the information shared by influencers and brand promoted content.,I feel difficult to understand the list of ingredients on the products.,I bought expensive product but doesn't see any improvement on my skin.,I unaware of which ingredients is the best or to avoid according my skin type.,"I experienced allergies after use a new skincare products ( Eg : Rashness, acne, purging etc)",I want to reduce time to find which routine / products suits with my skin.,I want have solutions from expert in effortless and cheap way.,I want to have my personalized skincare routine that suitable with my current products.,I want to gain knowledge of skincare regime in easy and understandable way.,I want to adapt a healthy lifestyle for a glowing and healthy skin.,Do you think that technology can improve your skincare routine?,Do you have heard about AI (Artificial Intelligent)?,"After you know about AI, do you want to have a skin scanning app that can customize skincare regime?",Do you feel excited to use this skincare application?,Do share your skincare goals and motivation with us!
89,F00001,2022-03-22 01:15:00,Female,19,JAVANESSE,Consultant,3,Yes,I struggle with acne and breakout;I have an un...,Natural ingredients;Alcohol-free ingredients;P...,Dermatologist's advice,Yes,No,3,Internet / online;Mall / Department Store,RM 150 - RM 200,4,2,4,3,4,3,1,5,1,3,5,No,Yes,No,3,I just want healthy skin and confident with my...
4032,F00002,2022-03-22 02:33:00,Male,18,JAVANESSE,Own,3,No,I have pigmentation;I have redness and sensiti...,I have brown spots from sun damages,Friend/Relative recommendation;Product's ingre...,No,Yes,1,Internet / online;Shoppee Official Store,> RM 200,1,5,2,3,3,2,1,5,1,5,3,No,No,No,3,To address skin dullness and lack of radiance
5861,F00003,2022-03-22 02:34:00,Female,33,Iban,Engineer,3,Yes,I have pigmentation;I have a combination skin ...,Natural ingredients;Noncomedogenic ingredients...,Observation and review from the other users,No,Never,5,Internet / online;Pharmacy,Less than RM 50,2,3,3,3,5,2,5,1,3,1,1,No,Yes,Yes,2,To have a healthy skin without spending too mu...
2305,F00004,2022-03-22 03:07:00,Female,21,Indian,Software Engineer,4,Yes,I have oily skin;I suffer from redness and sen...,Noncomedogenic ingredients (ingredients that d...,Friend/Relative recommendation;Brand;Dermatolo...,No,Never,4,Internet / online;Specialized beauty shops,RM 100 - RM 150,5,2,4,3,2,1,1,4,3,3,3,No,No,No,4,To achieve healthier and clearer skin
7388,F00005,2022-03-22 03:10:00,Female,31,Arab,Student,4,Yes,I have dry and dull skin;I have sensitive skin...,Natural ingredients;Noncomedogenic ingredients...,Brand;Pricing;Product's ingredients,No,Sometimes,4,Mall / Department Store,RM 200 - RM 300,3,3,2,3,2,2,4,3,4,3,5,Yes,Yes,Yes,5,To keep my skin hydrated and healthy-looking
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2593,F09996,2022-11-29 21:22:00,Female,34,Sudanese,Manager,3,Yes,I struggle with acne and breakout;I have dry a...,I have brown spots from sun damages,Friend/Relative recommendation;Dermatologist's...,No,Often,2,Mall/Department Store,RM 100 - RM 150,1,5,4,3,2,2,3,3,2,3,3,Yes,No,No,5,To achieve a flawless makeup base through skin...
6015,F09997,2022-11-29 21:30:00,Male,20,Bumiputra Sabah (Bajau),Manager,2,No,I have dry and dull skin;I have an uneven skin...,Natural ingredients;Noncomedogenic ingredients...,Friend/Relative recommendation;Brand;Pricing;P...,Yes,Yes,4,Internet / online;Shoppee Official Store,> RM 200,5,3,3,2,2,1,5,4,3,3,1,Yes,No,No,5,To address specific concerns like dark circles...
1079,F09998,2022-11-29 22:08:00,Female,19,Bumiputra Sabah (Bajau),Designer,3,Yes,I have pigmentation; I suffer from redness and...,Noncomedogenic ingredients (ingredients that d...,Friend/Relative recommendation;Brand;Pricing;P...,No,No,1,Internet / online;Pharmacy;Specialty beauty store,RM 50 - RM 100,4,4,5,4,1,3,2,4,2,3,5,No,Yes,Yes,2,To minimize the appearance of fine lines and w...
6922,F09999,2022-11-29 22:49:00,Male,23,Arabian,Student,Yes,Yes,I suffer from redness and sensitivity,Natural ingredients;Cruelty-free ingredients;V...,Pricing;Product's ingredients,Yes,Very often,3,Internet / online;Pharmacy;Specialized beauty ...,> RM 200,5,1,2,5,4,4,5,5,1,4,2,No,No,No,3,To achieve a glowing complexion and reduce ble...


In [7]:
## Anomaly check (1): has used skincare, yet answers 'No'/'Never' on buying it
df.loc[
    (df['Have you ever used any skin care products?'] == 'Yes') &
    (df['How often do you buy skincare products?'].isin(['No', 'Never']))
]

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,Do you agree that skincare is important ?,Have you ever used any skin care products?,"Which, if any, of the following statements applies to you?",Which of the following types of ingredients would make you more likely to buy a skin care product?,How do/did you choose your products?,Do you use samples before buying skincare products?,How often do you buy skincare products?,How willing are you to try different skin care products?,Where do you purchase your skin care products?,"On average, how much do you spend on skincare products each month?",I wasting to much time to find out skincare and routine that suits my skin.,I being doubtful about the information shared by influencers and brand promoted content.,I feel difficult to understand the list of ingredients on the products.,I bought expensive product but doesn't see any improvement on my skin.,I unaware of which ingredients is the best or to avoid according my skin type.,"I experienced allergies after use a new skincare products ( Eg : Rashness, acne, purging etc)",I want to reduce time to find which routine / products suits with my skin.,I want have solutions from expert in effortless and cheap way.,I want to have my personalized skincare routine that suitable with my current products.,I want to gain knowledge of skincare regime in easy and understandable way.,I want to adapt a healthy lifestyle for a glowing and healthy skin.,Do you think that technology can improve your skincare routine?,Do you have heard about AI (Artificial Intelligent)?,"After you know about AI, do you want to have a skin scanning app that can customize skincare regime?",Do you feel excited to use this skincare application?,Do share your skincare goals and motivation with us!
89,F00001,2022-03-22 01:15:00,Female,19,JAVANESSE,Consultant,3,Yes,I struggle with acne and breakout;I have an un...,Natural ingredients;Alcohol-free ingredients;P...,Dermatologist's advice,Yes,No,3,Internet / online;Mall / Department Store,RM 150 - RM 200,4,2,4,3,4,3,1,5,1,3,5,No,Yes,No,3,I just want healthy skin and confident with my...
5861,F00003,2022-03-22 02:34:00,Female,33,Iban,Engineer,3,Yes,I have pigmentation;I have a combination skin ...,Natural ingredients;Noncomedogenic ingredients...,Observation and review from the other users,No,Never,5,Internet / online;Pharmacy,Less than RM 50,2,3,3,3,5,2,5,1,3,1,1,No,Yes,Yes,2,To have a healthy skin without spending too mu...
2305,F00004,2022-03-22 03:07:00,Female,21,Indian,Software Engineer,4,Yes,I have oily skin;I suffer from redness and sen...,Noncomedogenic ingredients (ingredients that d...,Friend/Relative recommendation;Brand;Dermatolo...,No,Never,4,Internet / online;Specialized beauty shops,RM 100 - RM 150,5,2,4,3,2,1,1,4,3,3,3,No,No,No,4,To achieve healthier and clearer skin
1064,F00014,2022-03-22 09:56:00,Female,> 35,korean,Consultant,3,Yes,I have oily skin;I suffer from redness and sen...,I have brown spots from sun damages,Dermatologist's advice;Product's ingredients,No,No,4,Mall / Department Store;Online,RM 150 - RM 200,3,2,3,2,4,3,5,5,1,3,5,Yes,No,No,4,beauty is pain.
280,F00016,2022-03-22 10:28:00,Male,20-25,Indian,Doctor,4,Yes,I have an uneven skin tone;I have brown spots ...,Natural ingredients;Alcohol-free ingredients;P...,Brand;Dermatologist's advice;Pricing,Yes,Never,3,Internet / online;Mall / Department Store;Phar...,RM 100 - RM 150,5,1,1,2,3,5,3,5,3,5,1,Yes,No,No,4,To achieve healthier and clearer skin
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5386,F09951,2022-11-28 15:17:00,Male,18,Korean,Consultant,3,Yes,I have oily skin;I suffer from redness and sen...,Noncomedogenic ingredients (ingredients that d...,Brand;Pricing,Yes,No,1,Internet / online;Mall / Department Store;Spec...,Less than RM 20,3,5,5,4,5,4,2,3,4,2,1,Yes,Yes,Yes,5,To achieve clearer and healthier skin
7021,F09957,2022-11-28 17:54:00,Male,35-40,Arab,Software Engineer,4,Yes,I have an uneven skin tone;I have brown spots ...,Natural ingredients;Noncomedogenic ingredients...,Brand;Dermatologist's advice;Pricing;Product's...,Yes,No,2,Internet/online;Mall/Department Store;Pharmacy,RM 20 - RM 50,1,4,2,5,4,3,4,4,4,4,5,Yes,Yes,No,5,To improve overall skin health and vitality
5476,F09961,2022-11-28 19:54:00,Female,> 35,Libya,Student,5,Yes,I have dry and dull skin;I have an uneven skin...,Natural ingredients;Cruelty-free ingredients;V...,Friend/Relative recommendation;Pricing;Adverti...,Yes,No,2,Internet / online,RM 100 - RM 150,1,2,5,4,4,3,5,5,3,5,3,Yes,No,Yes,1,To get rid of my acne scars
5159,F09974,2022-11-29 06:36:00,Male,35-40,Malay,Teacher,5,Yes,I have oily skin;blackheads and whiteheads due...,Alcohol-free ingredients,Pricing;Product's ingredients,No,Never,2,Internet / online;Mall / Department Store;Phar...,< RM 50,1,5,2,3,2,2,1,5,1,3,2,No,Yes,No,1,I wish to have free pigmentation on my face an...


In [8]:
## Anomaly check (2): has never used skincare, yet reports buying it
df.loc[
    (df['Have you ever used any skin care products?'] == 'No') &
    ~(df['How often do you buy skincare products?'].isin(['No', 'Never']))
]

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,Do you agree that skincare is important ?,Have you ever used any skin care products?,"Which, if any, of the following statements applies to you?",Which of the following types of ingredients would make you more likely to buy a skin care product?,How do/did you choose your products?,Do you use samples before buying skincare products?,How often do you buy skincare products?,How willing are you to try different skin care products?,Where do you purchase your skin care products?,"On average, how much do you spend on skincare products each month?",I wasting to much time to find out skincare and routine that suits my skin.,I being doubtful about the information shared by influencers and brand promoted content.,I feel difficult to understand the list of ingredients on the products.,I bought expensive product but doesn't see any improvement on my skin.,I unaware of which ingredients is the best or to avoid according my skin type.,"I experienced allergies after use a new skincare products ( Eg : Rashness, acne, purging etc)",I want to reduce time to find which routine / products suits with my skin.,I want have solutions from expert in effortless and cheap way.,I want to have my personalized skincare routine that suitable with my current products.,I want to gain knowledge of skincare regime in easy and understandable way.,I want to adapt a healthy lifestyle for a glowing and healthy skin.,Do you think that technology can improve your skincare routine?,Do you have heard about AI (Artificial Intelligent)?,"After you know about AI, do you want to have a skin scanning app that can customize skincare regime?",Do you feel excited to use this skincare application?,Do share your skincare goals and motivation with us!
4032,F00002,2022-03-22 02:33:00,Male,18,JAVANESSE,Own,3,No,I have pigmentation;I have redness and sensiti...,I have brown spots from sun damages,Friend/Relative recommendation;Product's ingre...,No,Yes,1,Internet / online;Shoppee Official Store,> RM 200,1,5,2,3,3,2,1,5,1,5,3,No,No,No,3,To address skin dullness and lack of radiance
5215,F00007,2022-03-22 04:47:00,Female,34,Sudanese,Employee,5,No,I suffer from redness and sensitivity; I have ...,Natural ingredients;Noncomedogenic ingredients...,Salesman's recommendation,No,Yes,5,Specialized beauty shops,> RM 150,3,4,1,4,2,3,3,2,2,5,3,Yes,No,No,1,To achieve a more balanced and even skin compl...
9696,F00010,2022-03-22 06:52:00,Male,20-25,Bumiputra sabah,Employee,5,No,I have dry and dull skin;I have sensitive skin...,Natural ingredients,Pricing;Product's ingredients,Yes,Very often,2,Internet / online;Pharmacy;Specialized beauty ...,RM 150 - RM 200,5,5,2,3,2,2,3,1,4,4,2,No,Yes,No,1,To incorporate more plant-based ingredients fo...
8460,F00012,2022-03-22 09:03:00,Female,30-35,Arab,Teacher,Yes,No,I have a combination skin with dry and oily pl...,Natural ingredients;Hypoallergenic ingredients,Friend/Relative recommendation;Dermatologist's...,No,Sometimes,3,Internet / online;Pharmacy,< RM 50,1,5,5,3,3,1,3,1,2,5,2,Yes,No,Yes,3,Whitening and avoid dryness / too oily.
6550,F00013,2022-03-22 09:29:00,Male,24,Iban,Software Engineer,3,No,I have dry and dull skin;I have a combination ...,Natural ingredients;Petroleum-free ingredients,Friend/Relative recommendation;Brand;Dermatolo...,Yes,Yes,2,Mall/Department Store,< RM 50,5,2,3,1,5,4,1,3,2,2,1,No,Yes,No,2,To address signs of fatigue and dullness in my...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9439,F09988,2022-11-29 17:51:00,Female,32,Libya,Software Engineer,3,No,I struggle with acne and breakout;I have a com...,Natural ingredients;Alcohol-free ingredients;O...,Friend/Relative recommendation;Brand;Pricing;P...,Yes,Very often,4,Internet / online;Pharmacy;Specialized beauty ...,Less than RM 20,5,5,3,2,2,5,5,1,5,4,4,Yes,Yes,No,5,To achieve healthier and clearer skin
3278,F09992,2022-11-29 19:51:00,Male,27,Bumiputra Sabah (bajau),Architect,3,No,I have oily skin;I struggle with acne and brea...,Natural ingredients;Oil-free ingredients,Brand;Product's ingredients,Yes,Sometimes,4,Mall / Department Store;Pharmacy;Specialized b...,> RM 150,1,1,1,2,2,5,1,3,5,2,2,No,Yes,No,5,To achieve a healthy glow and radiant complexion
8806,F09993,2022-11-29 20:16:00,Male,18,JAVANESSE,Engineer,3,No,I have a combination skin with dry and oily pl...,Natural ingredients;Alcohol-free ingredients,Friend/Relative recommendation;Brand;Pricing;P...,No,Very often,5,Internet / online;Mall / Department Store;Phar...,RM 200 - RM 300,4,4,5,4,1,2,1,3,5,3,4,No,Yes,Yes,1,To address hyperpigmentation and achieve an ev...
969,F09995,2022-11-29 20:50:00,Male,23,Indian,Doctor,5,No,I have dry and dull skin;I have sensitive skin...,Natural ingredients;Noncomedogenic ingredients...,Friend/Relative recommendation;Dermatologist's...,Yes,Yes,1,Internet / online;Pharmacy;Specialty beauty store,RM 20 - RM 50,2,2,3,1,3,3,1,1,2,2,2,No,No,No,4,To have a healthy skin and no breakout acne ag...


In [9]:
# Dropping contradictory responses, since we want answers that are
# consistent from purchase to usage

## (1) 'Yes, has used skincare' but buys 'No'/'Never', AND
## (2) 'No, has not used skincare' but buys anything other than 'No'/'Never'

df = df.drop(df.loc[
    (df['Have you ever used any skin care products?'] == 'Yes') &
    (df['How often do you buy skincare products?'].isin(['No', 'Never']))
].index)

df = df.drop(df.loc[
    (df['Have you ever used any skin care products?'] == 'No') &
    ~(df['How often do you buy skincare products?'].isin(['No', 'Never']))
].index)
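
The two `drop` calls above can also be read as a single rule: keep a row only when the "used" and "buys" answers agree. Below is a minimal sketch of that idea on a toy frame. The short column names `used` and `buy_freq` are hypothetical stand-ins for the long survey questions, and the sketch assumes the "used" column holds only 'Yes' or 'No'.

```python
import pandas as pd

# Toy stand-in for the survey; column names are shortened and hypothetical.
toy = pd.DataFrame({
    'used': ['Yes', 'Yes', 'No', 'No'],
    'buy_freq': ['Often', 'Never', 'Yes', 'No'],
})

has_used = toy['used'] == 'Yes'
never_buys = toy['buy_freq'].isin(['No', 'Never'])

# A response is consistent when exactly one of the two flags is True:
# a user who buys, or a non-user who never buys.
consistent = toy[has_used ^ never_buys]
print(consistent)
```

If the "used" column could contain missing or other values, this one-pass filter would treat them differently from the two separate drops, so it is worth checking `value_counts()` first.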

In [10]:
df

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,Do you agree that skincare is important ?,Have you ever used any skin care products?,"Which, if any, of the following statements applies to you?",Which of the following types of ingredients would make you more likely to buy a skin care product?,How do/did you choose your products?,Do you use samples before buying skincare products?,How often do you buy skincare products?,How willing are you to try different skin care products?,Where do you purchase your skin care products?,"On average, how much do you spend on skincare products each month?",I wasting to much time to find out skincare and routine that suits my skin.,I being doubtful about the information shared by influencers and brand promoted content.,I feel difficult to understand the list of ingredients on the products.,I bought expensive product but doesn't see any improvement on my skin.,I unaware of which ingredients is the best or to avoid according my skin type.,"I experienced allergies after use a new skincare products ( Eg : Rashness, acne, purging etc)",I want to reduce time to find which routine / products suits with my skin.,I want have solutions from expert in effortless and cheap way.,I want to have my personalized skincare routine that suitable with my current products.,I want to gain knowledge of skincare regime in easy and understandable way.,I want to adapt a healthy lifestyle for a glowing and healthy skin.,Do you think that technology can improve your skincare routine?,Do you have heard about AI (Artificial Intelligent)?,"After you know about AI, do you want to have a skin scanning app that can customize skincare regime?",Do you feel excited to use this skincare application?,Do share your skincare goals and motivation with us!
7388,F00005,2022-03-22 03:10:00,Female,31,Arab,Student,4,Yes,I have dry and dull skin;I have sensitive skin...,Natural ingredients;Noncomedogenic ingredients...,Brand;Pricing;Product's ingredients,No,Sometimes,4,Mall / Department Store,RM 200 - RM 300,3,3,2,3,2,2,4,3,4,3,5,Yes,Yes,Yes,5,To keep my skin hydrated and healthy-looking
2466,F00006,2022-03-22 03:24:00,Male,18,Nigerian,Employee,5,Yes,I have oily skin;I struggle with acne and brea...,Alcohol-free ingredients,Dermatologist's advice;Product's ingredients;A...,Yes,Very often,4,Internet / online;Mall / Department Store,> RM 150,4,5,5,1,2,4,5,2,4,4,1,Yes,Yes,No,1,beauty is pain.
2502,F00008,2022-03-22 05:15:00,Male,23,Bumiputra Sabah (Bajau),Own,4,Yes,I suffer from redness and sensitivity;I have d...,Natural ingredients;Noncomedogenic ingredients...,Brand;Pricing,No,Rarely,4,Mall / Department Store;Online,Less than RM 20,1,2,4,2,3,5,4,1,2,4,2,Yes,Yes,No,2,To prioritize sun protection and prevent sun d...
6228,F00009,2022-03-22 05:32:00,Female,30,Bumiputra Sabah (Bajau),Doctor,2,Yes,I have pigmentation;I have an uneven skin tone...,Natural ingredients;Noncomedogenic ingredients...,Friend/Relative recommendation;Dermatologist's...,Yes,Yes,4,Mall / Department Store;Online,RM 100 - RM 150,3,5,5,5,3,2,4,3,2,3,4,Yes,No,No,4,To have healthier and clearer skin.
7262,F00011,2022-03-22 07:29:00,Female,27,Malay,Doctor,4,Yes,I have pigmentation;I suffer from redness and ...,Natural ingredients;Dye-free ingredients,Friend/Relative recommendation;Pricing;Product...,No,Very often,1,Mall / Department Store;Specialized beauty shops,RM 100 - RM 150,4,2,2,2,1,2,1,5,3,4,1,No,No,Yes,2,To prioritize skin health as part of my self-c...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8317,F09991,2022-11-29 19:20:00,Female,> 35,Bumiputra Sabah,Student,Yes,No,I struggle with acne and breakout;I have dry a...,Noncomedogenic ingredients (ingredients that d...,Product's ingredients,No,No,3,Mall / Department Store;Online,> RM 200,5,3,3,5,4,1,1,5,4,5,5,No,Yes,No,3,To maintain healthy and youthful skin
1445,F09994,2022-11-29 20:22:00,Male,35,Arab,Software Engineer,5,Yes,I have pigmentation;I suffer from redness and ...,Natural ingredients;Hypoallergenic ingredients...,Brand;Product's ingredients;Advertising and ma...,No,Very often,3,Internet / online;Specialized beauty shops,Less than RM 20,3,2,4,4,4,3,4,3,5,4,1,No,Yes,No,1,To achieve clear and healthy-looking skin
2593,F09996,2022-11-29 21:22:00,Female,34,Sudanese,Manager,3,Yes,I struggle with acne and breakout;I have dry a...,I have brown spots from sun damages,Friend/Relative recommendation;Dermatologist's...,No,Often,2,Mall/Department Store,RM 100 - RM 150,1,5,4,3,2,2,3,3,2,3,3,Yes,No,No,5,To achieve a flawless makeup base through skin...
6922,F09999,2022-11-29 22:49:00,Male,23,Arabian,Student,Yes,Yes,I suffer from redness and sensitivity,Natural ingredients;Cruelty-free ingredients;V...,Pricing;Product's ingredients,Yes,Very often,3,Internet / online;Pharmacy;Specialized beauty ...,> RM 200,5,1,2,5,4,4,5,5,1,4,2,No,No,No,3,To achieve a glowing complexion and reduce ble...


## Data Cleaning

In [11]:
## GENDER: Changing 'Male' and 'Female' to 'M' and 'F'

df['Gender'] = np.where(
    df['Gender'] == 'Male', 'M', 'F')
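
One thing to keep in mind with `np.where` here: every value that is not exactly `'Male'` becomes `'F'`, including typos and missing answers. The sketch below, on made-up values, contrasts it with an explicit mapping that leaves unexpected entries as `NaN`. It is an illustration only; the survey's `Gender` column may well contain just the two expected values.

```python
import numpy as np
import pandas as pd

# Made-up values, including a lowercase variant and a missing answer.
gender = pd.Series(['Male', 'Female', 'male', None])

# np.where sends everything that is not exactly 'Male' to 'F'.
via_where = pd.Series(np.where(gender == 'Male', 'M', 'F'))

# An explicit mapping leaves unexpected values as NaN so they can be inspected.
via_map = gender.map({'Male': 'M', 'Female': 'F'})

print(via_where.tolist())
print(via_map.isna().sum(), 'values were not mapped')
```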

In [12]:
## AGE: Changing different formats to bins

df['Age'].value_counts()

Age
22       239
32       224
15-20    223
30-35    221
35       220
29       220
26       215
30       213
35-40    211
31       210
33       207
28       207
27       207
20-25    205
23       203
34       199
> 35     198
21       195
20       195
18       194
25       193
>35      191
25-30    188
24       187
19       181
Name: count, dtype: int64

In [13]:
df.loc[df['Age'].str.contains('>'), 'Age'] = '36' ## '>35' and '> 35' -> top bin

## Ranges such as '20-25' keep only their lower bound ('20')
df.loc[df['Age'].str.contains('-'), 'Age'] = (
    df.loc[df['Age'].str.contains('-'), 'Age']
        .str.replace(r'-\d+', '', regex=True)
)

In [14]:
df['Age'] = df['Age'].astype(int)
df['Age'] = pd.cut(df['Age'],
    bins=[15, 20, 25, 30, 35, np.inf],
    labels=['15-19', '20-24', '25-29', '30-34', '>35'],
    right=False) ## left-closed bins, so '>35' also holds age 35
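
To see how `right=False` places the boundary ages, here is a small self-contained check using the same bins and labels on a handful of hand-picked ages (the sample values are illustrative, not taken from the survey).

```python
import numpy as np
import pandas as pd

ages = pd.Series([15, 19, 20, 34, 35, 36])

binned = pd.cut(
    ages,
    bins=[15, 20, 25, 30, 35, np.inf],
    labels=['15-19', '20-24', '25-29', '30-34', '>35'],
    right=False,  # each interval is [left, right)
)
print(binned.tolist())
```

Age 35 falls into the last interval, `[35, inf)`, so the `'>35'` label effectively means "35 and above".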

In [15]:
df['Age'].value_counts()

Age
30-34    1274
25-29    1230
20-24    1224
>35       820
15-19     598
Name: count, dtype: int64

In [16]:
## RACE: Streamlining 'Race' formats

df['Race'].value_counts()

Race
Korean                     259
Malay                      254
korean                     253
Bumiputra Sabah            252
Arabian                    248
Bumiputra Sabah (Bajau)    248
Sabahan                    245
African                    243
Iban                       238
Nigerian                   237
JAVANESSE                  236
Indian                     236
Bumiputra Sabah (bajau)    233
Arab                       232
Libya                      230
Arab                       225
Sudanese                   223
sabahan                    222
Bumiputra sabah            219
Libyan                     215
Libya                      202
Chinese                    196
Name: count, dtype: int64

In [17]:
## Combining duplicate values and fixing some formats

df.loc[df['Race'].str.lower().str.contains('arab')  , 'Race'] = 'Arabian'
df.loc[df['Race'].str.lower().str.contains('korean'), 'Race'] = 'Korean'
df.loc[df['Race'].str.lower().str.contains('sabah') , 'Race'] = 'Bumiputra Sabah (Bajau)'
df.loc[df['Race'].str.lower().str.contains('libya') , 'Race'] = 'Libyan'
df.loc[df['Race'].str.lower().str.contains('java')  , 'Race'] = 'Javan'
df.loc[df['Race'].str.lower().str.contains('sudan') , 'Race'] = 'Sudan'

In [18]:
df['Race'].value_counts()

Race
Bumiputra Sabah (Bajau)    1419
Arabian                     705
Libyan                      647
Korean                      512
Malay                       254
African                     243
Iban                        238
Nigerian                    237
Javan                       236
Indian                      236
Sudan                       223
Chinese                     196
Name: count, dtype: int64
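
In the raw counts, `Arab` and `Libya` each appear more than once, which suggests (as an assumption, since the raw strings are not shown) that some entries differ only by surrounding whitespace. A minimal sketch of trimming and re-casing before counting, on made-up values:

```python
import pandas as pd

# Made-up entries with stray whitespace and inconsistent capitalisation.
race = pd.Series(['Arab', 'Arab ', 'korean', 'Korean', ' Libya'])

# Trim surrounding whitespace and unify capitalisation before counting.
normalized = race.str.strip().str.title()
counts = normalized.value_counts()
print(counts)
```

This kind of normalisation could run before the keyword-based replacements, leaving those to handle only genuine spelling variants.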

In [19]:
df['Occupation'].value_counts()

Occupation
Teacher              440
Engineer             420
Architect            417
Software Engineer    403
Own                  402
Manager              396
Consultant           394
Doctor               393
Retired              390
Student              381
Consultant           377
Designer             371
Employee             362
Name: count, dtype: int64

In [20]:
df.loc[df['Occupation'].str.lower()
       .str.contains('consultant') , 'Occupation'] = 'Consultant'

In [21]:
df['Occupation'].value_counts()

Occupation
Consultant           771
Teacher              440
Engineer             420
Architect            417
Software Engineer    403
Own                  402
Manager              396
Doctor               393
Retired              390
Student              381
Designer             371
Employee             362
Name: count, dtype: int64

In [22]:
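## SKIN CONCERNS: one 0/1 column per ';'-separated answer
## (`str.get_dummies` already returns indicators, so no extra `pd.get_dummies`)
skin_class = df.loc[:, df.columns[:6]].join(
    df['Which, if any, of the following statements applies to you?']
    .str.get_dummies(sep=';'))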
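
Some answers appear to use `'; '` (with a space) as the separator, which would explain indicator columns whose names start with a space. The sketch below, on two made-up answers, shows the effect and one possible way to avoid it by removing whitespace around the separator before splitting. It is a sketch of an alternative, not what the notebook does.

```python
import pandas as pd

answers = pd.Series([
    'I have oily skin;I have pigmentation',
    'I have pigmentation; I have oily skin',  # note the space after ';'
])

# Splitting on ';' alone keeps the leading space and creates a near-duplicate column.
raw = answers.str.get_dummies(sep=';')

# Stripping whitespace around the separator first avoids the near-duplicates.
clean = (
    answers.str.replace(r'\s*;\s*', ';', regex=True)
    .str.strip()
    .str.get_dummies(sep=';')
)

print(raw.columns.tolist())
print(clean.columns.tolist())
```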

In [23]:
skin_class

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,I have a combination skin with dry and oily places,I have brown spots from sun damages,I suffer from redness and sensitivity,I have a combination skin with dry and oily places.1,I have acne scars,I have an uneven skin tone,I have brown spots from sun damages.1,I have combination skin,I have combination skin with dry and oily places,I have dry and dull skin,I have dry skin,I have fine lines and wrinkles,I have normal skin with very minor problem.,I have oily skin,I have pigmentation,I have redness and sensitivity,I have sensitive skin,I have wrinkles,I struggle with acne and breakout,I suffer from redness and sensitivity.1,"Not priority, natural skin looks better",Tiny bumps on forehead,blackheads and whiteheads due to clogged pores,dark skin,eczema
7388,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0
2466,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0
2502,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
6228,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
7262,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8317,F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
1445,F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0
2593,F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
6922,F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0


In [24]:
skin_class.columns

Index(['form_id', 'Timestamp', 'Gender', 'Age', 'Race', 'Occupation',
       ' I have a combination skin with dry and oily places',
       ' I have brown spots from sun damages',
       ' I suffer from redness and sensitivity',
       'I have a combination skin with dry and oily places',
       'I have acne scars', 'I have an uneven skin tone',
       'I have brown spots from sun damages', 'I have combination skin',
       'I have combination skin with dry and oily places',
       'I have dry and dull skin', 'I have dry skin',
       'I have fine lines and wrinkles',
       'I have normal skin with very minor problem.', 'I have oily skin',
       'I have pigmentation', 'I have redness and sensitivity',
       'I have sensitive skin', 'I have wrinkles',
       'I struggle with acne and breakout',
       'I suffer from redness and sensitivity',
       'Not priority, natural skin looks better', 'Tiny bumps on forehead',
       'blackheads and whiteheads due to clogged pores', 'dark skin', 'eczema'],
      dtype='object')

In [25]:
skin_class

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,I have a combination skin with dry and oily places,I have brown spots from sun damages,I suffer from redness and sensitivity,I have a combination skin with dry and oily places.1,I have acne scars,I have an uneven skin tone,I have brown spots from sun damages.1,I have combination skin,I have combination skin with dry and oily places,I have dry and dull skin,I have dry skin,I have fine lines and wrinkles,I have normal skin with very minor problem.,I have oily skin,I have pigmentation,I have redness and sensitivity,I have sensitive skin,I have wrinkles,I struggle with acne and breakout,I suffer from redness and sensitivity.1,"Not priority, natural skin looks better",Tiny bumps on forehead,blackheads and whiteheads due to clogged pores,dark skin,eczema
7388,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0
2466,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0
2502,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
6228,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
7262,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8317,F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
1445,F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0
2593,F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
6922,F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0


In [26]:
## Combining duplicate columns

dupl_cols_skin_class = ['combination skin', 'uneven skin', 
    'brown spots', 'redness and sensitivity']

for kw in dupl_cols_skin_class:
    kw_cols = skin_class.columns[
        skin_class.columns.str.contains(kw)].to_list()
    skin_class[kw] = 0
    for col in kw_cols:
        skin_class[kw] += skin_class[col]
    skin_class = skin_class.drop(columns=kw_cols)
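
Summing the matching columns works as long as no respondent ticked two variants of the same statement; otherwise the merged flag would become 2. A possible alternative, sketched here on a toy frame with made-up values, is to take the row-wise maximum, which behaves like a logical OR on 0/1 columns.

```python
import pandas as pd

toy = pd.DataFrame({
    'I have brown spots from sun damages': [1, 0, 1],
    ' I have brown spots from sun damages': [1, 1, 0],
    'I have oily skin': [0, 1, 1],
})

kw = 'brown spots'
kw_cols = toy.columns[toy.columns.str.contains(kw, regex=False)].tolist()

# max acts as a logical OR on 0/1 columns, so the merged flag never exceeds 1.
toy[kw] = toy[kw_cols].max(axis=1)
toy = toy.drop(columns=kw_cols)
print(toy)
```

With `sum`, the first row of this toy frame would have produced 2 instead of 1.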

In [27]:
skin_class.columns

Index(['form_id', 'Timestamp', 'Gender', 'Age', 'Race', 'Occupation',
       'I have acne scars', 'I have dry and dull skin', 'I have dry skin',
       'I have fine lines and wrinkles',
       'I have normal skin with very minor problem.', 'I have oily skin',
       'I have pigmentation', 'I have sensitive skin', 'I have wrinkles',
       'I struggle with acne and breakout',
       'Not priority, natural skin looks better', 'Tiny bumps on forehead',
       'blackheads and whiteheads due to clogged pores', 'dark skin', 'eczema',
       'combination skin', 'uneven skin', 'brown spots',
       'redness and sensitivity'],
      dtype='object')

In [28]:
## Shortening `skin_class` column names and converting them to snake_case
skin_class.columns = (
    skin_class.columns.str.replace(
        r'I have |I struggle with |, natural skin looks better'
        r'| with very minor problem\.|blackheads and whiteheads due to ',
        '', regex=True)
    .str.lower().str.replace(' ', '_')
)

In [29]:
skin_class.columns

Index(['form_id', 'timestamp', 'gender', 'age', 'race', 'occupation',
       'acne_scars', 'dry_and_dull_skin', 'dry_skin',
       'fine_lines_and_wrinkles', 'normal_skin', 'oily_skin', 'pigmentation',
       'sensitive_skin', 'wrinkles', 'acne_and_breakout', 'not_priority',
       'tiny_bumps_on_forehead', 'clogged_pores', 'dark_skin', 'eczema',
       'combination_skin', 'uneven_skin', 'brown_spots',
       'redness_and_sensitivity'],
      dtype='object')

In [30]:
skin_class = skin_class.set_index([
    'form_id', 'timestamp', 'gender', 'age', 'race', 'occupation']
).add_prefix('SC_').reset_index()

In [31]:
skin_class.columns

Index(['form_id', 'timestamp', 'gender', 'age', 'race', 'occupation',
       'SC_acne_scars', 'SC_dry_and_dull_skin', 'SC_dry_skin',
       'SC_fine_lines_and_wrinkles', 'SC_normal_skin', 'SC_oily_skin',
       'SC_pigmentation', 'SC_sensitive_skin', 'SC_wrinkles',
       'SC_acne_and_breakout', 'SC_not_priority', 'SC_tiny_bumps_on_forehead',
       'SC_clogged_pores', 'SC_dark_skin', 'SC_eczema', 'SC_combination_skin',
       'SC_uneven_skin', 'SC_brown_spots', 'SC_redness_and_sensitivity'],
      dtype='object')

In [32]:
## Verify that there are no double inputs (max > 1)
skin_class.set_index([
    'form_id', 'timestamp', 'gender', 'age', 'race', 'occupation']
).max()

SC_acne_scars                 1
SC_dry_and_dull_skin          1
SC_dry_skin                   1
SC_fine_lines_and_wrinkles    1
SC_normal_skin                1
SC_oily_skin                  1
SC_pigmentation               1
SC_sensitive_skin             1
SC_wrinkles                   1
SC_acne_and_breakout          1
SC_not_priority               1
SC_tiny_bumps_on_forehead     1
SC_clogged_pores              1
SC_dark_skin                  1
SC_eczema                     1
SC_combination_skin           1
SC_uneven_skin                1
SC_brown_spots                1
SC_redness_and_sensitivity    1
dtype: int64
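
The `.max()` check above is read by eye. A minimal sketch of the same verification as a function that returns the offending columns is shown here. The helper name `non_binary_columns` and the toy frame are assumptions made for illustration only.

```python
import pandas as pd


def non_binary_columns(frame, prefix):
    """Return the prefixed columns that hold anything other than 0 or 1."""
    cols = [c for c in frame.columns if c.startswith(prefix)]
    return [c for c in cols if not frame[c].isin([0, 1]).all()]


# Toy frame (hypothetical): SC_b was double-counted in one row
toy_sc = pd.DataFrame({
    'form_id': ['F00001', 'F00002'],
    'SC_a': [0, 1],
    'SC_b': [2, 0],
})
bad_cols = non_binary_columns(toy_sc, 'SC_')
print(bad_cols)
```

An empty list means every indicator with that prefix is strictly 0/1, which is the condition the notebook checks for `SC_`, `IP_`, `PR_` and `BW_` columns.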

In [33]:
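# One indicator column per distinct answer; multi-select answers are ';'-separated.
# `str.get_dummies` already returns 0/1 integers, so wrapping it in `pd.get_dummies` is redundant.
ingr_prefs = df.loc[:, df.columns[:6]].join(
    df['Which of the following types of ingredients would make you more likely to buy a skin care product?']
    .str.get_dummies(sep=';'))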

In [34]:
ingr_prefs

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,Alcohol-free ingredients,Noncomedogenic ingredients (ingredients that do not block pores),Oil-free ingredients,Alcohol-free ingredients.1,Antioxidant ingredients,Cheap,Cruelty-free ingredients,Dye-free ingredients,Exfoliating ingredients,Fragrance free,Fragrance-free ingredients,Hypoallergenic ingredients,I have brown spots from sun damages,Low pH,Natural ingredients,No perfume ingredients added.,Noncomedogenic ingredients (ingredients that do not block pores).1,Oil-free ingredients.1,Petroleum-free ingredients,Vegan ingredients,fragrance-free
7388,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0
2466,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2502,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0
6228,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,0,1,0,1,0,1,0,0,0,0,0,0,1,0,1,1,1,0,0
7262,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8317,F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0
1445,F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,1,0,0,0,0
2593,F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
6922,F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0


In [35]:
ingr_prefs.columns

Index(['form_id', 'Timestamp', 'Gender', 'Age', 'Race', 'Occupation',
       ' Alcohol-free ingredients',
       ' Noncomedogenic ingredients (ingredients that do not block pores)',
       ' Oil-free ingredients', 'Alcohol-free ingredients',
       'Antioxidant ingredients', 'Cheap', 'Cruelty-free ingredients',
       'Dye-free ingredients', 'Exfoliating ingredients', 'Fragrance free',
       'Fragrance-free ingredients', 'Hypoallergenic ingredients',
       'I have brown spots from sun damages', 'Low pH', 'Natural ingredients',
       'No perfume ingredients added.',
       'Noncomedogenic ingredients (ingredients that do not block pores)',
       'Oil-free ingredients', 'Petroleum-free ingredients',
       'Vegan ingredients', 'fragrance-free'],
      dtype='object')
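
Names such as `' Alcohol-free ingredients'` in the output above start with a space, which suggests that some answers were joined with `'; '` rather than `';'`. A minimal sketch of normalizing the separator before encoding follows. The toy answers are hypothetical, and whether whitespace is the only cause of the duplicates in the real data is an assumption.

```python
import pandas as pd

# Toy answers (hypothetical); note the space after ';' in the first row
answers = pd.Series([
    'Alcohol-free ingredients; Oil-free ingredients',
    'Oil-free ingredients;Alcohol-free ingredients',
    'Cheap',
])

raw_dummies = answers.str.get_dummies(sep=';')

# Remove whitespace around each ';' and at both ends before splitting
cleaned = answers.str.strip().str.replace(r'\s*;\s*', ';', regex=True)
clean_dummies = cleaned.str.get_dummies(sep=';')

print(raw_dummies.columns.to_list())
print(clean_dummies.columns.to_list())
```

With the separator normalized, the whitespace-only duplicates never appear, so only genuinely different spellings (for example `'Fragrance free'` versus `'fragrance-free'`) would still need the keyword merge used in the next cell.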

In [36]:
## Cleaning irrelevant values
ingr_prefs = ingr_prefs.drop(columns=
    'I have brown spots from sun damages')

## Combining duplicate columns
dupl_cols_ingr_prefs = ['alcohol-free', 
    'noncomedogenic', 'oil-free', 'fragrance']

for kw in dupl_cols_ingr_prefs:
    kw_cols = ingr_prefs.columns[
        ingr_prefs.columns.str.lower().str.contains(kw)].to_list()
    ingr_prefs[kw] = 0
    for col in kw_cols:
        ingr_prefs[kw] += ingr_prefs[col]
    ingr_prefs = ingr_prefs.drop(columns=kw_cols)

In [37]:
ingr_prefs.columns

Index(['form_id', 'Timestamp', 'Gender', 'Age', 'Race', 'Occupation',
       'Antioxidant ingredients', 'Cheap', 'Cruelty-free ingredients',
       'Dye-free ingredients', 'Exfoliating ingredients',
       'Hypoallergenic ingredients', 'Low pH', 'Natural ingredients',
       'No perfume ingredients added.', 'Petroleum-free ingredients',
       'Vegan ingredients', 'alcohol-free', 'noncomedogenic', 'oil-free',
       'fragrance'],
      dtype='object')

In [38]:
## Renaming `ingr_prefs` formats
ingr_prefs.columns = (
    ingr_prefs.columns.str.replace(
        r' ingredients*', '', regex=True).str.lower()
        .str.replace('.', '', regex=False).str.replace(r' |-', '_', regex=True)
)

In [39]:
ingr_prefs = ingr_prefs.set_index([
    'form_id', 'timestamp', 'gender', 'age', 'race', 'occupation']
).add_prefix('IP_').reset_index()

In [40]:
ingr_prefs.columns

Index(['form_id', 'timestamp', 'gender', 'age', 'race', 'occupation',
       'IP_antioxidant', 'IP_cheap', 'IP_cruelty_free', 'IP_dye_free',
       'IP_exfoliating', 'IP_hypoallergenic', 'IP_low_ph', 'IP_natural',
       'IP_no_perfume_added', 'IP_petroleum_free', 'IP_vegan',
       'IP_alcohol_free', 'IP_noncomedogenic', 'IP_oil_free', 'IP_fragrance'],
      dtype='object')

In [41]:
## Verify that there are no double inputs (max > 1)
ingr_prefs.set_index([
    'form_id', 'timestamp', 'gender', 'age', 'race', 'occupation']
).max()

IP_antioxidant         1
IP_cheap               1
IP_cruelty_free        1
IP_dye_free            1
IP_exfoliating         1
IP_hypoallergenic      1
IP_low_ph              1
IP_natural             1
IP_no_perfume_added    1
IP_petroleum_free      1
IP_vegan               1
IP_alcohol_free        1
IP_noncomedogenic      1
IP_oil_free            1
IP_fragrance           1
dtype: int64

In [42]:
# `str.get_dummies` already returns 0/1 integers, so wrapping it in `pd.get_dummies` is redundant.
prod_references = df.loc[:, df.columns[:6]].join(
    df['How do/did you choose your products?']
    .str.get_dummies(sep=';'))

In [43]:
prod_references

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,"Advertising and marketing ( Tiktok, Facebook, Twitter etc)",Brand,Dermatologist's advice,Friend/Relative recommendation,Observation and review from the other users,Pricing,Product's ingredients,Salesman's recommendation
7388,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,1,0,0,0,1,1,0
2466,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,1,0,1,0,0,0,1,0
2502,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,1,0,0,0,1,0,0
6228,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,1,1,0,0,0,0
7262,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,1,0,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8317,F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,0,0,0,0,0,0,1,0
1445,F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,1,1,0,0,0,0,1,1
2593,F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,0,0,1,1,0,0,1,0
6922,F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,0,0,0,0,0,1,1,0


In [44]:
rename_prod_refs = ['form_id', 'timestamp', 'gender', 'age', 'race', 'occupation',
    'social_media', 'brand', 'dermatologist_advice', 
    'friends_or_relative_recs', 'observation_or_review', 
    'pricing', 'product_ingr', 'salesmen_recs']

prod_references.columns = rename_prod_refs

prod_references = prod_references.set_index([
    'form_id', 'timestamp', 'gender', 'age', 'race', 'occupation']
).add_prefix('PR_').reset_index()
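
Assigning a list to `.columns` relies on the column order staying exactly as shown above. A minimal sketch of an order-independent alternative with an explicit mapping follows. The toy frame holds only three of the answer labels and is hypothetical, as is the check for missing labels.

```python
import pandas as pd

# Toy frame (hypothetical) with a few of the original answer labels
toy_refs = pd.DataFrame({
    'Brand': [1, 0],
    'Pricing': [0, 1],
    "Dermatologist's advice": [0, 0],
})

rename_map = {
    "Dermatologist's advice": 'dermatologist_advice',
    'Brand': 'brand',
    'Pricing': 'pricing',
}

# Fail loudly if an expected label is absent instead of mislabeling silently
missing = set(rename_map) - set(toy_refs.columns)
if missing:
    raise KeyError(f'Columns not found: {sorted(missing)}')

renamed_refs = toy_refs.rename(columns=rename_map).add_prefix('PR_')
print(renamed_refs.columns.to_list())
```

The mapping can be listed in any order, and a new answer label appearing in the data leaves its column unrenamed rather than shifting every name by one position.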

In [45]:
prod_references

Unnamed: 0,form_id,timestamp,gender,age,race,occupation,PR_social_media,PR_brand,PR_dermatologist_advice,PR_friends_or_relative_recs,PR_observation_or_review,PR_pricing,PR_product_ingr,PR_salesmen_recs
0,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,1,0,0,0,1,1,0
1,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,1,0,1,0,0,0,1,0
2,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,1,0,0,0,1,0,0
3,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,1,1,0,0,0,0
4,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,1,0,1,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5141,F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,0,0,0,0,0,0,1,0
5142,F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,1,1,0,0,0,0,1,1
5143,F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,0,0,1,1,0,0,1,0
5144,F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,0,0,0,0,0,1,1,0


In [46]:
## Verify that there are no double inputs (max > 1)
prod_references.set_index([
    'form_id', 'timestamp', 'gender', 'age', 'race', 'occupation']
).max()

PR_social_media                1
PR_brand                       1
PR_dermatologist_advice        1
PR_friends_or_relative_recs    1
PR_observation_or_review       1
PR_pricing                     1
PR_product_ingr                1
PR_salesmen_recs               1
dtype: int64

In [47]:
# `str.get_dummies` already returns 0/1 integers, so wrapping it in `pd.get_dummies` is redundant.
where_to_purchase = df.loc[:, df.columns[:6]].join(
    df['Where do you purchase your skin care products?']
    .str.get_dummies(sep=';'))

In [48]:
where_to_purchase

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,Internet / online,Internet/online,Mall / Department Store,Mall/Department Store,Online,Pharmacy,Shoppee Official Store,Specialized beauty shops,Specialty beauty store
7388,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,0,1,0,0,0,0,0,0
2466,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,1,0,1,0,0,0,0,0,0
2502,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,0,1,0,1,0,0,0,0
6228,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,1,0,1,0,0,0,0
7262,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,1,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8317,F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,0,0,1,0,1,0,0,0,0
1445,F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,1,0,0,0,0,0,0,1,0
2593,F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,0,0,0,1,0,0,0,0,0
6922,F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,1,0,0,0,0,1,0,1,0


In [49]:
## Combining duplicate columns
dupl_cols_where_to_purchase = ['online', 'mall', 'beauty']

for kw in dupl_cols_where_to_purchase:
    kw_cols = where_to_purchase.columns[
        where_to_purchase.columns.str.lower().str.contains(kw)].to_list()
    where_to_purchase[kw] = 0
    for col in kw_cols:
        where_to_purchase[kw] += where_to_purchase[col]
    where_to_purchase = where_to_purchase.drop(columns=kw_cols)

In [50]:
where_to_purchase.columns

Index(['form_id', 'Timestamp', 'Gender', 'Age', 'Race', 'Occupation',
       'Pharmacy', 'Shoppee Official Store', 'online', 'mall', 'beauty'],
      dtype='object')

In [51]:
rename_where_to_purchase = [
    'form_id', 'timestamp', 'gender', 'age', 'race', 'occupation',
    'pharmacy', 'shopee', 'online', 'mall_or_dept_store', 'beauty_store']

where_to_purchase.columns = rename_where_to_purchase

where_to_purchase = where_to_purchase.set_index(
    ['form_id', 'timestamp', 'gender', 'age', 'race', 'occupation']
).add_prefix('BW_').reset_index()

In [52]:
where_to_purchase

Unnamed: 0,form_id,timestamp,gender,age,race,occupation,BW_pharmacy,BW_shopee,BW_online,BW_mall_or_dept_store,BW_beauty_store
0,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,0,0,1,0
1,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,0,0,1,1,0
2,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,0,1,1,0
3,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,1,1,0
4,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...
5141,F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,0,0,1,1,0
5142,F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,0,0,1,0,1
5143,F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,0,0,0,1,0
5144,F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,1,0,1,0,1


In [53]:
## Verify that there are no double inputs (max > 1)
where_to_purchase.set_index(
    ['form_id', 'timestamp', 'gender', 'age', 'race', 'occupation']
).max()

BW_pharmacy              1
BW_shopee                1
BW_online                1
BW_mall_or_dept_store    1
BW_beauty_store          1
dtype: int64

In [54]:
df.columns

Index(['form_id', 'Timestamp', 'Gender', 'Age', 'Race', 'Occupation',
       'Do you agree that skincare is important ?',
       'Have you ever used any skin care products?',
       'Which, if any, of the following statements applies to you?',
       'Which of the following types of ingredients would make you more likely to buy a skin care product?',
       'How do/did you choose your products?',
       'Do you use samples before buying skincare products?',
       'How often do you buy skincare products?',
       'How willing are you to try different skin care products?',
       'Where do you purchase your skin care products?',
       'On average, how much do you spend on skincare products each month?',
       'I wasting to much time to find out skincare and routine that suits my skin.',
       'I being doubtful about the information shared by influencers and brand promoted content.',
       'I feel difficult to understand the list of ingredients on the products.',
       'I bought exp

In [55]:
drop_cleaned_cols = [
    'Which, if any, of the following statements applies to you?',
    'Which of the following types of ingredients would make you more likely to buy a skin care product?',
    'How do/did you choose your products?',
    'Where do you purchase your skin care products?'
]


yes_no_ctg_questions = df.drop(columns=drop_cleaned_cols)
yes_no_ctg_questions = yes_no_ctg_questions.set_index(
    ['form_id', 'Timestamp', 'Gender', 'Age', 'Race', 'Occupation']
)

In [56]:
yes_no_ctg_questions

form_id,Timestamp,Gender,Age,Race,Occupation,Do you agree that skincare is important ?,Have you ever used any skin care products?,Do you use samples before buying skincare products?,How often do you buy skincare products?,How willing are you to try different skin care products?,"On average, how much do you spend on skincare products each month?",I wasting to much time to find out skincare and routine that suits my skin.,I being doubtful about the information shared by influencers and brand promoted content.,I feel difficult to understand the list of ingredients on the products.,I bought expensive product but doesn't see any improvement on my skin.,I unaware of which ingredients is the best or to avoid according my skin type.,"I experienced allergies after use a new skincare products ( Eg : Rashness, acne, purging etc)",I want to reduce time to find which routine / products suits with my skin.,I want have solutions from expert in effortless and cheap way.,I want to have my personalized skincare routine that suitable with my current products.,I want to gain knowledge of skincare regime in easy and understandable way.,I want to adapt a healthy lifestyle for a glowing and healthy skin.,Do you think that technology can improve your skincare routine?,Do you have heard about AI (Artificial Intelligent)?,"After you know about AI, do you want to have a skin scanning app that can customize skincare regime?",Do you feel excited to use this skincare application?,Do share your skincare goals and motivation with us!
F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,4,Yes,No,Sometimes,4,RM 200 - RM 300,3,3,2,3,2,2,4,3,4,3,5,Yes,Yes,Yes,5,To keep my skin hydrated and healthy-looking
F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,5,Yes,Yes,Very often,4,> RM 150,4,5,5,1,2,4,5,2,4,4,1,Yes,Yes,No,1,beauty is pain.
F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,4,Yes,No,Rarely,4,Less than RM 20,1,2,4,2,3,5,4,1,2,4,2,Yes,Yes,No,2,To prioritize sun protection and prevent sun d...
F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,2,Yes,Yes,Yes,4,RM 100 - RM 150,3,5,5,5,3,2,4,3,2,3,4,Yes,No,No,4,To have healthier and clearer skin.
F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,4,Yes,No,Very often,1,RM 100 - RM 150,4,2,2,2,1,2,1,5,3,4,1,No,No,Yes,2,To prioritize skin health as part of my self-c...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,Yes,No,No,No,3,> RM 200,5,3,3,5,4,1,1,5,4,5,5,No,Yes,No,3,To maintain healthy and youthful skin
F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,5,Yes,No,Very often,3,Less than RM 20,3,2,4,4,4,3,4,3,5,4,1,No,Yes,No,1,To achieve clear and healthy-looking skin
F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,3,Yes,No,Often,2,RM 100 - RM 150,1,5,4,3,2,2,3,3,2,3,3,Yes,No,No,5,To achieve a flawless makeup base through skin...
F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,Yes,Yes,Yes,Very often,3,> RM 200,5,1,2,5,4,4,5,5,1,4,2,No,No,No,3,To achieve a glowing complexion and reduce ble...


In [57]:
## Listing the scale questions
yes_no_ctg_questions.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
How willing are you to try different skin care products?,5146.0,3.02604,1.422334,1.0,2.0,3.0,4.0,5.0
I wasting to much time to find out skincare and routine that suits my skin.,5146.0,3.005635,1.415919,1.0,2.0,3.0,4.0,5.0
I being doubtful about the information shared by influencers and brand promoted content.,5146.0,3.004469,1.413175,1.0,2.0,3.0,4.0,5.0
I feel difficult to understand the list of ingredients on the products.,5146.0,2.984065,1.412886,1.0,2.0,3.0,4.0,5.0
I bought expensive product but doesn't see any improvement on my skin.,5146.0,3.023319,1.402982,1.0,2.0,3.0,4.0,5.0
I unaware of which ingredients is the best or to avoid according my skin type.,5146.0,3.002915,1.416065,1.0,2.0,3.0,4.0,5.0
"I experienced allergies after use a new skincare products ( Eg : Rashness, acne, purging etc)",5146.0,3.011854,1.410655,1.0,2.0,3.0,4.0,5.0
I want to reduce time to find which routine / products suits with my skin.,5146.0,2.998251,1.409463,1.0,2.0,3.0,4.0,5.0
I want have solutions from expert in effortless and cheap way.,5146.0,2.954333,1.414369,1.0,2.0,3.0,4.0,5.0
I want to have my personalized skincare routine that suitable with my current products.,5146.0,2.990478,1.413425,1.0,2.0,3.0,4.0,5.0


In [58]:
rename_yes_no_ctg_cols = [
    'scale_is_skincare_important', 
    'is_using_any_skincare_products', 
    'is_using_sample_before_buying_skincare',
    'scale_buying_frequency', 
    'scale_willing_try_skincare',
    'skincare_buying_cost_per_month', 
    'scale_wasting_time_research',
    'scale_doubtful_abt_skincare_info',
    'scale_difficult_to_understand_ingredients',
    'scale_expensive_product_but_no_results',
    'scale_unaware_of_best_ingredient',
    'scale_allergy_on_new_product',
    'scale_w_reduce_research_time',
    'scale_w_easy_solutions_from_expert',
    'scale_w_personalized_routine',
    'scale_w_gain_easy_skincare_knowledge',
    'scale_w_adapt_healthy_lifestyle_for_skin',
    'is_thinking_technology_can_improve_skincare_routine',
    'is_aware_of_AI', 
    'is_open_to_skin_scanning_app_to_customize_skincare',
    'scale_is_excited_to_use_the_app',
    'notes_on_skincare_goals_and_motivation', 
]

In [59]:
yes_no_ctg_questions.columns = rename_yes_no_ctg_cols

In [60]:
yes_no_ctg_questions.columns

Index(['scale_is_skincare_important', 'is_using_any_skincare_products',
       'is_using_sample_before_buying_skincare', 'scale_buying_frequency',
       'scale_willing_try_skincare', 'skincare_buying_cost_per_month',
       'scale_wasting_time_research', 'scale_doubtful_abt_skincare_info',
       'scale_difficult_to_understand_ingredients',
       'scale_expensive_product_but_no_results',
       'scale_unaware_of_best_ingredient', 'scale_allergy_on_new_product',
       'scale_w_reduce_research_time', 'scale_w_easy_solutions_from_expert',
       'scale_w_personalized_routine', 'scale_w_gain_easy_skincare_knowledge',
       'scale_w_adapt_healthy_lifestyle_for_skin',
       'is_thinking_technology_can_improve_skincare_routine', 'is_aware_of_AI',
       'is_open_to_skin_scanning_app_to_customize_skincare',
       'scale_is_excited_to_use_the_app',
       'notes_on_skincare_goals_and_motivation'],
      dtype='object')

In [61]:
yes_no_ctg_questions.head()

form_id,Timestamp,Gender,Age,Race,Occupation,scale_is_skincare_important,is_using_any_skincare_products,is_using_sample_before_buying_skincare,scale_buying_frequency,scale_willing_try_skincare,skincare_buying_cost_per_month,scale_wasting_time_research,scale_doubtful_abt_skincare_info,scale_difficult_to_understand_ingredients,scale_expensive_product_but_no_results,scale_unaware_of_best_ingredient,scale_allergy_on_new_product,scale_w_reduce_research_time,scale_w_easy_solutions_from_expert,scale_w_personalized_routine,scale_w_gain_easy_skincare_knowledge,scale_w_adapt_healthy_lifestyle_for_skin,is_thinking_technology_can_improve_skincare_routine,is_aware_of_AI,is_open_to_skin_scanning_app_to_customize_skincare,scale_is_excited_to_use_the_app,notes_on_skincare_goals_and_motivation
F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,4,Yes,No,Sometimes,4,RM 200 - RM 300,3,3,2,3,2,2,4,3,4,3,5,Yes,Yes,Yes,5,To keep my skin hydrated and healthy-looking
F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,5,Yes,Yes,Very often,4,> RM 150,4,5,5,1,2,4,5,2,4,4,1,Yes,Yes,No,1,beauty is pain.
F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,4,Yes,No,Rarely,4,Less than RM 20,1,2,4,2,3,5,4,1,2,4,2,Yes,Yes,No,2,To prioritize sun protection and prevent sun d...
F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,2,Yes,Yes,Yes,4,RM 100 - RM 150,3,5,5,5,3,2,4,3,2,3,4,Yes,No,No,4,To have healthier and clearer skin.
F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,4,Yes,No,Very often,1,RM 100 - RM 150,4,2,2,2,1,2,1,5,3,4,1,No,No,Yes,2,To prioritize skin health as part of my self-c...


In [62]:
yes_no_ctg_questions['scale_buying_frequency'].value_counts()

scale_buying_frequency
Rarely        765
Very often    754
Yes           750
Often         727
Never         721
No            716
Sometimes     713
Name: count, dtype: int64

In [63]:
yes_no_ctg_questions.dtypes

scale_is_skincare_important                            object
is_using_any_skincare_products                         object
is_using_sample_before_buying_skincare                 object
scale_buying_frequency                                 object
scale_willing_try_skincare                              int64
skincare_buying_cost_per_month                         object
scale_wasting_time_research                             int64
scale_doubtful_abt_skincare_info                        int64
scale_difficult_to_understand_ingredients               int64
scale_expensive_product_but_no_results                  int64
scale_unaware_of_best_ingredient                        int64
scale_allergy_on_new_product                            int64
scale_w_reduce_research_time                            int64
scale_w_easy_solutions_from_expert                      int64
scale_w_personalized_routine                            int64
scale_w_gain_easy_skincare_knowledge                    int64
scale_w_

In [64]:
yes_no_ctg_questions

form_id,Timestamp,Gender,Age,Race,Occupation,scale_is_skincare_important,is_using_any_skincare_products,is_using_sample_before_buying_skincare,scale_buying_frequency,scale_willing_try_skincare,skincare_buying_cost_per_month,scale_wasting_time_research,scale_doubtful_abt_skincare_info,scale_difficult_to_understand_ingredients,scale_expensive_product_but_no_results,scale_unaware_of_best_ingredient,scale_allergy_on_new_product,scale_w_reduce_research_time,scale_w_easy_solutions_from_expert,scale_w_personalized_routine,scale_w_gain_easy_skincare_knowledge,scale_w_adapt_healthy_lifestyle_for_skin,is_thinking_technology_can_improve_skincare_routine,is_aware_of_AI,is_open_to_skin_scanning_app_to_customize_skincare,scale_is_excited_to_use_the_app,notes_on_skincare_goals_and_motivation
F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,4,Yes,No,Sometimes,4,RM 200 - RM 300,3,3,2,3,2,2,4,3,4,3,5,Yes,Yes,Yes,5,To keep my skin hydrated and healthy-looking
F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,5,Yes,Yes,Very often,4,> RM 150,4,5,5,1,2,4,5,2,4,4,1,Yes,Yes,No,1,beauty is pain.
F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,4,Yes,No,Rarely,4,Less than RM 20,1,2,4,2,3,5,4,1,2,4,2,Yes,Yes,No,2,To prioritize sun protection and prevent sun d...
F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,2,Yes,Yes,Yes,4,RM 100 - RM 150,3,5,5,5,3,2,4,3,2,3,4,Yes,No,No,4,To have healthier and clearer skin.
F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,4,Yes,No,Very often,1,RM 100 - RM 150,4,2,2,2,1,2,1,5,3,4,1,No,No,Yes,2,To prioritize skin health as part of my self-c...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,Yes,No,No,No,3,> RM 200,5,3,3,5,4,1,1,5,4,5,5,No,Yes,No,3,To maintain healthy and youthful skin
F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,5,Yes,No,Very often,3,Less than RM 20,3,2,4,4,4,3,4,3,5,4,1,No,Yes,No,1,To achieve clear and healthy-looking skin
F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,3,Yes,No,Often,2,RM 100 - RM 150,1,5,4,3,2,2,3,3,2,3,3,Yes,No,No,5,To achieve a flawless makeup base through skin...
F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,Yes,Yes,Yes,Very often,3,> RM 200,5,1,2,5,4,4,5,5,1,4,2,No,No,No,3,To achieve a glowing complexion and reduce ble...


In [65]:
# Checking whether 'No' means 'Never' or 'Rarely':
# do respondents who answered 'No' use skincare products at all?
yes_no_ctg_questions.loc[
    yes_no_ctg_questions['scale_buying_frequency'] == 'No',
    'is_using_any_skincare_products'].value_counts()

is_using_any_skincare_products
No    716
Name: count, dtype: int64

In [66]:
# Checking the case for 'Yes'
yes_no_ctg_questions.loc[
    yes_no_ctg_questions['scale_buying_frequency'] == 'Yes',
'is_using_any_skincare_products'].value_counts()

is_using_any_skincare_products
Yes    750
Name: count, dtype: int64
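
The two checks above inspect one answer at a time. A cross-tabulation shows every combination of the two columns in a single table. The sketch below runs on toy rows that are hypothetical and only mirror the pattern seen in the outputs above (all 'No' answers paired with non-users, all 'Yes' answers with users).

```python
import pandas as pd

# Toy rows (hypothetical), mimicking the pattern in the two checks above
toy = pd.DataFrame({
    'scale_buying_frequency': ['No', 'Yes', 'Never', 'Often', 'No'],
    'is_using_any_skincare_products': ['No', 'Yes', 'No', 'Yes', 'No'],
})

# Rows: buying-frequency answer; columns: whether the respondent uses products
usage_by_freq = pd.crosstab(
    toy['scale_buying_frequency'],
    toy['is_using_any_skincare_products'])
print(usage_by_freq)
```

On the real frame the same call would be made with the two `yes_no_ctg_questions` columns, and a non-zero cell in the `No` row under the `Yes` column would contradict grouping 'No' with 'Never'.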

In [67]:
yes_no_ctg_questions['scale_is_skincare_important'].value_counts()

scale_is_skincare_important
3      1071
Yes    1052
5      1046
2      1008
4       969
Name: count, dtype: int64

In [68]:
# Streamlining format in `scale_buying_frequency`
# ('No' is grouped with 'Never' and 'Yes' with 'Often')

scale_how_often = [['Never', 'No'], ['Rarely'], 
    ['Sometimes'], ['Often', 'Yes'], ['Very often']]

for i in range(0, 5):
    yes_no_ctg_questions.loc[
        yes_no_ctg_questions[
            'scale_buying_frequency'].isin(scale_how_often[i]), 
            'scale_buying_frequency'] = i+1

yes_no_ctg_questions['scale_buying_frequency'] = (
    yes_no_ctg_questions['scale_buying_frequency'].astype(int))

# Streamlining format in `scale_is_skincare_important` ('Yes' is recoded as 4)
yes_no_ctg_questions.loc[
        yes_no_ctg_questions[
            'scale_is_skincare_important'] == 'Yes', 
            'scale_is_skincare_important'] = 4

yes_no_ctg_questions['scale_is_skincare_important'] = (
    yes_no_ctg_questions['scale_is_skincare_important'].astype(int))
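
The positional loop above can also be written as a single lookup table, which makes the grouping decisions explicit. This is a minimal sketch on toy answers, with `freq_map` and the unmapped-label check being illustrative assumptions rather than the notebook's method.

```python
import pandas as pd

# Same grouping as the loop above: 'No' with 'Never', 'Yes' with 'Often'
freq_map = {
    'Never': 1, 'No': 1,
    'Rarely': 2,
    'Sometimes': 3,
    'Often': 4, 'Yes': 4,
    'Very often': 5,
}

# Toy answers (hypothetical)
toy_freq = pd.Series(['Sometimes', 'Yes', 'No', 'Very often', 'Rarely'])
mapped = toy_freq.map(freq_map)

# Unmapped labels become NaN, so check before casting to int
unmapped = toy_freq[mapped.isna()].unique()
if len(unmapped) > 0:
    raise ValueError(f'Unmapped answers: {list(unmapped)}')

mapped = mapped.astype(int)
print(mapped.to_list())
```

A label that is missing from the table raises an error here, whereas in the loop version it would stay in the column as text and only fail at the `astype(int)` step.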

In [69]:
yes_no_ctg_questions['scale_buying_frequency'].unique()

array([3, 5, 2, 4, 1])

In [70]:
yes_no_ctg_questions['scale_buying_frequency'].value_counts()

scale_buying_frequency
4    1477
1    1437
2     765
5     754
3     713
Name: count, dtype: int64

In [71]:
## Streamlining boolean and categorical formats

yes_no_cols = [
    'is_using_any_skincare_products', 'is_using_sample_before_buying_skincare',
    'is_aware_of_AI', 'is_thinking_technology_can_improve_skincare_routine',
    'is_open_to_skin_scanning_app_to_customize_skincare'
]

for col in yes_no_cols:
    yes_no_ctg_questions[col] = np.where(
        yes_no_ctg_questions[col] == 'Yes', 1, 0)
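
Note that `np.where(col == 'Yes', 1, 0)` turns every value other than `'Yes'` into 0, including missing answers. If that distinction matters, a minimal sketch that keeps missing responses as `<NA>` could look like this. The toy series is hypothetical, and whether the real columns contain missing values is not shown in the notebook.

```python
import pandas as pd

# Toy answers (hypothetical), including one missing response
toy_answers = pd.Series(['Yes', 'No', None, 'Yes'])

# Labels outside the mapping (and missing values) become <NA> instead of 0
flags = toy_answers.map({'Yes': 1, 'No': 0}).astype('Int64')
print(flags)
```

The nullable `Int64` dtype keeps the column integer-valued while still marking gaps, so means computed with `describe()` are based on answered rows only.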

In [72]:
yes_no_ctg_questions['skincare_buying_cost_per_month'].value_counts()

skincare_buying_cost_per_month
RM 150 - RM 200    562
RM 200 - RM 300    542
> RM 150           536
> RM 200           524
Less than RM 50    523
RM 50 - RM 100     505
RM 100 - RM 150    499
< RM 50            497
Less than RM 20    482
RM 20 - RM 50      476
Name: count, dtype: int64

In [73]:
yes_no_ctg_questions['skincare_buying_cost_per_month'] = (
    yes_no_ctg_questions['skincare_buying_cost_per_month']
        .str.replace('Less than', '<'))

In [74]:
group_costs = [
    ['< RM 20', '< RM 50', 'RM 20 - RM 50'],
    ['RM 50 - RM 100'], ['RM 100 - RM 150'],
    ['RM 150 - RM 200', '> RM 150'],
    ['> RM 200', 'RM 200 - RM 300']
]

# One representative amount per group, chosen to fall inside the matching `pd.cut` bin
new_group = ['20', '51', '101', '151', '201']

for i in range(0, 5):
    yes_no_ctg_questions.loc[
        yes_no_ctg_questions[
            'skincare_buying_cost_per_month'].isin(group_costs[i]), 
            'skincare_buying_cost_per_month'] = new_group[i]

In [75]:
yes_no_ctg_questions['skincare_buying_cost_per_month'].value_counts()

skincare_buying_cost_per_month
20     1978
151    1098
201    1066
51      505
101     499
Name: count, dtype: int64

In [76]:
yes_no_ctg_questions['skincare_buying_cost_per_month'] = (
    yes_no_ctg_questions['skincare_buying_cost_per_month'].astype(int))

yes_no_ctg_questions['skincare_buying_cost_per_month'] = pd.cut(
    yes_no_ctg_questions['skincare_buying_cost_per_month'],
    bins=[0, 50, 100, 150, 200, np.inf],
    labels=['0-50', '50-100', '100-150', '150-200', '>200'],
    right=False)
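
With `right=False`, each bin in the `pd.cut` call above is closed on the left and open on the right, so a value sitting exactly on an edge falls into the higher bin. The following sketch checks this on the five representative values from the regrouping step, plus a few edge values that are assumptions added purely for illustration (they do not occur in the survey data).

```python
import numpy as np
import pandas as pd

# Representative values taken from the regrouping step.
codes = pd.Series([20, 51, 101, 151, 201])
# Hypothetical edge values, added only to show which side of each bin is closed.
edges = pd.Series([50, 100, 150, 200])

bins = [0, 50, 100, 150, 200, np.inf]
labels = ['0-50', '50-100', '100-150', '150-200', '>200']

# right=False gives intervals [0, 50), [50, 100), ..., [200, inf)
binned_codes = pd.cut(codes, bins=bins, labels=labels, right=False)
binned_edges = pd.cut(edges, bins=bins, labels=labels, right=False)

print(binned_codes.astype(str).tolist())
print(binned_edges.astype(str).tolist())
```

One consequence is that the `'>200'` label technically covers 200 itself. That does not matter here because only the values 20, 51, 101, 151 and 201 are ever binned.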

In [77]:
yes_no_ctg_questions['skincare_buying_cost_per_month'].value_counts()

skincare_buying_cost_per_month
0-50       1978
150-200    1098
>200       1066
50-100      505
100-150     499
Name: count, dtype: int64

In [78]:
yes_no_ctg_questions.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
scale_is_skincare_important,5146.0,3.603381,1.018727,2.0,3.0,4.0,4.0,5.0
is_using_any_skincare_products,5146.0,0.720754,0.448672,0.0,0.0,1.0,1.0,1.0
is_using_sample_before_buying_skincare,5146.0,0.503498,0.500036,0.0,0.0,1.0,1.0,1.0
scale_buying_frequency,5146.0,2.872911,1.457055,1.0,1.0,3.0,4.0,5.0
scale_willing_try_skincare,5146.0,3.02604,1.422334,1.0,2.0,3.0,4.0,5.0
scale_wasting_time_research,5146.0,3.005635,1.415919,1.0,2.0,3.0,4.0,5.0
scale_doubtful_abt_skincare_info,5146.0,3.004469,1.413175,1.0,2.0,3.0,4.0,5.0
scale_difficult_to_understand_ingredients,5146.0,2.984065,1.412886,1.0,2.0,3.0,4.0,5.0
scale_expensive_product_but_no_results,5146.0,3.023319,1.402982,1.0,2.0,3.0,4.0,5.0
scale_unaware_of_best_ingredient,5146.0,3.002915,1.416065,1.0,2.0,3.0,4.0,5.0


In [79]:
yes_no_ctg_questions = yes_no_ctg_questions.reset_index()
yes_no_ctg_questions.columns = yes_no_ctg_questions.columns.str.lower()

## Combining Data

In [80]:
## ID and demographics
df[df.columns[:6]]

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation
7388,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student
2466,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee
2502,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own
6228,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor
7262,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor
...,...,...,...,...,...,...
8317,F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student
1445,F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer
2593,F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager
6922,F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student


In [81]:
skin_class.head()

Unnamed: 0,form_id,timestamp,gender,age,race,occupation,SC_acne_scars,SC_dry_and_dull_skin,SC_dry_skin,SC_fine_lines_and_wrinkles,SC_normal_skin,SC_oily_skin,SC_pigmentation,SC_sensitive_skin,SC_wrinkles,SC_acne_and_breakout,SC_not_priority,SC_tiny_bumps_on_forehead,SC_clogged_pores,SC_dark_skin,SC_eczema,SC_combination_skin,SC_uneven_skin,SC_brown_spots,SC_redness_and_sensitivity
0,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,1,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0
1,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0
2,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1
3,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0
4,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1


In [82]:
skin_class.shape

(5146, 25)

In [83]:
ingr_prefs.head()

Unnamed: 0,form_id,timestamp,gender,age,race,occupation,IP_antioxidant,IP_cheap,IP_cruelty_free,IP_dye_free,IP_exfoliating,IP_hypoallergenic,IP_low_ph,IP_natural,IP_no_perfume_added,IP_petroleum_free,IP_vegan,IP_alcohol_free,IP_noncomedogenic,IP_oil_free,IP_fragrance
0,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0
1,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
2,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0
3,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,1,0,1,0,0,0,1,0,1,0,1,1,1,0
4,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0


In [84]:
ingr_prefs.shape

(5146, 21)

In [85]:
prod_references.head()

Unnamed: 0,form_id,timestamp,gender,age,race,occupation,PR_social_media,PR_brand,PR_dermatologist_advice,PR_friends_or_relative_recs,PR_observation_or_review,PR_pricing,PR_product_ingr,PR_salesmen_recs
0,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,1,0,0,0,1,1,0
1,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,1,0,1,0,0,0,1,0
2,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,1,0,0,0,1,0,0
3,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,1,1,0,0,0,0
4,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,1,0,1,1,0


In [86]:
prod_references.shape

(5146, 14)

In [87]:
where_to_purchase.head()

Unnamed: 0,form_id,timestamp,gender,age,race,occupation,BW_pharmacy,BW_shopee,BW_online,BW_mall_or_dept_store,BW_beauty_store
0,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,0,0,0,1,0
1,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,0,0,1,1,0
2,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,0,0,1,1,0
3,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,0,0,1,1,0
4,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,0,0,0,1,1


In [88]:
where_to_purchase.shape

(5146, 11)

In [89]:
yes_no_ctg_questions.head()

Unnamed: 0,form_id,timestamp,gender,age,race,occupation,scale_is_skincare_important,is_using_any_skincare_products,is_using_sample_before_buying_skincare,scale_buying_frequency,scale_willing_try_skincare,skincare_buying_cost_per_month,scale_wasting_time_research,scale_doubtful_abt_skincare_info,scale_difficult_to_understand_ingredients,scale_expensive_product_but_no_results,scale_unaware_of_best_ingredient,scale_allergy_on_new_product,scale_w_reduce_research_time,scale_w_easy_solutions_from_expert,scale_w_personalized_routine,scale_w_gain_easy_skincare_knowledge,scale_w_adapt_healthy_lifestyle_for_skin,is_thinking_technology_can_improve_skincare_routine,is_aware_of_ai,is_open_to_skin_scanning_app_to_customize_skincare,scale_is_excited_to_use_the_app,notes_on_skincare_goals_and_motivation
0,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,4,1,0,3,4,>200,3,3,2,3,2,2,4,3,4,3,5,1,1,1,5,To keep my skin hydrated and healthy-looking
1,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,5,1,1,5,4,150-200,4,5,5,1,2,4,5,2,4,4,1,1,1,0,1,beauty is pain.
2,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,4,1,0,2,4,0-50,1,2,4,2,3,5,4,1,2,4,2,1,1,0,2,To prioritize sun protection and prevent sun d...
3,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,2,1,1,4,4,100-150,3,5,5,5,3,2,4,3,2,3,4,1,0,0,4,To have healthier and clearer skin.
4,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,4,1,0,5,1,100-150,4,2,2,2,1,2,1,5,3,4,1,0,0,1,2,To prioritize skin health as part of my self-c...


In [90]:
yes_no_ctg_questions.shape

(5146, 28)

In [91]:
## Combine the DataFrames: start with the ID and demographic columns,
## then merge each cleaned part onto them by `form_id`
df_cleaned = df[df.columns[:6]].copy()

cleaned_dfs = [yes_no_ctg_questions, skin_class, ingr_prefs, 
    prod_references, where_to_purchase]

# Each part repeats the demographic columns, so drop them before merging.
# `drop` without `inplace=True` leaves the parts untouched, which keeps
# this cell safe to re-run.
demo_cols = ['timestamp', 'gender', 'age', 'race', 'occupation'] 

for df_parts in cleaned_dfs:
    df_cleaned = df_cleaned.merge(
        df_parts.drop(columns=demo_cols), on='form_id', how='left')
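
The merge loop above relies on every part holding exactly one row per `form_id`. A minimal sketch of how that assumption could be checked uses the `validate` and `indicator` parameters of `DataFrame.merge`. The miniature `base` and `part` frames are hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Hypothetical miniature frames standing in for df_cleaned and one cleaned part.
base = pd.DataFrame({'form_id': ['F00005', 'F00006', 'F00008'],
                     'Gender': ['F', 'M', 'M']})
part = pd.DataFrame({'form_id': ['F00005', 'F00006', 'F00008'],
                     'gender': ['F', 'M', 'M'],
                     'BW_online': [0, 1, 1]})

demo_cols = ['gender']

merged = base.merge(
    part.drop(columns=demo_cols),   # returns a copy, `part` is untouched
    on='form_id',
    how='left',
    validate='one_to_one',          # raises MergeError on duplicate form_id
    indicator=True,                 # adds a `_merge` column
)

# Rows present only in `base` would show up as 'left_only'.
unmatched = int((merged['_merge'] != 'both').sum())
merged = merged.drop(columns='_merge')

print(merged.shape, 'unmatched rows:', unmatched)
```

A non-zero `unmatched` count would mean some respondents are missing from a part and would receive NaN values in the combined table.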

In [92]:
df_cleaned

Unnamed: 0,form_id,Timestamp,Gender,Age,Race,Occupation,scale_is_skincare_important,is_using_any_skincare_products,is_using_sample_before_buying_skincare,scale_buying_frequency,scale_willing_try_skincare,skincare_buying_cost_per_month,scale_wasting_time_research,scale_doubtful_abt_skincare_info,scale_difficult_to_understand_ingredients,scale_expensive_product_but_no_results,scale_unaware_of_best_ingredient,scale_allergy_on_new_product,scale_w_reduce_research_time,scale_w_easy_solutions_from_expert,scale_w_personalized_routine,scale_w_gain_easy_skincare_knowledge,scale_w_adapt_healthy_lifestyle_for_skin,is_thinking_technology_can_improve_skincare_routine,is_aware_of_ai,is_open_to_skin_scanning_app_to_customize_skincare,scale_is_excited_to_use_the_app,notes_on_skincare_goals_and_motivation,SC_acne_scars,SC_dry_and_dull_skin,SC_dry_skin,SC_fine_lines_and_wrinkles,SC_normal_skin,SC_oily_skin,SC_pigmentation,SC_sensitive_skin,SC_wrinkles,SC_acne_and_breakout,SC_not_priority,SC_tiny_bumps_on_forehead,SC_clogged_pores,SC_dark_skin,SC_eczema,SC_combination_skin,SC_uneven_skin,SC_brown_spots,SC_redness_and_sensitivity,IP_antioxidant,IP_cheap,IP_cruelty_free,IP_dye_free,IP_exfoliating,IP_hypoallergenic,IP_low_ph,IP_natural,IP_no_perfume_added,IP_petroleum_free,IP_vegan,IP_alcohol_free,IP_noncomedogenic,IP_oil_free,IP_fragrance,PR_social_media,PR_brand,PR_dermatologist_advice,PR_friends_or_relative_recs,PR_observation_or_review,PR_pricing,PR_product_ingr,PR_salesmen_recs,BW_pharmacy,BW_shopee,BW_online,BW_mall_or_dept_store,BW_beauty_store
0,F00005,2022-03-22 03:10:00,F,30-34,Arabian,Student,4,1,0,3,4,>200,3,3,2,3,2,2,4,3,4,3,5,1,1,1,5,To keep my skin hydrated and healthy-looking,0,1,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0
1,F00006,2022-03-22 03:24:00,M,15-19,Nigerian,Employee,5,1,1,5,4,150-200,4,5,5,1,2,4,5,2,4,4,1,1,1,0,1,beauty is pain.,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0
2,F00008,2022-03-22 05:15:00,M,20-24,Bumiputra Sabah (Bajau),Own,4,1,0,2,4,0-50,1,2,4,2,3,5,4,1,2,4,2,1,1,0,2,To prioritize sun protection and prevent sun d...,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0
3,F00009,2022-03-22 05:32:00,F,30-34,Bumiputra Sabah (Bajau),Doctor,2,1,1,4,4,100-150,3,5,5,5,3,2,4,3,2,3,4,1,0,0,4,To have healthier and clearer skin.,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,1,0,0,0,1,0,1,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,0
4,F00011,2022-03-22 07:29:00,F,25-29,Malay,Doctor,4,1,0,5,1,100-150,4,2,2,2,1,2,1,5,3,4,1,0,0,1,2,To prioritize skin health as part of my self-c...,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5141,F09991,2022-11-29 19:20:00,F,>35,Bumiputra Sabah (Bajau),Student,4,0,0,1,3,>200,5,3,3,5,4,1,1,5,4,5,5,0,1,0,3,To maintain healthy and youthful skin,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,1,0,0,0,1,1,0
5142,F09994,2022-11-29 20:22:00,M,>35,Arabian,Software Engineer,5,1,0,5,3,0-50,3,2,4,4,4,3,4,3,5,4,1,0,1,0,1,To achieve clear and healthy-looking skin,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,1,0,0,1,0,1
5143,F09996,2022-11-29 21:22:00,F,30-34,Sudan,Manager,3,1,0,4,2,100-150,1,5,4,3,2,2,3,3,2,3,3,1,0,0,5,To achieve a flawless makeup base through skin...,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0
5144,F09999,2022-11-29 22:49:00,M,20-24,Arabian,Student,4,1,1,5,3,>200,5,1,2,5,4,4,5,5,1,4,2,0,0,0,3,To achieve a glowing complexion and reduce ble...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,0,1,0,1


In [93]:
df_cleaned.dtypes

form_id                          object
Timestamp                datetime64[ns]
Gender                           object
Age                            category
Race                             object
                              ...      
BW_pharmacy                       int64
BW_shopee                         int64
BW_online                         int64
BW_mall_or_dept_store             int64
BW_beauty_store                   int64
Length: 75, dtype: object

In [94]:
df_cleaned.dtypes[:30]

form_id                                                        object
Timestamp                                              datetime64[ns]
Gender                                                         object
Age                                                          category
Race                                                           object
Occupation                                                     object
scale_is_skincare_important                                     int32
is_using_any_skincare_products                                  int32
is_using_sample_before_buying_skincare                          int32
scale_buying_frequency                                          int32
scale_willing_try_skincare                                      int64
skincare_buying_cost_per_month                               category
scale_wasting_time_research                                     int64
scale_doubtful_abt_skincare_info                                int64
scale_difficult_to_u

In [95]:
df_cleaned.select_dtypes(include=['int32', 'int64'])

Unnamed: 0,scale_is_skincare_important,is_using_any_skincare_products,is_using_sample_before_buying_skincare,scale_buying_frequency,scale_willing_try_skincare,scale_wasting_time_research,scale_doubtful_abt_skincare_info,scale_difficult_to_understand_ingredients,scale_expensive_product_but_no_results,scale_unaware_of_best_ingredient,scale_allergy_on_new_product,scale_w_reduce_research_time,scale_w_easy_solutions_from_expert,scale_w_personalized_routine,scale_w_gain_easy_skincare_knowledge,scale_w_adapt_healthy_lifestyle_for_skin,is_thinking_technology_can_improve_skincare_routine,is_aware_of_ai,is_open_to_skin_scanning_app_to_customize_skincare,scale_is_excited_to_use_the_app,SC_acne_scars,SC_dry_and_dull_skin,SC_dry_skin,SC_fine_lines_and_wrinkles,SC_normal_skin,SC_oily_skin,SC_pigmentation,SC_sensitive_skin,SC_wrinkles,SC_acne_and_breakout,SC_not_priority,SC_tiny_bumps_on_forehead,SC_clogged_pores,SC_dark_skin,SC_eczema,SC_combination_skin,SC_uneven_skin,SC_brown_spots,SC_redness_and_sensitivity,IP_antioxidant,IP_cheap,IP_cruelty_free,IP_dye_free,IP_exfoliating,IP_hypoallergenic,IP_low_ph,IP_natural,IP_no_perfume_added,IP_petroleum_free,IP_vegan,IP_alcohol_free,IP_noncomedogenic,IP_oil_free,IP_fragrance,PR_social_media,PR_brand,PR_dermatologist_advice,PR_friends_or_relative_recs,PR_observation_or_review,PR_pricing,PR_product_ingr,PR_salesmen_recs,BW_pharmacy,BW_shopee,BW_online,BW_mall_or_dept_store,BW_beauty_store
0,4,1,0,3,4,3,3,2,3,2,2,4,3,4,3,5,1,1,1,5,0,1,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0
1,5,1,1,5,4,4,5,5,1,2,4,5,2,4,4,1,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0
2,4,1,0,2,4,1,2,4,2,3,5,4,1,2,4,2,1,1,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0
3,2,1,1,4,4,3,5,5,5,3,2,4,3,2,3,4,1,0,0,4,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,1,0,0,0,1,0,1,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,0
4,4,1,0,5,1,4,2,2,2,1,2,1,5,3,4,1,0,0,1,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5141,4,0,0,1,3,5,3,3,5,4,1,1,5,4,5,5,0,1,0,3,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,1,0,0,0,1,1,0
5142,5,1,0,5,3,3,2,4,4,4,3,4,3,5,4,1,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,1,0,0,1,0,1
5143,3,1,0,4,2,1,5,4,3,2,2,3,3,2,3,3,1,0,0,5,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0
5144,4,1,1,5,3,5,1,2,5,4,4,5,5,1,4,2,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,0,1,0,1


In [96]:
int_cols = df_cleaned.select_dtypes(include=['int32', 'int64']).columns.to_list()

df_cleaned.loc[:, df_cleaned.columns.isin(int_cols)]

Unnamed: 0,scale_is_skincare_important,is_using_any_skincare_products,is_using_sample_before_buying_skincare,scale_buying_frequency,scale_willing_try_skincare,scale_wasting_time_research,scale_doubtful_abt_skincare_info,scale_difficult_to_understand_ingredients,scale_expensive_product_but_no_results,scale_unaware_of_best_ingredient,scale_allergy_on_new_product,scale_w_reduce_research_time,scale_w_easy_solutions_from_expert,scale_w_personalized_routine,scale_w_gain_easy_skincare_knowledge,scale_w_adapt_healthy_lifestyle_for_skin,is_thinking_technology_can_improve_skincare_routine,is_aware_of_ai,is_open_to_skin_scanning_app_to_customize_skincare,scale_is_excited_to_use_the_app,SC_acne_scars,SC_dry_and_dull_skin,SC_dry_skin,SC_fine_lines_and_wrinkles,SC_normal_skin,SC_oily_skin,SC_pigmentation,SC_sensitive_skin,SC_wrinkles,SC_acne_and_breakout,SC_not_priority,SC_tiny_bumps_on_forehead,SC_clogged_pores,SC_dark_skin,SC_eczema,SC_combination_skin,SC_uneven_skin,SC_brown_spots,SC_redness_and_sensitivity,IP_antioxidant,IP_cheap,IP_cruelty_free,IP_dye_free,IP_exfoliating,IP_hypoallergenic,IP_low_ph,IP_natural,IP_no_perfume_added,IP_petroleum_free,IP_vegan,IP_alcohol_free,IP_noncomedogenic,IP_oil_free,IP_fragrance,PR_social_media,PR_brand,PR_dermatologist_advice,PR_friends_or_relative_recs,PR_observation_or_review,PR_pricing,PR_product_ingr,PR_salesmen_recs,BW_pharmacy,BW_shopee,BW_online,BW_mall_or_dept_store,BW_beauty_store
0,4,1,0,3,4,3,3,2,3,2,2,4,3,4,3,5,1,1,1,5,0,1,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,0
1,5,1,1,5,4,4,5,5,1,2,4,5,2,4,4,1,1,1,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,1,1,0
2,4,1,0,2,4,1,2,4,2,3,5,4,1,2,4,2,1,1,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,1,1,0
3,2,1,1,4,4,3,5,5,5,3,2,4,3,2,3,4,1,0,0,4,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,1,0,0,0,1,0,1,0,1,1,1,0,0,0,1,1,0,0,0,0,0,0,1,1,0
4,4,1,0,5,1,4,2,2,2,1,2,1,5,3,4,1,0,0,1,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5141,4,0,0,1,3,5,3,3,5,4,1,1,5,4,5,5,0,1,0,3,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,1,0,0,0,1,1,0
5142,5,1,0,5,3,3,2,4,4,4,3,4,3,5,4,1,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,1,0,0,1,0,1
5143,3,1,0,4,2,1,5,4,3,2,2,3,3,2,3,3,1,0,0,5,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,1,0
5144,4,1,1,5,3,5,1,2,5,4,4,5,5,1,4,2,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,0,1,0,1


In [97]:
df_cleaned.loc[:, df_cleaned.columns.isin(int_cols)].describe()

Unnamed: 0,scale_is_skincare_important,is_using_any_skincare_products,is_using_sample_before_buying_skincare,scale_buying_frequency,scale_willing_try_skincare,scale_wasting_time_research,scale_doubtful_abt_skincare_info,scale_difficult_to_understand_ingredients,scale_expensive_product_but_no_results,scale_unaware_of_best_ingredient,scale_allergy_on_new_product,scale_w_reduce_research_time,scale_w_easy_solutions_from_expert,scale_w_personalized_routine,scale_w_gain_easy_skincare_knowledge,scale_w_adapt_healthy_lifestyle_for_skin,is_thinking_technology_can_improve_skincare_routine,is_aware_of_ai,is_open_to_skin_scanning_app_to_customize_skincare,scale_is_excited_to_use_the_app,SC_acne_scars,SC_dry_and_dull_skin,SC_dry_skin,SC_fine_lines_and_wrinkles,SC_normal_skin,SC_oily_skin,SC_pigmentation,SC_sensitive_skin,SC_wrinkles,SC_acne_and_breakout,SC_not_priority,SC_tiny_bumps_on_forehead,SC_clogged_pores,SC_dark_skin,SC_eczema,SC_combination_skin,SC_uneven_skin,SC_brown_spots,SC_redness_and_sensitivity,IP_antioxidant,IP_cheap,IP_cruelty_free,IP_dye_free,IP_exfoliating,IP_hypoallergenic,IP_low_ph,IP_natural,IP_no_perfume_added,IP_petroleum_free,IP_vegan,IP_alcohol_free,IP_noncomedogenic,IP_oil_free,IP_fragrance,PR_social_media,PR_brand,PR_dermatologist_advice,PR_friends_or_relative_recs,PR_observation_or_review,PR_pricing,PR_product_ingr,PR_salesmen_recs,BW_pharmacy,BW_shopee,BW_online,BW_mall_or_dept_store,BW_beauty_store
count,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0,5146.0
mean,3.603381,0.720754,0.503498,2.872911,3.02604,3.005635,3.004469,2.984065,3.023319,3.002915,3.011854,2.998251,2.954333,2.990478,2.983871,3.013408,0.508939,0.500777,0.500194,2.990672,0.009133,0.274388,0.010299,0.057715,0.009522,0.222697,0.372911,0.197046,0.061407,0.399145,0.008745,0.008745,0.00855,0.009911,0.012243,0.419355,0.403809,0.409833,0.302176,0.019821,0.021376,0.039448,0.192188,0.084143,0.093859,0.018072,0.846483,0.021182,0.175087,0.061213,0.583754,0.43082,0.430237,0.252429,0.344928,0.575398,0.36747,0.472406,0.023125,0.438982,0.553634,0.092693,0.451419,0.045667,0.643412,0.544501,0.403031
std,1.018727,0.448672,0.500036,1.457055,1.422334,1.415919,1.413175,1.412886,1.402982,1.416065,1.410655,1.409463,1.414369,1.413425,1.416662,1.412844,0.499969,0.500048,0.500049,1.407018,0.09514,0.446249,0.100971,0.233226,0.097124,0.416097,0.483626,0.397806,0.240099,0.48977,0.093112,0.093112,0.092081,0.099067,0.109977,0.493501,0.490708,0.49185,0.459246,0.139399,0.144648,0.194677,0.394058,0.277629,0.291661,0.133226,0.36052,0.144003,0.380079,0.239743,0.492983,0.495239,0.495157,0.434448,0.475391,0.49433,0.482163,0.499286,0.150314,0.496311,0.497163,0.29003,0.497683,0.208781,0.479038,0.498064,0.490555
min,2.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,3.0,0.0,0.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,0.0,0.0,0.0,2.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,4.0,1.0,1.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,3.0,1.0,1.0,1.0,3.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,0.0
75%,4.0,1.0,1.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,4.0,1.0,1.0,1.0,4.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,1.0,0.0,1.0,1.0,1.0
max,5.0,1.0,1.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,1.0,1.0,1.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
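
The summary above mixes 1-5 scale items with 0/1 indicator columns, for which the mean is simply the share of respondents answering 1. A minimal sketch of summarizing the two kinds separately follows, assuming the `scale_` prefix identifies the scale items as it does in the column names above. The `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical toy frame reusing the column prefixes seen in df_cleaned.
toy = pd.DataFrame({
    'scale_buying_frequency': [3, 5, 2, 4],
    'is_aware_of_ai': [1, 1, 0, 0],
    'BW_online': [0, 1, 1, 1],
})

# Scale items share the `scale_` prefix; everything else here is a 0/1 flag.
scale_cols = toy.filter(regex=r'^scale_').columns
binary_cols = toy.columns.difference(scale_cols)

scale_summary = toy[scale_cols].agg(['mean', 'median']).T
binary_share = toy[binary_cols].mean().mul(100).round(1)  # percent answering 1

print(scale_summary)
print(binary_share)
```

Applied to `df_cleaned`, the same split would first need to restrict the frame to the integer columns (`int_cols`), since the demographic and free-text columns are not numeric.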


In [98]:
## Save the cleaned data to Excel (.xlsx) so that
## a dashboard can be built in another sheet of the same workbook
df_cleaned.to_excel(
    'SkincareSurveyData_cleaned_3.xlsx', 
    sheet_name='data', index=False)