# "Take Me Out" : A Further Analysis

Emmanuella Pramudita Rumanti (16520296)

Institut Teknologi Bandung


Dataset source: TakeMeOut

I started working on this on: 15 November 2020

In [128]:
# import libraries
import pandas as pd
import collections

# import TakeMeOut dataset
init_df = pd.read_csv('takemeout.csv')

## Data Preprocessing

### Overview

In [129]:
# print the first three items to get an overview
print(init_df.head(3))

                     Timestamp Siapa nama kamu? Cewek atau cowok nih?  \
0  2020/10/31 3:39:25 PM GMT+7  A**************                 Cowok   
1  2020/10/31 3:39:36 PM GMT+7            L****                 Cewek   
2  2020/10/31 3:39:38 PM GMT+7     Y***********                 Cowok   

   Seberapa penting quality time bareng calon pacar untuk kamu?  \
0                                                  5              
1                                                  5              
2                                                  4              

   Seberapa penting physical touch sama calon pacar untuk kamu?  \
0                                                  5              
1                                                  5              
2                                                  3              

   Seberapa penting word of affirmation dari calon pacar untuk kamu?  \
0                                                  4                   
1                        

### Indexing

To simplify things, the long question-style column names are renamed to short labels.

In [146]:
df = init_df.rename(columns={'Timestamp' : 'timestamp',
                             'Siapa nama kamu?' : 'nama',
                             'Cewek atau cowok nih?' : 'gender',
                             'Seberapa penting quality time bareng calon pacar untuk kamu?' : 'quality time',
                             'Seberapa penting physical touch sama calon pacar untuk kamu?' : 'physical touch',
                             'Seberapa penting word of affirmation dari calon pacar untuk kamu?' : 'affirmation',
                             'Seberapa penting dapet kado dari calon pacar untuk kamu?' : 'gifting',
                             'Seberapa penting bantuan dari calon pacar untuk kamu?' : 'service'})

print(df.head())

                     timestamp             nama gender  quality time  \
0  2020/10/31 3:39:25 PM GMT+7  A**************  Cowok             5   
1  2020/10/31 3:39:36 PM GMT+7            L****  Cewek             5   
2  2020/10/31 3:39:38 PM GMT+7     Y***********  Cowok             4   
3  2020/10/31 3:39:42 PM GMT+7             a***  Cowok             5   
4  2020/10/31 3:39:43 PM GMT+7            B****  Cowok             5   

   physical touch  affirmation  gifting  service  
0               5            4        1        3  
1               5            3        2        2  
2               3            4        4        4  
3               5            5        2        3  
4               5            5        2        4  


### Cleaning

Since the timestamps are not needed for this analysis, the column is removed.

In [147]:
df = df.drop(['timestamp'], axis = 1)

print(df.head())

              nama gender  quality time  physical touch  affirmation  gifting  \
0  A**************  Cowok             5               5            4        1   
1            L****  Cewek             5               5            3        2   
2     Y***********  Cowok             4               3            4        4   
3             a***  Cowok             5               5            5        2   
4            B****  Cowok             5               5            5        2   

   service  
0        3  
1        2  
2        4  
3        3  
4        4  


## Data Observation

### Statistics

Summary statistics for the quantitative columns (the love language scores), at a glance.

In [69]:
print(df.describe())

       quality time  physical touch  affirmation     gifting     service
count    101.000000      101.000000   101.000000  101.000000  101.000000
mean       4.099010        3.168317     3.722772    2.732673    3.752475
std        1.212476        1.334686     1.209287    1.325829    1.143730
min        1.000000        1.000000     1.000000    1.000000    1.000000
25%        4.000000        2.000000     3.000000    2.000000    3.000000
50%        5.000000        3.000000     4.000000    3.000000    4.000000
75%        5.000000        4.000000     5.000000    4.000000    5.000000
max        5.000000        5.000000     5.000000    5.000000    5.000000


* **Quality time** has the **highest average** score.
* **Physical touch** has the **highest standard deviation**.
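
The two observations above can be read off programmatically rather than by eye. The following is a minimal sketch, assuming the mean and standard deviation values copied from the `describe()` output above; the names `stats`, `highest_mean` and `highest_std` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Mean and standard deviation copied from the df.describe() output above.
stats = pd.DataFrame(
    {
        'mean': [4.099010, 3.168317, 3.722772, 2.732673, 3.752475],
        'std': [1.212476, 1.334686, 1.209287, 1.325829, 1.143730],
    },
    index=['quality time', 'physical touch', 'affirmation', 'gifting', 'service'],
)

# idxmax returns the index label of the largest value in each column.
highest_mean = stats['mean'].idxmax()
highest_std = stats['std'].idxmax()

print('highest mean :', highest_mean)
print('highest standard deviation :', highest_std)
```

On the real data, the same labels should be obtainable with `df.describe().loc['mean'].idxmax()` and `df.describe().loc['std'].idxmax()`.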

### Gender Ratio

In [151]:
print(df['gender'].value_counts())

print('The male:female ratio is '+ str(len(df[df['gender'] == 'Cowok'])/len(df[df['gender'] == 'Cewek'])) + ':1')

Cowok    81
Cewek    20
Name: gender, dtype: int64
The male:female ratio is 4.05:1
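
The ratio can also be derived directly from `value_counts()`, which avoids filtering the frame twice. This is a sketch on a stand-in series that only reproduces the counts reported above (81 `Cowok`, 20 `Cewek`); the names `gender`, `counts`, `ratio` and `shares` are illustrative.

```python
import pandas as pd

# Stand-in series with the same counts as reported above, not the real rows.
gender = pd.Series(['Cowok'] * 81 + ['Cewek'] * 20, name='gender')

# Absolute counts per category.
counts = gender.value_counts()
ratio = counts['Cowok'] / counts['Cewek']

# normalize=True returns each category's share of all respondents.
shares = gender.value_counts(normalize=True)

print(f'The male:female ratio is {ratio:.2f}:1')
print(shares.round(3))
```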


## Data Analysis

### Five Stars
The number of people who gave a score of 5.

In [161]:
for i in range(2, 7):
    print(df.iloc[:, i].name, ':', len(df[df.iloc[:, i] == 5]))
    
bintang5 = df[(df['quality time'] == 5) | (df['physical touch'] == 5) | (df['affirmation'] == 5) | (df['gifting'] == 5) | (df['service'] == 5)]

print('people who gave a 5 for at least one of the love languages :', len(bintang5), '(', round(len(bintang5)/len(df)*100, 2), '% )')
print('people who did not give a 5 at all :', len(df) - len(bintang5))

quality time : 51
physical touch : 22
affirmation : 31
gifting : 14
service : 30
people who gave a 5 for at least one of the love languages : 73 ( 72.28 % )
people who did not give a 5 at all : 28


* **Quality time** received the **most 5's**, followed by affirmation.
* **Gifting** received the **fewest** 5's.
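
The same counts can be produced without a loop or a chain of `|` conditions by comparing all score columns at once. This is a minimal sketch on hypothetical toy rows, not the survey responses; `toy`, `is_five`, `fives_per_language` and `gave_any_five` are illustrative names.

```python
import pandas as pd

love_languages = ['quality time', 'physical touch', 'affirmation', 'gifting', 'service']

# Hypothetical toy rows, NOT the real survey responses.
toy = pd.DataFrame(
    [[5, 3, 4, 1, 5],
     [4, 4, 4, 2, 3],
     [5, 5, 2, 1, 4],
     [3, 2, 5, 5, 1]],
    columns=love_languages,
)

# Boolean frame: True wherever a respondent gave a score of 5.
is_five = toy[love_languages] == 5

# Column-wise sum counts the 5's per love language.
fives_per_language = is_five.sum()

# Row-wise any() flags respondents with at least one 5.
gave_any_five = is_five.any(axis=1)

print(fives_per_language)
print('at least one 5 :', gave_any_five.sum(), 'of', len(toy))
```

Replacing `5` with `1` gives the corresponding one-star counts.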

### One Star

In [164]:
for i in range(2, 7):
    print(df.iloc[:, i].name, ':', len(df[df.iloc[:, i] == 1]))
    
bintang1 = df[(df['quality time'] == 1) | (df['physical touch'] == 1) | (df['affirmation'] == 1) | (df['gifting'] == 1) | (df['service'] == 1)]

print('people who gave a score 1 for at least one of the love languages :', len(bintang1))
print('people who did not give a 1 at all :', len(df) - len(bintang1))

quality time : 9
physical touch : 16
affirmation : 9
gifting : 23
service : 7
people who gave a score 1 for at least one of the love languages : 28
people who did not give a 1 at all : 73


* **Gifting** earned the **most 1's**, followed by physical touch.

### Relation to Gender

In [156]:
f_bintang5 = len(bintang5[bintang5['gender'] == 'Cewek'])/len(df[df['gender'] == 'Cewek'])
m_bintang5 = len(bintang5[bintang5['gender'] == 'Cowok'])/len(df[df['gender'] == 'Cowok'])

print('Percentage of women who gave at least one score of 5 :', round(f_bintang5, 3) * 100, '%')
print('Percentage of men who gave at least one score of 5 :', round(m_bintang5, 3) * 100, '%')

Percentage of women who gave at least one score of 5 : 65.0 %
Percentage of men who gave at least one score of 5 : 74.1 %


In [157]:
f_bintang1 = len(bintang1[bintang1['gender'] == 'Cewek'])/len(df[df['gender'] == 'Cewek'])
m_bintang1 = len(bintang1[bintang1['gender'] == 'Cowok'])/len(df[df['gender'] == 'Cowok'])

print('Percentage of women who gave at least one score of 1 :', round(f_bintang1, 3) * 100, '%')
print('Percentage of men who gave at least one score of 1 :', round(m_bintang1, 3) * 100, '%')

Percentage of women who gave at least one score of 1 : 40.0 %
Percentage of men who gave at least one score of 1 : 24.7 %


* Percentage-wise
    * a **larger share of men** gave at least one **5** (74.1% vs. 65.0%)
    * a **larger share of women** gave at least one **1** (40.0% vs. 24.7%)
    * for **both genders**, the **majority** gave at least one 5
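
The per-gender percentages can also be computed in one pass with `groupby`, since the mean of a 0/1 flag is the share of respondents for whom the flag is set. This is a sketch on hypothetical toy rows rather than the real dataset; `toy`, `flags` and `share` are illustrative names.

```python
import pandas as pd

love_languages = ['quality time', 'physical touch', 'affirmation', 'gifting', 'service']

# Hypothetical toy rows, NOT the real survey responses.
toy = pd.DataFrame({
    'gender': ['Cowok', 'Cowok', 'Cowok', 'Cewek', 'Cewek'],
    'quality time': [5, 4, 3, 5, 2],
    'physical touch': [3, 4, 1, 2, 2],
    'affirmation': [4, 4, 3, 1, 3],
    'gifting': [1, 2, 2, 3, 4],
    'service': [3, 5, 3, 4, 4],
})

scores = toy[love_languages]

# One 0/1 flag per respondent: any score of 5, any score of 1.
flags = pd.DataFrame({
    'gender': toy['gender'],
    'any_5': scores.eq(5).any(axis=1).astype(int),
    'any_1': scores.eq(1).any(axis=1).astype(int),
})

# The mean of a 0/1 flag within each gender is the share of that gender.
share = flags.groupby('gender')[['any_5', 'any_1']].mean() * 100

print(share.round(1))
```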
    
### Love Languages by Popularity
Popularity is defined here as the sum of all scores a love language received.

In [159]:
love_languages = ['quality time', 'physical touch', 'affirmation', 'gifting', 'service']

# total score per love language for women, sorted from highest to lowest
f_sum = df[df['gender'] == 'Cewek'][love_languages].sum().sort_values(ascending=False)

print('Total scores for women, in descending order:')
for k in f_sum.index:
    print(k, ':', f_sum[k])

print('\n')

# total score per love language for men, sorted from highest to lowest
m_sum = df[df['gender'] == 'Cowok'][love_languages].sum().sort_values(ascending=False)

print('Total scores for men, in descending order:')
for k in m_sum.index:
    print(k, ':', m_sum[k])

Total scores for women, in descending order:
quality time : 75
service : 74
affirmation : 66
gifting : 54
physical touch : 53


Total scores for men, in descending order:
quality time : 339
affirmation : 310
service : 305
physical touch : 267
gifting : 222


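* **Quality time** ranks **first** for both men and women, but the rest of the rankings **differ**.
* The rankings for women are as follows:
    1. quality time
    2. service
    3. affirmation
    4. gifting
    5. physical touch
* The rankings for men are as follows:
    1. quality time
    2. affirmation
    3. service
    4. physical touch
    5. gifting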
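
The ordering can be cross-checked by ranking the totals explicitly. This is a minimal sketch using the totals from the output above and the group sizes from the Gender Ratio section; the names `totals`, `group_size`, `ranks` and `means` are illustrative and not part of the original notebook. Dividing by group size gives per-respondent means, which, unlike raw sums, are comparable between a group of 20 and a group of 81.

```python
import pandas as pd

# Totals copied from the output above.
totals = pd.DataFrame(
    {
        'Cewek': [75, 53, 66, 54, 74],
        'Cowok': [339, 267, 310, 222, 305],
    },
    index=['quality time', 'physical touch', 'affirmation', 'gifting', 'service'],
)

# Respondents per gender, from the Gender Ratio section.
group_size = pd.Series({'Cewek': 20, 'Cowok': 81})

# Rank 1 is the highest total within each gender (no ties in these totals).
ranks = totals.rank(ascending=False).astype(int)

# Mean score per respondent; the Series aligns with the column labels.
means = (totals / group_size).round(2)

print(ranks)
print(means)
print('same ranking for both groups :', ranks['Cewek'].equals(ranks['Cowok']))
```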
    
    
## Conclusion

* The male:female ratio for the respondents in this dataset is roughly 4:1 (81 men and 20 women).
* The majority of the respondents (72%) gave at least one of the five love languages a score of 5/5.
* Quality time received the most 5's.
* Gifting received the fewest 5's.
* Percentage-wise, a larger share of men (74%) than women (65%) gave at least one 5.
* Percentage-wise, a larger share of women (40%) than men (25%) gave at least one 1.
* By popularity, quality time ranks first for both men and women, but the remaining rankings differ:
    * women: quality time, service, affirmation, gifting, physical touch
    * men: quality time, affirmation, service, physical touch, gifting