# Study Understanding

## Study Objective

This project aims to understand how social media use affects mental health, including levels of stress, anxiety, and depression. The analysis explores the relationship between the duration of social media use, the types of platforms used, and their impact on an individual's psychological condition.

## Assess Situation

Social media has become a major part of modern life, but its impact on mental health is still debated. Some studies suggest that excessive social media use can increase the risk of mental health problems, while others find that these platforms can also provide social support.

The dataset used is **Social Media and Mental Health**, which contains data on respondents' social media habits and mental health condition. The analysis focuses on how frequency of use, types of interaction, and other factors relate to mental health.

## Data Mining Goals  

- Identify patterns in the relationship between the intensity of social media use and mental health levels.
- Find user segments based on how long and in what way people use social media.
- Explore the factors that worsen or improve the impact of social media on mental health.
- Determine whether the type of activity on social media (passive scrolling, active interaction, etc.) is related to mental health.

## Project Plan

### Data Understanding  
- Explore the **Social Media and Mental Health** dataset to understand its structure and the available variables.
- Analyze data distributions and data types, and identify potential problems such as missing values or outliers.
- Select the main variables to analyze, such as duration of social media use, level of interaction, and their impact on mental health.

### Data Preparation  
- Clean the data, handle missing values, and apply any required transformation or encoding.
- Group the data for further analysis, such as segmenting users by their social media usage patterns.
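
The encoding step listed above could look like the following sketch. It is only an illustration on toy rows: the short column names `time_spent` and `platforms` are hypothetical stand-ins for the long survey questions, and the full ordering of the duration categories is an assumption, since the data preview shows only some of the labels.

```python
import pandas as pd

# Toy rows that mimic two survey columns (values copied from the data preview).
sample = pd.DataFrame({
    "time_spent": ["Between 2 and 3 hours", "More than 5 hours",
                   "Between 3 and 4 hours", "Between 1 and 2 hours"],
    "platforms": ["Facebook, Instagram, YouTube", "Facebook, Instagram",
                  "Facebook, YouTube", "Facebook, Instagram, YouTube, Pinterest"],
})

# Assumed ordering of the duration categories. "Less than an Hour" and
# "Between 4 and 5 hours" are guesses at labels not visible in the preview.
time_order = ["Less than an Hour", "Between 1 and 2 hours", "Between 2 and 3 hours",
              "Between 3 and 4 hours", "Between 4 and 5 hours", "More than 5 hours"]
time_rank = {label: rank for rank, label in enumerate(time_order)}

# Ordinal encoding: a label missing from the mapping becomes NaN
# instead of silently receiving a rank.
sample["time_spent_rank"] = sample["time_spent"].map(time_rank)

# Multi-label encoding: one 0/1 indicator column per platform.
platform_flags = sample["platforms"].str.get_dummies(sep=", ")
encoded = pd.concat([sample, platform_flags], axis=1)
print(encoded.drop(columns=["time_spent", "platforms"]))
```

Keeping the duration as an ordered rank preserves the "more time" direction for later correlation or segmentation, while the indicator columns let each platform be analyzed on its own.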

### Visualization  
- Create visualizations to understand the relationships between the analyzed variables.
- Use charts, heatmaps, and relationship diagrams to show the correlation between duration of social media use and levels of stress, anxiety, and depression.
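
A correlation heatmap starts from a correlation matrix. The sketch below shows one way to compute it for 1 to 5 survey items, using a rank-based (Spearman) correlation obtained as the Pearson correlation of the ranks. The column names and the scores are invented for illustration and are not taken from the analysis.

```python
import pandas as pd

# Hypothetical short column names and toy scores on a 1-5 scale.
scores = pd.DataFrame({
    "purposeless_use": [5, 4, 3, 4, 3, 2],
    "worries":         [2, 5, 5, 5, 5, 3],
    "depressed":       [5, 5, 4, 4, 4, 2],
})

# Spearman correlation equals the Pearson correlation of the ranks,
# which suits ordinal Likert items better than raw Pearson.
corr = scores.rank().corr()
print(corr.round(2))
```

The resulting square matrix is what a heatmap would color; passing it to a plotting library is left out of this sketch.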

### Dashboard  
- Build an interactive dashboard to present the main results of the analysis.
- The dashboard will show the key insights on the impact of social media on mental health as easy-to-read charts and metrics.

### Insight & Action  
- Draw conclusions from the analysis about how social media affects mental health.
- Provide data-driven recommendations, such as healthier social media usage patterns or strategies to reduce the negative impact of social media on mental well-being.

# DATA UNDERSTANDING

"Uncontrolled Scrolling: How Does Social Media Affect Mental Health?"

The dataset comes from kaggle.com:

https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health

In [None]:
import pandas as pd

file = "/content/smmh.csv"

df = pd.read_csv(file)
df

Unnamed: 0,Timestamp,1. What is your age?,2. Gender,3. Relationship Status,4. Occupation Status,5. What type of organizations are you affiliated with?,6. Do you use social media?,7. What social media platforms do you commonly use?,8. What is the average time you spend on social media every day?,9. How often do you find yourself using Social media without a specific purpose?,...,11. Do you feel restless if you haven't used Social media in a while?,"12. On a scale of 1 to 5, how easily distracted are you?","13. On a scale of 1 to 5, how much are you bothered by worries?",14. Do you find it difficult to concentrate on things?,"15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?","16. Following the previous question, how do you feel about these comparisons, generally speaking?",17. How often do you look to seek validation from features of social media?,18. How often do you feel depressed or down?,"19. On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate?","20. On a scale of 1 to 5, how often do you face issues regarding sleep?"
0,4/18/2022 19:18:47,21.0,Male,In a relationship,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,5,...,2,5,2,5,2,3,2,5,4,5
1,4/18/2022 19:19:28,21.0,Female,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,4,...,2,4,5,4,5,1,1,5,4,5
2,4/18/2022 19:25:59,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,3,...,1,2,5,4,3,3,1,4,2,5
3,4/18/2022 19:29:43,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram",More than 5 hours,4,...,1,3,5,3,5,1,2,4,3,2
4,4/18/2022 19:33:31,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3,...,4,4,5,5,3,3,3,4,4,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
476,5/21/2022 23:38:28,24.0,Male,Single,Salaried Worker,"University, Private",Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3,...,3,4,3,4,4,2,4,3,4,4
477,5/22/2022 0:01:05,26.0,Female,Married,Salaried Worker,University,Yes,"Facebook, YouTube",Between 1 and 2 hours,2,...,2,3,4,4,4,2,4,4,4,1
478,5/22/2022 10:29:21,29.0,Female,Married,Salaried Worker,University,Yes,"Facebook, YouTube",Between 2 and 3 hours,3,...,4,3,2,3,3,3,4,2,2,2
479,7/14/2022 19:33:47,21.0,Male,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,2,...,2,3,3,2,2,3,4,4,5,4


In [None]:
# Check the number of rows and columns
df.shape

(481, 21)

The output of df.shape above shows that the dataset has 481 rows and 21 columns.

In [None]:
df.dtypes

Unnamed: 0,0
Timestamp,object
1. What is your age?,float64
2. Gender,object
3. Relationship Status,object
4. Occupation Status,object
5. What type of organizations are you affiliated with?,object
6. Do you use social media?,object
7. What social media platforms do you commonly use?,object
8. What is the average time you spend on social media every day?,object
9. How often do you find yourself using Social media without a specific purpose?,int64
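
The dtypes listing shows `Timestamp` stored as `object`, meaning plain strings. If the submission time is needed later, it could be parsed as in the sketch below. The format string is an assumption inferred from the preview, where values such as `4/18/2022` can only be month/day/year.

```python
import pandas as pd

# Timestamp strings copied from the data preview.
raw = pd.Series(["4/18/2022 19:18:47", "5/22/2022 0:01:05", "7/14/2022 19:33:47"])

# Assumed format: month/day/year followed by a 24-hour time.
parsed = pd.to_datetime(raw, format="%m/%d/%Y %H:%M:%S")

print(parsed.dt.month.tolist())
print(parsed.dt.hour.tolist())
```

An explicit format makes a wrongly formatted row raise an error instead of being guessed silently.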


In [None]:
df.describe()

Unnamed: 0,1. What is your age?,9. How often do you find yourself using Social media without a specific purpose?,10. How often do you get distracted by Social media when you are busy doing something?,11. Do you feel restless if you haven't used Social media in a while?,"12. On a scale of 1 to 5, how easily distracted are you?","13. On a scale of 1 to 5, how much are you bothered by worries?",14. Do you find it difficult to concentrate on things?,"15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?","16. Following the previous question, how do you feel about these comparisons, generally speaking?",17. How often do you look to seek validation from features of social media?,18. How often do you feel depressed or down?,"19. On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate?","20. On a scale of 1 to 5, how often do you face issues regarding sleep?"
count,481.0,481.0,481.0,481.0,481.0,481.0,481.0,481.0,481.0,481.0,481.0,481.0,481.0
mean,26.13659,3.553015,3.320166,2.588358,3.349272,3.559252,3.245322,2.831601,2.775468,2.455301,3.255717,3.170478,3.201663
std,9.91511,1.096299,1.328137,1.257059,1.175552,1.283356,1.347105,1.407835,1.056479,1.247739,1.313033,1.256666,1.461619
min,13.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
25%,21.0,3.0,2.0,2.0,3.0,3.0,2.0,2.0,2.0,1.0,2.0,2.0,2.0
50%,22.0,4.0,3.0,2.0,3.0,4.0,3.0,3.0,3.0,2.0,3.0,3.0,3.0
75%,26.0,4.0,4.0,3.0,4.0,5.0,4.0,4.0,3.0,3.0,4.0,4.0,5.0
max,91.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0


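The table above shows the descriptive statistics for the numeric columns of the dataset. Ages range from 13 to 91 with a median of 22, and every scale item spans the full range from 1 to 5.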

# DATA PREPARATION

In [None]:
import pandas as pd

file = "/content/smmh.csv"

df = pd.read_csv(file)
df

Unnamed: 0,Timestamp,1. What is your age?,2. Gender,3. Relationship Status,4. Occupation Status,5. What type of organizations are you affiliated with?,6. Do you use social media?,7. What social media platforms do you commonly use?,8. What is the average time you spend on social media every day?,9. How often do you find yourself using Social media without a specific purpose?,...,11. Do you feel restless if you haven't used Social media in a while?,"12. On a scale of 1 to 5, how easily distracted are you?","13. On a scale of 1 to 5, how much are you bothered by worries?",14. Do you find it difficult to concentrate on things?,"15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?","16. Following the previous question, how do you feel about these comparisons, generally speaking?",17. How often do you look to seek validation from features of social media?,18. How often do you feel depressed or down?,"19. On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate?","20. On a scale of 1 to 5, how often do you face issues regarding sleep?"
0,4/18/2022 19:18:47,21.0,Male,In a relationship,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,5,...,2,5,2,5,2,3,2,5,4,5
1,4/18/2022 19:19:28,21.0,Female,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,4,...,2,4,5,4,5,1,1,5,4,5
2,4/18/2022 19:25:59,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,3,...,1,2,5,4,3,3,1,4,2,5
3,4/18/2022 19:29:43,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram",More than 5 hours,4,...,1,3,5,3,5,1,2,4,3,2
4,4/18/2022 19:33:31,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3,...,4,4,5,5,3,3,3,4,4,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
476,5/21/2022 23:38:28,24.0,Male,Single,Salaried Worker,"University, Private",Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3,...,3,4,3,4,4,2,4,3,4,4
477,5/22/2022 0:01:05,26.0,Female,Married,Salaried Worker,University,Yes,"Facebook, YouTube",Between 1 and 2 hours,2,...,2,3,4,4,4,2,4,4,4,1
478,5/22/2022 10:29:21,29.0,Female,Married,Salaried Worker,University,Yes,"Facebook, YouTube",Between 2 and 3 hours,3,...,4,3,2,3,3,3,4,2,2,2
479,7/14/2022 19:33:47,21.0,Male,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,2,...,2,3,3,2,2,3,4,4,5,4


1. Missing Values

In [None]:
print((df.isna().sum() / len(df)) * 100)

Timestamp                                                                                                               0.000000
1. What is your age?                                                                                                    0.000000
2. Gender                                                                                                               0.000000
3. Relationship Status                                                                                                  0.000000
4. Occupation Status                                                                                                    0.000000
5. What type of organizations are you affiliated with?                                                                  6.237006
6. Do you use social media?                                                                                             0.000000
7. What social media platforms do you commonly use?                                              
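
In the visible part of the output, only the organization-type question has missing values: 6.237006% of 481 rows, which is 30 rows. A small sketch of reporting counts next to percentages, and of one possible treatment, follows. The toy frame, the column name `organization`, and the `"Not specified"` label are all hypothetical; the fill is an assumption and not the choice made in this notebook.

```python
import pandas as pd

# Toy frame with hypothetical column names; None marks a skipped answer.
toy = pd.DataFrame({
    "organization": ["University", None, "Private", "University", None],
    "age": [21.0, 24.0, 26.0, 29.0, 21.0],
})

# Count and percentage of missing values per column, side by side.
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "pct_missing": toy.isna().mean() * 100,
})
print(missing)

# One possible treatment (an assumption): keep the rows and label the gap explicitly.
toy["organization"] = toy["organization"].fillna("Not specified")
```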

2. Duplicates

In [None]:
df[df.duplicated()]

Unnamed: 0,Timestamp,1. What is your age?,2. Gender,3. Relationship Status,4. Occupation Status,5. What type of organizations are you affiliated with?,6. Do you use social media?,7. What social media platforms do you commonly use?,8. What is the average time you spend on social media every day?,9. How often do you find yourself using Social media without a specific purpose?,...,11. Do you feel restless if you haven't used Social media in a while?,"12. On a scale of 1 to 5, how easily distracted are you?","13. On a scale of 1 to 5, how much are you bothered by worries?",14. Do you find it difficult to concentrate on things?,"15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?","16. Following the previous question, how do you feel about these comparisons, generally speaking?",17. How often do you look to seek validation from features of social media?,18. How often do you feel depressed or down?,"19. On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate?","20. On a scale of 1 to 5, how often do you face issues regarding sleep?"
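
The empty table means that no row is an exact copy of an earlier one. As a minimal sketch on toy data (column names are hypothetical), the same check can be reduced to a single number, and duplicates can be dropped if any appear:

```python
import pandas as pd

# Toy responses; the second and third rows are identical on purpose.
toy = pd.DataFrame({
    "age": [21.0, 24.0, 24.0],
    "gender": ["Male", "Female", "Female"],
})

n_duplicates = int(toy.duplicated().sum())  # rows that repeat an earlier row
deduplicated = toy.drop_duplicates().reset_index(drop=True)
print(n_duplicates, len(deduplicated))
```

Because every response carries its own timestamp, exact duplicates are unlikely by construction; passing `subset=` with the answer columns only would be a stricter check.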


In [None]:
df.columns

Index(['Timestamp', '1. What is your age?', '2. Gender',
       '3. Relationship Status', '4. Occupation Status',
       '5. What type of organizations are you affiliated with?',
       '6. Do you use social media?',
       '7. What social media platforms do you commonly use?',
       '8. What is the average time you spend on social media every day?',
       '9. How often do you find yourself using Social media without a specific purpose?',
       '10. How often do you get distracted by Social media when you are busy doing something?',
       '11. Do you feel restless if you haven't used Social media in a while?',
       '12. On a scale of 1 to 5, how easily distracted are you?',
       '13. On a scale of 1 to 5, how much are you bothered by worries?',
       '14. Do you find it difficult to concentrate on things?',
       '15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?',
       '16. Following the previous question, 

In [None]:
df.dtypes

Unnamed: 0,0
Timestamp,object
1. What is your age?,float64
2. Gender,object
3. Relationship Status,object
4. Occupation Status,object
5. What type of organizations are you affiliated with?,object
6. Do you use social media?,object
7. What social media platforms do you commonly use?,object
8. What is the average time you spend on social media every day?,object
9. How often do you find yourself using Social media without a specific purpose?,int64


3. Transforming Data

In [None]:
results = []

cols = df.select_dtypes(include=['float64'])

for col in cols:
  q1 = df[col].quantile(0.25)
  q3 = df[col].quantile(0.75)
  iqr = q3 - q1
  lower_bound = q1 - 1.5*iqr
  upper_bound = q3 + 1.5*iqr
  outliers = df[(df[col] < lower_bound) | (df[col] > upper_bound)]
  percent_outliers = (len(outliers)/len(df))*100
  results.append({'Kolom': col, 'Persentase Outliers': percent_outliers})

# Build a DataFrame from the list of results
results_df = pd.DataFrame(results)
results_df.set_index('Kolom', inplace=True)
results_df = results_df.rename_axis(None, axis=0).rename_axis('Kolom', axis=1)

# Display the DataFrame
display(results_df)

Kolom,Persentase Outliers
1. What is your age?,17.463617
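
The cell above selects only `float64` columns, so at this point only the age column is checked and the integer scale items are skipped. A vectorized sketch of the same 1.5 × IQR rule that covers every numeric column is shown below. The function name `iqr_outlier_share` and the toy column names are hypothetical.

```python
import pandas as pd

def iqr_outlier_share(frame: pd.DataFrame) -> pd.Series:
    """Percentage of rows outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] per numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column bounds with the DataFrame columns.
    outside = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return outside.mean() * 100

# Toy data: one float column and one integer column.
toy = pd.DataFrame({
    "age": [21.0, 21.0, 22.0, 22.0, 23.0, 24.0, 26.0, 91.0],
    "score": [3, 3, 4, 4, 4, 4, 5, 3],
})
shares = iqr_outlier_share(toy)
print(shares)
```

In the toy data only the age of 91 falls outside the fences, so `age` reports 12.5% and `score` reports 0%.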


In [None]:
columns_to_impute = ["1. What is your age?"]

for col in columns_to_impute:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Use .loc[] to avoid SettingWithCopyWarning
    df.loc[:, col] = df[col].clip(lower=lower_bound, upper=upper_bound)

In [None]:
columns_to_impute = ["9. How often do you find yourself using Social media without a specific purpose?"]

for col in columns_to_impute:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Use .loc[] to avoid SettingWithCopyWarning
    df.loc[:, col] = df[col].clip(lower=lower_bound, upper=upper_bound)

[Output truncated: array of the clipped values for this column; responses below the lower bound now appear as 1.5.]

In [None]:
columns_to_impute = ["11. Do you feel restless if you haven't used Social media in a while?"]

for col in columns_to_impute:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Use .loc[] to avoid SettingWithCopyWarning
    df.loc[:, col] = df[col].clip(lower=lower_bound, upper=upper_bound)

[Output truncated: array of the clipped values for this column; responses above the upper bound now appear as 4.5.]

In [None]:
columns_to_impute = ["16. Following the previous question, how do you feel about these comparisons, generally speaking?"]

for col in columns_to_impute:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Use .loc[] to avoid SettingWithCopyWarning
    df.loc[:, col] = df[col].clip(lower=lower_bound, upper=upper_bound)

[Output truncated: array of the clipped values for this column; responses above the upper bound now appear as 4.5.]

In [None]:
columns_to_impute = ["12. On a scale of 1 to 5, how easily distracted are you?"]
for col in columns_to_impute:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Use .loc[] to avoid SettingWithCopyWarning
    df.loc[:, col] = df[col].clip(lower=lower_bound, upper=upper_bound)

[Output truncated: array of the clipped values for this column; responses below the lower bound now appear as 1.5.]
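
The five clipping cells above repeat the same steps for one column each. A possible consolidation is sketched below. The helper name `clip_to_iqr_fences` and the toy column names are hypothetical, and casting to float before clipping is an added assumption, made because a fence such as 1.5 cannot be stored in an integer column.

```python
import pandas as pd

def clip_to_iqr_fences(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy in which each listed column is clipped to its 1.5*IQR fences."""
    out = frame.copy()
    for col in columns:
        q1 = out[col].quantile(0.25)
        q3 = out[col].quantile(0.75)
        iqr = q3 - q1
        # Cast to float first: the fences can be fractional (e.g. 1.5),
        # which an integer column cannot hold.
        out[col] = out[col].astype(float).clip(lower=q1 - 1.5 * iqr,
                                               upper=q3 + 1.5 * iqr)
    return out

# Toy data with hypothetical column names.
toy = pd.DataFrame({
    "age": [21.0, 21.0, 22.0, 22.0, 23.0, 24.0, 26.0, 91.0],
    "score": [1, 3, 3, 4, 4, 4, 4, 3],
})
clipped = clip_to_iqr_fences(toy, ["age", "score"])
print(clipped)
```

Note that clipping a 1 to 5 item to 1.5 or 4.5 produces a value no respondent could have chosen, so it may be worth deciding case by case whether extreme answers on an ordinal scale should count as outliers at all.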

In [None]:
results = []

cols = df.select_dtypes(include=['float64', 'int64'])

for col in cols:
  q1 = df[col].quantile(0.25)
  q3 = df[col].quantile(0.75)
  iqr = q3 - q1
  lower_bound = q1 - 1.5*iqr
  upper_bound = q3 + 1.5*iqr
  outliers = df[(df[col] < lower_bound) | (df[col] > upper_bound)]
  percent_outliers = (len(outliers)/len(df))*100
  results.append({'Kolom': col, 'Persentase Outliers': percent_outliers})

# Build a DataFrame from the list of results
results_df = pd.DataFrame(results)
results_df.set_index('Kolom', inplace=True)
results_df = results_df.rename_axis(None, axis=0).rename_axis('Kolom', axis=1)

# Display the DataFrame
display(results_df)

Kolom,Persentase Outliers
1. What is your age?,0.0
9. How often do you find yourself using Social media without a specific purpose?,0.0
10. How often do you get distracted by Social media when you are busy doing something?,0.0
11. Do you feel restless if you haven't used Social media in a while?,0.0
"12. On a scale of 1 to 5, how easily distracted are you?",0.0
"13. On a scale of 1 to 5, how much are you bothered by worries?",0.0
14. Do you find it difficult to concentrate on things?,0.0
"15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?",0.0
"16. Following the previous question, how do you feel about these comparisons, generally speaking?",0.0
17. How often do you look to seek validation from features of social media?,0.0


In [None]:
df

Unnamed: 0,Timestamp,1. What is your age?,2. Gender,3. Relationship Status,4. Occupation Status,5. What type of organizations are you affiliated with?,6. Do you use social media?,7. What social media platforms do you commonly use?,8. What is the average time you spend on social media every day?,9. How often do you find yourself using Social media without a specific purpose?,...,11. Do you feel restless if you haven't used Social media in a while?,"12. On a scale of 1 to 5, how easily distracted are you?","13. On a scale of 1 to 5, how much are you bothered by worries?",14. Do you find it difficult to concentrate on things?,"15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?","16. Following the previous question, how do you feel about these comparisons, generally speaking?",17. How often do you look to seek validation from features of social media?,18. How often do you feel depressed or down?,"19. On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate?","20. On a scale of 1 to 5, how often do you face issues regarding sleep?"
0,4/18/2022 19:18:47,21.0,Male,In a relationship,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,5.0,...,2.0,5.0,2,5,2,3.0,2,5,4,5
1,4/18/2022 19:19:28,21.0,Female,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,4.0,...,2.0,4.0,5,4,5,1.0,1,5,4,5
2,4/18/2022 19:25:59,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,3.0,...,1.0,2.0,5,4,3,3.0,1,4,2,5
3,4/18/2022 19:29:43,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram",More than 5 hours,4.0,...,1.0,3.0,5,3,5,1.0,2,4,3,2
4,4/18/2022 19:33:31,21.0,Female,Single,University Student,University,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3.0,...,4.0,4.0,5,5,3,3.0,3,4,4,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
476,5/21/2022 23:38:28,24.0,Male,Single,Salaried Worker,"University, Private",Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3.0,...,3.0,4.0,3,4,4,2.0,4,3,4,4
477,5/22/2022 0:01:05,26.0,Female,Married,Salaried Worker,University,Yes,"Facebook, YouTube",Between 1 and 2 hours,2.0,...,2.0,3.0,4,4,4,2.0,4,4,4,1
478,5/22/2022 10:29:21,29.0,Female,Married,Salaried Worker,University,Yes,"Facebook, YouTube",Between 2 and 3 hours,3.0,...,4.0,3.0,2,3,3,3.0,4,2,2,2
479,7/14/2022 19:33:47,21.0,Male,Single,University Student,University,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,2.0,...,2.0,3.0,3,2,2,3.0,4,4,5,4


5. Data Reduction

I consider the type of organization to have no direct influence on the variables I want to analyze or model (such as mental health or social media use), so this column is dropped.
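
The rationale above is a judgment call. One way to sanity-check it before dropping the column is to compare an outcome score across organization types, as in this sketch. The column names and all values are invented for illustration and say nothing about the real data.

```python
import pandas as pd

# Toy data with hypothetical short column names.
toy = pd.DataFrame({
    "organization": ["University", "University", "Private", "Private", "School", "School"],
    "depressed": [5, 4, 3, 4, 4, 4],
})

# Mean score and group size per organization type.
summary = toy.groupby("organization")["depressed"].agg(["mean", "count"])
print(summary)
```

Similar group means on the real data would support removing the column, while clearly different means would be a reason to keep it.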

In [None]:
df = df.drop('5. What type of organizations are you affiliated with?', axis=1)

In [None]:
df

Unnamed: 0,Timestamp,1. What is your age?,2. Gender,3. Relationship Status,4. Occupation Status,6. Do you use social media?,7. What social media platforms do you commonly use?,8. What is the average time you spend on social media every day?,9. How often do you find yourself using Social media without a specific purpose?,10. How often do you get distracted by Social media when you are busy doing something?,11. Do you feel restless if you haven't used Social media in a while?,"12. On a scale of 1 to 5, how easily distracted are you?","13. On a scale of 1 to 5, how much are you bothered by worries?",14. Do you find it difficult to concentrate on things?,"15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?","16. Following the previous question, how do you feel about these comparisons, generally speaking?",17. How often do you look to seek validation from features of social media?,18. How often do you feel depressed or down?,"19. On a scale of 1 to 5, how frequently does your interest in daily activities fluctuate?","20. On a scale of 1 to 5, how often do you face issues regarding sleep?"
0,4/18/2022 19:18:47,21.0,Male,In a relationship,University Student,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,5.0,3,2.0,5.0,2,5,2,3.0,2,5,4,5
1,4/18/2022 19:19:28,21.0,Female,Single,University Student,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",More than 5 hours,4.0,3,2.0,4.0,5,4,5,1.0,1,5,4,5
2,4/18/2022 19:25:59,21.0,Female,Single,University Student,Yes,"Facebook, Instagram, YouTube, Pinterest",Between 3 and 4 hours,3.0,2,1.0,2.0,5,4,3,3.0,1,4,2,5
3,4/18/2022 19:29:43,21.0,Female,Single,University Student,Yes,"Facebook, Instagram",More than 5 hours,4.0,2,1.0,3.0,5,3,5,1.0,2,4,3,2
4,4/18/2022 19:33:31,21.0,Female,Single,University Student,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3.0,5,4.0,4.0,5,5,3,3.0,3,4,4,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
476,5/21/2022 23:38:28,24.0,Male,Single,Salaried Worker,Yes,"Facebook, Instagram, YouTube",Between 2 and 3 hours,3.0,4,3.0,4.0,3,4,4,2.0,4,3,4,4
477,5/22/2022 0:01:05,26.0,Female,Married,Salaried Worker,Yes,"Facebook, YouTube",Between 1 and 2 hours,2.0,1,2.0,3.0,4,4,4,2.0,4,4,4,1
478,5/22/2022 10:29:21,29.0,Female,Married,Salaried Worker,Yes,"Facebook, YouTube",Between 2 and 3 hours,3.0,3,4.0,3.0,2,3,3,3.0,4,2,2,2
479,7/14/2022 19:33:47,21.0,Male,Single,University Student,Yes,"Facebook, Twitter, Instagram, YouTube, Discord...",Between 2 and 3 hours,2.0,3,2.0,3.0,3,2,2,3.0,4,4,5,4
