# Social Media Mental Health Indicators Dataset

This dataset captures the relationship between social media usage, screen-time behavior, and daily lifestyle factors such as sleep duration and interaction quality. It is useful for analyzing patterns that may influence mental well-being, digital habits, and behavioral trends among users.

The data contains individual-level entries with details like daily screen time, social media time, positive vs. negative interactions, demographic information, and sleep hours.

In [1]:
#The data is available at https://www.kaggle.com/datasets/sonalshinde123/social-media-mental-health-indicators-dataset
#Importing the libraries
import pandas as pd
import numpy as np

#Importing the dataset
media = pd.read_csv('mental_health_social_media_dataset.csv')

# Dataset overview and basic statistics

In [2]:
media.head()
#first 5 rows

Unnamed: 0,person_name,age,date,gender,platform,daily_screen_time_min,social_media_time_min,negative_interactions_count,positive_interactions_count,sleep_hours,physical_activity_min,anxiety_level,stress_level,mood_level,mental_state
0,Reyansh Ghosh,35,1/1/2024,Male,Instagram,320,160,1,2,7.4,28,2,7,6,Stressed
1,Neha Patel,24,1/12/2024,Female,Instagram,453,226,1,3,6.7,15,3,8,5,Stressed
2,Ananya Naidu,26,1/6/2024,Male,Snapchat,357,196,1,2,7.2,24,3,7,6,Stressed
3,Neha Das,66,1/17/2024,Female,Snapchat,190,105,0,1,8.0,41,2,6,6,Stressed
4,Reyansh Banerjee,31,1/28/2024,Male,Snapchat,383,211,1,2,7.1,22,3,7,6,Stressed


In [3]:
media.tail()
#last 5 rows

Unnamed: 0,person_name,age,date,gender,platform,daily_screen_time_min,social_media_time_min,negative_interactions_count,positive_interactions_count,sleep_hours,physical_activity_min,anxiety_level,stress_level,mood_level,mental_state
4995,Sai Menon,42,1/21/2025,Female,WhatsApp,254,64,0,1,7.7,35,1,5,7,At_Risk
4996,Neha Ansari,33,1/26/2025,Female,TikTok,330,214,1,2,7.4,27,3,7,6,Stressed
4997,Aarav Sharma,13,2/6/2025,Male,TikTok,403,262,2,2,7.0,20,4,9,4,Stressed
4998,Aadhya Patil,21,2/17/2025,Male,TikTok,476,309,2,3,6.6,12,4,9,4,Stressed
4999,Shaurya Das,42,2/28/2025,Female,TikTok,249,162,1,1,7.8,35,2,6,6,Stressed


In [4]:
media.shape
#shape of dataset (rows,columns)

(5000, 15)

This dataset contains 15 columns and 5000 rows

In [5]:
media.info()
#the columns and their data types
#also knowning they have null values or not

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 15 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   person_name                  5000 non-null   object 
 1   age                          5000 non-null   int64  
 2   date                         5000 non-null   object 
 3   gender                       5000 non-null   object 
 4   platform                     5000 non-null   object 
 5   daily_screen_time_min        5000 non-null   int64  
 6   social_media_time_min        5000 non-null   int64  
 7   negative_interactions_count  5000 non-null   int64  
 8   positive_interactions_count  5000 non-null   int64  
 9   sleep_hours                  5000 non-null   float64
 10  physical_activity_min        5000 non-null   int64  
 11  anxiety_level                5000 non-null   int64  
 12  stress_level                 5000 non-null   int64  
 13  mood_level        

There is no missing values in this dataset

In [6]:
from datetime import datetime
media['date'] = pd.to_datetime(media['date'])
#the data type of "date" is object, let's convert it date type

In [7]:
media.isnull().sum()
#Do we have null values or not

Unnamed: 0,0
person_name,0
age,0
date,0
gender,0
platform,0
daily_screen_time_min,0
social_media_time_min,0
negative_interactions_count,0
positive_interactions_count,0
sleep_hours,0


We don't have null values, that's why we don't need to drop or fill any  column

In [8]:
media.nunique()
#the count of unique values each column

Unnamed: 0,0
person_name,891
age,57
date,686
gender,3
platform,7
daily_screen_time_min,342
social_media_time_min,304
negative_interactions_count,3
positive_interactions_count,5
sleep_hours,19


In [9]:
media['gender'].value_counts()
#Before starting the analysis, it is noticeable that the gender variable includes three distinct categories

Unnamed: 0_level_0,count
gender,Unnamed: 1_level_1
Female,2474
Male,2427
Other,99


# Dataset Analysis

In [10]:
media['mental_state'].value_counts()
#we have mental state column, first look which state is high

Unnamed: 0_level_0,count
mental_state,Unnamed: 1_level_1
Stressed,4601
Healthy,341
At_Risk,58


We notice that most people are stressed, while very few are healthy or at risk.

Let's search who they are


### Mental Health Status by Gender

In [11]:
media.groupby('gender')['mental_state'].value_counts()

Unnamed: 0_level_0,Unnamed: 1_level_0,count
gender,mental_state,Unnamed: 2_level_1
Female,Stressed,2268
Female,Healthy,181
Female,At_Risk,25
Male,Stressed,2242
Male,Healthy,153
Male,At_Risk,32
Other,Stressed,91
Other,Healthy,7
Other,At_Risk,1


The most stressed people are females. But also most healthy people are women.

Let's look which platform they use mostly.

### Platform Usage Across Genders

In [12]:
media.groupby('platform')['gender'].value_counts("mean").mul(100).round(2)

Unnamed: 0_level_0,Unnamed: 1_level_0,proportion
platform,gender,Unnamed: 2_level_1
Facebook,Female,50.27
Facebook,Male,47.18
Facebook,Other,2.55
Instagram,Male,49.36
Instagram,Female,49.08
Instagram,Other,1.56
Snapchat,Female,50.92
Snapchat,Male,46.52
Snapchat,Other,2.55
TikTok,Male,50.62


Women mostly use Facebook, Snapchat and Twitter.

### Dominant Gender on Each Platform

In [13]:
media.groupby('platform')['gender'].value_counts().reset_index(name='count').sort_values('count', ascending=False).drop_duplicates(subset='platform', keep='first')

Unnamed: 0,platform,gender,count
0,Facebook,Female,374
9,TikTok,Male,366
6,Snapchat,Female,359
15,WhatsApp,Female,354
18,YouTube,Male,353
12,Twitter,Female,349
3,Instagram,Male,347


For deep analys search age levels, not only genders


In [14]:
media['age level']=''
media.head()
#add new column

Unnamed: 0,person_name,age,date,gender,platform,daily_screen_time_min,social_media_time_min,negative_interactions_count,positive_interactions_count,sleep_hours,physical_activity_min,anxiety_level,stress_level,mood_level,mental_state,age level
0,Reyansh Ghosh,35,2024-01-01,Male,Instagram,320,160,1,2,7.4,28,2,7,6,Stressed,
1,Neha Patel,24,2024-01-12,Female,Instagram,453,226,1,3,6.7,15,3,8,5,Stressed,
2,Ananya Naidu,26,2024-01-06,Male,Snapchat,357,196,1,2,7.2,24,3,7,6,Stressed,
3,Neha Das,66,2024-01-17,Female,Snapchat,190,105,0,1,8.0,41,2,6,6,Stressed,
4,Reyansh Banerjee,31,2024-01-28,Male,Snapchat,383,211,1,2,7.1,22,3,7,6,Stressed,


In [15]:
media.age.mean()
#find mean value for filtering the age

np.float64(29.9478)

Average age is approximately 30

### Age Group Categorization

In [16]:
media['age_level'] = pd.cut(
    media['age'],
    bins=[0, 19, 30, 45, 100],
    labels=['Teenager', 'Young', 'Adult', 'Old']
)
#add filtering age column for deep analys

In [17]:
media.head()
#the column succesfully added

Unnamed: 0,person_name,age,date,gender,platform,daily_screen_time_min,social_media_time_min,negative_interactions_count,positive_interactions_count,sleep_hours,physical_activity_min,anxiety_level,stress_level,mood_level,mental_state,age level,age_level
0,Reyansh Ghosh,35,2024-01-01,Male,Instagram,320,160,1,2,7.4,28,2,7,6,Stressed,,Adult
1,Neha Patel,24,2024-01-12,Female,Instagram,453,226,1,3,6.7,15,3,8,5,Stressed,,Young
2,Ananya Naidu,26,2024-01-06,Male,Snapchat,357,196,1,2,7.2,24,3,7,6,Stressed,,Young
3,Neha Das,66,2024-01-17,Female,Snapchat,190,105,0,1,8.0,41,2,6,6,Stressed,,Old
4,Reyansh Banerjee,31,2024-01-28,Male,Snapchat,383,211,1,2,7.1,22,3,7,6,Stressed,,Adult


### Mental State Count by Gender and Age Group

In [18]:
media.pivot_table(index=["gender","age level"],values="mental_state",aggfunc=["count"])

Unnamed: 0_level_0,Unnamed: 1_level_0,count
Unnamed: 0_level_1,Unnamed: 1_level_1,mental_state
gender,age level,Unnamed: 2_level_2
Female,,2474
Male,,2427
Other,,99


Most people in the dataset are young, both male and female. Females and males have almost the same age distribution. The "Other" group has much fewer people compared to others.

Let's compare all mental states with stressed level

### Stressed Users by Gender and Age Group

In [19]:
media_stressed = media[media['mental_state'] == 'Stressed']
result = media_stressed.pivot_table(index=['gender', 'age level'], values='mental_state', aggfunc='count')
result

Unnamed: 0_level_0,Unnamed: 1_level_0,mental_state
gender,age level,Unnamed: 2_level_1
Female,,2268
Male,,2242
Other,,91


With comparing we notice that young women and men,teenager women and men have the same value in each table.So it means all youngs and teenagers are in stressed mood.

Let's find some adding information.

### Healthy Users Subset

In [20]:
media_healthy=media[media['mental_state']=='Healthy']
media_healthy.head()
#fetching information with healthy value for deciding exact result

Unnamed: 0,person_name,age,date,gender,platform,daily_screen_time_min,social_media_time_min,negative_interactions_count,positive_interactions_count,sleep_hours,physical_activity_min,anxiety_level,stress_level,mood_level,mental_state,age level,age_level
10,Suhani Das,39,2024-01-16,Male,WhatsApp,230,58,0,1,7.8,37,1,5,7,Healthy,,Adult
24,Payal Ansari,63,2024-02-26,Female,Facebook,162,57,0,1,8.2,44,1,5,7,Healthy,,Old
40,Saanvi Patil,40,2024-04-01,Female,Facebook,238,83,0,1,7.8,36,1,5,7,Healthy,,Adult
62,Aditi Deshmukh,53,2024-05-26,Male,Twitter,180,72,0,1,8.1,42,1,5,7,Healthy,,Old
81,Payal Shetty,65,2024-08-02,Male,Twitter,154,62,0,1,8.2,45,1,5,7,Healthy,,Old


So these 2 table show all teenager and youngers are stressed. There are no healthy or at_risk value in them.

Let's analys based on platforms.

### Platform Usage by Age Group

In [21]:
media.groupby('age level')['platform'].value_counts()
#first we try to find the reason of stress. All teenagers and youngers are stressed, so we can use media data without filtering "stressed".

Unnamed: 0_level_0,Unnamed: 1_level_0,count
age level,platform,Unnamed: 2_level_1
,Facebook,744
,TikTok,723
,YouTube,716
,WhatsApp,710
,Snapchat,705
,Instagram,703
,Twitter,699


If we focus on "young" they spend more time in whatsApp and they spend little time in snapchat. It means young people use phone for messaging. For "teenager" they mostly use facebook because of profession. It means they focus on their career and they can't socializing with peaople. In conclusion, each of them can't socializing in real world, they communicate only with phone.

Let's search about healthy people.

### Platform Usage by Age Level (Healthy Users)

In [22]:
media_healthy.groupby('age level')['platform'].value_counts()

Unnamed: 0_level_0,Unnamed: 1_level_0,count
age level,platform,Unnamed: 2_level_1
,Facebook,129
,WhatsApp,101
,Twitter,46
,Instagram,29
,Snapchat,22
,YouTube,14


When we look at yourgers and teenagers we also see max value facebook in them. So it means adults and olds healthy mental state don't depend on platform.

Let's analys other factor sleep time.

### Sleep Hours Statistics by Gender and Age Group

In [23]:
media.pivot_table(index=["gender","age level"],values="sleep_hours",aggfunc=["max","min"])

Unnamed: 0_level_0,Unnamed: 1_level_0,max,min
Unnamed: 0_level_1,Unnamed: 1_level_1,sleep_hours,sleep_hours
gender,age level,Unnamed: 2_level_2,Unnamed: 3_level_2
Female,,8.3,6.4
Male,,8.3,6.4
Other,,8.3,6.4


We notice that teenagers and youngers have min sleep_hours in each catagory(min,max) of column while adults and olds have enough sleep hours. That's why we can say the sleep hours effect mental status.

Let's find the reason of fewer sleep hours.

### Min-Max Daily Screen Time Across Gender and Age Levels

In [24]:
media.pivot_table(index=["gender","age level"],values="daily_screen_time_min",aggfunc=["max","min"])

Unnamed: 0_level_0,Unnamed: 1_level_0,max,min
Unnamed: 0_level_1,Unnamed: 1_level_1,daily_screen_time_min,daily_screen_time_min
gender,age level,Unnamed: 2_level_2,Unnamed: 3_level_2
Female,,520,140
Male,,520,140
Other,,520,140


We notice that the youngers and teenagers min screen time is nearly closed to adults and olds max screen time.

Let's find how much time they spend in social media.

### Min-Max Social Media Usage by Gender and Age Levels

In [25]:
media.pivot_table(index=["gender","age level"],values="social_media_time_min",aggfunc=["max","min"])

Unnamed: 0_level_0,Unnamed: 1_level_0,max,min
Unnamed: 0_level_1,Unnamed: 1_level_1,social_media_time_min,social_media_time_min
gender,age level,Unnamed: 2_level_2,Unnamed: 3_level_2
Female,,338,35
Male,,337,36
Other,,335,43


Teenagers and youngs spend more time on social media than adults and olds. So we can say both screen hours and social media time effect the mental state.

### Conclusion

In conclusion, gender distribution is nearly equal and does not strongly affect stress levels. Teenagers and young people show higher stress, while adults and older users tend to be healthier. Although they use similar platforms, lower screen time among adults and older users may contribute to better mental health.