# Mental Health Status - 2025

---

### Importing Necessary Libraries & CSV File
source - [Kaggle](https://www.kaggle.com/datasets/ankushnarwade/global-mental-health-dataset-2025)




### About Dataset

Description:
* The Global Mental Health Dataset 2025 is a synthetic yet highly realistic dataset designed to support analysis, research, and machine learning applications in the field of mental health and healthcare analytics. It captures key psychological assessment scores, lifestyle factors, treatment details, and outcomes for individuals across multiple countries, making it suitable for both academic and industry-level projects.

* This dataset includes 2,500+ anonymized patient records with standardized mental health measures such as PHQ-9 (Depression Score) and GAD-7 (Anxiety Score), which are widely used screening tools in clinical and research settings. Alongside these, the dataset incorporates demographic attributes, stress levels, sleep patterns, physical activity, medical history, treatment types, and recovery outcomes.

* The global scope of the dataset enables cross-country comparisons and broader insights into mental health trends, while the clean structure makes it beginner-friendly and immediately usable for exploratory data analysis (EDA), visualization, and predictive modeling.


Key Use Cases:
* Mental health trend analysis
* Depression and anxiety severity classification
* Treatment effectiveness and outcome prediction
* Lifestyle impact analysis (sleep, activity, stress)
* Healthcare and public health research projects
* Machine learning & data science practice (EDA, ML, DL)

⚠️ Disclaimer
This dataset is synthetically generated for educational and research purposes only. It does not represent real patient data and should not be used for medical diagnosis or treatment decisions.

---

In [1]:
import pandas as pd

df = pd.read_csv(r'C:\Projects\MindScope-2025\Global_Mental_Health_Dataset_2025.csv')

df.head()

Unnamed: 0,Patient_ID,Age,Gender,Country,Depression_Score,Anxiety_Score,Stress_Level,Sleep_Hours,Physical_Activity,Chronic_Illness,Mental_Health_History,Treatment,Days_of_Treatment,Outcome,Work_Status
0,MH0001,56,Male,Germany,4,10,High,8.2,Moderate,No,No,Therapy,171,Excellent,Unemployed
1,MH0002,69,Male,South Africa,22,4,Severe,6.2,Low,No,Yes,,69,Fair,Employed
2,MH0003,46,Male,South Africa,4,2,Low,10.4,,Yes,Yes,Medication,70,Good,Employed
3,MH0004,32,Female,Australia,19,16,Low,10.0,Moderate,No,Yes,Both,254,Excellent,Employed
4,MH0005,60,Female,United States,14,1,Severe,7.9,Low,Yes,No,Therapy,130,Fair,Employed


## 1. Data Preprocessing

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2500 entries, 0 to 2499
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Patient_ID             2500 non-null   object 
 1   Age                    2500 non-null   int64  
 2   Gender                 2500 non-null   object 
 3   Country                2500 non-null   object 
 4   Depression_Score       2500 non-null   int64  
 5   Anxiety_Score          2500 non-null   int64  
 6   Stress_Level           2500 non-null   object 
 7   Sleep_Hours            2500 non-null   float64
 8   Physical_Activity      2023 non-null   object 
 9   Chronic_Illness        2500 non-null   object 
 10  Mental_Health_History  2500 non-null   object 
 11  Treatment              1892 non-null   object 
 12  Days_of_Treatment      2500 non-null   int64  
 13  Outcome                2500 non-null   object 
 14  Work_Status            2500 non-null   object 
dtypes: f

---

### A - Finding Null Values

In [3]:
df.isnull().sum()

Patient_ID                 0
Age                        0
Gender                     0
Country                    0
Depression_Score           0
Anxiety_Score              0
Stress_Level               0
Sleep_Hours                0
Physical_Activity        477
Chronic_Illness            0
Mental_Health_History      0
Treatment                608
Days_of_Treatment          0
Outcome                    0
Work_Status                0
dtype: int64
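The raw counts above are easier to judge as a share of all rows. The following is a minimal sketch on a tiny hypothetical frame (`sample` is a stand-in, not the real dataset) that reports the percentage of missing values per column:

```python
import numpy as np
import pandas as pd

# Tiny hypothetical frame standing in for the real dataset
sample = pd.DataFrame({
    "Physical_Activity": ["Moderate", "Low", np.nan, "High"],
    "Treatment": ["Therapy", np.nan, np.nan, "Both"],
    "Age": [56, 69, 46, 32],
})

# isnull().mean() gives the fraction of missing values per column
missing_pct = (sample.isnull().mean() * 100).round(2)

# Keep only the columns that actually have gaps, largest share first
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```

Applied to `df`, the same expression should report 19.08 % for `Physical_Activity` (477 of 2500) and 24.32 % for `Treatment` (608 of 2500).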

In [5]:
print(df['Physical_Activity'].unique())

['Moderate' 'Low' nan 'High']


In [None]:
df['Treatment'].value_counts()

Treatment
Therapy       767
Medication    634
Both          491
Name: count, dtype: int64

---

### B - Fill Missing Values

In [6]:
df['Physical_Activity'] = df['Physical_Activity'].fillna('No Workout')

In [50]:
df['Physical_Activity'].value_counts()

Physical_Activity
Moderate      858
Low           764
No Workout    477
High          401
Name: count, dtype: int64

In [7]:
df['Treatment'] = df['Treatment'].fillna('No Treatment')

In [53]:
df['Treatment'].value_counts()

Treatment
Therapy         767
Medication      634
No Treatment    608
Both            491
Name: count, dtype: int64

In [49]:
df.isnull().sum()

Patient_ID               0
Age                      0
Gender                   0
Country                  0
Depression_Score         0
Anxiety_Score            0
Stress_Level             0
Sleep_Hours              0
Physical_Activity        0
Chronic_Illness          0
Mental_Health_History    0
Treatment                0
Days_of_Treatment        0
Outcome                  0
Work_Status              0
Depression_Level         0
Anxiety_Level            0
dtype: int64
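Both columns can also be filled in a single call by passing a dictionary to `fillna`. This is a sketch on a hypothetical `sample` frame, and it keeps the assumption made in the cells above that a missing value means "none":

```python
import numpy as np
import pandas as pd

# Hypothetical rows standing in for the real dataset
sample = pd.DataFrame({
    "Physical_Activity": ["Moderate", np.nan, "High"],
    "Treatment": [np.nan, "Medication", np.nan],
    "Sleep_Hours": [8.2, 6.2, 10.4],
})

# Map each column to the placeholder label used in this notebook
fill_values = {"Physical_Activity": "No Workout", "Treatment": "No Treatment"}

# fillna with a dict fills only the listed columns and returns a new frame
filled = sample.fillna(fill_values)
print(filled)
```

Because `fillna` returns a new frame here, `sample` keeps its gaps, which makes it easy to compare the data before and after.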

---

### C - Finding Insights From the Dataset

Physical Activity

In [47]:
print(df['Physical_Activity'].value_counts(), "\n")


vc = df['Physical_Activity'].value_counts(normalize=True)

print("percentage of patients with Moderate Workout status - ", round(vc.iloc[0] * 100, 2), "%")
print("percentage of patients with Low Workout status - ", round(vc.iloc[1] * 100, 2), "%")
print("percentage of patients with No Workout status - ", round(vc.iloc[2] * 100, 2), "%")
print("percentage of patients with High Workout status - ", round(vc.iloc[3] * 100, 2), "% \n")

print(50*'-')

Physical_Activity
Moderate      858
Low           764
No Workout    477
High          401
Name: count, dtype: int64 

percentage of patients with Moderate Workout status -  34.32 %
percentage of patients with Low Workout status -  30.56 %
percentage of patients with No Workout status -  19.08 %
percentage of patients with High Workout status -  16.04 % 

--------------------------------------------------
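The cell above pairs each hard-coded label with a position (`vc.iloc[0]`, `vc.iloc[1]`, ...), which silently mislabels the output if the ranking of the categories changes. A label-based sketch avoids that; `percentage_by_label` is a hypothetical helper and `activity` is made-up sample data:

```python
import pandas as pd

def percentage_by_label(series: pd.Series) -> pd.Series:
    """Return the share of each category in percent, rounded to 2 decimals."""
    return (series.value_counts(normalize=True) * 100).round(2)

# Hypothetical sample; in the notebook this would be df['Physical_Activity']
activity = pd.Series(["Moderate", "Low", "Moderate", "High", "No Workout"])

shares = percentage_by_label(activity)

# Each label is read from the index, so it always matches its own value
for label, pct in shares.items():
    print(f"percentage of patients with {label} Workout status - {pct} %")
```

The same helper would work unchanged for the other categorical columns summarised later in this notebook.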



Patient ID and Verification

In [46]:
print("Unique patient IDs:", df['Patient_ID'].nunique())
print("Total patient records:", df['Patient_ID'].size)
print("Is Patient_ID unique?", df['Patient_ID'].is_unique, "\n")
print("Each of the", df['Patient_ID'].nunique(), "records belongs to a distinct patient. \n")

print(50*'-')

Unique patient IDs: 2500
Total patient records: 2500
Is Patient_ID unique? True 

Each of the 2500 records belongs to a distinct patient. 

--------------------------------------------------
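Uniqueness does not guarantee that every ID is well formed. As an extra check, the sketch below assumes the IDs follow the pattern seen in the first rows (`MH` plus four digits); that pattern and the sample values are assumptions, not something the dataset documents:

```python
import pandas as pd

# Hypothetical IDs: two valid, one too short, one with a lowercase prefix
ids = pd.Series(["MH0001", "MH0002", "MH003", "mh0004"])

# Assumed pattern: the prefix "MH" followed by exactly four digits
valid = ids.str.fullmatch(r"MH\d{4}")

# List the IDs that break the assumed pattern
print(ids[~valid].tolist())
```

On the real column, an empty list would indicate that every `Patient_ID` matches the assumed format.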



Age Categories

In [45]:
print(df['Age'].unique(),"\n")
print("Age Ranges From - ", df['Age'].min(), "to", df['Age'].max(),"\n")

print(50*'-')

[56 69 46 32 60 25 78 38 75 36 40 28 41 70 53 57 20 39 19 61 47 55 77 50
 29 42 66 44 76 80 59 45 33 79 64 68 72 74 54 24 26 35 21 31 67 43 37 52
 34 23 71 51 27 48 65 62 58 18 22 30 49 73 63] 

Age Ranges From -  18 to 80 

--------------------------------------------------
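The cell above only lists the distinct ages and their range. To turn them into actual categories, one option is `pd.cut`, the same function used for the score bands later in this notebook. The band edges and labels below are hypothetical choices for illustration:

```python
import pandas as pd

# Sample ages, including the observed minimum (18) and maximum (80)
ages = pd.Series([18, 25, 39, 40, 59, 60, 80])

# Hypothetical band edges; by default each bin is (low, high],
# so the first edge must sit below the minimum age
bins = [17, 39, 59, 80]
labels = ["18-39", "40-59", "60-80"]

age_group = pd.cut(ages, bins=bins, labels=labels)

# sort_index() keeps the groups in age order rather than by frequency
print(age_group.value_counts().sort_index())
```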



Gender Distribution

In [12]:
print(df['Gender'].value_counts(),"\n")
print("The split is nearly even: about 50.4% female and 49.6% male")
print(50*'-')

Gender
Female    1261
Male      1239
Name: count, dtype: int64 

The split is nearly even: about 50.4% female and 49.6% male
--------------------------------------------------



Number of Patients per Country

In [32]:
print("Patients per country:")
print(df['Country'].value_counts(),"\n")
print(50*'-')

Patients per country:
Country
Germany           281
South Africa      256
Japan             254
France            254
Canada            253
Brazil            250
India             248
United Kingdom    246
United States     233
Australia         225
Name: count, dtype: int64 

--------------------------------------------------



PHQ-9 (Patient Health Questionnaire-9) is a standard clinical scale used to measure the severity of depression.
The total score ranges from 0 to 27 (9 questions × 0–3 each).


Score range and severity level:

* 0 – 4: Minimal / None
* 5 – 9: Mild depression
* 10 – 14: Moderate depression
* 15 – 19: Moderately severe
* 20 – 27: Severe depression

In [44]:
print("PHQ-9 (Depression Score Range) - ", df['Depression_Score'].min(), "to", df['Depression_Score'].max(),"\n")

bins = [-1, 4, 9, 14, 19, 27]
labels = ["Minimal", "Mild", "Moderate", "Moderately severe", "Severe"]

df["Depression_Level"] = pd.cut(df["Depression_Score"], bins=bins, labels=labels)
print(df['Depression_Level'].value_counts(),"\n")

vc = df['Depression_Level'].value_counts(normalize=True)

print("percentage of patients with Severe Depression - ", round(vc.iloc[0] * 100, 2), "%")
print("percentage of patients with Moderately severe Depression - ", round(vc.iloc[1] * 100, 2), "%")
print("percentage of patients with Minimal Depression - ", round(vc.iloc[2] * 100, 2), "%")
print("percentage of patients with Mild Depression - ", round(vc.iloc[3] * 100, 2), "%")
print("percentage of patients with Moderate Depression - ",round(vc.iloc[4] * 100, 2), "% \n")

print(50*'-')

PHQ-9 (Depression Score Range) -  0 to 27 

Depression_Level
Severe               727
Moderately severe    458
Minimal              454
Mild                 447
Moderate             414
Name: count, dtype: int64 

percentage of patients with Severe Depression -  29.08 %
percentage of patients with Moderately severe Depression -  18.32 %
percentage of patients with Minimal Depression -  18.16 %
percentage of patients with Mild Depression -  17.88 %
percentage of patients with Moderate Depression -  16.56 % 

--------------------------------------------------
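`pd.cut` uses right-inclusive bins by default, so the edges `[-1, 4, 9, 14, 19, 27]` mean (-1, 4], (4, 9], and so on; the first edge is -1 so that a score of 0 is still captured. The sketch below feeds in the scores that sit on each band edge to confirm the mapping against the PHQ-9 table:

```python
import pandas as pd

bins = [-1, 4, 9, 14, 19, 27]
labels = ["Minimal", "Mild", "Moderate", "Moderately severe", "Severe"]

# Scores sitting on the lower and upper edge of every band
edge_scores = pd.Series([0, 4, 5, 9, 10, 14, 15, 19, 20, 27])
levels = pd.cut(edge_scores, bins=bins, labels=labels)

for score, level in zip(edge_scores, levels):
    print(score, "->", level)

# sort_index() orders the counts by severity (category order), not frequency
print(levels.value_counts().sort_index())
```

Sorting by index is a presentation choice: the notebook's output above is ordered by frequency, while this variant reads from Minimal to Severe.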



GAD-7 (Generalized Anxiety Disorder 7-item scale) is a standard screening scale for anxiety severity.

Total score range: 0 – 21
(7 questions × 0–3 each)

Score range and anxiety severity:

* 0 – 4: Minimal
* 5 – 9: Mild
* 10 – 14: Moderate
* 15 – 21: Severe

In [34]:
print("GAD-7 (Anxiety Score Range) - ", df['Anxiety_Score'].min(), "to", df['Anxiety_Score'].max(), "\n")


bins = [-1, 4, 9, 14, 21]
labels = ["Minimal", "Mild", "Moderate", "Severe"]

df["Anxiety_Level"] = pd.cut(df["Anxiety_Score"], bins=bins, labels=labels)
print(df['Anxiety_Level'].value_counts(),"\n")

vc = df['Anxiety_Level'].value_counts(normalize=True)

print("percentage of patients with Severe Anxiety - ", round(vc.iloc[0] * 100, 2), "%")
print("percentage of patients with Moderate Anxiety - ", round(vc.iloc[1] * 100, 2), "%")
print("percentage of patients with Mild Anxiety - ", round(vc.iloc[2] * 100, 2), "%")
print("percentage of patients with Minimal Anxiety - ",round(vc.iloc[3] * 100, 2), "% \n")

print(50*'-')

GAD-7 (Anxiety Score Range) -  0 to 21 

Anxiety_Level
Severe      806
Moderate    568
Mild        563
Minimal     563
Name: count, dtype: int64 

percentage of patients with Severe Anxiety -  32.24 %
percentage of patients with Moderate Anxiety -  22.72 %
percentage of patients with Mild Anxiety -  22.52 %
percentage of patients with Minimal Anxiety -  22.52 % 

--------------------------------------------------
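One thing to keep in mind with `pd.cut` is that values outside the bin edges become `NaN` rather than raising an error. The scores here all fall within 0 to 21, but a range check before binning is a cheap safeguard. A sketch with deliberately invalid sample scores:

```python
import pandas as pd

bins = [-1, 4, 9, 14, 21]
labels = ["Minimal", "Mild", "Moderate", "Severe"]

# 25 and -3 are deliberately invalid GAD-7 totals (valid range: 0 to 7 * 3 = 21)
scores = pd.Series([3, 12, 21, 25, -3])

# between() is inclusive on both ends by default
in_range = scores.between(0, 21)
levels = pd.cut(scores, bins=bins, labels=labels)

print(levels.tolist())
print("out-of-range rows:", scores[~in_range].tolist())
```

The two invalid scores come back as `NaN` in `levels`, which is why checking `isnull()` after binning (or `between` before it) is worthwhile.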



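PSS-10 (Perceived Stress Scale) measures how stressful individuals perceive their life situations to have been over the last month. The dataset stores only a categorical `Stress_Level` (Low, Medium, High, Severe), not a numeric score. The commonly cited PSS-10 interpretation has three bands (0–13 low, 14–26 moderate, 27–40 high); splitting the top band into High and Severe below is an assumption that mirrors the dataset's four labels.

Total score range: 0 – 40

Score range and stress level:

* 0 – 13: Low stress
* 14 – 26: Moderate stress (Medium in the dataset)
* 27 – 32: High stress
* 33 – 40: Severe stress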

In [35]:
print(df['Stress_Level'].value_counts(),"\n")

vc = df['Stress_Level'].value_counts(normalize=True)

print("percentage of patients with Medium stress level - ", round(vc.iloc[0] * 100, 2), "%")
print("percentage of patients with High stress level - ", round(vc.iloc[1] * 100, 2), "%")
print("percentage of patients with Low stress level - ", round(vc.iloc[2] * 100, 2), "%")
print("percentage of patients with Severe stress level - ", round(vc.iloc[3] * 100, 2), "% \n")

print(50*'-')

Stress_Level
Medium    879
High      670
Low       598
Severe    353
Name: count, dtype: int64 

percentage of patients with Medium stress level -  35.16 %
percentage of patients with High stress level -  26.8 %
percentage of patients with Low stress level -  23.92 %
percentage of patients with Severe stress level -  14.12 % 

--------------------------------------------------



Sleep Hours Range

In [36]:
print(df['Sleep_Hours'].unique(),"\n")
print(f"{df['Sleep_Hours'].min()} to {df['Sleep_Hours'].max()} hours of sleep range \n")

print(50*'-')

[ 8.2  6.2 10.4 10.   7.9  4.4  5.3  9.6 10.3  5.5  6.3  3.7  9.4  3.
  3.4  8.8  6.8  5.7 11.2  4.7  6.5 11.5  8.  10.1 10.2 11.8  3.5  5.6
  8.5  6.   3.3  4.3 11.3 12.   6.6  9.7 11.7  7.1 10.5 10.9  5.   3.1
  7.4  7.8  4.8  9.9  7.2  5.9  8.6  8.9  8.4  6.4  7.   6.1  4.1  4.6
  4.2  8.3  8.7  5.2 10.7  6.9  3.2  5.8  8.1  7.5 10.6 10.8 11.6  7.3
 11.   9.  11.9 11.4  5.4  4.5  7.7  9.3  3.9  9.1  4.9  6.7  7.6  3.6
  9.5  3.8  4.   9.2 11.1  9.8  5.1] 

3.0 to 12.0 hours of sleep range 

--------------------------------------------------
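For a continuous column, the list of unique values gets long quickly. Summary statistics from `describe()` are a more compact alternative. The sketch below uses the five `Sleep_Hours` values from the `df.head()` output as sample data:

```python
import pandas as pd

# The five Sleep_Hours values shown in the df.head() output
sleep = pd.Series([8.2, 6.2, 10.4, 10.0, 7.9], name="Sleep_Hours")

# describe() returns count, mean, std, min, quartiles and max in one Series
summary = sleep.describe()

# Pick the statistics most relevant to a range overview
print(summary[["min", "50%", "mean", "max"]].round(2))
```

The same call on `df['Days_of_Treatment']` would summarise treatment length without printing every distinct value.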



Number of Patients with Chronic Illness

In [37]:
print(df['Chronic_Illness'].value_counts(),"\n")

vc = df['Chronic_Illness'].value_counts(normalize=True)

print("percentage of patients with chronic illness - ", round(vc.iloc[1] * 100, 2), "%")
print("percentage of patients without chronic illness - ", round(vc.iloc[0] * 100, 2), "% \n")

print(50*'-')


Chronic_Illness
No     1740
Yes     760
Name: count, dtype: int64 

percentage of patients with chronic illness -  30.4 %
percentage of patients without chronic illness -  69.6 % 

--------------------------------------------------



Number of Patients with a Mental Health History

In [38]:
print(df['Mental_Health_History'].value_counts(), "\n")

vc = df['Mental_Health_History'].value_counts(normalize=True)

print("percentage of patients with mental health history - ", round(vc.iloc[1] * 100, 2), "%")
print("percentage of patients without mental health history - ", round(vc.iloc[0] * 100, 2), "% \n")

print(50*'-')

Mental_Health_History
No     1478
Yes    1022
Name: count, dtype: int64 

percentage of patients with mental health history -  40.88 %
percentage of patients without mental health history -  59.12 % 

--------------------------------------------------



Treatment Type and Count

In [39]:
print(df['Treatment'].value_counts(), "\n")

vc = df['Treatment'].value_counts(normalize=True)

print("percentage of patients taking Therapy - ", round(vc.iloc[0] * 100, 2), "%")
print("percentage of patients taking Medication - ", round(vc.iloc[1] * 100, 2), "%")
print("percentage of patients without any Treatment - ", round(vc.iloc[2] * 100, 2), "%")
print("percentage of patients taking both Therapy & Medication - ", round(vc.iloc[3] * 100, 2), "% \n")

print(50*'-')

Treatment
Therapy         767
Medication      634
No Treatment    608
Both            491
Name: count, dtype: int64 

percentage of patients taking Therapy -  30.68 %
percentage of patients taking Medication -  25.36 %
percentage of patients without any Treatment -  24.32 %
percentage of patients taking both Therapy & Medication -  19.64 % 

--------------------------------------------------



Number of Days in Treatment

In [41]:
print(df['Days_of_Treatment'].unique(), "\n")
print("Days of Treatment Range - ", df['Days_of_Treatment'].min(), "to", df['Days_of_Treatment'].max(),"Days \n")

print(50*'-')

[171  69  70 254 130  48  20 319 219 186  31 202 275 241  15 315 172 136
 178 358  60 103 316 194 255  75 270 180 131 150 154 234 247 164 204 212
 253 156 176  74 177 283 329 132 124 311 126 344 224  16 352 310 245 216
 195 200 365 206  12  77 213 340 328 118 323 354 207  53  78 117 105 104
 302 228 147 353 149 303  81 192 210  73 276  23  22 342 257 197  52 138
 163  91  87 181 282 189  43 299 127 188 169 278 291 273 293  19  18 205
  29 298  42  92  65 308  71 333 161 327 268 217  58 338 119 309   1  36
  96 256 155 144  56 211 174 140 157  84 321  17  41 133 101 221 305  98
 349 158 238 334 248  25 139 162  46 193 223 236 279   9 151 343 123 184
 289  40  10   0 170 337 281 264 198  37 242 209 314 284 167 263  97 364
 363 148 243 312 261 230 190 251 244 324   3 306  95 226 265 208  59   4
 201  11 239 296 351  45 355 347 203 129 348 332 115 350 160 362  13 146
 259 313 232 142 307 145 168 116 166   6 231 250 318 122  88  94 357  44
 252  50  64 185 271  79 359 137 300  90 173 218   


Treatment Outcome Results

In [42]:
print(df['Outcome'].value_counts(), "\n")

vc = df['Outcome'].value_counts(normalize=True)

print("percentage of patients with Good Treatment result - ", round(vc.iloc[0] * 100, 2), "%")
print("percentage of patients with Fair Treatment result - ", round(vc.iloc[1] * 100, 2), "%")
print("percentage of patients with Poor Treatment result - ", round(vc.iloc[2] * 100, 2), "%")
print("percentage of patients with Excellent Treatment result - ", round(vc.iloc[3] * 100, 2), "% \n")

print(50*'-')

Outcome
Good         877
Fair         771
Poor         494
Excellent    358
Name: count, dtype: int64 

percentage of patients with Good Treatment result -  35.08 %
percentage of patients with Fair Treatment result -  30.84 %
percentage of patients with Poor Treatment result -  19.76 %
percentage of patients with Excellent Treatment result -  14.32 % 

--------------------------------------------------
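`Outcome` is stored as plain text, so pandas has no notion that "Excellent" ranks above "Poor". Converting it to an ordered categorical makes the ranking explicit. This is a sketch on sample values; the worst-to-best order used here is an assumption about how the labels are meant to rank:

```python
import pandas as pd

# Hypothetical sample of outcome labels
outcome = pd.Series(["Good", "Fair", "Poor", "Excellent", "Good"])

# Assumed ranking from worst to best
order = ["Poor", "Fair", "Good", "Excellent"]
outcome_cat = pd.Series(pd.Categorical(outcome, categories=order, ordered=True))

# Counts follow the ranking instead of the frequency
counts = outcome_cat.value_counts().sort_index()
print(counts)

# An ordered categorical also supports comparisons against a label
print((outcome_cat >= "Good").sum(), "patients with Good or better")
```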



Work Status of Patients

In [43]:
print(df['Work_Status'].value_counts(), "\n")

vc = df['Work_Status'].value_counts(normalize=True)

print("percentage of patients are Employed - ", round(vc.iloc[0] * 100, 2), "%")
print("percentage of patients are Students - ", round(vc.iloc[1] * 100, 2), "%")
print("percentage of patients are Unemployed - ", round(vc.iloc[2] * 100, 2), "%")
print("percentage of patients are Retired - ", round(vc.iloc[3] * 100, 2), "% \n")

print(50*'-')

Work_Status
Employed      1171
Student        647
Unemployed     467
Retired        215
Name: count, dtype: int64 

percentage of patients are Employed -  46.84 %
percentage of patients are Students -  25.88 %
percentage of patients are Unemployed -  18.68 %
percentage of patients are Retired -  8.6 % 

--------------------------------------------------

