## **Problem Identification**

A decline in a student's cumulative grade point average (CGPA) can have many causes, one of which is poor mental health. An excessive workload or other personal problems can leave students stressed, anxious, or depressed, or cause them difficulty sleeping.

### **Objectives**

- Determining the number of students with mental health issues.
- Understanding the influence of mental health on students' GPA.
- Determining the type of issue that most affects students' GPA.
- Determining which major has the most students with mental health issues.

# Import Libraries

In [1]:
import pandas as pd

# Data Loading

In [2]:
df = pd.read_csv('Student Mental health.csv')
df.head()

| | Timestamp | Choose your gender | Age | What is your course? | Your current year of Study | What is your CGPA? | Marital status | Do you have Depression? | Do you have Anxiety? | Do you have Panic attack? | Did you seek any specialist for a treatment? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8/7/2020 12:02 | Female | 18.0 | Engineering | year 1 | 3.00 - 3.49 | No | Yes | No | Yes | No |
| 1 | 8/7/2020 12:04 | Male | 21.0 | Islamic education | year 2 | 3.00 - 3.49 | No | No | Yes | No | No |
| 2 | 8/7/2020 12:05 | Male | 19.0 | BIT | Year 1 | 3.00 - 3.49 | No | Yes | Yes | Yes | No |
| 3 | 8/7/2020 12:06 | Female | 22.0 | Laws | year 3 | 3.00 - 3.49 | Yes | Yes | No | No | No |
| 4 | 8/7/2020 12:13 | Male | 23.0 | Mathemathics | year 4 | 3.00 - 3.49 | No | No | No | No | No |


In [3]:
df.tail()

| | Timestamp | Choose your gender | Age | What is your course? | Your current year of Study | What is your CGPA? | Marital status | Do you have Depression? | Do you have Anxiety? | Do you have Panic attack? | Did you seek any specialist for a treatment? |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 96 | 13/07/2020 19:56:49 | Female | 21.0 | BCS | year 1 | 3.50 - 4.00 | No | No | Yes | No | No |
| 97 | 13/07/2020 21:21:42 | Male | 18.0 | Engineering | Year 2 | 3.00 - 3.49 | No | Yes | Yes | No | No |
| 98 | 13/07/2020 21:22:56 | Female | 19.0 | Nursing | Year 3 | 3.50 - 4.00 | Yes | Yes | No | Yes | No |
| 99 | 13/07/2020 21:23:57 | Female | 23.0 | Pendidikan Islam | year 4 | 3.50 - 4.00 | No | No | No | No | No |
| 100 | 18/07/2020 20:16:21 | Male | 20.0 | Biomedical science | Year 2 | 3.00 - 3.49 | No | No | No | No | No |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 101 entries, 0 to 100
Data columns (total 11 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   Timestamp                                     101 non-null    object 
 1   Choose your gender                            101 non-null    object 
 2   Age                                           100 non-null    float64
 3   What is your course?                          101 non-null    object 
 4   Your current year of Study                    101 non-null    object 
 5   What is your CGPA?                            101 non-null    object 
 6   Marital status                                101 non-null    object 
 7   Do you have Depression?                       101 non-null    object 
 8   Do you have Anxiety?                          101 non-null    object 
 9   Do you have Panic attack?                     101 non-null    obj

# Data Cleaning

In [5]:
df.duplicated().sum()

0

Insight: The dataset contains no duplicate rows.
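
As a minimal sketch of what this check measures, the toy frame below (hypothetical values, not taken from the survey) contains one repeated row. Note that `duplicated()` compares whole rows, so in the real data two respondents with identical answers but different timestamps would not be flagged.

```python
import pandas as pd

# Toy frame (hypothetical values) in which the third row repeats the first
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Female"],
    "age": [18.0, 21.0, 18.0],
})

# duplicated() marks a row True when an identical row appeared earlier
dup_mask = toy.duplicated()
n_duplicates = int(dup_mask.sum())

# drop_duplicates() keeps the first occurrence of each distinct row
deduped = toy.drop_duplicates().reset_index(drop=True)
print(n_duplicates, len(deduped))
```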

In [6]:
df.isnull().sum()

Timestamp                                       0
Choose your gender                              0
Age                                             1
What is your course?                            0
Your current year of Study                      0
What is your CGPA?                              0
Marital status                                  0
Do you have Depression?                         0
Do you have Anxiety?                            0
Do you have Panic attack?                       0
Did you seek any specialist for a treatment?    0
dtype: int64

Insight: The 'Age' column has one missing value that needs to be handled. Since only a single row is affected, we can simply drop that row.

In [7]:
df = df.dropna().reset_index(drop=True)

In [8]:
df.isnull().sum()

Timestamp                                       0
Choose your gender                              0
Age                                             0
What is your course?                            0
Your current year of Study                      0
What is your CGPA?                              0
Marital status                                  0
Do you have Depression?                         0
Do you have Anxiety?                            0
Do you have Panic attack?                       0
Did you seek any specialist for a treatment?    0
dtype: int64

Insight: The missing value has been handled, so the new DataFrame contains no missing values.
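
A slightly more explicit variant, sketched here on a toy frame with hypothetical values, restricts the drop to the column known to have gaps. With `subset=["Age"]`, rows are removed only when `Age` is missing, which makes the intent clearer if other columns later gain missing values.

```python
import pandas as pd

# Toy frame (hypothetical values) with one missing age
toy = pd.DataFrame({
    "Age": [18.0, None, 19.0],
    "What is your course?": ["Engineering", "BIT", "Laws"],
})

# Count missing values per column before cleaning
missing_before = toy.isnull().sum()

# Drop only rows where 'Age' is missing, then renumber the index from 0
cleaned = toy.dropna(subset=["Age"]).reset_index(drop=True)
print(missing_before["Age"], len(cleaned))
```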

In [9]:
# Rename the survey questions to short snake_case column names
df.rename(columns={'Timestamp': 'timestamp',
                   'Age': 'age',
                   'Choose your gender': 'gender',
                   'What is your course?': 'course',
                   'Your current year of Study': 'year_of_study',
                   'What is your CGPA?': 'cgpa',
                   'Marital status': 'marital_status',
                   'Do you have Depression?': 'depression',
                   'Do you have Anxiety?': 'anxiety',
                   'Do you have Panic attack?': 'panic_attack',
                   'Did you seek any specialist for a treatment?': 'treatement_with_specialist'},
          inplace=True)

In [10]:
df.columns

Index(['timestamp', 'gender', 'age', 'course', 'year_of_study', 'cgpa',
       'marital_status', 'depression', 'anxiety', 'panic_attack',
       'treatement_with_specialist'],
      dtype='object')

In [11]:
cols = df.columns
listItem = []  # Collect one summary row per column
for col in cols:
    # Store the column name, its number of unique values, and the unique values themselves
    listItem.append([col, df[col].nunique(), df[col].unique()])

# Build a summary DataFrame from the collected rows
pd.DataFrame(columns=['Column Name', 'Number of Unique Values', 'Unique Values'], data=listItem)

| | Column Name | Number of Unique Values | Unique Values |
|---|---|---|---|
| 0 | timestamp | 91 | [8/7/2020 12:02, 8/7/2020 12:04, 8/7/2020 12:0... |
| 1 | gender | 2 | [Female, Male] |
| 2 | age | 7 | [18.0, 21.0, 19.0, 22.0, 23.0, 20.0, 24.0] |
| 3 | course | 49 | [Engineering, Islamic education, BIT, Laws, Ma... |
| 4 | year_of_study | 7 | [year 1, year 2, Year 1, year 3, year 4, Year ... |
| 5 | cgpa | 6 | [3.00 - 3.49, 3.50 - 4.00, 3.50 - 4.00 , 2.50 ... |
| 6 | marital_status | 2 | [No, Yes] |
| 7 | depression | 2 | [Yes, No] |
| 8 | anxiety | 2 | [No, Yes] |
| 9 | panic_attack | 2 | [Yes, No] |
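
The same kind of summary can also be built from a list of records, which keeps each column's statistics together under named keys. The sketch below uses a toy frame with hypothetical values; it shows how inconsistent casing such as `year 1` versus `Year 1` inflates the unique count.

```python
import pandas as pd

# Toy frame (hypothetical values) with inconsistent casing in one column
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Male"],
    "year_of_study": ["year 1", "Year 1", "year 2"],
})

# One record per column: its name, unique count, and the unique values
rows = [
    {
        "column": col,
        "n_unique": toy[col].nunique(),
        "unique_values": toy[col].unique().tolist(),
    }
    for col in toy.columns
]
summary = pd.DataFrame(rows)
print(summary)
```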


In [12]:
# Standardize the spelling, casing, and trailing spaces of the course names
df[df.columns[3]] = df[df.columns[3]].replace({'Pendidikan islam':'Islamic Education', 
                                               'Islamic education' : 'Islamic Education',
                                               'Pendidikan Islam ':'Islamic Education',
                                               'Pendidikan Islam':'Islamic Education',
                                               'psychology':'Psychology',
                                               'Laws':'Law',
                                               'Benl' :'BENL',
                                               'koe':'KOE',
                                               'Koe':'KOE',
                                               'Kop' :'KOP',
                                               'engin':'Engine',
                                               'Fiqh fatwa ':'Fiqh',
                                               'Diploma Nursing':'Nursing',
                                               'Nursing ':'Nursing',
                                               'Communication ':'Communication',
                                               'Human Sciences ' : 'Human Sciences',
                                               'Usuluddin ': 'Usuluddin',
                                               'Accounting ':'Accounting',
                                               'DIPLOMA TESL':'TESL',
                                               'Irkhs':'IRKHS',
                                               'Kirkhs':'KIRKHS',
                                               'Biomedical science':'Biomedical Science'}, regex=True)
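# Strip the 'year ' / 'Year ' prefix so that only the year number remains
df[df.columns[4]] = df[df.columns[4]].replace({'year ':'', 'Year ' : ''}, regex=True)
# Encode each CGPA range as an ordinal code from 0 (lowest range) to 4 (highest range)
df[df.columns[5]] = df[df.columns[5]].replace({'0 - 1.99':'0', 
                                               '2.00 - 2.49' : '1',
                                               '2.50 - 2.99':'2', 
                                               '3.00 - 3.49':'3',
                                               '3.50 - 4.00 ':'4', 
                                               '3.50 - 4.00': '4'}, regex=True)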
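
One thing to keep in mind with `regex=True` is that each key is treated as a regular expression and matched anywhere inside a value, not only against the whole value. The sketch below is an alternative approach rather than the notebook's own method: it strips whitespace first and then uses exact-match replacement. The sample values and the `IT` mapping are hypothetical and chosen only to illustrate the difference.

```python
import pandas as pd

# A few raw labels in the style of the survey (illustrative sample)
courses = pd.Series(["Pendidikan Islam ", "Islamic education", "Nursing ", "Laws", "BIT"])

# Exact-match lookup: only values that equal a key as a whole are rewritten
course_map = {
    "Pendidikan Islam": "Islamic Education",
    "Islamic education": "Islamic Education",
    "Laws": "Law",
}

# Strip stray whitespace first so "Nursing " and "Nursing" collapse to one label
normalized = courses.str.strip().replace(course_map)

# Pitfall with regex=True: the pattern "IT" also matches inside "BIT"
# (hypothetical mapping, used only to show the substring behaviour)
substring_hit = pd.Series(["BIT", "IT"]).replace({"IT": "Information Technology"}, regex=True)
print(normalized.tolist())
print(substring_hit.tolist())
```

Stripping once up front also removes the need for separate keys that differ only by a trailing space.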

In [13]:
df.course.unique()

array(['Engineering', 'Islamic Education', 'BIT', 'Law', 'Mathemathics',
       'BCS', 'Human Resources', 'IRKHS', 'Psychology', 'KENMS',
       'Accounting', 'ENM', 'Marine science', 'KOE', 'Banking Studies',
       'Business Administration', 'KIRKHS', 'Usuluddin', 'TAASL',
       'Engine', 'ALA', 'Biomedical Science', 'BENL', 'IT', 'CTS',
       'Econs', 'MHSC', 'Malcom', 'KOP', 'Human Sciences',
       'Biotechnology', 'Communication', 'Nursing', 'Radiography', 'Fiqh',
       'TESL'], dtype=object)

In [14]:
df.year_of_study.unique()

array(['1', '2', '3', '4'], dtype=object)
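
An alternative way to obtain the year number, sketched here on illustrative values, is to extract the digits directly. This does not depend on how the word "year" is capitalized and yields integers in one step.

```python
import pandas as pd

# Labels in the style of the raw column (illustrative sample)
years = pd.Series(["year 1", "Year 2", "year 3", "Year 4"])

# Capture the first run of digits in each label and convert it to an integer
year_num = years.str.extract(r"(\d+)", expand=False).astype(int)
print(year_num.tolist())
```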

In [15]:
df.cgpa.unique()

array(['3', '4', '2', '1', '0'], dtype=object)
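
The codes 0 to 4 are ordinal ranks of the CGPA ranges, not GPA values, so statistics such as a mean of `cgpa` describe the average range rank. As a hedged sketch on illustrative values, the same encoding can be written as an explicit dictionary, which also makes it easy to map the codes back to readable labels when reporting. The names `cgpa_codes` and `cgpa_labels` are introduced here for illustration.

```python
import pandas as pd

# Raw CGPA ranges in the style of the survey, one with a trailing space
cgpa_raw = pd.Series(["3.00 - 3.49", "3.50 - 4.00 ", "0 - 1.99", "2.50 - 2.99"])

# Ordinal code for each range, matching the encoding used in the cleaning cell
cgpa_codes = {
    "0 - 1.99": 0,
    "2.00 - 2.49": 1,
    "2.50 - 2.99": 2,
    "3.00 - 3.49": 3,
    "3.50 - 4.00": 4,
}

# Strip whitespace, then look up each range; unknown labels would become NaN
cgpa = cgpa_raw.str.strip().map(cgpa_codes)

# Reverse lookup from code to label, useful for plot ticks and tables
cgpa_labels = {code: label for label, code in cgpa_codes.items()}
print(cgpa.tolist(), cgpa_labels[4])
```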

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 11 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   timestamp                   100 non-null    object 
 1   gender                      100 non-null    object 
 2   age                         100 non-null    float64
 3   course                      100 non-null    object 
 4   year_of_study               100 non-null    object 
 5   cgpa                        100 non-null    object 
 6   marital_status              100 non-null    object 
 7   depression                  100 non-null    object 
 8   anxiety                     100 non-null    object 
 9   panic_attack                100 non-null    object 
 10  treatement_with_specialist  100 non-null    object 
dtypes: float64(1), object(10)
memory usage: 8.7+ KB


In [17]:
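# Convert the cleaned year and CGPA codes from strings to integers
df[['year_of_study', 'cgpa']] = df[['year_of_study', 'cgpa']].apply(pd.to_numeric)
# The timestamps appear to be day-first (e.g. 13/07/2020), so parse them with dayfirst=True
df['timestamp'] = df['timestamp'].apply(pd.to_datetime, dayfirst=True)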
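
The timestamps in this file appear to be written day-first: values such as `13/07/2020` cannot be month-first, and the survey dates all fall in July 2020. The sketch below, using two values copied from the outputs above, shows how an ambiguous value such as `8/7/2020` is read with and without `dayfirst=True`.

```python
import pandas as pd

# Two timestamps from the survey: one ambiguous, one unambiguous
raw = pd.Series(["8/7/2020 12:02", "13/07/2020 19:56:49"])

# Default parsing reads the ambiguous value month-first (7 August)
month_first = pd.to_datetime("8/7/2020 12:02")

# dayfirst=True reads it as 8 July, consistent with the other rows
day_first = pd.to_datetime("8/7/2020 12:02", dayfirst=True)

# Element-wise parsing; extra keyword arguments are forwarded to pd.to_datetime
parsed = raw.apply(pd.to_datetime, dayfirst=True)
print(month_first, day_first)
```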


In [18]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 11 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   timestamp                   100 non-null    datetime64[ns]
 1   gender                      100 non-null    object        
 2   age                         100 non-null    float64       
 3   course                      100 non-null    object        
 4   year_of_study               100 non-null    int64         
 5   cgpa                        100 non-null    int64         
 6   marital_status              100 non-null    object        
 7   depression                  100 non-null    object        
 8   anxiety                     100 non-null    object        
 9   panic_attack                100 non-null    object        
 10  treatement_with_specialist  100 non-null    object        
dtypes: datetime64[ns](1), float64(1), int64(2), object(7)
memor

In [19]:
df.drop(columns=['timestamp'], inplace=True)

In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 10 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   gender                      100 non-null    object 
 1   age                         100 non-null    float64
 2   course                      100 non-null    object 
 3   year_of_study               100 non-null    int64  
 4   cgpa                        100 non-null    int64  
 5   marital_status              100 non-null    object 
 6   depression                  100 non-null    object 
 7   anxiety                     100 non-null    object 
 8   panic_attack                100 non-null    object 
 9   treatement_with_specialist  100 non-null    object 
dtypes: float64(1), int64(2), object(7)
memory usage: 7.9+ KB


In [21]:
df.to_csv('student_mental_health_cleaned.csv', index=False)
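
With the cleaned columns in place, the first and last objectives could be approached along the following lines. This is only a sketch on a toy frame with hypothetical values, not a result from the survey; it assumes the cleaned column names `course`, `depression`, `anxiety`, and `panic_attack`.

```python
import pandas as pd

# Toy frame (hypothetical values) using the cleaned column names
cleaned = pd.DataFrame({
    "course": ["Engineering", "BIT", "Law", "Engineering"],
    "cgpa": [3, 3, 4, 2],
    "depression": ["Yes", "No", "Yes", "No"],
    "anxiety": ["No", "No", "Yes", "Yes"],
    "panic_attack": ["Yes", "No", "No", "No"],
})

conditions = ["depression", "anxiety", "panic_attack"]

# True for students who answered "Yes" to at least one condition
has_issue = cleaned[conditions].eq("Yes").any(axis=1)
n_with_issue = int(has_issue.sum())

# Number of affected students per course
issues_by_course = has_issue.groupby(cleaned["course"]).sum()
print(n_with_issue)
print(issues_by_course)
```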