![Illustration of silhouetted heads](mentalhealth.jpg)

Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.


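Explore the `students` data using SQL to find out if you would come to a similar conclusion for international students and see if the length of stay is a contributing factor. The original brief names PostgreSQL; this notebook loads the data into an in-memory SQLite database instead, and the queries used here are plain SQL that runs on either engine.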

Here is a data description of the columns you may find helpful.

| Field Name    | Description                                      |
| ------------- | ------------------------------------------------ |
| `inter_dom`     | Types of students (international or domestic)   |
| `japanese_cate` | Japanese language proficiency                    |
| `english_cate`  | English language proficiency                     |
| `academic`      | Current academic level (undergraduate or graduate) |
| `age`           | Current age of student                           |
| `stay`          | Current length of stay in years                  |
| `todep`         | Total score of depression (PHQ-9 test)           |
| `tosc`          | Total score of social connectedness (SCS test)   |
| `toas`          | Total score of acculturative stress (ASISS test) |

In [1]:
import pandas as pd
import sqlite3

In [2]:
students = pd.read_csv('students.csv')

In [3]:
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

In [4]:
# Load the DataFrame into the SQLite database
students.to_sql('students', conn, index=False, if_exists='replace')

286

In [5]:
q1 = """
SELECT *
FROM students;
"""

result = pd.read_sql(q1, conn)

In [6]:
print(result)

    inter_dom region  gender academic   age  age_cate  stay stay_cate  \
0       Inter    SEA    Male     Grad  24.0       4.0   5.0      Long   
1       Inter    SEA    Male     Grad  28.0       5.0   1.0     Short   
2       Inter    SEA    Male     Grad  25.0       4.0   6.0      Long   
3       Inter     EA  Female     Grad  29.0       5.0   1.0     Short   
4       Inter     EA  Female     Grad  28.0       5.0   1.0     Short   
..        ...    ...     ...      ...   ...       ...   ...       ...   
281      None   None    None     None   NaN       NaN   NaN      None   
282      None   None    None     None   NaN       NaN   NaN      None   
283      None   None    None     None   NaN       NaN   NaN      None   
284      None   None    None     None   NaN       NaN   NaN      None   
285      None   None    None     None   NaN       NaN   NaN      None   

     japanese japanese_cate  ...  friends_bi parents_bi relative_bi  \
0         3.0       Average  ...         Yes        

The question we are trying to answer is whether international students have a higher risk of mental health difficulties than the general population in Japan. We also want to test whether social connectedness and acculturative stress are predictive of depression.

<!-- The key columns that we will focus on are:
- **inter_dom**
- **stay**
- **todep**
- **tosc**
- **toas** -->

### Missing values in our data

#### Check for missing values

In [7]:
q2 = """ 
SELECT 
    SUM(CASE WHEN inter_dom IS NULL THEN 1 ELSE 0 END) AS inter_dom_missing,
    SUM(CASE WHEN academic IS NULL THEN 1 ELSE 0 END) AS academic_missing,
    SUM(CASE WHEN age IS NULL THEN 1 ELSE 0 END) AS age_missing,
    SUM(CASE WHEN english_cate IS NULL THEN 1 ELSE 0 END) AS english_cate_missing,
    SUM(CASE WHEN japanese_cate IS NULL THEN 1 ELSE 0 END) AS japanese_cate_missing,
    SUM(CASE WHEN stay IS NULL THEN 1 ELSE 0 END) AS stay_missing,
    SUM(CASE WHEN todep IS NULL THEN 1 ELSE 0 END) AS todep_missing,
    SUM(CASE WHEN tosc IS NULL THEN 1 ELSE 0 END) AS tosc_missing,
    SUM(CASE WHEN toas IS NULL THEN 1 ELSE 0 END) AS toas_missing
FROM students;
"""

result = pd.read_sql(q2, conn)

print(result)

   inter_dom_missing  academic_missing  age_missing  english_cate_missing  \
0                 18                18           18                    18   

   japanese_cate_missing  stay_missing  todep_missing  tosc_missing  \
0                     18            18             18            18   

   toas_missing  
0            18  


In [8]:
# Each key column we are focusing on has 18 missing values
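
The same per-column check can be written more compactly, because `COUNT(*)` counts every row while `COUNT(column)` skips NULLs. The sketch below is only an illustration under stated assumptions: the table `students_demo`, the connection `demo_conn`, and the three rows are made up so the example runs on its own.

```python
import sqlite3

# Hypothetical stand-in table so the sketch is self-contained
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE students_demo (inter_dom TEXT, todep REAL)")
demo_conn.executemany(
    "INSERT INTO students_demo VALUES (?, ?)",
    [("Inter", 5.0), ("Dom", None), (None, None)],
)

# COUNT(*) counts all rows; COUNT(col) ignores NULLs,
# so the difference is the number of NULLs in that column
null_query = """
SELECT COUNT(*) - COUNT(inter_dom) AS inter_dom_missing,
       COUNT(*) - COUNT(todep)     AS todep_missing
FROM students_demo;
"""
inter_dom_missing, todep_missing = demo_conn.execute(null_query).fetchone()
print(inter_dom_missing, todep_missing)
```

In the notebook itself, the same query text could be pointed at `students` and passed to `pd.read_sql(..., conn)`.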

#### Handling missing values
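We will handle the missing values by deleting every row that has a NULL in any of the key columns. Each key column has 18 missing values out of 286 rows, and the preview above suggests these are fully empty trailing rows, so dropping them should cost little or no usable data.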

In [9]:
q3 = """
DELETE FROM students
WHERE inter_dom IS NULL OR
        academic IS NULL OR
        age IS NULL OR
        english_cate IS NULL OR
        japanese_cate IS NULL OR
        stay IS NULL OR
        todep IS NULL OR
        tosc IS NULL OR
        toas IS NULL;
"""

cur.execute(q3)

<sqlite3.Cursor at 0x24437ee7140>
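
To confirm how many rows a `DELETE` like this actually removes, one option is to compare row counts before and after and read the cursor's `rowcount`. This is a minimal sketch on a made-up table (`students_demo`, `demo_conn`, and `demo_cur` are hypothetical names, and the four rows are invented), not the notebook's real data. It also calls `commit()`, which the cell above does not do; reads on the same connection see the change either way, but a commit makes it permanent.

```python
import sqlite3

# Hypothetical stand-in table: two complete rows, two rows with NULLs
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE students_demo (inter_dom TEXT, todep REAL)")
demo_cur.executemany(
    "INSERT INTO students_demo VALUES (?, ?)",
    [("Inter", 5.0), ("Dom", 7.0), (None, None), ("Dom", None)],
)

rows_before = demo_cur.execute("SELECT COUNT(*) FROM students_demo").fetchone()[0]

# Delete any row with a NULL in one of the checked columns
demo_cur.execute(
    "DELETE FROM students_demo WHERE inter_dom IS NULL OR todep IS NULL"
)
rows_deleted = demo_cur.rowcount  # rows removed by the DELETE
demo_conn.commit()                # persist the deletion

rows_after = demo_cur.execute("SELECT COUNT(*) FROM students_demo").fetchone()[0]
print(rows_before, rows_deleted, rows_after)
```

If `rows_deleted` on the real table matched the per-column NULL count, that would support the reading that the missing values sit in the same fully empty rows.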

In [10]:
## Check that the missing values have been removed
q4 = """
SELECT 
    SUM(CASE WHEN inter_dom IS NULL THEN 1 ELSE 0 END) AS inter_dom_missing,
    SUM(CASE WHEN academic IS NULL THEN 1 ELSE 0 END) AS academic_missing,
    SUM(CASE WHEN age IS NULL THEN 1 ELSE 0 END) AS age_missing,
    SUM(CASE WHEN english_cate IS NULL THEN 1 ELSE 0 END) AS english_cate_missing,
    SUM(CASE WHEN japanese_cate IS NULL THEN 1 ELSE 0 END) AS japanese_cate_missing,
    SUM(CASE WHEN stay IS NULL THEN 1 ELSE 0 END) AS stay_missing,
    SUM(CASE WHEN todep IS NULL THEN 1 ELSE 0 END) AS todep_missing,
    SUM(CASE WHEN tosc IS NULL THEN 1 ELSE 0 END) AS tosc_missing,
    SUM(CASE WHEN toas IS NULL THEN 1 ELSE 0 END) AS toas_missing
FROM students;
"""

result = pd.read_sql(q4, conn)

print(result)

   inter_dom_missing  academic_missing  age_missing  english_cate_missing  \
0                  0                 0            0                     0   

   japanese_cate_missing  stay_missing  todep_missing  tosc_missing  \
0                      0             0              0             0   

   toas_missing  
0             0  


### Exploring Data Distribution

In [11]:
# Average depression score (todep) for international vs. domestic students
q5 = """
SELECT inter_dom, AVG(todep) AS avg_depression
FROM students
GROUP BY inter_dom;
"""

result = pd.read_sql(q5, conn)

print(result)

  inter_dom  avg_depression
0       Dom        8.611940
1     Inter        8.044776


In [12]:
# Average social connectedness score (tosc) for international vs. domestic students
q6 = """
SELECT inter_dom, AVG(tosc) AS avg_social_connectedness
FROM students
GROUP BY inter_dom;
"""

result = pd.read_sql(q6, conn)

print(result)

  inter_dom  avg_social_connectedness
0       Dom                 37.641791
1     Inter                 37.417910


In [13]:
# Average acculturative stress score (toas) for international vs. domestic students
q7 = """
SELECT inter_dom, AVG(toas) AS avg_acculturative_stress
FROM students
GROUP BY inter_dom;
"""

result = pd.read_sql(q7, conn)

print(result)

  inter_dom  avg_acculturative_stress
0       Dom                 62.835821
1     Inter                 75.562189


Comparing the group averages, we observe the following:
- Domestic students have a slightly higher average depression score (8.61) than international students (8.04). This runs against the direction we expected from the study, although the study compared international students with the general population rather than with domestic students, and a difference in means alone does not show that domestic students are more depressed.
- Average social connectedness is almost identical: 37.64 for domestic students and 37.42 for international students.
- International students report noticeably higher average acculturative stress (75.56) than domestic students (62.84).
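
The three comparisons above can also be gathered in a single query that reports the group size next to each average, which helps when judging how much weight a small difference in means deserves. The sketch below assumes a made-up table `students_demo` with invented scores; only the column names come from the dataset.

```python
import sqlite3

# Hypothetical stand-in table with invented scores
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE students_demo (inter_dom TEXT, todep REAL, tosc REAL, toas REAL)"
)
demo_conn.executemany(
    "INSERT INTO students_demo VALUES (?, ?, ?, ?)",
    [
        ("Inter", 6.0, 40.0, 80.0),
        ("Inter", 10.0, 36.0, 70.0),
        ("Dom", 8.0, 38.0, 60.0),
        ("Dom", 9.0, 37.0, 64.0),
    ],
)

# One row per student type: group size plus the three average scores
summary_query = """
SELECT inter_dom,
       COUNT(*)             AS n_students,
       ROUND(AVG(todep), 2) AS avg_depression,
       ROUND(AVG(tosc), 2)  AS avg_social_connectedness,
       ROUND(AVG(toas), 2)  AS avg_acculturative_stress
FROM students_demo
GROUP BY inter_dom
ORDER BY inter_dom;
"""
summary = demo_conn.execute(summary_query).fetchall()
for row in summary:
    print(row)
```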

These averages are only a first look, so further analysis is needed to determine whether international students have a higher risk of mental health difficulties than domestic students in Japan, whether social connectedness and acculturative stress are predictive of depression, and whether length of stay plays a role.
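
One possible shape for that follow-up, sketched under clear assumptions: group international students by `stay` to see how the average scores move with length of stay, then compute Pearson correlations between `todep` and the other two scores as a first descriptive check. A correlation is not a predictive model, so this is a stand-in for, not a replication of, the study's analysis. The table `students_demo`, its rows, and the helper `pearson` are all hypothetical.

```python
import sqlite3
from math import sqrt

# Hypothetical stand-in table with invented scores
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE students_demo "
    "(inter_dom TEXT, stay REAL, todep REAL, tosc REAL, toas REAL)"
)
demo_conn.executemany(
    "INSERT INTO students_demo VALUES (?, ?, ?, ?, ?)",
    [
        ("Inter", 1.0, 4.0, 44.0, 60.0),
        ("Inter", 1.0, 8.0, 40.0, 70.0),
        ("Inter", 2.0, 12.0, 34.0, 85.0),
        ("Inter", 3.0, 16.0, 30.0, 90.0),
        ("Dom", 2.0, 9.0, 38.0, 62.0),
    ],
)

# Average scores for international students at each length of stay
stay_query = """
SELECT stay,
       COUNT(*)             AS count_int,
       ROUND(AVG(todep), 2) AS average_phq,
       ROUND(AVG(tosc), 2)  AS average_scs,
       ROUND(AVG(toas), 2)  AS average_as
FROM students_demo
WHERE inter_dom = 'Inter'
GROUP BY stay
ORDER BY stay DESC;
"""
by_stay = demo_conn.execute(stay_query).fetchall()
for row in by_stay:
    print(row)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Pull the three score columns for international students only
rows = demo_conn.execute(
    "SELECT todep, tosc, toas FROM students_demo WHERE inter_dom = 'Inter'"
).fetchall()
todep, tosc, toas = zip(*rows)

r_tosc = pearson(todep, tosc)  # depression vs. social connectedness
r_toas = pearson(todep, toas)  # depression vs. acculturative stress
print(round(r_tosc, 3), round(r_toas, 3))
```

The signs in this toy data were chosen to be obvious; on the real `students` table the direction and strength of each correlation would have to be read from the output rather than assumed.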