![Illustration of silhouetted heads](mentalhealth.jpg)

Does going to university in a different country affect your mental health? A Japanese international university surveyed its students in 2018 and published a study the following year that was approved by several ethical and regulatory boards.

The study found that international students have a higher risk of mental health difficulties than the general population, and that social connectedness (belonging to a social group) and acculturative stress (stress associated with joining a new culture) are predictive of depression.


Explore the `students` data using PostgreSQL to find out if you would come to a similar conclusion for international students and see if the length of stay is a contributing factor.

Here is a data description of the columns you may find helpful.

| Field Name    | Description                                      |
| ------------- | ------------------------------------------------ |
| `inter_dom`     | Types of students (international or domestic)   |
| `japanese_cate` | Japanese language proficiency                    |
| `english_cate`  | English language proficiency                     |
| `academic`      | Current academic level (undergraduate or graduate) |
| `age`           | Current age of student                           |
| `stay`          | Current length of stay in years                  |
| `todep`         | Total score of depression (PHQ-9 test)           |
| `tosc`          | Total score of social connectedness (SCS test)   |
| `toas`          | Total score of acculturative stress (ASISS test) |

-- Data set: students (mental health survey of students at a Japanese international university);
-- Source: sample data source from DataLab;
-- Queried using PostgreSQL, Environment: DataLab;

In [4]:
-- Explore the data in the table 
SELECT * 
FROM 'students.csv' 
LIMIT 5;

Unnamed: 0,inter_dom,region,gender,academic,age,age_cate,stay,stay_cate,japanese,japanese_cate,english,english_cate,intimate,religion,suicide,dep,deptype,todep,depsev,tosc,apd,ahome,aph,afear,acs,aguilt,amiscell,toas,partner,friends,parents,relative,profess,phone,doctor,reli,alone,others,internet,partner_bi,friends_bi,parents_bi,relative_bi,professional_bi,phone_bi,doctor_bi,religion_bi,alone_bi,others_bi,internet_bi
0,Inter,SEA,Male,Grad,24,4,5,Long,3,Average,5,High,,Yes,No,No,No,0,Min,34,23,9,11,8,11,2,27,91,5,5,6,3,2,1,4,1,3,4,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
1,Inter,SEA,Male,Grad,28,5,1,Short,4,High,4,High,,No,No,No,No,2,Min,48,8,7,5,4,3,2,10,39,7,7,7,4,4,4,4,1,1,1,,Yes,Yes,Yes,No,No,No,No,No,No,No,No
2,Inter,SEA,Male,Grad,25,4,6,Long,4,High,4,High,Yes,Yes,No,No,No,2,Min,41,13,4,7,6,4,3,14,51,3,3,3,1,1,2,1,1,1,1,,No,No,No,No,No,No,No,No,No,No,No
3,Inter,EA,Female,Grad,29,5,1,Short,2,Low,3,Average,No,No,No,No,No,3,Min,37,16,10,10,8,6,4,21,75,5,5,5,5,5,2,2,2,4,4,,Yes,Yes,Yes,Yes,Yes,No,No,No,No,No,No
4,Inter,EA,Female,Grad,28,5,1,Short,1,Low,3,Average,Yes,No,No,No,No,3,Min,37,15,12,5,8,7,4,31,82,5,5,5,2,5,2,5,5,4,4,,Yes,Yes,Yes,No,Yes,No,Yes,Yes,No,No,No


In [19]:
SELECT DISTINCT inter_dom
FROM 'students.csv';

Unnamed: 0,inter_dom
0,Inter
1,Dom
2,


In [11]:
-- How does the length of stay (stay) affect the average mental health diagnostic scores of international students?
SELECT 
    stay,
    COUNT(stay) AS count_int,
    ROUND(AVG(todep), 2) AS average_phq, --depression
    ROUND(AVG(tosc), 2) AS average_scs, --social connectedness
    ROUND(AVG(toas), 2) AS average_as --acculturative stress
FROM students
WHERE inter_dom = 'Inter'
GROUP BY stay
ORDER BY stay DESC
LIMIT 9;

Unnamed: 0,stay,count_int,average_phq,average_scs,average_as
0,10,1,13.0,32.0,50.0
1,8,1,10.0,44.0,65.0
2,7,1,4.0,48.0,45.0
3,6,3,6.0,38.0,58.67
4,5,1,0.0,34.0,91.0
5,4,14,8.57,33.93,87.71
6,3,46,9.09,37.13,78.0
7,2,39,8.28,37.08,77.67
8,1,95,7.48,38.11,72.8


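Conclusions:
1. Among stays of 1 to 4 years, where the groups are large enough to compare, longer stays appear associated with slightly higher depression scores (average_phq rises from 7.48 at 1 year to between 8.28 and 9.09 at 2 to 4 years) and slightly lower social connectedness (average_scs falls from 38.11 at 1 year to 33.93 at 4 years).
BUT: the number of students (count_int) is much larger for shorter stays than for longer stays (95 students at 1 year versus 1 to 3 students at each stay of 5 years or more), so the averages for long stays are not statistically reliable.
!! Additional statistical tests should be done.
2. Students with a stay of 1 to 5 years have higher acculturative stress (average_as from 72.8 to 91.0) than students with a stay of 6 years or more (45.0 to 65.0).
3. We need to collect more data for students with longer stays.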
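
Because most long-stay groups contain a single student, one way to get a steadier comparison is to pool the groups and weight each average by its count. The following is a minimal Python sketch that does this with the rows of the output table above. The cutoff of 4 years is an assumption chosen only because every group up to 4 years has at least 14 students, and the pooled means are approximate because the inputs are already rounded to two decimals.

```python
# Rows copied from the query output above:
# (stay, count_int, average_phq, average_scs, average_as)
rows = [
    (10, 1, 13.0, 32.0, 50.0),
    (8, 1, 10.0, 44.0, 65.0),
    (7, 1, 4.0, 48.0, 45.0),
    (6, 3, 6.0, 38.0, 58.67),
    (5, 1, 0.0, 34.0, 91.0),
    (4, 14, 8.57, 33.93, 87.71),
    (3, 46, 9.09, 37.13, 78.0),
    (2, 39, 8.28, 37.08, 77.67),
    (1, 95, 7.48, 38.11, 72.8),
]


def pooled(group, column):
    """Return (number of students, count-weighted mean) for one score column."""
    n = sum(r[1] for r in group)
    weighted_sum = sum(r[1] * r[column] for r in group)
    return n, weighted_sum / n


CUTOFF = 4  # assumption: split where the group sizes drop to single digits
shorter = [r for r in rows if r[0] <= CUTOFF]
longer = [r for r in rows if r[0] > CUTOFF]

summary = {}
for name, column in (("phq", 2), ("scs", 3), ("as", 4)):
    summary[name] = (pooled(shorter, column), pooled(longer, column))
    (n_short, mean_short), (n_long, mean_long) = summary[name]
    print(f"{name}: stay <= {CUTOFF} -> {mean_short:.2f} (n={n_short}), "
          f"stay > {CUTOFF} -> {mean_long:.2f} (n={n_long})")
```

Even after pooling, the long-stay group has only 7 students, so this comparison describes the sample and is not a substitute for a significance test on the row-level data.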


In [5]:
-- How do gender, age, and Japanese and English proficiency relate to the length of stay?
SELECT stay_cate,gender, AVG(age) AS avg_age, AVG(japanese) AS avg_japanese, AVG(english) AS avg_english
FROM 'students.csv'
GROUP BY stay_cate, gender  
ORDER BY stay_cate, avg_age DESC;

Unnamed: 0,stay_cate,gender,avg_age,avg_japanese,avg_english
0,Long,Male,22.941176,3.705882,3.764706
1,Long,Female,22.2,3.733333,3.4
2,Medium,Male,21.47619,3.380952,3.547619
3,Medium,Female,21.063291,3.379747,3.696203
4,Short,Male,20.153846,2.692308,3.820513
5,Short,Female,19.986842,2.592105,3.592105
6,,,,,


In [18]:
--How does the level of Japanese language proficiency (japanese_cate) affect the length of stay?
SELECT stay_cate, japanese_cate, COUNT(*) AS count
FROM 'students.csv'
GROUP BY stay_cate, japanese_cate
ORDER BY stay_cate;

Unnamed: 0,stay_cate,japanese_cate,count
0,Long,Average,11
1,Long,Low,4
2,Long,High,17
3,Medium,High,46
4,Medium,Low,29
5,Medium,Average,46
6,Short,Low,59
7,Short,Average,32
8,Short,High,24
9,,,18


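The better the knowledge of the Japanese language, the longer the stay: students with High proficiency make up 17 of 32 (53%) long stays, 46 of 121 (38%) medium stays, and 24 of 115 (21%) short stays.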
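
To check whether this pattern could be due to chance, the counts can be put through a chi-square test of independence. The sketch below is one possible approach in plain Python, using the counts from the output table above and leaving out the 18 rows with no stay category. The p-value line uses the closed-form survival function of the chi-square distribution, which in this form holds only for 4 degrees of freedom.

```python
import math

# Counts copied from the query output above (rows with empty stay_cate omitted).
counts = {
    "Long":   {"Low": 4,  "Average": 11, "High": 17},
    "Medium": {"Low": 29, "Average": 46, "High": 46},
    "Short":  {"Low": 59, "Average": 32, "High": 24},
}
levels = ["Low", "Average", "High"]

row_totals = {stay: sum(by_level.values()) for stay, by_level in counts.items()}
col_totals = {lvl: sum(counts[stay][lvl] for stay in counts) for lvl in levels}
grand_total = sum(row_totals.values())

# Share of each proficiency level within a stay category, in percent.
shares = {
    stay: {lvl: 100 * counts[stay][lvl] / row_totals[stay] for lvl in levels}
    for stay in counts
}

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = 0.0
for stay in counts:
    for lvl in levels:
        expected = row_totals[stay] * col_totals[lvl] / grand_total
        chi2 += (counts[stay][lvl] - expected) ** 2 / expected

dof = (len(counts) - 1) * (len(levels) - 1)

# Survival function of the chi-square distribution, valid for dof == 4 only.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

for stay in counts:
    print(stay, {lvl: round(shares[stay][lvl], 1) for lvl in levels})
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.2g}")
```

A small p-value here would only indicate that proficiency and stay category are associated. It would not show which one drives the other, since a longer stay also gives more time to learn the language.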

In [10]:
--Are there differences in levels of depression (depsev) by region or gender?
SELECT region, gender, depsev, COUNT(*) AS count
FROM 'students.csv'
WHERE depsev IS NOT NULL
GROUP BY region, gender, depsev
ORDER BY region, gender, count DESC;

Unnamed: 0,region,gender,depsev,count
0,EA,Female,Mild,9
1,EA,Female,Mod,8
2,EA,Female,Min,5
3,EA,Female,ModSev,1
4,EA,Female,Sev,1
5,EA,Male,Mild,9
6,EA,Male,Min,7
7,EA,Male,Mod,6
8,EA,Male,Sev,1
9,EA,Male,ModSev,1


In [15]:
--Which age groups are most likely to use the Internet as a means of support?
SELECT age, COUNT(*) AS internet_users
FROM 'students.csv'
WHERE internet_bi = 'Yes'
GROUP BY age
ORDER BY internet_users DESC;

Unnamed: 0,age,internet_users
0,19,13
1,20,11
2,21,7
3,18,6
4,23,4
5,22,3
6,30,1


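Most students who use the internet as a means of support are young: 37 of the 45 students who answered “Yes” are 18 to 21 years old. These are raw counts, so they also reflect how many students of each age took part in the survey.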

In [24]:
-- Do people from different regions differ in their attitudes toward using professional help (professional_bi)?
SELECT 
    region,
    professional_bi,
    COUNT(*) AS count,
    ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY region), 2) AS percentage
FROM 'students.csv'
GROUP BY region, professional_bi
ORDER BY region, percentage DESC;

Unnamed: 0,region,professional_bi,count,percentage
0,EA,No,39,81.25
1,EA,Yes,9,18.75
2,JAP,No,55,79.71
3,JAP,Yes,14,20.29
4,Others,No,7,63.64
5,Others,Yes,4,36.36
6,SA,No,12,66.67
7,SA,Yes,6,33.33
8,SEA,No,94,77.05
9,SEA,Yes,28,22.95


The EA and JAP regions have the highest percentages of “No” responses (81.25% and 79.71%).
The Others and SA regions are more open to professional help, with the highest percentages of “Yes” responses (36.36% and 33.33%), although these are the smallest groups (11 and 18 students).

In [29]:
--Are older students more likely to seek professional help than younger people?
SELECT 	professional_bi,
		ROUND(AVG(age),2) AS avg_age,
		COUNT(*) AS count
FROM 'students.csv'
GROUP BY professional_bi
ORDER BY avg_age DESC;

Unnamed: 0,professional_bi,avg_age,count
0,Yes,21.3,61
1,No,20.75,207
2,,,14
3,61,,2
4,207,,2


Older students are slightly more open to using professional help: the average age is 21.3 for “Yes” and 20.75 for “No”, a difference of about half a year.

In [3]:
--Does language proficiency affect attitudes toward professional help (professional_bi)?
SELECT 	professional_bi,
		ROUND(AVG(japanese),2) AS avg_japanese, 
		ROUND(AVG(english), 2) AS avg_english,
		COUNT(*) AS count
FROM 'students.csv'
GROUP BY professional_bi;

Unnamed: 0,professional_bi,avg_japanese,avg_english,count
0,No,3.11,3.61,207
1,Yes,3.05,3.77,61
2,,,,14
3,61,,,2
4,207,,,2


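It seems that those more advanced in English are more open to using professional help (average English score 3.77 for “Yes” versus 3.61 for “No”), while the average Japanese score is almost the same (3.05 versus 3.11). The differences are small, so it is possible that language proficiency and attitudes toward professional help are not related.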

In [11]:
-- Which region has the highest percentage of people receiving professional help?
WITH counts_per_region AS (
	SELECT 	region,
			SUM(CASE WHEN professional_bi = 'Yes' THEN 1 ELSE 0 END) AS prof_help_count,
			COUNT(*) AS total_count
	FROM 'students.csv'
	GROUP BY region
)
SELECT 	region,
		prof_help_count,
		total_count,
		ROUND((100.0 * prof_help_count / total_count),2) AS percentage_prof_help
FROM counts_per_region
ORDER BY percentage_prof_help;

Unnamed: 0,region,prof_help_count,total_count,percentage_prof_help
0,,0.0,18,0.0
1,EA,9.0,48,18.75
2,JAP,14.0,69,20.29
3,SEA,28.0,122,22.95
4,SA,6.0,18,33.33
5,Others,4.0,11,36.36


The largest percentages are in the Others (36.36%) and SA (33.33%) regions, but these regions have far fewer students (11 and 18) than the rest, so the percentages are less reliable.
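
How much the small group sizes matter can be made visible with a confidence interval around each percentage. The sketch below uses the Wilson score interval, which is one common choice for proportions from small samples. The counts come from the output table above, and `z = 1.96` assumes a 95% confidence level.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; returns (lower, upper)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# (prof_help_count, total_count) per region, copied from the query output above.
regions = {
    "EA": (9, 48),
    "JAP": (14, 69),
    "SEA": (28, 122),
    "SA": (6, 18),
    "Others": (4, 11),
}

intervals = {name: wilson_interval(yes, n) for name, (yes, n) in regions.items()}

for name, (low, high) in intervals.items():
    yes, n = regions[name]
    print(f"{name:7s} {100 * yes / n:6.2f}%  "
          f"95% CI: {100 * low:.1f}% to {100 * high:.1f}%  (n={n})")
```

The intervals for Others and SA come out much wider than the one for SEA, and they overlap with the intervals of the larger regions, which is a reason to treat the regional ranking with caution.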

In [7]:
--Let's explore differences in language skills between regions.
SELECT
	region,
	ROUND(AVG(japanese), 2) AS avg_japanese,
    ROUND(AVG(english), 2) AS avg_english,
    COUNT(*) AS count
FROM 'students.csv'
--WHERE inter_dom = 'Inter'
GROUP BY region
ORDER BY avg_japanese DESC, avg_english DESC;

Unnamed: 0,region,avg_japanese,avg_english,count
0,JAP,4.81,2.93,69
1,EA,2.73,3.56,48
2,SA,2.72,4.33,18
3,SEA,2.41,3.91,122
4,Others,2.18,4.55,11
5,,,,18


In the JAP region, Japanese is the dominant language (average 4.81) and English is at the lowest level (average 2.93) compared to the other regions.

In [14]:
--What is the correlation between the level of Japanese and English?
SELECT CORR(japanese, english) AS correlation
FROM 'students.csv';

Unnamed: 0,correlation
0,-0.308978


Weak to moderate negative correlation (about -0.31): students with a higher level of Japanese tend to have a somewhat lower level of English, and vice versa.
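
For readers unfamiliar with `CORR`, it returns the Pearson correlation coefficient. The Python sketch below shows that calculation by hand and then squares the reported value to estimate how much of the variation the two scores share. The `toy_japanese` and `toy_english` lists are made-up values for illustration only and are not taken from the survey.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical scores on the 1-5 scale, only to demonstrate the function.
toy_japanese = [1, 2, 3, 4, 5]
toy_english = [5, 4, 4, 2, 1]
r_toy = pearson(toy_japanese, toy_english)

# Value reported by the CORR query above.
r_reported = -0.308978
shared_variance = r_reported ** 2

print(f"toy r = {r_toy:.3f}")
print(f"reported r = {r_reported:.3f}, r squared = {shared_variance:.3f}")
```

Squaring the reported coefficient gives a value just under 0.10, so roughly a tenth of the variation in one language score is accounted for by the other, which supports reading the relationship as fairly weak.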

In [17]:
--Does the level of Japanese language proficiency vary by gender?
SELECT 	gender,
		 ROUND(AVG(japanese), 2) AS avg_japanese,
		 COUNT(*) AS count
FROM 'students.csv'
GROUP BY gender
ORDER BY avg_japanese DESC;

Unnamed: 0,gender,avg_japanese,count
0,Male,3.16,98
1,Female,3.06,170
2,,,18


Men score slightly higher in Japanese on average (3.16 versus 3.06 for women), but the difference is small.