### 1. Background
Education is at the core of a nation: it makes life easier for the people living there in every aspect of modern life. In other words, it shapes the nation's future.

Having shifted from a centrally planned to a market economy, Vietnam has transformed from one of the poorest countries in the world into a lower middle-income country. Alongside its economic development strategies, Vietnam has invested intensively in education. In this analysis, I break down the large-scale aspects of national education, which have changed rapidly together with the economy since the Doi Moi reforms launched in 1986.

This analysis may also help investors interested in the education sector in Vietnam get an overview of how the country's education has developed.

### 2. About the dataset

One of the biggest challenges of this project is the lack of data. I collected and processed the data myself from:
- World Bank data
- General Statistics Office of Vietnam: https://www.gso.gov.vn
- Ministry of Education: https://moet.gov.vn/Pages/home.aspx
- OECD data: https://data.oecd.org

The data are stored in 4 tables:
- `pop`: population of Vietnam over the years
- `edu`: number of Vietnamese students at each education level over the years
- `gdp`: GDP of Vietnam and some other Southeast Asian countries over the years
- `pisa`: average reading, mathematics and science scores in PISA 2018, together with GDP per capita (PPP)

Note:
- `edu`: from 2015 onward, the number of higher education students does not include college students.
- Viet Nam participated in PISA 2018 using paper-based instruments. At the time the OECD's PISA 2018 report was published, the international comparability of Viet Nam's performance in reading, mathematics and science could not be fully ensured. For this reason, the OECD does not report comparisons of Viet Nam's performance in PISA with other countries. The PISA scores for Vietnam used here were collected from the Ministry of Education.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load the datasets
# Population dataset
pop = pd.read_csv('../input/vietnam-population-dgp-education-data/pop.csv')

# Education dataset: number of students at each level
edu = pd.read_csv('../input/vietnam-population-dgp-education-data/Vietnamstudent.csv')

# GDP comparison dataset (Vietnam and other Southeast Asian countries)
gdp = pd.read_csv('../input/vietnam-population-dgp-education-data/GDPcompare.csv')

# Average PISA scores in 2018, with GDP per capita (PPP)
pisa = pd.read_csv('../input/vietnam-population-dgp-education-data/Pisa_GDP.csv')

### 3. Exploratory Analysis
#### 3.1 An overview of economic development

In [None]:
# Plot GDP (bars) and GDP per capita PPP (line) over the years
sns.set_style('whitegrid')
gdp = gdp.query('Year >= 1990').copy()  # copy so the new column below is added safely
fig, ax = plt.subplots(figsize=(10, 7))
# Highlight the reference years discussed in the text
rapid_year = [2005, 2010, 2019]
gdp['color_cats'] = ['blue' if x in rapid_year else 'red' for x in gdp['Year']]
ax.bar(gdp['Year'], gdp['Vietnam'], color=gdp['color_cats'], alpha=0.5)
ax2 = ax.twinx()
ax2.plot(gdp['Year'], gdp['Vietnam GDP PPP'])
ax.set_ylabel('GDP - hundred billion USD')
ax2.set_ylabel('GDP per capita, PPP - USD')
ax.set_title('Vietnam GDP and GDP PPP 1990 - 2019', size=14)
plt.show()

In [None]:
# Plot GDP growth rates of Vietnam, Indonesia, Malaysia, Thailand and Singapore
fig, ax = plt.subplots(figsize=(10, 7))
# Neighbouring countries are drawn faintly so that Vietnam (red) stands out
ax.plot(gdp['Year'], gdp['Indonesia %'], color='blue', alpha=0.3, label='Indonesia')
ax.plot(gdp['Year'], gdp['Malatsia %'], color='green', alpha=0.3, label='Malaysia')
ax.plot(gdp['Year'], gdp['Thailand %'], color='gray', alpha=0.3, label='Thailand')
ax.plot(gdp['Year'], gdp['Singapore %'], color='yellow', alpha=0.3, label='Singapore')
ax.plot(gdp['Year'], gdp['Vietnam %'], color='red', label='Vietnam')
ax.set_xlabel('Year')
ax.set_ylabel('GDP growth (%)')
ax.set_title('GDP growth rate of Vietnam and other Southeast Asian countries', y=1.1, size=14)
ax.legend()
plt.show()

Growing rapidly, Vietnam's GDP in 2010 was double its 2005 level, and by 2019 it had doubled again from the 2010 level, reaching 261 billion USD.
Its growth rate has stayed relatively stable, between 5% and 10%, unlike the more volatile trends of its richer Southeast Asian neighbours. Vietnam's GDP kept growing despite the sharp dip caused by the 1997 Asian financial crisis.
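The doubling and stability claims above can also be checked numerically instead of being read off the charts. This is a hedged sketch: the helper names `growth_multiple` and `growth_volatility` are hypothetical, and it assumes `gdp` has a `Year` column plus the GDP and growth-rate columns used in the plotting cells. Using the standard deviation as the measure of how much a growth rate fluctuates is my own choice here, not part of the original analysis.

```python
import pandas as pd


def growth_multiple(df: pd.DataFrame, value_col: str, start_year: int, end_year: int,
                    year_col: str = "Year") -> float:
    """Return value(end_year) / value(start_year); 2.0 means the value doubled.

    Assumes each year appears only once in `year_col`.
    """
    values = df.set_index(year_col)[value_col]
    return float(values.loc[end_year] / values.loc[start_year])


def growth_volatility(df: pd.DataFrame, rate_cols: list) -> pd.Series:
    """Standard deviation of each growth-rate column, most stable first.

    A lower value means the annual growth rate fluctuates less.
    """
    return df[rate_cols].std().sort_values()
```

For example, `growth_multiple(gdp, 'Vietnam', 2005, 2010)` should be close to 2 if the 2005 to 2010 doubling holds, and `growth_volatility(gdp, ['Vietnam %', 'Indonesia %', 'Malatsia %', 'Thailand %', 'Singapore %'])` ranks the countries from the most to the least stable growth rate.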

#### 3.2 Vietnam population
As the purpose of this analysis is to give an overview of education in Vietnam, I am interested in the size of the school-age population. I take 6 to 10 years old as the standard age group for primary school, since children in Vietnam start school at 6, and 6 to 18 years old as the standard age group for K-12 students.
Data for these age groups are not available, but the World Bank publishes Vietnam's population and birth data from 1960, so I estimate the size of each group from the number of newborns in each year.
For any given year, the children aged 6 to 10 are those born between 10 and 6 years earlier. Similarly, the 6-18 group consists of those born between 18 and 6 years earlier.
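The same cohort logic can be written with pandas' `shift` and `rolling` instead of an explicit loop. This is a minimal sketch, assuming one row per consecutive year sorted by `Year`; the helper name `estimate_age_group` is hypothetical. Like the loop in the notebook, it ignores child mortality and migration, so it only approximates the real size of the age group.

```python
import pandas as pd


def estimate_age_group(newborn: pd.Series, min_age: int, max_age: int) -> pd.Series:
    """Approximate the population aged min_age..max_age in each year.

    Sums the birth cohorts born between max_age and min_age years earlier.
    Assumes `newborn` has one value per consecutive year, sorted by year.
    """
    window = max_age - min_age + 1
    # shift(min_age) aligns row t with the births of year t - min_age;
    # the rolling sum then adds the births of years t - max_age .. t - min_age.
    # Rows without enough earlier years get NaN instead of a partial sum.
    return newborn.shift(min_age).rolling(window).sum()
```

Called as `estimate_age_group(pop['Newborn'], 6, 10)` and `estimate_age_group(pop['Newborn'], 6, 18)`, it should reproduce the loop's slices `[i-10:i-5]` and `[i-18:i-5]` for every year with enough earlier birth data, and return `NaN` for the first years.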


In [None]:
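# Estimate the number of children in two groups from yearly births:
# 6-10 years old (primary-school age) and 6-18 years old (K-12 age).
# Assumes pop has one row per consecutive year, sorted by Year.
X = pop['Newborn'].to_numpy()
P = []
K = []
for i in range(len(X)):
    # Children aged 6-10 in year i were born 10 to 6 years earlier (rows i-10 .. i-6);
    # years without enough earlier data get NaN instead of a wrapped-around slice
    p = X[i-10:i-5].sum() if i >= 10 else np.nan
    # Children aged 6-18 in year i were born 18 to 6 years earlier (rows i-18 .. i-6)
    k = X[i-18:i-5].sum() if i >= 18 else np.nan
    P.append(p)
    K.append(k)
print(P)
print(K)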

In [None]:
# Add P as '6_10yo' and K as '6_18yo' columns to the pop DataFrame
pop['6_10yo'] = P
pop['6_18yo'] = K
# The estimates are only valid once enough earlier birth years are available
# (18 years of history for the 6-18 group). GDP data are only available from 1990,
# so we start the analysis there.
pg = pop.query('Year >= 1990')

In [None]:
# Population by area (urban vs rural) and population growth rate
fig, ax = plt.subplots(figsize=(10, 7))
ax.bar(pg['Year'], pg['Urbanpop'], alpha=0.4)
ax.bar(pg['Year'], pg['Rural Population'], bottom=pg['Urbanpop'], color='orange', alpha=0.4)
ax.set_title('Vietnam population in urban and rural areas and growth rate 1990 - 2020', y=1.1, size=14)
ax.legend(['Urban population', 'Rural population'])
ax.set_xlabel('Year')
ax.set_ylabel('Total population (x 100 million)')
ax2 = ax.twinx()
ax2.plot(pg['Year'], pg['Population growth %'])
ax2.set_ylabel('Population growth (%)')
plt.show()

Vietnam's population reached 96.5 million in 2019. Urbanisation has been increasing gradually, and the population growth rate has declined over the last three decades, from over 4% in 1990 to below 3% in 2019.
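Because the total population is also growing, a rising urban headcount alone does not show urbanisation; the urban share of the population does. A small sketch, assuming the `Urbanpop` and `Rural Population` columns together make up the total population; the helper `urban_share` is hypothetical.

```python
import pandas as pd


def urban_share(df: pd.DataFrame, urban_col: str = "Urbanpop",
                rural_col: str = "Rural Population") -> pd.Series:
    """Percentage of the population living in urban areas, row by row."""
    total = df[urban_col] + df[rural_col]
    return 100 * df[urban_col] / total
```

For example, `pg.assign(urban_pct=urban_share(pg))[['Year', 'urban_pct']]` gives the urban percentage for each year.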

#### 3.3 General Education

In [None]:
pdu = pd.merge(pg, edu, on = 'Year')
pdu1 = pdu.query('Year >= 2002')

In [None]:
# Compare the estimated number of 6-10-year-olds with the number of primary students
fig, ax = plt.subplots(figsize=(10, 7))
ax.bar(pdu1['Year'], pdu1['6_10yo'], alpha=0.4)
ax.bar(pdu1['Year'], pdu1['Primary'], alpha=0.4)
ax.set_xlabel('Year')
ax.set_ylabel('Millions of people')
ax.set_title('Number of 6-10-year-old children and primary students in Vietnam', y=1.1, size=14)
ax.legend(['6-10 years old', 'Primary students'])
plt.show()

In [None]:
# Compare the estimated number of 6-18-year-olds with the number of K-12 students
fig, ax = plt.subplots(figsize=(10, 7))
ax.bar(pdu1['Year'], pdu1['6_18yo'], alpha=0.4)
ax.bar(pdu1['Year'], pdu1['K-12'], alpha=0.4)
ax.set_xlabel('Year')
ax.set_ylabel('Millions of people')
ax.set_title('Population aged 6-18 and K-12 students', y=1.1, size=14)
ax.legend(['6-18 years old', 'K-12 students'])
plt.show()

Since 2010, primary school enrollment has exceeded the estimated number of 6-10-year-olds. One explanation is that primary enrollment also counts students older than the official age group. This trend is evidence of the government's effort to extend basic education to remote areas, where it is harder for children to attend school and where they often start and finish primary school later than usual. In other words, more and more children in extremely poor areas are gaining access to primary education.

For K-12 education as a whole, universal enrollment has not been reached yet, but the gap between the 6-18-year-old population and the number of K-12 students has been gradually narrowing.
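Dividing enrollment by the estimated age-group size turns each pair of overlapping bars into a single ratio, in the spirit of the gross enrollment ratio (total enrollment regardless of age, per 100 children of official school age). A sketch under these assumptions: the helper names `enrollment_ratio` and `first_year_above` are hypothetical, and the column names are the ones used in the plotting cells above.

```python
import pandas as pd


def enrollment_ratio(df: pd.DataFrame, enrolled_col: str, population_col: str) -> pd.Series:
    """Enrolled students per 100 people of the reference age group.

    Values above 100 mean more students are enrolled than there are
    children of official age, e.g. because over-age students are counted.
    """
    return 100 * df[enrolled_col] / df[population_col]


def first_year_above(df: pd.DataFrame, ratio: pd.Series, threshold: float = 100.0,
                     year_col: str = "Year"):
    """Return the earliest year in which `ratio` exceeds `threshold`, or None."""
    years = df.loc[ratio > threshold, year_col]
    return int(years.min()) if not years.empty else None
```

With `ratio = enrollment_ratio(pdu1, 'Primary', '6_10yo')`, `first_year_above(pdu1, ratio)` should return the first year in which primary enrollment exceeded the estimated 6-10 population, which the text above puts at 2010. `enrollment_ratio(pdu1, 'K-12', '6_18yo')` tracks the narrowing K-12 gap the same way.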


#### 3.4 Higher Education

In [None]:
# Number of higher education students in public and private institutions
fig, ax = plt.subplots(figsize=(10, 7))
ax.bar(pdu['Year'], pdu['higher-publicedu'])
ax.bar(pdu['Year'], pdu['higher-prvedu'], bottom=pdu['higher-publicedu'])
ax.set_xlabel('Year')
ax.set_ylabel('Students - Millions')
ax.set_title('Higher education students in public and private institutions', y=1.1)
ax.legend(['Public', 'Private'])
plt.show()

There is a big drop in 2015 because, from 2015 onward, the data no longer include college students.
To reduce this bias, I assume that the numbers of students in public and private institutions in 2015, 2016, 2017 and 2018 grew at the average growth rate of each sector from 1995 to 2014.
I have prepared a separate dataset with the adjusted numbers of higher education students for 2015 to 2018.
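One way to implement this hypothesis is to compound the mean year-on-year growth rate of 1995-2014 forward from the 2014 value. This is a hedged sketch: the function name `extrapolate_with_mean_growth` is hypothetical, "average growth rate" is read here as the arithmetic mean of the annual growth rates (a compound annual growth rate would be an equally reasonable reading), and it is not necessarily how `Vietnamstudent2.csv` was produced.

```python
import pandas as pd


def extrapolate_with_mean_growth(series: pd.Series, base_start, base_end,
                                 forecast_years) -> pd.Series:
    """Project `series` forward by compounding its mean year-on-year growth rate.

    Assumes `series` is indexed by consecutive, sorted years. The growth rate
    is averaged over base_start..base_end (inclusive) and applied starting from
    the value at base_end.
    """
    base = series.loc[base_start:base_end]
    yearly_growth = base / base.shift(1) - 1  # first entry is NaN
    mean_growth = yearly_growth.mean()        # mean() skips the NaN
    value = base.iloc[-1]
    estimates = {}
    for year in forecast_years:
        value = value * (1 + mean_growth)
        estimates[year] = value
    return pd.Series(estimates, name=series.name)
```

For example, with `public = edu.set_index('Year')['higher-publicedu']`, calling `extrapolate_with_mean_growth(public, 1995, 2014, range(2015, 2019))` projects public-sector enrollment for 2015 to 2018; the private sector works the same way with `'higher-prvedu'`.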

In [None]:
# Load the adjusted dataset with estimated higher education numbers for 2015-2018
edu2 = pd.read_csv('../input/vietnam-population-dgp-education-data/Vietnamstudent2.csv')
# Number of higher education students in public and private institutions
fig, ax = plt.subplots(figsize=(12, 7))
ax.bar(edu2['Year'], edu2['higher_publicedu'])
ax.bar(edu2['Year'], edu2['higher_prvedu'], bottom=edu2['higher_publicedu'])
ax.set_xlabel('Year')
ax.set_ylabel('Students - Millions')
ax.set_title('Higher education students in public and private institutions (estimated data for 2015 to 2018)', y=1.1, size=14)
ax.legend(['Public', 'Private'])
plt.show()

Not only general education but also higher education in Vietnam has grown fast, supplying the skilled workforce that rapid development requires.
If my estimate is accurate, Vietnam tripled its number of higher education students between 2000 and 2018, reaching 3 million. Enrollment is growing in both public and private institutions, which suggests that, besides investing heavily in public education, the Vietnamese government also encourages private investment in this sector.

#### 3.5 Vietnamese students' academic performance

In [None]:
# Scatter plot of PISA 2018 average score against GDP per capita (PPP), with a linear trend line
sns.set_palette('RdBu')
# ci=None draws the regression line without a confidence band
g = sns.regplot(x='GDP_PPP', y='Avg_score', data=pisa, ci=None, scatter_kws={"s": 100})
g.figure.set_size_inches(16, 8)
g.set_xlim(0, 160000)
g.set_ylim(200, 600)
g.set_title('Programme for International Student Assessment (PISA) average score and GDP PPP in 2018', y=1.1, size=16)
g.set_xlabel('GDP per capita, purchasing power parity in USD', size=14)
g.set_ylabel('Average reading, mathematics and science scores', size=14)
# Country labels are placed manually next to their points
plt.text(5000, 518, 'Vietnam', size=13, color='steelblue')
plt.text(15000, 582, 'China', size=13, color='steelblue')
plt.text(100000, 560, 'Singapore', size=13, color='steelblue')
plt.text(18000, 415, 'Thailand', size=13, color='steelblue')
plt.text(28000, 435, 'Malaysia', size=13, color='steelblue')
plt.text(1000, 385, 'Indonesia', size=13, color='steelblue')
plt.text(42000, 525, 'S.Korea & Japan', size=10, color='steelblue')
plt.text(80000, 480, 'Expected performance based on GDP trend line', size=12, color='steelblue')
plt.text(140000, 550, '^ GREAT', size=15, color='steelblue')
plt.text(140000, 500, '^ GOOD', size=15, color='steelblue')
plt.show()

Vietnamese students perform strongly. In PISA 2018, Vietnam had the highest average score among countries with a similar GDP per capita (PPP), scored higher than many high-income countries, and was roughly on par with South Korea and Japan.
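The comparison with countries at a similar income level can be quantified as each country's distance from the trend line. `sns.regplot` fits an ordinary least-squares line when `order=1` (the default); the sketch below refits that line with `np.polyfit` and reports the actual minus the predicted score. The helper name `score_vs_trend` is hypothetical, and the columns `GDP_PPP` and `Avg_score` are the ones used in the plot.

```python
import numpy as np
import pandas as pd


def score_vs_trend(df: pd.DataFrame, x_col: str = "GDP_PPP",
                   y_col: str = "Avg_score") -> pd.DataFrame:
    """Fit y = slope * x + intercept by least squares and add residuals.

    `expected` is the score the trend line predicts from GDP PPP;
    `residual` is actual minus expected. Rows are sorted with the largest
    over-performers first.
    """
    slope, intercept = np.polyfit(df[x_col], df[y_col], deg=1)
    expected = slope * df[x_col] + intercept
    out = df.assign(expected=expected, residual=df[y_col] - expected)
    return out.sort_values("residual", ascending=False)
```

`score_vs_trend(pisa).head()` lists the countries furthest above the trend line; a positive residual means a higher score than GDP PPP alone would predict.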

### 4. Conclusion
A sharply rising economy, a stable GDP growth rate, effective government education policies, and excellent student performance together point to sustainable development. They make Vietnam a competitive investment destination compared with its neighbours, both as a market (of almost 100 million people) and as a production base for high-tech industries.

Investing in education is also promising, and such investment can start at any time. This conclusion is independent of my being Vietnamese: an eagerness to learn and a constant thirst for good education are characteristic of the Vietnamese people.
