# Mini Project for DSE200x

* Submitted by: Anivarth Peesapati

## Who is performing better among SAARC nations?

SAARC (South Asian Association for Regional Cooperation) is a group of nations that includes Afghanistan, Bangladesh, Bhutan, India, Maldives, Nepal, Pakistan, and Sri Lanka.

The main aims of SAARC are to improve trade and to promote peace, agriculture, scientific research, disaster management, etc.

With the data provided, who is performing better among SAARC nations? We will analyze only a few important factors, which might give us a better understanding of the countries.

# Importing Python modules
Let's first import all the Python modules that we might use for the project.

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
#import folium
import glob
import os

In [None]:
def plotting(variable, xlabel, ylabel, title, figsize):
    """Plot one line per SAARC country for a single indicator."""
    plt.figure(figsize = figsize)
    saarc_countries_codes = ["IND", "AFG", "PAK", "BTN", "NPL", "BGD", "MDV", "LKA"]
    for country in saarc_countries_codes:
        # Exact comparison avoids accidental substring or regex matches
        mask = variable["CountryCode"] == country
        variable_country = variable[mask]
        plt.plot(variable_country["Year"], variable_country["Value"], label = country)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    # `bottom` replaces the `ymin` keyword, which newer Matplotlib versions no longer accept
    plt.ylim(bottom = 0)
    plt.grid(True)
    plt.legend()
    plt.show()

# Importing Data

In [None]:
data = pd.read_csv("./world-development-indicators/Indicators.csv")

data.columns

## Data for SAARC countries

As we are only working with SAARC countries, we will remove the data for all other countries.

In [None]:
mask_saarc = data["CountryCode"].isin(["IND", "AFG", "PAK", "BTN", "NPL", "BGD", "MDV", "LKA"])

data_saarc = data[mask_saarc]

data_saarc.columns
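The filtering step above can be tried on a small made-up table before running it on the full file. This is only a sketch: the `toy` frame and its values are invented for illustration, and the `missing` check is an optional extra for catching codes that never appear in the data.

```python
import pandas as pd

# Toy stand-in for Indicators.csv; rows and values are made up for illustration.
toy = pd.DataFrame({
    "CountryCode": ["IND", "USA", "BTN", "FRA", "LKA"],
    "Year": [2000, 2000, 2000, 2000, 2000],
    "Value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

saarc_codes = ["IND", "AFG", "PAK", "BTN", "NPL", "BGD", "MDV", "LKA"]

# isin() returns a boolean Series that is True where the code is in the list.
toy_saarc = toy[toy["CountryCode"].isin(saarc_codes)]

# Codes from the list that never appear in the filtered data.
missing = sorted(set(saarc_codes) - set(toy_saarc["CountryCode"]))

print(toy_saarc["CountryCode"].tolist())
print(missing)
```

On the real data, a non-empty `missing` list would point to a mistyped country code or a country with no rows in the file.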

## 1. GDP

Even though GDP alone cannot tell us everything about a country, it is the most important indicator of a country's economic power. So we will first look at each country's GDP over the years.

In [None]:
mask_for_gdp = data_saarc["IndicatorName"].str.contains(r"^GDP at market prices \(constant 2005 US\$\)")

gdp_data = data_saarc[mask_for_gdp]

gdp_data.head()
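The mask above relies on a regular expression, so characters such as parentheses and the dollar sign have to be escaped. When the full indicator name is known, an exact string comparison is a simpler alternative. The sketch below assumes a hypothetical helper, `select_indicator`, and a made-up `toy` frame; neither is part of the original notebook.

```python
import pandas as pd

def select_indicator(frame, indicator_name):
    """Return the rows whose IndicatorName equals indicator_name exactly.

    Hypothetical helper for illustration; no regex escaping is needed
    because the comparison is a plain string equality.
    """
    return frame[frame["IndicatorName"] == indicator_name]

# Toy data with made-up values; only the indicator names mirror the dataset.
toy = pd.DataFrame({
    "IndicatorName": [
        "GDP at market prices (constant 2005 US$)",
        "GDP at market prices (current US$)",
        "GDP per capita (constant 2005 US$)",
    ],
    "Value": [1.0, 2.0, 3.0],
})

gdp_rows = select_indicator(toy, "GDP at market prices (constant 2005 US$)")
print(gdp_rows)
```

Exact matching also rules out picking up a second indicator whose name merely contains the search text.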

In [None]:
plotting(gdp_data, "Year", "GDP at market prices (constant 2005 US $)", "GDP among SAARC", (10, 10))

**India is the winner among SAARC nations in overall GDP.**

Even though GDP tells us the economic power of a country, we also have to look at GDP per capita, which shows more clearly how much each person contributes to the GDP on average.
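GDP per capita is GDP divided by population, so a country with a large total GDP can still rank low per person. A minimal sketch with two hypothetical countries follows; all figures are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Two hypothetical countries: A has the larger economy, B the smaller population.
toy = pd.DataFrame({
    "Country": ["A", "B"],
    "GDP": [1_000_000_000.0, 10_000_000.0],
    "Population": [1_000_000, 2_000],
})

# GDP per capita = GDP / population
toy["GDP_per_capita"] = toy["GDP"] / toy["Population"]

print(toy)
```

Country A leads on total GDP, while country B leads on GDP per capita.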

In [None]:
mask_gdp_per_capita = data_saarc["IndicatorName"].str.contains(r"GDP per capita \(constant 2005 US\$\)")

gdp_per_capita = data_saarc[mask_gdp_per_capita]

gdp_per_capita.head()

In [None]:
plotting(gdp_per_capita, "Year", "GDP per capita", "GDP per capita (constant 2005 US$)", (10, 10))

Clearly **Maldives is the winner when it comes to GDP per capita**.
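Reading a "winner" off a line chart can be cross-checked by ranking the countries on their most recent available value. The sketch below uses placeholder country codes and made-up numbers, and it assumes that the latest observation is a fair basis for comparison even when the latest year differs between countries.

```python
import pandas as pd

# Toy indicator data; codes and values are placeholders, not real figures.
toy = pd.DataFrame({
    "CountryCode": ["AAA", "AAA", "BBB", "BBB", "CCC"],
    "Year": [2010, 2014, 2010, 2013, 2012],
    "Value": [100.0, 120.0, 300.0, 280.0, 200.0],
})

# Keep the most recent observation for each country.
latest = toy.sort_values("Year").groupby("CountryCode").tail(1)

# Rank countries from the highest to the lowest latest value.
ranking = latest.sort_values("Value", ascending=False)["CountryCode"].tolist()

print(ranking)
```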

Tourism is the main industry of Maldives. According to <a href="https://en.wikipedia.org/wiki/Economy_of_the_Maldives">Wiki</a>, about 28% of the GDP comes from tourism. Around 68% of tourists stay in Maldives for more than 8 days.

India is far behind Maldives and also ranks below Bhutan and Sri Lanka. As all the other SAARC countries have much smaller populations than India, India's higher overall GDP can be attributed to its larger population.

To delve deeper, we will check the following factors for each country. (Source: <a href="https://prezi.com/s0x-bgq3rg9i/factors-affecting-gdp-per-capita-of-a-country/">Factors affecting GDP per capita</a>)
* Literacy Rate
* Unemployment Rate
* Poverty
* Taxes
* Tourism

## 2. Literacy Rate

The more literate a population is, the more it tends to add to the country's GDP. But this is not certain. For example, in Cuba a taxi driver can earn more than a more highly educated doctor. Having said that, we cannot neglect the literacy rate when trying to understand a country.

After all, "*Knowledge is wisdom*".

In [None]:
mask_literacy_rate = data_saarc["IndicatorName"].str.contains(r"Adult literacy rate, population 15\+ years, both sexes")

literacy_rate = data_saarc[mask_literacy_rate]

literacy_rate.head()

In [None]:
plotting(literacy_rate, "Year", "Adult Literacy rate", "Adult literacy rate, population 15+ years %", (10, 10))

**Maldives is again the winner here**. As can be seen, almost 100% of the adult population of Maldives is literate.

Sri Lanka and Maldives started with about the same literacy rate between 1980 and 1985, but Maldives has since surpassed Sri Lanka.

India will have to improve a lot in the education sector to improve its GDP per capita.

However, given India's huge population, there are almost 0.9 billion literate people in India compared with about 0.3 million in Maldives.

## 3. Unemployment

*Hunger is not the worst feature of unemployment; idleness is.*

In [None]:
mask_unemployment_rate = data_saarc["IndicatorName"].str.contains(r"Unemployment, total \(% of total labor force\)")

unemployment_rate = data_saarc[mask_unemployment_rate]

unemployment_rate.head()

In [None]:
# Country codes to plot, one subplot each
saarc_countries_codes = ["IND", "AFG", "PAK", "BTN", "NPL", "BGD", "MDV", "LKA"]

plt.figure(figsize = (15, 15))

# suptitle sets a title for the whole figure rather than for a single subplot
plt.suptitle("Unemployment rate for SAARC (y-axis adjusted to accommodate all the values)")

for i, country in enumerate(saarc_countries_codes, start = 1):
    plt.subplot(3, 3, i)
    mask = unemployment_rate["CountryCode"] == country
    ue = unemployment_rate[mask]
    plt.plot(ue["Year"], ue["Value"])
    plt.xlabel("Year")
    plt.ylabel("Value %")
    plt.title("% Unemployment Rate in " + country + " (y-axis adjusted)")

plt.show()

The country that started with a very high unemployment rate and showed a very good improvement is Sri Lanka.

The country with the lowest unemployment rate is Bhutan. One reason might be the construction industry in Bhutan. The largest contributor to Bhutan's economy is hydroelectric power plants, and because of this there is a lot of construction activity going on. This might be why Bhutan has the lowest unemployment rate among the SAARC countries. Source: <a href="http://www.molhr.gov.bt/molhr/wp-content/uploads/2016/06/National-Workforce-Plan-2016-22.pdf">National Workforce plan</a>.

Two countries, Nepal and Bhutan, have at least 1/3 of their people employed in agriculture. These two countries share borders with India and China. They have the lowest unemployment rates.

Nepal and Bhutan also have lower literacy rates than India. Even though their literacy rates are only moderate (about 60%), they have very low unemployment rates. This suggests that the people of these two countries are mostly unskilled and work in agriculture and other sectors like construction.

**Just for the sake of our understanding, we will look at the correlation between literacy rate and unemployment rate.**

**As we have found that Nepal and Bhutan are agricultural countries, we will also take a look at agriculture.**

There is one more interesting observation: except for Maldives, almost all the countries showed a decline in unemployment rate between 1995 and 2000.

For most of the countries, unemployment increased from 2000 to 2005.

### 3.1 Literacy rate vs Unemployment rate

In [None]:
literacy_rate_a = literacy_rate[["CountryCode", "Year", "Value"]]

unemployment_rate_a = unemployment_rate[["CountryCode", "Year", "Value"]]

result = pd.merge(left = literacy_rate_a, right = unemployment_rate_a, 
                  on = ["CountryCode", "Year"], suffixes = ("_literacy", "_unemployment"))

result.head()

In [None]:
result[["Value_literacy", "Value_unemployment"]].corr()

In [None]:
# groupby yields one sub-frame per country present in the merged data
for country, result_country in result.groupby("CountryCode"):
    print("Correlation between Literacy and Unemployment for "+country)
    print("\n")
    print(result_country[["Value_literacy", "Value_unemployment"]].corr())
    print("\n")

The main problem with Afghanistan and Bhutan is that many data points are missing in the literacy rate data.

Leaving aside AFG and BTN, all the other countries except IND and LKA show an increase in unemployment rate with an increase in literacy rate.

Even Sri Lanka does not show as strong a positive correlation between literacy rate and unemployment rate as India does. We could analyze this further with parameters such as dropout rates in primary and secondary education, but that is outside the scope of the research question.
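Because the merge keeps only the years in which both indicators were reported, some countries may be left with very few points, and a correlation computed from one or two points says little. The sketch below counts the overlapping observations next to each correlation. The country codes and values are made up, and returning NaN for fewer than two points is a choice made for this illustration.

```python
import pandas as pd

# Toy stand-ins for literacy_rate_a and unemployment_rate_a (made-up values).
literacy = pd.DataFrame({
    "CountryCode": ["AAA", "AAA", "AAA", "BBB"],
    "Year": [2000, 2005, 2010, 2005],
    "Value": [50.0, 60.0, 70.0, 40.0],
})
unemployment = pd.DataFrame({
    "CountryCode": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "Year": [2000, 2005, 2010, 2000, 2005],
    "Value": [5.0, 4.0, 3.0, 2.0, 2.5],
})

# Inner join: only (country, year) pairs present in both frames survive.
merged = pd.merge(literacy, unemployment, on=["CountryCode", "Year"],
                  suffixes=("_literacy", "_unemployment"))

summary = {}
for code, grp in merged.groupby("CountryCode"):
    n = len(grp)
    # Pearson correlation needs at least two points; report NaN otherwise.
    if n >= 2:
        r = grp["Value_literacy"].corr(grp["Value_unemployment"])
    else:
        r = float("nan")
    summary[code] = (n, r)

for code, (n, r) in summary.items():
    print(code, "points:", n, "correlation:", r)
```

Reporting the number of points alongside each coefficient makes it easier to judge which per-country correlations deserve any weight.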

### 3.2 Agriculture

In [None]:
agri_data_mask = data_saarc["IndicatorName"].str.contains(r"Agriculture, value added \(% of GDP\)")

agri_data = data_saarc[agri_data_mask]

agri_data.head()

In [None]:
plotting(agri_data, "Year", "Value", agri_data.iloc[0]["IndicatorName"], (10, 10))

This graph looks like almost the exact opposite of the literacy rate graph. Interesting, isn't it?

We will also look at employment in agriculture.

In [None]:
agri_emp_data_mask = data_saarc["IndicatorName"].str.contains(r"Employment in agriculture \(% of total employment\)")

agri_emp_data = data_saarc[agri_emp_data_mask]

agri_emp_data.head()

plotting(agri_emp_data,
        "Year",
        "% Value",
        agri_emp_data.iloc[0]["IndicatorName"],(10, 10))

This is somewhat similar to the value added (% of GDP) graph. There seems to be some correlation between employment in agriculture and the unemployment rate. We will check that next.

In [None]:
agri_emp_data_a = agri_emp_data[["Year", "CountryCode", "Value"]]

result = pd.merge(left = agri_emp_data_a, right = unemployment_rate_a, 
                  on = ["CountryCode", "Year"], suffixes = ("_agri", "_unemployment"))

result.head()

In [None]:
result[["Value_agri", "Value_unemployment"]].corr()

That's a moderate relationship.

This suggests that the more people a country employs in agriculture, the lower its unemployment rate tends to be, although a correlation alone does not show cause and effect.

In the end, for the overall **unemployment scenario, Bhutan is the winner**.

## 4. Poverty
*Poverty is like punishment for a crime you didn't commit!*

In [None]:
poverty_mask = data_saarc["IndicatorName"].str.contains(r"Poverty headcount ratio at \$1\.90 a day \(2011 PPP\) \(% of population\)")

poverty = data_saarc[poverty_mask]

poverty.head()

In [None]:
plotting(poverty, "Year", "Poverty %", poverty.iloc[0].IndicatorName, (10, 10))

**Sri Lanka is the winner here.**

The reasons can be found in the following links.
* [Wiki](https://en.wikipedia.org/wiki/Poverty_in_Sri_Lanka)
* [Youtube - Worldbank](http://www.worldbank.org/en/news/video/2016/02/15/ending-poverty-and-boosting-prosperity-sri-lanka-systematic-country-diagnostic)

## 5. Taxes

In [None]:
tax_mask = data_saarc["IndicatorName"].str.contains(r"Tax revenue \(% of GDP\)")

tax = data_saarc[tax_mask]

tax.head()

In [None]:
plotting(tax, "Year", "Tax Revenue % of GDP", tax.iloc[0].IndicatorName, (10, 10))

The downward trend in Sri Lanka's tax revenue as a percentage of GDP might be due to the civil war that started in 1983.

Nepal has the highest tax revenue. It increased gradually from a very low figure to the highest in the group. The main reason might be the government's measures on tax sops (concessions). Source: [Tax in Nepal](https://thehimalayantimes.com/opinion/tax-system-in-nepal-valid-reforms-essential/).

## 6. Tourism

In [None]:
tourism_mask = data_saarc["IndicatorName"].str.contains(r"International tourism, expenditures \(current US\$\)")

tourism = data_saarc[tourism_mask]

tourism.head()

In [None]:
plotting(tourism, "Year", "Expenditure", tourism.iloc[0].IndicatorName, (10, 10))

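**India is the winner.** India's varied culture and heritage attract many tourists from around the world. Note, however, that this indicator measures what a country's residents spend on travel abroad, so an inbound measure such as tourism receipts would test that claim more directly.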

## 7. Income per capita

In [None]:
income_mask = data_saarc["IndicatorName"].str.contains(r"Survey mean consumption or income per capita, total population \(2011 PPP \$ per day\)")

income = data_saarc[income_mask]

income.head()

In [None]:
plotting(income, "Year", "Income per capita", income.iloc[0].IndicatorName, (10, 10))

Interestingly, Nepal and Bhutan again make it to the top of the list.

**Bhutan is the winner.**

## Conclusion

After considering many factors, it seems that Bhutan has performed consistently well among the SAARC nations. One reason might be that it is the only SAARC nation that has a king. It also gives importance to GNH (Gross National Happiness) rather than GDP. I think that the happier people are, the better they work at their careers in the hope of being successful.

India has performed well on many factors when the numbers are viewed for the country as a whole. Because of its population, however, its per capita figures are lower than its neighbours' on many parameters. I think India, which has better resources than all its neighbours, should improve by taking some of those neighbours as role models.

So my vote goes to Bhutan as the better performing country among SAARC nations. What is your opinion?