# Comparative Analysis of AI and ML in High-Income and Middle-Income Countries

![](https://blogs.forbes.com/duncanmadden/files/2018/03/Literal-Translation-of-Country-Names.jpg)
*Source: forbes.com*

## Introduction

Hi everyone, this is our first time participating in the Kaggle ML & DS Survey 2019. We hope our topic catches your attention and gives everyone some valuable information.

## High and Middle Income Countries Definition

According to the World Bank, there are four income groups: low income, lower middle income, upper middle income and high income. Countries are classified by Gross National Income (GNI) per capita in US dollars, converted using the World Bank Atlas method. For the period from 1 July 2019 until the following year, High Income Country (HIC) economies are those with a GNI per capita of $12,376 or more, Upper Middle Income Country (UMIC) economies range from $3,996 to $12,375, Lower Middle Income Country (LMIC) economies range from $1,026 to $3,995, and Low Income Country (LIC) economies have $1,025 or less.
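The thresholds above amount to a simple lookup. The following is a minimal sketch, assuming GNI per capita is given in current US dollars (Atlas method) and that the FY2020 thresholds apply; the helper `income_group` and the `THRESHOLDS` table are hypothetical names, not part of any World Bank tool.

```python
# Hypothetical helper: map GNI per capita (US$, Atlas method) to a World Bank
# income group using the thresholds in force from 1 July 2019 (FY2020).
THRESHOLDS = [
    (1025, "Low income"),            # $1,025 or less
    (3995, "Lower middle income"),   # $1,026 to $3,995
    (12375, "Upper middle income"),  # $3,996 to $12,375
]


def income_group(gni_per_capita: float) -> str:
    """Return the World Bank income group for a GNI per capita value."""
    for upper_bound, group in THRESHOLDS:
        if gni_per_capita <= upper_bound:
            return group
    return "High income"             # $12,376 or more


for gni in (800, 2500, 11200, 42000):
    print(gni, income_group(gni))
```

Because the bounds are inclusive and checked from lowest to highest, each value falls into exactly one group.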

Economic growth is driven by many factors, but machine learning (ML) has caught the attention of economists as a revolution that policy-makers cannot overlook. In 2017, PwC UK, a widely respected firm, published "The economic impact of artificial intelligence on the UK economy", claiming that AI will bring significant gains to every UK region by 2030, at least as large as 5% of GDP, along with extra spending power of up to £1,800-£2,300 a year per household.

Specifically, ML addresses the question of how to build computers that improve automatically through experience. It is an application of artificial intelligence (AI) that allows systems to keep their built-in algorithms up to date by adapting to changes and new data, producing insightful, accurate information without human intervention or assistance. ML focuses on developing computer programs that can access data and use it to learn for themselves. Examples include online chatbots on business websites, social media services, and targeted ads.
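The idea of a program that "improves automatically through experience" can be sketched in a few lines. The example below is a minimal illustration of online learning with stochastic gradient descent, not a description of any particular product: a one-feature linear model is updated after every new observation. The hidden rule y = 2x + 1, the learning rate and the number of steps are arbitrary assumptions chosen for the illustration.

```python
import random

random.seed(0)

# Model parameters start at zero: the model "knows" nothing yet.
w, b = 0.0, 0.0
learning_rate = 0.05


def true_rule(x):
    # Hypothetical process that the model has to discover from data
    return 2.0 * x + 1.0


for step in range(2000):
    x = random.uniform(-1.0, 1.0)      # a new observation arrives
    y = true_rule(x)
    error = (w * x + b) - y            # prediction error on this observation
    # Gradient step on the squared error 0.5 * error**2
    w -= learning_rate * error * x
    b -= learning_rate * error

print(f"learned w={w:.3f}, b={b:.3f}")
```

Each update nudges `w` and `b` to reduce the error on the latest observation, so the model keeps adapting as data arrives without anyone hand-coding the final values.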

What economics and ML have in common is that both are driven by data. Although their processes differ in many respects, the goal is the same: efficiency. With that in mind, how can ML help the economy? Intuitively, and as supported by the findings of PwC UK's 2017 report: better productivity!

```python
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Kaggle ML & DS Survey 2019; skiprows=[1] drops the row that holds the question text
raw_data1 = pd.read_csv('../input/kaggle-survey-2019/multiple_choice_responses.csv', header=0, skiprows=[1])
raw_data2 = pd.read_csv('../input/kaggle-survey-2019/other_text_responses.csv', header=0, skiprows=[1])
# World Bank country classification; skip the five rows above the header row
country_list = pd.read_excel('../input/income-countries/CLASS.xls', header=0, skiprows=[0, 1, 2, 3, 4])
pd.set_option('display.max_columns', None)

# Drop unused columns and give the remaining ones readable names
country_list = country_list.drop(columns=['x', 'x.1', 'x.4', 'x.7', 'x.8'])
country_list = country_list.rename(columns={"x.2": "Country", "x.3": "Abbre", "x.5": "Region", "x.6": "Income Group"})

# Rename World Bank country names to the spellings used in survey question Q3 (country)
country_data = country_list.replace({'Country': {
    'Russian Federation': 'Russia', 'Timor-Leste': 'East Timor',
    'United States': 'United States of America',
    'Korea, Rep.': 'South Korea', "Korea, Dem. People's Rep.": 'Republic of Korea',
    'United Kingdom': 'United Kingdom of Great Britain and Northern Ireland',
    'Hong Kong SAR, China': 'Hong Kong (S.A.R.)', 'Taiwan, China': 'Taiwan',
    'Egypt, Arab Rep.': 'Egypt', 'Vietnam': 'Viet Nam',
    'Iran, Islamic Rep.': 'Iran, Islamic Republic of...'}})

# Inner join on country name: respondents whose Q3 answer has no World Bank match are dropped
data = raw_data1.merge(country_data, left_on='Q3', right_on='Country')
data = data.replace({'Country': {'Republic of Korea': 'North Korea'}})
data = data.replace({'Q3': {'Republic of Korea': 'North Korea'}})
```

```python
import plotly.express as px

# Number of respondents per (income group, country); every merged row is a real respondent
countries = (data.groupby(['Income Group', 'Q3']).size()
                 .reset_index(name='respondents')
                 .rename(columns={'Q3': 'country'}))

# Ordinal value per income group so the colour scale runs from low to high income
income_rank = {'Low income': 1, 'Lower middle income': 2,
               'Upper middle income': 3, 'High income': 4}
countries['Color'] = countries['Income Group'].map(income_rank)

fig = px.choropleth(countries,
                    locations='country',
                    color='Color',
                    hover_name='Income Group',
                    locationmode='country names',
                    color_continuous_scale='haline')
fig.show()
```

In this survey, 30 countries are HIC, coloured in yellow (e.g. the United Kingdom (UK) and Australia), followed by 15 UMIC countries coloured in green (e.g. Malaysia and South Africa), 12 LMIC countries coloured in blue (e.g. Indonesia and Vietnam), and lastly 1 LIC country (North Korea).
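The per-group country counts above can be obtained from the `countries` frame built in the map cell with a `groupby` and `nunique`. This sketch uses a small made-up frame with the same `Income Group` and `country` columns, so the numbers it prints are illustrative only.

```python
import pandas as pd

# Made-up rows shaped like the `countries` frame built for the map
countries = pd.DataFrame({
    "Income Group": ["High income", "High income", "Upper middle income",
                     "Upper middle income", "Lower middle income"],
    "country": ["United Kingdom of Great Britain and Northern Ireland", "Australia",
                "Malaysia", "South Africa", "Indonesia"],
})

# Number of distinct countries in each income group, largest first
countries_per_group = (countries.groupby("Income Group")["country"]
                       .nunique()
                       .sort_values(ascending=False))
print(countries_per_group)
```

`nunique` counts each country once even if the frame holds several rows per country.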

## Profile of Respondents

```python
# Number of respondents in each World Bank income group
table1 = data['Income Group'].value_counts().to_frame()
table1
```

These results show 8,762 respondents from HIC, 6,505 from LMIC and 3,323 from UMIC. Since only 73 respondents come from LIC, we excluded the LIC group from the rest of the analysis.

```python
# Exclude the Low income group (only 73 respondents)
analysis_data = data[data['Income Group'] != 'Low income']

fig, ax = plt.subplots(figsize=(15, 7))
# Outer ring: respondents per income group (counts from the table above)
labels = ['High income', 'Lower middle income', 'Upper middle income']
sizes = [8762, 6505, 3323]
# Inner ring: respondents per gender within each income group, in the same group order
labels_gender2 = ['Female', 'Male', 'Prefer not to say/self-describe']
sizes_gender = [1453, 7120, 189, 1099, 5334, 72, 490, 2780, 53]

cmap1 = plt.get_cmap("tab20c")
cmap2 = plt.get_cmap("Set3")
outer_colors = cmap1(np.arange(3))
# The same three gender colours are repeated for each income group
inner_colors = np.concatenate([cmap2(np.arange(3) * 3)] * 3)

explode = (0.1, 0.15, 0.2)
explode_gender = (0.1,) * 9

ax.pie(sizes, explode=explode, colors=outer_colors, labels=labels, frame=True, autopct='%1.1f%%',
       radius=3, pctdistance=0.8, shadow=True, startangle=90)
# Inner ring is labelled with the raw counts
patches, texts = ax.pie(sizes_gender, explode=explode_gender, radius=2, colors=inner_colors,
                        labeldistance=0.7, labels=sizes_gender, startangle=95)
ax.legend(patches, labels_gender2, loc="best")

# White disc in the centre turns the pies into rings (passing color= would override fc)
centre_circle = plt.Circle((0, 0), 1, fc='white', linewidth=0)
ax.add_artist(centre_circle)

ax.axis('equal')
plt.tight_layout()
plt.show()
```

In this survey, 47.1% of respondents are in the HIC group, 17.9% are in UMIC and the remaining 35.0% are in LMIC.
Surprisingly, LMIC has a higher share of respondents than UMIC, roughly twice as large. Our assumption is that LMIC countries have started exploring AI and data science more recently than UMIC countries. Despite the growth of AI and ML, there is a large gender gap: men are the dominant group, making up about 81-84% of respondents in each income group.
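The percentages discussed above can be checked directly against the counts hard-coded in the pie-chart cell. A short pandas sketch reusing those counts:

```python
import pandas as pd

# Counts copied from `sizes_gender` in the pie-chart cell
counts = pd.DataFrame(
    {"Female": [1453, 1099, 490],
     "Male": [7120, 5334, 2780],
     "Prefer not to say/self-describe": [189, 72, 53]},
    index=["High income", "Lower middle income", "Upper middle income"],
)

# Share of all respondents in each income group (outer ring of the pie)
group_totals = counts.sum(axis=1)
group_share = (group_totals / group_totals.sum() * 100).round(1)

# Share of each gender within its own income group (row-normalised)
gender_share = (counts.div(group_totals, axis=0) * 100).round(1)

print(group_share)
print(gender_share)
```

Row-normalising with `div(..., axis=0)` gives each gender's share within its own income group, which makes groups of very different sizes comparable.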

```python
# Age distribution (Q1) of respondents in each income group
age_order = ['18-21', '22-24', '25-29', '30-34', '35-39', '40-44',
             '45-49', '50-54', '55-59', '60-69', '70+']
fig, ax = plt.subplots(figsize=(21, 8))
sns.countplot(x="Q1", hue="Income Group", data=analysis_data, order=age_order, ax=ax)
```

In the HIC and UMIC groups, the most common age bracket is 25-29, while LMIC respondents are largely younger, aged 22-24 or 18-21.
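To find the most common age bracket in each income group without reading it off the chart, one option is a cross-tabulation followed by `idxmax`. The sketch below runs on a few made-up rows shaped like `analysis_data[['Income Group', 'Q1']]`; with the real data, `sample` would be replaced by that frame.

```python
import pandas as pd

# Made-up responses shaped like analysis_data[['Income Group', 'Q1']]
sample = pd.DataFrame({
    "Income Group": ["High income"] * 4 + ["Lower middle income"] * 4,
    "Q1": ["25-29", "25-29", "30-34", "22-24",
           "22-24", "18-21", "22-24", "25-29"],
})

# Rows: income groups; columns: age brackets; cells: number of respondents
age_counts = pd.crosstab(sample["Income Group"], sample["Q1"])

# Column label with the highest count in each row, i.e. the most common bracket
most_common_age = age_counts.idxmax(axis=1)
print(most_common_age)
```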

## Results

```python
# Occupation (Q5) by age group (Q1), one panel per income group
bar_colors = plt.get_cmap("tab20")(np.arange(12))  # 12 colours, one per Q5 answer option
g = sns.catplot(x="Q1", hue="Q5", col="Income Group", col_wrap=1,
                data=analysis_data, kind="count", sharex=False, sharey=False,
                col_order=['High income', 'Upper middle income', 'Lower middle income'],
                height=4, aspect=4, legend_out=False,
                order=['18-21', '22-24', '25-29', '30-34', '35-39', '40-44', '45-49',
                       '50-54', '55-59', '60-69', '70+'],
                palette=bar_colors)
```

Setting students aside, Data Scientist is the most common occupation in HIC and UMIC, and most of these workers are aged 22-29.

```python
# Years spent writing code to analyze data (Q15) by occupation (Q5), one panel per income group
bar_colors = plt.get_cmap("tab20")(np.arange(12))  # 12 colours, one per Q5 answer option
g = sns.catplot(x="Q15", hue="Q5", col="Income Group", col_wrap=1,
                data=analysis_data, kind="count", sharex=False, sharey=False,
                col_order=['High income', 'Upper middle income', 'Lower middle income'],
                height=4, aspect=4, legend_out=False,
                order=['20+ years', '10-20 years', '5-10 years', '3-5 years', '1-2 years', '< 1 years',
                       'I have never written code'],
                palette=bar_colors)
```

This chart reveals an interesting pattern in coding experience (Q15 asks how long respondents have been writing code to analyze data). HIC and UMIC include respondents with up to 20+ years of experience, suggesting that their involvement in this field has been developing over the past 20 years. In LMIC, involvement has only recently started to grow: most LMIC respondents have less than a year of experience.

```python
# Highest education level (Q4) by occupation (Q5), one panel per income group
bar_colors = plt.get_cmap("tab20")(np.arange(12))  # 12 colours, one per Q5 answer option
g = sns.catplot(x="Q4", hue="Q5", col="Income Group", col_wrap=1,
                data=analysis_data, kind="count", sharex=True, sharey=False,
                col_order=['High income', 'Upper middle income', 'Lower middle income'],
                height=4, aspect=4, legend_out=False,
                order=["Doctoral degree", "Master’s degree", "Professional degree", "Bachelor’s degree",
                       "Some college/university study without earning a bachelor’s degree",
                       "No formal education past high school", "I prefer not to answer"],
                palette=bar_colors)
# Long education labels are rotated so they do not overlap
g.set_xticklabels(rotation=30, horizontalalignment='right')
```

Overall, most respondents in HIC, UMIC and LMIC hold at least a Bachelor's degree. In HIC and UMIC the most common level is one step higher, a Master's degree; we believe that, besides experience, education plays a big role in securing a job in HIC and UMIC.
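The "at least a Bachelor's degree" reading can be quantified per income group with an ordered categorical. This is a minimal sketch on made-up rows; placing "Professional degree" between Bachelor's and Master's is an assumption that simply follows the order used in the education chart, and "I prefer not to answer" is left out of the ranking.

```python
import pandas as pd

# Assumed ranking of Q4 answers, lowest to highest
edu_order = ["No formal education past high school",
             "Some college/university study without earning a bachelor’s degree",
             "Bachelor’s degree", "Professional degree",
             "Master’s degree", "Doctoral degree"]

# Made-up responses shaped like analysis_data[['Income Group', 'Q4']]
sample = pd.DataFrame({
    "Income Group": ["High income"] * 4 + ["Lower middle income"] * 4,
    "Q4": ["Master’s degree", "Doctoral degree", "Bachelor’s degree", "I prefer not to answer",
           "Bachelor’s degree", "Bachelor’s degree", "Master’s degree",
           "No formal education past high school"],
})

# Answers outside edu_order (e.g. "I prefer not to answer") become NaN and are dropped
sample["Q4"] = pd.Categorical(sample["Q4"], categories=edu_order, ordered=True)
answered = sample.dropna(subset=["Q4"])

# Ordered categoricals can be compared with one of their category labels
at_least_bachelor = (answered["Q4"] >= "Bachelor’s degree").groupby(answered["Income Group"]).mean()
print(at_least_bachelor)
```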

```python
# Employer's use of ML methods (Q8) by occupation (Q5), one panel per income group
g = sns.catplot(x="Q5", hue="Q8", col="Income Group", col_wrap=1,
                data=analysis_data, kind="count", sharex=True, sharey=False,
                col_order=['High income', 'Upper middle income', 'Lower middle income'],
                height=4, aspect=4, legend_out=False)
g.set_xticklabels(rotation=30, horizontalalignment='right')
```

Since Data Scientist is the most popular job, we looked at how the employers of Data Scientists use ML (Q8). In all three income groups, their employers mostly have either recently started using ML methods (models in production for less than 2 years) or have well-established ML methods (models in production for more than 2 years). For Software Engineers, meanwhile, employers appear to have only recently started exploring these methods.

We believe that over the next 10 to 20 years, ML and AI will be widely used in all three income groups and in every job sector, since many workers have already started exploring and using these applications in their work.

```python
# Yearly compensation (Q10) by income group, excluding students
data_compensation = analysis_data[analysis_data['Q5'] != 'Student']
compensation_order = ['$0-999', '1,000-1,999', '2,000-2,999', '3,000-3,999', '4,000-4,999',
                      '5,000-7,499', '7,500-9,999', '10,000-14,999', '15,000-19,999',
                      '20,000-24,999', '25,000-29,999', '30,000-39,999', '40,000-49,999',
                      '50,000-59,999', '60,000-69,999', '70,000-79,999', '80,000-89,999',
                      '90,000-99,999', '100,000-124,999', '125,000-149,999', '150,000-199,999',
                      '200,000-249,999', '250,000-299,999', '300,000-500,000', '> $500,000']
fig, ax = plt.subplots(figsize=(21, 8))
sns.countplot(x="Q10", hue="Income Group", data=data_compensation, order=compensation_order, ax=ax)
ax.set_xticklabels(ax.get_xticklabels(), rotation=25, horizontalalignment='right')
```


Next, we compare yearly compensation (in US dollars) across the income groups, excluding students. Unsurprisingly, most HIC respondents earn more than $30,000 a year, unlike respondents from the other groups. In LMIC and UMIC, most respondents earn less than $30,000; LMIC counts are slightly higher than UMIC in most of these brackets, and in the lowest bracket ($0-999) LMIC has about three times as many respondents as UMIC.
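Because the income groups have very different numbers of respondents, raw counts per compensation bracket mix group size with pay levels. One way to compare groups more fairly is to turn each Q10 label into a numeric lower bound and compute a within-group share. This is a minimal sketch on made-up rows; `bracket_lower_bound` is a hypothetical helper, and the $30,000 cut-off simply mirrors the text above.

```python
import re

import pandas as pd


def bracket_lower_bound(label: str) -> int:
    """Lower bound in US$ of a Q10 label such as '30,000-39,999' or '> $500,000'."""
    # Take the first run of digits and commas, then drop the commas
    first_number = re.search(r"[\d,]+", label).group(0)
    return int(first_number.replace(",", ""))


# Made-up responses shaped like data_compensation[['Income Group', 'Q10']]
sample = pd.DataFrame({
    "Income Group": ["High income"] * 4 + ["Lower middle income"] * 6
                    + ["Upper middle income"] * 2,
    "Q10": ["50,000-59,999", "100,000-124,999", "30,000-39,999", "10,000-14,999",
            "$0-999", "$0-999", "1,000-1,999", "5,000-7,499", "30,000-39,999", "$0-999",
            "10,000-14,999", "40,000-49,999"],
})
sample["min_usd"] = sample["Q10"].map(bracket_lower_bound)

# Share of each group's respondents whose bracket starts at $30,000 or more
share_30k_plus = (sample["min_usd"] >= 30000).groupby(sample["Income Group"]).mean()
print(share_30k_plus.round(2))
```

Shares sum to a comparable scale for every group, so a larger respondent pool in one group no longer inflates its bars.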

## Conclusion

Amid all the excitement around this revolutionary technology, some have voiced concerns, particularly about whether LIC, LMIC and UMIC economies can cope with the wave of change. HIC economies look set to invest, consolidate their positions and win the race, while the rest risk falling behind if they cannot capitalize on it. Low-income and middle-income nations may no longer be able to count on low-cost manufacturing to drive rapid growth, and are likely to struggle to 'add value' with ML.

A landscape of growing inequality between nations and economies seems to be looming: the accelerating pace of the market leaders will widen the gaps. Policy-makers in economies at every income level therefore urgently need to find ways to prepare for this future.

## References

1. https://www.brookings.edu/research/using-big-data-and-artificial-intelligence-to-accelerate-global-development/

2. https://carnegieendowment.org/2019/10/02/world-isn-t-ready-for-ai-to-upend-global-economy-pub-79961

3. https://www.pwc.co.uk/economic-services/assets/ai-uk-report-v2.pdf

4. https://data.worldbank.org/indicator/NY.GNP.PCAP.CD?locations=MY

5. https://www.investopedia.com/terms/m/machine-learning.asp

6. https://en.wikipedia.org/wiki/World_Bank_high-income_economy#cite_note-worldbank1-4

7. https://addepto.com/machine-learning-in-economics-how-is-it-used/
