# GDP Data Analysis with NumPy

-  We will analyze the [World Bank national GDP data](https://data.worldbank.org/indicator/NY.GDP.MKTP.CD) from 2012 to 2017.
-  The data file `GDP.csv` can be downloaded from the class repository.

In [1]:
# import pandas
import pandas as pd
# import Numpy
import numpy as np

## Load the Data Set

-  The code snippet below reads the data set and extracts the 2012 to 2017 columns as `gdp`, a NumPy ndarray.

In [10]:
# load the data set
df = pd.read_csv('GDP.csv')
# gdp is a numpy ndarray
gdp = df.loc[:, '2012':'2017'].values
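
As a minimal sketch of what the loading cell does, the snippet below builds a small in-memory DataFrame in place of `GDP.csv` and extracts the year columns the same way. The name `sample_df`, the choice of three rows, and the restriction to three year columns are assumptions for illustration; the values are taken from the first rows of the data set. Note that a label-based `.loc` slice includes both endpoints.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for GDP.csv: three countries, three years.
sample_df = pd.DataFrame({
    'Country Name': ['Afghanistan', 'Albania', 'Algeria'],
    '2012': [0.0200, 0.0123, 0.2091],
    '2013': [0.0206, 0.0128, 0.2098],
    '2014': [0.0205, 0.0132, 0.2138],
})

# '2012':'2014' selects 2012, 2013 and 2014 (both endpoints included).
sample_gdp = sample_df.loc[:, '2012':'2014'].values

print(type(sample_gdp))
print(sample_gdp.shape)
```

Because the slice drops the text columns, the resulting array holds only floating-point numbers.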

In [12]:
df

| | Country Name | Country Code | Region | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | South Asia | 0.0200 | 0.0206 | 0.0205 | 0.0199 | 0.0194 | 0.0202 |
| 1 | Albania | ALB | Europe & Central Asia | 0.0123 | 0.0128 | 0.0132 | 0.0114 | 0.0119 | 0.0130 |
| 2 | Algeria | DZA | Middle East & North Africa | 0.2091 | 0.2098 | 0.2138 | 0.1660 | 0.1601 | 0.1676 |
| 3 | American Samoa | ASM | East Asia & Pacific | 0.0006 | 0.0006 | 0.0006 | 0.0007 | 0.0007 | 0.0006 |
| 4 | Andorra | AND | Europe & Central Asia | 0.0032 | 0.0033 | 0.0034 | 0.0028 | 0.0029 | 0.0030 |
| 5 | Angola | AGO | Sub-Saharan Africa | 0.1281 | 0.1367 | 0.1457 | 0.1162 | 0.1011 | 0.1221 |
| 6 | Antigua and Barbuda | ATG | Latin America & Caribbean | 0.0012 | 0.0012 | 0.0013 | 0.0014 | 0.0015 | 0.0015 |
| 7 | Argentina | ARG | Latin America & Caribbean | 0.5460 | 0.5520 | 0.5263 | 0.5947 | 0.5575 | 0.6427 |
| 8 | Armenia | ARM | Europe & Central Asia | 0.0106 | 0.0111 | 0.0116 | 0.0106 | 0.0105 | 0.0115 |
| 9 | Aruba | ABW | Latin America & Caribbean | 0.0025 | 0.0026 | 0.0026 | 0.0027 | 0.0026 | 0.0027 |


## Explore Data

-  In this section and the next, explore and manipulate the data in the NumPy array `gdp`.
-  The array contains national GDP data (in trillions of US dollars) from 2012 through 2017. Each row corresponds to one country, and each column holds the GDP values for one year.
-  Write a Python code snippet with NumPy to answer each question. Do **not** use any explicit loops.

### Question 1. How many rows (countries) are there in array `gdp`?

In [14]:
gdp.shape[0]

197

### Question 2. How many columns (years) are there in array `gdp`?

In [15]:
gdp.shape[1]

6

### Question 3. What is the data type of array `gdp`?

In [18]:
# dtype reports the element type; type(gdp) would only report the container class
gdp.dtype

dtype('float64')
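
The container type and the element type are separate pieces of information about an array. The sketch below uses a small hypothetical array `sample` (two rows taken from the data set, not the full `gdp`) to show how `shape`, `type()`, and `dtype` differ.

```python
import numpy as np

# Hypothetical 2 x 6 stand-in for gdp (Afghanistan and Albania, 2012-2017).
sample = np.array([
    [0.0200, 0.0206, 0.0205, 0.0199, 0.0194, 0.0202],
    [0.0123, 0.0128, 0.0132, 0.0114, 0.0119, 0.0130],
])

n_rows, n_cols = sample.shape   # shape is a (rows, columns) tuple
container = type(sample)        # class of the object itself
element_type = sample.dtype     # type of each stored value

print(n_rows, n_cols)
print(container)
print(element_type)
```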

### Question 4. Output the first five countries' GDPs from 2013 through 2016 (from the second column through the fifth column). 

In [21]:
df.loc[:, '2013':'2016'].head()

| | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|
| 0 | 0.0206 | 0.0205 | 0.0199 | 0.0194 |
| 1 | 0.0128 | 0.0132 | 0.0114 | 0.0119 |
| 2 | 0.2098 | 0.2138 | 0.1660 | 0.1601 |
| 3 | 0.0006 | 0.0006 | 0.0007 | 0.0007 |
| 4 | 0.0033 | 0.0034 | 0.0028 | 0.0029 |


### Question 5. Output the last ten countries' GDPs in 2017 (the last column).

In [22]:
df.loc[:,'2017'].tail(10)

187    19.4854
188     0.0565
189     0.0592
190     0.0008
191     0.2238
192     0.0039
193     0.0145
194     0.0268
195     0.0259
196     0.0228
Name: 2017, dtype: float64
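
Questions 4 and 5 can also be answered by slicing the NumPy array directly. The sketch below is one possible approach on a hypothetical six-row array `sample` (the first six rows of the data set); on the real data, `sample` would be `gdp` and the tail slice would be `[-10:, -1]`. Unlike label-based `.loc` slices, positional slices exclude the stop index.

```python
import numpy as np

# Hypothetical stand-in for gdp: first six countries, columns 2012..2017.
sample = np.array([
    [0.0200, 0.0206, 0.0205, 0.0199, 0.0194, 0.0202],
    [0.0123, 0.0128, 0.0132, 0.0114, 0.0119, 0.0130],
    [0.2091, 0.2098, 0.2138, 0.1660, 0.1601, 0.1676],
    [0.0006, 0.0006, 0.0006, 0.0007, 0.0007, 0.0006],
    [0.0032, 0.0033, 0.0034, 0.0028, 0.0029, 0.0030],
    [0.1281, 0.1367, 0.1457, 0.1162, 0.1011, 0.1221],
])

# Rows 0-4, columns 1-4 (2013 through 2016); the stop index is excluded.
first_five = sample[:5, 1:5]

# Last two rows of the last column (2017); negative indices count from the end.
last_two_2017 = sample[-2:, -1]

print(first_five)
print(last_two_2017)
```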

### Question 6. Was the eighth country's GDP in 2017 higher than 0.5 trillion US Dollars? (True or false?)

In [23]:
df.loc[7, '2017'] > 0.5

True

## Manipulate and Aggregate Data

### Question 7. How many GDP values in the array are higher than 0.5 trillion US Dollars?

*Hint: use `np.sum()` to count the number of True elements.*

In [24]:
np.sum(gdp > 0.5)

142

### Question 8. How many countries had a GDP higher than 0.5 trillion US Dollars in 2017? 

In [25]:
np.sum(df.loc[:,'2017'] > 0.5)

23

### Question 9. Out of those countries that had a GDP higher than 0.5 trillion US Dollars in 2017, how many countries' GDP in 2016 was lower than 0.5 trillion US Dollars?

In [27]:
np.sum(df.loc[df.loc[:, '2017'] > 0.5, '2016'] < 0.5)

1
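
Questions 7 through 9 all rely on the same idea: a comparison produces a boolean array, and `np.sum()` counts its `True` entries. The sketch below illustrates this on a hypothetical four-row array `sample` (Algeria, Angola, Argentina, Armenia). The threshold of 0.12 is an assumption chosen so that the small sample produces a non-trivial result; the questions themselves use 0.5.

```python
import numpy as np

# Hypothetical stand-in for gdp: four countries, columns 2012..2017.
sample = np.array([
    [0.2091, 0.2098, 0.2138, 0.1660, 0.1601, 0.1676],  # Algeria
    [0.1281, 0.1367, 0.1457, 0.1162, 0.1011, 0.1221],  # Angola
    [0.5460, 0.5520, 0.5263, 0.5947, 0.5575, 0.6427],  # Argentina
    [0.0106, 0.0111, 0.0116, 0.0106, 0.0105, 0.0115],  # Armenia
])
threshold = 0.12  # illustrative value, not the 0.5 used in the questions

# Count every value in the array above the threshold.
n_values_above = np.sum(sample > threshold)

# Count countries above the threshold in 2017 (last column).
above_2017 = sample[:, -1] > threshold
n_countries_above = np.sum(above_2017)

# Of those countries, count the ones below the threshold in 2016.
n_newly_above = np.sum(sample[above_2017, -2] < threshold)

# The same count, written as a single combined mask.
n_newly_above_alt = np.sum(above_2017 & (sample[:, -2] < threshold))

print(n_values_above, n_countries_above, n_newly_above)
```

Indexing with the boolean mask and combining two masks with `&` give the same count; the second form avoids creating an intermediate selection.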

### Question 10. How many countries had a lower GDP in 2017 than in 2016?

In [31]:
np.sum(df.loc[:,'2016'] > df.loc[:, '2017'])

14
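
Two columns can be compared element by element in the same way. The sketch below uses a hypothetical two-column array `sample` holding only the 2016 and 2017 values of four countries from the data set. Because the comparison is strict, a country whose GDP is unchanged is not counted as a decrease.

```python
import numpy as np

# Hypothetical stand-in: columns are 2016 and 2017.
sample = np.array([
    [0.1601, 0.1676],  # Algeria: increased
    [0.0007, 0.0006],  # American Samoa: decreased
    [0.0015, 0.0015],  # Antigua and Barbuda: unchanged
    [0.5575, 0.6427],  # Argentina: increased
])

# True where the 2017 value is strictly lower than the 2016 value.
fell = sample[:, 1] < sample[:, 0]

n_fell = np.sum(fell)              # number of countries with a decrease
rows_fell = np.flatnonzero(fell)   # row indices of those countries

print(n_fell, rows_fell)
```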

### Question 11. Output the row index of the country with the highest GDP in 2015.

*Hint: use `np.argmax()`.*

In [32]:
# Pass the underlying array so that np.argmax returns a positional index
np.argmax(df.loc[:, '2015'].values)

187
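
When `np.argmax()` receives a plain NumPy array, it returns the position of the maximum. The sketch below shows this on a hypothetical four-row array `sample` (Algeria, Angola, Argentina, Armenia); on the real data the column would be `gdp[:, 3]`.

```python
import numpy as np

# Hypothetical stand-in for gdp: four countries, columns 2012..2017.
sample = np.array([
    [0.2091, 0.2098, 0.2138, 0.1660, 0.1601, 0.1676],  # Algeria
    [0.1281, 0.1367, 0.1457, 0.1162, 0.1011, 0.1221],  # Angola
    [0.5460, 0.5520, 0.5263, 0.5947, 0.5575, 0.6427],  # Argentina
    [0.0106, 0.0111, 0.0116, 0.0106, 0.0105, 0.0115],  # Armenia
])

col_2015 = sample[:, 3]          # 2012 is column 0, so 2015 is column 3
top_row = np.argmax(col_2015)    # position of the largest value
top_value = col_2015[top_row]    # the value at that position

print(top_row, top_value)
```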

### Question 12. Output the first fifteen countries' respective average yearly GDP from 2012 through 2017.

*Hint: use the `axis` parameter of `np.mean()`.*

In [35]:
# .loc slices by label and includes both endpoints, so 0:14 selects fifteen rows
np.mean(df.loc[0:14, '2012':'2017'], axis=1)

0     0.020100
1     0.012433
2     0.187733
3     0.000633
4     0.003100
5     0.124983
6     0.001350
7     0.569867
8     0.010983
9     0.002617
10    1.413700
11    0.412367
12    0.058500
13    0.011350
14    0.032567
dtype: float64
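
The `axis` argument decides which dimension is collapsed. The sketch below, on a hypothetical three-row array `sample` (the first three countries of the data set), contrasts `axis=1` (one mean per country) with `axis=0` (one mean per year). With positional slicing the stop index is excluded, so `gdp[:15]` would select exactly fifteen rows.

```python
import numpy as np

# Hypothetical stand-in for gdp: first three countries, columns 2012..2017.
sample = np.array([
    [0.0200, 0.0206, 0.0205, 0.0199, 0.0194, 0.0202],  # Afghanistan
    [0.0123, 0.0128, 0.0132, 0.0114, 0.0119, 0.0130],  # Albania
    [0.2091, 0.2098, 0.2138, 0.1660, 0.1601, 0.1676],  # Algeria
])

# axis=1 averages across the columns: one value per row (country).
row_means = np.mean(sample, axis=1)

# axis=0 averages down the rows: one value per column (year).
col_means = np.mean(sample, axis=0)

print(row_means)
print(col_means.shape)
```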

### Question 13. What was the global GDP in 2016 and 2017, respectively?

In [40]:
np.sum(df.loc[:, '2016'])

74.9563

In [41]:
np.sum(df.loc[:, '2017'])

79.66299999999998
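
Both yearly totals can also be obtained in one call by summing down the rows and then selecting the last two columns. This is only a sketch on a hypothetical three-row array `sample`, so the totals below are for three countries, not the global figures.

```python
import numpy as np

# Hypothetical stand-in for gdp: first three countries, columns 2012..2017.
sample = np.array([
    [0.0200, 0.0206, 0.0205, 0.0199, 0.0194, 0.0202],
    [0.0123, 0.0128, 0.0132, 0.0114, 0.0119, 0.0130],
    [0.2091, 0.2098, 0.2138, 0.1660, 0.1601, 0.1676],
])

# Sum over the rows: one total per year.
yearly_totals = np.sum(sample, axis=0)

# The last two entries are the totals for 2016 and 2017.
total_2016, total_2017 = yearly_totals[-2:]

print(total_2016, total_2017)
```

Floating-point sums can differ in the last digits depending on the order of addition, which is why a total may print as `79.66299999999998` instead of a rounded value.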

### Question 14. What was the fifth highest national GDP in 2017?

*Hint: use `np.sort()`.*

In [42]:
np.sort(df.loc[:, '2017'])[-5]

2.6526
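
`np.sort()` returns a sorted copy in ascending order, so the k-th highest value sits at index `-k`. The sketch below applies this to a hypothetical one-dimensional array `gdp_2017` holding the 2017 values of the first ten countries in the data set.

```python
import numpy as np

# Hypothetical stand-in: 2017 GDP of the first ten countries.
gdp_2017 = np.array([0.0202, 0.0130, 0.1676, 0.0006, 0.0030,
                     0.1221, 0.0015, 0.6427, 0.0115, 0.0027])

# Ascending sort; the original array is left unchanged.
ascending = np.sort(gdp_2017)

# Index -1 is the highest value, so index -5 is the fifth highest.
fifth_highest = ascending[-5]

print(fifth_highest)
```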

### Question 15. How many countries' GDP increased by at least 30% from 2012 to 2017?

In [45]:
np.sum(df.loc[:, '2012'] * 1.3 < df.loc[:, '2017'])

38
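
A growth condition can be written as a comparison between a scaled first column and the last column, which avoids dividing by the starting value. The sketch below is one way to express "at least" with `>=` on a hypothetical four-row array `sample`. The 15% threshold is an assumption chosen so that the small sample contains matches; the question itself uses 30%.

```python
import numpy as np

# Hypothetical stand-in for gdp: four countries, columns 2012..2017.
sample = np.array([
    [0.0123, 0.0128, 0.0132, 0.0114, 0.0119, 0.0130],  # Albania
    [0.0012, 0.0012, 0.0013, 0.0014, 0.0015, 0.0015],  # Antigua and Barbuda
    [0.5460, 0.5520, 0.5263, 0.5947, 0.5575, 0.6427],  # Argentina
    [0.0106, 0.0111, 0.0116, 0.0106, 0.0105, 0.0115],  # Armenia
])
min_growth = 0.15  # illustrative value, not the 30% used in the question

# True where the 2017 value is at least (1 + min_growth) times the 2012 value.
grew = sample[:, -1] >= (1 + min_growth) * sample[:, 0]

n_grew = np.sum(grew)

print(n_grew)
```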