# Hello World

The following data comes from the United Nations Population Division of the Department of Economic and Social Affairs. Specifically, it is part of the [World Population Prospects: The 2017 Revision](http://data.un.org/Data.aspx?d=PopDiv&f=variableID:12&c=2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1) dataset, which has been an ongoing project since the mid-20th century. It was last updated in June 2017, when it was released as the *2017 Revision* edition of the project.

The numbers used in the following analysis are based on the medium-variant projection for 2018. The original dataset reports population in thousands and can be downloaded [here](http://data.un.org/Data.aspx?d=PopDiv&f=variableID:12&c=2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1).
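Because the file reports population in thousands, a value such as the 2018 `WORLD` figure of 7,632,819.325 means roughly 7.63 billion people. Shares and ratios are unaffected by the unit, so the notebook can work in thousands throughout. A minimal conversion helper, if head counts are ever needed (the function name is hypothetical):

```python
def thousands_to_people(value_in_thousands: float) -> int:
    """Convert a UN WPP figure (reported in thousands) to a head count."""
    # multiply by 1,000 and round to the nearest whole person
    return round(value_in_thousands * 1000)


world_2018 = 7632819.325  # WORLD, 2018, medium variant (thousands)
print(thousands_to_people(world_2018))  # 7632819325
```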

In [324]:
# setting up basic notebook functionalities
import pandas as pd

In [325]:
# importing relevant data
totalpop = pd.read_csv("data/UNdata_totalpop.csv")

In [326]:
# double checking that the csv file was imported correctly and recognizes the second column as numbers
totalpop.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 273 entries, 0 to 272
Data columns (total 2 columns):
Country    273 non-null object
Value      273 non-null float64
dtypes: float64(1), object(1)
memory usage: 4.3+ KB


In [327]:
# creating a dictionary that maps each country name (first column) to its population (second column)
countries = dict(zip(totalpop['Country'], totalpop['Value']))
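The mapping above pairs each name in the first column with the value in the second. Zipping two parallel sequences is the core of that step; the sketch below uses plain lists with made-up sample values in place of the CSV columns.

```python
# hypothetical sample data standing in for the 'Country' and 'Value' columns
names = ["Country A", "Country B", "World"]
values = [100.0, 300.0, 400.0]

# zip pairs the i-th name with the i-th value; dict() turns the pairs into a lookup table
lookup = dict(zip(names, values))
print(lookup["World"])  # 400.0
```

With pandas, passing the two columns (pandas Series are iterable) to `dict(zip(...))` gives the same result without iterating row by row.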

In [328]:
# defining a few variables to save time/space
usa = countries['United States of America']
world = countries['World']
ratio = round((usa/world *100),0)
ratio

4.0

In [329]:
# basis = size of the world if it were scaled down to contain exactly one American (from the rounded ratio)
# base = the same figure computed from the unrounded ratio
basis = 100/ratio
base = 100/(usa/world*100)
base, basis

(23.358617031008308, 25.0)
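The reasoning behind these two numbers: if the United States holds a fraction p = usa / world of the population, a world scaled down to exactly one American contains 1 / p = world / usa people. `base` is that exact value, while `basis` uses the share rounded to a whole percent first. A small sketch with hypothetical round numbers (the function name is illustrative only):

```python
def people_per_american(usa: float, world: float) -> float:
    """Size of a scaled-down world that contains exactly one American."""
    # 1 / (usa / world) simplifies to world / usa
    return world / usa


# hypothetical figures: 4 out of every 100 people are American
print(people_per_american(4, 100))  # 25.0
```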

In [330]:
# UN regions and subregions that will be combined into the 'seven' continents
cont = ['Asia', 'Africa', 'Northern America', 'South America', 'Central America', 'Europe', 'Australia/New Zealand', 'Oceania']

In [331]:
# creating a new dictionary with just the continents in it
continents = {k: countries[k] for k in set(cont) & set(countries.keys())}

In [332]:
# combining data to reflect continental divide
continents['Oceania'] = continents['Australia/New Zealand'] + continents['Oceania']
del continents['Australia/New Zealand']

In [333]:
continents['North America'] = continents['Northern America'] + continents['Central America']
del continents['Northern America']
del continents['Central America']
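The two cells above fold UN subregions into continents by adding their totals and deleting the parts. A reusable version of that step might look like this; the function name and the sample figures are hypothetical, and unlike the cells above it returns a new dictionary instead of editing in place.

```python
def merge_regions(regions: dict, new_name: str, parts: list) -> dict:
    """Return a copy of `regions` where `parts` are summed into `new_name`."""
    merged = {k: v for k, v in regions.items() if k not in parts}
    # add up the parts (a part may share its name with the result, as 'Oceania' does)
    merged[new_name] = sum(regions[p] for p in parts)
    return merged


# made-up sample values
sample = {"Northern America": 360.0, "Central America": 175.0, "Asia": 4500.0}
print(merge_regions(sample, "North America", ["Northern America", "Central America"]))
# {'Asia': 4500.0, 'North America': 535.0}
```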

In [334]:
# total of the combined continents (replaces the 'World' figure used above)
world = sum(continents.values())

In [335]:
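# converting each continent to a share of world and checking that the shares add up to roughly 100
# note: 3.95 * basis = 98.75, so these come out slightly below true percentages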
continents.update((x,round((y/world *3.95*basis),1)) for x, y in continents.items())
continents, sum(continents.values())

({'Africa': 16.7,
  'Asia': 58.9,
  'Europe': 9.6,
  'North America': 7.0,
  'Oceania': 0.9,
  'South America': 5.6},
 98.69999999999999)

In [336]:
# rounding to 0 decimal places
continents.update((x,round(y,0)) for x, y in continents.items())
continents, sum(continents.values())

({'Africa': 17.0,
  'Asia': 59.0,
  'Europe': 10.0,
  'North America': 7.0,
  'Oceania': 1.0,
  'South America': 6.0},
 100.0)
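Rounding each share independently happens to give 100 here, but it is not guaranteed to. If the visualization needs whole numbers that always total 100, one common option is the largest remainder (Hamilton) method: floor every share, then give the leftover units to the entries with the largest fractional parts. This is a swapped-in technique sketched under that assumption, not what the notebook does; the function name is hypothetical.

```python
import math


def largest_remainder(values: dict, total: int = 100) -> dict:
    """Scale `values` to whole numbers that sum exactly to `total`."""
    grand = sum(values.values())
    exact = {k: v / grand * total for k, v in values.items()}
    floored = {k: math.floor(x) for k, x in exact.items()}
    leftover = total - sum(floored.values())
    # give one extra unit to each of the `leftover` largest fractional parts
    by_remainder = sorted(exact, key=lambda k: exact[k] - floored[k], reverse=True)
    for k in by_remainder[:leftover]:
        floored[k] += 1
    return floored


three_way = {"A": 1, "B": 1, "C": 1}
naive = {k: round(v / 3 * 100) for k, v in three_way.items()}
print(sum(naive.values()))            # 99
print(largest_remainder(three_way))   # {'A': 34, 'B': 33, 'C': 33}
```

Ties are broken by insertion order, because Python's `sorted` stays stable even with `reverse=True`.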

In [337]:
# writing dictionary to json file
import json
data = [continents]
with open('sevencont.json', 'w') as txtfile:
    json.dump(data, txtfile, sort_keys=True, indent = 4)
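`json.dump` with `sort_keys=True` and `indent=4` writes a one-element list that wraps the dictionary. A quick round trip with `json.dumps`/`json.loads` (same options, two sample values taken from the output above) shows the layout a front end would receive:

```python
import json

continents = {"Asia": 59.0, "Africa": 17.0}
text = json.dumps([continents], sort_keys=True, indent=4)
print(text)

# reading it back yields the same one-element list
assert json.loads(text) == [continents]
```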

I've officially got it working with the seven continents. For those interested, you can check it out [here](https://farnothing.com/visualization.php) (assuming I haven't changed the link since writing this). Now the task will be to see if I can get it working (with minimal effort) to show all the countries in the world. (Yikes, wish me luck.)

In [338]:
# importing the super large Excel file that's the official source of the data I used for the seven continents
worldpop = pd.read_excel('data/WPP17_totalpop.xlsx', sheet_name="MEDIUM VARIANT")

In [339]:
# checking how many rows and columns there are
worldpop.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 288 entries, 0 to 287
Data columns (total 91 columns):
Unnamed: 0     283 non-null object
Unnamed: 1     274 non-null object
Unnamed: 2     274 non-null object
Unnamed: 3     36 non-null object
Unnamed: 4     274 non-null object
Unnamed: 5     275 non-null object
Unnamed: 6     274 non-null float64
Unnamed: 7     274 non-null float64
Unnamed: 8     274 non-null float64
Unnamed: 9     274 non-null float64
Unnamed: 10    274 non-null float64
Unnamed: 11    274 non-null float64
Unnamed: 12    274 non-null float64
Unnamed: 13    274 non-null float64
Unnamed: 14    274 non-null float64
Unnamed: 15    274 non-null float64
Unnamed: 16    274 non-null float64
Unnamed: 17    274 non-null float64
Unnamed: 18    274 non-null float64
Unnamed: 19    274 non-null float64
Unnamed: 20    274 non-null float64
Unnamed: 21    274 non-null float64
Unnamed: 22    274 non-null float64
Unnamed: 23    274 non-null float64
Unnamed: 24    274 non-null float64
...

In [340]:
# removing the excess heading rows and using row 14 (the real column names) as the header
header = worldpop.iloc[14]
worldpop = worldpop.tail(288-14)
worldpop = worldpop[1:]
worldpop.columns = header
worldpop.head()

| 14 | Index | Variant | Region, subregion, country or area * | Notes | Country code | 2015 | 2016.0 | 2017.0 | 2018.0 | 2019.0 | ... | 2091.0 | 2092.0 | 2093.0 | 2094.0 | 2095.0 | 2096.0 | 2097.0 | 2098.0 | 2099.0 | 2100.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1 | Medium variant | WORLD |  | 900 | 7383010.0 | 7466964.28 | 7550262.101 | 7632819.325 | 7714576.923 | ... | 11066590.0 | 11082470.0 | 11097670.0 | 11112190.0 | 11126030.0 | 11139170.0 | 11151600.0 | 11163290.0 | 11174220.0 | 11184370.0 |
| 16 | 2 | Medium variant | More developed regions | a | 901 | 1253210.0 | 1256576.162 | 1259922.493 | 1263199.677 | 1266335.192 | ... | 1285146.0 | 1285199.0 | 1285246.0 | 1285282.0 | 1285303.0 | 1285301.0 | 1285272.0 | 1285209.0 | 1285106.0 | 1284957.0 |
| 17 | 3 | Medium variant | Less developed regions | b | 902 | 6129800.0 | 6210388.118 | 6290339.608 | 6369619.648 | 6448241.731 | ... | 9781447.0 | 9797266.0 | 9812422.0 | 9826911.0 | 9840730.0 | 9853871.0 | 9866325.0 | 9878077.0 | 9889112.0 | 9899411.0 |
| 18 | 4 | Medium variant | Least developed countries | c | 941 | 956631.0 | 979387.925 | 1002485.957 | 1025936.734 | 1049764.676 | ... | 3024520.0 | 3045449.0 | 3065994.0 | 3086153.0 | 3105922.0 | 3125299.0 | 3144282.0 | 3162871.0 | 3181063.0 | 3198860.0 |
| 19 | 5 | Medium variant | Less developed regions, excluding least develo... | d | 934 | 5173170.0 | 5231000.193 | 5287853.651 | 5343682.914 | 5398477.055 | ... | 6756927.0 | 6751818.0 | 6746428.0 | 6740758.0 | 6734808.0 | 6728572.0 | 6722042.0 | 6715206.0 | 6708049.0 | 6700551.0 |
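The header cell does two things: it takes row 14 as the column names and keeps only the rows after it. In pandas terms, `tail(288 - 14)` followed by `[1:]` keeps rows 15 onward, the same as `worldpop.iloc[15:]`. The same idea on a plain list of rows, with a hypothetical helper and toy data:

```python
def promote_header(rows: list, header_row: int) -> list:
    """Use rows[header_row] as field names and return the later rows as dicts."""
    header = rows[header_row]
    # everything after the header row is data; earlier rows are title/notes
    return [dict(zip(header, row)) for row in rows[header_row + 1:]]


# toy rows: a title banner, the real header, then one data row
raw = [
    ["Title banner", None],
    ["Country code", "2018"],
    [900, 7632819.325],
]
print(promote_header(raw, 1))  # [{'Country code': 900, '2018': 7632819.325}]
```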


In [341]:
# selecting just the countries (removing all the regions also included in the list)
percountry = worldpop[worldpop['Country code'] < 900]
percountry = percountry[['Region, subregion, country or area *', 2018.0]].copy()
percountry.head()

| 14 | Region, subregion, country or area * | 2018.0 |
|---|---|---|
| 29 | Burundi | 11216.45 |
| 30 | Comoros | 832.347 |
| 31 | Djibouti | 971.408 |
| 32 | Eritrea | 5187.948 |
| 33 | Ethiopia | 107534.882 |
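The filter relies on the convention visible in the header preview: world and regional aggregates carry codes of 900 and up (900 for WORLD, 901, 902, 934, 941), while individual countries and areas sit below 900. A plain-Python sketch of the same split; the population figures come from the outputs above, and the codes for Burundi (108) and Ethiopia (231) are standard ISO 3166 numeric codes, assumed to match the file.

```python
# (name, country code, 2018 population in thousands)
rows = [
    ("WORLD", 900, 7632819.325),
    ("More developed regions", 901, 1263199.677),
    ("Burundi", 108, 11216.45),
    ("Ethiopia", 231, 107534.882),
]

# codes below 900 are individual countries or areas; 900 and up are aggregates
countries_only = {name: pop for name, code, pop in rows if code < 900}
print(countries_only)  # {'Burundi': 11216.45, 'Ethiopia': 107534.882}
```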


In [342]:
# total 2018 population of all countries (in thousands)
world = percountry[2018.0].sum()

In [343]:
# each country's share of the world total, in percent
percountry[2018.0] = percountry[2018.0] / world *100


In [344]:
# rounding to 2 decimal places
percountry[2018.0] = percountry[2018.0].round(2)

In [345]:
percountry.columns = ['Country', 'Population']
percountry.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 233 entries, 29 to 287
Data columns (total 2 columns):
Country       233 non-null object
Population    233 non-null float64
dtypes: float64(1), object(1)
memory usage: 5.5+ KB


In [346]:
# creating a dictionary that maps each country name (first column) to its share (second column)
percountry = dict(zip(percountry['Country'], percountry['Population']))

In [347]:
# writing dictionary to json file
import json
data = [percountry]
with open('totalpop.json', 'w') as txtfile:
    json.dump(data, txtfile, sort_keys=True, indent = 4)

Next, adding population percentages by income group (high, upper-middle, lower-middle, and low income, as classified by the World Bank using 2016 GNI per capita).

In [348]:
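# selecting the World Bank income groups (codes between 1000 and 2000 in this file)
incomedist = worldpop[(worldpop['Country code'] > 1000) & (worldpop['Country code'] < 2000)]
incomedist = incomedist[['Region, subregion, country or area *', 2018.0]].copy()
# world still holds the 2018 sum over all countries from above
world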

7632819.325000001

In [349]:
# converting each income group to a percentage of the world total
incomedist[2018.0] = incomedist[2018.0] / world *100

In [350]:
# creating a dictionary that maps each income group (first column) to its share (second column)
econdist = dict(zip(incomedist['Region, subregion, country or area *'], incomedist[2018.0]))

In [353]:
incomedist.columns = ['Country', 'Population']

# dropping the aggregate of the two middle-income groups to avoid double counting
del econdist['Middle-income countries']
econdist

{u'High-income countries': 15.684781481448203,
 u'Low-income countries': 9.106204017216141,
 u'Lower-middle-income countries': 40.580115487012385,
 u'Upper-middle-income countries': 34.590558305924525}
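With the middle-income aggregate removed, the four remaining groups should account for nearly all of the world's population. A quick check using the percentages printed above:

```python
# shares (%) copied from the econdist output above
econdist = {
    "High-income countries": 15.684781481448203,
    "Low-income countries": 9.106204017216141,
    "Lower-middle-income countries": 40.580115487012385,
    "Upper-middle-income countries": 34.590558305924525,
}

total = sum(econdist.values())
print(round(total, 2))  # 99.96
```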

In [354]:
# writing dictionary to json file
import json
data = [econdist]
with open('incomepop.json', 'w') as txtfile:
    json.dump(data, txtfile, sort_keys=True, indent = 4)