# Lottery of Birth
The following data comes from the United Nations Population Division of the Department of Economic and Social Affairs. Specifically, it is part of the [World Population Prospects: The 2017 Revision](http://data.un.org/Data.aspx?d=PopDiv&f=variableID:12&c=2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1) dataset, which has been an ongoing project since the mid-20th century. It was last updated in June 2017, when it was released as the *2017 Revision* edition of the project.

The numbers used in the following analysis are based on the medium-variant projection for 2015-2020. The original dataset reports the number of births in thousands per five-year period and can be downloaded [here](http://data.un.org/Data.aspx?d=PopDiv&f=variableID:12&c=2,4,6,7&s=_crEngNameOrderBy:asc,_timeEngNameOrderBy:desc,_varEngNameOrderBy:asc&v=1).

In [187]:
# setting up basic notebook functionalities
import pandas as pd
from fractions import Fraction
from decimal import Decimal

In [188]:
# importing relevant data
worldbirths = pd.read_excel('data/WPP17_totalbirths.xlsx', sheet_name="MEDIUM VARIANT")

In [189]:
# drop the descriptive heading rows at the top of the sheet and use row 14 as the column header
header = worldbirths.iloc[14]
worldbirths = worldbirths.tail(255-13)
worldbirths = worldbirths[1:]
worldbirths.columns = header
worldbirths.head()

| | Index | Variant | Region, subregion, country or area * | Notes | Country code | 2015-2020 | 2020-2025 | 2025-2030 | 2030-2035 | 2035-2040 | ... | 2050-2055 | 2055-2060 | 2060-2065 | 2065-2070 | 2070-2075 | 2075-2080 | 2080-2085 | 2085-2090 | 2090-2095 | 2095-2100 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1 | Medium variant | WORLD | | 900 | 704458 | 701332.0 | 699681.0 | 702024.0 | 707655.0 | ... | 711742.0 | 706622.0 | 702725.0 | 699809.0 | 696550.0 | 691419.0 | 683971.0 | 674834.0 | 665139.0 | 655591.0 |
| 16 | 2 | Medium variant | More developed regions | a | 901 | 68429 | 67412.7 | 65777.3 | 64932.1 | 65451.9 | ... | 66480.9 | 65844.9 | 65310.8 | 65212.9 | 65420.3 | 65576.4 | 65365.7 | 64843.8 | 64248.4 | 63766.8 |
| 17 | 3 | Medium variant | Less developed regions | b | 902 | 636029 | 633920.0 | 633904.0 | 637092.0 | 642203.0 | ... | 645261.0 | 640777.0 | 637415.0 | 634596.0 | 631130.0 | 625843.0 | 618605.0 | 609990.0 | 600890.0 | 591824.0 |
| 18 | 4 | Medium variant | Least developed countries | c | 941 | 160742 | 170343.0 | 179798.0 | 188660.0 | 196649.0 | ... | 215402.0 | 220119.0 | 224164.0 | 227205.0 | 229237.0 | 230349.0 | 230648.0 | 230227.0 | 229140.0 | 227327.0 |
| 19 | 5 | Medium variant | Less developed regions, excluding least develo... | d | 934 | 475287 | 463577.0 | 454106.0 | 448432.0 | 445554.0 | ... | 429859.0 | 420659.0 | 413251.0 | 407391.0 | 401894.0 | 395494.0 | 387957.0 | 379763.0 | 371750.0 | 364497.0 |


In [190]:
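# keep only individual countries; the aggregate rows (world, regions, subregions) use codes of 900 and above
percountry = worldbirths[worldbirths['Country code'] < 900]
# .copy() makes this an independent table, so later column assignments don't trigger SettingWithCopyWarning
percountry = percountry[['Region, subregion, country or area *', '2015-2020']].copy()
percountry.head()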

| | Region, subregion, country or area * | 2015-2020 |
|---|---|---|
| 29 | Burundi | 2297.63 |
| 30 | Comoros | 132.115 |
| 31 | Djibouti | 108.394 |
| 32 | Eritrea | 803.674 |
| 33 | Ethiopia | 16476.6 |


In [191]:
# convert the five-year total (reported in thousands) into average annual births
percountry['2015-2020'] = percountry['2015-2020']/5 * 1000
percountry.head()

| | Region, subregion, country or area * | 2015-2020 |
|---|---|---|
| 29 | Burundi | 459527.0 |
| 30 | Comoros | 26423.0 |
| 31 | Djibouti | 21678.8 |
| 32 | Eritrea | 160735.0 |
| 33 | Ethiopia | 3295320.0 |


In [192]:
# renaming columns of the 'percountry' table
percountry.columns = ['Country', 'Annual Births']
percountry.head()

| | Country | Annual Births |
|---|---|---|
| 29 | Burundi | 459527.0 |
| 30 | Comoros | 26423.0 |
| 31 | Djibouti | 21678.8 |
| 32 | Eritrea | 160735.0 |
| 33 | Ethiopia | 3295320.0 |


In [193]:
# defining the total amount of births per year
totalbirths = percountry['Annual Births'].sum()

In [194]:
totalbirths

140877657.60000008
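In symbols, with $B_c$ the 2015-2020 births reported for country $c$ (in thousands, summed over five years), the surrounding cells compute the following. The sum runs over the countries kept in `percountry`, which gives $T \approx 140.9$ million births per year. Note that the probability of a birth falling in country $c$ is the percentage divided by 100:

```latex
A_c = \frac{1000\,B_c}{5}, \qquad
T = \sum_{k} A_k, \qquad
\mathrm{Percent}_c = 100 \cdot \frac{A_c}{T}, \qquad
P(\text{born in } c) = \frac{A_c}{T} = \frac{\mathrm{Percent}_c}{100}
```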

In [195]:
# adding two empty columns to place data in
percountry['Percent'] = [[]] *len(percountry)
percountry['Fraction']= [[]]* len(percountry)

In [196]:
# compute each country's share of world births (as a percentage) and check the resulting column types
percountry['Percent'] = percountry['Annual Births'] / totalbirths *100
percountry.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 201 entries, 29 to 255
Data columns (total 4 columns):
Country          201 non-null object
Annual Births    201 non-null object
Percent          201 non-null object
Fraction         201 non-null object
dtypes: object(4)
memory usage: 7.9+ KB
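`info()` reports every column as `object`, which is why the next few cells struggle with numerical operations. The sheet's text heading rows were most likely read into the same columns as the numbers, and slicing those rows off afterwards does not change a column's dtype. Here is a minimal sketch of converting once, up front, with `pd.to_numeric`. It uses a small hypothetical stand-in for `percountry`:

```python
import pandas as pd

# hypothetical stand-in for `percountry`: numbers stored in an object column,
# as happens when text heading rows shared the column with the data
df = pd.DataFrame({
    "Country": ["Burundi", "Comoros"],
    "Annual Births": pd.Series([459527.0, 26423.0], dtype=object),
})
print(df.dtypes["Annual Births"])  # object

# convert once; errors="raise" surfaces any stray text instead of silently hiding it
df["Annual Births"] = pd.to_numeric(df["Annual Births"], errors="raise")

# arithmetic on the converted column now yields a float64 column directly
df["Percent"] = df["Annual Births"] / df["Annual Births"].sum() * 100
print(df.dtypes)
```

Applied to this notebook, the equivalent step would be `percountry['Annual Births'] = pd.to_numeric(percountry['Annual Births'])` right after the unit conversion. The `Percent` column computed from it would then already be numeric.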


In [197]:
# computing the fraction kept raising errors because the column held Python objects rather than numbers; convert it to float so numerical operations work
percountry['Percent']=percountry.Percent.astype(float)

In [198]:
# another attempt to make the column do what I need it to (this one also seems to fail, although it doesn't return an error)
percountry['Fraction']= pd.to_numeric(percountry['Percent'], errors='coerce')

In [199]:
# finally figured out what I was doing wrong: copy the (now numeric) Percent values into the Fraction column
percountry['Fraction'] = pd.Series(percountry['Percent']).infer_objects()

In [200]:
# round each percentage to two decimals, divide by 1000, and simplify the result into a fraction
data = percountry['Fraction']
percountry['Fraction'] = [Fraction(round(x,2)/1000).limit_denominator() for x in data]

In [201]:
percountry.head()

| | Country | Annual Births | Percent | Fraction |
|---|---|---|---|---|
| 29 | Burundi | 459527.0 | 0.326189 | 33/100000 |
| 30 | Comoros | 26423.0 | 0.018756 | 1/50000 |
| 31 | Djibouti | 21678.8 | 0.015388 | 1/50000 |
| 32 | Eritrea | 160735.0 | 0.114095 | 11/100000 |
| 33 | Ethiopia | 3295320.0 | 2.339137 | 117/50000 |
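`Percent` is on a 0 to 100 scale, so the probability of a given birth falling in a country is `Percent / 100`. Dividing by 1000, as the cell above does, gives a value ten times smaller than that probability. Burundi's 0.326189% is about 33 in 10,000 births, not 33 in 100,000. The sketch below shows the conversion using the Burundi and Ethiopia percentages from the table. The helper name `percent_to_fraction` is hypothetical:

```python
from fractions import Fraction

def percent_to_fraction(percent, digits=2):
    """Turn a 0-100 percentage into a simplified probability fraction.

    The percentage is rounded to `digits` decimal places first, mirroring
    round(x, 2) in the notebook, then divided by 100 to get a probability.
    """
    return Fraction(round(percent, digits) / 100).limit_denominator()

print(percent_to_fraction(0.326189))  # Burundi: 33/10000
print(percent_to_fraction(2.339137))  # Ethiopia: 117/5000
```

`limit_denominator()` (default maximum denominator 1,000,000) snaps the binary float back to the nearby simple fraction.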


## Country Classification
This second dataset is taken from the Statistical Annex of the United Nations *World Economic Situation and Prospects 2018* report (pages 139-147). Specifically, the list of developed economies comes from Table A, "Developed economies" (page 141); the list of economies in transition comes from Table B, "Economies in transition" (page 141); and the list of developing economies comes from Table C, "Developing economies by region" (page 142). The original report from which these lists were transcribed can be found [here](https://www.un.org/development/desa/dpad/wp-content/uploads/sites/45/publication/WESP182018_Full_Web-1.pdf). The Excel file created from these tables can be downloaded [here](data/WESP18_countryclassification.xlsx).

Note that some country names have been modified to match the formatting of the United Nations Population Division of the Department of Economic and Social Affairs *World Population Prospects: The 2017 Revision*.

In [202]:
# importing relevant data
countryclass = pd.read_excel('data/WESP18_countryclassification.xlsx', sheet_name="2018")
countryclass = countryclass[countryclass['Development Code'] > 0]

In [203]:
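# preview of the major developed economies (status code 1200); `majordeveloped` is created in cell In [207], so that cell must run first
majordeveloped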

| | Country | Annual Births | Percent | Fraction | Status |
|---|---|---|---|---|---|
| 62 | Japan | 1027890.0 | 0.729636 | 73/100000 | 1200.0 |
| 128 | United Kingdom | 805698.0 | 0.571913 | 57/100000 | 1200.0 |
| 133 | Italy | 484046.0 | 0.343593 | 17/50000 | 1200.0 |
| 143 | France | 760039.0 | 0.539503 | 27/50000 | 1200.0 |
| 144 | Germany | 728033.0 | 0.516784 | 13/25000 | 1200.0 |
| 186 | Canada | 387942.0 | 0.275375 | 7/25000 | 1200.0 |
| 187 | United States of America | 4120700.0 | 2.925017 | 293/100000 | 1200.0 |


In [204]:
# rename the columns of the 'countryclass' table.
countryclass.columns = ['Country', 'Status']

In [205]:
# attempt to combine the development status table and the births per country table (this didn't do what I was wanting it to do, so I had to try a different way of doing it)
percountry.join(majordeveloped,on='Country',how='left', rsuffix='_right')

| | Country | Annual Births | Percent | Fraction | Country_right | Annual Births_right | Percent_right | Fraction_right | Status |
|---|---|---|---|---|---|---|---|---|---|
| 29 | Burundi | 459527 | 0.326189 | 33/100000 | | | | | |
| 30 | Comoros | 26423 | 0.018756 | 1/50000 | | | | | |
| 31 | Djibouti | 21678.8 | 0.015388 | 1/50000 | | | | | |
| 32 | Eritrea | 160735 | 0.114095 | 11/100000 | | | | | |
| 33 | Ethiopia | 3.29532e+06 | 2.339137 | 117/50000 | | | | | |
| 34 | Kenya | 1.54722e+06 | 1.098275 | 11/10000 | | | | | |
| 35 | Madagascar | 852114 | 0.604861 | 3/5000 | | | | | |
| 36 | Malawi | 688217 | 0.488521 | 49/100000 | | | | | |
| 37 | Mauritius | 13386.2 | 0.009502 | 1/100000 | | | | | |
| 38 | Mayotte | 7248.6 | 0.005145 | 1/100000 | | | | | |
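The `join` above comes back empty because `DataFrame.join(other, on='Country')` matches the caller's `Country` column against the *index* of `other`. Here `majordeveloped` is indexed by row number rather than by country name. There are two ways to line the tables up, both sketched below on small hypothetical frames. The first sets the country as the index before joining. The second uses `pd.merge` with `indicator=True`, which also flags rows that found no partner. That is handy for spotting country names spelled differently in the two UN sources. The status table deliberately misspells the US to show this:

```python
import pandas as pd

# hypothetical miniature tables (birth figures taken from the tables above)
births = pd.DataFrame({
    "Country": ["Japan", "Burundi", "United States of America"],
    "Annual Births": [1027890.0, 459527.0, 4120700.0],
})
status = pd.DataFrame({
    "Country": ["Japan", "Burundi", "United States"],  # US spelled differently on purpose
    "Status": [1200, 1203, 1200],
})

# option 1: join() looks up `on` values in the other frame's index,
# so move the country names into that index first
joined = births.join(status.set_index("Country"), on="Country", how="left")

# option 2: merge on the shared column; indicator=True adds a `_merge`
# column whose values are "both", "left_only" or "right_only"
merged = pd.merge(births, status, on="Country", how="outer", indicator=True)
unmatched = merged.loc[merged["_merge"] != "both", "Country"].tolist()

print(joined)
print(unmatched)  # the two spellings of the US, each without a partner
```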


In [206]:
# combine the development status table and the births per country table
ratestatus = pd.merge(percountry,countryclass, on='Country', how='outer')

In [207]:
# likelihood of being born into one of the major developed economies (status code 1200)
majordeveloped = ratestatus[ratestatus['Status'] == 1200]
majorchance = (majordeveloped['Annual Births'].sum()) / totalbirths *100

In [208]:
# likelihood of being born into any developed economy (major developed, code 1200, plus other developed, code 1201)
developed = ratestatus[ratestatus['Status'] == 1201]
develchance = (majordeveloped['Annual Births'].sum() + developed['Annual Births'].sum()) / totalbirths *100
develchance

7.885286133547973

In [209]:
# likelihood of being born into a transitioning country
transition = ratestatus[ratestatus['Status'] == 1202]
transchance = (transition['Annual Births'].sum()) / totalbirths *100
transchance

3.1160445700085213

In [210]:
# likelihood of being born into a developing country
developing = ratestatus[ratestatus['Status'] == 1203]
developingchance = (developing['Annual Births'].sum()) / totalbirths *100
developingchance

88.22891033077478
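The three percentages printed above add up to about 99.23% (7.885 + 3.116 + 88.229). Roughly 0.77% of births therefore belong to countries that received no status in the merge. Those countries are either missing from the classification tables or spelled differently there. The sketch below computes every group's share in one pass with `groupby` and reports the unclassified remainder. The code-to-label mapping follows the comments in the cells above, and the label text, `status_shares`, and the toy frame are all hypothetical:

```python
import pandas as pd

# hypothetical labels for the status codes, following the cells above
STATUS_LABELS = {
    1200: "Major Developed",
    1201: "Other Developed",
    1202: "Transitioning",
    1203: "Developing",
}

def status_shares(ratestatus, totalbirths):
    """Percent of world births per status group, plus an 'Unclassified' group."""
    # dict.get handles float codes (1200.0 == 1200) and sends NaN to the default
    labels = ratestatus["Status"].map(lambda s: STATUS_LABELS.get(s, "Unclassified"))
    births = pd.to_numeric(ratestatus["Annual Births"], errors="coerce")
    return births.groupby(labels).sum() / totalbirths * 100

# hypothetical toy data: country D has no status, like an unmatched name
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Annual Births": [10.0, 20.0, 60.0, 10.0],
    "Status": [1200, 1201, 1203, None],
})
shares = status_shares(toy, toy["Annual Births"].sum())
print(shares)
```

On the notebook's data this would be `status_shares(ratestatus, totalbirths)`. The original `develchance` corresponds to the sum of the two developed groups.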

In [211]:
# build lookup dictionaries keyed by country name (values: annual births, percent, fraction)
annualbirth = dict(zip(percountry['Country'], percountry['Annual Births']))
perbirth = dict(zip(percountry['Country'], percountry['Percent']))
fracbirth = dict(zip(percountry['Country'], percountry['Fraction']))

In [212]:
chance = { 'Major Developed': majorchance, 'Developed': develchance, 'Transitioning': transchance, 'Developing': developingchance}

In [213]:
# writing dictionaries to json file
import json
data = [annualbirth]
with open('json/annualbirth.json', 'w') as txtfile:
    json.dump(data, txtfile, sort_keys=True, indent = 4)

In [214]:
# writing dictionaries to json file
import json
data = [perbirth]
with open('json/countrylottery.json', 'w') as txtfile:
    json.dump(data, txtfile, sort_keys=True, indent = 4)

In [215]:
# writing dictionaries to json file
import json
data = [chance]
with open('json/worldlottery.json', 'w') as txtfile:
    json.dump(data, txtfile, sort_keys=True, indent = 4)
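The three cells above repeat one pattern. The sketch below wraps it in a helper that writes any dictionary in the same wrapped-in-a-list layout. It also creates the target folder (such as `json/`) if it does not exist yet. `write_json` is a hypothetical name:

```python
import json
import os
import tempfile

def write_json(obj, path):
    """Write `obj` wrapped in a one-element list, as the cells above do."""
    folder = os.path.dirname(path)
    if folder:
        # create the destination folder if it is missing
        os.makedirs(folder, exist_ok=True)
    with open(path, "w") as f:
        json.dump([obj], f, sort_keys=True, indent=4)

# usage: write a small hypothetical dictionary into a temporary json/ folder
out_path = os.path.join(tempfile.mkdtemp(), "json", "example.json")
write_json({"Developing": 88.2, "Transitioning": 3.1}, out_path)

with open(out_path) as f:
    print(json.load(f))
```

In the notebook, the three cells would become `write_json(annualbirth, 'json/annualbirth.json')`, `write_json(perbirth, 'json/countrylottery.json')` and `write_json(chance, 'json/worldlottery.json')`.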

## American Classification
This next portion will attempt to do two things. First, it will look at how likely you are to be born in the US, then into a family above the poverty line, and then to go to college. Second, it will transition into an analysis of the wealth distribution within America, and then zoom back out to look at the wealth distribution across the world and how the US compares.

But wealth is entirely subjective, right? So how can we compare the wealth of someone in America with that of someone in, say, the Philippines, when prices and exchange rates differ? This portion will also look at how much common goods cost and which countries spend the most money overall.

This could also be a way to transition into looking at what people worry about across the world, and what people spend their time on. And finally, that could then be a transition into the World Happiness Index that looks at various metrics to determine how 'happy' people are. 

In [216]:
percountry[percountry['Country'] =='United States of America']

| | Country | Annual Births | Percent | Fraction |
|---|---|---|---|---|
| 237 | United States of America | 4120700.0 | 2.925017 | 293/100000 |


In [217]:
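# US share of world births, scaled down by the 14% poverty rate used here
us_poverty_rate = 0.14
us_percent = percountry.loc[percountry['Country'] == 'United States of America', 'Percent'].iloc[0]
usbirth = us_percent * (1.00 - us_poverty_rate)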

In [218]:
usbirth  # percent of the world's births that occur in the US above the poverty line

2.5155144771515547
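The same figure can be read as odds. The sketch below assumes, as the cell above does, that 14% of US births fall below the poverty line, so the US percentage is scaled by 0.86. The variable names are hypothetical:

```python
us_percent = 2.925017      # US share of world births (percent), from the table above
poverty_rate = 0.14        # share assumed to be below the poverty line
above_poverty = us_percent * (1 - poverty_rate)  # percent of world births
one_in = 100 / above_poverty                     # expressed as "1 in N" births
print(round(above_poverty, 4), round(one_in, 1))
```

That works out to roughly 1 in 40 births worldwide.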

Middle income (according to the Pew Research Center): Americans whose annual household income is two-thirds to double the national median, after incomes have been adjusted for household size. In other words, middle-income Americans earn between roughly 0.67 and 2 times the median. For example, if the median income is 50k a year, middle-income Americans would be those who make between about 33.3k (50k × 2/3) and 100k (50k × 2) a year.
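The sketch below computes the band under the Pew definition quoted above, using the same hypothetical 50k median. The function name is hypothetical. It also simplifies by taking a single median as given, whereas Pew first adjusts incomes for household size:

```python
def middle_income_band(median_income):
    """Return (lower, upper) bounds of the middle-income band:
    two-thirds to double the median household income."""
    return (2 / 3 * median_income, 2 * median_income)

low, high = middle_income_band(50_000)
print(f"${low:,.0f} to ${high:,.0f}")
```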