# Final Proposal: the relationship between socio-economic factors and the age of female marriage

**Principal investigators:** Oleksandra Plyska and Alia Abboud

**Email:** ovp203@nyu.edu, aaa832@nyu.edu



This project studies how the way communities value female secondary education relates to how early they marry off their daughters. Specifically, we will examine the relationship between the share of women married early and female participation in secondary education.


The goal is not only to find the relationship between education and marriage for women, but also to examine other circumstances surrounding that relationship. Even if the first two variables are correlated, other factors may affect them as well. The data for all of these variables comes from the World Bank website.

We expect that the higher the share of females enrolled in secondary education, the lower the share of women married by age 18. This hypothesis rests on our belief that when a community endorses female education, girls have more opportunities, pursue their careers, and decide on marriage later in life. However, an article published two months ago by The Atlantic, headlined “The More Gender Equality, The Fewer Women in STEM”, reported that women in countries with more oppression or relatively less gender equality pursue careers in STEM (science, technology, engineering, and math) more often than women in countries with relatively greater gender equality. One explanation offered was that women in less equal countries feel an urge to rebel or prove themselves, so they pursue unconventional paths, whereas women in more liberal countries do not see such careers as a challenge. This perspective made us even more determined to carry out our test and see what results we get.

Initially, we plan to present our data mainly through line graphs, bar graphs and maps. The graphs give a quick view of trends. The maps let us group countries by their values and locate them geographically, so that we can identify countries or areas with common features and trends. We intend to produce graphs and charts for two years, 2006 and 2016, to observe differences and patterns over time. We expect the overall age of marriage to rise over time because of social, economic, and cultural changes around the world. It would also be interesting to see whether the relationship between the variables becomes stronger or weaker over time in different groups of countries. We plan to show this on a map where color intensity reflects the value of each variable.

We will categorize countries by their level of GDP and create graphs that look more closely at countries with high, medium and low GDP. In addition, we will create graphs that focus on the relationship among countries within a single geographic region (Latin America, the Middle East, Europe, etc.). This will let us observe variation and differences on a smaller scale.
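Countries could be split into low, medium and high GDP groups with `pd.qcut`, which divides a numeric column into groups of roughly equal size. Below is a minimal sketch with made-up values. It assumes a hypothetical `gdp_per_capita` column, because the dataset shown below contains GDP growth rather than GDP level. The sketch compares the average change in early marriage between 2006 and 2016 within each group.

```python
import pandas as pd

# Hypothetical values for six countries (illustrative only, not real data).
toy = pd.DataFrame({
    "country": ["A", "B", "C", "D", "E", "F"],
    "gdp_per_capita": [800, 1500, 4000, 9000, 25000, 40000],
    "married_by_18_2006": [50.0, 42.0, 30.0, 20.0, 5.0, 2.0],
    "married_by_18_2016": [44.0, 38.0, 25.0, 18.0, 4.0, 1.0],
})

# Split countries into three equal-sized GDP groups (terciles).
toy["gdp_group"] = pd.qcut(toy["gdp_per_capita"], q=3, labels=["low", "medium", "high"])

# Percentage-point change in early marriage from 2006 to 2016.
toy["change"] = toy["married_by_18_2016"] - toy["married_by_18_2006"]

# Average 2006 level, 2016 level and change within each GDP group.
summary = toy.groupby("gdp_group", observed=True)[
    ["married_by_18_2006", "married_by_18_2016", "change"]
].mean()
print(summary)
```

The `summary` frame can feed a bar graph directly. Fixed cut-offs (for example income thresholds) could replace terciles by using `pd.cut` with explicit bin edges.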


Ultimately, our project aims to explain the factors that influence early marriage across countries, presenting the results visually with Python data analysis tools.



# Data report

**Overview:** The data for our project comes from the World Bank website (http://www.worldbank.org).

**Important variables:** The key indicator we will pull from the World Bank is early female marriage, defined as:

Women who were first married by age 18 (% of women ages 20-24), based on Demographic and Health Surveys (DHS), Multiple Indicator Cluster Surveys (MICS), AIDS Indicator Surveys (AIS), Reproductive Health Surveys (RHS), and other household surveys.
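The denominator of this indicator is women aged 20-24 at the time of the survey, not girls under 18. The indicator therefore describes marriages that took place several years before the survey. Below is a minimal sketch of the calculation from respondent-level data. The column names and respondents are hypothetical. The sketch treats "by age 18" as "before the 18th birthday" and ignores the survey weights that DHS and MICS estimates use.

```python
import pandas as pd

# Hypothetical survey respondents (illustrative only). age_at_first_marriage
# is missing (None) for women who have never married.
respondents = pd.DataFrame({
    "age": [19, 20, 21, 22, 23, 24, 24, 30],
    "age_at_first_marriage": [17, 16, None, 19, 17, None, 22, 15],
})

# Denominator: only women aged 20-24 (inclusive) at the time of the survey.
women_20_24 = respondents[respondents["age"].between(20, 24)]

# Numerator: first marriage before age 18 (never-married women count as False).
married_by_18 = women_20_24["age_at_first_marriage"] < 18

share = 100 * married_by_18.sum() / len(women_20_24)
print(f"{share:.1f}% of women aged 20-24 were first married by age 18")
```

The 19-year-old and the 30-year-old are excluded even though both married before 18. This is why the indicator lags behind changes in current marriage practices.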

The independent variables we will be measuring include: 

Female secondary education, measured as the percentage of secondary-school pupils in each country who are female. The project also looks at how other variables relate to our two main variables, such as annual GDP growth and the female share of employment in senior and middle management.
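This measure is a share of enrollment, not an enrollment rate. It says what fraction of secondary pupils are girls, not what fraction of girls attend secondary school. A country where few children of either sex reach secondary school can therefore still score close to 50%. The arithmetic is shown below as a minimal sketch. The function name and pupil counts are hypothetical.

```python
def female_share_of_pupils(female_pupils: int, total_pupils: int) -> float:
    """Percentage of secondary-school pupils who are female (hypothetical helper)."""
    if total_pupils <= 0:
        raise ValueError("total_pupils must be positive")
    return 100 * female_pupils / total_pupils


# Hypothetical country with 1,000,000 secondary pupils, 480,000 of them girls.
print(female_share_of_pupils(480_000, 1_000_000))  # 48.0
```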

**Requisite packages:** Below we import the packages we need.

```python
import pandas as pd              # Data frames and CSV loading
import requests                  # Useful for working with the API
import numpy as np               # For performing numerical analysis
import matplotlib.pyplot as plt  # Plotting
```

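**Grabbing the data:** We load the data for all of the variables from the spreadsheet we downloaded from the World Bank website, exported as a CSV file and hosted on GitHub. The dataset is in long format: all the variables listed above appear in a single column, "Series Name", with one row per country and variable.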

```python
# CSV copy of the World Bank extract
url = "https://raw.githubusercontent.com/ovp203/My_first_repository_/master/Databootcamp_project_Oleksandra_Alia%20-%20Sheet1-2.csv"
data = pd.read_csv(url)
```

```python
data.head(10)
```

|  | Country Name | Country Code | Series Name | Series Code | 2006 [YR2006] | Year 2016 |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | .. | .. |
| 1 | Albania | ALB | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | .. | .. |
| 2 | Algeria | DZA | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | .. | .. |
| 3 | American Samoa | ASM | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | .. | .. |
| 4 | Andorra | AND | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | .. | .. |
| 5 | Angola | AGO | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | .. | .. |
| 6 | Antigua and Barbuda | ATG | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | .. | .. |
| 7 | Arab World | ARB | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | .. | .. |
| 8 | Argentina | ARG | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | 29.41 | .. |
| 9 | Armenia | ARM | Female share of employment in senior and middl... | SL.EMP.SMGT.FE.ZS | .. | .. |


```python
# Drop the series code: the series name already identifies each variable
data.drop(["Series Code"], axis=1, inplace=True)

# Give the variable and 2006 columns shorter, consistent names
data.rename(columns={"Series Name": "Variable", "2006 [YR2006]": "Year 2006"}, inplace=True)

data.head(10)
```

|  | Country Name | Country Code | Variable | Year 2006 | Year 2016 |
|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Female share of employment in senior and middl... | .. | .. |
| 1 | Albania | ALB | Female share of employment in senior and middl... | .. | .. |
| 2 | Algeria | DZA | Female share of employment in senior and middl... | .. | .. |
| 3 | American Samoa | ASM | Female share of employment in senior and middl... | .. | .. |
| 4 | Andorra | AND | Female share of employment in senior and middl... | .. | .. |
| 5 | Angola | AGO | Female share of employment in senior and middl... | .. | .. |
| 6 | Antigua and Barbuda | ATG | Female share of employment in senior and middl... | .. | .. |
| 7 | Arab World | ARB | Female share of employment in senior and middl... | .. | .. |
| 8 | Argentina | ARG | Female share of employment in senior and middl... | 29.41 | .. |
| 9 | Armenia | ARM | Female share of employment in senior and middl... | .. | .. |


```python
data.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1062 entries, 0 to 1061
Data columns (total 5 columns):
Country Name    1059 non-null object
Country Code    1057 non-null object
Variable        1057 non-null object
Year 2006       1057 non-null object
Year 2016       1057 non-null object
dtypes: object(5)
memory usage: 41.6+ KB
```
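The `data.info()` output points to two issues to fix before analysis. First, every column has dtype `object`, because the World Bank export marks missing values with `..`, so the year columns are read as text. Second, there are 1,062 entries but only 1,057 non-null country codes. This suggests a few non-data rows, possibly notes at the end of the export. Below is a minimal cleaning sketch, run on a small made-up frame with the same column names. On the real data, the same lines would apply to `data`.

```python
import numpy as np
import pandas as pd

# Small made-up frame with the same layout as `data` (illustrative only).
raw = pd.DataFrame({
    "Country Name": ["Argentina", "Venezuela, RB", "(note row)", np.nan],
    "Country Code": ["ARG", "VEN", np.nan, np.nan],
    "Variable": ["GDP growth (annual %)", "GDP growth (annual %)", np.nan, np.nan],
    "Year 2006": ["8.047152", "9.872149", np.nan, np.nan],
    "Year 2016": ["-2.24534", "..", np.nan, np.nan],
})

# Keep only rows that have a country code (drops note and empty rows).
clean = raw.dropna(subset=["Country Code"]).copy()

# Convert the year columns to floats; anything that is not a number,
# such as the ".." placeholder, becomes NaN.
for col in ["Year 2006", "Year 2016"]:
    clean[col] = pd.to_numeric(clean[col], errors="coerce")

print(clean.dtypes)
```

Alternatively, passing `na_values=".."` to `pd.read_csv` converts the placeholders when the file is loaded.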


```python
data[data.Variable == "GDP growth (annual %)"].head(10)
```

|  | Country Name | Country Code | Variable | Year 2006 | Year 2016 |
|---|---|---|---|---|---|
| 264 | Afghanistan | AFG | GDP growth (annual %) | 5.554138 | 2.366712 |
| 265 | Albania | ALB | GDP growth (annual %) | 5.431013 | 3.369989 |
| 266 | Algeria | DZA | GDP growth (annual %) | 1.684488 | 3.3 |
| 267 | American Samoa | ASM | GDP growth (annual %) | -4.16667 | -2.61941 |
| 268 | Andorra | AND | GDP growth (annual %) | 4.536353 | 1.232243 |
| 269 | Angola | AGO | GDP growth (annual %) | 20.73512 | -0.66535 |
| 270 | Antigua and Barbuda | ATG | GDP growth (annual %) | 12.72851 | 5.342479 |
| 271 | Arab World | ARB | GDP growth (annual %) | 6.495336 | 3.221987 |
| 272 | Argentina | ARG | GDP growth (annual %) | 8.047152 | -2.24534 |
| 273 | Armenia | ARM | GDP growth (annual %) | 13.198 | 0.2 |


```python
data[data.Variable == "GDP growth (annual %)"].tail(10)
```

|  | Country Name | Country Code | Variable | Year 2006 | Year 2016 |
|---|---|---|---|---|---|
| 518 | Uzbekistan | UZB | GDP growth (annual %) | 7.3 | 7.8 |
| 519 | Vanuatu | VUT | GDP growth (annual %) | 8.46516 | 4.000574 |
| 520 | Venezuela, RB | VEN | GDP growth (annual %) | 9.872149 | .. |
| 521 | Vietnam | VNM | GDP growth (annual %) | 6.977955 | 6.210812 |
| 522 | Virgin Islands (U.S.) | VIR | GDP growth (annual %) | 3.625816 | .. |
| 523 | West Bank and Gaza | PSE | GDP growth (annual %) | -3.9006 | 4.115658 |
| 524 | World | WLD | GDP growth (annual %) | 4.316059 | 2.487061 |
| 525 | Yemen, Rep. | YEM | GDP growth (annual %) | 3.170409 | -9.77917 |
| 526 | Zambia | ZMB | GDP growth (annual %) | 7.903694 | 3.609742 |
| 527 | Zimbabwe | ZWE | GDP growth (annual %) | -3.4615 | 0.615714 |
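The data is in long format, with one row per country and variable. To relate two variables, it has to be reshaped so that each variable becomes its own column. The tables above also mix aggregates such as "Arab World" and "World" with individual countries, and aggregates would distort a cross-country comparison. Below is a minimal sketch using `DataFrame.pivot`. It assumes the year columns already hold numbers. The countries, the second variable and the set of aggregate codes are hypothetical, and the code set is deliberately incomplete.

```python
import pandas as pd

# Made-up long-format rows with the same columns as `data` (illustrative only).
long_df = pd.DataFrame({
    "Country Name": ["Country A", "Country A", "Country B", "Country B", "World", "World"],
    "Country Code": ["AAA", "AAA", "BBB", "BBB", "WLD", "WLD"],
    "Variable": ["GDP growth (annual %)", "Early marriage (hypothetical)"] * 3,
    "Year 2016": [2.5, 30.0, 4.0, 12.0, 2.4, 21.0],
})

# Hypothetical, incomplete set of aggregate codes to exclude.
AGGREGATE_CODES = {"ARB", "WLD"}
countries = long_df[~long_df["Country Code"].isin(AGGREGATE_CODES)]

# One row per country, one column per variable, values from 2016.
wide = countries.pivot(index="Country Code", columns="Variable", values="Year 2016")
print(wide)
```

`pivot` raises an error if a country and variable pair appears more than once. In that case, `pivot_table` with an aggregation function is the usual fallback. Once the data is in wide form, a pair of columns can be passed to `corr` or plotted against each other.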
