<a href="https://colab.research.google.com/github/ruthiang/Ruth_data690/blob/main/Project/World%20Development%20Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Income Effect on Prevalence of Anemia among Women of Reproductive Age
## UMBC Masters in Data Science Project
## Ruth Iang

# Introduction:
Anemia is a condition in which the body has too few healthy red blood cells to carry enough oxygen to meet its needs. It affects approximately 1.76 billion people around the world, which makes it a major public health problem. It can also lead to adverse birth outcomes and impaired cognitive development in infants and young children. The prevalence of anemia differs by geographic region, and it is known to be highest among women of reproductive age (WRA). In this article, we look at the relationship between income class and anemia among women of reproductive age, since geographic region is often tied to GDP and socio-economic class. We also look at proximal factors (social class, contraception use, and access to healthcare) and an intermediate factor (educational attainment) associated with anemia in women of reproductive age.

## Approach:
### **Source of data & graphs**: THE WORLD DEVELOPMENT EXPLORER
### **Countries/Class Compared**: Comparing income groups across 200 countries around the world
### **Timeline**: 2014 - 2019
### **Topics & Indicators**:
Economy and Growth:
 - GDP per capita, PPP (current international $) - This is gross domestic product (GDP) converted to international dollars using purchasing power parity rates. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current international dollars.

Health :
 - Prevalence of anemia among women of reproductive age (% of women ages 15-49) - This refers to the combined prevalence of non-pregnant women with hemoglobin levels below 12 g/dL and pregnant women with hemoglobin levels below 11 g/dL.
 - Condom use, population ages 15-24, female (% of females ages 15-24) - This is the percentage of females ages 15-24 who use condoms during sexual activity.
 - Problems in accessing health care (any of the specified problems) (% of women): Q1 (lowest) - This refers to the percentage of women in the lowest wealth quintile (Q1) who report problems accessing health care for any of the following reasons: knowing where to go for treatment, getting permission to go for treatment, money for treatment, distance to a health facility, transportation, not wanting to go alone, and concern that there may be no female provider.

Education:
 - Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)- the percentage of female population ages 25 and over that attained or completed Bachelor's degree or equivalent.
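
For reference, the indicator codes that appear in the `indicator` column of the dataset can be kept in a small lookup table. This is only a sketch: the names `INDICATORS` and `describe` are hypothetical helpers, not part of the original notebook. `SP.POP.TOTL` (total population) is not one of the topics listed above, but it appears in the data and is used later to size the bubbles.

```python
# Hypothetical lookup table (not in the original notebook) that maps the
# World Bank indicator codes used in this project to readable names.
INDICATORS = {
    "NY.GDP.PCAP.PP.CD": "GDP per capita, PPP (current international $)",
    "SH.ANM.ALLW.ZS": "Prevalence of anemia among women of reproductive age (% of women ages 15-49)",
    "SH.CON.1524.FE.ZS": "Condom use, population ages 15-24, female (% of females ages 15-24)",
    "SH.ACS.PROB.Q1.ZS": "Problems in accessing health care (any of the specified problems) (% of women): Q1 (lowest)",
    "SE.TER.CUAT.BA.FE.ZS": "Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)",
    "SP.POP.TOTL": "Population, total",
}


def describe(code: str) -> str:
    """Return the readable name for an indicator code, or the code itself if unknown."""
    return INDICATORS.get(code, code)


print(describe("SH.ANM.ALLW.ZS"))
```

A table like this could also be reused for axis labels instead of retyping the long indicator names in each chart.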

In [None]:
import pandas as pd
import plotly
import plotly.express as px

data_url = "https://raw.githubusercontent.com/ruthiang/Ruth_data690/main/wdi_data%20(8).csv"

In [None]:
df = pd.read_csv(data_url)
df.head()

Unnamed: 0.1,Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
0,0,2014,31.1,SH.CON.1524.FE.ZS,BEN,Benin,Sub-Saharan Africa,Lower middle income,IDA
1,1,2014,12.3,SH.CON.1524.FE.ZS,COD,"Congo, Dem. Rep.",Sub-Saharan Africa,Low income,IDA
2,2,2014,44.0,SH.CON.1524.FE.ZS,DOM,Dominican Republic,Latin America & Caribbean,Upper middle income,IBRD
3,3,2014,29.0,SH.CON.1524.FE.ZS,SLV,El Salvador,Latin America & Caribbean,Lower middle income,IBRD
4,4,2014,42.7,SH.CON.1524.FE.ZS,GNB,Guinea-Bissau,Sub-Saharan Africa,Low income,IDA


### Time series of anemia prevalence by Income Group from 2014-2019

In [None]:
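# Keep only the anemia indicator rows.
df_anemia = df.query("indicator == 'SH.ANM.ALLW.ZS'")
# Keep the listed income groups; .copy() avoids modifying a view of df below.
df_anemia = df_anemia[df_anemia["Income Group"].isin(["High income","Low income", "Upper middle income", "Lower middle income","Not classified"])].copy()
# Treat Year as a label so it is plotted as discrete ticks.
df_anemia["Year"] = df_anemia["Year"].astype("str")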
df_anemia.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1140 entries, 10 to 1149
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0    1140 non-null   int64  
 1   Year          1140 non-null   object 
 2   value         1140 non-null   float64
 3   indicator     1140 non-null   object 
 4   Country Code  1140 non-null   object 
 5   Country Name  1140 non-null   object 
 6   Region        1140 non-null   object 
 7   Income Group  1140 non-null   object 
 8   Lending Type  1140 non-null   object 
dtypes: float64(1), int64(1), object(7)
memory usage: 89.1+ KB


In [None]:
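# numeric_only=True averages only the numeric columns (required in recent pandas versions).
df_years = df_anemia.groupby(['Year','indicator','Income Group']).mean(numeric_only=True).reset_index().sort_values(by="value", ascending=False)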
df_years = df_years.sort_values("Year")
df_years.head()

Unnamed: 0.1,Year,indicator,Income Group,Unnamed: 0,value
0,2014,SH.ANM.ALLW.ZS,High income,575.071429,15.5625
1,2014,SH.ANM.ALLW.ZS,Low income,602.666667,41.803704
4,2014,SH.ANM.ALLW.ZS,Upper middle income,536.384615,24.092308
2,2014,SH.ANM.ALLW.ZS,Lower middle income,595.222222,33.190741
3,2014,SH.ANM.ALLW.ZS,Not classified,1120.0,21.5


In [None]:
fig = px.line(
    df_years,
    x="Year",
    labels={"value":"Prevalence of anemia among women of reproductive age (% of women ages 15-49)"},
    y="value",
    color="Income Group",
    hover_name = "Income Group",
    title = "The Relationship between Socio-economic Class and Prevalence of Anemia among Women of Reproductive Age from 2014-2019",
    height=800,
    width=1200,
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(showlegend=True)
fig.show()

- As the time series plot above shows, the high income group has the lowest prevalence of anemia in WRA and the low income group has the highest. We can also see that the prevalence of anemia increases over the years shown.
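
To put a number on the trend, the change in each income group's mean prevalence between the first and last year can be computed by reshaping the grouped data. The sketch below uses made-up values and a hypothetical frame `df_years_demo` with the same columns as `df_years`; it is not the project's actual result.

```python
import pandas as pd

# Made-up group means for illustration only; these are NOT the project's results.
df_years_demo = pd.DataFrame({
    "Year": ["2014", "2019", "2014", "2019"],
    "Income Group": ["High income", "High income", "Low income", "Low income"],
    "value": [15.0, 16.0, 40.0, 42.5],
})

# One row per income group, one column per year.
wide = df_years_demo.pivot(index="Income Group", columns="Year", values="value")

# Change in prevalence, in percentage points, from the first to the last year.
wide["change_pp"] = wide["2019"] - wide["2014"]
print(wide)
```

Applied to the real `df_years`, a positive `change_pp` for a group would support the reading that prevalence rose over the period for that group.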

## 2014 Bar chart

In [None]:
df_2014_anemia = df.query("Year == 2014").query("indicator == 'SH.ANM.ALLW.ZS'")
df_2014_anemia.sample()

Unnamed: 0.1,Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
232,232,2014,32.8,SH.ANM.ALLW.ZS,COM,Comoros,Sub-Saharan Africa,Lower middle income,IDA


In [None]:
df_2014_anemia_sort = df_2014_anemia.sort_values(by= "value", ascending=False)

In [None]:
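# Mean anemia prevalence per income group; numeric_only=True is required in recent pandas versions.
df_gdp_mean = df_2014_anemia_sort.groupby("Income Group").mean(numeric_only=True).reset_index().sort_values(by="value", ascending=False)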

In [None]:
fig=px.bar(
    df_gdp_mean,
    x="Income Group",
    labels={"value":"Prevalence of anemia among women of reproductive age(% of women ages 15-49)(group mean)", "Income Group":"Income Group"},
    y="value",
    color="Income Group",
    title = "Comparison of Social Class for Frequency of Anemia",
    height=800,
    width=1000,
    template=list(plotly.io.templates.keys())[5]
)

fig.update_layout(title_text='Comparison of Social Class for Frequency of Anemia', title_x=0.5)
fig.update_layout(showlegend=True)
fig.show()

- Based on the bar graph above, the low income group has the highest prevalence of anemia, followed by lower middle income, upper middle income, not classified, and high income. This shows that women of reproductive age in poorer countries are more likely to have anemia, and those in wealthier countries are less likely to have it.
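
The size of the gap between groups can be read off the 2014 group means printed earlier in this notebook. The sketch below is a small worked calculation, not part of the original analysis; the dictionary name `anemia_2014` is hypothetical, and the values are copied from the grouped output.

```python
# 2014 group means of anemia prevalence (%), copied from the grouped output in this notebook.
anemia_2014 = {
    "Low income": 41.803704,
    "Lower middle income": 33.190741,
    "Upper middle income": 24.092308,
    "Not classified": 21.5,
    "High income": 15.5625,
}

baseline = anemia_2014["High income"]
for group, value in sorted(anemia_2014.items(), key=lambda kv: kv[1], reverse=True):
    gap = value - baseline    # difference from high income, in percentage points
    ratio = value / baseline  # multiple of the high income mean
    print(f"{group:<20} {value:6.1f}%  gap={gap:5.1f} pp  ratio={ratio:4.2f}")
```

On these figures the low income mean is roughly 26 percentage points above, and about 2.7 times, the high income mean.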

## Scatter Plot

In [None]:
df.drop(columns=["Unnamed: 0"], inplace=True)
df.tail()

Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
3933,2015,2679.507615,NY.GDP.PCAP.PP.CD,ZWE,Zimbabwe,Sub-Saharan Africa,Lower middle income,Blend
3934,2016,2806.469032,NY.GDP.PCAP.PP.CD,ZWE,Zimbabwe,Sub-Saharan Africa,Lower middle income,Blend
3935,2017,3795.642431,NY.GDP.PCAP.PP.CD,ZWE,Zimbabwe,Sub-Saharan Africa,Lower middle income,Blend
3936,2018,4017.221716,NY.GDP.PCAP.PP.CD,ZWE,Zimbabwe,Sub-Saharan Africa,Lower middle income,Blend
3937,2019,3783.547898,NY.GDP.PCAP.PP.CD,ZWE,Zimbabwe,Sub-Saharan Africa,Lower middle income,Blend


In [None]:
df_2019= df.query("Year == 2019")

In [None]:
df_pivot = df_2019.pivot_table(index=["Year", "Country Code", "Country Name", "Region", "Income Group", "Lending Type"], 
                                    columns="indicator", values="value")
df_pivot.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,indicator,NY.GDP.PCAP.PP.CD,SE.TER.CUAT.BA.FE.ZS,SH.ACS.PROB.Q1.ZS,SH.ANM.ALLW.ZS,SP.POP.TOTL
Year,Country Code,Country Name,Region,Income Group,Lending Type,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
2019,ABW,Aruba,Latin America & Caribbean,High income,Not classified,,,,,106310.0
2019,AFG,Afghanistan,South Asia,Low income,IDA,2152.366489,,,42.6,38041757.0
2019,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD,6952.419362,,,44.5,31825299.0
2019,ALB,Albania,Europe & Central Asia,Upper middle income,IBRD,14012.976435,,,24.8,2854191.0
2019,AND,Andorra,Europe & Central Asia,High income,Not classified,,,,12.1,77146.0


In [None]:
fig = px.scatter(df_pivot, 
                 x="NY.GDP.PCAP.PP.CD", 
                 y="SH.ANM.ALLW.ZS",
                 trendline="ols",
                 height = 800,
                 width = 1000,
                 title = "Association between Anemia among Women of Reproductive Age and GDP Per Capita",
                 template = list(plotly.io.templates.keys())[5],
                 labels={"NY.GDP.PCAP.PP.CD":"2019 GDP per capita, PPP (current international $)",
                         "SH.ANM.ALLW.ZS":"2019 Prevalence of anemia among women of reproductive age (% of women ages 15-49)"},
                 )

fig.show()

- The scatter plot shows a negative correlation between the prevalence of anemia among women of reproductive age and GDP per capita: the higher a country's GDP per capita, the lower its prevalence of anemia tends to be.

- The r-squared, or coefficient of determination, is 0.359, which means that 35.9% of the variation in anemia prevalence across countries is explained by the linear fit on GDP per capita.
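
For readers who want to see where an r-squared value comes from, the sketch below fits a straight line by least squares and compares the unexplained variation with the total variation. It uses NumPy rather than the plotly trendline, the function name `r_squared` is hypothetical, and the numbers are made up for illustration, so it will not reproduce 0.359.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of a least-squares straight-line fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)        # variation left unexplained by the line
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only (not the project's data).
gdp = [2000, 7000, 14000, 30000, 55000]
anemia = [42.0, 44.0, 25.0, 18.0, 12.0]
print(round(r_squared(gdp, anemia), 3))
```

When a figure is built with `trendline="ols"`, plotly express also exposes the fitted statsmodels results through `px.get_trendline_results(fig)`, which is one way to read the value directly from the chart.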


## Contraception Bubble Chart

In [None]:
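# "Unnamed: 0" was already dropped in the scatter plot section; errors="ignore" avoids a KeyError on re-run.
df.drop(columns=["Unnamed: 0"], inplace=True, errors="ignore")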
df.head()

Unnamed: 0,Year,value,indicator,Country Code,Country Name,Region,Income Group,Lending Type
0,2014,31.1,SH.CON.1524.FE.ZS,BEN,Benin,Sub-Saharan Africa,Lower middle income,IDA
1,2014,12.3,SH.CON.1524.FE.ZS,COD,"Congo, Dem. Rep.",Sub-Saharan Africa,Low income,IDA
2,2014,44.0,SH.CON.1524.FE.ZS,DOM,Dominican Republic,Latin America & Caribbean,Upper middle income,IBRD
3,2014,29.0,SH.CON.1524.FE.ZS,SLV,El Salvador,Latin America & Caribbean,Lower middle income,IBRD
4,2014,42.7,SH.CON.1524.FE.ZS,GNB,Guinea-Bissau,Sub-Saharan Africa,Low income,IDA


In [None]:
df_pivot_2014 = df.pivot_table(index=["Year", "Country Code", "Country Name", "Region", "Income Group", "Lending Type"], 
                                    columns="indicator", values="value")
df_pivot_2014.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,Unnamed: 4_level_0,indicator,NY.GDP.PCAP.PP.CD,SE.TER.CUAT.BA.FE.ZS,SH.ACS.PROB.Q1.ZS,SH.ANM.ALLW.ZS,SH.CON.1524.FE.ZS,SP.POP.TOTL
Year,Country Code,Country Name,Region,Income Group,Lending Type,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
2014,ABW,Aruba,Latin America & Caribbean,High income,Not classified,36779.429429,,,,,103776.0
2014,AFG,Afghanistan,South Asia,Low income,IDA,2069.424022,,,38.7,,33370804.0
2014,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD,8179.297828,2.00061,,45.3,,26941773.0
2014,ALB,Albania,Europe & Central Asia,Upper middle income,IBRD,11259.267514,,,22.4,,2889104.0
2014,AND,Andorra,Europe & Central Asia,High income,Not classified,,21.54171,,10.9,,79213.0


In [None]:
df_pivot_2014 = df_pivot_2014.query("Year == 2014")
df_pivot_2014.head()

indicator,Year,Country Code,Country Name,Region,Income Group,Lending Type,NY.GDP.PCAP.PP.CD,SE.TER.CUAT.BA.FE.ZS,SH.ACS.PROB.Q1.ZS,SH.ANM.ALLW.ZS,SH.CON.1524.FE.ZS,SP.POP.TOTL
0,2014,ABW,Aruba,Latin America & Caribbean,High income,Not classified,36779.429429,,,,,103776.0
1,2014,AFG,Afghanistan,South Asia,Low income,IDA,2069.424022,,,38.7,,33370804.0
2,2014,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD,8179.297828,2.00061,,45.3,,26941773.0
3,2014,ALB,Albania,Europe & Central Asia,Upper middle income,IBRD,11259.267514,,,22.4,,2889104.0
4,2014,AND,Andorra,Europe & Central Asia,High income,Not classified,,21.54171,,10.9,,79213.0


In [None]:
df_pivot_2014.reset_index(inplace=True)
df_pivot_2014.head()

indicator,index,Year,Country Code,Country Name,Region,Income Group,Lending Type,NY.GDP.PCAP.PP.CD,SE.TER.CUAT.BA.FE.ZS,SH.ACS.PROB.Q1.ZS,SH.ANM.ALLW.ZS,SH.CON.1524.FE.ZS,SP.POP.TOTL
0,0,2014,ABW,Aruba,Latin America & Caribbean,High income,Not classified,36779.429429,,,,,103776.0
1,1,2014,AFG,Afghanistan,South Asia,Low income,IDA,2069.424022,,,38.7,,33370804.0
2,2,2014,AGO,Angola,Sub-Saharan Africa,Lower middle income,IBRD,8179.297828,2.00061,,45.3,,26941773.0
3,3,2014,ALB,Albania,Europe & Central Asia,Upper middle income,IBRD,11259.267514,,,22.4,,2889104.0
4,4,2014,AND,Andorra,Europe & Central Asia,High income,Not classified,,21.54171,,10.9,,79213.0


In [None]:
df_2014_groupby = df_pivot_2014.groupby(["Income Group"]).agg(
    cont_mean  = ("SH.CON.1524.FE.ZS", "mean"),
    anemia_mean= ("SH.ANM.ALLW.ZS", "mean"),
    size = ("SP.POP.TOTL","mean"),
).reset_index()

df_2014_groupby

Unnamed: 0,Income Group,cont_mean,anemia_mean,size
0,High income,,15.5625,14645970.0
1,Low income,27.166667,41.803704,21771380.0
2,Lower middle income,30.925,33.190741,55604530.0
3,Not classified,,21.5,30042970.0
4,Upper middle income,46.65,24.092308,44068430.0


In [None]:
fig = px.scatter(df_2014_groupby,
                 x="cont_mean", 
                 y="anemia_mean",
                 size = "size",
                 size_max = 50,
                 height = 800,
                 width = 1200,
                 template = list(plotly.io.templates.keys())[5],
                 labels={"cont_mean":"2014 Condom use, population ages 15-24, female (% of females ages 15-24)(group mean)",
                         "anemia_mean":"2014 Prevalence of anemia among women of reproductive age (% of women ages 15-49)(group mean)"},
                 title="Relationship between Contraception Use and Commonness of Anemia",
                 color="Income Group")

fig.update_layout(title_text="Relationship between Contraception Use and Commonness of Anemia", title_x=0.5)
fig.show()


- Condom-use data are missing for the high income and not classified groups, so only three groups can be placed on the bubble chart. Among them, mean condom use rises with income (about 27% for low income, 31% for lower middle income, and 47% for upper middle income) while mean anemia prevalence falls (about 42%, 33%, and 24%).

## Bubble Chart for Health Access 

In [None]:
df_health_groupby = df_pivot_2014.groupby(["Income Group"]).agg(
    health_mean  = ("SH.ACS.PROB.Q1.ZS", "mean"),
    anemia_mean= ("SH.ANM.ALLW.ZS", "mean"),
    size = ("SP.POP.TOTL", "mean"),
).reset_index()

df_health_groupby

Unnamed: 0,Income Group,health_mean,anemia_mean,size
0,High income,,15.5625,14645970.0
1,Low income,82.35,41.803704,21771380.0
2,Lower middle income,70.516667,33.190741,55604530.0
3,Not classified,,21.5,30042970.0
4,Upper middle income,,24.092308,44068430.0


In [None]:
fig = px.scatter(df_health_groupby,
                 x="health_mean", 
                 y="anemia_mean",
                 size = "size",
                 size_max = 50,
                 height = 800,
                 width = 1200,
                 hover_name = "Income Group",
                 title="Relationship between Problems in Accessing Health Care and Prevalence of Anemia",
                 template = list(plotly.io.templates.keys())[5],
                 labels={"health_mean":"2014 Problems in accessing health care (any of the specified problems) (% of women): Q1 (lowest)(group mean)",
                         "anemia_mean":"2014 Prevalence of anemia among women of reproductive age (% of women ages 15-49)(group mean)"},
                 color="Income Group")

fig.update_layout(title_text="Relationship between Problems in Accessing Health Care and Prevalence of Anemia", title_x=0.5)
fig.show()

- Health care access data are available only for the low income and lower middle income groups, even when all wealth quintiles are considered. On the bubble chart, the lower middle income group has fewer problems accessing health care (about 71% versus 82% of women) and a lower prevalence of anemia (about 33% versus 42%) than the low income group.

## Bubble Chart for Educational Attainment

In [None]:
df_edu_groupby = df_pivot_2014.groupby(["Income Group"]).agg(
    edu_mean  = ("SE.TER.CUAT.BA.FE.ZS", "mean"),
    anemia_mean= ("SH.ANM.ALLW.ZS", "mean"),
    size = ("SP.POP.TOTL", "mean"),
).reset_index()

df_edu_groupby

Unnamed: 0,Income Group,edu_mean,anemia_mean,size
0,High income,24.123116,15.5625,14645970.0
1,Low income,0.914633,41.803704,21771380.0
2,Lower middle income,9.38825,33.190741,55604530.0
3,Not classified,,21.5,30042970.0
4,Upper middle income,15.49947,24.092308,44068430.0


In [None]:
fig = px.scatter(df_edu_groupby,
                 x="edu_mean", 
                 y="anemia_mean",
                 size = "size",
                 size_max = 50,
                 height = 800,
                 width = 1200,
                 title="Relationship between Educational Attainment and Frequency of Anemia",
                 hover_name = "Income Group",
                 template = list(plotly.io.templates.keys())[5],
                 labels={"edu_mean":"2014 Educational attainment, at least Bachelor's or equivalent, population 25+, female (%) (cumulative)(group mean)",
                         "anemia_mean":"2014 Prevalence of anemia among women of reproductive age (% of women ages 15-49)(group mean)"},
                 color="Income Group")

fig.update_layout(title_text="Relationship between Educational Attainment and Frequency of Anemia", title_x=0.5)
fig.show()

- Based on the bubble chart above, the group means show a strong negative association between educational attainment and the prevalence of anemia in women of reproductive age. The educational attainment indicator covers women ages 25 and over who hold at least a Bachelor's degree or equivalent. One possible explanation is that more educated women are more health conscious and better able to look after their health. These data cannot establish a causal relationship, but the association is clear.
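
The strength of this association can be checked with a Pearson correlation on the group means in the table above. This is a rough sketch, not part of the original analysis: the function `pearson_r` is a hypothetical helper written in plain Python, and the "Not classified" group is left out because its education value is missing.

```python
import math

# 2014 group means copied from the grouped table above, in the order:
# High income, Low income, Lower middle income, Upper middle income.
edu_mean = [24.123116, 0.914633, 9.38825, 15.49947]
anemia_mean = [15.5625, 41.803704, 33.190741, 24.092308]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(edu_mean, anemia_mean)
print(f"r = {r:.3f} (n = {len(edu_mean)} income groups)")
```

With only four group means, a coefficient close to -1 describes the income groups as a whole and says little about individual countries or women.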

## Conclusion:

Across the charts and graphs, we can see trends between the prevalence of anemia in women of reproductive age and the indicators examined: GDP/social class, contraception use, access to healthcare, and educational attainment. Although it is hard to establish a causal relationship for any of them, we found the following relationships:

- GDP/social class and the prevalence of anemia in women of reproductive age are strongly associated. Women in lower income groups are more likely to have anemia than those in higher income groups.
- Regarding contraceptive use, women in lower income groups are less likely to use condoms than those in higher income groups.
- There is a strong negative relationship between educational attainment and the prevalence of anemia among women of reproductive age.

Based on these results, GDP and social class appear to affect the proximal factors for anemia. Women in poorer countries are less likely to be able to afford condoms, access healthcare, and obtain higher education, which may all stem from a lack of money.
