# Hi! Welcome to my Project.

I'm using the Human Freedom Index 2019 data set to ask two simple questions: <br>

**1. Is there a correlation between Women's Safety & Security and the Economic Freedom of a country as a whole?** <br>
I believe that the security and safety of women is tied to the economic freedom of a country. Here, the security and safety of women is composed of two categories: female genital mutilation and inheritance rights, both of which are basic yet telling metrics of the situation of women in any given country.


**2. Is there a correlation between the Freedom of Women's Movement and Economic Freedom?** <br>
On a similar note to my previous research question, I believe that a country which allows greater freedom of movement for women, along with the attitude and culture required to support this freedom, would be economically stronger, thanks to the contribution of women. This is why I would like to examine the correlation between the freedom of women’s movement and economic freedom as a whole.

## The Variables

Women's Safety & Security
> This metric is further derived from scores for Female Genital Mutilation and Women’s Inheritance Rights. <br>
> The variable is *‘pf_ss_women’*. <br>
> Here, 10 refers to absolute safety and security of women, and 0 refers to none.

Freedom of Women’s Movement 
> This metric studies whether women and men have the same legal rights to apply for national identity cards, to apply for passports, and to travel outside the country. <br>
> The variable is *‘pf_movement_women’*. <br>
> Here, 10 refers to absolute freedom of women's movement, and 0 refers to none.

Economic Freedom is composed of individual scores for
> Size of Government <br>
> Legal System and Property Rights <br>
> Sound Money <br>
> Freedom to Trade Internationally <br>
> Regulation. <br>
> The variable is *‘ef_score’*. <br>
> Here, 10 refers to absolute economic freedom, and 0 refers to none.

## Literature Review

**Search Terms** <br>
I started by looking for just ‘women’ and ‘economy’, because that is the platform that both of my research questions build on. I extended these terms with ‘GDP’, ‘Safety’, ‘Sexism’ and ‘Growth’.

**References** <br>
Firstly, this S&P Global Paper on Women and GDP Growth in the US.

[The Key to Unlocking U.S. GDP Growth? Women](https://www.spglobal.com/_Media/Documents/03651.00_Women_at_Work_Doc.8.5x11-R4.pdf) by Beth Ann Bovino and Jason Gold.

Second, this IMF Working Paper on Women and Future Growth.

[Women Are Key for Future Growth: Evidence from Canada](https://www.imf.org/~/media/Files/Publications/WP/2017/wp17166.ashx) by Bengt Petersson, Rodrigo Mariscal, and Kotaro Ishi.

**Summary of Findings** <br>
The S&P Global paper and the IMF Working Paper have similar findings. Both consider variables that capture women’s quantitative contribution to the economy, and both take a qualitative look at the attitudes and culture surrounding women at work. Both highlight the importance of recognizing and empowering women to contribute economically, and the far-reaching benefits this leads to, which include economic, social, personal and cultural development. This literature review guides me to consider my own research questions with a highly analytical eye, and suggests that my hypotheses point in the right direction.

Let's get started!

# Importing Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
print('Libraries ready')

# Getting Data

In [None]:
mydata = pd.read_csv(r'../input/the-human-freedom-index/hfi_cc_2019.csv')
mydata.head()

We do not need 120 columns for this project. So, let's simplify things.

In [None]:
rawdata = mydata[['year', 'ISO_code', 'countries', 'region', 'pf_ss_women_fgm', 
                  'pf_ss_women_inheritance', 'pf_ss_women', 'pf_movement_women', 'ef_money_growth', 
                  'ef_government', 'ef_legal', 'ef_money', 'ef_trade', 'ef_regulation', 'ef_score']].copy()
rawdata.head()

# Exploring and Managing Data

In [None]:
rawdata.info()
rawdata.describe()

While it says 1620 non-null values, I know some of those are '-' placeholders standing in for missing data.  <br>
Plus, all those object columns have to be converted to numeric.

In [None]:
# Convert every score column to numeric; '-' placeholders become NaN.
score_columns = ['pf_ss_women_fgm', 'pf_ss_women_inheritance', 'pf_ss_women', 'pf_movement_women',
                 'ef_money_growth', 'ef_government', 'ef_legal', 'ef_money', 'ef_trade',
                 'ef_regulation', 'ef_score']
for column in score_columns:
    rawdata[column] = pd.to_numeric(rawdata[column], errors = 'coerce')
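
To see what the coercion does to a '-' placeholder, here is a minimal sketch on a toy column. The values and the name `toy_score` are made up for illustration and do not come from the data set.

```python
import pandas as pd

# Toy column mimicking scores stored as strings, with '-' marking a gap.
# These values are invented for illustration only.
raw_scores = pd.Series(['7.5', '-', '10', '2.5', '-'], name='toy_score')

# Count the placeholders before converting, so the loss is visible.
placeholder_count = int((raw_scores == '-').sum())

# errors='coerce' turns anything that cannot be parsed (here, '-') into NaN.
numeric_scores = pd.to_numeric(raw_scores, errors='coerce')
missing_count = int(numeric_scores.isna().sum())

print(placeholder_count, missing_count)
print(numeric_scores.dtype)
```

Counting the placeholders first is a cheap way to confirm that every NaN after the conversion was a '-' beforehand, rather than some other unparseable string.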

In [None]:
# Let's look at the data one more time
rawdata.info()

Time to deal with missing values.

In [None]:
rawdata = rawdata.dropna()
rawdata.info()
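
A plain `dropna()` removes a row if any of its columns is missing, including columns the correlations never use. As a hedged alternative (not what this notebook does), the sketch below restricts the check to the three analysis variables with `subset`. The toy rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows: one gap sits in a column the correlations never use.
toy = pd.DataFrame({
    'pf_ss_women': [10.0, 7.5, np.nan, 5.0],
    'pf_movement_women': [10.0, 5.0, 10.0, 0.0],
    'ef_score': [7.9, 6.4, 6.1, 5.2],
    'ef_money_growth': [9.1, np.nan, 8.7, 7.3],  # not used in the correlations
})

analysis_columns = ['pf_ss_women', 'pf_movement_women', 'ef_score']

# Drops every row with a gap anywhere (rows 1 and 2 here).
dropped_any = toy.dropna()

# Drops only rows with a gap in the analysis columns (row 2 here).
dropped_subset = toy.dropna(subset=analysis_columns)

print(len(toy), len(dropped_any), len(dropped_subset))
```

Whether the extra rows matter depends on how many gaps sit in the unused columns, which comparing the two lengths on the real data would show.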

# Visualizations
***1. Univariate Graphs.*** <br> <br>
**Part One.** <br>
First and foremost, the three basic univariate graphs for my chosen variables.

In [None]:
rawdata.hist(column = 'pf_ss_women')
plt.title('Frequency Distribution for Safety & Security')

rawdata.hist(column = 'pf_movement_women')
plt.title('Frequency Distribution for Freedom of Movement of Women')

rawdata.hist(column = 'ef_score')
plt.title('Frequency Distribution for Economic Freedom')

**Part Two.** <br>
Secondly, year-wise univariate distributions for the three variables.

In [None]:
fig, ax = plt.subplots(figsize=(5, 5))
plt.title('Data in 2017')
rawdata[rawdata['year'] == 2017]['pf_ss_women'].plot.hist(color = 'red')
rawdata[rawdata['year'] == 2017]['ef_score'].plot.hist(color = 'darkorange')
rawdata[rawdata['year'] == 2017]['pf_movement_women'].plot.hist(color = 'gold')

That's 2017, and the key for the entire set.

> Red: Scores for Safety & Security of Women <br>
> Orange: Scores for Economic Freedom <br>
> Yellow: Scores for Freedom of Movement <br>

Expand the code and output below for the whole set!

In [None]:
# Same three overlaid histograms as the 2017 cell, one figure per earlier year.
for year in range(2016, 2007, -1):
    fig, ax = plt.subplots(figsize=(5, 5))
    plt.title(f'Data in {year}')
    yearly = rawdata[rawdata['year'] == year]
    yearly['pf_ss_women'].plot.hist(color = 'red')
    yearly['ef_score'].plot.hist(color = 'darkorange')
    yearly['pf_movement_women'].plot.hist(color = 'gold')

**Part Three.** <br>
Aggregate graphs, that look at the three variables across time and space. <br>
(Across global regions and years under study, to be precise.)

In [None]:
fig, ax = plt.subplots(figsize=(20,10))
plt.title('Safety & Security of Women by Region and Year')
plt.xlabel('Region')
plt.ylabel('Score')
# One line per year, from 2017 back to 2008.
for year in range(2017, 2007, -1):
    sns.lineplot(x = 'region', y = 'pf_ss_women', data = rawdata[rawdata['year'] == year], label = str(year), ci = None)

In [None]:
fig, ax = plt.subplots(figsize=(20,10))
plt.title('Freedom of Movement of Women by Region and Year')
plt.xlabel('Region')
plt.ylabel('Score')
# One line per year, from 2017 back to 2008.
for year in range(2017, 2007, -1):
    sns.lineplot(x = 'region', y = 'pf_movement_women', data = rawdata[rawdata['year'] == year], label = str(year), ci = None)

In [None]:
fig, ax = plt.subplots(figsize=(20,10))
plt.title('Economic Freedom by Region and Year')
plt.xlabel('Region')
plt.ylabel('Score')
# One line per year, from 2017 back to 2008.
for year in range(2017, 2007, -1):
    sns.lineplot(x = 'region', y = 'ef_score', data = rawdata[rawdata['year'] == year], label = str(year), ci = None)

In all three graphs, Sub-Saharan Africa and the Middle East & North Africa region rank consistently low across the years. <br>
Further, the rough peaks and troughs of the three graphs seem to line up across regions, which is a good sign for answering my research questions.
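
The line plots above draw one point per region per year; as far as I know, `sns.lineplot` averages repeated observations at each x value by default. A minimal sketch of the same regional averages as a table, using made-up rows and placeholder region names, could look like this:

```python
import pandas as pd

# Invented rows; 'Region A' and 'Region B' are placeholder names.
toy = pd.DataFrame({
    'year':     [2016, 2016, 2016, 2017, 2017, 2017],
    'region':   ['Region A', 'Region A', 'Region B',
                 'Region A', 'Region A', 'Region B'],
    'ef_score': [6.0, 8.0, 5.0, 7.0, 9.0, 4.0],
})

# Mean score per region and year, with one column per year.
regional_means = (
    toy.groupby(['region', 'year'])['ef_score']
       .mean()
       .unstack('year')
)

print(regional_means)
```

Reading the numbers in a table makes it easier to check which regions sit lowest in every year, instead of judging it from overlapping lines.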

***2. Bivariate Graphs.*** <br>
Here’s where the correlation matrices show up.

In [None]:
chartdata = rawdata[rawdata['year'] == 2017][['pf_ss_women', 'pf_movement_women', 'ef_score']].copy()
corr = chartdata.corr()
fig, ax = plt.subplots(figsize=(10,10))
plt.title('2017 Matrix')
sns.heatmap(
    corr, 
    square=True,
    linewidths=.5,
    annot = True,
    ax = ax
)

To claim a convincing, or at least relevant, degree of correlation, I am looking for a coefficient of 0.3 or greater. In short, we want those two squares, top right and middle right (each women's variable against `ef_score`), looking as light as possible!  <br>
That's 2017. The rest of the set is right here, and can be expanded for viewing.
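
For readers who want to see what each number in the matrix is, here is a small sketch that computes the Pearson coefficient by hand and compares it with pandas, assuming the default `method='pearson'` that `corr()` uses. The toy values and the column names `safety` and `economy` are invented.

```python
import numpy as np
import pandas as pd

# Invented scores for five hypothetical countries.
toy = pd.DataFrame({
    'safety':  [2.0, 4.0, 6.0, 8.0, 10.0],
    'economy': [5.0, 6.0, 5.5, 7.0, 7.5],
})

# Pearson r: covariance of the centered columns divided by the
# product of their spreads.
x = toy['safety'] - toy['safety'].mean()
y = toy['economy'] - toy['economy'].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

# The same value from pandas.
r_pandas = toy['safety'].corr(toy['economy'])

# The 0.3 cut-off is the threshold chosen in this notebook.
print(round(r_manual, 3), round(r_pandas, 3), r_manual >= 0.3)
```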

In [None]:
# Same heatmap as the 2017 cell, one figure per earlier year.
for year in range(2016, 2007, -1):
    chartdata = rawdata[rawdata['year'] == year][['pf_ss_women', 'pf_movement_women', 'ef_score']].copy()
    corr = chartdata.corr()
    fig, ax = plt.subplots(figsize=(10,10))
    plt.title(f'{year} Matrix')
    sns.heatmap(
        corr,
        square=True,
        linewidths=.5,
        annot = True,
        ax = ax
    )

I compiled all 20 of these correlation coefficients, two per year, into a new DataFrame.

In [None]:
datatrend = {'Year': [2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008],
            'Safety & Security | Economic Freedom': [0.44, 0.5, 0.49, 0.45, 0.45, 0.46, 0.46, 0.4, 0.44, 0.45],
            'Movement | Economic Freedom': [0.26, 0.41, 0.39, 0.36, 0.37, 0.5, 0.48, 0.18, 0.24, 0.21]
            }
datatrend = pd.DataFrame(datatrend, columns = ['Year', 'Safety & Security | Economic Freedom', 
                                               'Movement | Economic Freedom'])
datatrend.head(10)
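
The table above was typed in by hand from the heatmaps. As a hedged alternative, the sketch below builds the same kind of table by looping over the years. The toy rows are invented, so its numbers are not the ones in the table.

```python
import pandas as pd

# Invented rows for two years; not taken from the data set.
toy = pd.DataFrame({
    'year': [2016] * 4 + [2017] * 4,
    'pf_ss_women':       [2.0, 4.0, 6.0, 8.0, 2.0, 4.0, 6.0, 8.0],
    'pf_movement_women': [5.0, 5.0, 10.0, 10.0, 5.0, 5.0, 10.0, 10.0],
    'ef_score':          [5.0, 6.0, 7.0, 8.0, 5.0, 7.0, 6.0, 8.0],
})

rows = []
for year, group in toy.groupby('year'):
    rows.append({
        'Year': int(year),
        'Safety & Security | Economic Freedom':
            group['pf_ss_women'].corr(group['ef_score']),
        'Movement | Economic Freedom':
            group['pf_movement_women'].corr(group['ef_score']),
    })

# Round to two decimals, matching the precision shown on the heatmaps.
trend = pd.DataFrame(rows).round(2)
print(trend)
```

Deriving the table in code avoids copying errors and keeps it in step with the data if the cleaning changes.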

Those values look encouraging. <br>The lowest correlation between Safety & Security of Women and Economic Freedom is 0.40 and the highest is 0.50, so every year clears the 0.3 threshold. <br> Freedom of Movement is less consistent: its correlation with Economic Freedom ranges from 0.18 to 0.50 and falls below 0.3 in four of the ten years (2008, 2009, 2010 and 2017). Still, every value is positive, which does indicate a positively correlated relationship.

Out of curiosity, I plotted these as well. And the results are very interesting.

In [None]:
plt.title('Degree of Correlation by Year')
plt.xlabel('Year')
sns.regplot(x = 'Year', y = 'Safety & Security | Economic Freedom', data = datatrend, color = 'red')

In [None]:
plt.title('Degree of Correlation by Year')
plt.xlabel('Year')
sns.regplot(x = 'Year', y = 'Movement | Economic Freedom', data = datatrend, color = 'blue')

Throughout the years, *the general welfare and empowerment of women seem to be **positively related** to the economic strength and freedom of a country*.
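
To put a number on the trend lines, here is a minimal sketch that fits a straight line to each correlation series with `numpy.polyfit`. It reuses the values from the `datatrend` table, and centering the years is my own choice to keep the fit numerically tidy.

```python
import numpy as np

# Values copied from the datatrend table above (2017 down to 2008).
years = np.array([2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008])
safety = np.array([0.44, 0.5, 0.49, 0.45, 0.45, 0.46, 0.46, 0.4, 0.44, 0.45])
movement = np.array([0.26, 0.41, 0.39, 0.36, 0.37, 0.5, 0.48, 0.18, 0.24, 0.21])

# Centering the years does not change the slope.
centered_years = years - years.mean()

# Degree-1 fit: the first coefficient is the change in correlation per year.
safety_slope, _ = np.polyfit(centered_years, safety, 1)
movement_slope, _ = np.polyfit(centered_years, movement, 1)

print(f'Safety & Security slope per year: {safety_slope:.4f}')
print(f'Movement slope per year: {movement_slope:.4f}')
```

With these values both slopes come out positive but small (roughly 0.004 and 0.013 per year), and ten points is very little to fit a line through, so the upward trend is best read as tentative.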

# Insights

Hi! Congratulations on making it this far.

Finally, the data turned out to be consistent with both of my hypotheses.

The data does indicate that there is a positive relation between the Safety & Security of Women, and the Economic Freedom of a country, as well as a positive relation between the Freedom of Movement of Women, and the Economic Freedom of a country.

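Further, this relation seems to be getting stronger over time: both trend lines slope upward, though only slightly. One reading is that the growing empowerment and subsequent contribution of women in the economy strengthens the economic freedom of the country as a whole, although a correlation on its own cannot show which way the influence runs.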

That’s all for now! Thanks for reading!
Please feel free to provide feedback.