# Are Americans Financially Literate?

_Here, I will assess the financial literacy levels amongst people who live in the U.S._

In [None]:
import pandas as pd

I obtained a CSV file containing the responses to more than 29,000 surveys. I cleaned it up and processed the data through a pipeline written in Python. Then I created a new metric, `financialliteracyscore`, which assesses the 'financial-literacy-ness', so to speak, of a person. I based this metric on three foundational concepts of financial literacy: knowledge of compound interest, the practice of planning for the future, and basic money management, including budgeting and owning a savings account. All of the underlying data was in the CSV file I found. I saved the cleaned and revised dataframe to a CSV called `correct_processed_data.csv`.
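The notebook does not show how the score itself is computed, so the following is only a minimal sketch of one way a three-part score could be assembled. The column names (`knows_compound_interest`, `plans_for_future`, `has_budget`, `has_savings_account`) and the equal weighting are assumptions made for illustration, not the pipeline that produced `correct_processed_data.csv`.

```python
import pandas as pd

# Toy survey answers; 1 means "yes"/"correct", 0 means "no"/"incorrect".
# All column names here are hypothetical.
toy = pd.DataFrame({
    "knows_compound_interest": [1, 0, 1, 0],
    "plans_for_future":        [1, 1, 0, 0],
    "has_budget":              [1, 0, 1, 0],
    "has_savings_account":     [1, 1, 0, 0],
})

# Money management is the average of its two parts (budgeting, savings account).
money_management = toy[["has_budget", "has_savings_account"]].mean(axis=1)

# Equal-weight average of the three concepts, scaled to 0-100 (assumed weighting).
components = pd.concat(
    [toy["knows_compound_interest"], toy["plans_for_future"], money_management],
    axis=1,
)
toy["score_sketch"] = components.mean(axis=1) * 100
print(toy["score_sketch"].round(1).tolist())
```

Averaging the two money-management items first keeps that concept from counting twice as much as the other two.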

In [None]:
file_one = pd.read_csv("./correct_processed_data.csv")


In [None]:
file_one.head()


The file is pretty big, with a lot of information.  The task is to pick out some of the more interesting attributes in this dataset.

In [None]:
df = pd.DataFrame(file_one)

In [None]:
stats = df['financialliteracyscore'].describe()


In [None]:
stats


Already, the stats look alarming. According to this, 75% of surveyees score less than 60% on the assessment that measures their financial-literacy-ness.

In [None]:
df['financialliteracyscore'].quantile(q=0.91)

The above statistic suggests that only about 9% of the surveyees passed the test.
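Reading a pass rate off a quantile is indirect. A more direct check is to compute the share of scores at or above a cutoff. The sketch below uses made-up scores and a hypothetical cutoff of 60 (`PASS_MARK`); the notebook never states the passing mark explicitly, so that value is an assumption.

```python
import pandas as pd

# Made-up scores for illustration only (not the survey data).
scores = pd.Series([20, 35, 40, 45, 50, 55, 58, 60, 70, 90])
PASS_MARK = 60  # assumed cutoff, not taken from the dataset

# A boolean mask averages to the fraction of True values.
share_passing = (scores >= PASS_MARK).mean()
share_below = (scores < PASS_MARK).mean()

print(f"passing: {share_passing:.0%}, below: {share_below:.0%}")
```

On the real data the same expression would be applied to `df['financialliteracyscore']`.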

____________________________________________________________________

## Most Americans are worried about retirement

In [None]:
data = df.loc[df['DegreeOfWorryAboutRetirement'] <= 7]


In [None]:
data.boxplot(column='DegreeOfWorryAboutRetirement', by="AgeGroup", showmeans=True)


_______________________________________________________________________


## Most Americans own a savings account

In [None]:
data = df.loc[df['SavingsAccount?'] <= 2]

The filter above keeps only the rows whose answer is 1 (i.e. 'yes') or 2 (i.e. 'no'), dropping every other response code.
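The `<= 2` comparison works only because the two valid codes happen to be the smallest values. A sketch of a more explicit alternative follows, using a toy column in which the codes 98 and 99 are assumed stand-ins for non-answers (the real dataset's other codes are not shown in this notebook).

```python
import pandas as pd

# Toy data: 1 = yes, 2 = no; 98 and 99 are hypothetical non-answer codes.
toy = pd.DataFrame({"SavingsAccount?": [1, 1, 2, 99, 1, 2, 98, 1]})

# Keep only rows with an explicit yes/no answer.
answered = toy.loc[toy["SavingsAccount?"].isin([1, 2])]

# Share of each answer among those who answered.
shares = (
    answered["SavingsAccount?"]
    .map({1: "yes", 2: "no"})
    .value_counts(normalize=True)
)
print(shares)
```

For a two-valued column, the normalized counts give the proportions directly, which a histogram only shows approximately.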

In [None]:
data.hist(column = 'SavingsAccount?')

______________________________________________________________________

In [None]:
data = df.loc[df['OwnHome?'] <= 2]

In [None]:
data.hist(column = 'OwnHome?')

In [None]:
data['OwnHome?'].describe()

In [None]:
own_home= data.loc[data['OwnHome?'] == 1]['OwnHome?'].count()

In [None]:
rent_home = data.loc[data['OwnHome?'] == 2]['OwnHome?'].count()

In [None]:
stats = own_home/(float(rent_home)+own_home)*100

In [None]:
stats


It is noteworthy that 63% of the surveyees who answered yes or no own a home. This is worth investigating: is there a difference in financial literacy score between those who own a home and those who do not? Let's find out...

In [None]:
df_owners = data.loc[data['OwnHome?'] == 1]

In [None]:
df_renters = data.loc[data['OwnHome?']==2]

In [None]:
df_owners.hist(column = 'financialliteracyscore')

In [None]:
df_renters.hist(column = 'financialliteracyscore')

Oh wow. That looks quite drastic. The distribution of financial literacy scores for renters is centered further to the left than the distribution for homeowners.

In [None]:
df_owners['financialliteracyscore'].describe()

In [None]:
df_renters['financialliteracyscore'].describe()

## Let's determine whether the level of proficiency in financial literacy is significantly different between the two groups, home owners and renters

### We will conduct a two-sample t-test

In [None]:
import numpy as np
from scipy import stats



In [None]:
fin_lit_renters = df_renters['financialliteracyscore']

In [None]:
fin_lit_owners = df_owners['financialliteracyscore']

In [None]:
var_r = fin_lit_renters.var(ddof=1)
var_h = fin_lit_owners.var(ddof = 1)

In [None]:
s = np.sqrt((var_r + var_h)/2)
s

In [None]:
#m = (fin_lit_renters.mean() - fin_lit_owners.mean()/s*np.sqrt(2/))

#### I have to pause here: the formula I started above assumes equal sample sizes, so I need to check the group sizes first. A two-sample t-test in general does not require equal sample sizes.

In [None]:
len(fin_lit_renters)

In [None]:
len(fin_lit_owners)
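When the two groups differ in size, one common option is Welch's t-test, which assumes neither equal sample sizes nor equal variances. The sketch below runs it on synthetic data (the arrays `owners_demo` and `renters_demo` are made up and are not the survey scores) and checks SciPy's result against the statistic computed by hand.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two groups (not the survey data);
# the sizes differ on purpose.
rng = np.random.default_rng(0)
owners_demo = rng.normal(loc=55, scale=15, size=300)
renters_demo = rng.normal(loc=45, scale=18, size=180)

# equal_var=False selects Welch's t-test.
t_stat, p_value = stats.ttest_ind(owners_demo, renters_demo, equal_var=False)

# The same statistic by hand: difference in means over its standard error.
se = np.sqrt(
    owners_demo.var(ddof=1) / owners_demo.size
    + renters_demo.var(ddof=1) / renters_demo.size
)
t_manual = (owners_demo.mean() - renters_demo.mean()) / se

print(t_stat, p_value, t_manual)
```

On the real data, the two inputs would be `fin_lit_owners.dropna()` and `fin_lit_renters.dropna()`.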

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.boxplot(fin_lit_renters)

#### It would probably be better to present both box plots side by side

In [None]:
fin_lit_owners.unstack?


In [None]:
type(fin_lit_renters)

In [None]:
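# Put the two Series side by side as named columns (rows align on the index).
data = pd.concat([fin_lit_owners.rename('owners'), fin_lit_renters.rename('renters')], axis=1)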

In [None]:
data.head()

In [None]:
fin_lit_owners

In [None]:
fin_lit_renters


In [None]:
data = [fin_lit_owners, fin_lit_renters]

In [None]:
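# plotly.plotly moved to the separate chart_studio package in plotly 4,
# so this draws the side-by-side box plot with matplotlib alone.
fig, ax = plt.subplots()
# Drop missing scores, which would otherwise leave a box empty.
ax.boxplot([scores.dropna().to_numpy() for scores in data])
ax.set_xticklabels(['Owners', 'Renters'])
ax.set_ylabel('financialliteracyscore')
plt.show()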