### Income data for 1965 and 2015 in China and USA.

<br>

#### General setup.
___

In [1]:
import numpy as np
import pandas as pd
import scipy.stats
import matplotlib.pyplot as plt

from IPython import display
from ipywidgets import interact, widgets

import re
import mailbox
import csv

%matplotlib inline

In [2]:
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12, 8)

<br>

#### Load the data sets.
___

In [3]:
china1965 = pd.read_csv('../Data/income-1965-china.csv')
china2015 = pd.read_csv('../Data/income-2015-china.csv')
usa1965 = pd.read_csv('../Data/income-1965-usa.csv')
usa2015 = pd.read_csv('../Data/income-2015-usa.csv')

In [4]:
china1965.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   income        1000 non-null   float64
 1   log10_income  1000 non-null   float64
dtypes: float64(2)
memory usage: 15.8 KB


In [5]:
china1965.head()

| | income | log10_income |
|---|---|---|
| 0 | 1.026259 | 0.011257 |
| 1 | 0.912053 | -0.03998 |
| 2 | 0.110699 | -0.955857 |
| 3 | 0.469659 | -0.328217 |
| 4 | 0.374626 | -0.426402 |


There are two columns here, both representing daily income in dollars. The second column is the base-10 logarithm of the first. The log scale is used because daily income and quality of life have a roughly logarithmic relationship: what matters is the ratio between two incomes, not the difference. For example, if you earn 16 dollars a day, you need to reach about 64 dollars a day, a fourfold increase, before your quality of life changes noticeably.
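As a minimal sketch of why the log column is convenient, the cell below builds a small, hypothetical table of incomes that each quadruple the previous one (these four values are illustrative and are not taken from the data files). On the log scale, every fourfold jump becomes the same step of `log10(4)`.

```python
import numpy as np
import pandas as pd

# Hypothetical daily incomes in dollars; each is four times the previous one.
levels = pd.DataFrame({'income': [1.0, 4.0, 16.0, 64.0]})

# Same transformation as the log10_income column in the data sets.
levels['log10_income'] = np.log10(levels['income'])

# Differences between consecutive log incomes: one equal step per fourfold jump.
steps = levels['log10_income'].diff().dropna()

print(levels)
print(steps.round(3).tolist())

# Sanity check against the first row shown by china1965.head().
first_row_check = np.log10(1.026259)
print(round(first_row_check, 6))
```

Equal steps on the log scale correspond to equal ratios on the original scale, which is why the rest of the notebook carries both columns.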

<br>

#### Stats.
___

In [6]:
# Min and max
china1965.min(), china1965.max()

(income          0.041968
 log10_income   -1.377078
 dtype: float64,
 income          5.426802
 log10_income    0.734544
 dtype: float64)

In [7]:
# Mean
china1965.mean()

income          0.660597
log10_income   -0.274157
dtype: float64

In [8]:
# Population variance (ddof=0 divides by n instead of n - 1)
china1965.var(ddof=0)

income          0.208846
log10_income    0.088610
dtype: float64

In [9]:
# Quantile for 25% and 75%
china1965.quantile([.25,.75])

| | income | log10_income |
|---|---|---|
| 0.25 | 0.34413 | -0.463277 |
| 0.75 | 0.863695 | -0.06364 |


In China in 1965, 25% of the sampled incomes were below about 34 cents a day and 25% were above about 86 cents a day.
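The reading of the quartiles can be checked directly: roughly a quarter of the values should fall below the 25% quantile and a quarter above the 75% quantile. The sketch below uses synthetic log-normal incomes as a stand-in for the real file (the distribution parameters and the name `sample` are assumptions made for illustration only).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for an income column; not the real China 1965 data.
rng = np.random.default_rng(1)
sample = pd.DataFrame({'income': rng.lognormal(mean=-0.6, sigma=0.7, size=1000)})

# 25% and 75% quantiles, computed the same way as in the cell above.
q25, q75 = sample['income'].quantile([.25, .75])

# Share of observations below the lower quartile and above the upper quartile.
share_below_q25 = (sample['income'] < q25).mean()
share_above_q75 = (sample['income'] > q75).mean()

# Interquartile range: the spread of the middle half of the data.
iqr = q75 - q25

print(share_below_q25, share_above_q75, iqr)
```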

In [10]:
# The 50% quantile is the same as the median
china1965.quantile(.5), china1965.median()

(income          0.557477
 log10_income   -0.253773
 Name: 0.5, dtype: float64,
 income          0.557477
 log10_income   -0.253773
 dtype: float64)

In [11]:
scipy.stats.percentileofscore(china1965.income, 1.5)

95.5

This means that 95.5% of the incomes in the sample are below 1.5 dollars a day.
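To make the meaning of `percentileofscore` concrete, the sketch below compares it with a manual count of how many values fall below the score. It runs on synthetic log-normal incomes (an assumption, used so the cell works without the data files), and the two results agree here because no synthetic value is exactly equal to the score.

```python
import numpy as np
import scipy.stats

# Synthetic stand-in for china1965.income; parameters are illustrative only.
rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=-0.6, sigma=0.7, size=1000)

score = 1.5

# Percentile rank of the score within the sample (default kind='rank').
pct = scipy.stats.percentileofscore(incomes, score)

# Manual equivalent: percentage of observations strictly below the score.
manual = 100 * np.mean(incomes < score)

print(pct, manual)
```

When the data contain values equal to the score, the `kind` argument of `percentileofscore` controls how those ties are counted.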

In [12]:
china1965.describe()

| | income | log10_income |
|---|---|---|
| count | 1000.0 | 1000.0 |
| mean | 0.660597 | -0.274157 |
| std | 0.457226 | 0.297822 |
| min | 0.041968 | -1.377078 |
| 25% | 0.34413 | -0.463277 |
| 50% | 0.557477 | -0.253773 |
| 75% | 0.863695 | -0.06364 |
| max | 5.426802 | 0.734544 |


In [13]:
usa1965.describe()

| | income | log10_income |
|---|---|---|
| count | 1000.0 | 1000.0 |
| mean | 31.587965 | 1.418835 |
| std | 22.101531 | 0.2622 |
| min | 4.177852 | 0.620953 |
| 25% | 17.498592 | 1.243003 |
| 50% | 26.069531 | 1.416133 |
| 75% | 39.017113 | 1.591255 |
| max | 246.030397 | 2.390989 |


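From the above stats we can see that in 1965 income in China was almost 50 times lower than in the USA: the mean daily income was about 0.66 dollars in China versus about 31.59 dollars in the USA, and the medians (about 0.56 versus 26.07 dollars) show a similar gap.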
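One way to quantify the gap is to divide the USA figures by the China figures. The sketch below copies the means and medians from the two `describe()` outputs above into a small table; the table and its column names (`summary`, `usa_to_china`, `china_pct_of_usa`) are introduced here for illustration only.

```python
import pandas as pd

# Means and medians copied from the describe() outputs (dollars per day).
summary = pd.DataFrame(
    {
        'china1965': [0.660597, 0.557477],
        'usa1965': [31.587965, 26.069531],
    },
    index=['mean', 'median'],
)

# How many times larger the USA figure is than the China figure.
summary['usa_to_china'] = summary['usa1965'] / summary['china1965']

# China's figure expressed as a percentage of the USA figure.
summary['china_pct_of_usa'] = 100 * summary['china1965'] / summary['usa1965']

print(summary.round(2))
```

With the data frames loaded, the same ratios could be computed directly, for example with `usa1965.income.mean() / china1965.income.mean()`.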

<br>

___
#### End.