## Generate Descriptive Statistics with a Home-Brew Library
This Jupyter notebook demonstrates how to use a home-brew, pandas-based library (`lib.py`) to generate descriptive statistics. The demo dataset was extracted from the [European Health for All database (HFA-DB)](https://gateway.euro.who.int/en/datasets/european-health-for-all-database/) and contains suicide rate and GDP per capita data for France and Albania from the 1960s to the 2010s.

In [1]:
```python
# Import libraries
import pandas as pd
import lib  # home-brew helper module (lib.py)
```

First, the CSV file is read into a pandas DataFrame.
The dataset contains four columns: `COUNTRY_REGION`, `YEAR`, `GDP per Capita`, and `Suicide Rate`.

In [2]:
```python
# Read the dataset from a CSV file
Dataset_raw = pd.read_csv('Dataset.csv')
print(Dataset_raw.head())
```

```text
  COUNTRY_REGION  YEAR  GDP per Capita  Suicide Rate
0            ALB  1987          674.79          2.47
1            ALB  1988          652.77          2.40
2            ALB  1989          698.00          2.48
3            ALB  1992          200.85          1.65
4            ALB  1993          367.28          2.53
```
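Everything downstream relies on `Dataset.csv` having exactly these four columns, so a quick check right after loading can catch a renamed or missing column early. A minimal sketch, assuming a comma-separated file (the `read_csv` default used above); the five rows printed above are inlined as text so it runs without the file:

```python
import io

import pandas as pd

# The five rows printed by Dataset_raw.head() above, inlined so this runs
# without Dataset.csv. Comma-separated, matching read_csv's default.
csv_text = """COUNTRY_REGION,YEAR,GDP per Capita,Suicide Rate
ALB,1987,674.79,2.47
ALB,1988,652.77,2.40
ALB,1989,698.00,2.48
ALB,1992,200.85,1.65
ALB,1993,367.28,2.53
"""

expected_columns = ["COUNTRY_REGION", "YEAR", "GDP per Capita", "Suicide Rate"]

df = pd.read_csv(io.StringIO(csv_text))

# Fail early if the header does not match the four expected columns.
if list(df.columns) != expected_columns:
    raise ValueError(f"Unexpected columns: {list(df.columns)}")

print(df.dtypes)                     # YEAR parses as integer, the rest as float/object
print(df["COUNTRY_REGION"].unique())
```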


In [3]:
```python
# Select data from 1990 to 2010 (both years inclusive)
Dataset = Dataset_raw[(Dataset_raw['YEAR'] >= 1990) & (Dataset_raw['YEAR'] <= 2010)].copy()
```
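The same year filter can also be written with `Series.between`, which includes both endpoints by default. A small check on the years printed by `Dataset_raw.head()` above, showing that both forms select the same rows:

```python
import pandas as pd

# Rows taken from the Dataset_raw.head() output above.
raw = pd.DataFrame({
    "COUNTRY_REGION": ["ALB"] * 5,
    "YEAR": [1987, 1988, 1989, 1992, 1993],
    "Suicide Rate": [2.47, 2.40, 2.48, 1.65, 2.53],
})

# between() is inclusive on both ends by default, like >= 1990 & <= 2010.
subset = raw[raw["YEAR"].between(1990, 2010)].copy()

# The explicit boolean mask used in the notebook, for comparison.
mask = (raw["YEAR"] >= 1990) & (raw["YEAR"] <= 2010)

print(subset)
print(subset.equals(raw[mask]))
```

The trailing `.copy()` gives an independent DataFrame, so adding the `Suicide Rate Cat` column afterwards modifies that copy rather than a view of `Dataset_raw` and does not trigger pandas' `SettingWithCopyWarning`.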

Next, a column named `Suicide Rate Cat` was created that groups the suicide rate into three categories: low (below 5), medium (5 to below 15), and high (15 or above).

In [4]:
```python
# Define a categorical variable from the numeric suicide rate
def catSuicideRate(rate):
    if rate < 5:
        return 'low'
    elif rate < 15:
        return 'medium'
    else:
        return 'high'

Dataset['Suicide Rate Cat'] = Dataset['Suicide Rate'].apply(catSuicideRate)
```
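An alternative to the row-by-row `apply` is `pd.cut`, which bins the whole column at once. One difference matters if the data has gaps: any comparison with `NaN` is `False`, so `catSuicideRate` labels a missing rate as `'high'`, whereas `pd.cut` leaves it missing. A minimal sketch; the input values are chosen only to probe the 5 and 15 thresholds, plus one missing value:

```python
import numpy as np
import pandas as pd


def catSuicideRate(rate):
    """Copy of the notebook's function, for side-by-side comparison."""
    if rate < 5:
        return 'low'
    elif rate < 15:
        return 'medium'
    else:
        return 'high'


rates = pd.Series([2.47, 5.0, 14.99, 15.0, 16.48, np.nan])

# right=False makes every bin closed on the left: [-inf, 5), [5, 15), [15, inf),
# which reproduces the "< 5", "< 15", "else" thresholds above.
cats = pd.cut(rates,
              bins=[-np.inf, 5, 15, np.inf],
              labels=["low", "medium", "high"],
              right=False)

via_apply = rates.apply(catSuicideRate)

print(pd.DataFrame({"rate": rates, "apply": via_apply, "cut": cats}))
```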

Descriptive statistics were first generated for GDP per capita and suicide rate in France.

In [5]:
```python
# Summarize GDP per Capita and Suicide Rate data in France
Dataset_FRA = Dataset[Dataset['COUNTRY_REGION'] == 'FRA']
print("Descriptive Statistics for France from 1990 to 2010:")
print(lib.printNumStats(Dataset_FRA, 'GDP per Capita'))
print(lib.printNumStats(Dataset_FRA, 'Suicide Rate'))
print(lib.printOccStats(Dataset_FRA, 'Suicide Rate Cat', 'high'))
print(lib.printOccStats(Dataset_FRA, 'Suicide Rate Cat', 'medium'))
print(lib.printOccStats(Dataset_FRA, 'Suicide Rate Cat', 'low'))
```

```text
Descriptive Statistics for France from 1990 to 2010:
In GDP per Capita column, the mean is 29206.78 and the median is 24974.27.
In Suicide Rate column, the mean is 16.92 and the median is 16.48.
In Suicide Rate Cat column, the number of occurrences of high is 18, or 85.71% of total samples.
In Suicide Rate Cat column, the number of occurrences of medium is 3, or 14.29% of total samples.
In Suicide Rate Cat column, the number of occurrences of low is 0, or 0.0% of total samples.
```
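`lib.py` itself is not reproduced in this notebook. Judging only from the printed sentences above, `printNumStats` appears to return the mean and median of a numeric column, and `printOccStats` the count and percentage of rows holding one value, both rounded to two decimals. The following is a hypothetical stand-in written under those assumptions, not the library's actual code; it is exercised on the five rows printed by `Dataset_raw.head()`:

```python
import pandas as pd


# Hypothetical re-implementation of lib.printNumStats (the real lib.py is not shown).
def printNumStats(df: pd.DataFrame, col: str) -> str:
    mean = round(float(df[col].mean()), 2)
    median = round(float(df[col].median()), 2)
    return f"In {col} column, the mean is {mean} and the median is {median}."


# Hypothetical re-implementation of lib.printOccStats (the real lib.py is not shown).
def printOccStats(df: pd.DataFrame, col: str, value) -> str:
    count = int((df[col] == value).sum())
    # Guard against an empty frame to avoid dividing by zero.
    pct = round(count / len(df) * 100, 2) if len(df) else 0.0
    return (f"In {col} column, the number of occurrences of {value} is {count}, "
            f"or {pct}% of total samples.")


# The five rows printed by Dataset_raw.head() above.
alb_head = pd.DataFrame({
    "COUNTRY_REGION": ["ALB"] * 5,
    "YEAR": [1987, 1988, 1989, 1992, 1993],
    "GDP per Capita": [674.79, 652.77, 698.00, 200.85, 367.28],
    "Suicide Rate": [2.47, 2.40, 2.48, 1.65, 2.53],
})

print(printNumStats(alb_head, "Suicide Rate"))
# → In Suicide Rate column, the mean is 2.31 and the median is 2.47.
print(printOccStats(alb_head, "COUNTRY_REGION", "FRA"))
# → In COUNTRY_REGION column, the number of occurrences of FRA is 0, or 0.0% of total samples.
```

Returning a string instead of printing directly is consistent with how the notebook wraps every `lib` call in `print(...)`.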


Descriptive statistics were then generated for GDP per capita and suicide rate in Albania.

In [6]:
```python
# Summarize GDP per Capita and Suicide Rate data in Albania
Dataset_ALB = Dataset[Dataset['COUNTRY_REGION'] == 'ALB']
print("Descriptive Statistics for Albania from 1990 to 2010:")
print(lib.printNumStats(Dataset_ALB, 'GDP per Capita'))
print(lib.printNumStats(Dataset_ALB, 'Suicide Rate'))
print(lib.printOccStats(Dataset_ALB, 'Suicide Rate Cat', 'high'))
print(lib.printOccStats(Dataset_ALB, 'Suicide Rate Cat', 'medium'))
print(lib.printOccStats(Dataset_ALB, 'Suicide Rate Cat', 'low'))
```

```text
Descriptive Statistics for Albania from 1990 to 2010:
In GDP per Capita column, the mean is 1736.61 and the median is 1204.17.
In Suicide Rate column, the mean is 3.17 and the median is 3.58.
In Suicide Rate Cat column, the number of occurrences of high is 0, or 0.0% of total samples.
In Suicide Rate Cat column, the number of occurrences of medium is 2, or 11.11% of total samples.
In Suicide Rate Cat column, the number of occurrences of low is 16, or 88.89% of total samples.
```
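Albania's percentages are computed over 18 samples, while France's are over 21, which matches the 21 years from 1990 to 2010; Albania therefore appears to lack data for some years in that range, and the raw head above already jumps from 1989 to 1992. A small helper for listing the absent years, hypothetical and not part of `lib.py`, shown here on the five years printed by `Dataset_raw.head()`:

```python
import pandas as pd


def missing_years(year_series: pd.Series, start: int, end: int) -> list[int]:
    """Return the years in [start, end] (inclusive) that have no row in year_series."""
    present = set(year_series.dropna().astype(int))
    return [y for y in range(start, end + 1) if y not in present]


# Years from the Dataset_raw.head() output above (all ALB rows).
years = pd.Series([1987, 1988, 1989, 1992, 1993], name="YEAR")

print(missing_years(years, 1987, 1993))  # → [1990, 1991]
```

In the notebook, `missing_years(Dataset_ALB['YEAR'], 1990, 2010)` would list the gaps behind Albania's smaller denominator.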
