# Descriptive statistics using Polars or Pandas
## IDS 706 Data Engineering Systems
 Author: Tianji Rao

## Contents:
[1. Introduction](#1-introduction)   
[2. Pandas](#2-pandas)

# 1. Introduction

## Import necessary packages

In [1]:
import pandas as pd
import polars as pl


This notebook uses the [`Money Stock Measures - H.6 Release`](https://www.federalreserve.gov/releases/h6/current/default.htm) as the sample dataset and computes a series of descriptive statistics on it. We start with a quick look at the dataset. All custom functions come from `mylib`, which contains `for_pandas.py` and `for_polars`.

In [12]:
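# Read the .csv file with pandas and keep rows from position 5 onward,
# where the monthly observations start
df = pd.read_csv('FRB_H6.csv').iloc[5:]
# Rename the first column to 'Date' and keep the other column names
df.columns = ['Date'] + df.columns[1:].tolist()
# Print a summary of the columns and their dtypes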
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 775 entries, 5 to 779
Data columns (total 30 columns):
 #   Column                                                                                                 Non-Null Count  Dtype 
---  ------                                                                                                 --------------  ----- 
 0   Date                                                                                                   775 non-null    object
 1   M1; Not seasonally adjusted                                                                            775 non-null    object
 2   M2; Not seasonally adjusted                                                                            775 non-null    object
 3   Currency; Not seasonally adjusted                                                                      775 non-null    object
 4   Demand deposits; Not seasonally adjusted                                                               7

In [16]:
# Try to cast every column except 'Date' to float
df.iloc[:,1:] = df.iloc[:, 1:].astype('float')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 775 entries, 5 to 779
Data columns (total 30 columns):
 #   Column                                                                                                 Non-Null Count  Dtype 
---  ------                                                                                                 --------------  ----- 
 0   Date                                                                                                   775 non-null    object
 1   M1; Not seasonally adjusted                                                                            775 non-null    object
 2   M2; Not seasonally adjusted                                                                            775 non-null    object
 3   Currency; Not seasonally adjusted                                                                      775 non-null    object
 4   Demand deposits; Not seasonally adjusted                                                               7
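
Note that this second `df.info()` still lists every column as `object`, so the cast did not change the stored dtypes. A likely cause, depending on the pandas version, is that assigning through `.iloc[:, 1:]` writes the converted values into the existing object columns instead of replacing them. Below is a minimal sketch of one alternative, assigning by column label. The small frame `sample` is illustrative only; its values are copied from the `df.head()` output in this notebook.

```python
import pandas as pd

# Illustrative stand-in for the H.6 data: every value is a string,
# which is why the columns are not numeric to begin with.
sample = pd.DataFrame(
    {
        "Date": ["1959-01", "1959-02", "1959-03"],
        "M1; Not seasonally adjusted": ["142.2", "139.3", "138.4"],
        "M2; Not seasonally adjusted": ["289.8", "287.7", "287.9"],
    }
)

# Assign by column label so each column is replaced by a float64 column.
value_cols = sample.columns[1:]
sample[value_cols] = sample[value_cols].astype("float")

print(sample.dtypes)
```

The same pattern should carry over to `df` by taking `df.columns[1:]` as the label list, assuming every value in those columns can be parsed as a number.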

In [9]:
df.head()

Unnamed: 0,Series Description,M1; Not seasonally adjusted,M2; Not seasonally adjusted,Currency; Not seasonally adjusted,Demand deposits; Not seasonally adjusted,Other liquid deposits - Total; Not seasonally adjusted,Small-denomination time deposits - Total; Not seasonally adjusted,Retail money market funds; Not seasonally adjusted,IRA and Keogh accounts at depository institutions; Not seasonally adjusted. Last 4 obs are estimates.,IRA and Keogh accounts at money market funds; Not seasonally adjusted. Last 4 obs are estimates.,...,Monetary base; total; not seasonally adjusted,"Reserves of depository institutions, total; not seasonally adjusted",Total borrowings from the Federal Reserve; not seasonally adjusted,"Reserves of depository institutions, nonborrowed; not seasonally adjusted",Other checkable deposits - Total; Not seasonally adjusted; *Discontinued after Apr 2020,Savings deposits - Total; Not seasonally adjusted; *Discontinued after Apr 2020,Travelers checks; Not seasonally adjusted; *Discontinued after Dec 2018,Other checkable deposits - Total; Seasonally adjusted; *Discontinued after Apr 2020,Savings deposits - Total; Seasonally adjusted; *Discontinued after Apr 2020,Travelers Checks; Seasonally adjusted; *Discontinued after Dec 2018
5,1959-01,142.2,289.8,28.4,113.4,,11.7,,0.0,0.0,...,50.5,18.9,551.8,18.3,0.0,136.0,0.3,0.0,136.0,0.3
6,1959-02,139.3,287.7,28.2,110.8,,11.7,,0.0,0.0,...,49.8,18.6,505.0,18.1,0.0,136.7,0.3,0.0,136.6,0.3
7,1959-03,138.4,287.9,28.3,109.8,,11.8,,0.0,0.0,...,49.7,18.4,599.5,17.8,0.0,137.7,0.3,0.0,137.6,0.3
8,1959-04,139.7,290.2,28.3,111.1,,12.0,,0.0,0.0,...,50.1,18.7,691.6,18.0,0.0,138.5,0.3,0.0,138.4,0.3
9,1959-05,138.7,290.2,28.5,109.8,,12.2,,0.0,0.0,...,50.1,18.6,741.5,17.8,0.0,139.3,0.3,0.0,139.5,0.3


In [13]:
df.shape

(775, 30)

# 2. Pandas

Since the data was imported in the previous section, we already have a `pd.DataFrame` to work with. Let's start by using `pandas` to compute descriptive statistics. Here, we use the `describe()` method to print a set of useful summary statistics.


## 2.1 Descriptive Statistics

In [15]:
print(df['M1; Not seasonally adjusted'].describe())

count       775
unique      745
top       144.5
freq          3
Name: M1; Not seasonally adjusted, dtype: object
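
Because the column has dtype `object`, `describe()` returns the summary used for non-numeric data (`count`, `unique`, `top`, `freq`) rather than numeric statistics. The sketch below, which assumes the values parse cleanly as numbers, converts a column with `pd.to_numeric` before summarising it. The five values are the M1 figures from the `df.head()` output; the names `m1`, `m1_numeric`, and `stats` are illustrative.

```python
import pandas as pd

# First five M1 observations from df.head(), stored as strings
# to mirror the object dtype reported by df.info().
m1 = pd.Series(
    ["142.2", "139.3", "138.4", "139.7", "138.7"],
    name="M1; Not seasonally adjusted",
    dtype="object",
)

# Convert to a numeric dtype before summarising.
m1_numeric = pd.to_numeric(m1)

# On a numeric Series, describe() reports count, mean, std, min,
# the 25%/50%/75% quantiles, and max.
stats = m1_numeric.describe()
print(stats)
```

Applied to the full column, the same conversion would be `pd.to_numeric(df['M1; Not seasonally adjusted']).describe()`.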
