# Codebook  
**Authors:** Lauren Baker   
This notebook documents DaanMatch's existing data files, recording each file's location, owner, "version", source, and related details.

In [1]:
import boto3
import numpy as np 
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
from collections import Counter
import statistics

In [2]:
client = boto3.client('s3')
resource = boto3.resource('s3')
my_bucket = resource.Bucket('daanmatchdatafiles')

# Expenditure_Gov_India_2017-18_2019-20.csv

## TOC:
* [About this dataset](#1)
* [What's in this dataset](#2)
* [Codebook](#3)
    * [Missing values](#3.1)
    * [Summary statistics](#3.2)
* [Columns](#4)
    * [Sl. No.](#4.1)
    * [Category](#4.2)
    * [Sub Head](#4.3)
    * [2017-2018 - Actuals](#4.4)
    * [2018-2019 - Budget Estimates](#4.5)
    * [2018-2019 - Revised Estimates](#4.6)
    * [2019-2020 - Budget Estimates](#4.7)

**About this dataset**  <a class="anchor" id="1"></a>  
Data provided by: Unknown.  
Source: https://daanmatchdatafiles.s3.us-west-1.amazonaws.com/Expenditure_Gov_India_2017-18_2019-20.csv  
Type: csv  
Last Modified: June 14, 2021, 21:47:22 (UTC-07:00)  
Size: 786.0 B
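
The size and modification time listed above are the kind of values S3 reports for an object. As a minimal sketch (not necessarily how these values were obtained), the helper below formats the `ContentLength` and `LastModified` fields of a boto3 `head_object` response. The function name `describe_s3_object` and the stand-in dictionary are illustrative assumptions; with credentials available, the dictionary would instead come from `client.head_object(Bucket=..., Key=...)`.

```python
from datetime import datetime, timezone


def describe_s3_object(head: dict) -> dict:
    """Format the size and modification time from a head_object-style response."""
    return {
        # Normalize to UTC so the timestamp does not depend on the viewer's time zone.
        "Last Modified (UTC)": head["LastModified"].astimezone(timezone.utc).isoformat(),
        "Size": f"{head['ContentLength']:.1f} B",
    }


# Hypothetical stand-in for a real head_object response, mirroring the values
# listed above (June 14, 2021, 21:47:22 UTC-07:00 is 04:47:22 UTC on June 15).
sample_head = {
    "ContentLength": 786,
    "LastModified": datetime(2021, 6, 15, 4, 47, 22, tzinfo=timezone.utc),
}

summary = describe_s3_object(sample_head)
print(summary)
```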

In [3]:
path = "s3://daanmatchdatafiles/Expenditure_Gov_India_2017-18_2019-20.csv"
expenditure = pd.read_csv(path)
expenditure

| | Sl. No. | Category | Sub Head | 2017-2018 - Actuals | 2018-2019 - Budget Estimates | 2018-2019 - Revised Estimates | 2019-2020 - Budget Estimates |
|---|---|---|---|---:|---:|---:|---:|
| 0 | I | A. Centre's Expenditure | Establishment Expenditure | 473031 | 508400 | 517025 | 541345 |
| 1 | II | A. Centre's Expenditure | Central Sector Schemes/Projects | 587785 | 708934 | 736796 | 860180 |
| 2 | III | A. Centre's Expenditure | Other Central Sector Expenditure | 622900 | 678017 | 695609 | 777996 |
| 3 | III | A. Centre's Expenditure | Interest Payments out of Other Central Sector ... | 528952 | 575795 | 587570 | 665061 |
| 4 | IV | B. Transfers | Centrally Sponsored Schemes | 285448 | 305517 | 304849 | 327679 |
| 5 | V | B. Transfers | Finance Commission Grants | 92244 | 109374 | 106129 | 131902 |
| 6 | VI | B. Transfers | Other Grants/Loans/Transfers | 80567 | 131973 | 96827 | 145097 |
| 7 | Total | Grand Total | Grand Total | 2141975 | 2442213 | 2457235 | 2784200 |
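
A quick sanity check on the table above is to compare the Grand Total row with the sum of the other rows. The sketch below re-enters the printed values by hand so that it runs without S3 access. It assumes that the "Interest Payments ..." row is a component of Other Central Sector Expenditure (both carry Sl. No. III) and should therefore be left out of the sum; that reading is an inference from the table, not something the source states.

```python
import pandas as pd

year_cols = [
    "2017-2018 - Actuals",
    "2018-2019 - Budget Estimates",
    "2018-2019 - Revised Estimates",
    "2019-2020 - Budget Estimates",
]

# Values copied by hand from the table above (Category column omitted for brevity).
rows = [
    ("I", "Establishment Expenditure", 473031, 508400, 517025, 541345),
    ("II", "Central Sector Schemes/Projects", 587785, 708934, 736796, 860180),
    ("III", "Other Central Sector Expenditure", 622900, 678017, 695609, 777996),
    ("III", "Interest Payments out of Other Central Sector ...", 528952, 575795, 587570, 665061),
    ("IV", "Centrally Sponsored Schemes", 285448, 305517, 304849, 327679),
    ("V", "Finance Commission Grants", 92244, 109374, 106129, 131902),
    ("VI", "Other Grants/Loans/Transfers", 80567, 131973, 96827, 145097),
    ("Total", "Grand Total", 2141975, 2442213, 2457235, 2784200),
]
df = pd.DataFrame(rows, columns=["Sl. No.", "Sub Head"] + year_cols)

is_total = df["Sl. No."] == "Total"
# Assumption: the interest row is already counted inside the row above it.
is_interest = df["Sub Head"].str.startswith("Interest Payments")

component_sum = df.loc[~is_total & ~is_interest, year_cols].sum()
reported_total = df.loc[is_total, year_cols].iloc[0]

# Positive values mean the components add up to more than the reported total.
difference = component_sum - reported_total
print(difference)
```

Under that assumption the differences are 0, 2, 0 and -1 across the four columns, which is consistent with rounding in the source figures rather than a missing row.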


**What's in this dataset?** <a class="anchor" id="2"></a>

In [6]:
print("Shape:", expenditure.shape)
print("Rows:", expenditure.shape[0])
print("Columns:", expenditure.shape[1])
print("Each row is an expenditure sub-head; the last row is the grand total.")

Shape: (8, 7)
Rows: 8
Columns: 7
Each row is an expenditure sub-head; the last row is the grand total.


**Codebook** <a class="anchor" id="3"></a>

In [10]:
# Column names and dtypes are read from the DataFrame; the descriptions are
# written by hand and must stay in the same order as the columns.
expenditure_columns = list(expenditure.columns)
expenditure_description = ["Serial number.", 
                           "Expenditure category.", 
                           "Sub-head of expenditure within the category.", 
                           "Actual expenditure in 2017-2018.", 
                           "Budget estimate of expenditure for 2018-2019.", 
                           "Revised estimate of expenditure for 2018-2019.", 
                           "Budget estimate of expenditure for 2019-2020."]
expenditure_dtypes = list(expenditure.dtypes)

data = {"Column Name": expenditure_columns, "Description": expenditure_description, "Type": expenditure_dtypes}
expenditure_codebook = pd.DataFrame(data)
expenditure_codebook.style.set_properties(subset=['Description'], **{'width': '600px'})

| | Column Name | Description | Type |
|---|---|---|---|
| 0 | Sl. No. | Serial number. | object |
| 1 | Category | Expenditure category. | object |
| 2 | Sub Head | Sub-head of expenditure within the category. | object |
| 3 | 2017-2018 - Actuals | Actual expenditure in 2017-2018. | int64 |
| 4 | 2018-2019 - Budget Estimates | Budget estimate of expenditure for 2018-2019. | int64 |
| 5 | 2018-2019 - Revised Estimates | Revised estimate of expenditure for 2018-2019. | int64 |
| 6 | 2019-2020 - Budget Estimates | Budget estimate of expenditure for 2019-2020. | int64 |
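
The codebook above pairs column names with descriptions purely by list position, so a reordered or added column would silently shift every description. One possible alternative, shown here only as a sketch, keys the descriptions by column name and fails loudly when a column has none. The `descriptions` dictionary (with paraphrased wording) and the two-row `sample` frame are illustrative stand-ins for the notebook's `expenditure` DataFrame.

```python
import pandas as pd

# Descriptions keyed by column name instead of by position.
descriptions = {
    "Sl. No.": "Serial number.",
    "Category": "Expenditure category.",
    "Sub Head": "Sub-head of expenditure within the category.",
    "2017-2018 - Actuals": "Actual expenditure in 2017-2018.",
    "2018-2019 - Budget Estimates": "Budget estimate of expenditure for 2018-2019.",
    "2018-2019 - Revised Estimates": "Revised estimate of expenditure for 2018-2019.",
    "2019-2020 - Budget Estimates": "Budget estimate of expenditure for 2019-2020.",
}

# Two rows copied from the dataset, standing in for the full DataFrame.
sample = pd.DataFrame({
    "Sl. No.": ["I", "II"],
    "Category": ["A. Centre's Expenditure", "A. Centre's Expenditure"],
    "Sub Head": ["Establishment Expenditure", "Central Sector Schemes/Projects"],
    "2017-2018 - Actuals": [473031, 587785],
    "2018-2019 - Budget Estimates": [508400, 708934],
    "2018-2019 - Revised Estimates": [517025, 736796],
    "2019-2020 - Budget Estimates": [541345, 860180],
})

# Fail early if any column lacks a description.
undocumented = [col for col in sample.columns if col not in descriptions]
if undocumented:
    raise KeyError(f"No description for columns: {undocumented}")

codebook = pd.DataFrame({
    "Column Name": list(sample.columns),
    "Description": [descriptions[col] for col in sample.columns],
    "Type": [str(dtype) for dtype in sample.dtypes],
})
print(codebook)
```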


**Missing values** <a class="anchor" id="3.1"></a>

In [11]:
expenditure.isnull().sum()

Sl. No.                          0
Category                         0
Sub Head                         0
2017-2018 - Actuals              0
2018-2019 - Budget Estimates     0
2018-2019 - Revised Estimates    0
2019-2020 - Budget Estimates     0
dtype: int64

**Summary statistics** <a class="anchor" id="3.2"></a>

In [12]:
expenditure.describe()

| | 2017-2018 - Actuals | 2018-2019 - Budget Estimates | 2018-2019 - Revised Estimates | 2019-2020 - Budget Estimates |
|---|---:|---:|---:|---:|
| count | 8.0 | 8.0 | 8.0 | 8.0 |
| mean | 601612.8 | 682527.9 | 687755.0 | 779182.5 |
| std | 657492.9 | 747632.0 | 756572.5 | 855331.1 |
| min | 80567.0 | 109374.0 | 96827.0 | 131902.0 |
| 25% | 237147.0 | 262131.0 | 255169.0 | 282033.5 |
| 50% | 500991.5 | 542097.5 | 552297.5 | 603203.0 |
| 75% | 596563.8 | 685746.2 | 705905.8 | 798542.0 |
| max | 2141975.0 | 2442213.0 | 2457235.0 | 2784200.0 |
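
Because `describe()` above runs over all eight rows, the Grand Total row is included in every statistic and pulls the mean, standard deviation and maximum upward. The following is a minimal sketch of the same summary with that row dropped. The values are re-entered by hand from the dataset so the example runs without S3 access, and the variable names `line_items` and `stats` are illustrative.

```python
import pandas as pd

year_cols = [
    "2017-2018 - Actuals",
    "2018-2019 - Budget Estimates",
    "2018-2019 - Revised Estimates",
    "2019-2020 - Budget Estimates",
]

# Sl. No. plus the four numeric columns, copied from the dataset.
rows = [
    ("I", 473031, 508400, 517025, 541345),
    ("II", 587785, 708934, 736796, 860180),
    ("III", 622900, 678017, 695609, 777996),
    ("III", 528952, 575795, 587570, 665061),
    ("IV", 285448, 305517, 304849, 327679),
    ("V", 92244, 109374, 106129, 131902),
    ("VI", 80567, 131973, 96827, 145097),
    ("Total", 2141975, 2442213, 2457235, 2784200),
]
df = pd.DataFrame(rows, columns=["Sl. No."] + year_cols)

# Keep only the individual line items; the Grand Total is an aggregate of them.
line_items = df[df["Sl. No."] != "Total"]
stats = line_items[year_cols].describe()
print(stats)
```

With the total removed, the count drops to 7 and the maximum of each column becomes the largest individual line item. The second Sl. No. III row (interest payments) could be filtered out in the same way if it is treated as part of the row above it.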
