# Codebook  
**Authors:** Patrick Guo  
Documents DaanMatch's existing data files, recording each file's location, owner, "version", source, and related details.

In [1]:
import boto3
import numpy as np 
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
from collections import Counter
import statistics

In [2]:
client = boto3.client('s3')
resource = boto3.resource('s3')
my_bucket = resource.Bucket('my-bucket')

# 42621 Final_Data_ngodarpan.gov.in

## TOC:
* [About this dataset](#1)
* [What's in this dataset](#2)
* [Codebook](#3)
    * [Missing values](#3.1)
    * [Summary statistics](#3.2)
* [Columns](#4)
    * [Name](#4.4)
    * [ngo url](#4.5)
    * [Mobile](#4.6)
    * [UniqueID](#4.7)
    * [Off phone1](#4.8)
    * [Email](#4.9)
    * [Major Activities1](#4.10)
    * [operational states db](#4.11)
    * [issues working db](#4.12)
    * [operational district db](#4.13)
    * [reg name](#4.14)
    * [fcrano](#4.15)
    * [nr regNo](#4.16)
    * [nr add](#4.17)
    * [nr orgName](#4.18)
    * [ngo reg date](#4.19)
    * [nr actName](#4.20)
    * [nr city](#4.21)
    * [TypeDescription](#4.22)
    * [StateName](#4.23)
    * [status](#4.24)
    * [president name](#4.25)
    * [president email](#4.26)
    * [president mobile](#4.27)
    * [Chairman name](#4.28)
    * [Chairman email](#4.29)
    * [Chairman mobile](#4.30)
    * [Secretary name](#4.31)
    * [Secretary email](#4.32)
    * [Secretary mobile](#4.33)
    * [Asisstant Secretary name](#4.34)
    * [Asisstant Secretary email](#4.35)
    * [Asisstant Secretary mobile](#4.36)
    * [Board Member name](#4.37)
    * [Board Member email](#4.38)
    * [Board Member mobile](#4.39)
    * [Vice Chairman name](#4.40)
    * [Vice Chairman email](#4.41)
    * [Vice Chairman mobile](#4.42)
    * [Member name](#4.43)
    * [Member email](#4.44)
    * [Member mobile](#4.45)

In [41]:
# Prints one TOC entry per column as an in-page anchor link, numbered from 4.4 onward
def toc_maker(dataset):
    for counter, column in enumerate(dataset.columns, start=4):
        print(f"* [{column}](#4.{counter})")

In [40]:
#toc_maker(Final_Data_ngodarpan)

**About this dataset**  <a class="anchor" id="1"></a>  
Data provided by: NGO Darpan  
Source: ngodarpan.gov.in   
Type: xlsx  
Last Modified: June 1, 2021, 17:06:30 (UTC-07:00)  
Size: 49.7 MB

In [7]:
path = "s3://daanmatchdatafiles/Darpan21FCRA/42621 Final_Data_ngodarpan.gov.in.xlsx"
xl = pd.ExcelFile(path)
print(xl.sheet_names)
Final_Data_ngodarpan = xl.parse('ngodarpan.gov.in')
Final_Data_ngodarpan.head()

['ngodarpan.gov.in']


| | Name | ngo url | Mobile | UniqueID | Off phone1 | Email | Major Activities1 | operational states db | issues working db | operational district db | ... | Asisstant Secretary mobile | Board Member name | Board Member email | Board Member mobile | Vice Chairman name | Vice Chairman email | Vice Chairman mobile | Member name | Member email | Member mobile |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PRAYAS | | 9778080000.0 | OR/2009/0010000 | 06858-223440 | director_prayas@yahoo.com | 1.63 Nos. of SHGs formed | ORISSA, | Agriculture,Children,Civic Issues,Disaster Man... | ORISSA->Nabarangapur , | ... | | | | | | | | | | |
| 1 | PONDICHERRYWOMENSCONFERENCE | | 9443253000.0 | PY/2016/0100001 | 0413-2213238 | surebe33@gmail.com | Working for Women and Children Obtaining Loan ... | PUDUCHERRY, | Women's Development & Empowerment,Children, | PUDUCHERRY->Puducherry, | ... | | | | | | | | | | |
| 2 | SHABRI SAMAJ SEWA SAMITI | http://ssssamitibhind.org | 7828394000.0 | MP/2016/0100003 | 0751-1234689 | ssssamitibhind@gmail.com | more than one thousand leadership development ... | MADHYA PRADESH, | Animal Husbandry, Dairying & Fisheries,Agricul... | MADHYA PRADESH->Anuppur, Ashoknagar, Balaghat,... | ... | | | | | | | | ALOK | ssssamitibhind@gmail.com | 7828498000.0 |
| 3 | ANAND GANGA SAMAJIK SIKSHA SAMITI | | 9450678000.0 | UP/2016/0100004 | 05566-281059 | lovelyraivijendra@gmail.com | OUR ORGANISATION HAVE PLANTED MORE THAN 2 LAKH... | UTTAR PRADESH, | Agriculture,Environment & Forests,Health & Fam... | UTTAR PRADESH->Deoria, Gorakhpur, Sant Kabir N... | ... | | | | | | | | | | |
| 4 | Himaliyan Gram Vikas Samiti | | 9412037000.0 | UA/2016/0100009 | 05964-213271 | hgvs1990@gmail.com | Facilitated formation and strengthening of 65C... | UTTARAKHAND, | Animal Husbandry, Dairying & Fisheries,Agricul... | UTTARAKHAND->Almora , Bageshwar, Champawat, Pi... | ... | | Krishna Nand | hgvsgan@yahoo.co.in | 7500720000.0 | Leela Dhar Joshi | hgvs.jleeladhar.lj@gmail.com | 8057816000.0 | | | |
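
The `Mobile` values in the preview above display as floats such as `9778080000.0`, which is how pandas stores a numeric column that also contains missing values. A minimal sketch, assuming phone numbers are better kept as text; the toy frame and the helper name `phone_to_string` are hypothetical and not part of the original notebook:

```python
import numpy as np
import pandas as pd


def phone_to_string(series: pd.Series) -> pd.Series:
    """Convert a float-typed phone column to nullable strings without a trailing '.0'."""
    # "Int64" is pandas' nullable integer dtype, so NaN becomes <NA> instead of raising.
    return series.astype("Int64").astype("string")


# Toy data shaped like the Mobile column; not read from the real file.
toy = pd.DataFrame({"Mobile": [9778080000.0, np.nan, 7828394000.0]})
toy["Mobile"] = phone_to_string(toy["Mobile"])
print(toy)
```

The same conversion could be applied to the other `... mobile` columns if they turn out to be float-typed as well.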


**What's in this dataset?** <a class="anchor" id="2"></a>

In [8]:
dataset = Final_Data_ngodarpan
print("Shape:", dataset.shape)
print("Rows:", dataset.shape[0])
print("Columns:", dataset.shape[1])
print("Each row is an NGO.")

Shape: (111929, 42)
Rows: 111929
Columns: 42
Each row is an NGO.
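
One way to sanity-check the one-row-per-NGO reading is to test whether `UniqueID` repeats. The sketch below runs on a toy frame in which the third row is a deliberately made-up duplicate; it assumes `UniqueID` is the intended key, which the notebook does not state explicitly.

```python
import pandas as pd

# Toy data; the repeated third row is invented to show what a duplicate looks like.
toy = pd.DataFrame({
    "Name": ["PRAYAS", "PONDICHERRYWOMENSCONFERENCE", "PRAYAS"],
    "UniqueID": ["OR/2009/0010000", "PY/2016/0100001", "OR/2009/0010000"],
})

# True only when every UniqueID occurs exactly once.
is_one_row_per_ngo = toy["UniqueID"].is_unique

# keep=False marks every member of a duplicated group, not just the later copies.
duplicates = toy[toy["UniqueID"].duplicated(keep=False)]

print("One row per NGO:", is_one_row_per_ngo)
print(duplicates)
```

On the real data, the same two lines could be run against `Final_Data_ngodarpan["UniqueID"]`.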


**Codebook** <a class="anchor" id="3"></a>

In [28]:
dataset_columns = [column for column in dataset.columns]
dataset_desc = ["Name of NGO",
               "Url for NGO",
               "Mobile phone",
               "Unique ID of VO/NGO",
               "Telephone/Alternate number",
               "Email address",
               "Description of major activities",
               "List of states they operate in",
               "List of issues they are working on",
               "List of districts they operate in",
               "Name of registrar",
               "FCRA number",
               "Registration number",
               "Address",
               "Name of NGO",
               "Registration date",
               "Name of Act",
               ]
# Pad with "N/A" so that every column without a description above still gets a codebook row
dataset_desc = dataset_desc + ["N/A"] * (len(dataset_columns) - len(dataset_desc))
dataset_dtypes = list(dataset.dtypes)

data = {"Column Name": dataset_columns, "Description": dataset_desc, "Type": dataset_dtypes}
dataset_codebook = pd.DataFrame(data)
dataset_codebook

| | Column Name | Description | Type |
|---|---|---|---|
| 0 | Name | Name of NGO | object |
| 1 | ngo url | Url for NGO | object |
| 2 | Mobile | Mobile phone | float64 |
| 3 | UniqueID | Unique ID of VO/NGO | object |
| 4 | Off phone1 | Telephone/Alternate number | object |
| 5 | Email | Email address | object |
| 6 | Major Activities1 | Description of major activities | object |
| 7 | operational states db | List of states they operate in | object |
| 8 | issues working db | List of issues they are working on | object |
| 9 | operational district db | List of districts they operate in | object |
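
The codebook cell pairs descriptions with columns by position, so a reordered or inserted column would silently shift every description. A hedged alternative is to key descriptions by column name. In the sketch below, `build_codebook` and the toy frame are hypothetical, and only two of the notebook's descriptions are reproduced:

```python
import pandas as pd


def build_codebook(df: pd.DataFrame, descriptions: dict) -> pd.DataFrame:
    """Build a codebook table, looking up each description by column name."""
    return pd.DataFrame({
        "Column Name": list(df.columns),
        # Columns without an entry fall back to "N/A", as in the notebook.
        "Description": [descriptions.get(col, "N/A") for col in df.columns],
        "Type": [str(dtype) for dtype in df.dtypes],
    })


# Toy frame with three of the dataset's column names; values are placeholders.
toy = pd.DataFrame({"Name": ["PRAYAS"], "Mobile": [9778080000.0], "status": [None]})
descriptions = {"Name": "Name of NGO", "Mobile": "Mobile phone"}

codebook = build_codebook(toy, descriptions)
print(codebook)
```

With this layout, a missing description shows up as "N/A" on the right row instead of shifting the rows below it.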


**Missing values** <a class="anchor" id="3.1"></a>

In [10]:
Final_Data_ngodarpan.isnull().sum()

Name                               0
ngo url                        86142
Mobile                            32
UniqueID                           0
Off phone1                     95402
Email                              0
Major Activities1              27311
operational states db          23039
issues working db              22637
operational district db        23039
reg name                           0
fcrano                         89869
nr regNo                           3
nr add                             0
nr orgName                         0
ngo reg date                       0
nr actName                      1316
nr city                          214
TypeDescription                    0
StateName                          0
status                        111929
president name                 52520
president email                52520
president mobile               52520
Chairman name                  82126
Chairman email                 82132
Chairman mobile                82137
...
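
Raw counts are easier to interpret next to the row total: `status`, for example, is missing in 111929 of 111929 rows, so the column is entirely empty. A minimal sketch on toy data (the values are made up, and the variable names `summary` and `empty_columns` are illustrative) that adds a percentage and lists fully empty columns:

```python
import numpy as np
import pandas as pd

# Toy frame reusing three column names from the dataset; values are placeholders.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "ngo url": [None, "http://example.org", None, None],
    "status": [np.nan, np.nan, np.nan, np.nan],
})

missing = toy.isnull().sum()
summary = pd.DataFrame({
    "missing": missing,
    "percent": (100 * missing / len(toy)).round(1),
}).sort_values("missing", ascending=False)

# Columns with no values at all are candidates for dropping.
empty_columns = summary.index[summary["missing"] == len(toy)].tolist()

print(summary)
print("Entirely empty:", empty_columns)
```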

**Summary statistics** <a class="anchor" id="3.2"></a>

None. All features are qualitative; numeric-typed columns such as `Mobile` (float64) hold phone numbers rather than quantities.
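
Qualitative columns can still be profiled with counts instead of means. The sketch below uses a toy `StateName` column with made-up values to show two standard pandas calls that might be applied to any of the text columns:

```python
import pandas as pd

# Toy values; the real StateName contents are not shown in this notebook.
toy = pd.DataFrame({"StateName": ["ORISSA", "PUDUCHERRY", "ORISSA"]})

# Frequency of each distinct value, most common first.
counts = toy["StateName"].value_counts()

# For a text column, describe() reports count, unique, top and freq.
profile = toy["StateName"].describe()

print(counts)
print(profile)
```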