# Codebook  
**Authors:** Patrick Guo  
This codebook documents DaanMatch's existing data files, recording each file's location, owner, "version", source, and related details.

In [1]:
import boto3
import pandas as pd
import numpy as np 
from collections import Counter
import matplotlib.pyplot as plt
import statistics
import math

In [2]:
client = boto3.client('s3')
resource = boto3.resource('s3')
my_bucket = resource.Bucket('my-bucket')

# RAWCosolidated NGO list.xlsx

## TOC:
* [About this dataset](#1)
* [What's in this dataset](#2)
* [Codebook](#3)
    * [Missing values](#3.1)
    * [Summary statistics](#3.2)
* [Columns](#4)
    * [S. No.](#4.1)
    * [District](#4.2)
    * [Mandal](#4.3)
    * [Panchayat](#4.4)
    * [Gram Panchayat Special officer Name](#4.5)
    * [Mobile Number](#4.6)
    * [Address for Communication](#4.7)

**About this dataset**  <a class="anchor" id="1"></a>  
Data provided by: GivingTuesday, Globalgiving.org, GuideStar, IndiaNGOlist, NGOimpact.com, NGODarpan.gov.in  
Source:   
Type: xlsx  
Last modified: May 30, 2021, 21:04:43 (UTC-07:00)  
Size: 96.7 MB
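
The "Type", "Last modified" and "Size" fields above can, in principle, be derived from S3 object metadata rather than typed by hand. The sketch below is one possible way to do that, not necessarily how the values above were obtained. It formats a dictionary shaped like a boto3 `head_object` response, which carries `LastModified` and `ContentLength`. The helper name `summarize_head`, the decimal-megabyte convention, the UTC formatting, and the stand-in values are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def summarize_head(head: dict, key: str) -> dict:
    """Format fields of an S3 head_object-style response for the codebook (hypothetical helper)."""
    modified_utc = head["LastModified"].astimezone(timezone.utc)
    return {
        "Type": key.rsplit(".", 1)[-1],  # file extension taken from the object key
        "Last modified": modified_utc.strftime("%B %d, %Y, %H:%M:%S (UTC)"),
        # Assumption: size reported in decimal megabytes (1 MB = 1,000,000 bytes)
        "Size": f"{head['ContentLength'] / 1_000_000:.1f} MB",
    }


# Stand-in response so the sketch runs without AWS credentials; values are illustrative only.
fake_head = {
    "LastModified": datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "ContentLength": 1_500_000,
}
info = summarize_head(fake_head, "example.xlsx")
print(info)

# With credentials, the real response would come from something like:
# head = client.head_object(Bucket="daanmatchdatafiles", Key="RAWCosolidated NGO list.xlsx")
```

Keeping the formatting separate from the S3 call means the helper can be checked offline and reused for every file in the codebook.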

In [13]:
path = "s3://daanmatchdatafiles/RAWCosolidated NGO list.xlsx"
xl = pd.ExcelFile(path)
print(xl.sheet_names)

['Sheet1']


In [14]:
RAWCosolidated_NGO_list = xl.parse("Sheet1")
RAWCosolidated_NGO_list.head()

| | Link | Ngo Name | Year of Establishment | GuideStar URL | Full Time Staff | Full Time Volunteers | Collected | Target | Donations | Description | ... | Asisstant Secretary mobile | Board Member name | Board Member email | Board Member mobile | Vice Chairman name | Vice Chairman email | Vice Chairman mobile | Member name | Member email | Member mobile |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | http://www.ngoimpact.com/ngos/turning-point-fo... | Turning Point Foundation Best Non-Government O... | | | | | | | | TPF is a not for profit, national level volunt... | ... | | | | | | | | | | |
| 1 | http://www.ngoimpact.com/ngos/basic-research-e... | Basic Research Education And Development Socie... | | | | | | | | Basic Research Education And Development Socie... | ... | | | | | | | | | | |
| 2 | http://www.ngoimpact.com/ngos/divya-jyothi-cha... | Divya Jyothi Charitable Trust Best Non-Governm... | | | | | | | | “ Divya Jyothi Charitable Trust” for the blind... | ... | | | | | | | | | | |
| 3 | http://www.ngoimpact.com/ngos/atma-foundation#... | ATMA Foundation Best Non-Government Organizati... | | | | | | | | ATMA Foundation is an NGO committed to empower... | ... | | | | | | | | | | |
| 4 | http://www.ngoimpact.com/ngos/calcutta-rescue#... | Calcutta Rescue Best Non-Government Organizati... | | | | | | | | Calcutta Rescue (CR) is a non – governmental o... | ... | | | | | | | | | | |


**What's in this dataset?** <a class="anchor" id="2"></a>

In [15]:
dataset = RAWCosolidated_NGO_list
print("Shape:", dataset.shape)
print("Rows:", dataset.shape[0])
print("Columns:", dataset.shape[1])
print("Each row is an NGO.")

Shape: (140577, 101)
Rows: 140577
Columns: 101
Each row is an NGO.


**Missing values** <a class="anchor" id="3.1"></a>

In [40]:
RAWCosolidated_NGO_list.isnull().sum()

Link                      82221
Ngo Name                      3
Year of Establishment    139651
GuideStar URL            139652
Full Time Staff          139804
                          ...  
Vice Chairman email      136873
Vice Chairman mobile     136872
Member name               69797
Member email             118620
Member mobile            118616
Length: 101, dtype: int64
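
With 140577 rows, raw counts are hard to compare at a glance. For example, 139651 missing values in `Year of Establishment` means roughly 99.3% of rows have no year. A minimal sketch of a per-column report with both counts and percentages follows. The helper name `missing_report` and the toy frame are assumptions for illustration; on the real data the call would be `missing_report(RAWCosolidated_NGO_list)`.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first (hypothetical helper)."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return report.sort_values("missing", ascending=False)


# Toy frame standing in for the real dataset; the values are made up.
toy = pd.DataFrame({
    "Ngo Name": ["A", "B", None, "D"],
    "Link": [None, None, None, "http://example.org"],
    "Member name": ["x", "y", "z", "w"],
})
report = missing_report(toy)
print(report)
```

Sorting by the missing count puts the sparsest columns at the top, which makes it easier to decide which of the 101 columns are usable.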

**Summary statistics** <a class="anchor" id="3.2"></a>

In [35]:
summary = RAWCosolidated_NGO_list.describe()
# Drop numeric columns whose summary statistics are not meaningful here
# (mostly phone numbers, which pandas parses as numbers).
summary = summary.drop(columns=[
    "Mobile",
    "Governance photos",
    "status",
    "president mobile",
    "Chairman mobile",
    "Secretary mobile",
    "Asisstant Secretary mobile",
    "Board Member mobile",
    "Vice Chairman mobile",
    "Member mobile",
])
summary.transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Year of Establishment | 926.0 | 2464.768 | 14121.82 | 1922.0 | 1994.0 | 2002.0 | 2010.0 | 431731.0 |
| Full Time Volunteers | 664.0 | 159.3479 | 1289.265 | 0.0 | 3.0 | 8.0 | 25.0 | 24450.0 |
| Donations | 694.0 | 218.4251 | 631.6098 | 0.0 | 8.0 | 59.0 | 193.75 | 9516.0 |
| Number of consultants | 72.0 | 6.083333 | 3.672317 | 1.0 | 4.0 | 5.0 | 8.0 | 18.0 |
| Number of board members | 440.0 | 8.363636 | 3.613383 | 3.0 | 5.0 | 8.0 | 12.0 | 22.0 |
| Number of meetings done | 73.0 | 9.178082 | 25.76245 | 3.0 | 4.0 | 5.0 | 6.0 | 222.0 |
| Project Completed | 537.0 | 3.627561 | 18.94018 | 0.0 | 0.0 | 0.0 | 4.0 | 366.0 |
| Funding requirement | 537.0 | 4139238000.0 | 95895920000.0 | 0.0 | 0.0 | 0.0 | 450000.0 | 2222222000000.0 |
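
The `Year of Establishment` row shows a maximum of 431731.0, which cannot be a calendar year and pulls the mean (2464.768) well above the median (2002.0). One possible way to handle this before summarizing is sketched below: flag the implausible years and blank them out. The helper name `flag_implausible_years` and the bounds 1800 and 2021 are assumptions chosen for illustration (2021 being the year the file was last modified). The toy series reuses the minimum, median and maximum from the table above.

```python
import pandas as pd


def flag_implausible_years(years: pd.Series, low: int = 1800, high: int = 2021) -> pd.Series:
    """Boolean mask: True where a non-missing year falls outside [low, high] (hypothetical helper)."""
    return years.notna() & ~years.between(low, high)


# Toy series standing in for dataset["Year of Establishment"].
toy_years = pd.Series([1922.0, 2002.0, 431731.0, None])
mask = flag_implausible_years(toy_years)

# Replace implausible years with NaN so they no longer distort mean and std.
cleaned = toy_years.mask(mask)
print(cleaned.describe())
```

Blanking the values rather than dropping the rows keeps every NGO in the dataset while removing the outlier's effect on the statistics.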
