# How are organisations identified in 360Giving's GrantNav? 

**Author**: Edafe Onerhime

**Last Updated**: 2017-07-11

**Description**: Analyses the organisation identifiers used in [GrantNav](http://grantnav.threesixtygiving.org), based on the prefixes in the **Recipient Org:Identifier** field.

**Contents**: This Jupyter Notebook, data folder

## About

This is an experiment to understand how funding organisations talk about the recipients of their grants by analysing the identifiers they provide for them. 


**Why identifiers matter**

Identifiers are an important part of any dataset. They let a computer uniquely identify and refer to specific grants, organisations, transactions and so on.

Whilst a human being may be good at recognising that:

>INDIGO TRUST, The Indigo Trust, and indigo-trust

... all refer to the same organisation, computers find this a lot trickier.

Identifiers can be valuable for joining up pools of information and improving the quality of the information we hold.
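
As a minimal sketch of this point, the snippet below compares the three spellings above by name and then by identifier. The identifier `GB-CHC-0000000` is a made-up placeholder used only for illustration; it is not the Indigo Trust's real charity number.

```python
# Three spellings of the same organisation, as they might appear in grant data.
names = ["INDIGO TRUST", "The Indigo Trust", "indigo-trust"]

# Comparing raw names treats every spelling as a different organisation.
distinct_names = set(names)

# Hypothetical records: the identifier is a placeholder, not a real charity number.
records = [{"name": n, "identifier": "GB-CHC-0000000"} for n in names]

# Comparing identifiers collapses the three records into one organisation.
distinct_ids = {r["identifier"] for r in records}

print(len(distinct_names), "distinct names;", len(distinct_ids), "distinct identifier")
```

With a shared identifier, the three records can be joined without any fuzzy matching on the name.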

**What will this analysis tell us?**

The analysis shows which identifiers publishers prefer, whether we should intervene, and where to target our efforts.

For example, do publishers prefer their own internal reference? Can we understand why? Do publishers prefer charity numbers over company numbers? How can we support them while making the information more widely useful?

## Instructions


## Things to note


## Things to do



In [38]:
import os.path
import pandas as pd
from urllib.request import urlretrieve

In [39]:
fileDetails = {'grantnav.csv': 'http://grantnav.threesixtygiving.org/search.csv', 
               'org-id.csv': 'http://org-id.guide/download.csv',
               'country-codes.csv': 'https://raw.githubusercontent.com/datasets/country-codes/master/data/country-codes.csv'}

excludeFunders = ['360G-blf'] # Excludes any Funding Org:Identifier listed
excludeText = ''
if len(excludeFunders) > 0:
    excludeText = '_excludes_'+'-'.join(excludeFunders)
    
outputFileName = 'grantnav_recipient_prefix' + excludeText + '.csv'
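
The naming logic in the cell above can be sketched as a small function, which makes it easy to see what file name a given exclusion list produces. `build_output_name` is a hypothetical helper introduced here for illustration; it is not part of the notebook.

```python
def build_output_name(exclude_funders, base="grantnav_recipient_prefix"):
    """Return the CSV name, tagged with any excluded funder identifiers."""
    # An empty list yields no suffix, mirroring the len(...) > 0 check above.
    suffix = "_excludes_" + "-".join(exclude_funders) if exclude_funders else ""
    return base + suffix + ".csv"


print(build_output_name(["360G-blf"]))  # grantnav_recipient_prefix_excludes_360G-blf.csv
print(build_output_name([]))            # grantnav_recipient_prefix.csv
```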

In [40]:
# Download the GrantNav, org-id and country-code files unless a local copy already exists
for fname, url in fileDetails.items():
    print('Checking for:',fname)
    if not os.path.isfile(fname):
        print('Downloading:',fname)
        urlretrieve(url, fname)
        

Checking for: country-codes.csv
Checking for: org-id.csv
Checking for: grantnav.csv


In [41]:
# =======================
# Process GrantNav data
# =======================

# Load files
dfGN = pd.read_csv('grantnav.csv',usecols=['Recipient Org:Identifier','Funding Org:Identifier'])

# Remove excluded funders (copy so the later assignment does not act on a view)
if len(excludeFunders) > 0:
    dfGN = dfGN[~dfGN['Funding Org:Identifier'].isin(excludeFunders)].copy()

# Split column
# (Replace any identifier containing 360G with '360G' to group all publisher-own prefixes)
dfGN.loc[dfGN['Recipient Org:Identifier'].str.contains('360G', na=False), 'Recipient Org:Identifier'] = '360G'
# Keep the first two hyphen-separated parts (e.g. GB-CHC); the rest of the identifier is discarded
parts = dfGN['Recipient Org:Identifier'].str.split('-', n=2, expand=True).reindex(columns=[0, 1]).fillna('')
dfGN = pd.DataFrame({'code': (parts[0] + '-' + parts[1]).str.strip('-')})
# Count the number of grants per prefix
dfGN = dfGN.groupby('code').size().reset_index(name='count')
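
To make the prefix extraction concrete, here is a small worked example on made-up identifiers. It is a sketch of the same idea as the cell above, not the notebook's exact code: `prefix_counts` is a hypothetical helper, and none of the sample identifiers are real.

```python
import pandas as pd

# Made-up identifiers for illustration only.
sample = pd.Series([
    "GB-CHC-0000001",
    "GB-CHC-0000002",
    "GB-COH-00000003",
    "360G-example-org-1",
    "GB-CHC-0000004",
])


def prefix_counts(identifiers):
    """Count identifiers by prefix, grouping publisher-own ones under '360G'."""
    ids = identifiers.copy()
    # Any identifier containing 360G is treated as a publisher's internal reference.
    ids[ids.str.contains("360G", na=False)] = "360G"
    # Keep the first two hyphen-separated parts, e.g. GB-CHC.
    parts = ids.str.split("-", n=2, expand=True).reindex(columns=[0, 1]).fillna("")
    codes = (parts[0] + "-" + parts[1]).str.strip("-")
    return codes.value_counts().to_dict()


counts = prefix_counts(sample)
print(counts)
```

In this sample, three identifiers share the `GB-CHC` prefix, one uses `GB-COH`, and the publisher-own identifier is counted under `360G`.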

In [42]:
# =======================
# Process org-id data
# =======================

# Load files
dfOID = pd.read_csv('org-id.csv',usecols=['code', 'quality', "name/en",'coverage','structure'])
dfOID.rename(columns={"name/en": 'name','coverage': 'country_code'}, inplace=True)

In [43]:
# =====================
# Process country codes
# =====================

dfCC = pd.read_csv('country-codes.csv',usecols=['name','ISO3166-1-Alpha-2'])
dfCC.rename(columns={'name': 'country','ISO3166-1-Alpha-2': 'country_code'}, inplace=True)
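
One thing that may be worth checking when loading country codes: by default, `pd.read_csv` parses the string `NA` as a missing value, and `NA` is also the ISO 3166-1 alpha-2 code for Namibia. The sketch below shows the effect on a tiny in-memory CSV. Whether the downloaded file is affected depends on its contents, so treat this as an assumption to verify rather than a confirmed problem.

```python
import io

import pandas as pd

# A two-row stand-in for country-codes.csv, using the same column names.
csv_text = "name,ISO3166-1-Alpha-2\nNamibia,NA\nNorway,NO\n"

# Default parsing: the string "NA" becomes a missing value.
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False keeps "NA" as a literal string.
literal = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default["ISO3166-1-Alpha-2"].isna().tolist())
print(literal["ISO3166-1-Alpha-2"].tolist())
```

If Namibia's code is read as missing, a later `dropna()` would remove that row from the lookup table.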

In [44]:
# ========================
# Merge & export datasets
# ========================

df = pd.merge(dfGN, dfOID, how='left', on=['code'])
df = pd.merge(df, dfCC.dropna(), how='left', on=['country_code'])
df.loc[df['code'].str.contains('360G'), 'name'] = "Publisher's internal reference"
df[['code','name','country_code','country','structure','count','quality']].fillna('').to_csv(outputFileName,index=False)
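
The merge step can be sketched on toy data to show what happens to prefixes that are missing from the org-id list. All rows below are made up, and the frame names `counts`, `registers` and `countries` are stand-ins for `dfGN`, `dfOID` and `dfCC`.

```python
import pandas as pd

# Made-up rows standing in for the three datasets.
counts = pd.DataFrame({"code": ["GB-CHC", "360G", "XX-UNKNOWN"], "count": [3, 1, 2]})
registers = pd.DataFrame(
    {"code": ["GB-CHC"], "name": ["Example charity register"], "country_code": ["GB"]}
)
countries = pd.DataFrame({"country": ["United Kingdom"], "country_code": ["GB"]})

# Left merges keep every prefix found in the grant data, matched or not.
merged = counts.merge(registers, how="left", on="code")
merged = merged.merge(countries, how="left", on="country_code")

# Publisher-own prefixes have no register entry, so label them explicitly.
merged.loc[merged["code"].str.contains("360G"), "name"] = "Publisher's internal reference"

# Blank out the remaining gaps, as the export step does.
merged = merged.fillna("")
print(merged)
```

Because both merges use `how='left'`, an unrecognised prefix such as `XX-UNKNOWN` stays in the output with blank register and country fields, which makes it easy to spot in the exported CSV.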