# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Capstone Project: Donor Leads through Networks


--- 

By: Wenzhe

## Recommender System

### Overview

This notebook builds a recommender system for the problem statement: providing new donor leads for a given organisation. It also builds some of the code needed for the demo.

### Notebook Structure

* [Part 1: Setup](#part-1-setup)
* [Part 2: Item-Based Recommender](#part-2-item-based-recommender)
* [Part 3: Network Based Recommender](#part-3-network-based-recommender)
* [Part 4: Visualising Recommendations](#part-4-visualising-recommendations)
* [Part 5: Conclusion](#part-5-conclusion)

---

## Part 1: Setup

#### Import Libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# NetworkX to create and work with Networks
import networkx as nx

# Pyvis to visualise the graphs from NetworkX
from pyvis.network import Network

# To write the graph data into JSON
import json
from networkx.readwrite import json_graph

# For forest centrality
import networkit

# For Community Detection
import karateclub

# For making copies
import copy

# For similarity score
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import pairwise_distances
# from sklearn.metrics import jaccard_score

# For colors in visualisation
import matplotlib
import matplotlib.colors as mcolors

### Import Charities Data

Load the previously saved graph that is stored in JSON format.

In [2]:
with open('../json/graph.json', "r") as f:
    json_data = json.load(f)

In [3]:
G = json_graph.node_link_graph(json_data)

In [4]:
# Check that graph has been loaded
len(G.nodes())

24756

In [5]:
df_charities = pd.read_csv('../data/charities.csv')

In [6]:
df_charities.head()

Unnamed: 0,charity_uen,charity_name,address,postal_code,charity_objective,charity_vision,sector,is_ipc,activity_direct_services,activity_research,...,classification_south_west,classification_support_groups,classification_taoism,classification_tcm_clinic,classification_theatre_&_dramatic_arts,classification_think_tanks,classification_traditional_ethnic_performing_arts,classification_training_&_education,classification_trust/research_funds,classification_visual_arts
0,200920810R,#CHECKED LIMITED,"350 ORCHARD ROAD, #17-07/09, SHAW HOUSE, 238868",238868.0,A. To match Green innovation ideas with the re...,1. Vision: To be an educational platform for e...,Others,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,200712761D,"*SCAPE CO., LTD.","2 ORCHARD LINK, #04-01, SCAPE, 237978",237978.0,(i) To encourage and promote social and cultur...,Vision To be a celebrated talent and resource ...,Others,1,1,0,...,0,0,0,0,0,0,0,0,0,0
2,201021998H,=DREAMS (ASIA) LIMITED,"1 LORONG 2 TOA PAYOH, #07-00, BRADDELL HOUSE, ...",319637.0,"Overseas work for disadvantaged children, comm...",Communities are developed and poverty is allev...,Others,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,202032457N,=DREAMS (SINGAPORE) LIMITED,"99 HAIG ROAD, =DREAMS CAMPUS, 438748",438748.0,=DREAMS Singapore is a first-of-its-kind secul...,ABOUT US\n\nWHAT: =DREAMS is a residential mod...,Social and Welfare,1,1,0,...,0,0,0,0,0,0,0,0,0,0
4,201436550G,21C GIRLS LTD.,"101 UPPER CROSS STREET, #05-16, PEOPLE'S PARK ...",58357.0,DELIVER FREE TECHNOLOGY CLASSES AND CAMPS FOR ...,TO TEACH TECHNOLOGY TO GIRLS SO THAT THEY CAN ...,Education,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [7]:
df_charities.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2601 entries, 0 to 2600
Data columns (total 68 columns):
 #   Column                                                 Non-Null Count  Dtype  
---  ------                                                 --------------  -----  
 0   charity_uen                                            2601 non-null   object 
 1   charity_name                                           2601 non-null   object 
 2   address                                                2598 non-null   object 
 3   postal_code                                            2535 non-null   float64
 4   charity_objective                                      2274 non-null   object 
 5   charity_vision                                         2115 non-null   object 
 6   sector                                                 2601 non-null   object 
 7   is_ipc                                                 2601 non-null   int64  
 8   activity_direct_services                        

---

## Part 2: Item-Based Recommender

In the available dataset, most person nodes are connected to only one organisation, so there is little donor overlap between organisations to learn from. An item-based recommender is therefore more appropriate: it recommends donors connected to similar charities.

An item-based recommender is straightforward and generally predicts well. In this case, it is based solely on the features of the items (charities) and disregards the connected donors.

The downside is that it depends heavily on the quality of the feature data and only recommends similar organisations, with no consideration for novelty.

### Forming the Dataframe

In [8]:
# Make dummy columns for sector
df_charities = pd.get_dummies(df_charities, prefix='sector', columns=['sector'], dtype=int)

In [9]:
feature_cols = [col for col in df_charities.columns if col.startswith(('sector', 'activity', 'classification'))]
feature_cols

['activity_direct_services',
 'activity_research',
 'activity_financial_assistance',
 'activity_support_charities',
 'activity_grantmaking',
 'activity_training_education',
 'activity_public_awareness',
 'classification_active_ageing',
 'classification_animal_welfare',
 'classification_buddhism',
 'classification_central',
 'classification_children/youth',
 'classification_christianity',
 'classification_cluster/hospital_funds',
 'classification_community',
 'classification_contemporary_&_ethnic_dance',
 'classification_day_rehabilitation_centre',
 'classification_disability_(adult)',
 'classification_disability_(children)',
 'classification_disability_sports',
 'classification_diseases/illnessess_support_group',
 'classification_eldercare',
 'classification_environment',
 'classification_family',
 'classification_foreign_educational_institutions/funds',
 'classification_foundations_&_trusts',
 'classification_general_charitable_purposes',
 'classification_government-aided_schools',
 '

In [10]:
items = df_charities[['charity_name', 'is_ipc'] + feature_cols]
items.head()

Unnamed: 0,charity_name,is_ipc,activity_direct_services,activity_research,activity_financial_assistance,activity_support_charities,activity_grantmaking,activity_training_education,activity_public_awareness,classification_active_ageing,...,classification_trust/research_funds,classification_visual_arts,sector_Arts and Heritage,sector_Community,sector_Education,sector_Health,sector_Others,sector_Religious,sector_Social and Welfare,sector_Sports
0,#CHECKED LIMITED,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
1,"*SCAPE CO., LTD.",1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
2,=DREAMS (ASIA) LIMITED,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,1,0,0,0
3,=DREAMS (SINGAPORE) LIMITED,1,1,0,1,0,0,1,0,0,...,0,0,0,0,0,0,0,0,1,0
4,21C GIRLS LTD.,0,0,0,0,0,0,1,0,0,...,0,0,0,0,1,0,0,0,0,0


In [11]:
features = items.drop(columns=['charity_name'])

### Cosine Similarity vs Jaccard Similarity

Explore the differences between the two similarity metrics for an item-based recommender.

The features of the items are binary variables.

**Cosine Similarity**: Measures the similarity of two vectors in the same inner product space. Mathematically, it is the dot product of the vectors divided by the product of their magnitudes.

**Jaccard Similarity**: Measures the similarity between two asymmetric binary variables, where a value of 1 (feature present) is more informative than a value of 0 (feature absent). Mathematically, it is the size of the intersection of the two sets of present features divided by the size of their union. Source: https://www.learndatasci.com/glossary/jaccard-similarity/

In [12]:
data = {
    'item_id': ['item1', 'item2', 'item3', 'item4'],
    'feature_1': [1, 0, 1, 0],
    'feature_2': [0, 1, 1, 0],
    'feature_3': [1, 1, 0, 0],
    'feature_4': [0, 0, 1, 1]
}

df = pd.DataFrame(data)

# Set 'item_id' as index
df.set_index('item_id', inplace=True)

In [13]:
# Calculate cosine similarity
cos_sim_matrix = cosine_similarity(df.values)

# Create a DataFrame from the cosine similarity matrix
cos_sim_df = pd.DataFrame(cos_sim_matrix, index=df.index, columns=df.index)


In [14]:
cos_sim_df

item_id,item1,item2,item3,item4
item1,1.0,0.5,0.408248,0.0
item2,0.5,1.0,0.408248,0.0
item3,0.408248,0.408248,1.0,0.57735
item4,0.0,0.0,0.57735,1.0


In [15]:
# Cast to bool: scikit-learn expects boolean input for the Jaccard metric
pd.DataFrame(1 - pairwise_distances(df.values.astype(bool), metric='jaccard'))



Unnamed: 0,0,1,2,3
0,1.0,0.333333,0.25,0.0
1,0.333333,1.0,0.25,0.0
2,0.25,0.25,1.0,0.333333
3,0.0,0.0,0.333333,1.0


In [16]:
df

item_id,feature_1,feature_2,feature_3,feature_4
item1,1,0,1,0
item2,0,1,1,0
item3,1,1,0,1
item4,0,0,0,1


From the created dataframe, item1 and item2 share a 1 in feature_3 and a 0 in feature_4, and differ in feature_1 and feature_2. Their similarity is 0.5 using cosine similarity and 0.333 using Jaccard similarity.

For this recommender system, it may be better to prioritise features that are present over features that are absent when measuring similarity. As such, Jaccard similarity is the better metric to use for measuring the similarity of items (organisations) for this project.
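
The two scores for item1 and item2 can be reproduced by hand. The sketch below is a minimal pure-Python illustration, not the scikit-learn implementation; the helper names `cosine_binary` and `jaccard_binary` are hypothetical, and returning 0.0 for two all-zero vectors is a convention chosen for this sketch only.

```python
import math

def cosine_binary(a, b):
    """Cosine similarity of two equal-length 0/1 vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption for this sketch: no features means no similarity
    return dot / (norm_a * norm_b)

def jaccard_binary(a, b):
    """Jaccard similarity: features present in both / features present in either."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

# Rows of the toy dataframe above
item1 = [1, 0, 1, 0]
item2 = [0, 1, 1, 0]

print(round(cosine_binary(item1, item2), 3))   # 0.5
print(round(jaccard_binary(item1, item2), 3))  # 0.333
```

Only feature_3 is present in both items, while three features are present in at least one of them, which gives 1/3 for Jaccard. Cosine divides the same single shared feature by the product of the two vector magnitudes (each the square root of 2), which gives 1/2.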

### Jaccard Similarities

In [17]:
features.shape

(2601, 69)

In [18]:
# Cast to bool: scikit-learn expects boolean input for the Jaccard metric
jaccard_similarities = 1 - pairwise_distances(features.values.astype(bool), metric='jaccard')
jaccard_similarities.shape

(2601, 2601)

In [19]:
df_similarities = pd.DataFrame(jaccard_similarities, index=items['charity_name'], columns=items['charity_name'])

In [20]:
df_similarities.head()

charity_name,#CHECKED LIMITED,"*SCAPE CO., LTD.",=DREAMS (ASIA) LIMITED,=DREAMS (SINGAPORE) LIMITED,21C GIRLS LTD.,365 CANCER PREVENTION SOCIETY,3PUMPKINS LIMITED,A DAY WITH MARY,A. O. ACTION LOVE LTD.,ABDULLAH SALEH SHOOKER TRUST,...,ZI JING CULTURAL CENTRE LTD.,ZION BISHAN BIBLE-PRESBYTERIAN CHURCH,Zion Church,Zion Full Gospel Church,Zion Home for the Aged,Zion Living Streams Community Church,Zion Presbyterian Church,ZION SERANGOON BIBLE-PRESBYTERIAN CHURCH,Zonta Singapore- Project Pari Fund,Zu-Lin Temple Association
#CHECKED LIMITED,1.0,0.2,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,...,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
"*SCAPE CO., LTD.",0.2,1.0,0.166667,0.428571,0.0,0.125,0.2,0.0,0.0,0.2,...,0.166667,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.142857,0.0
=DREAMS (ASIA) LIMITED,0.25,0.166667,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,...,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
=DREAMS (SINGAPORE) LIMITED,0.0,0.428571,0.0,1.0,0.125,0.375,0.272727,0.0,0.0,0.0,...,0.125,0.0,0.0,0.0,0.285714,0.125,0.0,0.111111,0.428571,0.125
21C GIRLS LTD.,0.0,0.0,0.0,0.125,1.0,0.142857,0.0,0.0,0.0,0.0,...,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0


Get the 10 highest similarity scores for a given charity. The charity itself appears first, with a similarity of 1.

In [21]:
df_similarities['IMPART LTD.'].sort_values(ascending=False)[:10]

charity_name
IMPART LTD.                                                1.000000
KAMPUNG SENANG CHARITY AND EDUCATION FOUNDATION LIMITED    0.875000
PERTAPIS EDUCATION AND WELFARE CENTRE                      0.857143
MONTFORT CARE                                              0.777778
RAMAKRISHNA MISSION-BOYS' HOME, THE                        0.714286
SMA CARE LIMITED                                           0.714286
Teen Challenge (Singapore)                                 0.714286
ONE HOPE CENTRE                                            0.714286
BABES PREGNANCY CRISIS SUPPORT LTD.                        0.714286
THYE HUA KWAN MORAL CHARITIES LIMITED                      0.700000
Name: IMPART LTD., dtype: float64
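
Since the chosen charity always matches itself with a score of 1, a lookup for the demo may want to leave it out. The following is a minimal sketch under the assumption that charity names are unique in the similarity table; `top_similar` and the toy data are hypothetical and not part of the notebook.

```python
import pandas as pd

def top_similar(similarities: pd.DataFrame, name: str, n: int = 10) -> pd.Series:
    """Return the n entries most similar to `name`, excluding `name` itself."""
    # Assumes `name` is a unique column and index label
    scores = similarities[name].drop(labels=name)
    return scores.nlargest(n)

# Toy symmetric similarity table standing in for df_similarities
toy = pd.DataFrame(
    [[1.0, 0.2, 0.8],
     [0.2, 1.0, 0.5],
     [0.8, 0.5, 1.0]],
    index=['A', 'B', 'C'],
    columns=['A', 'B', 'C'],
)

print(top_similar(toy, 'A', n=2))
```

With the notebook's table, the equivalent call would be `top_similar(df_similarities, 'IMPART LTD.')`.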

---

## Part 3: Network Based Recommender

Build a recommender system on the network using several metrics. Of the many social-network-based recommender approaches that exist, this section implements a simple one suited to the given network.

### Helper Functions for Network Building / Visualisation

In [22]:
# Returns the name of person / charity
def label_node_name(node):
    if node.get('node-type') == 'person':
        return node.get('name')
    else:
        return node.get('charity_name')

In [23]:
# make persons dot and charities box on the visualisation
def label_node_shape(node):
    if node['node-type'] == 'person':
        return 'dot'
    else:
        return 'box' # square

In [24]:
def label_node_color_degree(graph, node, start_node):
    # Colour a node by its shortest-path distance from start_node
    try:
        path_len = nx.shortest_path_length(graph, source=start_node, target=node)
    except nx.NetworkXNoPath:
        path_len = 6
    # Clamp to 1..6: the source itself shares the warmest color with its direct neighbours (deg=1),
    # and this color palette only has 6 colours, so anything beyond 6 uses the coolest color
    path_len = min(max(path_len, 1), 6)
    return sns.color_palette('coolwarm').as_hex()[-path_len]

In [25]:
def visualise_graph(graph, filename):
    for n in graph.nodes(data=True):
        n[1]['label'] = label_node_name(n[1])
        n[1]['shape'] = label_node_shape(n[1])
    
    pyvis_net = Network(notebook=True, cdn_resources='remote') 
    pyvis_net.from_nx(graph) 
    pyvis_net.show(f'../graph/{filename}.html') 

In [26]:
# Returns the node of a charity based on its name
# Names are unique, else returns the first instance
def get_charity_by_name(G, name):
    node_select = [node for node, data in G.nodes(data=True) if data.get('node-type') == 'entity' and data.get('charity_name') == name]
    # return none if no such charity name
    if len(node_select) == 0: return None
    return node_select[0]

In [27]:
# Returns the node of a person based on the name
def get_person_by_name(G, name):
    node_select = [node for node, data in G.nodes(data=True) if data.get('node-type') == 'person' and data.get('name') == name]
    # return None if no such person name
    if len(node_select) == 0: return None
    return node_select[0]

In [28]:
def visualise_degree_graph(graph, charity_name, degree, filename):
    charity_node = get_charity_by_name(graph, charity_name)
    subgraph = nx.ego_graph(graph, charity_node, degree)
    for n in subgraph.nodes(data=True):
        n[1]['label'] = label_node_name(n[1])
        n[1]['shape'] = label_node_shape(n[1])
        n[1]['color'] = label_node_color_degree(subgraph, n[0], charity_node)
        # TODO: label entity features
    
    pyvis_net = Network(notebook=True, cdn_resources='remote') 
    pyvis_net.from_nx(subgraph) 
    pyvis_net.show(f'../graph/{filename}.html') 

In [29]:
def visualise_subgraph(subgraph, charity_node, filename):
    for n in subgraph.nodes(data=True):
        n[1]['label'] = label_node_name(n[1])
        n[1]['shape'] = label_node_shape(n[1])
        n[1]['color'] = label_node_color_degree(subgraph, n[0], charity_node)
        # TODO: label entity features
    
    pyvis_net = Network(notebook=True, cdn_resources='remote') 
    pyvis_net.from_nx(subgraph) 
    pyvis_net.show_buttons(filter_=['physics'])
    pyvis_net.show(f'../graph/{filename}.html') 

In [30]:
def get_community_subgraph_visualise(graph, charity_name, communities, filename):
    charity_node = get_charity_by_name(graph, charity_name)
    # communities maps each node to a list of community ids; look the node up by key
    # rather than by position, and use the first id (no overlaps detected)
    community_num = communities[charity_node][0]
    community_nodes = [node for node, community in communities.items() if community[0] == community_num]
    subgraph = graph.subgraph(community_nodes)

    visualise_subgraph(subgraph, charity_node, filename)
    

### Ego-Splitting Framework (Community Detection)

Use the Ego-Splitting community detection method as evaluated in the previous notebook.

In [31]:
ENS = karateclub.EgoNetSplitter(weight=None)

As this method alters the original graph, use a deep copy of the graph to run this method and use the communities created to visualise on the original graph.

In [32]:
G_ens = copy.deepcopy(G)
ENS.fit(G_ens)

In [33]:
communities = ENS.get_memberships()

As this algorithm caters for overlapping clusters, check whether any node has been clustered under multiple communities.

In [34]:
for node, community in communities.items():
    if len(community) > 1:
        print(f"Node {node}: Community {community}")

For this network, each node only belongs to one community.
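
The memberships checked above map each node to a list of community ids. When the members of a community are needed repeatedly, it can help to invert that mapping once. The sketch below assumes that dictionary shape; `invert_memberships` and the toy data are hypothetical.

```python
from collections import defaultdict

def invert_memberships(communities):
    """Turn {node: [community ids]} into {community id: [nodes]}."""
    members = defaultdict(list)
    for node, community_ids in communities.items():
        for community_id in community_ids:
            members[community_id].append(node)
    return dict(members)

# Toy memberships; node 3 deliberately overlaps two communities
toy_memberships = {0: [1], 1: [1], 2: [0], 3: [0, 1]}
members = invert_memberships(toy_memberships)

# Nodes that belong to more than one community
overlapping = [node for node, ids in toy_memberships.items() if len(ids) > 1]

print(members[1])   # [0, 1, 3]
print(overlapping)  # [3]
```

Unlike taking only the first id of each list, the inverted mapping keeps every membership, so it would still be valid if a later run did produce overlapping communities.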

### Point Based Recommender using Networks

While there are probabilistic means for [recommending in a network](https://medium.com/dunnhumby-data-science-engineering/network-models-for-recommender-systems-7f0d6d210ccf+), this project takes a point-based approach: each donor is weighted for recommendation according to the donor's relation to the chosen organisation in the network.

Taking D to abbreviate a donor and C for the chosen charity, points can be assigned based on the following:
1) Distance of D to C
2) Whether D resides in the same community as C
3) Centrality of D within the subgraph of C
4) Similarity of other charities connected to D

Using a chosen charity for example:

In [35]:
chosen_charity = 'IMPART LTD.'

In [36]:
node_chosen_charity = get_charity_by_name(G, chosen_charity)
node_chosen_charity

23185

Initialise an empty dictionary to contain the recommended persons. The key will be the node, and the nested dictionary will contain the scores and reasons for recommending.

In [37]:
recommendations = {}

In [38]:
def add_recommendation(node, score, reason):
    reason_str = f'({score:.3f}) {reason}\n'
    if node in recommendations:
        recommendations[node]['score'] += score
        recommendations[node]['reason'] += reason_str
        return
    else:
        recommendations[node] = {
            'name': G.nodes()[node]['name'],
            'score': score,
            'reason': reason_str
        }
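
The function above reads the notebook-level `recommendations` and `G`. For reuse in the demo, a variant that receives its state explicitly may be easier to test. This is a minimal sketch only; `add_score` and the donor names are hypothetical.

```python
def add_score(recommendations, node, name, score, reason):
    """Accumulate a score and a human-readable reason for one donor node."""
    entry = recommendations.setdefault(
        node, {'name': name, 'score': 0.0, 'reason': ''}
    )
    entry['score'] += score
    entry['reason'] += f'({score:.3f}) {reason}\n'
    return recommendations

recs = {}
add_score(recs, 101, 'DONOR A', 1.0, 'Donor resides in the same community')
add_score(recs, 101, 'DONOR A', 0.875, 'Donor from similar charity: X')
add_score(recs, 102, 'DONOR B', 0.5, 'Donor distance is 7')

# Rank donors by their accumulated score, highest first
ranked = sorted(recs.items(), key=lambda item: item[1]['score'], reverse=True)
print([(node, entry['score']) for node, entry in ranked])
# [(101, 1.875), (102, 0.5)]
```

Because every contribution is appended to `reason`, the final score of each donor can be traced back to the individual rules that produced it.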

#### Getting the charity's community

Recommend donors who are in the same community as the charity.

In [39]:
# Get the community number of a node
def get_community_num(node, communities):
    # communities maps each node to a list of community ids, so look the node up by key
    # As there are no overlapping communities detected currently, return the first item of the list
    community_num = communities[node][0]
    return community_num

In [40]:
get_community_num(node_chosen_charity, communities)

38

In [41]:
# Get all other person nodes within the community that are not already connected to the charity
def get_other_persons_community(G, node_chosen_charity, communities):
    
    community_num = get_community_num(node_chosen_charity, communities)
    # Nodes in the same community
    community_nodes = [node for node, community in communities.items() if community[0] == community_num and 
                       # Node is not the charity itself
                       node != node_chosen_charity and 
                       # Node is not already directly connected to the charity
                       node not in G.neighbors(node_chosen_charity) and
                       # Node is a person
                       G.nodes()[node]['node-type'] == 'person']

    return community_nodes

In [42]:
persons_in_community = get_other_persons_community(G, node_chosen_charity, communities)

In [44]:
# Add 1 score for persons in the same community
for person in persons_in_community:
    add_recommendation(person, 1, 'Donor resides in the same community')

#### Getting similar charities

Select the charities whose similarity to the chosen charity is at or above a set threshold, then recommend their donors, scoring each donor by that similarity. The same donor can appear multiple times and scores are additive.

In [46]:
similarity_threshold = 0.6

In [47]:
similar_charities = df_similarities[chosen_charity][df_similarities[chosen_charity] >= similarity_threshold].sort_values(ascending=False)

In [48]:
similar_charities.name = 'similarity'

In [49]:
df_similar = pd.DataFrame(similar_charities, index=similar_charities.index)
df_similar.drop(chosen_charity, inplace=True)
df_similar.head()

charity_name,similarity
KAMPUNG SENANG CHARITY AND EDUCATION FOUNDATION LIMITED,0.875
PERTAPIS EDUCATION AND WELFARE CENTRE,0.857143
MONTFORT CARE,0.777778
BABES PREGNANCY CRISIS SUPPORT LTD.,0.714286
SMA CARE LIMITED,0.714286


In [50]:
# Will need to check length of df_similar for demo
for c in df_similar.index:
    # Get the score of the similar charity
    score = df_similar.loc[c, 'similarity']
    # Get the neighbors (persons/donors) of the charity
    # All neighbors of any given charity are currently only persons
    charity_node = get_charity_by_name(G, c)
    # Skip charities that appear in the similarity table but not in the graph
    if charity_node is None:
        continue
    similar_persons = list(G.neighbors(charity_node))
    reason_str = f'Donor from similar charity: {c}'
    for p in similar_persons:
        add_recommendation(p, score, reason_str)

#### Distance of donor from charity

In [52]:
distance_threshold = 5

In [53]:
for node_person, person in recommendations.items():
    try:
        distance = nx.shortest_path_length(G, source=node_person, target=node_chosen_charity)
        # Add a flat score of 1 when donor is up to 5 degrees away; otherwise 0.5 for connected donor
        if distance <= distance_threshold: score = 1
        else: score = 0.5
        add_recommendation(node_person, score, f'Donor distance is {distance}')
        #print(f"Distance of {person['name']} is {nx.shortest_path_length(G, source=node_person, target=node_chosen_charity)}")
    # No score awarded when there is no path from donor to charity
    except nx.NetworkXNoPath:
        continue
        #print(f"{person['name']} has no connected path")
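
The loop above runs one shortest-path search per donor. As a sketch of an alternative, a single breadth-first search from the charity yields the distance to every reachable node at once. This assumes an undirected graph, so the donor-to-charity and charity-to-donor distances are equal; `distance_scores` and the toy graph are hypothetical.

```python
import networkx as nx

def distance_scores(graph, charity_node, donor_nodes, threshold=5):
    """Score donors by distance: 1.0 within the threshold, 0.5 if further but connected."""
    # One breadth-first search gives the distance to every reachable node
    distances = nx.single_source_shortest_path_length(graph, charity_node)
    scores = {}
    for donor in donor_nodes:
        if donor not in distances:
            continue  # unreachable donors receive no distance score
        scores[donor] = 1.0 if distances[donor] <= threshold else 0.5
    return scores

# Toy graph: a chain 0 - 1 - ... - 7 plus one isolated node
toy_graph = nx.path_graph(8)
toy_graph.add_node(99)

print(distance_scores(toy_graph, 0, [2, 7, 99]))  # {2: 1.0, 7: 0.5}
```

Node 2 is two steps from node 0 and scores 1.0, node 7 is seven steps away and scores 0.5, and the isolated node 99 is skipped, mirroring the three cases handled in the loop above.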

#### Creating donor subgraphs

In [55]:
def get_recommendation_subgraph(G, node_charity, communities, recommendations):
    community_num = get_community_num(node_charity, communities)
    # Nodes in the same community
    recommendation_nodes = [node for node, community in communities.items() if community[0] == community_num]
    
    # Add person nodes and its in-between connections if it can connect to the subgraph
    for node_person in recommendations:
        # skip if the person is already in the community subgraph
        if node_person in recommendation_nodes: continue
        try:
            shortest_paths = list(nx.all_shortest_paths(G, source=node_person, target=node_charity))
            for path in shortest_paths:
                nodes_to_add = [node for node in path if node not in recommendation_nodes]
                recommendation_nodes.extend(nodes_to_add)

        # if there is no path from the person to the charity, just add the person node
        except nx.NetworkXNoPath:
            #print(f'{node_person} is not connected')
            recommendation_nodes.append(node_person)
    
    return G.subgraph(recommendation_nodes)

        

In [56]:
recommendation_subgraph = get_recommendation_subgraph(G, node_chosen_charity, communities, recommendations)

In [57]:
recommendation_subgraph.nodes()

NodeView((24576, 24580, 22538, 11, 6155, 12303, 22545, 17, 24594, 22549, 22, 4119, 24, 25, 22551, 24602, 30, 31, 22560, 22559, 24620, 2099, 8243, 58, 24635, 4160, 24649, 8265, 12363, 22604, 24652, 12364, 24655, 6219, 2130, 22610, 4181, 2134, 2135, 22622, 2164, 121, 12409, 123, 6269, 24702, 24703, 128, 24710, 135, 22664, 12422, 4236, 2192, 4241, 4242, 4243, 4244, 4245, 24723, 4247, 151, 4249, 4250, 4251, 4252, 24733, 153, 4255, 159, 2216, 2217, 2218, 2219, 2220, 2221, 2222, 2223, 2224, 177, 2225, 2226, 2227, 12472, 8378, 198, 22730, 203, 206, 2260, 22744, 4319, 12518, 2279, 12519, 12521, 4332, 22768, 22770, 4341, 4344, 12544, 12551, 22441, 22804, 22805, 22813, 22823, 22829, 12596, 12598, 14650, 14653, 14658, 8515, 14659, 14660, 14661, 14662, 14663, 2376, 2377, 2379, 12620, 2380, 24489, 22857, 22864, 10577, 2384, 10579, 2385, 2386, 10582, 10583, 2387, 10585, 14681, 10587, 14682, 14683, 10590, 10591, 14684, 14685, 14686, 14687, 10597, 4454, 4455, 4456, 4457, 4458, 361, 22894, 22897, 10612

In [59]:
visualise_subgraph(recommendation_subgraph, node_chosen_charity, 'impart_test')

../graph/impart_test.html


#### Subgraph Centrality

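Eigenvector centrality measures how well connected a node is, giving more weight to connections with other well-connected nodes, which makes it a useful measure of influence in social networks. Add scores based on the min-max scaled centrality within the subgraph.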

In [60]:
df_centrality = pd.DataFrame.from_dict(nx.eigenvector_centrality(recommendation_subgraph), orient='index', columns=['centrality'])
df_centrality

Unnamed: 0,centrality
24576,1.362565e-01
24580,1.852147e-01
22538,3.725001e-08
11,3.149611e-06
6155,1.366270e-05
...,...
22515,1.565219e-06
18418,9.433086e-65
18419,9.433086e-65
18420,9.433086e-65


In [61]:
# Drop the charities from this centrality table
entity_nodes = [node for node in df_centrality.index if G.nodes()[node]['node-type'] == 'entity']
df_centrality.drop(entity_nodes, inplace=True)

In [62]:
existing_neighbors = list(G.neighbors(node_chosen_charity))
existing_neighbors

[3380, 9200, 9201, 9202, 9203, 9204, 9205]

In [63]:
for node in existing_neighbors:
    if node in df_centrality.index:
        df_centrality.drop(node, inplace=True)

In [64]:
df_centrality.shape

(682, 1)

In [65]:
max_centrality = df_centrality['centrality'].max()
min_centrality = df_centrality['centrality'].min()

In [66]:
df_centrality['centrality'] = (df_centrality['centrality']  - min_centrality) / (max_centrality - min_centrality)
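
The scaling above is min-max scaling, (x - min) / (max - min), which is undefined when every value is the same. The sketch below shows the same formula with a guard for that case; `min_max_scale` is a hypothetical helper, and mapping a constant input to 0.0 is an assumption made for this sketch.

```python
def min_max_scale(values):
    """Scale a {key: number} mapping to the range [0, 1]."""
    if not values:
        return {}
    lowest = min(values.values())
    highest = max(values.values())
    if highest == lowest:
        # Assumption for this sketch: identical values carry no ranking signal
        return {key: 0.0 for key in values}
    return {key: (value - lowest) / (highest - lowest) for key, value in values.items()}

toy_centrality = {'a': 1.0, 'b': 2.0, 'c': 3.0}
print(min_max_scale(toy_centrality))    # {'a': 0.0, 'b': 0.5, 'c': 1.0}
print(min_max_scale({'x': 4, 'y': 4}))  # {'x': 0.0, 'y': 0.0}
```

After scaling, the most central donor contributes at most 1 point, which keeps centrality on the same footing as the community and distance scores.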

In [67]:
df_centrality.sort_values(by='centrality', ascending=False)

Unnamed: 0,centrality
4249,1.000000
4244,1.000000
4236,0.828991
4255,0.828991
4252,0.828991
...,...
21663,0.000000
9405,0.000000
9406,0.000000
9408,0.000000


In [68]:
for node in df_centrality.index: 
    score = df_centrality.loc[node]['centrality']
    add_recommendation(node, score, f'Donor scaled eigenvector centrality is {score}')

---

## Part 4: Visualising Recommendations

Visualise the recommended donors in a pyvis network graph. The autumn color palette is used: more highly recommended donors are colored closer to red, and less recommended donors closer to yellow.

In [71]:
def label_node_score_color(node, node_charity, recommendations, max_score, min_score):
    if node == node_charity: return '#c6e6ee' #'#34eb9e'
    if node not in recommendations: return '#d7d9d8'
    score = recommendations[node]['score']
    # Normalize the score to [0, 1]; if all scores are equal, treat every donor as top-scored
    score_range = max_score - min_score
    n_score = (score - min_score) / score_range if score_range else 1.0
    # Use the autumn colormap, reversed so that higher scores are more red
    hex_color = mcolors.to_hex(matplotlib.colormaps['autumn'](1-n_score))

    return hex_color

In [72]:
# Returns the name of person / charity
def label_node_score_name(node, recommendations, node_num):
    if node.get('node-type') == 'entity':
        return node.get('charity_name')
    if node_num not in recommendations:
        return node.get('name')
    else:
        return f"{recommendations[node_num]['score']:.3f} {node.get('name')}"


In [73]:
def visualise_recommendation(subgraph, node_charity, recommendations, filename):
    scores = [value['score'] for value in recommendations.values()]
    max_score = max(scores)
    min_score = min(scores)
    
    for n in subgraph.nodes(data=True):
        n[1]['label'] = label_node_score_name(n[1], recommendations, n[0])
        n[1]['shape'] = label_node_shape(n[1])
        n[1]['color'] = label_node_score_color(n[0], node_charity, recommendations, max_score, min_score)
        
    pyvis_net = Network(notebook=True, cdn_resources='remote') 
    pyvis_net.from_nx(subgraph)
    #pyvis_net.toggle_physics(False)
    pyvis_net.show_buttons(filter_=['physics'])
    pyvis_net.show(f'../graph/{filename}.html') 

In [74]:
visualise_recommendation(recommendation_subgraph, node_chosen_charity, recommendations, 'impart_reqs')

../graph/impart_reqs.html


In [75]:
df_recommendations = pd.DataFrame.from_dict(recommendations, orient='index', columns=['name', 'score', 'reason'])

In [76]:
df_recommendations.head()

Unnamed: 0,name,score,reason
3130,MAITLAND TANYA MEI SIAN,2.63637,(1.000) Donor resides in the same community\n(...
3131,CHENG KUANG KUO @JAMES CHENG KUANG KUO,2.136367,(1.000) Donor resides in the same community\n(...
3132,JOSEPH LEONG JERN-YI (LIANG ZHENYI),2.136367,(1.000) Donor resides in the same community\n(...
3133,SALLY HO TWA MOI,2.136367,(1.000) Donor resides in the same community\n(...
3310,TAN KHENG BOON EUGENE,2.000085,(1.000) Donor resides in the same community\n(...


In [77]:
df_recommendations.sort_values(by='score', ascending=False)

Unnamed: 0,name,score,reason
3130,MAITLAND TANYA MEI SIAN,2.636370e+00,(1.000) Donor resides in the same community\n(...
8178,TAN SEE LENG,2.291686e+00,(0.667) Donor from similar charity: THE HUT LI...
3132,JOSEPH LEONG JERN-YI (LIANG ZHENYI),2.136367e+00,(1.000) Donor resides in the same community\n(...
3133,SALLY HO TWA MOI,2.136367e+00,(1.000) Donor resides in the same community\n(...
3131,CHENG KUANG KUO @JAMES CHENG KUANG KUO,2.136367e+00,(1.000) Donor resides in the same community\n(...
...,...,...,...
3581,TAI WEN LIANG DENNES,6.202538e-07,(0.000) Donor scaled eigenvector centrality is...
3585,LEE PENG ONG @ DAVID LEE,6.202538e-07,(0.000) Donor scaled eigenvector centrality is...
9418,LING KIN HUAT,1.494647e-07,(0.000) Donor scaled eigenvector centrality is...
11324,QUEK KWANG THANG,9.131008e-08,(0.000) Donor scaled eigenvector centrality is...


---

## Part 5: Conclusion

This project has explored the use of a social network recommender to generate new donor leads from a limited dataset of donors and charities, using unsupervised methods.

It uses concepts of:
* Community detection
* Shortest path
* Centrality
* Similarity

to provide donor recommendations from the network and the available charity features. While several variants of each concept have been explored, there is ample scope for future work: exploring these concepts in greater depth and breadth to fine-tune the recommender.