## Finding Key Connectors

In [2]:
import numpy as np
import matplotlib.pyplot as plt

* data dump
    * consists of a list of users, each represented by a dict containing that user’s id (a number) and name (which, in one of the great cosmic coincidences, rhymes with the user’s id):

In [3]:
users = [
{ "id": 0, "name": "Hero" },
{ "id": 1, "name": "Dunn" },
{ "id": 2, "name": "Sue" },
{ "id": 3, "name": "Chi" },
{ "id": 4, "name": "Thor" },
{ "id": 5, "name": "Clive" },
{ "id": 6, "name": "Hicks" },
{ "id": 7, "name": "Devin" },
{ "id": 8, "name": "Kate" },
{ "id": 9, "name": "Klein" }
]

# "friendship" data, represented as a list of pairs of ids:

friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]


* the tuple (0, 1) indicates that the data scientist with id 0 (Hero) and the data scientist with id 1 (Dunn) are friends
![](https://www.oreilly.com/library/view/data-science-from/9781491901410/assets/dsfs_0101.png)

In [4]:
# to add a list of friends to each user
#first we set each user’s friends property to an empty list
for user in users:
    user["friends"] = [] #creating a new key within each user dictionary for "friends" list

#and then we populate the lists using the friendships data
for i, j in friendships:
    # this works because users[i] is the user whose id is i
    users[i]["friends"].append(users[j])  # add j to i's list of friends
    users[j]["friends"].append(users[i])  # add i to j's list of friends

* Once each user dict contains a list of friends, we can easily ask questions of our graph, like “what’s the average number of connections?”

In [5]:
# first we find the total number of connections by summing the lengths of all the friends lists
def number_of_friends(user):
    # how many friends does this user have? (length of the "friends" list)
    return len(user["friends"])

# total number of "friends" entries across all users
total_connections = sum(number_of_friends(user) for user in users)

In [6]:
# and then we just divide by the number of users:
from __future__ import division  # only needed in Python 2; in Python 3, / is already true division
num_users = len(users)  # total number of users
avg_connections = total_connections / num_users  # average number of connections per user
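* a quick cross-check of the two cells above, as a minimal sketch: every friendship pair adds one entry to two friends lists, so the total should be twice the number of pairs. The names `expected_total` and `expected_avg` are made up for this check and are not part of the original code.

```python
# Edge list copied from the friendships cell above.
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]
num_users = 10  # ids 0 through 9

# Each pair (i, j) puts j in i's list and i in j's list: two entries per pair.
expected_total = 2 * len(friendships)
expected_avg = expected_total / num_users

print(expected_total)  # 24
print(expected_avg)    # 2.4
```

* if `total_connections` and `avg_connections` differ from these values, the friends lists were probably populated more than once (for example by re-running the cell that appends to them).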

* the most connected people are those who have the largest number of friends
* since there aren’t very many users, we can sort them from “most friends” to “least friends”

In [7]:
#code to iterate through list of dictionaries and access/return value for specific key
'''
user_id=[]
for user in users:
    for key,value in user.items():
        if key=="id":
            user_id.append(value)
print(user_id)
'''


In [8]:
from operator import itemgetter

# create a list of (user_id, number_of_friends) pairs
num_friends_by_id = [(user["id"], number_of_friends(user)) for user in users]
# sort by number of friends (most -> least), which is the second element of each pair
num_friends_by_id.sort(key=itemgetter(1), reverse=True)
# each pair is (user_id, num_friends)
print(num_friends_by_id)

[(1, 3), (2, 3), (3, 3), (5, 3), (8, 3), (0, 2), (4, 2), (6, 2), (7, 2), (9, 1)]


* we’ve just computed the network metric degree centrality
    * the degree centrality of a node is simply its degree—the number of edges it has. The higher the degree, the more central the node is. This can be an effective measure, since many nodes with high degrees also have high centrality by other measures

* in this network, user 4 (Thor) has only two connections while user 1 (Dunn) has three, yet looking at the network it intuitively seems like Thor should be more central

![](https://www.oreilly.com/library/view/data-science-from/9781491901410/assets/dsfs_0102.png)
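* the degree centrality described above can also be computed straight from the edge list, without the user dicts. This is only a sketch: `degree` and `normalized` are illustrative names, and dividing by `n - 1` (the largest degree a node could have) is a common normalization convention that these notes do not otherwise use.

```python
from collections import Counter

# Edge list copied from the friendships cell above.
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]
num_users = 10  # ids 0 through 9

# Degree of a node = number of edges that touch it.
degree = Counter()
for i, j in friendships:
    degree[i] += 1
    degree[j] += 1

# Assumption: normalize by the maximum possible degree, n - 1.
normalized = {user_id: degree[user_id] / (num_users - 1)
              for user_id in range(num_users)}

print(degree[4], degree[1])     # 2 3
print(round(normalized[1], 3))  # 0.333
```

* by this measure Thor (id 4) scores lower than Dunn (id 1), which is exactly the mismatch with intuition noted above.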

## Chapter 2

* design a “Data Scientists You May Know” suggester: suggest that a user might know the friends of their friends

* these are easy to compute:
    * for each of a user’s friends, iterate over that person’s friends and collect all the results

In [10]:
def friends_of_friend_ids_bad(user):
    # "foaf" is short for "friend of a friend"
    return [foaf["id"]
            #for each of user's friends
            for friend in user["friends"]
            #get each of _their_ friends
            for foaf in friend["friends"]]
friends_of_friend_ids_bad(users[0])

[0, 2, 3, 0, 1, 3]

* it includes user 0 (twice), since Hero is indeed friends with both of his friends.
* it includes users 1 and 2, although they are both friends with Hero already
* and it includes user 3 twice, as Chi is reachable through two different friends

In [13]:
print([friend["id"] for friend in users[0]["friends"]])
print([friend["id"] for friend in users[1]["friends"]])
print([friend["id"] for friend in users[2]["friends"]])

[1, 2]
[0, 2, 3]
[0, 1, 3]
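* to see where each duplicate comes from, it can help to record which friend every id was reached through. A minimal sketch, assuming an id-only adjacency list built from the same edge list; `friend_ids` and `foaf_paths` are hypothetical names, not part of the original code.

```python
from collections import defaultdict

# Edge list copied from the friendships cell above.
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# friend_ids[u] lists the ids of u's friends, in edge-list order.
friend_ids = defaultdict(list)
for i, j in friendships:
    friend_ids[i].append(j)
    friend_ids[j].append(i)

def foaf_paths(user_id):
    """Return (via, foaf) pairs: foaf was reached through the friend `via`."""
    return [(via, foaf)
            for via in friend_ids[user_id]
            for foaf in friend_ids[via]]

print(foaf_paths(0))
# [(1, 0), (1, 2), (1, 3), (2, 0), (2, 1), (2, 3)]
```

* the second element of each pair reproduces the list `[0, 2, 3, 0, 1, 3]`, and the first element shows that Hero and Chi each appear once via Dunn (1) and once via Sue (2).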


* maybe instead we should produce a count of mutual friends. And we definitely should use a helper function to exclude people already known to the user

In [None]:
from collections import Counter

def not_the_same(user, other_user):
    # two users are not the same if they have different ids
    return user["id"] != other_user["id"]

def not_friends(user, other_user):
    # other_user is not a friend if he's not in user["friends"];
    # that is, if he's not_the_same as all the people in user["friends"]
    return all(not_the_same(friend, other_user)
               for friend in user["friends"])

def friends_of_friend_ids(user):
    return Counter(foaf["id"]
                   for friend in user["friends"]  # for each of my friends
                   for foaf in friend["friends"]  # count *their* friends
                   if not_the_same(user, foaf)    # who aren't me
                   and not_friends(user, foaf))   # and aren't my friends

print(friends_of_friend_ids(users[3]))
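* the same mutual-friend count can be sketched with sets of ids, which replaces the linear scan in `not_friends` with a set membership test. This is an alternative under stated assumptions, not the code above: `friend_ids` and `mutual_friend_counts` are hypothetical names, and the adjacency is rebuilt from the edge list so the block runs on its own.

```python
from collections import Counter, defaultdict

# Edge list copied from the friendships cell above.
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# friend_ids[u] is the set of ids directly connected to u.
friend_ids = defaultdict(set)
for i, j in friendships:
    friend_ids[i].add(j)
    friend_ids[j].add(i)

def mutual_friend_counts(user_id):
    """Map each friend-of-a-friend id to the number of mutual friends."""
    direct = friend_ids[user_id]
    return Counter(foaf
                   for friend in direct             # for each of my friends
                   for foaf in friend_ids[friend]   # count *their* friends
                   if foaf != user_id               # who aren't me
                   and foaf not in direct)          # and aren't my friends

print(mutual_friend_counts(3))  # Counter({0: 2, 5: 1})
```

* for Chi (id 3) this reports two mutual friends with Hero (id 0) and one with Clive (id 5).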