# Introduction

## Understanding key connectors

Our database is a list of users, each with an ID (a number) and a name (a string). Each user is represented by a dictionary (dict).

In [112]:
users = [
    { "id": 0, "name": "Hero" },
    { "id": 1, "name": "Dunn" },
    { "id": 2, "name": "Sue" },
    { "id": 3, "name": "Chi" },
    { "id": 4, "name": "Thor" },
    { "id": 5, "name": "Clive" },
    { "id": 6, "name": "Hicks" },
    { "id": 7, "name": "Devin" },
    { "id": 8, "name": "Kate" },
    { "id": 9, "name": "Klein" }
]

To represent friendships between users, we use a list of tuples of user IDs. For example, the tuple (0, 1) means that user 0 and user 1 are friends.

In [113]:
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

To make the data easier to work with, we use the list above to add a list of friends to each user dict in the `users` list. The result is a graph structure: users are the nodes and friendships are the edges.

In [114]:
for user in users:
    user["friends"] = []
    
for i, j in friendships:
    users[i]["friends"].append(users[j])
    users[j]["friends"].append(users[i])
    
users

[{'id': 0,
  'name': 'Hero',
  'friends': [{'id': 1,
    'name': 'Dunn',
    'friends': [{...},
     {'id': 2,
      'name': 'Sue',
      'friends': [{...},
       {...},
       {'id': 3,
        'name': 'Chi',
        'friends': [{...},
         {...},
         {'id': 4,
          'name': 'Thor',
          'friends': [{...},
           {'id': 5,
            'name': 'Clive',
            'friends': [{...},
             {'id': 6,
              'name': 'Hicks',
              'friends': [{...},
               {'id': 8,
                'name': 'Kate',
                'friends': [{...},
                 {'id': 7, 'name': 'Devin', 'friends': [{...}, {...}]},
                 {'id': 9, 'name': 'Klein', 'friends': [{...}]}]}]},
             {'id': 7,
              'name': 'Devin',
              'friends': [{...},
               {'id': 8,
                'name': 'Kate',
                'friends': [{'id': 6,
                  'name': 'Hicks',
                  'friends': [{...}, {...}]},
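The output above is hard to read because each `friends` list holds references to whole user dicts, which in turn refer back to each other; Python prints `{...}` wherever it meets such a cycle. As a minimal sketch of an alternative, assuming we only need IDs, the same graph can be stored as a mapping from user ID to a list of friend IDs. The name `friend_ids_by_user` is hypothetical and is not used elsewhere in this notebook.

```python
from collections import defaultdict

# Same friendship pairs as above, repeated so the sketch runs on its own.
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# Hypothetical helper structure: user ID -> list of friend IDs.
friend_ids_by_user = defaultdict(list)
for i, j in friendships:
    # Friendship is symmetric, so record it in both directions.
    friend_ids_by_user[i].append(j)
    friend_ids_by_user[j].append(i)

for user_id in sorted(friend_ids_by_user):
    print(user_id, friend_ids_by_user[user_id])
```

Because this version stores plain integers rather than nested dicts, it prints one short line per user.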
        

### Getting some metrics 

To make this easier, we create a function that returns the number of friends a user has. It only needs to check the length of that user's `friends` list.

In [115]:
def number_of_friends(user):
    return len(user["friends"])

print(f"Number of friends of Hero: {number_of_friends(users[0])}")

Number of friends of Hero: 2


#### Total connections

In [116]:
total_connections = sum(number_of_friends(user) for user in users)
total_connections

24

#### Average number of friends by User

In [117]:
# Python 3's / operator already performs true division, so no __future__ import is needed
num_users = len(users)
avg_connections = total_connections / num_users
avg_connections

2.4
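As a sanity check on these two numbers: every friendship is counted once for each of its two users, so the total should be exactly twice the number of friendship pairs. The sketch below recomputes both values from the pairs alone; `connections_per_user` is a hypothetical name introduced for illustration.

```python
from collections import Counter

# Same friendship pairs as above, repeated so the sketch runs on its own.
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# Hypothetical helper: count how many friendships each user ID appears in.
connections_per_user = Counter()
for i, j in friendships:
    connections_per_user[i] += 1
    connections_per_user[j] += 1

total_connections = sum(connections_per_user.values())
avg_connections = total_connections / len(connections_per_user)

print(total_connections == 2 * len(friendships))  # True
print(total_connections, avg_connections)         # 24 2.4
```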

#### Listing users by their number of friends

In [118]:
num_friends_by_id = [(user["id"], number_of_friends(user)) for user in users]
num_friends_by_id.sort(key=lambda pair: pair[1], reverse=True)  # sort by friend count, largest first
num_friends_by_id

[(1, 3),
 (2, 3),
 (3, 3),
 (5, 3),
 (8, 3),
 (0, 2),
 (4, 2),
 (6, 2),
 (7, 2),
 (9, 1)]

#### Friends of a Friend

In [119]:
# Returns the IDs of the friends of each of a user's friends (duplicates included)
def friends_of_friend_ids(user):
    # foaf means "friend of a friend"
    return [foaf["id"] for friend in user["friends"] for foaf in friend["friends"]]

print(f"Friends of a friend from Hero: {friends_of_friend_ids(users[0])}")

Friends of a friend from Hero: [0, 2, 3, 0, 1, 3]


You can see:

- User 0 (Hero) appears twice, because Hero is a friend of both of his own friends.
- Users 1 (Dunn) and 2 (Sue) appear, even though they are already direct friends of Hero.
- User 3 (Chi) appears twice, because Chi is reachable through both Dunn and Sue.
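The three observations above suggest what a cleaner result needs: no duplicates, no direct friends, and not the user themselves. A minimal sketch using sets is shown here. It works on IDs only, and the names `friend_ids` and `friends_of_friends` are assumptions made for this illustration rather than part of the notebook.

```python
from collections import defaultdict

# Same friendship pairs as above, repeated so the sketch runs on its own.
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# Hypothetical helper structure: user ID -> set of friend IDs.
friend_ids = defaultdict(set)
for i, j in friendships:
    friend_ids[i].add(j)
    friend_ids[j].add(i)

def friends_of_friends(user_id):
    """IDs reachable through a friend, minus direct friends and the user."""
    direct = friend_ids[user_id]
    reached = set()
    for friend_id in direct:
        reached |= friend_ids[friend_id]  # a set union removes duplicates
    return reached - direct - {user_id}

print(sorted(friends_of_friends(0)))  # [3]
print(sorted(friends_of_friends(3)))  # [0, 5]
```

Sets drop the information about how many paths lead to each person, which is why the next section counts with `Counter` instead.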

In [120]:
heros_friends = [friend['id'] for friend in users[0]['friends']]
print(f"Friends of Hero (0): {heros_friends}")
dunns_friends = [friend['id'] for friend in users[1]['friends']]
print(f"Friends of Dunn (1): {dunns_friends}")
sues_friends = [friend['id'] for friend in users[2]['friends']]
print(f"Friends of Sue (2): {sues_friends}")

Friends of Hero (0): [1, 2]
Friends of Dunn (1): [0, 2, 3]
Friends of Sue (2): [0, 1, 3]


#### Counting mutual friends

In [121]:
from collections import Counter

In [122]:
# Responsible for checking if two users are not the same
def user_is_not_the_same(user, other_user):
    return user["id"] != other_user["id"]

print(f"Hero and Dunn are different users: {user_is_not_the_same(users[0], users[1])}")
print(f"User 0 and User 0 (Hero) are different users: {user_is_not_the_same(users[0], users[0])}")

Hero and Dunn are different users: True
User 0 and User 0 (Hero) are different users: False


In [123]:
# Responsible for checking that other_user is not in user's list of friends
def user_is_not_friends_with(user, other_user):
    return all(user_is_not_the_same(friend, other_user) for friend in user["friends"])

print(f"Hero and Dunn are not friends: {user_is_not_friends_with(users[0], users[1])}")
print(f"Hero and Klein are not friends: {user_is_not_friends_with(users[0], users[9])}")

Hero and Dunn are not friends: False
Hero and Klein are not friends: True


In [124]:
def count_mutual_friends(user):
    return Counter(foaf["id"]
                  for friend in user["friends"]
                  for foaf in friend["friends"]
                  if user_is_not_the_same(user, foaf) and user_is_not_friends_with(user, foaf))

print(f"Hero's Mutual friends: {count_mutual_friends(users[0])}")
print(f"Chi's Mutual friends: {count_mutual_friends(users[3])}")

Hero's Mutual friends: Counter({3: 2})
Chi's Mutual friends: Counter({0: 2, 5: 1})


This shows that Hero has 2 mutual friends with Chi (User 3). Chi, in turn, has 2 mutual friends with Hero (User 0) and 1 mutual friend with Clive (User 5).
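The counts can be cross-checked by listing who the mutual friends actually are: the mutual friends of two users are the intersection of their friend sets. This is a sketch under the assumption that friends are stored as sets of IDs; `friend_ids` and `mutual_friend_ids` are hypothetical names.

```python
from collections import defaultdict

# Same friendship pairs as above, repeated so the sketch runs on its own.
friendships = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
               (4, 5), (5, 6), (5, 7), (6, 8), (7, 8), (8, 9)]

# Hypothetical helper structure: user ID -> set of friend IDs.
friend_ids = defaultdict(set)
for i, j in friendships:
    friend_ids[i].add(j)
    friend_ids[j].add(i)

def mutual_friend_ids(user_id, other_id):
    """IDs of users who are friends with both user_id and other_id."""
    return friend_ids[user_id] & friend_ids[other_id]

print(sorted(mutual_friend_ids(0, 3)))  # [1, 2] -> Dunn and Sue
print(sorted(mutual_friend_ids(3, 5)))  # [4]    -> Thor
```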

#### Finding people with the same interest

In [125]:
interests = [
    (0, "Hadoop"), (0, "Big Data"), (0, "HBase"), (0, "Java"), (0, "Spark"), (0, "Storm"), (0, "Cassandra"),
    (1, "NoSQL"), (1, "MongoDB"), (1, "Cassandra"), (1, "HBase"), (1, "Postgres"),
    (2, "Python"), (2, "scikit-learn"), (2, "scipy"), (2, "numpy"), (2, "statsmodels"), (2, "pandas"),
    (3, "R"), (3, "Python"), (3, "statistics"), (3, "regression"), (3, "probability"),
    (4, "machine learning"), (4, "regression"), (4, "decision trees"), (4, "libsvm"),
    (5, "Python"), (5, "R"), (5, "Java"), (5, "C++"), (5, "Haskell"), (5, "programming languages"),
    (6, "statistics"), (6, "probability"), (6, "mathematics"), (6, "theory"),
    (7, "machine learning"), (7, "scikit-learn"), (7, "Mahout"), (7, "neural networks"),
    (8, "neural networks"), (8, "deep learning"), (8, "Big Data"), (8, "artificial intelligence"),
    (9, "Hadoop"), (9, "Java"), (9, "MapReduce"),  (9, "Big Data")
]

In [126]:
# Responsible for getting a list of users (IDs) that are interested in a specific topic
def users_who_are_interested_in(target_interest):
    return [user_id for user_id, user_interest in interests if user_interest == target_interest]

users_who_are_interested_in_machine_learning = users_who_are_interested_in("machine learning")
print(f"Users who are interested in Machine Learning: {users_who_are_interested_in_machine_learning}")
users_who_are_interested_in_python = users_who_are_interested_in("Python")
print(f"Users who are interested in Python: {users_who_are_interested_in_python}")

Users who are interested in Machine Learning: [4, 7]
Users who are interested in Python: [2, 3, 5]


We can build a dictionary that groups users by interest, or one that groups interests by user.

In [127]:
from collections import defaultdict

In [128]:
user_ids_by_interest = defaultdict(list) # a missing key starts out as an empty list

for user_id, interest in interests:
    user_ids_by_interest[interest].append(user_id)
    
print(user_ids_by_interest)

defaultdict(<class 'list'>, {'Hadoop': [0, 9], 'Big Data': [0, 8, 9], 'HBase': [0, 1], 'Java': [0, 5, 9], 'Spark': [0], 'Storm': [0], 'Cassandra': [0, 1], 'NoSQL': [1], 'MongoDB': [1], 'Postgres': [1], 'Python': [2, 3, 5], 'scikit-learn': [2, 7], 'scipy': [2], 'numpy': [2], 'statsmodels': [2], 'pandas': [2], 'R': [3, 5], 'statistics': [3, 6], 'regression': [3, 4], 'probability': [3, 6], 'machine learning': [4, 7], 'decision trees': [4], 'libsvm': [4], 'C++': [5], 'Haskell': [5], 'programming languages': [5], 'mathematics': [6], 'theory': [6], 'Mahout': [7], 'neural networks': [7, 8], 'deep learning': [8], 'artificial intelligence': [8], 'MapReduce': [9]})


In [129]:
interests_by_user_id = defaultdict(list) # a missing key starts out as an empty list

for user_id, interest in interests:
    interests_by_user_id[user_id].append(interest)
    
print(interests_by_user_id)

defaultdict(<class 'list'>, {0: ['Hadoop', 'Big Data', 'HBase', 'Java', 'Spark', 'Storm', 'Cassandra'], 1: ['NoSQL', 'MongoDB', 'Cassandra', 'HBase', 'Postgres'], 2: ['Python', 'scikit-learn', 'scipy', 'numpy', 'statsmodels', 'pandas'], 3: ['R', 'Python', 'statistics', 'regression', 'probability'], 4: ['machine learning', 'regression', 'decision trees', 'libsvm'], 5: ['Python', 'R', 'Java', 'C++', 'Haskell', 'programming languages'], 6: ['statistics', 'probability', 'mathematics', 'theory'], 7: ['machine learning', 'scikit-learn', 'Mahout', 'neural networks'], 8: ['neural networks', 'deep learning', 'Big Data', 'artificial intelligence'], 9: ['Hadoop', 'Java', 'MapReduce', 'Big Data']})


In [130]:
def most_common_interests_with(user):
    return Counter(interested_user_id
                  for interest in interests_by_user_id[user["id"]]
                  for interested_user_id in user_ids_by_interest[interest]
                  if interested_user_id != user["id"])

print(f"Users with most common interests as Hero: {most_common_interests_with(users[0])}")

Users with most common interests as Hero: Counter({9: 3, 1: 2, 8: 1, 5: 1})


As you can see, Klein (User 9) shares 3 interests with Hero (User 0), more than any other user.
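To pull the best match out of the counter programmatically, `Counter.most_common` can be used, and a set intersection shows which interests are actually shared. The sketch below copies the counter from the output above and the interest lists of Hero and Klein from the `interests` data; the variable names are assumptions made for this illustration.

```python
from collections import Counter

# Counter copied from the output above (user ID -> shared interests with Hero).
common_with_hero = Counter({9: 3, 1: 2, 8: 1, 5: 1})

# most_common(n) returns the n highest counts as (key, count) pairs.
print(common_with_hero.most_common(1))  # [(9, 3)]
print(common_with_hero.most_common(2))  # [(9, 3), (1, 2)]

# Interests of Hero (0) and Klein (9), copied from the interests list.
hero_interests = {"Hadoop", "Big Data", "HBase", "Java",
                  "Spark", "Storm", "Cassandra"}
klein_interests = {"Hadoop", "Java", "MapReduce", "Big Data"}

# The intersection names the interests the two users share.
print(sorted(hero_interests & klein_interests))
# ['Big Data', 'Hadoop', 'Java']
```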