# VK Network Analysis Project


# 1. Network Summary

## Network Source
The network comes from VK, and I used the `vk_api` Python module (`https://pypi.org/project/vk-api/`) to load the data. I used a friend's VK account because I created my own account only recently and it has just 24 or 25 friends, which is too few nodes. The account I used has 550 friends, that is, 550 nodes. The account is `https://vk.com/artur__avetisyan`.

In [1]:
from __future__ import division
import vk_api
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import scipy.spatial as spt
import pandas as pd
from collections import Counter

In [2]:
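import os
# Read the credentials from the environment instead of hard-coding them
# (the variable names VK_LOGIN and VK_PASSWORD are placeholders).
username = os.environ["VK_LOGIN"]
password = os.environ["VK_PASSWORD"]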
vk_session = vk_api.VkApi(username, password)
vk_session.auth()
vk = vk_session.get_api()
friends_ids = vk.friends.get()["items"]

## Loading & Preprocessing 

In [3]:
graph_realtions = {}
deleted_friends = []

for friend_id in friends_ids:
    try:
        graph_realtions[friend_id] = vk.friends.get(user_id = friend_id)["items"]
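    except vk_api.exceptions.ApiError:
        # deleted, banned, or private profiles raise an API error; record and skip them
        deleted_friends.append(friend_id)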

## Setting the Nodes and Attributes of the Graph
In this section, I initialize the graph nodes, pull the remaining data for each user, set the node attributes, and build a DataFrame of the user data for a summary.

In [4]:
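G = nx.Graph()  # nx.Graph is undirected by default; directed=False would only be stored as a graph attribute
friend_set = set(friends_ids)  # a set makes the membership test below O(1)
for node_i in graph_realtions:
    G.add_node(node_i)
    for node_j in graph_realtions[node_i]:
        # keep only edges between two distinct friends of the analysed profile
        if node_i != node_j and node_j in friend_set:
            G.add_edge(node_i, node_j)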
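
The edge filter above keeps only friendships between two friends of the analysed profile. A minimal sketch of the same idea on made-up data follows; the ids and the names `toy_relations`, `toy_friend_ids` and `toy_G` are hypothetical and not part of the VK data.

```python
import networkx as nx

# Hypothetical toy data: keys are friends of the analysed profile,
# values are each friend's own friend list (which may include outsiders).
toy_relations = {
    1: [2, 3, 99],
    2: [1],
    3: [1, 4],
    4: [3, 1000],
    5: [],
}
toy_friend_ids = set(toy_relations)

toy_G = nx.Graph()
toy_G.add_nodes_from(toy_relations)
for node_i, neighbours in toy_relations.items():
    for node_j in neighbours:
        # drop self-loops and anyone who is not a friend of the profile
        if node_i != node_j and node_j in toy_friend_ids:
            toy_G.add_edge(node_i, node_j)

print(sorted(toy_G.nodes))
print(toy_G.number_of_edges())
```

Outsiders such as 99 and 1000 never become nodes, and a friend with no mutual friends (5 here) stays in the graph as an isolated node.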

In [5]:
fields = ['first_name', 'last_name', 'sex', 'relation', 'city', 'education', 'personal']
fields_str = ','.join(fields)

users_data = []
for node in G.nodes:
    users_data.append( vk.users.get(user_ids=str(node), fields=fields_str, lang='en')[0])
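
The loop above issues one request per node. If the endpoint accepts several comma-separated ids in one call (an assumption worth checking against the VK documentation, together with the allowed batch size), the requests could be grouped. The sketch below is offline only: `chunked`, `fetch_users_in_batches` and `fake_fetch` are hypothetical helpers, and `fake_fetch` merely stands in for the real API call.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_users_in_batches(user_ids, fetch, batch_size=100):
    """Call `fetch` once per batch and concatenate the results.

    `fetch` is any callable that takes a comma-separated id string
    and returns a list of user dicts.
    """
    users = []
    for batch in chunked(list(user_ids), batch_size):
        users.extend(fetch(",".join(str(uid) for uid in batch)))
    return users


# Stand-in for the real API call, used only to keep the sketch runnable offline.
def fake_fetch(ids_str):
    return [{"id": int(uid)} for uid in ids_str.split(",")]


demo_users = fetch_users_in_batches(range(1, 8), fake_fetch, batch_size=3)
print([user["id"] for user in demo_users])
```

With a live session, `fetch` might be something like `lambda ids: vk.users.get(user_ids=ids, fields=fields_str, lang='en')`, under the same assumption about comma-separated ids.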

In [6]:
graph_data = {
    'id': [user['id'] for user in users_data],
    'name': [user['first_name']+' '+user['last_name'] for user in users_data],
    'sex': [user['sex'] for user in users_data],
    'city': [user['city']['title'] if 'city' in user else None for user in users_data],
    'university': [user['university'] if 'university' in user else 0 for user in users_data],
    'relationship': [user['relation'] if 'relation' in user else 0 for user in users_data],
    'friends': [],
    'smoking': [user['personal']['smoking'] if 'personal' in user and 'smoking' in user['personal'] else 0 for user in users_data],
    'alcohol': [user['personal']['alcohol'] if 'personal' in user and 'alcohol' in user['personal'] else 0 for user in users_data],
    'life_main': [user['personal']['life_main'] if 'personal' in user and 'life_main' in user['personal'] else 0 for user in users_data],
    'people_main': [user['personal']['people_main'] if 'personal' in user and 'people_main' in user['personal'] else 0 for user in users_data]
}
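
The `personal` fields above all repeat the same "use the value if every key is present, else 0" pattern. One possible way to factor it out is a small helper; `get_nested` and `sample_user` are hypothetical names introduced only for this sketch.

```python
def get_nested(record, *keys, default=0):
    """Walk nested dicts along `keys`; return `default` if any key is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Made-up record shaped like one entry of users_data.
sample_user = {"id": 1, "personal": {"smoking": 2}, "city": {"title": "Ufa"}}

print(get_nested(sample_user, "personal", "smoking"))
print(get_nested(sample_user, "personal", "alcohol"))
print(get_nested(sample_user, "city", "title", default=None))
```

With such a helper, a column like `'smoking'` could be written as `[get_nested(user, 'personal', 'smoking') for user in users_data]`.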

In [7]:
friends_count = []
for i in graph_data['id']:
    friends_count.append(len(graph_realtions[i]))
graph_data['friends'] = friends_count

In [8]:
df = pd.DataFrame(data=graph_data)

In [9]:
df.head()

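| | id | name | sex | city | university | relationship | friends | smoking | alcohol | life_main | people_main |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18218517 | Yana Mukhina | 1 | Saint Petersburg | 17 | 0 | 271 | 0 | 0 | 5 | 0 |
| 1 | 66881193 | Regina Fakhreeva | 1 | Ufa | 0 | 0 | 194 | 0 | 0 | 0 | 0 |
| 2 | 77908220 | Sofia Gorelkina | 1 | Saint Petersburg | 1 | 0 | 297 | 0 | 0 | 6 | 1 |
| 3 | 175409628 | Maxim Beresnev | 2 | Saint Petersburg | 17 | 0 | 157 | 2 | 0 | 0 | 0 |
| 4 | 36810226 | Ivan Griga | 2 | | 0 | 0 | 354 | 0 | 0 | 0 | 0 |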


In [10]:
# setting the attributes for the nodes
for key in df:
    if key == 'id':
        continue
    attribute = dict(zip(list(df['id']), list(df[key])))
    nx.set_node_attributes(G, attribute, key)
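
The cell above copies every DataFrame column onto the nodes as an attribute keyed by user id. A minimal sketch of the same step on a made-up frame follows, using `set_index(...).to_dict()` as an alternative to `zip`; `toy_df` and `toy_G` are hypothetical.

```python
import networkx as nx
import pandas as pd

# Made-up user table with the same layout as df (an id column plus attributes).
toy_df = pd.DataFrame({
    "id": [1, 2, 3],
    "sex": [1, 2, 1],
    "city": ["Ufa", "Kazan", "Saint Petersburg"],
})

toy_G = nx.Graph()
toy_G.add_nodes_from(toy_df["id"].tolist())

# One set_node_attributes call per column: {node id: value} under the column name.
for column in toy_df.columns.drop("id"):
    mapping = toy_df.set_index("id")[column].to_dict()
    nx.set_node_attributes(toy_G, mapping, column)

print(toy_G.nodes[1])
```

Each node then carries a dict of attributes that can be read back with `toy_G.nodes[node_id]`.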

## Size and Order

In [11]:
print('Number of Nodes: ', G.number_of_nodes())
print('Number of Edges: ', G.number_of_edges())
print('Number of Connected components: ', nx.number_connected_components(G))

Number of Nodes:  25
Number of Edges:  27
Number of Connected components:  13


The number of nodes is the number of friends of the profile used for the analysis. It is less than the total shown on the profile because some profiles are blocked or deleted on VK, so they are not included in the analysis. The number of edges is the number of friendship connections between those nodes.

In [16]:
c_components = list(nx.connected_components(G))
for i, component in enumerate(c_components, start=1):
    print('component {0} has {1} nodes'.format(i, len(component)))

component 1 has 4 nodes
component 2 has 9 nodes
component 3 has 1 nodes
component 4 has 1 nodes
component 5 has 1 nodes
component 6 has 2 nodes
component 7 has 1 nodes
component 8 has 1 nodes
component 9 has 1 nodes
component 10 has 1 nodes
component 11 has 1 nodes
component 12 has 1 nodes
component 13 has 1 nodes
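
In the listing above most components are single isolated nodes, and the largest has 9 nodes. A small sketch of how the components could be sorted by size, so that the largest one comes first regardless of discovery order, is given below on a made-up graph; `toy_G`, `components`, `sizes`, `largest` and `isolated` are hypothetical names.

```python
import networkx as nx

toy_G = nx.Graph()
toy_G.add_edges_from([(1, 2), (2, 3), (4, 5)])
toy_G.add_node(6)  # isolated node: a component of size 1

# Sort the components from largest to smallest.
components = sorted(nx.connected_components(toy_G), key=len, reverse=True)
sizes = [len(component) for component in components]

largest = toy_G.subgraph(components[0])            # induced subgraph of the biggest component
isolated = sum(1 for size in sizes if size == 1)   # number of single-node components

print(sizes)
print(largest.number_of_nodes(), isolated)
```

Sorting first avoids depending on the position of the largest component in the list when a subgraph is extracted for distance-based measures.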


## Diameter and Radius

In [18]:
sub_G = G.subgraph(c_components[1])