### 1st Virtual Methods Seminar: Methods of Computational Social Science
## Introduction to Social Network Science with Python
# Cohesion - Exercise 3.2
Instructors: Haiko Lietz & Olga Zagovora

Date: September 23, 2020
## Packages

In [None]:
import sys
libs_path = '../../code/libs/'
sys.path.append(libs_path)
import compsoc as cs

In [None]:
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
import matplotlib.cm as cm

## Functions

In [None]:
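def highlights(g, nodes):
    """Return a dict that maps each node of g to 'white', and each node in nodes to 'black'."""
    # Start with every node of the graph colored white
    d = dict(zip(g.nodes, g.number_of_nodes()*['white']))
    # Overwrite the color of the nodes to be highlighted
    for node in nodes:
        d.update({node: 'black'})
    return d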
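
As a quick illustration of what `highlights()` returns, here is a minimal sketch on a made-up four-node path graph. The helper is repeated so the cell runs on its own; the toy graph and the variable names `colors` and `node_color` are assumptions for illustration only.

```python
import networkx as nx

def highlights(g, nodes):
    # Copy of the helper above, repeated so this sketch is self-contained
    d = dict(zip(g.nodes, g.number_of_nodes()*['white']))
    for node in nodes:
        d.update({node: 'black'})
    return d

g = nx.path_graph(4)             # toy graph with nodes 0, 1, 2, 3
colors = highlights(g, [1, 2])   # mark the two inner nodes

# A list ordered like g.nodes, which is the form nx.draw(..., node_color=...) accepts
node_color = [colors[n] for n in g.nodes]
```

The dictionary form is convenient for lookups, while the list form can be handed to a drawing function.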

## Introduction
In this exercise, we will take another look at the Copenhagen Networks Study collection to detect cores both graph-theoretically and by filtering. Load the dataset using the `copenhagen_collection()` function:

In [None]:
users, genders, bluetooth, calls, sms, facebook_friends = cs.copenhagen_collection(path='../../data/copenhagen/')

We need just a little bit of preprocessing. For the `bluetooth` data, we sum up the signal strengths for each pair of users and keep only the pairs whose summed strength is greater than zero:

In [None]:
bluetooth = bluetooth[['user_id_from', 'user_id_to', 'strength']].groupby(['user_id_from', 'user_id_to']).sum().reset_index()
bluetooth = bluetooth[bluetooth['strength'] > 0]
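
To see what the two preprocessing steps do, here is a minimal sketch on a made-up dataframe with the same three columns. The values and the names `toy` and `summed` are assumptions for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy edge list: the pair (0, 1) is observed twice, the pair (1, 2) only with zero strength
toy = pd.DataFrame({
    'user_id_from': [0, 0, 1, 1],
    'user_id_to':   [1, 1, 2, 2],
    'strength':     [3, 2, 0, 0],
})

# One row per pair of users, with the strengths summed
summed = toy.groupby(['user_id_from', 'user_id_to']).sum().reset_index()

# Keep only pairs with a positive summed strength
summed = summed[summed['strength'] > 0]
```

After both steps, only the pair (0, 1) remains, with a summed strength of 5.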

The `facebook_friends` dataframe needs a unit weight so the dataframe meets the data format expectations:

In [None]:
facebook_friends['weight'] = 1

## Exercise 1
The **bluetooth** data is a weighted undirected graph. Filter out weak **summed signals** to identify the persons that are most strongly co-located. Draw the largest bicomponents of the filtered graph.

Hints:
- The `nx.biconnected_components()` function returns a generator of sets of nodes, one set per bicomponent. To merge these sets into a single union set, find a solution [here](https://stackoverflow.com/questions/31253109/how-can-i-find-the-union-on-a-list-of-sets-in-python).
- Extract the bicomponents from the graph by calling `subgraph()` on the union of their nodes.
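
The two hints can be sketched on a toy graph as follows. This is only one possible reading of the task, not the reference solution: the edge weights, the cutoff `threshold`, and the choice to keep bicomponents with more than two nodes are all assumptions made for illustration.

```python
import networkx as nx

# Toy weighted graph: two triangles joined by a weak edge, plus a pendant node
G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 5), (1, 2, 5), (2, 0, 5),   # strong triangle
    (2, 3, 1),                         # weak edge between the triangles
    (3, 4, 5), (4, 5, 5), (5, 3, 5),   # second strong triangle
    (5, 6, 5),                         # pendant edge
])

threshold = 2  # hypothetical cutoff, chosen only for this toy example
strong = [(u, v) for u, v, w in G.edges(data='weight') if w >= threshold]
H = G.edge_subgraph(strong)

# biconnected_components() yields sets of nodes; single edges show up as 2-node sets
components = [c for c in nx.biconnected_components(H) if len(c) > 2]

# Union of all kept node sets, then the induced subgraph
core_nodes = set().union(*components)
B = H.subgraph(core_nodes)
# B can then be drawn, for example with nx.draw(B)
```

Here the weak edge and the pendant edge drop out, and `B` consists of the two triangles.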

## Exercise 2
Construct a simple (undirected) **facebook friends** graph, remove self-loops (for some strange reason some users are friends of themselves), extract the largest connected component, store a `spring_layout()` in a vertex property variable, and draw the graph.

Then identify the network core via $k$-core decomposition. Create a variable $k$ and mark the largest $k$-core using the `highlights()` function (given above, as in the demo).

Hint: You can remove self-loops by calling `G.remove_edges_from(nx.selfloop_edges(G))`.
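
The steps above can be sketched on a small made-up graph, assuming a 4-clique with a tail, one self-loop, and a separate pair of nodes. The graph and the names `largest`, `core`, and `core_nodes` are illustrative assumptions; the facebook friends data will look different.

```python
import networkx as nx

# Toy graph: 4-clique (0-3), tail 3-4-5, a self-loop at 5, and a separate pair 6-7
G = nx.complete_graph(4)
G.add_edges_from([(3, 4), (4, 5), (5, 5), (6, 7)])

# Remove self-loops; list() is only a defensive copy of the edges to delete
G.remove_edges_from(list(nx.selfloop_edges(G)))

# Largest connected component as an independent graph
largest = max(nx.connected_components(G), key=len)
G = G.subgraph(largest).copy()

# Core number of every node; the largest k-core has k equal to the maximum
core = nx.core_number(G)
k = max(core.values())
core_nodes = set(nx.k_core(G, k).nodes)
```

The resulting `core_nodes` could then be passed to `highlights()` to mark the core in a drawing.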

For later: Can you also color each node by its core number, using the "hot" [colormap](https://matplotlib.org/tutorials/colors/colormaps.html) of matplotlib? That means nodes should have "hotter" colors the higher their core number is.
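
One possible way to approach this is sketched below on a made-up graph (a 4-clique with a two-node tail). It assumes that numeric core numbers are passed as `node_color` together with `cmap`, and lets matplotlib map them to colors; the layout and the black node outlines are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Toy graph: 4-clique (0-3) with a tail 3-4-5
G = nx.complete_graph(4)
G.add_edges_from([(3, 4), (4, 5)])

core = nx.core_number(G)
node_color = [core[n] for n in G.nodes]   # one number per node, ordered like G.nodes

pos = nx.circular_layout(G)
nx.draw(
    G, pos,
    node_color=node_color,       # numbers are mapped to colors via cmap
    cmap=plt.cm.hot,
    vmin=0, vmax=max(node_color),
    edgecolors='black',          # outline, so that very light nodes stay visible
)
```

In the "hot" colormap, low values are dark and high values are bright, so the nodes of the clique appear brightest.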