# Collaboration network

Using the graph-tool library, we'll construct the collaboration network of computer science researchers from 2014. The source data is an undirected bipartite graph connecting authors to the papers they have published. We want instead a graph where nodes are authors and an edge between two nodes exists only if they have collaborated on at least **5 different papers** together.

To work with a smaller graph (about 7 million nodes is too many for now), let's keep only authors with at least 20 published papers.

0. Import graph-tool with `import graph_tool.all as gt`.
1. Get info about the graph to be used with `gt.collection.ns_info['dblp_author_paper']`.
2. Create a graph `g` with `gt.collection.ns['dblp_author_paper']`.
3. Let's get the two sets of nodes in our bipartite graph with `is_bi, part = gt.is_bipartite(g, partition=True)`.
4. Get the degrees in a variable `deg` with `g.degree_property_map('out')`.
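
To build some intuition for what step 3 returns, here is a minimal pure-Python sketch of a bipartiteness check by two-coloring. It is only a conceptual illustration on hypothetical toy data (the names `two_color` and `toy` are made up here), not graph-tool's implementation; it assumes the graph is given as a dictionary mapping each node to its set of neighbors.

```python
from collections import deque


def two_color(adjacency):
    """Return (is_bipartite, partition) for an undirected graph.

    adjacency maps each node to the set of its neighbors.
    partition maps each node to 0 or 1 (empty if the graph is not bipartite).
    """
    partition = {}
    for start in adjacency:
        if start in partition:
            continue  # already colored as part of an earlier component
        partition[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in partition:
                    # give the neighbor the opposite color
                    partition[neighbor] = 1 - partition[node]
                    queue.append(neighbor)
                elif partition[neighbor] == partition[node]:
                    # two adjacent nodes share a color: not bipartite
                    return False, {}
    return True, partition


# Hypothetical toy data: authors "a*" linked to papers "p*".
toy = {
    "a1": {"p1", "p2"},
    "a2": {"p1"},
    "a3": {"p2"},
    "p1": {"a1", "a2"},
    "p2": {"a1", "a3"},
}
is_bi, part_toy = two_color(toy)
```

In this toy graph all authors end up on one side and all papers on the other, which is the role `part` plays in the rest of the exercise.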


### Filter nodes

Write a function `first_filter` that takes a node as its argument and returns `True` if the node is in partition 0 (`part[node] == 0`) **and** its degree is at least 20 (`deg[node] >= 20`). The function should also return `True` whenever `part[node] == 1`, so that every node of the other partition is kept at this stage.

Create a new `gt.GraphView` (a subgraph) of `g` in a variable `h` using the function `first_filter` as `h = gt.GraphView(g, vfilt=first_filter)`.

Get the degrees of `h` in `deg_h` using `h.degree_property_map('out')`.

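Filter the graph again to keep only the nodes of `h` with a degree greater than or equal to 1, reassigning the result to `h`: `h = gt.GraphView(h, vfilt=lambda v: deg_h[v] >= 1)`. This drops the papers that no longer have any remaining author.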
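
The two filtering passes can be sketched on a small dictionary without graph-tool. This is an illustrative sketch under stated assumptions: `filter_bipartite`, `toy_papers`, and the threshold of 2 are hypothetical, and the data structure (author mapped to a set of paper ids) stands in for the bipartite graph.

```python
def filter_bipartite(papers_by_author, min_papers):
    """Keep authors with at least min_papers papers, then keep only the
    papers that are still attached to at least one kept author."""
    # First pass: the degree threshold applies to authors only.
    kept_authors = {
        author: set(papers)
        for author, papers in papers_by_author.items()
        if len(papers) >= min_papers
    }
    # Second pass: a paper survives only if some kept author wrote it.
    kept_papers = set().union(*kept_authors.values())
    return kept_authors, kept_papers


# Hypothetical toy data.
toy_papers = {
    "ana": {"p1", "p2", "p3"},
    "bo": {"p1", "p2"},
    "cy": {"p4"},
}
authors_kept, papers_kept = filter_bipartite(toy_papers, min_papers=2)
```

Here `cy` is removed by the first pass, and `p4` disappears in the second pass because it no longer has any author.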

Now only authors with at least 20 papers remain. Next, let's create a new graph `p` as the _projection_ of our bipartite graph `h`. This graph will contain only nodes with `part[node] == 0`, and an edge will connect two nodes if they share a neighbor in `h`.

Read through and use the code below to collect the edges of that graph `p`.

In [83]:
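edges = set()  # a set makes the duplicate check fast
for u in h.iter_vertices():
    if part[u] == 0:  # u is an author
        for paper in h.get_all_neighbors(u):
            for v in h.get_all_neighbors(paper):  # co-authors on that paper
                if u != v:
                    # store each pair once, smaller index first
                    edges.add((int(min(u, v)), int(max(u, v))))
edges = sorted(edges)  # back to a list of (u, v) pairs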
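
The loop above links two authors as soon as they share one paper. To apply the "at least 5 different papers" criterion from the introduction, one possible approach is to count shared papers per pair and keep only pairs above a threshold. The sketch below shows that idea in pure Python on hypothetical toy data; `project_with_threshold`, `toy_authors`, and the threshold of 2 are assumptions for illustration, not part of graph-tool.

```python
from collections import Counter
from itertools import combinations


def project_with_threshold(authors_by_paper, min_shared):
    """Return sorted author pairs that co-wrote at least min_shared papers."""
    shared = Counter()
    for authors in authors_by_paper.values():
        # every unordered pair of co-authors on this paper gains one count
        for pair in combinations(sorted(authors), 2):
            shared[pair] += 1
    return sorted(pair for pair, count in shared.items() if count >= min_shared)


# Hypothetical toy data: paper id mapped to its set of authors.
toy_authors = {
    "p1": {"ana", "bo"},
    "p2": {"ana", "bo", "cy"},
    "p3": {"ana", "cy"},
}
toy_edges = project_with_threshold(toy_authors, min_shared=2)
```

Iterating over papers rather than authors visits each co-author pair once per shared paper, so the counter directly holds the number of joint papers. The cost grows with the square of the number of authors per paper.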

Import NumPy with `import numpy as np` and convert the list `edges` to a `numpy.array` using `edges = np.array(edges)`.

Create an empty undirected graph `p` with `p = gt.Graph(directed=False)` and add the edges from `edges` with `p.add_edge_list(edges, hashed=True, hash_type=int)`.

You can draw the graph to see that it has more than one connected component (CC). Filter `p` to get the largest CC with `cc = gt.GraphView(p, gt.label_largest_component(p))`.
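
As a conceptual illustration of what selecting the largest connected component means, here is a small breadth-first search sketch in pure Python. It is not how graph-tool computes it; `largest_component` and `toy_graph` are hypothetical names, and the graph is assumed to be a dictionary of neighbor sets.

```python
from collections import deque


def largest_component(adjacency):
    """Return the set of nodes in the largest connected component."""
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # explore everything reachable from start
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy graph: a path 1-2-3, a pair 4-5, and an isolated node 6.
toy_graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
biggest = largest_component(toy_graph)
```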

Now draw `cc` according to core number. First set `kcore` to be equal to `gt.kcore_decomposition(p)`. Then use `gt.graph_draw(cc, vertex_fill_color=kcore, vertex_size=kcore)`.
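
For readers unfamiliar with core numbers, the following pure-Python sketch computes them by repeatedly peeling off a node of minimum remaining degree. It is a simple quadratic-time illustration on hypothetical toy data, assuming an undirected graph without self-loops given as a dictionary of neighbor sets; `core_numbers` and `toy_net` are made-up names and this is not graph-tool's algorithm.

```python
def core_numbers(adjacency):
    """Return {node: core number} by peeling minimum-degree nodes."""
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # pick a node with the smallest remaining degree
        node = min(remaining, key=degree.get)
        # the core level never decreases as peeling proceeds
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for neighbor in adjacency[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


# Hypothetical toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
toy_net = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
toy_core = core_numbers(toy_net)
```

The triangle forms a 2-core, while the pendant node only belongs to the 1-core, so a drawing colored by core number would highlight the triangle.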