# Getting Started with Graspologic
Graspologic brings a lot of novel network statistics algorithms to the table, but what if you're just getting started? What if you're not even sure how, or where, to start?

This tutorial is intended to help you get started with some of the tools and capabilities within Graspologic. We'll focus primarily on using Graspologic through the `pipeline` module, which offers a simpler and more opinionated, but less flexible, way to use the capabilities of the Graspologic library.

This tutorial follows a simple flow.

We're going to take an example edge list and load it into a [networkx Graph](https://networkx.org/documentation/stable/reference/classes/index.html) object, run an automatic layout step so we can see how and where similar nodes cluster together and connect to each other, identify some communities via one of the community detection algorithms, embed our graph, and finally load our embeddings into a nearest neighbor object to retrieve the most similar nodes for a small set of nodes.



## Graph




In [None]:
data_file = "graph_name.csv"

import networkx as nx

graph = nx.Graph()
with open(data_file, "r") as edges_io:
    next(edges_io)  # skips the header line in the file
    for line in edges_io:
        # the csv module is usually the better choice, but we know this works for our file and I want to keep it simple
        source, target, weight = line.strip().split(",")
        weight = float(weight)
        # the weight kwarg gets attached to the edge as an attribute in the edge's data dictionary.
        # this data dictionary can contain anything, but `weight` is a commonly used attribute that indicates
        # the strength of the connection
        graph.add_edge(source, target, weight=weight)
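
If the node names might contain commas or quotes, the `csv` module mentioned in the comment above handles them where a plain `split(",")` would not. The following is a minimal sketch, assuming the same three-column layout with a header row; `read_weighted_edges` and the in-memory sample data are hypothetical names made up for illustration.

```python
import csv
import io


def read_weighted_edges(edges_io):
    """Yield (source, target, weight) tuples from a CSV file object with a header row."""
    reader = csv.reader(edges_io)
    next(reader)  # skip the header line, as in the loop above
    for row in reader:
        if not row:
            continue  # ignore blank lines
        source, target, weight = row
        yield source, target, float(weight)


# Hypothetical in-memory stand-in for graph_name.csv; note the quoted name containing a comma
sample = io.StringIO('source,target,weight\na,b,1.5\n"Smith, J",c,2\n')
edges = list(read_weighted_edges(sample))
```

With networkx available, `graph.add_weighted_edges_from(edges)` adds all of these edges in one call and stores the third element of each tuple under the `weight` attribute.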


## Visualize

Automatic visualization of networks is a surprisingly difficult problem. Some of the most common layout methods either do not scale well with graph size or require a lot of per-network fine-tuning of arguments, which makes laying out a graph feel more like an art form than a science.

In an attempt to get reasonable (or at least acceptable) layouts in a completely automatic manner, we have built a process that first embeds the nodes of the network into a moderate 128-dimensional space using node2vec, and then uses [UMAP](https://umap-learn.readthedocs.io/en/latest/) to project these 128 dimensions down into 2 dimensions: x and y coordinates.

Then, to avoid too much node occlusion, we run a quad-tree based no-overlap algorithm over the 2-dimensional positions and display the layout. All of this comes in just two function calls.

In [None]:
import graspologic as gc

working_graph, positions = gc.layouts.layout_umap(graph)
# working_graph may differ from `graph`: this automatic layout mechanism tries to keep the graph under
# a default maximum of 10 million edges (by pruning the lowest-weight edges until it is around that size).
# It also doesn't work well when there is more than one connected component, so it only lays out the largest
# connected component in the graph.

gc.layouts.show_graph(working_graph, positions)
# in Jupyter or an interpreter we can just show the plot, but if we wanted to save it we could use
# `gc.layouts.save_graph` and save a png
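
The comments in the cell above mention two preprocessing steps: pruning the lowest-weight edges down to a maximum count, and keeping only the largest connected component. The sketch below illustrates both ideas in plain Python on a toy edge list. It is not Graspologic's implementation; `keep_heaviest_edges`, `largest_component`, and the toy data are hypothetical names used only to show why `working_graph` can end up smaller than `graph`.

```python
import heapq
from collections import defaultdict, deque


def keep_heaviest_edges(edges, max_edges):
    """Keep at most max_edges edges, preferring the highest weights."""
    return heapq.nlargest(max_edges, edges, key=lambda edge: edge[2])


def largest_component(edges):
    """Return the set of nodes in the largest connected component."""
    adjacency = defaultdict(set)
    for source, target, _ in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)

    seen, best = set(), set()
    for start in adjacency:
        if start in seen:
            continue
        # breadth-first search from an unvisited node collects one component
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


# Toy (source, target, weight) edges: a triangle plus a separate pair
toy_edges = [("a", "b", 5.0), ("b", "c", 4.0), ("c", "a", 0.1), ("x", "y", 3.0)]
pruned = keep_heaviest_edges(toy_edges, max_edges=3)  # drops the 0.1 edge
core_nodes = largest_component(pruned)  # the x-y pair is left out
```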

## Communities

In networks, the word `communities` describes groups of nodes that are more densely connected to each other and more loosely connected to the rest of the network. If you think about social networks, it should be pretty obvious that some groups of people have stronger connections to each other than to everyone else in the network, and it should also be obvious that there are some individuals that could conceivably be part of many communities.

In the next step, we're going to define a community structure to mean "every node in the network is in precisely one community". As noted above, there are circumstances in which a node could arguably belong to many communities; the [Leiden](https://arxiv.org/abs/1810.08473) algorithm we are going to use instead attempts to find the single best community for each node by maximizing [modularity](https://en.wikipedia.org/wiki/Modularity_(networks)). In other words, these communities should be _reasonable_ assertions, if not objectively optimal.
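
To make "maximizing modularity" a little more concrete, here is a small worked sketch that computes modularity for an unweighted, undirected toy graph using the standard per-community form Q = sum over communities c of (L_c / m) - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The `modularity` helper and the toy graph are assumptions for illustration; this is not how Graspologic or Leiden computes it internally.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Modularity of a node -> community assignment on an unweighted, undirected graph."""
    m = len(edges)
    internal = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)    # d_c: total degree of the nodes in community c
    for source, target in edges:
        degree[communities[source]] += 1
        degree[communities[target]] += 1
        if communities[source] == communities[target]:
            internal[communities[source]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single bridge edge (2, 3)
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # one community per triangle
lumped = {node: 0 for node in range(6)}        # everything in one community

q_split = modularity(toy_edges, split)
q_lumped = modularity(toy_edges, lumped)
```

Splitting the graph along the bridge scores higher than lumping every node together, which is the kind of comparison a modularity-maximizing algorithm is making as it searches for a partition.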

In [None]:
import graspologic as gc

# partitions is a dictionary of node_id -> partition/community id (int)
partitions = gc.partition.leiden(graph) # note that we're doing this to the entire graph, not the working_graph from above

# group the node ids by the community they were assigned to
community_lists = {}
for node_id, partition_id in partitions.items():
    community_list = community_lists.get(partition_id, [])
    community_list.append(node_id)
    community_lists[partition_id] = community_list

# number of nodes in each community
community_populations = {partition_id: len(community_list) for partition_id, community_list in community_lists.items()}
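
Grouping nodes by their community id, as the cell above sets out to do, can also be written with `collections.defaultdict` and `collections.Counter`. This is a sketch on a hand-written `partitions` dictionary (hypothetical node names and community ids) with the same node id -> community id shape.

```python
from collections import Counter, defaultdict

# Hypothetical output in the same shape: node id -> community id
partitions = {"alice": 0, "bob": 0, "carol": 1, "dave": 1, "erin": 1}

# community id -> list of node ids in that community
community_lists = defaultdict(list)
for node_id, partition_id in partitions.items():
    community_lists[partition_id].append(node_id)

# community id -> number of nodes in that community
community_populations = Counter(partitions.values())
largest_id, largest_size = community_populations.most_common(1)[0]
```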


## Embedding for Link Prediction

Our last task in this tutorial is to create an embedding using [node2vec](https://arxiv.org/abs/1607.00653), and then use the latent positions it generates to populate a [Nearest Neighbors](https://scikit-learn.org/stable/modules/neighbors.html) model.

An exceptionally short explanation of the premise is that if the latent position corresponding to a node in high-dimensional space is _relatively_ close to another latent position, then the nodes are likely similar. Probably.
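
As a rough illustration of that premise, the sketch below ranks nodes by Euclidean distance between made-up 2-dimensional latent positions. The positions, node names, and the `most_similar` helper are all hypothetical; real node2vec embeddings have many more dimensions, but the idea of "close means similar" is the same.

```python
import math

# Hypothetical 2-D latent positions; "a" and "b" sit close together, "c" is far away
latent_positions = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (3.0, 4.0)}


def most_similar(node, positions, k=1):
    """Return the k other nodes whose latent positions are closest to `node`."""
    distances = [
        (math.dist(positions[node], position), other)
        for other, position in positions.items()
        if other != node
    ]
    return [other for _, other in sorted(distances)[:k]]


closest_to_a = most_similar("a", latent_positions)
closest_to_c = most_similar("c", latent_positions, k=2)
```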

In [None]:
embedding, node_labels = gc.embed.node2vec_embed(graph)

from sklearn.neighbors import NearestNeighbors

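# ask for 11 neighbors: when we query with a node that is in the model, the node itself
# typically comes back as its own nearest neighbor, which leaves 10 other nodes
neighbors = NearestNeighbors(n_neighbors=11)
neighbors.fit(embedding)

node_embedding = embedding[134]  # an arbitrary row; index 134 assumes the graph has at least 135 nodes

nearest_distances, nearest_indices = neighbors.kneighbors([node_embedding])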
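
The indices returned by `kneighbors` are row numbers in the embedding, not node ids. Assuming `node_labels` is ordered the same way as the embedding rows, they can be mapped back to node ids along these lines. `label_neighbors` is a hypothetical helper, and the small lists below only stand in for one row of real `kneighbors` output.

```python
def label_neighbors(query_index, indices_row, distances_row, node_labels):
    """Pair each neighbor's label with its distance, skipping the query node itself."""
    return [
        (node_labels[index], distance)
        for index, distance in zip(indices_row, distances_row)
        if index != query_index
    ]


# Hypothetical values shaped like a single row of kneighbors output for query row 2
node_labels = ["a", "b", "c", "d"]
indices_row = [2, 0, 3]
distances_row = [0.0, 0.4, 0.9]

similar = label_neighbors(2, indices_row, distances_row, node_labels)
```

With the variables from the cell above, the equivalent call would be `label_neighbors(134, nearest_indices[0], nearest_distances[0], node_labels)`.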



## Epilogue

This tutorial is **not** intended to be the optimal path through graph machine learning. At every step of the way, there are plausible, reasonable, and arguably superior routes you can or should take; some dependent upon the data you have, some upon the task you're trying to solve, some upon the system resources you have for the data you have... it's non-trivial work ahead of you.

However, it's my hope that this tutorial has helped you to start the process of exploring GraphML with Graspologic.