# Accelerating NetworkX via `nx-cugraph` Backend

The easiest way to enable `nx-cugraph`:
- set the `NX_CUGRAPH_AUTOCONFIG` environment variable to `True` before importing NetworkX

That's it!

Zero-code-change acceleration on an NVIDIA GPU is ready to go.
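
In a plain Python script, where the `%env` notebook magic is unavailable, the same setting can be applied through `os.environ`. This is a minimal sketch; the only requirement it relies on is the one stated above, that the variable is set before NetworkX is imported.

```python
import os

# Set the variable first, as required above: before NetworkX is imported.
os.environ["NX_CUGRAPH_AUTOCONFIG"] = "True"

# Only now import NetworkX (guarded so the sketch also runs where NetworkX is not installed).
try:
    import networkx as nx
except ImportError:
    nx = None
```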

In [None]:
# If you're curious (and patient), skip this cell to run the notebook with pure NetworkX
%env NX_CUGRAPH_AUTOCONFIG=True

In [None]:
import networkx as nx
import numpy as np
import pandas as pd
nx.config.warnings_to_ignore.add("cache")  # Ignore caching warnings

Display info about available GPUs and current CUDA version:

In [None]:
!nvidia-smi

### Let's explore the Pokec social network dataset
Download and unzip the data from the [SNAP repository](https://snap.stanford.edu/data/soc-Pokec.html).

> Pokec is the most popular Slovak on-line social network. These datasets
are anonymized and contains relationships and user profile data of the
whole network. Profile data are in Slovak language. Friendships in the
Pokec network are oriented. Datasets were crawled during MAY 25-27 2012.
>
> Author: Lubos Takac, lubos.takac@gmail.com

In [None]:
%%bash
if [[ ! -f soc-pokec-relationships.txt ]]; then
  wget -nc -q "https://snap.stanford.edu/data/soc-pokec-relationships.txt.gz"
  wget -nc -q "https://snap.stanford.edu/data/soc-pokec-profiles.txt.gz"
  wget -nc -q "https://snap.stanford.edu/data/soc-pokec-readme.txt"
  gunzip *.gz
else
  echo "pokec dataset already downloaded :)"
fi

In [None]:
# Show size of files
!du -csh soc-pokec*

In [None]:
# Uncomment to look at the README
# !cat soc-pokec-readme.txt

In [None]:
edgelist_filepath = "soc-pokec-relationships.txt"
profiles_filepath = "soc-pokec-profiles.txt"
readme_filepath = "soc-pokec-readme.txt"

Load the profile data; each row is a user's Pokec profile. In our graph, the nodes represent profiles. Try using this data to filter nodes based on user properties.

In [None]:
# Load node data. List of columns provided by README
col_names = ["user_id","public","completion_percentage","gender","region","last_login","registration","AGE","body","I_am_working_in_field","spoken_languages","hobbies","I_most_enjoy_good_food","pets","body_type","my_eyesight","eye_color","hair_color","hair_type","completed_level_of_education","favourite_color","relation_to_smoking","relation_to_alcohol","sign_in_zodiac","on_pokec_i_am_looking_for","love_is_for_me","relation_to_casual_sex","my_partner_should_be","marital_status","children","relation_to_children","I_like_movies","I_like_watching_movie","I_like_music","I_mostly_like_listening_to_music","the_idea_of_good_evening","I_like_specialties_from_kitchen","fun","I_am_going_to_concerts","my_active_sports","my_passive_sports","profession","I_like_books","life_style","music","cars","politics","relationships","art_culture","hobbies_interests","science_technologies","computers_internet","education","sport","movies","travelling","health","companies_brands","more"]
profiles_df = pd.read_csv(
    profiles_filepath,
    sep="\t",
    names=col_names,
    index_col=False
)
profiles_df.head()
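
As one way to filter on user properties, the sketch below builds a boolean mask over a few profile columns. The rows are made up for illustration; only the column names (`user_id`, `completion_percentage`, `AGE`) come from `col_names` above, and the thresholds are arbitrary.

```python
import pandas as pd

# Toy stand-in for profiles_df: made-up rows, real column names.
toy_profiles = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "completion_percentage": [14, 62, 90, 75],
        "AGE": [26, 0, 31, 45],
    }
)

# Keep users aged 25 to 35 (inclusive) whose profile is at least 50% complete.
mask = toy_profiles["AGE"].between(25, 35) & (
    toy_profiles["completion_percentage"] >= 50
)
selected_ids = toy_profiles.loc[mask, "user_id"].tolist()
print(selected_ids)  # [3]
```

Since the nodes of the graph are `user_id` values, a list like `selected_ids` could later be used to pick out the matching nodes.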

In [None]:
# We're not yet using this data, so delete for now to save memory
del profiles_df

Load the edge data: each row holds the `user_id` of a source node and of a target node. An edge represents a friendship between two users.

In [None]:
!head soc-pokec-relationships.txt

In [None]:
relationships_df = pd.read_csv(
    edgelist_filepath,
    sep="\t",
    names=["src", "dst"]
)
relationships_df.shape

## Create Graphs on GPU

The very first use of the GPU may take a second to load and initialize.

In [None]:
%%time
nx.empty_graph(backend="cugraph")

But don't worry: using the GPU should be much quicker once it's warmed up!

In [None]:
%%time
nx.empty_graph(backend="cugraph")

### Use `nx.from_pandas_edgelist` to create a graph from edge data

If the `NX_CUGRAPH_AUTOCONFIG` env var was set at the beginning,
this will automatically call the nx-cugraph backend and return a graph on the GPU.

**Heads up: this runs for more than a minute with pure NetworkX**

In [None]:
%%time
G = nx.from_pandas_edgelist(
    relationships_df,
    source="src",
    target="dst",
    edge_attr=None,
    # create_using=nx.DiGraph,  # The original dataset is directed
    create_using=nx.Graph,  # Alternative that symmetrizes edges
)
type(G)
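
Because `create_using=nx.Graph` treats edges as undirected, reciprocal rows in the edge list end up as a single edge, so the graph can have fewer edges than `relationships_df` has rows. The plain-Python sketch below illustrates that effect on made-up rows; it is not how NetworkX or nx-cugraph build the graph internally.

```python
# Made-up directed friendship rows as (src, dst) pairs.
rows = [(1, 2), (2, 1), (1, 3), (3, 4)]

# Directed view: every distinct (src, dst) pair is its own edge.
directed_edges = set(rows)

# Undirected view: (1, 2) and (2, 1) describe the same friendship.
undirected_edges = {frozenset(pair) for pair in rows}

print(len(directed_edges), len(undirected_edges))  # 4 3
```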

## Now let's run some common graph algorithms and visualize the results

Visualizing summary statistics and metrics is a simple way to begin to understand a dataset.

We'll begin by plotting histograms of the results of common algorithms using `bokeh`. Each result is shown as two plots:
- Left plot is normal y scale
- Right plot is log y scale

In [None]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.layouts import row

output_notebook()

In [None]:
def plot_hist(result, title=""):
  """Plot the histogram of results; the right plot is logscale y"""
  y, x = np.histogram(list(result.values()), bins=400)
  x = x[:-1]
  p = figure(width=400, height=400, title=title)
  p.scatter(x=x, y=y)
  p_log = figure(width=400, height=400, y_axis_type="log", title=f"{title} (log y)")
  p_log.scatter(x=x, y=y)
  show(row(p, p_log))
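
The `x = x[:-1]` step in `plot_hist` exists because `np.histogram` returns bin edges, not bin centers. A small sketch on made-up values shows the shapes involved:

```python
import numpy as np

# Made-up values standing in for an algorithm's per-node results.
values = [0.1, 0.2, 0.2, 0.9]

counts, edges = np.histogram(values, bins=4)

# `edges` holds the bin boundaries, so it is one element longer than `counts`.
print(len(counts), len(edges))  # 4 5

# Dropping the last edge pairs each count with the left edge of its bin,
# which is what `x = x[:-1]` does in plot_hist.
left_edges = edges[:-1]
print(counts.tolist())  # [3, 0, 0, 1]
```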

In [None]:
def plot_full(result, title=""):
  """Plot the values, sorted; the right plot is logscale y"""
  x = list(range(len(result)))
  y = sorted(result.values())
  p = figure(width=400, height=400, title=title)
  p.scatter(x=x, y=y)
  p_log = figure(width=400, height=400, y_axis_type="log", title=f"{title} (log y)")
  p_log.scatter(x=x, y=y)
  show(row(p, p_log))

In [None]:
%%time
nx.is_connected(G)

In [None]:
%%time
# Even this simple measure may take a second or two with pure NetworkX
dc = nx.degree_centrality(G)

In [None]:
plot_hist(dc, "degree centrality")

In [None]:
%%time
# May take a couple of minutes with pure NetworkX
t = nx.triangles(G)

In [None]:
plot_hist(t, "triangles")

In [None]:
%%time
# May take a few minutes with pure NetworkX
pr = nx.pagerank(G)

In [None]:
plot_hist(pr, "pagerank")

In [None]:
%%time
# May take a few minutes with pure NetworkX
cn = nx.core_number(G)

In [None]:
plot_hist(cn, "core_number")

In [None]:
%%time
# Very slow with pure NetworkX; perhaps try with a smaller k
bc = nx.betweenness_centrality(G, k=50)

In [None]:
plot_hist(bc, "betweenness centrality")

## Backend-only Functions
Besides improved performance, another benefit backends provide is the ability to add functionality to NetworkX that is not present in the default implementation.

NetworkX 3.5 adds the `leiden_communities` function but does not itself provide an implementation. This lets backends implement Leiden community detection behind a common function signature, so when other backends, or even NetworkX itself, provide an implementation, users can switch to it without changing their code.

### Leiden community detection
Let's take a look at the communities in the Pokec social network dataset using `leiden_communities`.

In [None]:
from networkx.algorithms.community import leiden_communities

In [None]:
%%time
leiden_res = leiden_communities(G, seed=42, backend="cugraph")

`leiden_communities` returns a list of sets, where each set contains the node IDs making up a community in the graph.

In [None]:
import statistics

sizes = [len(s) for s in leiden_res]

print(f"Total number of extracted communities: {len(leiden_res)}")
print(f"Largest community: {max(sizes)}")
print(f"Smallest community: {min(sizes)}")
print(f"Median community size: {statistics.median(sizes)}")
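
Since the result is a list of sets of node IDs, a common follow-up is to invert it into a node-to-community lookup. The sketch below does this on a made-up list of sets (`toy_communities` is a hypothetical stand-in for `leiden_res`), using each community's position in the list as its ID.

```python
# Toy stand-in for leiden_res: a list of sets of node IDs (made-up values).
toy_communities = [{1, 2, 3}, {4, 5}, {6}]

# Map each node ID to the index of the community that contains it.
node_to_community = {
    node: idx
    for idx, members in enumerate(toy_communities)
    for node in members
}

print(node_to_community[4])  # 1
```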

## What to explore next?

The [Facebook Network Analysis](https://networkx.org/nx-guides/content/exploratory_notebooks/facebook_notebook.html)
example in [nx-guides](https://networkx.org/nx-guides) goes much more in depth
and is a good tour of NetworkX analysis.

---

### Information on the Pokec Social Network dataset used in this notebook

**Authors:** Lubos Takac and Michal Zabovsky  
**Title:** SNAP Datasets, Stanford Large Network Dataset Collection  
**URL:** [http://snap.stanford.edu/data](http://snap.stanford.edu/data)  
**Date:** May 2012