# GFQL CPU, GPU Benchmark

This notebook examines GFQL property graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs. The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.

The benchmark does not examine bigger-than-memory and distributed scenarios. The provided results here are from running on a free Google Colab T4 runtime, with a 2.2GHz Intel CPU (12 GB CPU RAM) and T4 Nvidia GPU (16 GB GPU RAM).

## Data
From [SNAP](https://snap.stanford.edu/data/)

| Network | Nodes     | Edges        |
|-------------|-----------|--------------|
| [**Facebook**](#fb)| 4,039     | 88,234       |
| [**Twitter**](#tw) | 81,306    | 2,420,766    |
| [**GPlus**](#gpl)   | 107,614   | 30,494,866   |
| [**Orkut**](#ork)   | 3,072,441 | 117,185,082  |

## Results

Definitions:

* GTEPS: Giga (billion) edges traversed per second

* T edges / \$: Estimated trillion edges traversed for 1\$ USD based on observed GTEPS and a 3yr AWS reservation (as of 12/2023)

Tasks:

1. `chain()` - includes complex pre/post processing

  **Task**: `g.chain([n({'id': some_id}), e_forward(hops=some_n)])`


| **Dataset** | Max GPU Speedup      | CPU GTEPS   | GPU GTEPS   | T CPU edges / \$ (t3.l) | T GPU edges / \$ (g4dn.xl) |
|-------------|--------------|-------------|-------------|----------------------------|--------------------------------|
| [**Facebook**](#fb)| 1.1X  | 0.66 | 0.61 | 65.7                | 10.4                    |
| [**Twitter**](#tw) | 17.4X   | 0.17 | 2.81 | 16.7                | 48.1                    |
| [**GPlus**](#gpl)   | 43.8X  | 0.09 | 2.87 | 8.5                | 49.2                    |
| [**Orkut**](#ork)   | N/A            | N/A         | 12.15 | N/A                        | 208.3                    |
| **AVG** | 20.7X | 0.30 | 4.61 | 30.3 | 79.0
| **MAX** | 43.8X | 0.66 | 12.15 | 65.7 | 208.3


2. `hop()` - core property search primitive similar to BFS

  **Task**: `g.hop(nodes=[some_id], direction='forward', hops=some_n)`


| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \$ (t3.l) | T GPU edges / \$ (g4dn.xl) |
|-------------|-------------|-----------|-----------|--------------------|--------------------------------|
| [**Facebook**](#fb)| 3X          | 0.47      | 1.47     | 47.0        | 25.2                    |
| [**Twitter**](#tw) | 42X         | 0.50      | 10.51      | 50.2        | 180.2                    |
| [**GPlus**](#gpl)   | 21X         | 0.26      | 4.11       | 26.2        | 70.4                    |
| [**Orkut**](#ork)   | N/A         | N/A       | 41.50     | N/A                | 711.4                    |
| **AVG** | 22X | 0.41 | 14.4 | 41.1 | 246.8
| **MAX** | 42X | 0.50 | 41.50 | 50.2 | 711.4


## Optional: GPU setup - Google Colab

In [1]:
# Report GPU used when GPU benchmarking
! nvidia-smi

Tue Jul  9 13:29:05 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   41C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [2]:
import cudf
cudf.__version__

'24.04.01'

# 1. Install & configure

In [3]:
#! pip install graphistry[igraph]

!pip install -q igraph
!pip install -q graphistry


[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.1/3.1 MB[0m [31m13.1 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m250.5/250.5 kB[0m [31m4.7 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m332.3/332.3 kB[0m [31m9.0 MB/s[0m eta [36m0:00:00[0m
[?25h

## Imports

In [4]:
import pandas as pd
import numpy as np
import graphistry, time

from graphistry import (

    # graph operators
    n, e_undirected, e_forward, e_reverse,

    # attribute predicates
    is_in, ge, startswith, contains, match as match_re
)
graphistry.__version__

'0.33.9'

In [5]:
#work around google colab shell encoding bugs

import locale
locale.getpreferredencoding = lambda: "UTF-8"

# 2. Perf benchmarks

<a name="fb"></a>
### Facebook: 88K edges

In [6]:
df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/facebook_combined.txt', sep=' ', names=['s', 'd'])
print(df.shape)
df.head(5)

(88234, 2)


Unnamed: 0,s,d
0,0,1
1,0,2
2,0,3
3,0,4
4,0,5


In [8]:
fg = graphistry.edges(df, 's', 'd').materialize_nodes()
print(fg._nodes.shape, fg._edges.shape)
fg._nodes.head(5)

(4039, 1) (88234, 2)


Unnamed: 0,id
0,0
1,1
2,2
3,3
4,4


with 2 and 5 hop `chain` comparison we see a slight/negligable speedup enabled by setting g. to `cudf`

In [None]:
results_df = pd.DataFrame(columns=['hops', 'CPU n_notation time (s)', 'GPU n_notation time (s)', 'n_notation speedup',
                                   'CPU source_node_match time (s)', 'GPU source_node_match time (s)', 'source_node_match speedup'])


for n_hop in [2,5]:
    start0 = time.time()
    for i in range(100):
        fg2 = fg.chain([n({'id': 0}), e_forward(hops=n_hop)])  # using n notation
    mid0 = time.time()
    for i in range(100):
        fg2 = fg.chain([e_forward(source_node_match={'id': 0}, hops=n_hop)])  # using source_node_match in e_forward
    end0 = time.time()
    T0 = mid0-start0
    T1 = end0-mid0
    fg_gdf = fg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))
    start1 = time.time()
    for i in range(100):
        fg2 = fg_gdf.chain([n({'id': 0}), e_forward(hops=n_hop)])
    mid1 = time.time()
    for i in range(100):
        fg2 = fg_gdf.chain([e_forward(source_node_match={'id': 0}, hops=n_hop)])
    end1 = time.time()

    del fg_gdf
    del fg2
    T2 = mid1-start1
    T3 = end1-mid1

    new_row = pd.DataFrame({
        'hops': [n_hop],
        'CPU n_notation time (s)': [np.round(T0, 4)],
        'GPU n_notation time (s)': [np.round(T2, 4)],
        'n_notation speedup': [np.round(T0 / T2, 4)],
        'CPU source_node_match time (s)': [np.round(T1, 4)],
        'GPU source_node_match time (s)': [np.round(T3, 4)],
        'source_node_match speedup': [np.round(T1 / T3, 4)],
    })

    results_df = pd.concat([results_df, new_row], ignore_index=True)

(results_df)

In [10]:
results_df.T

Unnamed: 0,0,1
hops,2.0,5.0
CPU n_notation time (s),11.8076,25.4098
GPU n_notation time (s),10.3238,14.4829
n_notation speedup,1.1437,1.7545
CPU source_node_match time (s),12.0969,10.2662
GPU source_node_match time (s),11.2681,11.199
source_node_match speedup,1.0736,0.9167


and with simple 2 and 5 hop `hop` comparison we see a 2x speedup enabled by setting g. to `cudf`

In [None]:
results_df = pd.DataFrame(columns=['hops', 'CPU hop time (s)', 'GPU hop time (s)', 'n_notation speedup'])



for n_hop in [2,5]:
    start_nodes = pd.DataFrame({fg._node: [0]})
    start0 = time.time()
    for i in range(100):
        fg2 = fg.hop(
            nodes=start_nodes,
            direction='forward',
            hops=n_hop)
    end0 = time.time()
    T0 = end0-start0
    start_nodes = cudf.DataFrame({fg._node: [0]})
    fg_gdf = fg.nodes(cudf.from_pandas(fg._nodes)).edges(cudf.from_pandas(fg._edges))
    start1 = time.time()
    for i in range(100):
        fg2 = fg_gdf.hop(
            nodes=start_nodes,
            direction='forward',
            hops=n_hop)
    end1 = time.time()

    del fg_gdf
    del fg2
    T1 = end1-start1

    new_row = pd.DataFrame({
        'hops': [n_hop],
        'CPU hop time (s)': [np.round(T0, 4)],
        'GPU hop time (s)': [np.round(T1, 4)],
        'n_notation speedup': [np.round(T0 / T1, 4)]
    })

    results_df = pd.concat([results_df, new_row], ignore_index=True)

# print(results_df)

In [13]:
results_df.T

Unnamed: 0,0,1
hops,2.0,5.0
CPU hop time (s),5.8614,10.1756
GPU hop time (s),2.3729,5.4458
n_notation speedup,2.4701,1.8685


<a name="tw"></a>
## Twitter

- edges: 2420766
- nodes: 81306

In [15]:
! wget 'https://snap.stanford.edu/data/twitter_combined.txt.gz'
#! curl -L 'https://snap.stanford.edu/data/twitter_combined.txt.gz' -o twitter_combined.txt.gz

--2024-07-09 13:36:53--  https://snap.stanford.edu/data/twitter_combined.txt.gz
Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80
Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10621918 (10M) [application/x-gzip]
Saving to: ‘twitter_combined.txt.gz’


2024-07-09 13:36:54 (19.6 MB/s) - ‘twitter_combined.txt.gz’ saved [10621918/10621918]



In [16]:
! gunzip twitter_combined.txt.gz

In [17]:
! head -n 5 twitter_combined.txt

214328887 34428380
17116707 28465635
380580781 18996905
221036078 153460275
107830991 17868918


In [18]:
te_df = pd.read_csv('twitter_combined.txt', sep=' ', names=['s', 'd'])
te_df.shape

(2420766, 2)

In [19]:
import graphistry

In [20]:
g = graphistry.edges(te_df, 's', 'd').materialize_nodes()
g._nodes.shape

(81306, 1)

on the twitter data, simpler `chain` operations over several different hops -- **10-20x** *italicized text* speed increases

In [21]:
results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])


for n_hop in [1,2,8]:
    start_nodes = pd.DataFrame({fg._node: [0]})
    start0 = time.time()
    for i in range(10):
        g2 = g.chain([n({'id': 17116707}), e_forward(hops=n_hop)])
    end0 = time.time()
    T0 = end0-start0
    g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))
    start1 = time.time()
    for i in range(10):
        out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=n_hop)])._nodes
    end1 = time.time()

    del g_gdf
    del out
    T1 = end1-start1

    new_row = pd.DataFrame({
        'hops': [n_hop],
        'CPU hop chain time (s)': [np.round(T0, 4)],
        'GPU hop chain time (s)': [np.round(T1, 4)],
        'n_notation speedup': [np.round(T0 / T1, 4)]
    })


    results_df = pd.concat([results_df, new_row], ignore_index=True)

results_df.T

Unnamed: 0,0,1,2
hops,1.0,2.0,8.0
CPU hop chain time (s),19.3802,17.21,84.5977
GPU hop chain time (s),0.7395,1.5332,4.4011
n_notation speedup,26.2058,11.2246,19.2218


and similarly for these `hop` operations -- **10-30x** speed increases

<a name="gpl"></a>
## GPlus

- edges: 30494866
- nodes: 107614

In [22]:
results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])


for n_hop in [1,2,8]:
    start_nodes = pd.DataFrame({g._node: [17116707]})
    start0 = time.time()
    for i in range(10):
      g2 = g.hop(
          nodes=start_nodes,
          direction='forward',
          hops=n_hop)
    end0 = time.time()
    T0 = end0-start0
    start_nodes = cudf.DataFrame({g._node: [17116707]})
    g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))
    start1 = time.time()
    for i in range(10):
        g2 = g_gdf.hop(
            nodes=start_nodes,
            direction='forward',
            hops=5)
    end1 = time.time()

    del start_nodes
    del g_gdf
    del g2
    T1 = end1-start1

    new_row = pd.DataFrame({
        'hops': [n_hop],
        'CPU hop chain time (s)': [np.round(T0, 4)],
        'GPU hop chain time (s)': [np.round(T1, 4)],
        'n_notation speedup': [np.round(T0 / T1, 4)]
    })

    results_df = pd.concat([results_df, new_row], ignore_index=True)

(results_df.T)

Unnamed: 0,0,1,2
hops,1.0,2.0,8.0
CPU hop chain time (s),18.8525,12.5991,43.39
GPU hop chain time (s),1.0538,1.0413,1.4334
n_notation speedup,17.8901,12.0998,30.2698


In [23]:
! wget https://snap.stanford.edu/data/gplus_combined.txt.gz

--2024-07-09 13:41:10--  https://snap.stanford.edu/data/gplus_combined.txt.gz
Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80
Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 398930514 (380M) [application/x-gzip]
Saving to: ‘gplus_combined.txt.gz’


2024-07-09 13:41:19 (39.7 MB/s) - ‘gplus_combined.txt.gz’ saved [398930514/398930514]



In [24]:
! gunzip gplus_combined.txt.gz

In [25]:
%%time
ge_df = pd.read_csv('gplus_combined.txt', sep=' ', names=['s', 'd'])
print(ge_df.shape)
ge_df.head(5)

(30494866, 2)
CPU times: user 16.3 s, sys: 1.31 s, total: 17.6 s
Wall time: 19.2 s


Unnamed: 0,s,d
0,116374117927631468606,101765416973555767821
1,112188647432305746617,107727150903234299458
2,116719211656774388392,100432456209427807893
3,117421021456205115327,101096322838605097368
4,116407635616074189669,113556266482860931616


In [26]:
%%time
gg = graphistry.edges(ge_df, 's', 'd').materialize_nodes()
gg = graphistry.edges(ge_df, 's', 'd').nodes(gg._nodes, 'id')
print(gg._edges.shape, gg._nodes.shape)
gg._nodes.head(5)

(30494866, 2) (107614, 1)
CPU times: user 5.14 s, sys: 1.08 s, total: 6.22 s
Wall time: 6.27 s


Unnamed: 0,id
0,116374117927631468606
1,112188647432305746617
2,116719211656774388392
3,117421021456205115327
4,116407635616074189669


In [27]:
%%time
gg.chain([ n({'id': '116374117927631468606'})])._nodes

CPU times: user 676 ms, sys: 400 ms, total: 1.08 s
Wall time: 1.11 s


Unnamed: 0,id
0,116374117927631468606


on the GPlus data, simpler `chain` operations over several different hops -- **100-200x** speed increases

In [28]:
results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])


for n_hop in [1,2,3,4,5]:
    start_nodes = pd.DataFrame({fg._node: [0]})
    start0 = time.time()
    out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])._nodes
    end0 = time.time()
    T0 = end0-start0
    gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))
    start1 = time.time()
    out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])
    end1 = time.time()

    del gg_gdf
    del out
    T1 = end1-start1
    # print('\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\nspeedup:', np.round(T0/T1,4))

    new_row = pd.DataFrame({
        'hops': [n_hop],
        'CPU hop chain time (s)': [np.round(T0, 4)],
        'GPU hop chain time (s)': [np.round(T1, 4)],
        'n_notation speedup': [np.round(T0 / T1, 4)]
    })

    results_df = pd.concat([results_df, new_row], ignore_index=True)

(results_df.T)

Unnamed: 0,0,1,2,3,4
hops,1.0,2.0,3.0,4.0,5.0
CPU hop chain time (s),33.7597,50.877,228.473,291.1332,327.8891
GPU hop chain time (s),0.3082,0.6515,2.9645,4.1146,4.7598
n_notation speedup,109.5356,78.0912,77.0694,70.7561,68.8877


and similarly for these hop operations -- **100x** speed increases

In [29]:
results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])


for n_hop in [1,2,3,4,5]:
    start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})
    start0 = time.time()
    for i in range(1):
      g2 = gg.hop(
          nodes=start_nodes,
          direction='forward',
          hops=n_hop)
    end0 = time.time()
    T0 = end0-start0
    start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})
    gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))
    start1 = time.time()
    for i in range(1):
      g2 = gg_gdf.hop(
          nodes=start_nodes,
          direction='forward',
          hops=n_hop)
    end1 = time.time()

    del start_nodes
    del gg_gdf
    del g2
    T1 = end1-start1

    new_row = pd.DataFrame({
        'hops': [n_hop],
        'CPU hop chain time (s)': [np.round(T0, 4)],
        'GPU hop chain time (s)': [np.round(T1, 4)],
        'n_notation speedup': [np.round(T0 / T1, 4)]
    })

    results_df = pd.concat([results_df, new_row], ignore_index=True)

(results_df.T)

Unnamed: 0,0,1,2,3,4
hops,1.0,2.0,3.0,4.0,5.0
CPU hop chain time (s),19.6594,33.2538,64.8384,98.9693,147.4526
GPU hop chain time (s),0.116,0.2583,0.8252,1.3544,1.9375
n_notation speedup,169.4189,128.7532,78.5772,73.071,76.103
