# Deep Learning Toolkit for Splunk - Graph Algorithms with Rapids cuGraph

This notebook contains examples for the Louvain community detection algorithm available in RAPIDS cuGraph.

Note: By default, every time you save this notebook, the cells are exported into a Python module, which is then invoked by Splunk MLTK commands like <code> | fit ... | apply ... | summary </code>. Please read the Model Development Guide in the Deep Learning Toolkit app for more information.

## Stage 0 - import libraries
In stage 0 we define all the library imports that the subsequent code depends on, along with any global constants.

In [1]:
# this definition exposes all python module imports that should be available in all subsequent commands
import json
import numpy as np
import pandas as pd
import cugraph
import cudf
# ...
# global constants
MODEL_DIRECTORY = "/srv/app/model/data/"
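If you want the exported module to be importable where RAPIDS is not installed, for example to syntax-check it on a CPU-only machine, the GPU imports could be guarded as sketched here. This is a hypothetical pattern, not something the toolkit requires; `GPU_AVAILABLE` is an illustrative name, and `apply()` and `summary()` still need `cudf` and `cugraph` to work.

```python
# Hypothetical variant of the import cell: tolerate a missing RAPIDS stack
try:
    import cudf
    import cugraph
    GPU_AVAILABLE = True
except ImportError:
    # RAPIDS not installed (e.g. a CPU-only container); apply() cannot run here
    cudf = None
    cugraph = None
    GPU_AVAILABLE = False
```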

In [2]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
print("numpy version: " + np.__version__)
print("pandas version: " + pd.__version__)
print("cugraph version: " + cugraph.__version__)
print("cudf version: " + cudf.__version__)

# Print out GPU Name
cudf.utils.cudautils.cuda.detect()

numpy version: 1.18.1
pandas version: 0.25.3
cugraph version: 0.13.0+0.gac36e8c.dirty
cudf version: 0.13.0
Found 1 CUDA devices
id 0    b'Tesla V100-SXM2-16GB'                              [SUPPORTED]
                      compute capability: 7.0
                           pci device id: 30
                              pci bus id: 0
Summary:
	1/1 devices are supported


True

## Stage 1 - get a data sample from Splunk
In Splunk, run a search to pipe a dataset into your notebook environment. Note: the mode=stage option of the | fit command is what does this.

| inputlookup bitcoin_transactions.csv<br>
| rename user_id_from as src user_id_to as dest<br>
| fit MLTKContainer mode=stage algo=graph_algo_louvain from src dest into app:bitcoin_graph_louvain as graph<br>

After you run this search, your data sample is available as a CSV file inside the container, where you can use it to develop your model. The name is taken from the into keyword ("bitcoin_graph_louvain" in the example above) or set to "default" if no into keyword is present. This step is intended to work with a subset of your data while you build your custom model.

In [3]:
# this cell is not executed from MLTK and should only be used for staging data into the notebook environment
def stage(name):
    with open("data/"+name+".csv", 'r') as f:
        df = pd.read_csv(f)
    with open("data/"+name+".json", 'r') as f:
        param = json.load(f)
    return df, param
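The `stage()` helper reads two files per staged dataset: `data/<name>.csv` with the sample and `data/<name>.json` with the parameters. A minimal variant that takes the directory as an argument, so it can also be pointed at a scratch folder, might look like this; `stage_from` is a hypothetical name, not part of the toolkit.

```python
import json
import os

import pandas as pd


def stage_from(directory, name):
    """Load a staged sample: <directory>/<name>.csv and <directory>/<name>.json."""
    df = pd.read_csv(os.path.join(directory, name + ".csv"))
    with open(os.path.join(directory, name + ".json")) as f:
        param = json.load(f)
    return df, param
```

Calling `stage_from("data", "bitcoin_graph_louvain")` should return the same pair as `stage("bitcoin_graph_louvain")`.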

In [4]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
df, param = stage("bitcoin_graph_louvain")
print(df[0:1])
print(param)

   src  dest
0    2     2
{'options': {'params': {'mode': 'stage', 'algo': 'graph_algo_louvain'}, 'feature_variables': ['src', 'dest'], 'args': ['src', 'dest'], 'model_name': 'bitcoin_graph_louvain', 'output_name': 'graph', 'algo_name': 'MLTKContainer', 'mlspl_limits': {'handle_new_cat': 'default', 'max_distinct_cat_values': '1000', 'max_distinct_cat_values_for_classifiers': '1000', 'max_distinct_cat_values_for_scoring': '1000', 'max_fit_time': '6000', 'max_inputs': '1000000000', 'max_memory_usage_mb': '16000', 'max_model_size_mb': '1500', 'max_score_time': '6000', 'streaming_apply': 'false', 'use_sampling': 'true'}, 'kfold_cv': None}, 'feature_variables': ['src', 'dest']}
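The staged parameters show where extra arguments end up: the options of the | fit command appear under `param['options']['params']`, and their values are strings (for example `'mode': 'stage'`). The `apply()` cell reads an optional `max_iter` this way. A small helper, hypothetically named `get_int_param`, keeps that lookup and the string-to-int conversion in one place:

```python
def get_int_param(param, key, default):
    """Read an optional integer argument from param['options']['params'].

    Values passed on the | fit command line arrive as strings, so they are
    converted with int(); a missing key falls back to `default`.
    """
    params = param.get("options", {}).get("params", {})
    if key in params:
        return int(params[key])
    return default
```

For a search ending in `... from src dest max_iter=50 into app:bitcoin_graph_louvain`, `get_int_param(param, "max_iter", 100)` would be expected to return `50`.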


## Stage 2 - create and initialize a model

In [5]:
# initialize your model
# available inputs: data and parameters
# returns the model object which will be used as a reference to call fit, apply and summary subsequently
def init(df,param):
    model = {}
    return model

In [6]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
model = init(df,param)

## Stage 3 - fit the model

In [7]:
# train your model
# returns a fit info json object and may modify the model object
def fit(model,df,param):
    # Louvain needs no training: communities are computed in apply(), so return an empty fit info object
    info = {}
    return info

In [10]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
fit(model,df,param)

{}

## Stage 4 - apply the model

In [11]:
src_dest_name = param['feature_variables']

In [12]:
# apply your model
# returns the calculated results
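def apply(model,df,param):
    # source and destination column names, as given in | fit ... from src dest
    src_dest_name = param['feature_variables']
    src_name, dest_name = src_dest_name[0], src_dest_name[1]
    dfg = df[src_dest_name]
    # copy the edge list to the GPU
    gdf = cudf.DataFrame.from_pandas(dfg)

    # create graph; renumber=True maps arbitrary vertex ids to a contiguous range
    G = cugraph.Graph()
    G.from_cudf_edgelist(gdf, source=src_name, destination=dest_name, renumber=True)

    # optional SPL argument, e.g. | fit MLTKContainer ... max_iter=50 ...
    max_iter = 100
    if 'max_iter' in param['options']['params']:
        max_iter = int(param['options']['params']['max_iter'])

    # cuGraph Louvain call: returns a (vertex, partition) DataFrame and the modularity score
    dfr, mod = cugraph.louvain(G, max_iter=max_iter)
    dfr = dfr.to_pandas().rename(columns={"vertex": src_name})
    # attach the partition of each source vertex
    df = df.join(dfr.set_index(src_name), on=src_name)
    df = df.rename(columns={"partition": src_name+"_partition"})
    # attach the partition of each destination vertex
    dfr = dfr.rename(columns={src_name: dest_name})
    df = df.join(dfr.set_index(dest_name), on=dest_name)
    df = df.rename(columns={"partition": dest_name+"_partition"})
    model['louvain_modularity'] = mod
    return df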
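The two join/rename pairs at the end of `apply()` look up each endpoint's community in the Louvain output. The same mapping can be sketched with pandas alone, which is handy for checking the logic without a GPU. `attach_partitions` is a hypothetical helper, and it assumes the partition table has one row per vertex with `vertex` and `partition` columns, as `apply()` expects from `cugraph.louvain`.

```python
import pandas as pd


def attach_partitions(edges, parts, src="src", dest="dest"):
    """Add <src>_partition and <dest>_partition columns to an edge list.

    parts: DataFrame with one row per vertex and columns 'vertex', 'partition'.
    Vertices missing from `parts` get NaN, as with the left joins in apply().
    """
    # vertex id -> community id (Series.map requires unique vertex ids)
    lookup = parts.set_index("vertex")["partition"]
    out = edges.copy()
    out[src + "_partition"] = out[src].map(lookup)
    out[dest + "_partition"] = out[dest].map(lookup)
    return out
```

Mapping a Series avoids the temporary column renames, at the cost of requiring each vertex to appear only once in `parts`.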

In [15]:
# THIS CELL IS NOT EXPORTED - free notebook cell for testing or development purposes
%time result = apply(model,df,param)
result

CPU times: user 60.3 ms, sys: 7.9 ms, total: 68.2 ms
Wall time: 66.6 ms


          src     dest  src_partition  dest_partition
0           2        2              0               0
1           2   782477              0               0
2      620423  4571210            971             971
3      620423        3            971             971
4           3   782479            971             971
...       ...      ...            ...             ...
99995  985609   833006            816             782
99996  985609   909995            816             816
99997  985609   848626            816             924
99998  985609   942678            816             816
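`cugraph.louvain()` also returns the modularity score that `apply()` stores as `model['louvain_modularity']`; Louvain greedily tries to maximize it. For an undirected graph with m edges, modularity is Q = Σ_c [L_c/m - (d_c/(2m))²], where L_c is the number of edges inside community c and d_c is the total degree of its vertices. As a rough cross-check of a partition, a pandas-only sketch of that formula follows. `modularity` is a hypothetical helper, and its value may differ from cuGraph's, which can treat edge direction, duplicate edges, and self-loops differently.

```python
import pandas as pd


def modularity(edges, partition, src="src", dest="dest"):
    """Modularity of an undirected, unweighted edge list (one row per edge).

    partition: mapping (dict or Series) from vertex id to community id.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    src_comm = edges[src].map(partition)
    dest_comm = edges[dest].map(partition)
    # L_c: edges whose two endpoints are in the same community
    intra = src_comm[src_comm == dest_comm].value_counts()
    # d_c: every edge endpoint adds 1 to its community's total degree
    degree = pd.concat([src_comm, dest_comm]).value_counts()
    q = 0.0
    for comm, d_c in degree.items():
        l_c = intra.get(comm, 0)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return float(q)
```

A partition map for the result above can be rebuilt with `dict(zip(result["src"], result["src_partition"]))`, updated with the `dest` and `dest_partition` pairs, and passed in together with `result`.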


## Stage 5 - save the model

In [None]:
# save model to name in expected convention "<algo_name>_<model_name>"
def save(model,name):
    # with open(MODEL_DIRECTORY + name + ".json", 'w') as file:
    #    json.dump(model, file)
    return model
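`save()` is currently a no-op, so `model['louvain_modularity']` is not persisted. If you want to keep it, note that `json.dump` cannot encode some NumPy scalar types such as `numpy.float32`. A minimal sketch, assuming the model stays a flat dict and using a hypothetical `save_model_json` helper that writes `<directory>/<name>.json`:

```python
import json
import os


def save_model_json(model, name, directory):
    """Write a flat model dict to <directory>/<name>.json and return the model."""
    # convert NumPy scalars (anything with .item()) to plain Python values for json
    serializable = {k: (v.item() if hasattr(v, "item") else v) for k, v in model.items()}
    with open(os.path.join(directory, name + ".json"), "w") as f:
        json.dump(serializable, f)
    return model
```

Inside `save()`, this would correspond to `save_model_json(model, name, MODEL_DIRECTORY)`.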

## Stage 6 - load the model

In [None]:
# load model from name in expected convention "<algo_name>_<model_name>"
def load(name):
    model = init(None,None)
    # with open(MODEL_DIRECTORY + name + ".json", 'r') as file:
    #    model = json.load(file)
    return model
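Reading a model back is the mirror image. Because `save()` does not write a file yet, a loader should cope with the file being absent. A sketch with a hypothetical `load_model_json` helper that falls back to an empty dict, matching what `init()` returns:

```python
import json
import os


def load_model_json(name, directory, default=None):
    """Read a model dict from <directory>/<name>.json.

    Returns a copy of `default` (or an empty dict, like init()) if the file does not exist.
    """
    path = os.path.join(directory, name + ".json")
    if not os.path.exists(path):
        return dict(default or {})
    with open(path) as f:
        return json.load(f)
```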

## Stage 7 - provide a summary of the model

In [None]:
# return a model summary
def summary(model=None):
    returns = {"version": {"pandas": pd.__version__, "cudf": cudf.__version__, "cugraph": cugraph.__version__} }
    return returns
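The summary could also report the modularity of the last run. Since `save()` and `load()` do not persist it at present, the value is only available when `apply()` has already run in the same process. A sketch with a hypothetical `summary_with_modularity` function; the cudf and cugraph versions are left out only so that it runs without a GPU.

```python
import pandas as pd


def summary_with_modularity(model=None):
    """Like summary(), plus the stored Louvain modularity when present."""
    # cudf/cugraph versions omitted here so the sketch runs on a CPU-only machine
    returns = {"version": {"pandas": pd.__version__}}
    if model and "louvain_modularity" in model:
        returns["louvain_modularity"] = float(model["louvain_modularity"])
    return returns
```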

## End of Stages
All subsequent cells are not tagged and can be used for further freeform code.