# Intro to cuGraph

cuGraph is a collection of graph analytics that process data stored in cuDF DataFrames. cuGraph aims to provide a NetworkX-like API that will be familiar to data scientists, so they can build GPU-accelerated workflows more easily.

### Test Data
We will be using the Zachary Karate Club dataset:
*W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of
Anthropological Research 33, 452-473 (1977).*


![Karate Club](img/zachary_black_lines.png)

## cuGraph 0.8 Notes
cuGraph version 0.8 has some limitations:
* Only int32 vertex IDs are supported
* Only float (FP32) edge data is supported
* Vertex numbering is assumed to start at zero

These limitations are being addressed and will be fixed in future versions.  
These example notebooks illustrate how to manipulate the data so that it conforms to the current limitations.
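One way to satisfy the integer and zero-based requirements is to renumber the vertices before building the graph. The sketch below is a minimal, plain-Python illustration under that assumption; `renumber` is a hypothetical helper of our own, not a cuGraph 0.8 API. It maps arbitrary vertex labels onto contiguous IDs starting at zero that fit in int32, and keeps the reverse mapping so results can be translated back to the original labels.

```python
# Hypothetical helper, not a cuGraph API: renumber arbitrary vertex labels
# into contiguous, zero-based IDs that fit in a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def renumber(src, dst):
    """Return (new_src, new_dst, mapping).

    mapping[i] is the original label of new vertex ID i. IDs are assigned
    in order of first appearance, scanning all of src, then all of dst.
    """
    label_to_id = {}
    mapping = []

    def lookup(label):
        # Give a label the next free ID the first time it is seen.
        if label not in label_to_id:
            label_to_id[label] = len(mapping)
            mapping.append(label)
        return label_to_id[label]

    new_src = [lookup(s) for s in src]
    new_dst = [lookup(d) for d in dst]

    # The largest assigned ID is len(mapping) - 1; it must fit in int32.
    if len(mapping) - 1 > INT32_MAX:
        raise ValueError("too many vertices for int32 vertex IDs")
    return new_src, new_dst, mapping


# Example: 1-based integer labels become 0-based IDs.
new_src, new_dst, mapping = renumber([1, 1, 2], [2, 3, 3])
print(new_src, new_dst, mapping)  # [0, 0, 1] [1, 2, 2] [1, 2, 3]
```

Assigning IDs in order of first appearance is an arbitrary choice; sorting the unique labels first would give a numbering that does not depend on edge order.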

In [None]:
# Import needed libraries
import cugraph
import cudf
import networkx
from collections import OrderedDict

In [None]:
# Define the path to the test data  
datafile='../../../data/karate-data.csv'

---
# NetworkX

The data file contains an edge list, where each row connects a source vertex to a destination vertex. These `source` to `destination` pairs are stored in what is known as Coordinate Format (COO). In this test case, the data has just two columns; however, a third `weight` column is also possible.
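To make the COO layout concrete, here is a small plain-Python sketch that parses tab-separated `src`/`dst` lines, plus an optional third `weight` column, into three parallel lists. The helper name `read_coo`, the sample edges, and the default weight of 1.0 for unweighted rows are assumptions of this sketch, not part of the dataset or of cuGraph.

```python
import csv
import io


def read_coo(text, delimiter="\t"):
    """Parse 'src<delim>dst[<delim>weight]' lines into parallel lists.

    Row i of the result is edge i: src[i] -> dst[i] with weight[i].
    Rows without a third column get a weight of 1.0 (an assumption).
    """
    src, dst, weight = [], [], []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        src.append(int(row[0]))
        dst.append(int(row[1]))
        weight.append(float(row[2]) if len(row) > 2 else 1.0)
    return src, dst, weight


# Hypothetical sample: three edges, the last one with an explicit weight.
sample = "0\t1\n0\t2\n1\t2\t0.5\n"
src, dst, weight = read_coo(sample)
print(src, dst, weight)  # [0, 0, 1] [1, 2, 2] [1.0, 1.0, 0.5]
```

The three lists are the COO representation: edge `i` is fully described by `src[i]`, `dst[i]`, and `weight[i]`.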

In [None]:
file = open(datafile, 'rb')

In [None]:
# Read the data; this also creates a NetworkX Graph
Gnx = networkx.read_edgelist(file)
file.close()

# cuGraph

### Read in the data - GPU
cuGraph depends on cuDF for data loading and for creating the initial DataFrame.

In [None]:
# Read the data file
cols = ["src", "dst"]

dtypes = OrderedDict([
        ("src", "int32"), 
        ("dst", "int32")
        ])

gdf = cudf.read_csv(datafile, names=cols, delimiter='\t', dtype=list(dtypes.values()) )

### Create a Graph 

In [None]:
# create a Graph using the source (src) and destination (dst) vertex pairs from the Dataframe 
G = cugraph.Graph()
G.add_edge_list(gdf["src"], gdf["dst"])

## Basic operations

We can see the total number of edges and vertices in G.
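For a COO edge list with zero-based vertex IDs, the two counts have a simple meaning, sketched below in plain Python. This is an illustration under the 0.8 zero-based assumption, not cuGraph's implementation: the hypothetical `count_edges_and_vertices` counts one edge per `(src, dst)` row and takes the vertex count as the largest ID plus one.

```python
def count_edges_and_vertices(src, dst):
    """Count edges and vertices of a COO edge list.

    Assumes zero-based vertex IDs, so the vertex count is the largest ID + 1.
    """
    num_edges = len(src)  # one edge per (src[i], dst[i]) pair
    largest_id = max(max(src, default=-1), max(dst, default=-1))
    num_vertices = largest_id + 1
    return num_edges, num_vertices


# A triangle with each undirected edge stored in both directions.
src = [0, 0, 1, 1, 2, 2]
dst = [1, 2, 0, 2, 0, 1]
print(count_edges_and_vertices(src, dst))  # (6, 3)
```

Two consequences of this sketch: an undirected edge stored in both directions is counted twice, and any ID below the maximum counts as a vertex even if no edge touches it.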

In [None]:
G.number_of_edges()

In [None]:
G.number_of_vertices()

Most methods on `cugraph.Graph` return cuDF DataFrame objects.

`in_degree` and `out_degree` are the number of edges going in and out of a given vertex, respectively. `degree` is the `in_degree` plus the `out_degree`.
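The degree definitions above can be checked by hand on a COO edge list. In this plain-Python sketch (the `degrees` helper and the sample edges are hypothetical), out-degree counts how often a vertex appears in `src`, in-degree counts how often it appears in `dst`, and degree is their sum.

```python
from collections import Counter


def degrees(src, dst, num_vertices):
    """Return (vertex, in_degree, out_degree, degree) for vertices 0..n-1."""
    out_deg = Counter(src)  # each appearance in src is an outgoing edge
    in_deg = Counter(dst)   # each appearance in dst is an incoming edge
    return [
        (v, in_deg[v], out_deg[v], in_deg[v] + out_deg[v])
        for v in range(num_vertices)
    ]


# A small directed example: 0 -> 1, 0 -> 2, 1 -> 2.
for row in degrees([0, 0, 1], [1, 2, 2], 3):
    print(row)
# (0, 0, 2, 2)
# (1, 1, 1, 2)
# (2, 2, 0, 2)
```

If each undirected edge is stored in both directions, in-degree equals out-degree for every vertex in this sketch, so `degree` comes out as twice the undirected degree.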

In [None]:
print(G.in_degree())

In [None]:
print(G.out_degree())

In [None]:
print(G.degree())

### **Exercise**

Show that the `in_degree` plus the `out_degree` is indeed the same as the `degree`. 

<details><summary><b>Solution</b></summary>
   <pre>
    <br>(G.in_degree() + G.out_degree())['degree'].equals(G.degree()['degree'])
   </pre>
</details>