## Jaccard Similarity
Jaccard similarity between two sets is defined as the ratio of the size of their intersection to the size of their union. In the context of graphs, the neighborhood of a vertex is treated as a set. The Jaccard similarity weight of each edge represents the strength of the connection between its two vertices, based on the relative similarity of their neighbors. For further detail, see [Wikipedia](https://en.wikipedia.org/wiki/Jaccard_index).
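The definition above can be sketched in plain Python. This is a minimal CPU illustration, not cuGraph code: the `jaccard` helper and the toy `neighbors` dictionary are hypothetical names introduced here, and scoring two empty sets as 0.0 is a convention assumed for this sketch.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention assumed for this sketch: two empty sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical toy graph stored as vertex -> set of neighbor ids.
neighbors = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}

# Vertices 0 and 2 share the neighbors {1, 3}; the union is {0, 1, 2, 3}.
score_0_2 = jaccard(neighbors[0], neighbors[2])
print(score_0_2)  # 0.5
```

Because the score is an intersection size divided by a union size, it always falls between 0 and 1.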

To compute the Jaccard similarity between each pair of vertices connected by an edge in cuGraph, use:
**nvJaccard(G)**
* G: A cugraph.Graph object

Returns:
jaccard_weights: cudf.DataFrame with three columns:
* jaccard_weights['source']: The source vertex id.
* jaccard_weights['destination']: The destination vertex id.
* jaccard_weights['jaccard_coeff']: The Jaccard coefficient computed between the source and destination vertices.
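To make the shape of the result concrete, here is a small CPU reference that produces one (source, destination, jaccard_coeff) row per edge. It is only a sketch under assumptions: `jaccard_per_edge` and the sample edge list are hypothetical, the graph is unweighted, and the way cuGraph computes the coefficient internally may differ.

```python
from collections import defaultdict


def jaccard_per_edge(edges):
    """Return (source, destination, jaccard_coeff) rows, one per edge."""
    # Build each vertex's neighbor set from the directed edge list.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)

    rows = []
    for src, dst in edges:
        a, b = neighbors[src], neighbors[dst]
        union = a | b
        coeff = len(a & b) / len(union) if union else 0.0
        rows.append((src, dst, coeff))
    return rows


# Hypothetical undirected graph (a triangle 0-1-2 plus vertex 3 attached to 2),
# with every edge listed in both directions.
edges = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
rows = jaccard_per_edge(edges)
for src, dst, coeff in rows:
    print(src, dst, coeff)
```

Each tuple corresponds to one row of the three-column result described above.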

In [1]:
# Import needed libraries
import cugraph
import cudf
import numpy as np
from scipy.io import mmread

In [2]:
# Read the data file into a SciPy sparse matrix
mmFile = '/datasets/networks/karate.mtx'
M = mmread(mmFile).asfptype().tolil()
M = M.tocsr()

In [3]:
# Load the structure of the graph into GPU memory and create a cuGraph
# Graph object from the CSR arrays:
row_offsets = cudf.Series(M.indptr)
col_indices = cudf.Series(M.indices)
values = cudf.Series(M.data)
G = cugraph.Graph()
G.add_adj_list(row_offsets, col_indices, values)
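The row offsets and column indices passed above come from the `indptr` and `indices` arrays of the SciPy CSR matrix. The following plain-Python sketch shows how such a pair of arrays encodes neighbor lists. The array contents and the `csr_neighbors` helper are hypothetical and are not taken from the karate dataset.

```python
# Hypothetical CSR arrays for a 4-vertex graph.
# row_offsets has one entry per vertex plus a final end marker.
row_offsets = [0, 2, 4, 7, 8]
col_indices = [1, 2, 0, 2, 0, 1, 3, 2]


def csr_neighbors(row_offsets, col_indices, v):
    """Neighbors of vertex v are the slice between consecutive offsets."""
    return col_indices[row_offsets[v]:row_offsets[v + 1]]


for v in range(len(row_offsets) - 1):
    print(v, csr_neighbors(row_offsets, col_indices, v))

neighbors_of_2 = csr_neighbors(row_offsets, col_indices, 2)  # [0, 1, 3]
```

The neighbor lists recovered this way are the sets that the Jaccard computation compares for each edge.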

In [8]:
# Call cugraph.nvJaccard 
df = cugraph.nvJaccard(G)

In [14]:
# Find the most similar pair of vertices
bestEdge = 0
for i in range(len(df)):
    if df['jaccard_coeff'][i] > df['jaccard_coeff'][bestEdge]:
        bestEdge = i
print("Vertices " + str(df['source'][bestEdge]) + " and " + str(df['destination'][bestEdge]) + " are most similar with score: " + str(df['jaccard_coeff'][bestEdge]))

Vertices 0 and 7 are most similar with score: 1.0384594e+34
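Since the Jaccard coefficient is a ratio of an intersection size to a union size, a valid score lies between 0 and 1. A quick range check on the returned rows can be sketched as follows. The `out_of_range` helper and the sample rows are hypothetical, and plain tuples stand in for the DataFrame rows.

```python
def out_of_range(rows):
    """Return the rows whose coefficient is outside [0, 1] (NaN included)."""
    return [(s, d, c) for s, d, c in rows if not (0.0 <= c <= 1.0)]


# Hypothetical sample rows: (source, destination, jaccard_coeff).
sample_rows = [(0, 1, 0.5), (1, 2, 1.5), (2, 3, 0.0)]
bad = out_of_range(sample_rows)
print(bad)  # [(1, 2, 1.5)]
```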
