# Breadth First Search (BFS) 
In this notebook, we will use cuGraph to compute the Breadth First Search path from a starting vertex to every other vertex in our test dataset.

Notebook Credits
* Original Authors: Bradley Rees and James Wyles
* Feature available since 0.6
* Last Edit: 01/28/2020

RAPIDS Versions: 0.12.0    

Test Hardware

* GV100 32G, CUDA 10.0



## Introduction

As the name implies, BFS traverses the given graph in a breadth-first manner. Starting at a specified vertex, the algorithm iteratively visits neighboring vertices, so every vertex at distance d is reached before any vertex at distance d + 1.


@see https://en.wikipedia.org/wiki/Breadth-first_search
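
For intuition, the traversal can be sketched in plain Python with a queue. This is a minimal CPU illustration, not cuGraph's GPU implementation; the function name `bfs_reference`, the adjacency-dictionary input, and the use of -1 for "no predecessor" are assumptions made for this sketch.

```python
from collections import deque

def bfs_reference(adjacency, start):
    """Return (distance, predecessor) dicts for every vertex reachable from start."""
    distance = {start: 0}
    predecessor = {start: -1}  # -1 marks "no predecessor" in this sketch only
    frontier = deque([start])
    while frontier:
        current = frontier.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distance:  # first visit is the shortest one
                distance[neighbor] = distance[current] + 1
                predecessor[neighbor] = current
                frontier.append(neighbor)
    return distance, predecessor

# Tiny undirected toy graph (hypothetical, not the karate data)
toy_graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
toy_distance, toy_predecessor = bfs_reference(toy_graph, 1)
```

On this toy graph, vertex 5 ends up three hops from vertex 1, and following the predecessors back from 5 gives 5, 4, 2, 1.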


To compute BFS in cuGraph use: __bfs(G, start)__

* __G__: A cugraph.Graph object
* __start__: the starting vertex ID

Returns:

* __df__: cudf.DataFrame with three named columns:
    * df["vertex"]:   The vertex id.
    * df["distance"]: The distance to the starting vertex
    * df["predecessor"]: The vertex ID of the vertex that was used to reach this vertex


## cuGraph Notice 
The current version of cuGraph has some limitations:

* Vertex IDs need to be 32-bit integers.
* Vertex IDs are expected to be contiguous integers starting from 0.

cuGraph provides the renumber function to mitigate this problem. Input vertex IDs for the renumber function can be either 32-bit or 64-bit integers, can be non-contiguous, and can start from an arbitrary number. The renumber function maps the provided input vertex IDs to 32-bit contiguous integers starting from 0. cuGraph still requires the renumbered vertex IDs to be representable in 32-bit integers. These limitations are being addressed and will be fixed soon.
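
The idea behind renumbering can be sketched in plain Python. This is only an illustration of the mapping described above, not cuGraph's renumber function; the name `renumber_sketch` and the first-seen ordering of new IDs are assumptions made for this example.

```python
def renumber_sketch(edges):
    """Map arbitrary vertex IDs to contiguous integers starting from 0."""
    id_map = {}      # external ID -> internal ID
    renumbered = []
    for src, dst in edges:
        for vertex in (src, dst):
            if vertex not in id_map:
                id_map[vertex] = len(id_map)  # next unused internal ID
        renumbered.append((id_map[src], id_map[dst]))
    return renumbered, id_map

# Hypothetical edge list with a 64-bit-sized, non-contiguous ID
external_edges = [(1000000000000, 42), (42, 7), (7, 1000000000000)]
internal_edges, id_map = renumber_sketch(external_edges)
```

Keeping `id_map` around makes it possible to translate results back to the original vertex IDs afterwards.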

### Test Data
We will be using the Zachary Karate club dataset:
*W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of
Anthropological Research 33, 452-473 (1977).*


![Karate Club](../img/zachary_black_lines.png)

Our test data is small so that results can be visually verified.

In [None]:
# First step is to import the needed libraries
import cugraph
import cudf
from collections import OrderedDict

In [None]:
# Define a print_path function that takes the BFS result dataframe and a vertex ID

def print_path(df, id):
    
    # Use the BFS predecessors and distances to trace the path
    # from vertex id back to the starting vertex (the vertex passed to bfs)
    dist = df['distance'][id]
    lastVert = id
    for i in range(dist):
        nextVert = df['predecessor'][lastVert]
        d = df['distance'][lastVert]
        print("Vertex: " + str(lastVert) + " was reached from vertex " + str(nextVert) + 
        " and distance to start is " + str(d) )
        lastVert = nextVert

### Read the data using cuDF

In [None]:
# Read the data file
datafile='../data/karate-data.csv'

gdf = cudf.read_csv(datafile, names=["src", "dst"], delimiter='\t', dtype=["int32", "int32"] )

In [None]:
# Look at the first few data records - the output should be two columns, src and dst
gdf.head()

As you can see in the output, the lowest vertex ID in the data is 1.  For the BFS algorithm that is okay.  
cuGraph will add an isolated vertex with an ID of zero.  It will not be reachable from any vertex in the test dataset.
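
The following plain-Python sketch shows why an extra vertex appears, assuming (as the cuGraph notice above states) that vertex IDs are treated as a contiguous range starting from 0. The edge list is a hypothetical three-vertex example, not the karate data.

```python
# Hypothetical edge list whose vertex IDs start at 1 instead of 0
edges = [(1, 2), (2, 3), (3, 1)]

# With contiguous 0-based IDs, the vertex set is 0 .. max_id
num_vertices = max(max(src, dst) for src, dst in edges) + 1

# Count how many edges touch each vertex
degree = [0] * num_vertices
for src, dst in edges:
    degree[src] += 1
    degree[dst] += 1

# Vertices with no edges at all
isolated = [vertex for vertex, deg in enumerate(degree) if deg == 0]
```

Here `num_vertices` is 4 even though only three IDs occur in the data, and vertex 0 is the one left without edges.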

In [None]:
# create a Graph 
G = cugraph.Graph()
G.from_cudf_edgelist(gdf, source='src', destination='dst', renumber=False)

In [None]:
# Call BFS on the graph starting from vertex 1
df = cugraph.bfs(G,1)

In [None]:
# Let's take a look at the structure of the returned dataframe
df.dtypes

In [None]:
print_path(df, 22)

In [None]:
print_path(df, 30)

### Picking a starting vertex
Since we can see in the graph illustration above that vertex 17 is at the edge of the graph, let's run BFS with that as the starting vertex.

In [None]:
# Call BFS on the graph starting from vertex 17
df2 = cugraph.bfs(G,17)

In [None]:
# Print the max distance
df2["distance"].max()

Notice that max returned an unexpected value.  That is because the isolated vertex, 0, is unreachable.
Whenever a graph contains disconnected components, the distance to the unreachable vertices will always be max_int.
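
A plain-Python sketch of handling such sentinel distances follows. It assumes that max_int means the largest signed 32-bit integer, which is consistent with the 32-bit vertex IDs described earlier but is not stated by this notebook; the distance values are made up for illustration.

```python
INT32_MAX = 2**31 - 1  # assumed sentinel for "unreachable"

# Hypothetical BFS distances keyed by vertex ID; vertex 0 is unreachable
toy_distances = {0: INT32_MAX, 1: 0, 2: 1, 3: 1, 4: 2}

# Keep only the vertices that were actually reached
reachable = {v: d for v, d in toy_distances.items() if d != INT32_MAX}

naive_max = max(toy_distances.values())  # dominated by the sentinel
true_max = max(reachable.values())       # longest real BFS distance
```

Comparing against the sentinel directly avoids having to pick a cutoff that is larger than any real distance in the graph.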

In [None]:
df2["distance"][0]

In [None]:
# Drop unreachable vertices: their distance is max_int, far above any real distance in this small graph
exp="distance < 100"
df3 = df2.query(exp)

In [None]:
# Print the max distance
df3["distance"].max()

In [None]:
# Print path to vertex 30
print_path(df2, 30)

___
Copyright (c) 2019-2020, NVIDIA CORPORATION.

Licensed under the Apache License, Version 2.0 (the "License");  you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
___