# Breadth First Search (BFS) 

As the name implies, BFS traverses the given graph in a breadth-first manner. Starting at a specified vertex, the algorithm iteratively searches neighboring vertices: it visits every vertex one hop away, then every vertex two hops away, and so on.

The algorithm returns, for each vertex:
* the BFS distance (i.e. the number of hops needed to reach that vertex from the starting vertex)
* the predecessor of the vertex (i.e. the vertex it was reached from during the search)
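
The two outputs listed above can be illustrated with a minimal, CPU-only sketch in plain Python. This is not how cuGraph implements BFS; the function name `bfs_sketch`, the adjacency-dictionary input, and the toy graph are all assumptions made for illustration.

```python
from collections import deque


def bfs_sketch(adjacency, start):
    """Return (distance, predecessor) dicts for every vertex reachable from start.

    adjacency is a hypothetical dict mapping each vertex to a list of neighbors.
    """
    distance = {start: 0}
    predecessor = {start: None}  # the start vertex was not reached from anywhere
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor not in distance:  # first visit gives the fewest hops
                distance[neighbor] = distance[current] + 1
                predecessor[neighbor] = current
                queue.append(neighbor)
    return distance, predecessor


# Hypothetical toy graph (undirected, so each edge is listed in both directions)
toy_graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
distance, predecessor = bfs_sketch(toy_graph, 1)
print(distance)
print(predecessor)
```

Because the queue is first-in, first-out, a vertex is always recorded the first time it is seen, which is also the smallest number of hops from the start.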

@see https://en.wikipedia.org/wiki/Breadth-first_search


To compute BFS in cuGraph use: __bfs(G, start_id)__

* G: A cugraph.Graph object
* start_id : the starting vertex ID

Returns:

* bfs: cudf.DataFrame with three named columns:
    * bfs["vertex"]:   The vertex id.
    * bfs["distance"]: The distance to the starting vertex
    * bfs["predecessor"]: The vertex ID of the vertex that was used to reach this vertex
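
To show how the three columns fit together, here is a small sketch that rebuilds the path to a vertex by following predecessor links. It uses a plain list of dictionaries as a stand-in for the returned DataFrame; the helper `trace_path`, the sample rows, and the `-1` placeholder for the start vertex's predecessor are assumptions of this sketch, not values taken from cuGraph output.

```python
def trace_path(rows, target):
    """Follow predecessor links from target back to the start vertex.

    rows is a hypothetical list of dicts with the keys
    "vertex", "distance" and "predecessor".
    """
    by_vertex = {row["vertex"]: row for row in rows}
    path = [target]
    # Stop at the start vertex, which is the only one with distance 0
    while by_vertex[path[-1]]["distance"] > 0:
        path.append(by_vertex[path[-1]]["predecessor"])
    return path[::-1]  # reverse so the path runs from start to target


# Made-up sample rows for a five-vertex graph searched from vertex 1
sample_rows = [
    {"vertex": 1, "distance": 0, "predecessor": -1},
    {"vertex": 2, "distance": 1, "predecessor": 1},
    {"vertex": 3, "distance": 1, "predecessor": 1},
    {"vertex": 4, "distance": 2, "predecessor": 2},
    {"vertex": 5, "distance": 3, "predecessor": 4},
]
print(trace_path(sample_rows, 5))
```

Looking rows up by the value in the vertex column, rather than by row position, keeps the sketch correct even when the rows are not ordered by vertex ID.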



___

### Test Data
We will be using the Zachary Karate club dataset:
*W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of
Anthropological Research 33, 452-473 (1977).*


![Karate Club](../img/zachary_black_lines.png)

The test data is small so that results can be visually verified.

In [None]:
# First step is to import the needed libraries
import cugraph
import cudf
from collections import OrderedDict

In [None]:
# Define a function that prints the BFS path to a vertex
def print_path(df, vertex_id):
    
    # Use the BFS predecessors and distances to trace the path from
    # vertex_id back to the starting vertex.
    # Note: the lookups below are by row label, so this assumes the row
    # label of each record equals its vertex ID.
    dist = df['distance'][vertex_id]
    last_vert = vertex_id
    for _ in range(dist):
        next_vert = df['predecessor'][last_vert]
        d = df['distance'][last_vert]
        print(f"Vertex: {last_vert} was reached from vertex {next_vert}"
              f" and distance to start is {d}")
        last_vert = next_vert

# Read the data using cuDF

In [None]:
# Test file - using the classic Karate club dataset.
datafile='../data/networks/karate-data.csv'

In [None]:
# Read the data file
cols = ["src", "dst"]

dtypes = OrderedDict([
        ("src", "int32"), 
        ("dst", "int32")
        ])

gdf = cudf.read_csv(datafile, names=cols, delimiter='\t', dtype=list(dtypes.values()) )


In [None]:
# Let's look at the DataFrame. There should be two columns and 154 records
gdf

In [None]:
# Look at the first few data records - the output should be two columns, src and dst
gdf.head().to_pandas()

In [None]:
# create a Graph 
G = cugraph.Graph()
G.add_edge_list(gdf["src"], gdf["dst"])

In [None]:
# Call BFS on the graph starting from vertex 1
df = cugraph.bfs(G,1)

In [None]:
# Let's take a look at the structure of the returned dataframe
df.dtypes

In [None]:
print_path(df, 22)

In [None]:
print_path(df, 30)
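
Besides reading the printed paths, the result can be sanity-checked with a simple rule: every vertex other than the start should sit exactly one hop further out than its predecessor. The sketch below applies that rule to plain Python rows; the helper `find_inconsistent_rows` and the sample data are hypothetical, and the check assumes every listed vertex was actually reached by the search.

```python
def find_inconsistent_rows(rows):
    """Return the vertices whose distance is not predecessor distance + 1.

    rows is a hypothetical list of dicts with the keys
    "vertex", "distance" and "predecessor".
    """
    distance = {row["vertex"]: row["distance"] for row in rows}
    bad = []
    for row in rows:
        if row["distance"] == 0:
            continue  # the start vertex has no predecessor to check
        if distance.get(row["predecessor"]) != row["distance"] - 1:
            bad.append(row["vertex"])
    return bad


# Made-up rows: the second list breaks the rule for vertex 3 on purpose
good_rows = [
    {"vertex": 1, "distance": 0, "predecessor": -1},
    {"vertex": 2, "distance": 1, "predecessor": 1},
    {"vertex": 3, "distance": 2, "predecessor": 2},
]
bad_rows = [
    {"vertex": 1, "distance": 0, "predecessor": -1},
    {"vertex": 2, "distance": 1, "predecessor": 1},
    {"vertex": 3, "distance": 3, "predecessor": 2},
]
print(find_inconsistent_rows(good_rows))
print(find_inconsistent_rows(bad_rows))
```

An empty list means the distances and predecessors agree with each other; it does not by itself prove the distances are the shortest possible.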

___
Copyright (c) 2019, NVIDIA CORPORATION.

Licensed under the Apache License, Version 2.0 (the "License");  you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
___