Define API for MG random walk #2407

Merged
2 changes: 2 additions & 0 deletions cpp/CMakeLists.txt
@@ -195,6 +195,8 @@ set(CUGRAPH_SOURCES
src/community/legacy/extract_subgraph_by_vertex.cu
src/community/legacy/egonet.cu
src/sampling/random_walks.cu
src/sampling/random_walks_sg.cu
src/sampling/random_walks_mg.cu
src/sampling/detail/sampling_utils_mg.cu
src/sampling/detail/sampling_utils_sg.cu
src/sampling/uniform_neighbor_sampling_mg.cpp
130 changes: 130 additions & 0 deletions cpp/include/cugraph/algorithms.hpp
@@ -1350,6 +1350,136 @@ random_walks(raft::handle_t const& handle,
bool use_padding = false,
std::unique_ptr<sampling_params_t> sampling_strategy = nullptr);

/**
* @brief returns uniform random walks from starting sources, where each path is of given
* maximum length.
*
* @p start_vertices can contain duplicates, in which case different random walks will
* be generated for each instance.
*
* If the graph is weighted, the return contains edge weights. If the graph is unweighted then
* the returned value will be std::nullopt.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view graph view to operate on
* @param start_vertices Device span defining the starting vertices
* @param max_length maximum length of random walk
* @param seed (optional, defaults to system time), seed for random number generation
* @return tuple containing device vectors of vertices and the edge weights (if
* the graph is weighted)<br>
* For each input selector there will be (max_length+1) elements in the
* vertex vector with the starting vertex followed by the subsequent
* vertices in the random walk. If a path terminates before max_length,
* the vertices will be populated with invalid_vertex_id
* (-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an
* unsigned vertex_t type)<br>
* For each input selector there will be max_length elements in the weights
* vector with the edge weight for the edge in the path. If a path
* terminates before max_length the subsequent edge weights will be
* set to weight_t{0}.
*/
// FIXME: Do I care about transposed or not? I want to be able to operate in either
// direction.
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>>
uniform_random_walks(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_vertices,
size_t max_length,
uint64_t seed = std::numeric_limits<uint64_t>::max());
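
The return layout described in the doc comment above can be illustrated with a small host-side reference. This is only a sketch: `HostCsr`, `invalid_vertex` and `host_uniform_walks` are hypothetical names that are not part of cuGraph, the code runs on the CPU with `std::mt19937_64`, and it makes no claim about how the GPU implementation selects edges. It shows the (max_length+1) vertices and max_length weights per start vertex, with the padding values named in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical host-side CSR graph (not a cuGraph type).
struct HostCsr {
  std::vector<std::size_t> offsets;   // size = number of vertices + 1
  std::vector<std::int32_t> indices;  // destination vertex of each edge
  std::vector<float> weights;         // weight of each edge
};

// Padding value from the doc comment: -1 for signed types,
// std::numeric_limits<vertex_t>::max() for unsigned types.
template <typename vertex_t>
constexpr vertex_t invalid_vertex()
{
  return std::is_signed<vertex_t>::value ? static_cast<vertex_t>(-1)
                                         : std::numeric_limits<vertex_t>::max();
}

// CPU reference walk that fills the flat, padded layout:
//   vertices: starts.size() * (max_length + 1) entries
//   weights:  starts.size() * max_length entries
std::pair<std::vector<std::int32_t>, std::vector<float>> host_uniform_walks(
  HostCsr const& g,
  std::vector<std::int32_t> const& starts,
  std::size_t max_length,
  std::uint64_t seed)
{
  std::size_t const stride = max_length + 1;
  std::vector<std::int32_t> vertices(starts.size() * stride, invalid_vertex<std::int32_t>());
  std::vector<float> weights(starts.size() * max_length, 0.0f);
  std::mt19937_64 rng(seed);

  for (std::size_t i = 0; i < starts.size(); ++i) {
    std::int32_t v = starts[i];
    assert(v >= 0 && static_cast<std::size_t>(v) + 1 < g.offsets.size());
    vertices[i * stride] = v;  // each path begins with its start vertex
    for (std::size_t step = 0; step < max_length; ++step) {
      std::size_t const first = g.offsets[v];
      std::size_t const last  = g.offsets[v + 1];
      if (first == last) { break; }  // no outgoing edge: remaining slots stay padded
      std::uniform_int_distribution<std::size_t> pick(first, last - 1);
      std::size_t const e = pick(rng);  // every outgoing edge is equally likely
      v = g.indices[e];
      vertices[i * stride + step + 1] = v;
      weights[i * max_length + step]  = g.weights[e];
    }
  }
  return {std::move(vertices), std::move(weights)};
}
```

On the chain 0 -> 1 -> 2, where vertex 2 has no outgoing edges, a walk of max_length 3 started at 0 stops after two steps, so the last vertex slot holds the invalid vertex id and the last weight slot holds 0.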

/**
* @brief returns biased random walks from starting sources, where each path is of given
* maximum length.
*
* The next vertex is biased based on the edge weights. The probability of traversing a
* departing edge will be the edge weight divided by the sum of the departing edge weights.
*
* @p start_vertices can contain duplicates, in which case different random walks will
* be generated for each instance.
*
* @throws cugraph::logic_error if the graph is unweighted
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view graph view to operate on
* @param start_vertices Device span defining the starting vertices
* @param max_length maximum length of random walk
* @param seed (optional, defaults to system time), seed for random number generation
* @return tuple containing device vectors of vertices and the edge weights<br>

Contributor: What's the purpose of <br>?

Collaborator Author: Formatting in the Doxygen output.

* For each input selector there will be (max_length+1) elements in the
* vertex vector with the starting vertex followed by the subsequent
* vertices in the random walk. If a path terminates before max_length,
* the vertices will be populated with invalid_vertex_id
* (-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an
* unsigned vertex_t type)<br>
* For each input selector there will be max_length elements in the weights
* vector with the edge weight for the edge in the path. If a path
* terminates before max_length the subsequent edge weights will be
* set to weight_t{0}.
*/
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>>

Contributor: This function does not accept unweighted graphs, so should the returned weight vector be std::optional? Can it ever be std::nullopt?

Collaborator Author: I was keeping the signature the same for consistency. But you are correct, this function would never return std::nullopt.

biased_random_walks(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_vertices,
size_t max_length,
uint64_t seed = std::numeric_limits<uint64_t>::max());
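
The selection rule stated in the doc comment (the probability of an edge is its weight divided by the sum of the departing edge weights) can be sketched as an inverse-CDF lookup over one vertex's outgoing weights. `pick_biased_edge` is a hypothetical host-side helper written for illustration; it is not a cuGraph function and is not meant to reflect how the library implements the sampling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: given the weights of the edges leaving the current
// vertex and a uniform random number r in [0, 1), return the index of the
// selected edge. Edge i is chosen with probability out_weights[i] / sum.
std::size_t pick_biased_edge(std::vector<double> const& out_weights, double r)
{
  assert(!out_weights.empty());
  assert(r >= 0.0 && r < 1.0);

  double total = 0.0;
  for (double w : out_weights) { total += w; }

  double const target = r * total;  // scale r to the range [0, total)
  double running      = 0.0;
  for (std::size_t i = 0; i < out_weights.size(); ++i) {
    running += out_weights[i];
    if (target < running) { return i; }  // first prefix sum that exceeds target
  }
  return out_weights.size() - 1;  // guards against floating-point round-off
}
```

With weights {1, 3}, values of r below 0.25 select edge 0 and all other values select edge 1, which matches the 1/4 and 3/4 probabilities the rule prescribes. An edge with weight 0 is never selected.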

/**
* @brief returns biased random walks with node2vec biases from starting sources,
* where each path is of given maximum length.
*
* @p start_vertices can contain duplicates, in which case different random walks will
* be generated for each instance.
*
* If the graph is weighted, the return contains edge weights and the node2vec computation
* will utilize the edge weights. If the graph is unweighted then the return will not contain
* edge weights and the node2vec computation will assume an edge weight of 1 for all edges.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view graph view to operate on
* @param start_vertices Device span defining the starting vertices
* @param max_length maximum length of random walk
* @param p node2vec return parameter
* @param q node2vec in-out parameter
* @param seed (optional, defaults to system time), seed for random number generation
* @return tuple containing device vectors of vertices and the edge weights<br>
* For each input selector there will be (max_length+1) elements in the
* vertex vector with the starting vertex followed by the subsequent
* vertices in the random walk. If a path terminates before max_length,
* the vertices will be populated with invalid_vertex_id
* (-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an
* unsigned vertex_t type)<br>
* For each input selector there will be max_length elements in the weights
* vector with the edge weight for the edge in the path. If a path
* terminates before max_length the subsequent edge weights will be
* set to weight_t{0}.
*/
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>>
node2vec_random_walks(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_vertices,
size_t max_length,
weight_t p,
weight_t q,
uint64_t seed = std::numeric_limits<uint64_t>::max());
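
For readers unfamiliar with the roles of p and q, the following sketch computes the unnormalized node2vec transition weight as defined in the standard node2vec formulation: 1/p for returning to the previous vertex, 1 for moving to a vertex adjacent to the previous vertex, and 1/q otherwise, each multiplied by the edge weight (1 for an unweighted graph, as the comment above states). `node2vec_bias` is a hypothetical helper, and the sorted neighbor list is an assumption of this sketch rather than a description of cuGraph internals.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: unnormalized weight for stepping from the current
// vertex to `next`, given that the walk arrived from `prev`.
// `prev_neighbors` must be the sorted adjacency list of `prev`.
double node2vec_bias(int prev,
                     int next,
                     std::vector<int> const& prev_neighbors,
                     double edge_weight,
                     double p,
                     double q)
{
  assert(p > 0.0 && q > 0.0);
  assert(std::is_sorted(prev_neighbors.begin(), prev_neighbors.end()));

  double alpha = 0.0;
  if (next == prev) {
    alpha = 1.0 / p;  // return to where the walk came from
  } else if (std::binary_search(prev_neighbors.begin(), prev_neighbors.end(), next)) {
    alpha = 1.0;  // next is one hop away from prev
  } else {
    alpha = 1.0 / q;  // next is two hops away from prev (moves outward)
  }
  return alpha * edge_weight;
}
```

Dividing each candidate's value by the sum over all edges leaving the current vertex yields the transition probabilities. A large p discourages immediate backtracking, and a large q keeps the walk close to the previous vertex.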

#ifndef NO_CUGRAPH_OPS
/**
* @brief generate sub-sampled graph as an adjacency list (CSR format) given input graph,
71 changes: 71 additions & 0 deletions cpp/include/cugraph_c/sampling_algorithms.h
@@ -36,8 +36,78 @@ typedef struct {
int32_t align_;
} cugraph_random_walk_result_t;

/**
* @brief Compute uniform random walks
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] start_vertices Array of source vertices
* @param [in] max_length Maximum length of the generated path
* @param [out] result Output from the uniform random walks call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_uniform_random_walks(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* start_vertices,
size_t max_length,
cugraph_random_walk_result_t** result,
cugraph_error_t** error);

/**
* @brief Compute biased random walks
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] start_vertices Array of source vertices
* @param [in] max_length Maximum length of the generated path
* @param [out] result Output from the biased random walks call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_biased_random_walks(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* start_vertices,
size_t max_length,
cugraph_random_walk_result_t** result,
cugraph_error_t** error);

/**
* @brief Compute random walks using the node2vec framework.
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] start_vertices Array of source vertices
* @param [in] max_length Maximum length of the generated path
* @param [in] p The return parameter
* @param [in] q The in/out parameter
* @param [out] result Output from the node2vec call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_node2vec_random_walks(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* start_vertices,
size_t max_length,
double p,
double q,
cugraph_random_walk_result_t** result,
cugraph_error_t** error);
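
A caller that has copied the vertex array of a result to host memory may want the actual length of each path. The sketch below assumes the padded layout documented for the C++ functions in algorithms.hpp, that is, (max_length+1) vertex slots per start vertex with unused slots set to the invalid vertex id; whether the C result object uses exactly this layout is an assumption here. `padded_path_lengths` is a hypothetical helper and does not call the cuGraph C API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: count the valid vertices in each padded path.
// `vertices` is the flat host copy with (max_length + 1) slots per path.
std::vector<std::size_t> padded_path_lengths(std::vector<std::int32_t> const& vertices,
                                             std::size_t max_length,
                                             std::int32_t invalid_vertex = -1)
{
  std::size_t const stride = max_length + 1;
  assert(vertices.size() % stride == 0);

  std::vector<std::size_t> lengths;
  lengths.reserve(vertices.size() / stride);
  for (std::size_t base = 0; base < vertices.size(); base += stride) {
    std::size_t n = 0;
    // Padding only appears at the tail of a path, so stop at the first one.
    while (n < stride && vertices[base + n] != invalid_vertex) { ++n; }
    lengths.push_back(n);
  }
  return lengths;
}
```

Each returned value counts vertices, so a path with n vertices traversed n - 1 edges, and only the first n - 1 entries of its weight slots carry real edge weights.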

/**
* @brief Compute random walks using the node2vec framework.
* @deprecated This call should be replaced with cugraph_node2vec_random_walks
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
@@ -94,6 +164,7 @@ cugraph_type_erased_device_array_view_t* cugraph_random_walk_result_get_weights(

/**
* @brief If the random walk result is compressed, get the path sizes
* @deprecated This call will no longer be relevant once the new node2vec functions are used
*
* @param [in] result The result from a random walk algorithm
* @return type erased array pointing to the path sizes in device memory