diff --git a/README.md b/README.md index 94900568f28..c7d0456cf17 100644 --- a/README.md +++ b/README.md @@ -1,15 +1,89 @@ -#
 cuGraph - GPU Graph Analytics
+

+
+ cuGraph +

+ +
[![Build Status](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cugraph/job/branches/job/cugraph-branch-pipeline/badge/icon)](https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cugraph/job/branches/job/cugraph-branch-pipeline/) + + License +GitHub tag (latest by date) + + + +Conda +GitHub last commit + +Conda + +RAPIDS + +
+ +
+ +[RAPIDS](https://rapids.ai) cuGraph is a monorepo that represents a collection of packages focused on GPU-accelerated graph analytics, including support for property graphs, remote (graph as a service) operations, and graph neural networks (GNNs). cuGraph supports the creation and manipulation of graphs followed by the execution of scalable fast graph algorithms. + +
+ +[Getting cuGraph](./readme_pages/getting_cugraph.md) * +[Graph Algorithms](./readme_pages/algorithms.md) * +[Graph Service](./readme_pages/cugraph_service.md) * +[Property Graph](./readme_pages/property_graph.md) * +[GNN Support](./readme_pages/gnn_support.md) + +
+ +----- + +## Table of Contents +- Getting packages + - [Getting cuGraph Packages](./readme_pages/getting_cugraph.md) + - [Contributing to cuGraph](./readme_pages/CONTRIBUTING.md) +- General + - [Latest News](./readme_pages/news.md) + - [Current list of algorithms](./readme_pages/algorithms.md) + - [Blogs and Presentations](./docs/cugraph/source/basics/cugraph_blogs.rst) + - [Performance](./readme_pages/performance/performance.md) +- Packages + - [cuGraph Python](./readme_pages/cugraph_python.md) + - [Property Graph](./readme_pages/property_graph.md) + - [External Data Types](./readme_pages/data_types.md) + - [pylibcugraph](./readme_pages/pylibcugraph.md) + - [libcugraph (C/C++/CUDA)](./readme_pages/libcugraph.md) + - [cugraph-service](./readme_pages/cugraph_service.md) + - [cugraph-dgl](./readme_pages/cugraph_dgl.md) + - [cugraph-ops](./readme_pages/cugraph_ops.md) +- API Docs + - Python + - [Python Nightly](https://docs.rapids.ai/api/cugraph/nightly/) + - [Python Stable](https://docs.rapids.ai/api/cugraph/stable/) + - C++ + - [C++ Nightly](https://docs.rapids.ai/api/libcugraph/nightly/) + - [C++ Stable](https://docs.rapids.ai/api/libcugraph/stable/) +- References + - [RAPIDS](https://rapids.ai/) + - [Apache Arrow](https://arrow.apache.org/) + - [Dask](https://www.dask.org/) + +

+ +----- + +Stack + +--- -The [RAPIDS](https://rapids.ai) cuGraph library is a collection of GPU accelerated graph algorithms that process data found in [GPU DataFrames](https://github.com/rapidsai/cudf). The vision of cuGraph is _to make graph analysis ubiquitous to the point that users just think in terms of analysis and not technologies or frameworks_. To realize that vision, cuGraph operates, at the Python layer, on GPU DataFrames, thereby allowing for seamless passing of data between ETL tasks in [cuDF](https://github.com/rapidsai/cudf) and machine learning tasks in [cuML](https://github.com/rapidsai/cuml). Data scientists familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. Likewise, users familiar with NetworkX will quickly recognize the NetworkX-like API provided in cuGraph, with the goal to allow existing code to be ported with minimal effort into RAPIDS. -While the high-level cugraph python API provides an easy-to-use and familiar interface for data scientists that's consistent with other RAPIDS libraries in their workflow, some use cases require access to lower-level graph theory concepts. For these users, we provide an additional Python API called pylibcugraph, intended for applications that require a tighter integration with cuGraph at the Python layer with fewer dependencies. Users familiar with C/C++/CUDA and graph structures can access libcugraph and libcugraph_c for low level integration outside of python. +[RAPIDS](https://rapids.ai) cuGraph is a collection of GPU-accelerated graph algorithms and services. At the Python layer, cuGraph operates on [GPU DataFrames](https://github.com/rapidsai/cudf), thereby allowing for seamless passing of data between ETL tasks in [cuDF](https://github.com/rapidsai/cudf) and machine learning tasks in [cuML](https://github.com/rapidsai/cuml). Data scientists familiar with Python will quickly pick up how cuGraph integrates with the Pandas-like API of cuDF. 
Likewise, users familiar with NetworkX will quickly recognize the NetworkX-like API provided in cuGraph, with the goal of allowing existing code to be ported into RAPIDS with minimal effort. To simplify integration, cuGraph also supports data found in [Pandas DataFrames](https://pandas.pydata.org/), [NetworkX Graph Objects](https://networkx.org/), and several other formats. - For more project details, see [rapids.ai](https://rapids.ai/). +While the high-level cuGraph Python API provides an easy-to-use and familiar interface for data scientists that's consistent with other RAPIDS libraries in their workflow, some use cases require access to lower-level graph theory concepts. For these users, we provide an additional Python API called pylibcugraph, intended for applications that require tighter integration with cuGraph at the Python layer with fewer dependencies. Users familiar with C/C++/CUDA and graph structures can access libcugraph and libcugraph_c for low-level integration outside of Python. **NOTE:** For the latest stable [README.md](https://github.com/rapidsai/cugraph/blob/main/README.md), ensure you are on the latest branch. + + As an example, the following Python snippet loads graph data and computes PageRank: ```python @@ -32,179 +106,11 @@ df_page.sort_values('pagerank', ascending=False).head(10) ``` -## Getting cuGraph -There are 3 ways to get cuGraph : -1. [Quick start with Docker Repo](#quick) -2. [Conda Installation](#conda) -3. [Build from Source](#source) -
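The PageRank snippet referenced above is elided from this diff hunk (only its final `df_page.sort_values(...)` line is visible), so it is not reconstructed here. As a CPU-only illustration of what PageRank computes, the following is a minimal power-iteration sketch in plain Python. It is not cuGraph's implementation or API: the function name `pagerank` and its `alpha`, `tol`, and `max_iter` parameters are assumptions chosen for illustration, and dangling vertices (no out-edges) have their score spread uniformly, which is one common convention.

```python
def pagerank(edges, alpha=0.85, tol=1.0e-6, max_iter=100):
    """Power-iteration PageRank over a directed edge list (CPU sketch).

    edges: iterable of (src, dst) pairs; vertex IDs may be any hashable value.
    alpha: damping factor, the probability of following an out-edge.
    Returns a dict mapping each vertex to its score; scores sum to 1.
    """
    out_links = {}
    vertices = set()
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        vertices.add(src)
        vertices.add(dst)
    n = len(vertices)
    if n == 0:
        return {}

    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Score held by dangling vertices (no out-edges) is spread uniformly.
        dangling = sum(rank[v] for v in vertices if v not in out_links)
        base = (1.0 - alpha) / n + alpha * dangling / n
        new_rank = {v: base for v in vertices}
        # Each vertex splits its damped score evenly across its out-edges.
        for src, dsts in out_links.items():
            share = alpha * rank[src] / len(dsts)
            for dst in dsts:
                new_rank[dst] += share
        # Stop once the L1 change between iterations drops below tol.
        delta = sum(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

For example, on the 3-cycle `[(0, 1), (1, 2), (2, 0)]` every vertex ends up with a score of about 1/3, while on the chain `[(0, 1), (1, 2)]` the sink vertex `2` ranks highest. On a GPU, the same question is answered by cuGraph's own PageRank over a cuDF edge list.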

- ---- -# cuGraph News - -### Scaling to 1 Trillion Edges -At GTC Spring '22 we presented results of running cuGraph on the [Selene](https://top500.org/system/179842/) supercomputer using 2,048 GPUs and processing a graph with `1.1 Trillion edges`. Synthetic data created with the RMAT generator found in cuGraph. - -
 
cuGraph Scaling
-

- -### cuGraph Software Stack -cuGraph has a new multi-layer software stack that allows users and system integrators to access cuGraph at different layers. +
-
 
cuGraph Software Stack
-

- ---- -# Currently Supported Features -As of Release 22.06 - -

-## Supported Data Types -cuGraph supports graph creation with Source and Destination being expressed as: -* cuDF DataFrame -* Pandas DataFrame - -cuGraph supports execution of graph algorithms from different graph objects -* cuGraph Graph classes -* NetworkX graph classes -* CuPy sparse matrix -* SciPy sparse matrix - -cuGraph tries to match the return type based on the input type. So a NetworkX input will return the same data type that NetworkX would have. - -

- -## Supported Graph -| Type | Description | -| --------------- | --------------------------------------------------- | -| Graph | An undirected Graph by default | -| | directed=True yields a Directed Graph | -| Multigraph | A Graph with multiple edges between a vertex pair | -| | | - -ALL Algorithms support Graphs and MultiGraph (directed and undirected) - -## Supported Algorithms -_Italic_ algorithms are planned for future releases. - -| Category | Algorithm | Scale | Notes | -| ------------ | -------------------------------------- | ------------- | ------------------- | -| Centrality | | | | -| | Katz | Multi-GPU | | -| | Betweenness Centrality | Single-GPU | | -| | Edge Betweenness Centrality | Single-GPU | | -| | Eigenvector Centrality | Multi-GPU | | -| | Degree Centrality | Multi-GPU | Python only | -| Community | | | | -| | Leiden | Single-GPU | | -| | Louvain | Multi-GPU | | -| | Ensemble Clustering for Graphs | Single-GPU | | -| | Spectral-Clustering - Balanced Cut | Single-GPU | | -| | Spectral-Clustering - Modularity | Single-GPU | | -| | Subgraph Extraction | Single-GPU | | -| | Triangle Counting | Multi-GPU | | -| | K-Truss | Single-GPU | | -| Components | | | | -| | Weakly Connected Components |Multi-GPU | | -| | Strongly Connected Components | Single-GPU | | -| Core | | | | -| | K-Core | Single-GPU | | -| | Core Number | Single-GPU | | -| _Flow_ | | | | -| | _MaxFlow_ | --- | | -| _Influence_ | | | | -| | _Influence Maximization_ | --- | | -| Layout | | | | -| | Force Atlas 2 | Single-GPU | | -| Linear Assignment| | | | -| | Hungarian | Single-GPU | [README](cpp/src/linear_assignment/README-hungarian.md) | -| Link Analysis| | | | -| | Pagerank | Multi-GPU | [C++ README](cpp/src/centrality/README.md#Pagerank) | -| | Personal Pagerank | Multi-GPU | [C++ README](cpp/src/centrality/README.md#Personalized-Pagerank) | -| | HITS | Multi-GPU | | -| Link Prediction | | | | -| | Jaccard Similarity | Single-GPU | | -| | Weighted Jaccard Similarity | 
Single-GPU | | -| | Overlap Similarity | Single-GPU | | -| | Sorensen Coefficient | Single-GPU | Python only | -| | _Local Clustering Coefficient_ | --- | | -| Sampling | | | | -| | Random Walks (RW) | Single-GPU | Biased and Uniform | -| | Egonet | Single-GPU | multi-seed | -| | Node2Vec | Single-GPU | | -| | Neighborhood sampling | Multi-GPU | | -| Traversal | | | | -| | Breadth First Search (BFS) | Multi-GPU | with cutoff support
[C++ README](cpp/src/traversal/README.md#BFS) | -| | Single Source Shortest Path (SSSP) | Multi-GPU | [C++ README](cpp/src/traversal/README.md#SSSP) | -| | _ASSP / APSP_ | | | -| Tree | | | | -| | Minimum Spanning Tree | Single-GPU | | -| | Maximum Spanning Tree | Single-GPU | | -| Other | | | | -| | Renumbering | Multi-GPU | multiple columns, any data type | -| | Symmetrize | Multi-GPU | | -| | Path Extraction | | Extract paths from BFS/SSP results in parallel | -| Data Generator | | | | -| | RMAT | Multi-GPU | | -| | _Barabasi-Albert_ | --- | | -| | | - - - - -## cuGraph Notice - -Vertex IDs are expected to be contiguous integers starting from 0. If your data doesn't match that restriction, we have a solution. cuGraph provides the renumber function, which is by default automatically called when data is added to a graph. Input vertex IDs for the renumber function can be any type, can be non-contiguous, can be multiple columns, and can start from an arbitrary number. The renumber function maps the provided input vertex IDs to either 32- or 64-bit contiguous integers starting from 0. - -Additionally, when using the auto-renumbering feature, vertices are automatically un-renumbered in results. - -cuGraph is constantly being updated and improved. Please see the [Transition Guide](TRANSITIONGUIDE.md) if errors are encountered with newer versions - -## Graph Sizes and GPU Memory Size -The amount of memory required is dependent on the graph structure and the analytics being executed. As a simple rule of thumb, the amount of GPU memory should be about twice the size of the data size. That gives overhead for the CSV reader and other transform functions. There are ways around the rule but using smaller data chunks. - -| Size | Recommended GPU Memory | -|-------------------|------------------------| -| 500 million edges | 32 GB | -| 250 million edges | 16 GB | - -The use of managed memory for oversubscription can also be used to exceed the above memory limitations. 
See the recent blog on _Tackling Large Graphs with RAPIDS cuGraph and CUDA Unified Memory on GPUs_: https://medium.com/rapids-ai/tackling-large-graphs-with-rapids-cugraph-and-unified-virtual-memory-b5b69a065d4 - -
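The renumbering described in the cuGraph Notice above (arbitrary, non-contiguous, possibly multi-column vertex IDs mapped to contiguous integers starting from 0, then un-renumbered in results) can be sketched in plain Python. This is an illustrative CPU sketch, not cuGraph's GPU implementation: the helper names `renumber_edges` and `unrenumber` are hypothetical, IDs are assigned in first-seen order (an assumption; cuGraph's ordering may differ), and multi-column IDs are modelled as tuples.

```python
def renumber_edges(edges):
    """Map arbitrary hashable vertex IDs to contiguous ints starting at 0.

    edges: list of (src, dst) pairs; an ID may be an int, a string, or a
    tuple standing in for a multi-column key.
    Returns (new_edges, inverse), where inverse[i] is the original ID of i.
    """
    forward = {}   # original ID -> internal contiguous ID
    inverse = []   # internal contiguous ID -> original ID
    for src, dst in edges:
        for vid in (src, dst):
            if vid not in forward:
                forward[vid] = len(inverse)  # next unused contiguous ID
                inverse.append(vid)
    new_edges = [(forward[src], forward[dst]) for src, dst in edges]
    return new_edges, inverse


def unrenumber(results, inverse):
    """Translate {internal_id: value} results back to the original IDs."""
    return {inverse[i]: value for i, value in results.items()}
```

For instance, the edge list `[("a", "b"), ("b", ("x", 7))]` becomes `[(0, 1), (1, 2)]` with `inverse == ["a", "b", ("x", 7)]`. A real implementation would also choose a 32- or 64-bit integer type for the new IDs, as the notice describes; this sketch simply uses Python ints.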

- ---- -## Quick Start -Please see the [Docker Repository](https://hub.docker.com/r/rapidsai/rapidsai/), choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready to run Docker container with example notebooks and data, showcasing how you can utilize all of the RAPIDS libraries: cuDF, cuML, and cuGraph. +[Why cuGraph does not support Method Cascading]() -## Conda -It is easy to install cuGraph using conda. You can get a minimal conda installation with [Miniconda](https://conda.io/miniconda.html) or get the full installation with [Anaconda](https://www.anaconda.com/download). - -Install and update cuGraph using the conda command: - -```bash -# CUDA 11.5 -conda install -c rapidsai -c numba -c conda-forge -c nvidia cugraph cudatoolkit=11.5 - -# CUDA 11.4 -conda install -c rapidsai -c numba -c conda-forge -c nvidia cugraph cudatoolkit=11.4 -``` - -For CUDA > 11.5, please use the 11.5 environment. - -Note: This conda installation only applies to Linux and Python versions 3.8/3.9. - - -## Build from Source and Contributing - -Please see our [guide for building cuGraph from source](SOURCEBUILD.md) - -Please see our [guide for contributing to cuGraph](CONTRIBUTING.md). - - - -## Documentation -Python API documentation can be generated from [docs](docs) directory. ------ # Projects that use cuGraph @@ -212,19 +118,26 @@ Python API documentation can be generated from [docs](docs) directory. (alphabetical order) * ArangoDB - a free and open-source native multi-model database system - https://www.arangodb.com/ * CuPy - "NumPy/SciPy-compatible Array Library for GPU-accelerated Computing with Python" - https://cupy.dev/ -* Memgraph - In-memory database - https://memgraph.com/ +* Memgraph - In-memory Graph database - https://memgraph.com/ * ScanPy - a scalable toolkit for analyzing single-cell gene expression data - https://scanpy.readthedocs.io/en/stable/ + +(please post an issue if you have a project to add to this list) ------ +
+ +##
Open GPU Data Science -##
Open GPU Data Science The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces. -

+

+ +For more project details, see [rapids.ai](https://rapids.ai/). -### Apache Arrow on GPU +

+### Apache Arrow on GPU -The GPU version of [Apache Arrow](https://arrow.apache.org/) is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow are supported. +The GPU version of [Apache Arrow](https://arrow.apache.org/) is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow are supported. \ No newline at end of file diff --git a/docs/cugraph/source/basics/cugraph_blogs.rst b/docs/cugraph/source/basics/cugraph_blogs.rst index 2d3e751a4b6..368dbcce4f8 100644 --- a/docs/cugraph/source/basics/cugraph_blogs.rst +++ b/docs/cugraph/source/basics/cugraph_blogs.rst @@ -1,5 +1,5 @@ -cuGraph BLOGS and Presentations +cuGraph Blogs and Presentations ************************************************ The RAPIDS team blogs at https://medium.com/rapids-ai, and many of @@ -7,21 +7,17 @@ these blog posts provide deeper dives into features from cuGraph. 
Here, we've selected just a few that are of particular interest to cuGraph users: -BLOGS & Conferences +Blogs & Conferences ==================== -2018 -------- - * `GTC18 Fall - RAPIDS: Benchmarking Graph Analytics on the DGX-2 `_ - +2022 +------ + * `GTC: State of cuGraph (video & slides) `_ + * `GTC: Scaling and Validating Louvain in cuGraph against Massive Graphs (video & slides) `_ + * `KDD Tutorial on Accelerated GNN Training with DGL/PyG and cuGraph `_ -2019 -------- - * `RAPIDS cuGraph `_ - * `RAPIDS cuGraph — The vision and journey to version 1.0 and beyond `_ - * `RAPIDS cuGraph : multi-GPU PageRank `_ - * `Similarity in graphs: Jaccard versus the Overlap Coefficient `_ - * `GTC19 Spring - Accelerating Graph Algorithms with RAPIDS `_ - * `GTC19 Fall - Multi-Node Multi-GPU Machine Learning and Graph Analytics with RAPIDS `_ +2021 +------ + * `GTC 21 - State of RAPIDS cuGraph and what's coming next `_ 2020 ------ @@ -31,16 +27,19 @@ BLOGS & Conferences * `Large Graph Visualization with RAPIDS cuGraph `_ * `GTC 20 Fall - cuGraph Goes Big `_ -2021 ------- - * `GTC 21 - State of RAPIDS cuGraph and what's comming next `_ +2019 +------- + * `RAPIDS cuGraph `_ + * `RAPIDS cuGraph — The vision and journey to version 1.0 and beyond `_ + * `RAPIDS cuGraph : multi-GPU PageRank `_ + * `Similarity in graphs: Jaccard versus the Overlap Coefficient `_ + * `GTC19 Spring - Accelerating Graph Algorithms with RAPIDS `_ + * `GTC19 Fall - Multi-Node Multi-GPU Machine Learning and Graph Analytics with RAPIDS `_ +2018 +------- + * `GTC18 Fall - RAPIDS: Benchmarking Graph Analytics on the DGX-2 `_ -2022 ------- - * `GTC: State of cuGraph (video & slides) `_ - * `GTC: Scaling and Validating Louvain in cuGraph against Massive Graphs (video & slides) `_ - * `KDD Tutorial on Accelerated GNN Training with DGL/PyG and cuGraph `_ Media @@ -51,16 +50,16 @@ Media Academic Papers =============== + * Alex Fender, Brad Rees, Joe Eaton (2022) `Massive Graph Analytics `_ Bader, D.
(Editor) CRC Press + * S. Kang, A. Fender, J. Eaton, B. Rees: `Computing PageRank Scores of Web Crawl Data Using DGX A100 Clusters`. In IEEE HPEC, Sep. 2020. * Hricik, T., Bader, D., & Green, O. (2020, September). `Using RAPIDS AI to accelerate graph data science workflows`. In 2020 IEEE High Performance Extreme Computing Conference (HPEC) (pp. 1-4). IEEE. * Richardson, B., Rees, B., Drabas, T., Oldridge, E., Bader, D. A., & Allen, R. (2020, August). Accelerating and Expanding End-to-End Data Science Workflows with DL/ML Interoperability Using RAPIDS. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (pp. 3503-3504). - * Alex Fender, Brad Rees, Joe Eaton (2022) `Massive Graph Analytics `_ Bader, D. (Editor) CRC Press - -Other BLOGS +Other Blogs ======================== * `4 graph algorithms on steroids for data scientists with cugraph `_ * `Where should I walk `_ @@ -69,4 +68,8 @@ Other BLOGS * `Running Large-Scale Graph Analytics with Memgraph and NVIDIA cuGraph Algorithms `_ * `Dev Blog Repost: Similarity in Graphs: Jaccard Versus the Overlap Coefficient `_ - +RAPIDS Event Notebooks +====================== +* `KDD 2022 Notebook that demonstrates using cuDF for ETL/data cleaning and XGBoost for training a fraud prediction model.
`_ +* `SciPy 22 Notebook comparing cuGraph to NetworkX `_ +* `KDD 2020 Tutorial Notebooks - Accelerating and Expanding End-to-End Data Science Workflows with DL/ML Interoperability Using RAPIDS `_ diff --git a/img/Stack.png b/img/Stack.png new file mode 100644 index 00000000000..5963f39f7a5 Binary files /dev/null and b/img/Stack.png differ diff --git a/img/Stack2.png b/img/Stack2.png new file mode 100644 index 00000000000..132e85c9d15 Binary files /dev/null and b/img/Stack2.png differ diff --git a/img/cugraph_logo_2.png b/img/cugraph_logo_2.png new file mode 100644 index 00000000000..62dd79c4b98 Binary files /dev/null and b/img/cugraph_logo_2.png differ diff --git a/img/cugraphops_context.png b/img/cugraphops_context.png new file mode 100644 index 00000000000..8db157d2f09 Binary files /dev/null and b/img/cugraphops_context.png differ diff --git a/img/gaas_img_1.png b/img/gaas_img_1.png new file mode 100644 index 00000000000..aacf4ce44ae Binary files /dev/null and b/img/gaas_img_1.png differ diff --git a/img/gaas_img_2.png b/img/gaas_img_2.png new file mode 100644 index 00000000000..02a32af2d4d Binary files /dev/null and b/img/gaas_img_2.png differ diff --git a/img/gnn_blog.png b/img/gnn_blog.png new file mode 100644 index 00000000000..e0a3e36a9dd Binary files /dev/null and b/img/gnn_blog.png differ diff --git a/img/gnn_context.png b/img/gnn_context.png new file mode 100644 index 00000000000..4ef002b3e39 Binary files /dev/null and b/img/gnn_context.png differ diff --git a/img/gnn_framework.png b/img/gnn_framework.png new file mode 100644 index 00000000000..61afb048403 Binary files /dev/null and b/img/gnn_framework.png differ diff --git a/img/pg_example.png b/img/pg_example.png new file mode 100644 index 00000000000..5ce8a0f2054 Binary files /dev/null and b/img/pg_example.png differ diff --git a/python/cugraph/cugraph/gnn/README.md b/python/cugraph/cugraph/gnn/README.md new file mode 100644 index 00000000000..e69de29bb2d diff --git a/CONTRIBUTING.md 
b/readme_pages/CONTRIBUTING.md similarity index 73% rename from CONTRIBUTING.md rename to readme_pages/CONTRIBUTING.md index c6456e02c47..8ab7162f6e4 100644 --- a/CONTRIBUTING.md +++ b/readme_pages/CONTRIBUTING.md @@ -1,23 +1,28 @@ # Contributing to cuGraph -cuGraph, and all of RAPIDS in general, is an open-source project where we encourage community involvement. There are multiple ways to be involved and contribute to the cuGraph community, the top paths are listed below: +cuGraph, for the most part, is an open-source project where we encourage community involvement. The cugraph-ops package is the exception, being a closed-source package. -* [File an Issue](#issue) -* [Implement a New Feature](#implement) -* [Work on an Existing Issue](#bugfix) +There are multiple ways to be involved and contribute to the cuGraph community; the top paths are listed below: -If you are ready to contribute, jump right to the [Contribute Code](#code) section. +* [File an Issue](https://github.com/rapidsai/docs/issues/new) +* [Implement a New Feature](https://docs.rapids.ai/contributing/code/#your-first-issue) +* [Work on an Existing Issue](#3-you-want-to-implement-a-feature-or-bug-fix-for-an-outstanding-issue) + +If you are ready to contribute, jump right to the [Contribute Code](https://docs.rapids.ai/contributing/issues/) section. __Style Formatting Tools:__ -* `clang-format` version 8.01+ +* `clang-format` version 11.1+ * `flake8` version 3.5.0+ -* `black` version 22.3.0 -## 1) File an Issue for the RAPIDS cuGraph team to work + +## New Issue +1) File an Issue for the RAPIDS cuGraph team to work on To file an issue, go to the RAPIDS cuGraph [issue](https://github.com/rapidsai/cugraph/issues/new/choose) page and select the appropriate issue type. Once an issue is filed, the RAPIDS cuGraph team will evaluate and triage the issue. If you believe the issue needs priority attention, please include that in the issue to notify the team.
+ +## Find a Bug ***Bug Report*** If you notice something not working, please file an issue - Select **Bug** Report @@ -31,7 +36,7 @@ If there is a feature or enhancement to an existing feature, please file an issu - describing what you want to see added or changed. For new features, if there is a white paper on the analytic, please include a reference to it ***Ask a Question*** -There are several ways to ask questions, including [Stack Overflow]( https://stackoverflow.com/) or the RAPIDS [Google forum]( https://groups.google.com/forum/#!forum/rapidsai), but a GitHub issue can be filled. +There are several ways to ask questions, including [Stack Overflow](https://stackoverflow.com/), but the quickest is to submit a GitHub question issue. - Select Question - describing your question @@ -45,28 +50,31 @@ We love when people want to get involved, and if you have a suggestion for a new - Submit a New Feature Issue (see above) and state that you are working on it. - The team will give feedback on the issue and is happy to make suggestions - Once we agree that the plan looks good, go ahead and implement it -- Follow the [code contributions](#code-contributions) guide below. +- Follow the [code contributions](#so-you-want-to-contribute-code) guide below. ## 3) You want to implement a feature or bug-fix for an outstanding issue - Find an open Issue, and post that you would like to work on that issue - Once we agree that the plan looks good, go ahead and implement it -- Follow the [code contributions](#code-contributions) guide below. +- Follow the [code contributions](#so-you-want-to-contribute-code) guide below. If you need more context on a particular issue, please ask. 
+ ---- -# So you want to contribute code + +# So you want to contribute code **TL;DR General Development Process** -1. Read the documentation on [building from source](SOURCEBUILD.md) to learn how to setup, and validate, the development environment +1. Read the documentation on [building from source](./SOURCEBUILD.md) to learn how to set up and validate the development environment 2. Read the RAPIDS [Code of Conduct](https://docs.rapids.ai/resources/conduct/) 3. Find or submit an issue to work on (include a comment that you are working on the issue) 4. Fork the cuGraph [repo](#fork) and Code (make sure to add unit tests)! 5. When done, and code passes local CI, create your pull request (PR) 1. Update the CHANGELOG.md with PR number - see [Changelog formatting](https://docs.rapids.ai/resources/changelog/) - 2. Ensure that the PR has the proper [tags](PRTAGS.md) + 2. Ensure that the PR has the proper [tags](./PRTAGS.md) 3. Ensure the code matches our [style guide](https://docs.rapids.ai/resources/style/) 6. Verify that cuGraph CI passes all [status checks](https://help.github.com/articles/about-status-checks/). Fix if needed 7. Wait for other developers to review your code and update code as needed @@ -90,7 +98,7 @@ The RAPIDS cuGraph repo cannot directly be modified. Contributions must come in ```git clone https://github.com//cugraph.git``` -Read the section on [building cuGraph from source](SOURCEBUILD.md) to validate that the environment is correct. +Read the section on [building cuGraph from source](./SOURCEBUILD.md) to validate that the environment is correct. **Pro Tip:** add an upstream remote repository so that you can keep your forked repo in sync ```git remote add upstream https://github.com/rapidsai/cugraph.git``` @@ -105,7 +113,7 @@ cuGraph only allows contribution to the current branch and not main or a future 1. commit your code ```git push``` 6. From the GitHub web page, open a Pull Request - 1.
follow the Pull Request [tagging policy](./PRTAGS.md) ### Development Environment @@ -154,46 +162,9 @@ implementation of the issue, ask them in the issue instead of the PR. ### Style Guide -All Python code most pass flake8 and black style checking; see using pre-commit below. - +All Python code must pass flake8 style checking. All C++ code must pass clang style checking. - All code must adhere to the [RAPIDS Style Guide](https://docs.rapids.ai/resources/style/). -#### Python / Pre-commit hooks - -cuGraph developers may use [pre-commit](https://pre-commit.com/) to locally run code -linters and formatters including [Black](https://black.readthedocs.io/en/stable/) -and [flake8](https://flake8.pycqa.org/en/latest/). These tools ensure a consistent -code format throughout the project. Using pre-commit ensures that linter versions -and options are aligned for all developers. Additionally, there is a CI check in -place to enforce that committed code follows our standards. - -To use `pre-commit`, install via `conda` or `pip`: - -```bash -conda install -c conda-forge pre-commit -``` - -```bash -pip install pre-commit -``` - -Then run pre-commit hooks before committing code: - -```bash -pre-commit run -``` - -Optionally, you may set up the pre-commit hooks to run automatically when you make a git commit. This can be done by running: - -```bash -pre-commit install -``` - -Now code linters and formatters will be run each time you commit changes. - -You can skip these checks with `git commit --no-verify` or with the short version `git commit -n`. - ### Tests All code must have associated test cases.
Code without tests will not be accepted. diff --git a/PRTAGS.md b/readme_pages/PRTAGS.md similarity index 58% rename from PRTAGS.md rename to readme_pages/PRTAGS.md index 7ba7fd8510b..4fa02ff9590 100644 --- a/PRTAGS.md +++ b/readme_pages/PRTAGS.md @@ -5,10 +5,9 @@ PR = Pull Request | TAG | | |------------|-------------------------------------------------------| -| WIP | _Work In Progress_ - Within the RAPIDS cuGraph team, we try to open a PR when development starts. This allows other to review code as it is being developed and provide feedback before too much code needs to be refactored. It also allows process to be tracked. __A WIP PR will not be merged into baseline__ | +| WIP | _Work In Progress_ - While it is preferred to simply place the PR in [DRAFT](https://github.blog/2019-02-14-introducing-draft-pull-requests/) state (through GitHub), you can also label the PR as a work in progress. Within the RAPIDS cuGraph team, we try to open a PR when development starts. This allows others to review code as it is being developed and provide feedback before too much code needs to be refactored. It also allows progress to be tracked. __A WIP PR will not be merged into baseline__ | | skip-ci | _Do Not Run CI_ - This flag prevents CI from being run. It is good practice to include this with the **WIP** tag since code is typically not at a point where it will pass CI. | | skip ci | same as above | -| API-REVIEW | This tag request a code review just of the API portion of the code - This is beneficial to ensure that all required arguments are captured. Doing this early can save from having to refactor later. | | REVIEW | The code is ready for a full code review.
Only code that has passed a code review is merged into the baseline | diff --git a/SOURCEBUILD.md b/readme_pages/SOURCEBUILD.md similarity index 98% rename from SOURCEBUILD.md rename to readme_pages/SOURCEBUILD.md index a917f0f8bb7..f2ab0e592bd 100644 --- a/SOURCEBUILD.md +++ b/readme_pages/SOURCEBUILD.md @@ -9,7 +9,7 @@ The cuGraph package include both a C/C++ CUDA portion and a python portion. Bot __Compiler__: * `gcc` version 9.3+ * `nvcc` version 11.0+ -* `cmake` version 3.23.1+ +* `cmake` version 3.20.1+ __CUDA:__ * CUDA 11.0+ @@ -43,9 +43,6 @@ __Create the conda development environment__ ```bash # create the conda environment (assuming in base `cugraph` directory) -# for CUDA 11.0 -conda env create --name cugraph_dev --file conda/environments/cugraph_dev_cuda11.0.yml - # for CUDA 11.2 conda env create --name cugraph_dev --file conda/environments/cugraph_dev_cuda11.2.yml diff --git a/TRANSITIONGUIDE.md b/readme_pages/TRANSITIONGUIDE.md similarity index 100% rename from TRANSITIONGUIDE.md rename to readme_pages/TRANSITIONGUIDE.md diff --git a/readme_pages/algorithms.md b/readme_pages/algorithms.md new file mode 100644 index 00000000000..fa2e7cc9553 --- /dev/null +++ b/readme_pages/algorithms.md @@ -0,0 +1,85 @@ +# List of Supported and Planned Algorithms + +## Supported Graph + +| Type | Description | +| ---------- | ----------------------------------------------------------- | +| Graph | A directed or undirected Graph (use directed={True, False}) | +| Multigraph | A Graph with multiple edges between a vertex pair | +| | | + +ALL Algorithms support Graphs and MultiGraph (directed and undirected) + +--- + +
+ +# Supported Algorithms + +_Italic_ algorithms are planned for future releases. + +Note: Multi-GPU, or MG, includes support for Multi-Node Multi-GPU (also called MNMG). + +| Category | Algorithm | Scale | Notes | +| ----------------- | ---------------------------------- | ------------------- | --------------------------------------------------------------- | +| Centrality | | | | +| | Katz | __Multi-GPU__ | | +| | Betweenness Centrality | Single-GPU | MG planned for 23.02 | +| | Edge Betweenness Centrality | Single-GPU | MG planned for 23.02 | +| | Eigenvector Centrality | __Multi-GPU__ | | +| | Degree Centrality | __Multi-GPU__ | Python only | +| Community | | | | +| | Leiden | Single-GPU | MG planned for 23.02 | +| | Louvain | __Multi-GPU__ | | +| | Ensemble Clustering for Graphs | Single-GPU | | +| | Spectral-Clustering - Balanced Cut | Single-GPU | | +| | Spectral-Clustering - Modularity | Single-GPU | | +| | Subgraph Extraction | Single-GPU | | +| | Triangle Counting | __Multi-GPU__ | | +| | K-Truss | Single-GPU | | +| Components | | | | +| | Weakly Connected Components | __Multi-GPU__ | | +| | Strongly Connected Components | Single-GPU | | +| Core | | | | +| | K-Core | **Multi-GPU** | | +| | Core Number | **Multi-GPU** | | +| _Flow_ | | | | +| | _MaxFlow_ | --- | | +| _Influence_ | | | | +| | _Influence Maximization_ | --- | | +| Layout | | | | +| | Force Atlas 2 | Single-GPU | | +| Linear Assignment | | | | +| | Hungarian | Single-GPU | [README](../cpp/src/linear_assignment/README-hungarian.md) | +| Link Analysis | | | | +| | PageRank | __Multi-GPU__ | [C++ README](../cpp/src/centrality/README.md#Pagerank) | +| | Personalized PageRank | __Multi-GPU__ | [C++ README](../cpp/src/centrality/README.md#Personalized-Pagerank) | +| | HITS | __Multi-GPU__ | | +| Link Prediction | | | | +| | Jaccard Similarity | **Multi-GPU** | MG as of 22.12
Directed graph only | +| | Weighted Jaccard Similarity | Single-GPU | | +| | Overlap Similarity | **Multi-GPU** | MG as of 22.12 | +| | Sorensen Coefficient | **Multi-GPU** | MG as of 22.12 | +| | _Local Clustering Coefficient_ | --- | | +| Sampling | | | | +| | Uniform Random Walks (RW) | **Multi-GPU** | | +| | *Biased Random Walks (RW)* | --- | | +| | Egonet | **Multi-GPU** | | +| | Node2Vec | Single-GPU | MG planned for 23.02 | +| | Uniform Neighborhood Sampling | __Multi-GPU__ | | +| Traversal | | | | +| | Breadth First Search (BFS) | __Multi-GPU__ | with cutoff support; [C++ README](../cpp/src/traversal/README.md#BFS) | +| | Single Source Shortest Path (SSSP) | __Multi-GPU__ | [C++ README](../cpp/src/traversal/README.md#SSSP) | +| | _ASSP / APSP_ | --- | | +| Tree | | | | +| | Minimum Spanning Tree | Single-GPU | | +| | Maximum Spanning Tree | Single-GPU | | +| Other | | | | +| | Renumbering | __Multi-GPU__ | multiple columns, any data type | +| | Symmetrize | __Multi-GPU__ | | +| | Path Extraction | | Extract paths from BFS/SSSP results in parallel | +| | Two Hop Neighbors | __Multi-GPU__ | | +| Data Generator | | | | +| | RMAT | __Multi-GPU__ | | +| | _Barabasi-Albert_ | --- | | +| | | | | diff --git a/readme_pages/cugraph_dgl.md b/readme_pages/cugraph_dgl.md new file mode 100644 index 00000000000..3c6ddd4026b --- /dev/null +++ b/readme_pages/cugraph_dgl.md @@ -0,0 +1,28 @@ +# cugraph_dgl + +[RAPIDS](https://rapids.ai) cugraph_dgl enables the use of cuGraph Property Graphs with DGL. This cuGraph backend gives DGL users access to a collection of GPU-accelerated algorithms for graph analytics, such as sampling, centrality computation, and community detection.
+ + +The goal of `cugraph_dgl` is to enable multi-node multi-GPU, cuGraph-accelerated graphs that help train large-scale Graph Neural Networks (GNNs) on DGL by providing a duck-typed version of the [DGLGraph](https://docs.dgl.ai/api/python/dgl.DGLGraph.html#dgl.DGLGraph) that uses cuGraph to store graph structure and node/edge feature data. + +## Usage +```diff +import dgl +# dgl_g, train_idx and device are assumed to come from an existing DGL training script + ++from cugraph_dgl.convert import cugraph_storage_from_heterograph ++cugraph_g = cugraph_storage_from_heterograph(dgl_g) + +sampler = dgl.dataloading.NeighborSampler( +    [15, 10, 5], prefetch_node_feats=['feat'], prefetch_labels=['label']) + +train_dataloader = dgl.dataloading.DataLoader( +-    dgl_g, ++    cugraph_g, +    train_idx, +    sampler, +    device=device, +    batch_size=1024, +    shuffle=True, +    drop_last=False, +    num_workers=0) +``` + diff --git a/readme_pages/cugraph_ops.md b/readme_pages/cugraph_ops.md new file mode 100644 index 00000000000..87b0051a815 --- /dev/null +++ b/readme_pages/cugraph_ops.md @@ -0,0 +1,17 @@ +

+
+ cuGraph +

+

+CuGraphOps +

+cugraph-ops is a closed-source library of highly optimized and +performant primitives for GNNs and related graph +operations, such as training, sampling, and inference. + + +This is how cugraph-ops fits into the cuGraph ecosystem: +

+
+ cuGraph +

diff --git a/readme_pages/cugraph_pyg.md b/readme_pages/cugraph_pyg.md new file mode 100644 index 00000000000..147cd70b944 --- /dev/null +++ b/readme_pages/cugraph_pyg.md @@ -0,0 +1,22 @@ +# cugraph_pyg + +[RAPIDS](https://rapids.ai) cugraph_pyg enables the use of cuGraph Property Graphs with PyTorch Geometric (PyG). PyG users will have access to cuGraph and cuGraph-Service through the PyG GraphStore, FeatureStore, and Sampler interfaces. Through cugraph_pyg, PyG users have the full power of cuGraph's GPU-accelerated algorithms for graph analytics, such as sampling, centrality computation, and community detection. + + +The goal of `cugraph_pyg` is to enable single-GPU and multi-node, multi-GPU cuGraph-accelerated graphs that help train large-scale Graph Neural Networks (GNNs) on PyG by providing duck-typed drop-in replacements for the `GraphStore`, `FeatureStore`, and `Sampler` interfaces, backed by either cuGraph or cuGraph-Service. + +Users of cugraph_pyg can install either the cugraph or the cugraph_service_client package. Only one is required. + +## Usage +```python +from cugraph.experimental import PropertyGraph +# to_pyg and CuGraphSampler are provided by cugraph_pyg (imports omitted here) + +G = PropertyGraph() +... +feature_store, graph_store = to_pyg(G) +sampler = CuGraphSampler( + data=(feature_store, graph_store), + shuffle=True, + num_neighbors=[10, 25], + batch_size=50, +) +... +``` diff --git a/readme_pages/cugraph_python.md b/readme_pages/cugraph_python.md new file mode 100644 index 00000000000..164c1212ed8 --- /dev/null +++ b/readme_pages/cugraph_python.md @@ -0,0 +1,24 @@ +# cuGraph – Python + + +cuGraph is a Python package that encapsulates and hides the complexity of the lower-layer C/CUDA code. Additionally, the software is focused on providing an easy and familiar API. + + + +## cuGraph Notice + +Vertex IDs are expected to be contiguous integers starting from 0. If your data doesn't match that restriction, we have a solution: cuGraph provides the renumber function, which by default is called automatically when data is added to a graph.
Input vertex IDs for the renumber function can be any type, can be non-contiguous, can be multiple columns, and can start from an arbitrary number. The renumber function maps the provided input vertex IDs to either 32- or 64-bit contiguous integers starting from 0. + +Additionally, when using the auto-renumbering feature, vertices are automatically un-renumbered in results. + +cuGraph is constantly being updated and improved. Please see the [Transition Guide](TRANSITIONGUIDE.md) if you encounter errors with newer versions. + +## Graph Sizes and GPU Memory Size +The amount of memory required depends on the graph structure and the analytics being executed. As a simple rule of thumb, the amount of GPU memory should be about twice the size of the data. That leaves headroom for the CSV reader and other transform functions. There are ways around this rule, such as using smaller data chunks. + +| Size | Recommended GPU Memory | +|-------------------|------------------------| +| 500 million edges | 32 GB | +| 250 million edges | 16 GB | + +Managed memory can also be used for oversubscription to exceed the above memory limitations. See the recent blog on _Tackling Large Graphs with RAPIDS cuGraph and CUDA Unified Memory on GPUs_: https://medium.com/rapids-ai/tackling-large-graphs-with-rapids-cugraph-and-unified-virtual-memory-b5b69a065d4 \ No newline at end of file diff --git a/readme_pages/cugraph_service.md b/readme_pages/cugraph_service.md new file mode 100644 index 00000000000..9c06cd9f71a --- /dev/null +++ b/readme_pages/cugraph_service.md @@ -0,0 +1,28 @@ +# cuGraph Service + +The goal of cugraph_service is to wrap a cuGraph cluster and provide a Graph-as-a-Service feature. + +Goals +* Separate large graph management and analytic code from application code + * The application, such as GNN code, should be isolated from the details of cuGraph graph management, dedicated multi-node/multi-GPU setup, feature storage and retrieval, etc.
+ * Scaling from single GPU (SG), to multi-GPU (MG), to multi-node/multi-GPU (MNMG) should not require changes to the graph integration code + +* Support multiple concurrent clients/processes/threads sharing one or more graphs + * No need for each client to have access to graph data files, perform ETL, etc. + * Simplify concurrent programming – synchronization, batch processing details, etc. are implemented server-side + * The GNN user should be able to easily partition and isolate hardware resources between graph analytics and training + * The application/GNN code should be able to prefetch graph samples while training without resource contention + +* Simplify MNMG deployments + * Docker images contain all required, compatible packages + * Clients use the same APIs for both SG and MG +
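The goals above describe a client/server split: the server owns the graphs and handles synchronization, and clients refer to graphs by name. The following is a toy, in-process Python sketch of that split only, with threads standing in for client processes. `ToyGraphService`, `load_edges`, and `neighbors` are hypothetical names and are not the cugraph_service client API. A real deployment would hold the graph on GPUs behind a network server.

```python
import threading
from collections import defaultdict


class ToyGraphService:
    """Hypothetical in-process stand-in for a graph service.

    Not the cugraph_service API: it only mimics the idea that the server owns
    the graphs and handles synchronization, while clients refer to graphs by name.
    """

    def __init__(self):
        self._graphs = {}              # graph name -> {vertex: [neighbors]}
        self._lock = threading.Lock()  # synchronization lives on the server side

    def load_edges(self, name, edges):
        """Build an adjacency list once, on the server, from (src, dst) pairs."""
        adjacency = defaultdict(list)
        for src, dst in edges:
            adjacency[src].append(dst)
        with self._lock:
            self._graphs[name] = dict(adjacency)

    def neighbors(self, name, vertex):
        """Return a copy of a vertex's out-neighbors in the named graph."""
        with self._lock:
            return list(self._graphs[name].get(vertex, []))


service = ToyGraphService()
service.load_edges("demo", [(0, 1), (0, 2), (1, 2)])

results = []


def client(vertex):
    # Each client only knows the graph's name, not where or how it is stored.
    results.append((vertex, service.neighbors("demo", vertex)))


threads = [threading.Thread(target=client, args=(v,)) for v in (0, 1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [(0, [1, 2]), (1, [2]), (2, [])]
```

Because the clients never touch the adjacency data directly, the server could change how or where the graph is stored without any client code changing. That is the same property the goals ask for when scaling from SG to MNMG.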

+ +# Picture + +One option on a single DGX +graph_service_cluster + + +Using cugraph-service on multiple DGXs +graph_service_cluster diff --git a/readme_pages/data_types.md b/readme_pages/data_types.md new file mode 100644 index 00000000000..37f2ee4daf6 --- /dev/null +++ b/readme_pages/data_types.md @@ -0,0 +1,46 @@ +# External Data Types +cuGraph Python strives to make getting data into and out of cuGraph simple. To that end, the Python interface accepts + + + +## Supported Data Types +cuGraph supports graph creation with Source and Destination being expressed as: +* cuDF DataFrame +* Pandas DataFrame +* NetworkX graph classes +* NumPy arrays +* CuPy sparse matrix +* SciPy sparse matrix + +cuGraph tries to match the return type to the input type. For example, a NetworkX input returns the same data type that NetworkX would have returned. + +## cuDF +The preferred data type is a cuDF object, since it is already on the GPU. For loading data from disk into cuDF, please see the cuDF documentation. + +__Loading data__ + * Graph.from_cudf_adjlist + * Graph.from_cudf_edgelist + + +__Results__
+Results that are not simple types (ints, floats) are typically cuDF DataFrames. + + + +## Pandas +The RAPIDS cuDF library can be thought of as accelerated Pandas. + + +## NetworkX Graph Objects + + +## + + + + + + +
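Whatever the input type, cuGraph by default renumbers vertex IDs when data is added to a graph. It maps arbitrary, possibly non-contiguous IDs to contiguous integers starting from 0, then un-renumbers the results. The plain-Python sketch below illustrates that mapping only. It is not cuGraph's GPU implementation, and `renumber` here is a hypothetical helper. Tuples stand in for multi-column IDs.

```python
def renumber(srcs, dsts):
    """Map arbitrary hashable vertex IDs to contiguous ints starting at 0.

    Illustration only: IDs are numbered in order of first appearance.
    Returns the renumbered sources, destinations, and the reverse mapping.
    """
    id_to_int = {}
    for v in list(srcs) + list(dsts):
        if v not in id_to_int:
            id_to_int[v] = len(id_to_int)
    int_to_id = [None] * len(id_to_int)
    for v, i in id_to_int.items():
        int_to_id[i] = v
    new_srcs = [id_to_int[v] for v in srcs]
    new_dsts = [id_to_int[v] for v in dsts]
    return new_srcs, new_dsts, int_to_id


srcs = ["alice", "bob", "alice"]
dsts = ["bob", "carol", "carol"]
r_srcs, r_dsts, int_to_id = renumber(srcs, dsts)
print(r_srcs, r_dsts)  # [0, 1, 0] [1, 2, 2]

# "Un-renumber" a per-vertex result (here: out-degree) back to the original IDs.
degree = [0] * len(int_to_id)
for s in r_srcs:
    degree[s] += 1
print({int_to_id[i]: d for i, d in enumerate(degree)})  # {'alice': 2, 'bob': 1, 'carol': 0}
```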

+ +--- diff --git a/readme_pages/getting_cugraph.md b/readme_pages/getting_cugraph.md new file mode 100644 index 00000000000..ac81d1fec2c --- /dev/null +++ b/readme_pages/getting_cugraph.md @@ -0,0 +1,64 @@ + +# Getting cuGraph Packages + +There are 4 ways to get cuGraph packages: +1. [Quick start with Docker Repo](#docker) +2. [Conda Installation](#conda) +3. [Pip Installation](#pip) +4. [Build from Source](#SOURCE) + +Or check out the [RAPIDS install selector](https://rapids.ai/start.html) for a pick list of install options. +
+ +## Docker +The RAPIDS Docker containers contain all RAPIDS packages, including all from cuGraph, as well as all required supporting packages. To download a container, please see the [Docker Repository](https://hub.docker.com/r/rapidsai/rapidsai/), choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready-to-run Docker container with example notebooks and data, showcasing how you can utilize all of the RAPIDS libraries: cuDF, cuML, and cuGraph. + +
+ + +## Conda +It is easy to install cuGraph using conda. You can get a minimal conda installation with [Miniconda](https://conda.io/miniconda.html) or get the full installation with [Anaconda](https://www.anaconda.com/download). + +cuGraph Conda packages + * cugraph - this will also install: + * pylibcugraph + * libcugraph + * cugraph_service_client + * cugraph_service_server + * cugraph_dgl + * cugraph_pyg + +Replace the package name in the example below with the one you want to install. + + +Install and update cuGraph using the conda command: + +```bash +# CUDA 11.4 +conda install -c nvidia -c rapidsai -c numba -c conda-forge cugraph cudatoolkit=11.4 + +# CUDA 11.5 +conda install -c nvidia -c rapidsai -c numba -c conda-forge cugraph cudatoolkit=11.5 + +# For CUDA > 11.5, please use the CUDA 11.5 command above +``` + +Note: This conda installation only applies to Linux and Python versions 3.8/3.9. + +
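The version rule in the conda commands above can be captured in a small helper, for example for install scripts. This is a hypothetical convenience sketch, not a RAPIDS tool. `conda_install_command` is an illustrative name, and the helper only encodes the CUDA 11.4 and 11.5 commands listed on this page, mapping newer CUDA 11.x versions to the 11.5 command.

```python
def conda_install_command(cuda_version: str, package: str = "cugraph") -> str:
    """Return the conda install command listed above for a CUDA 11.x version.

    Hypothetical helper: it only encodes the CUDA 11.4 and 11.5 commands from
    this page and maps CUDA 11.x versions above 11.5 to the 11.5 command.
    """
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if major != 11 or minor < 4:
        raise ValueError(f"no conda command listed here for CUDA {cuda_version}")
    toolkit = "11.4" if minor == 4 else "11.5"
    return (
        "conda install -c nvidia -c rapidsai -c numba -c conda-forge "
        f"{package} cudatoolkit={toolkit}"
    )


print(conda_install_command("11.8"))
# conda install -c nvidia -c rapidsai -c numba -c conda-forge cugraph cudatoolkit=11.5
```

Passing a different `package` (for example `cugraph_dgl`) swaps the package name, as the text above suggests.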
+ +## PIP +cuGraph, and all of RAPIDS, is available via pip. + +```bash +pip install cugraph-cu11 --extra-index-url=https://pypi.ngc.nvidia.com +``` + +Pip packages for the other cuGraph packages are being worked on and should be available in early 2023. + +
+ +## SOURCE +cuGraph can be built directly from source. First, make sure you have, or can configure, a supported environment. +Instructions for building from source are on our [source build](./SOURCEBUILD.md) page. \ No newline at end of file diff --git a/readme_pages/gnn_support.md b/readme_pages/gnn_support.md new file mode 100644 index 00000000000..1c52be1c013 --- /dev/null +++ b/readme_pages/gnn_support.md @@ -0,0 +1,33 @@ +

+
+ cuGraph +

+ +

+
+GNN Support +

+ +RAPIDS offers support for GNNs (Graph Neural Networks). Several components of the RAPIDS ecosystem fit into a typical GNN framework, as shown below. +An overview of GNNs and how they are used can be found in this excellent [blog](https://blogs.nvidia.com/blog/2022/10/24/what-are-graph-neural-networks/). + +

+ cuGraph +

+ +
+ +[RAPIDS cuDF](https://docs.rapids.ai/api/cudf/stable/user_guide/10min.html) * +[RAPIDS cuGraph](https://docs.rapids.ai/api/cugraph/stable/basics/cugraph_intro.html) * +[Property Graph](./property_graph.md) * +[NVTabular](https://developer.nvidia.com/nvidia-merlin/nvtabular) * +[NVIDIA Triton](https://developer.nvidia.com/nvidia-triton-inference-server) + +
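Neighborhood sampling is one of the core operations these components accelerate for GNN training. To show what a uniform multi-hop neighborhood sampler does (the same idea as the `[15, 10, 5]` fanouts in the cugraph_dgl example), here is a small CPU-only Python sketch. It is not cuGraph's GPU implementation. `sample_neighborhood` and its arguments are illustrative names.

```python
import random


def sample_neighborhood(adjacency, seeds, fanouts, rng):
    """Uniform multi-hop neighborhood sampling (illustrative, CPU-only).

    adjacency: dict mapping a vertex to its list of neighbors
    seeds:     starting vertices (e.g. a training mini-batch)
    fanouts:   max neighbors sampled per vertex at each hop, e.g. [15, 10, 5]
    Returns one list of sampled (src, dst) edges per hop.
    """
    frontier = list(seeds)
    hops = []
    for fanout in fanouts:
        edges = []
        next_frontier = set()
        for v in frontier:
            neighbors = adjacency.get(v, [])
            # Uniform sampling without replacement, capped at the fanout.
            picked = rng.sample(neighbors, min(fanout, len(neighbors)))
            for u in picked:
                edges.append((v, u))
                next_frontier.add(u)
        hops.append(edges)
        frontier = sorted(next_frontier)
    return hops


adjacency = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2], 4: [0]}
hops = sample_neighborhood(adjacency, seeds=[0], fanouts=[2, 1], rng=random.Random(42))
for i, edges in enumerate(hops):
    print(f"hop {i}: {edges}")
```

Because each hop multiplies the frontier by the fanout, the fanouts bound how much of a large graph is touched per mini-batch. This is why sampling helps with the memory bottlenecks described next.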
+ +RAPIDS GNN components improve other industry GNN-specific projects. Due to the degree distribution of nodes, memory bottlenecks are the main pain point for large-scale graphs. To solve this problem, sampling operations form the backbone of Graph Neural Network (GNN) training. However, current sampling methods provided by other libraries are not optimized enough for the whole process of GNN training. The main limit to performance is moving data between the host and the device. In cuGraph, we provide an end-to-end solution from data loading to training, all on GPUs. + +cuGraph now supports [Deep Graph Library](https://www.dgl.ai/) (DGL) and [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/) (PyG) by allowing conversion between a cuGraph object and a DGL or PyG object. This lets DGL and PyG users access cuGraph's efficient data loader and graph operation implementations (such as uniform sampling) while keeping their models unchanged in DGL or PyG. This yields considerable speedups compared with the original implementations in DGL and PyG. + +[](https://developer.nvidia.com/blog/optimizing-fraud-detection-in-financial-services-with-graph-neural-networks-and-nvidia-gpus/) + diff --git a/readme_pages/libcugraph.md b/readme_pages/libcugraph.md new file mode 100644 index 00000000000..015b0a3af8f --- /dev/null +++ b/readme_pages/libcugraph.md @@ -0,0 +1,6 @@ +

+
+ cuGraph +

+ +[Libcugraph](https://github.com/rapidsai/cugraph/blob/branch-22.12/cpp/docs/DEVELOPER_GUIDE.md) is the RAPIDS C/C++ library for graph processing. The library provides a C++ API for users wanting low-level access. Users familiar with C/C++/CUDA and graph structures can access libcugraph and libcugraph_c for integration outside of Python. diff --git a/readme_pages/news.md b/readme_pages/news.md new file mode 100644 index 00000000000..f9ebe36b9dd --- /dev/null +++ b/readme_pages/news.md @@ -0,0 +1,13 @@ +# cuGraph News + +### Scaling to 1 Trillion Edges +At GTC Spring '22 we presented results of running cuGraph on the [Selene](https://top500.org/system/179842/) supercomputer using 2,048 GPUs and processing a graph with `1.1 Trillion edges`. The synthetic data was created with the RMAT generator found in cuGraph. + +
 
cuGraph Scaling
+
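The scaling run above used synthetic data from cuGraph's RMAT generator. R-MAT builds skewed, power-law-like graphs by recursively choosing one quadrant of the adjacency matrix for each bit of the vertex IDs. The pure-Python sketch below shows the idea on a CPU. It is not cuGraph's generator, and `rmat_edges` is a hypothetical name. The default probabilities 0.57/0.19/0.19 are the commonly used Graph500 values, not necessarily cuGraph's defaults.

```python
import random


def rmat_edges(scale, num_edges, a=0.57, b=0.19, c=0.19, seed=0):
    """Generate num_edges (src, dst) pairs on 2**scale vertices with R-MAT.

    At each of `scale` levels the adjacency matrix is split into four
    quadrants chosen with probabilities a, b, c and d = 1 - a - b - c;
    each choice fixes one bit of the source and destination IDs.
    """
    if not 0 <= a + b + c <= 1:
        raise ValueError("a + b + c must be in [0, 1]")
    rng = random.Random(seed)
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        for level in range(scale):
            r = rng.random()
            bit = 1 << (scale - 1 - level)  # most significant bit first
            if r < a:
                pass                        # top-left: neither bit set
            elif r < a + b:
                dst |= bit                  # top-right: destination bit set
            elif r < a + b + c:
                src |= bit                  # bottom-left: source bit set
            else:
                src |= bit                  # bottom-right: both bits set
                dst |= bit
        edges.append((src, dst))
    return edges


edges = rmat_edges(scale=4, num_edges=8, seed=1)
print(edges)
```

With `a` much larger than `d`, low-numbered vertices receive far more edges than the rest. This skew is what makes R-MAT graphs a demanding benchmark for scaling studies.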

+ +### cuGraph Software Stack +cuGraph has a new multi-layer software stack that allows users and system integrators to access cuGraph at different layers. + +
 
cuGraph Software Stack
+

\ No newline at end of file diff --git a/readme_pages/performance/performance.md b/readme_pages/performance/performance.md new file mode 100644 index 00000000000..5700c59bba3 --- /dev/null +++ b/readme_pages/performance/performance.md @@ -0,0 +1,7 @@ + + + + +We are working on a new nightly benchmarking system that will produce performance numbers. +This is a splash page for where the performance numbers will be posted in early 2023. \ No newline at end of file diff --git a/readme_pages/property_graph.md b/readme_pages/property_graph.md new file mode 100644 index 00000000000..19d6e23f718 --- /dev/null +++ b/readme_pages/property_graph.md @@ -0,0 +1,54 @@ +

+
+ cuGraph +

+

+
+Property Graph +

+ +Part of [RAPIDS](https://rapids.ai) cuGraph, Property Graph allows all the great benefits of cuGraph to be applied to property-rich datasets stored in a graph structure. A Property Graph is really a data model rather than a type of graph. Within the cuGraph ecosystem, a Property Graph is a meta-graph that can encapsulate and instantiate all the other graph types. That view stems from property graphs originally being created for database systems. Conceptually, a Property Graph can be viewed as a property-rich structure that can be projected onto any graph type. Dataversity has a good definition of [Property Graph](https://www.dataversity.net/what-is-a-property-graph) that draws on a collection of resources. + +Property Graph enables: + +* Multiple edge and node types, as seen in the Property Graph API +* Subgraph extraction based on properties and/or edge and node types, as seen below. +* Storage of properties either within the graph structure on the GPU or using GNN-centric storage extensions on host storage. +* Adding properties, nodes, and edges to the property graph to store derived data like analytic results. +* Client access managed by a remote server allowing shared access and remote operations using [CuGraph Service](./cugraph_service.md). + +This is an example of using the cuGraph Property Graph in a two-stage analysis. + +```python +import cudf +import cugraph +from cugraph.experimental import PropertyGraph + +# Import a built-in dataset +from cugraph.experimental.datasets import karate + +# Create a graph using the imported Dataset object +graph = cugraph.Graph(directed=False) +G = karate.get_graph(create_using=graph, fetch=True) + +# Read the edgelist into a DataFrame and load it into the PropertyGraph as edge data. +df = G.edgelist.edgelist_df +pG = PropertyGraph() +pG.add_edge_data(df, vertex_col_names=("src", "dst")) + +# Run Louvain to get the partition number for each vertex.
+# Set the resolution to identify two primary partitions. +(partition_info, _) = cugraph.louvain(pG.extract_subgraph(create_using=graph), resolution=0.6) + +# Add the partition numbers back to the Property Graph as vertex properties +pG.add_vertex_data(partition_info, vertex_col_name="vertex") + +# Use the partition properties to extract a Graph for each partition. +G0 = pG.extract_subgraph(selection=pG.select_vertices("partition == 0")) +G1 = pG.extract_subgraph(selection=pG.select_vertices("partition == 1")) +# Run pagerank on each graph, print results. +pageranks0 = cugraph.pagerank(G0) +pageranks1 = cugraph.pagerank(G1) +print(pageranks0.sort_values(by="pagerank", ascending=False).head(3)) +print(pageranks1.sort_values(by="pagerank", ascending=False).head(3)) +``` diff --git a/readme_pages/pylibcugraph.md b/readme_pages/pylibcugraph.md new file mode 100644 index 00000000000..3bb552141e9 --- /dev/null +++ b/readme_pages/pylibcugraph.md @@ -0,0 +1,25 @@ +

+
+ cuGraph +

+

+
+CuGraph pylibcugraph +

+ +Part of [RAPIDS](https://rapids.ai) cuGraph, pylibcugraph is a wrapper around the cuGraph C API. It is aimed more at integrators than at algorithm writers or end users such as data scientists. Most of the cuGraph Python API uses pylibcugraph to run algorithms efficiently, relying on Cython to remove much of the overhead of a Python-centric implementation. Pylibcugraph is intended for applications that require tighter integration with cuGraph at the Python layer with fewer dependencies. + +Here is an example of calling the Louvain algorithm using pylibcugraph directly. + +```python +import pylibcugraph, cupy, numpy +# A 3-vertex triangle (0->1, 1->2, 2->0) with unit weights +srcs = cupy.asarray([0, 1, 2], dtype=numpy.int32) +dsts = cupy.asarray([1, 2, 0], dtype=numpy.int32) +weights = cupy.asarray([1.0, 1.0, 1.0], dtype=numpy.float32) +resource_handle = pylibcugraph.ResourceHandle() +graph_props = pylibcugraph.GraphProperties(is_symmetric=True, is_multigraph=False) +G = pylibcugraph.SGGraph( + resource_handle, graph_props, srcs, dsts, weights, + store_transposed=True, renumber=False, do_expensive_check=False) +# Positional arguments after G: max_level=100, resolution=1.0, do_expensive_check=False +(vertices, clusters, modularity) = pylibcugraph.louvain(resource_handle, G, 100, 1., False) +```
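`pylibcugraph.louvain` returns a modularity score alongside the cluster assignments. The plain-Python sketch below computes Newman modularity for an undirected edge list, to show what that number measures. It is an illustrative re-derivation, not pylibcugraph's code, and `modularity` is a hypothetical helper. It assumes each undirected edge is listed once and a resolution of 1. We have not verified that cuGraph normalizes its reported value the same way.

```python
from collections import defaultdict


def modularity(edges, clusters, weights=None):
    """Newman modularity of a partition of an undirected graph.

    edges:    list of (u, v) pairs, each undirected edge listed once
    clusters: dict mapping vertex -> cluster id
    weights:  optional list of edge weights (default 1.0 each)
    Q = sum over clusters c of [L_c / m - (D_c / (2m))**2], where m is the
    total edge weight, L_c the weight inside c and D_c the total degree of c.
    """
    if weights is None:
        weights = [1.0] * len(edges)
    m = sum(weights)
    internal = defaultdict(float)
    degree = defaultdict(float)
    for (u, v), w in zip(edges, weights):
        degree[clusters[u]] += w
        degree[clusters[v]] += w
        if clusters[u] == clusters[v]:
            internal[clusters[u]] += w
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, clusters), 4))  # 0.3571
```

For the single triangle in the pylibcugraph example, putting all three vertices in one cluster gives a modularity of exactly 0. Splitting the two-triangle graph along its bridge scores 5/14. Higher values indicate denser-than-expected connections inside clusters.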