This is external graph system designed for Solid-State Disks (SSDs). That is, MBFGraph can use very small amount of memory to process very large graphs (e.g., processing 475GB graph with only 4GB memory). Specifically, MBFGraph adopts Millions of Bloom Filters simultaneously as approximate indice to efficiently filter out unnecessary read data and significantly reduce graph analysis time.
C++ boost library
libaio
Ubuntu 20.04
Dependency packages: libboost-all-dev libaio-dev
x86 CPU with AVX2 instruction support
You can check /proc/cpuinfo to see whether you CPU support such instructions.
Note that such instruction extension should already be widely supported.
SSDs with good random performance
Most SSDs should be able to provide decent random performance. However, some QLC-based ultra-high-capacity SSDs may provide very bad random performnace.
CPU: i7-4790
Storage: Samsung EVO 870 2TB
make -j
Troubleshooting: If you encounter any makefile error, please remove anything under object_files/.
Note that all different graph analyses, by default, are only for BFS. To enable PageRank and CC, please uncomment some lines in the run_###.sh.
Also note that, to reduce repeatly bloom filter (BF) files generation time, the script may generate BF file via degree_cnt analysis. As a result, BF files can be reused across different graph analyses.
Configuration:
Number of CPUs: Every run_##.sh provides an environment variable, PROCE, to adjust the number of threads executing during graph analyses.
Threshold on whether performing MBF query: Every run_##.sh provides an environment variable, MAX, to decide whether MBF query has to be performed. If MAX is 1, it indicates that MBF query is not executed, so MBFGraph is actually the baseline x-stream.
./download_twitter.sh
./run_twitter.sh # This is MBFGraph-optimized version of BFS on Twitter graph.
./run_twitter_baseline.sh # This is baseline (X-stream) version of BFS on Twitter graph.
./download_random.sh
./run_random.sh # This is MBFGraph-optimized version of BFS on Random (hello) graph.
./run_random_baseline.sh # This is baseline (X-stream) version of BFS on Random (hello) graph.
./download_web.sh
./run_web.sh # This is MBFGraph-optimized version of BFS on Web graph. To test the baseline version, please change MAX environment from 270000 to 1.
./download_web_large.sh # We're still working on uploading entire (475GB) graph to Zenodo.
./run_web.sh # This is MBFGraph-optimized version of BFS on Web_large graph. To test the baseline version, please change MAX environment from 270000 to 1.
./download_random.sh
./download_add.sh # three different datasets (1, 1M, 10M) are downloaded.
./run_random_update.sh # This is MBFGraph-optimized version of BFS on evolving Random graph. To test the baseline version, please change MAX environment from 270000 to 1.
Note that the edge-adding dataset can be changed to 1 or 1M edges. (default: 10M per round)
./download_twitter.sh
./run_delete_twitter.sh # This is MBFGraph-optimized version of BFS on evolving Twitter graph without purging. To test the baseline version, please change MAX environment from 270000 to 1.
./run_purge_delete_twitter.sh # This is MBFGraph-optimized version of BFS on evolving Twitter graph with purging. To test the baseline version, please change MAX environment from 270000 to 1.
Note that delete_edge and purge_delete utilities are provided under update_utility/.
This utility is provided under update_utility.
Usage: ./MBFSort <input edge (graph) file> <bloom filter bits> <output edge (graph) file>
Example: ./MBFSort input_twitter 8 output_twitter # 8 means 256 (2^8) bits.
Dataset: https://doi.org/10.5281/zenodo.8076831
Larger Dataset: https://doi.org/10.5281/zenodo.8088562
Docker image: https://doi.org/10.5281/zenodo.8080103