FlashX is a collection of big data analytics tools that perform data analytics in the form of graphs and matrices.

This repo contains the core of the FlashX project, which provides big data analytics tools that operate on data in the form of graphs and matrices and so covers a wide range of data analysis tasks. All tools in FlashX use solid-state drives (SSDs) to scale data analysis to large datasets on a single machine, while running almost as fast as in-memory solutions. The main components of FlashX are FlashGraph and FlashMatrix.

FlashGraph

FlashGraph is a general-purpose graph analysis framework that exposes a vertex-centric programming interface in which users can express a variety of graph algorithms. FlashGraph scales graph computation to large graphs by keeping the edges of a graph on SSDs and the computation state in memory. With smart I/O scheduling, FlashGraph achieves performance comparable to state-of-the-art in-memory graph analysis frameworks and significantly outperforms state-of-the-art distributed graph analysis frameworks, while scaling to graphs with billions of vertices and hundreds of billions of edges. Please see the performance results.
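To make the vertex-centric model concrete, here is a minimal sketch in plain Python. It is not FlashGraph's API: `run_vertex_centric` and `bfs_compute` are hypothetical names invented for illustration, and the in-memory `adjacency` dictionary only stands in for edge lists that FlashGraph would read from SSDs. The point is the shape of the model: per-vertex state stays in memory, and only active vertices ask for their edges.

```python
def run_vertex_centric(adjacency, init_state, compute, start):
    """Run a vertex program until no vertex is active (hypothetical driver).

    adjacency:  dict vertex -> list of neighbours (stand-in for edge lists on SSD)
    init_state: dict vertex -> per-vertex state (the part kept in memory)
    compute:    callback(vertex, state, neighbours, level) that returns the
                vertices to activate in the next iteration
    """
    state = dict(init_state)
    active = {start}
    level = 0
    while active:
        next_active = set()
        for v in sorted(active):
            # Only active vertices fetch their edge lists.
            neighbours = adjacency[v]
            next_active.update(compute(v, state, neighbours, level))
        active = next_active
        level += 1
    return state


def bfs_compute(v, state, neighbours, level):
    """BFS as a vertex program: record the distance, activate unvisited neighbours."""
    if state[v] is not None:
        return []  # already visited in an earlier iteration
    state[v] = level
    return [u for u in neighbours if state[u] is None]


graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
distances = run_vertex_centric(graph, {v: None for v in graph}, bfs_compute, start=0)
print(distances)  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

Other algorithms fit the same driver by swapping the `compute` callback, which is the appeal of a vertex-centric interface.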

FlashMatrix

FlashMatrix is a matrix computation engine that provides a small set of generalized matrix operations on sparse and dense matrices, in which users can express a variety of data mining and machine learning algorithms. For certain graph algorithms such as PageRank, which can be formulated as sparse matrix multiplication, FlashMatrix is able to significantly outperform FlashGraph.
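The following sketch illustrates how PageRank reduces to repeated sparse matrix-vector multiplication. It is plain Python and does not use FlashMatrix; the function name `pagerank`, the adjacency-dictionary representation, the damping factor of 0.85, and the uniform handling of vertices without outgoing edges are all assumptions made for the example.

```python
def pagerank(out_edges, damping=0.85, iterations=50):
    """PageRank by power iteration over a sparse transition matrix.

    out_edges: dict vertex -> list of successor vertices. Each edge v -> u is a
    nonzero entry 1 / outdegree(v) in row u, column v of the transition matrix.
    """
    vertices = sorted(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Sparse matrix-vector product: visit only the nonzero entries.
        product = {v: 0.0 for v in vertices}
        dangling = 0.0
        for v in vertices:
            targets = out_edges[v]
            if targets:
                share = rank[v] / len(targets)
                for u in targets:
                    product[u] += share
            else:
                # Assumption: rank held by vertices without out-edges is spread uniformly.
                dangling += rank[v]
        rank = {
            v: (1.0 - damping) / n + damping * (product[v] + dangling / n)
            for v in vertices
        }
    return rank


cycle_ranks = pagerank({0: [1], 1: [2], 2: [0]})
hub_ranks = pagerank({0: [2], 1: [2], 2: [0]})
print(cycle_ranks)
print(hub_ranks)
```

Each iteration touches every edge once, so the cost is dominated by one pass over the sparse matrix. That access pattern is what lets a matrix engine stream the matrix from storage instead of issuing per-vertex requests.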

Programming interface

FlashX exposes C++, R and Python programming interfaces. The R and Python interfaces are highly compatible with the R base package and NumPy, respectively. As such, users can execute R and Python machine learning code on FlashX with little or no modification. Our goal is to eventually make the R and Python interfaces fully compatible with those of native R and NumPy.

  • FlashR provides many matrix operations in the R base package.
  • FlashGraphR exposes many graph algorithms in FlashGraph to R.
  • FlashR-learn is a machine learning library implemented completely with FlashR.
  • FlashPy provides many array operations in NumPy.

Documentation

FlashX Quick start guide

FlashGraph programming tutorial

FlashR programming tutorial

FlashX performance and scalability

Publications

Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey E. Priebe, and Randal Burns, “FlashMatrix: Parallel, scalable data analysis with generalized matrix operations using commodity SSDs,” arXiv preprint arXiv:1604.06414, 2016. [pdf]

Da Zheng, Disa Mhembere, Vince Lyzinski, Joshua Vogelstein, Carey E. Priebe, and Randal Burns, “Semi-external memory sparse matrix multiplication on billion-node graphs”, Transactions on Parallel and Distributed Systems, 2016. [pdf]

Heng Wang, Da Zheng, Randal Burns, Carey Priebe, Active Community Detection in Massive Graphs, SDM-Networks 2015 [pdf]

Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E. Priebe, Alexander S. Szalay, FlashGraph: Processing Billion-Node Graphs on an Array of Commodity SSDs, FAST'15, [pdf][bib]

Da Zheng, Randal Burns, Alexander S. Szalay, Toward Millions of File System IOPS on Low-Cost, Commodity Hardware, in SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, [pdf][bib]

Contact

Mailing list: flashgraph-user@googlegroups.com

Join the chat at https://gitter.im/icoming/FlashGraph