s1ck/ldbc-flink-import

ldbc-flink-import

Loads the output of the LDBC-SNB Data Generator into Apache Flink DataSets for further processing. The LDBC data generator produces directed, labeled graphs that mimic the characteristics of real-world graphs. A detailed description of the schema produced by the data generator, as well as the format of its output files, can be found in the latest version of the official LDBC-SNB specification document.

LDBC schema: https://raw.githubusercontent.com/ldbc/ldbc_snb_docs/master/figures/schema.pdf

The tool reads the LDBC output files from a given directory (either local or HDFS) and creates two datasets: one containing all vertices and one containing all edges. Vertices and edges are represented by tuples. Each vertex stores an id that is unique among all vertices, a vertex label, and key-value properties held in a HashMap. Each edge stores an id that is unique among all edges, an edge label, the source and target vertex identifiers, and key-value properties.
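To make the vertex and edge layout described above concrete, here is a minimal sketch in plain Java. It is not the library's code: `LdbcModelSketch`, `Vertex`, and `Edge` are hypothetical stand-ins for the tuple types, the `long` id type is an assumption, and the labels and property names (`person`, `knows`, `firstName`, `creationDate`) are illustrative values only.

```java
import java.util.HashMap;
import java.util.Map;

public class LdbcModelSketch {

    /** Hypothetical stand-in for a vertex tuple: id, label, properties. */
    static final class Vertex {
        final long id;                        // unique among all vertices (type assumed)
        final String label;                   // vertex label
        final Map<String, Object> properties; // key-value properties

        Vertex(long id, String label, Map<String, Object> properties) {
            this.id = id;
            this.label = label;
            this.properties = properties;
        }
    }

    /** Hypothetical stand-in for an edge tuple: id, label, endpoints, properties. */
    static final class Edge {
        final long id;                        // unique among all edges (type assumed)
        final String label;                   // edge label
        final long sourceVertexId;            // id of the source vertex
        final long targetVertexId;            // id of the target vertex
        final Map<String, Object> properties; // key-value properties

        Edge(long id, String label, long sourceVertexId, long targetVertexId,
             Map<String, Object> properties) {
            this.id = id;
            this.label = label;
            this.sourceVertexId = sourceVertexId;
            this.targetVertexId = targetVertexId;
            this.properties = properties;
        }
    }

    public static void main(String[] args) {
        // Two vertices with illustrative properties.
        Map<String, Object> aliceProps = new HashMap<String, Object>();
        aliceProps.put("firstName", "Alice");
        Vertex alice = new Vertex(0L, "person", aliceProps);

        Map<String, Object> bobProps = new HashMap<String, Object>();
        bobProps.put("firstName", "Bob");
        Vertex bob = new Vertex(1L, "person", bobProps);

        // A directed edge from alice to bob. Edge ids are independent of vertex ids,
        // so an edge and a vertex may share the same id value.
        Map<String, Object> knowsProps = new HashMap<String, Object>();
        knowsProps.put("creationDate", "2012-01-01");
        Edge knows = new Edge(0L, "knows", alice.id, bob.id, knowsProps);

        System.out.println(knows.label + ": "
            + knows.sourceVertexId + " -> " + knows.targetVertexId);
        // prints "knows: 0 -> 1"
    }
}
```

Ids are unique only within their own collection, so a vertex and an edge can both carry id 0, as in the example. Code that joins the two datasets should therefore match edges to vertices through the source and target identifiers rather than through the edge id.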

Usage

Add the repository and dependency to your Maven project:

<repositories>
  <repository>
    <id>dbleipzig</id>
    <name>Database Group Leipzig University</name>
    <url>https://wdiserv1.informatik.uni-leipzig.de:443/archiva/repository/dbleipzig/</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<dependency>
  <groupId>org.s1ck</groupId>
  <artifactId>ldbc-flink-import</artifactId>
  <version>0.1</version>
</dependency>

Use in your project

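// Flink DataSet API imports; the LDBC* classes come from ldbc-flink-import.
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

LDBCToFlink ldbcToFlink = new LDBCToFlink(
    "/path/to/ldbc/output", // or "hdfs://..."
    ExecutionEnvironment.getExecutionEnvironment());

// One dataset with all vertices, one with all edges
DataSet<LDBCVertex> vertices = ldbcToFlink.getVertices();
DataSet<LDBCEdge> edges = ldbcToFlink.getEdges();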

License

Licensed under the GNU General Public License, v3: http://www.gnu.org/licenses/gpl-3.0.html
