Gremlin the Grouch 0.9 release commit
okram committed Apr 4, 2011
1 parent 4630317 commit 29b9dd8
Showing 37 changed files with 2,392 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.textile
@@ -30,6 +30,7 @@ h3. Version 0.9 (Gremlin the Grouch - April 4, 2011)
* @emit@ step renamed to @transform@
* @foreach@ step renamed to @sideeffect@
* Removed @futuref@ step as @back@ supplies necessary computation
* The @it@ of the @gather@ step is now the gathered @List@

==<hr/>==

13 changes: 13 additions & 0 deletions doc/wiki/Acknowledgments.textile
@@ -0,0 +1,13 @@
[[https://github.com/tinkerpop/gremlin/raw/master/doc/images/gremlin-standing-small.png]]

This section provides a list of the people that have contributed in some way to the creation of Gremlin.

# "Marko A. Rodriguez":http://markorodriguez.com -- designed, developed, tested, and documented Gremlin.
# "Pavel Yaskevich":http://github.com/xedin -- designed and developed Gremlin 0.5 compiler and virtual machine.
# "Darrick Wiebe":http://ofallpossibleworlds.wordpress.com/ -- inspired many of Gremlin 0.7+ developments.
# "Peter Neubauer":http://www.linkedin.com/in/neubauer -- aided in the design and the evangelizing of Gremlin.
# "Ketrina Yim":http://csillustrated.berkeley.edu -- designed the Gremlin logo.

Please review Gremlin's "pom.xml":http://github.com/tinkerpop/gremlin/blob/master/pom.xml. Gremlin would not be possible without the work done by others to create these useful packages.

Join the Gremlin users group at "http://groups.google.com/group/gremlin-users":http://groups.google.com/group/gremlin-users.
68 changes: 68 additions & 0 deletions doc/wiki/Backtrack-Pattern.textile
@@ -0,0 +1,68 @@
[[https://github.com/tinkerpop/gremlin/raw/master/doc/images/gremlin-kilt.png]]

Many times it's desirable to traverse a particular path and, if some criterion is met along that path, go back to the element from n steps ago. Examples of such use cases include:

* "What is the age of my friends who have friends who are older than 30 years old?"
* "What other products have my friends purchased who have also purchased a product of type X?"

```text
g = TinkerGraphFactory.createTinkerGraph()
```

The query below says, in plain English: "What are the ages of the people that know people who are older than 30?" The call to @back(3)@ refers to the elements from 3 steps ago (i.e. those emitted by the @V@ step) that have at least one path reaching the @back(3)@ step. In the example below, @back(3)@ "wraps" @outE('knows').inV{it.age > 30}@.

```text
gremlin> g.V.outE('knows').inV{it.age > 30}.back(3).age
==>29
```
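
The backtracking idea can be sketched outside of Gremlin. The block below is a plain-Python model, not the Gremlin API: the names @AGE@, @EDGES@, and @back_to_start@ are hypothetical, and the ages of vertices 2 and 6 are assumed from the standard TinkerGraph toy data rather than taken from this page. It keeps a start vertex only if some @knows@ path from it survives the age filter.

```python
# Plain-Python model of the toy graph; this is not the Gremlin API.
# Ages 27 (vertex 2) and 35 (vertex 6) are assumptions; 29 and 32 appear in the text.
AGE = {1: 29, 2: 27, 4: 32, 6: 35}  # vertices 3 and 5 carry no age property
EDGES = [(1, "knows", 2), (1, "created", 3), (1, "knows", 4),
         (4, "created", 5), (4, "created", 3), (6, "created", 3)]


def back_to_start(vertices):
    """Emit each start vertex that knows at least one vertex older than 30."""
    for v in vertices:
        survivors = [head for tail, label, head in EDGES
                     if tail == v and label == "knows" and AGE.get(head, 0) > 30]
        if survivors:  # some path passed the filter, so emit the start vertex
            yield v


ages = [AGE[v] for v in back_to_start([1, 2, 3, 4, 5, 6])]
print(ages)  # [29]
```

Only vertex 1 has outgoing @knows@ edges, and one of them leads to a vertex aged 32, so vertex 1 is the only element emitted.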

A more complicated example is provided over the Grateful Dead graph diagrammed in [[Defining a More Complex Property Graph]].

```text
g = new TinkerGraph()
GraphMLReader.inputGraph(g, new FileInputStream('data/graph-example-2.xml'))
```

The example query below states the following:

* get the song with id @89@ (Dark Star).
* get all the songs that follow Dark Star in concert.
* get the singers of those songs.
* filter to only those singers named Garcia (i.e. Jerry Garcia).
* go back 3 steps to yield those songs that follow Dark Star and are sung by Jerry Garcia.
* get the names of those songs that follow Dark Star and are sung by Jerry Garcia.

```text
gremlin> g.v(89).outE('followed_by').inV.outE('sung_by').inV[[name:'Garcia']].back(3).name
==>EYES OF THE WORLD
==>SING ME BACK HOME
==>MORNING DEW
==>HES GONE
==>CHINA DOLL
==>WHARF RAT
==>BROKEDOWN PALACE
==>TERRAPIN STATION
==>DEAL
==>ATTICS OF MY LIFE
==>COMES A TIME
==>STELLA BLUE
==>BERTHA
```

[[https://github.com/tinkerpop/gremlin/raw/master/doc/images/jerry-followed_by-example.jpg]]

In order to determine how many steps to go back, @GremlinPipeline.toString()@ can be handy for displaying all the steps in an expression.

```text
gremlin> println g.v(89).outE('followed_by').inV.outE('sung_by').inV[[name:'Garcia']]
[OutEdgesPipe<followed_by>, InVertexPipe, OutEdgesPipe<sung_by>, InVertexPipe, PropertyFilterPipe<name,NOT_EQUAL,Garcia>]
==>null
```

Now, using the @back@ step, notice how @back(3)@ wraps the 3 pipes prior to it. The name of the wrapping pipe in "Pipes":http://pipes.tinkerpop.com is @FutureFilterPipe@.

```text
gremlin> println g.v(89).outE('followed_by').inV.outE('sung_by').inV[[name:'Garcia']].back(3).name
[OutEdgesPipe<followed_by>, InVertexPipe, FutureFilterPipe<[OutEdgesPipe<sung_by>, InVertexPipe, PropertyFilterPipe<name,NOT_EQUAL,Garcia>]>, PropertyPipe<name>]
==>null
```
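
As a rough illustration of what a wrapping filter of this kind does, the sketch below emits an element only when an inner traversal started from it yields at least one result. It is a plain-Python model and not the Pipes implementation; @future_filter@, @SUNG_BY@, and @garcia_singers@ are hypothetical names, and the second song and its singer are placeholders rather than data from the Grateful Dead graph.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def future_filter(source: Iterable[T],
                  inner: Callable[[T], Iterable[object]]) -> Iterator[T]:
    """Emit each item of `source` for which `inner(item)` yields at least one result."""
    for item in source:
        # any() stops at the first result, so the inner traversal is only probed.
        if any(True for _ in inner(item)):
            yield item


# Hypothetical singer data: "DEAL" appears in the output above;
# the second song and its singer are placeholders.
SUNG_BY = {"DEAL": ["Garcia"], "placeholder-song": ["placeholder-singer"]}


def garcia_singers(song: str) -> Iterator[str]:
    """Stand-in for the wrapped pipes: follow sung_by, keep singers named Garcia."""
    return (singer for singer in SUNG_BY.get(song, []) if singer == "Garcia")


kept = list(future_filter(["DEAL", "placeholder-song"], garcia_singers))
print(kept)  # ['DEAL']
```

The filter emits the original song, not the singer found by the inner traversal, which is why a subsequent @name@ step returns song names.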
93 changes: 93 additions & 0 deletions doc/wiki/Basic-Graph-Traversals.textile
@@ -0,0 +1,93 @@
This section will present basic graph traversals by way of examples on the simple property graph diagrammed below.

!https://github.com/tinkerpop/gremlin/raw/master/doc/images/graph-example-1.jpg!

```text
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
gremlin> v = g.v(1)
==>v[1]
```

The symbol @v@ denotes that the element is a vertex and @1@ denotes the element's unique identifier. To determine all of the outgoing edges from the vertex, the following statement suffices.

```text
gremlin> v.outE
==>e[7][1-knows->2]
==>e[9][1-created->3]
==>e[8][1-knows->4]
```

As a convenience, Gremlin prints the outgoing and incoming vertex identifiers along with the edge label. To acquire the vertices at the head of these edges (known as the incoming vertices), apply another step in the path.

```text
gremlin> v.outE.inV
==>v[2]
==>v[3]
==>v[4]
```

It is important to note that in Gremlin, vertices are adjacent to edges and edges are adjacent to vertices. The reason for this will become apparent later when making use of element properties in path expressions. The reserved terms for denoting adjacency selection are the steps @outE@, @inE@, @bothE@, @outV@, @inV@, and @bothV@ (see [[Gremlin Steps]]). The components of a property graph are diagrammed in the example sub-graph below.

!https://github.com/tinkerpop/gremlin/raw/master/doc/images/graph-model.jpg!

The process of traversing a graph in this manner can continue indefinitely (provided there are loops in the graph).

```text
gremlin> v.outE.inV.outE.inV
==>v[5]
==>v[3]
```
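
The alternation between vertices and edges can be modelled with two small generator functions. This is a plain-Python sketch and not the Gremlin API; @EDGES@, @out_e@, and @in_v@ are hypothetical names, and the edge triples are transcribed by hand from the toy graph.

```python
# Edges as (tail, label, head) triples, transcribed from the toy graph.
EDGES = [(1, "knows", 2), (1, "created", 3), (1, "knows", 4),
         (4, "created", 5), (4, "created", 3), (6, "created", 3)]


def out_e(vertices):
    """Vertex -> each of its outgoing edges."""
    for v in vertices:
        for edge in EDGES:
            if edge[0] == v:
                yield edge


def in_v(edges):
    """Edge -> the vertex at its head."""
    for _tail, _label, head in edges:
        yield head


one_hop = list(in_v(out_e([1])))
two_hops = list(in_v(out_e(in_v(out_e([1])))))
print(one_hop)   # [2, 3, 4]
print(two_hops)  # [5, 3]
```

Because each function consumes and produces a stream, steps compose by nesting, much as Gremlin steps compose by chaining.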

Moreover, it is possible to make use of Groovy's language constructs to repeat patterns. For example, the previous traversal can be expressed as follows.

```text
gremlin> list = [v]
gremlin> for(i in 1..2)
list = list._().outE.inV.collect{it}
gremlin> list
==>v[5]
==>v[3]
```

This can also be done using the @loop@ step.

```text
gremlin> v.outE.inV.loop(2){it.loops < 3}
==>v[5]
==>v[3]
```
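
Both forms of repetition amount to applying the same hop a fixed number of times. The sketch below is a plain-Python model under that reading, not the semantics of the @loop@ step itself; @hop@ and @repeat_hop@ are hypothetical names.

```python
# Edges as (tail, label, head) triples, transcribed from the toy graph.
EDGES = [(1, "knows", 2), (1, "created", 3), (1, "knows", 4),
         (4, "created", 5), (4, "created", 3), (6, "created", 3)]


def hop(vertices):
    """One outE.inV hop: every vertex reachable over a single outgoing edge."""
    return [head for v in vertices for tail, _label, head in EDGES if tail == v]


def repeat_hop(start, times):
    """Apply hop() `times` times, feeding each result into the next round."""
    frontier = list(start)
    for _ in range(times):
        frontier = hop(frontier)
    return frontier


print(repeat_hop([1], 2))  # [5, 3]
```

A third hop returns an empty list here, because vertices 5 and 3 have no outgoing edges in the toy graph.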

If the Gremlin graph data structure were only a directed graph, then outgoing/incoming edges and outgoing/incoming vertices would be the limits of what could be expressed. However, given that vertices and edges can have properties, it is possible to use these properties within a path expression. For example, suppose you want to know the name of vertex 1.

```text
gremlin> v = g.v(1)
==>v[1]
gremlin> v.name
==>marko
```

The @name@ construct denotes the property key @name@ and returns the value of that key. The first component of the path is vertex 1. Thus, the @name@ of vertex 1 is "marko." A more complex example that uses both vertex and edge properties is to determine the @name@ of the vertices that vertex 1 @knows@ and that are older than 30 years of age. It is expressed as follows.

```text
gremlin> v.outE{it.label=='knows'}.inV{it.age > 30}.name
==>josh
```

In this expression, the @{ }@ closure notation serves to filter the results of the previous step in the path. Thus, @v.outE@ is filtered to only those edges that have a @label@ of "knows." With respect to the diagrammed graph, this leaves only two edges. Next, the incoming vertices at the head of these two edges are determined and then filtered to only those whose @age@ property is greater than 30. Given the diagram, this leaves only vertex 4. In the final segment of the path expression, the @name@ of vertex 4 is selected and what is returned is "josh."
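
The stages of this filter chain can be traced one at a time. The block below is a plain-Python sketch, not the Gremlin API; @NAME@, @AGE@, and @EDGES@ are hypothetical names, and the name and age of vertex 2 ("vadas", 27) are assumed from the standard TinkerGraph toy data rather than taken from this page.

```python
# Only the edges leaving vertex 1 are needed for this trace.
EDGES = [(1, "knows", 2), (1, "created", 3), (1, "knows", 4)]
NAME = {1: "marko", 2: "vadas", 4: "josh"}  # "vadas" is an assumption
AGE = {1: 29, 2: 27, 4: 32}                 # 27 is an assumption

# Stage 1: outgoing edges of vertex 1, kept only if labelled "knows".
knows_edges = [e for e in EDGES if e[0] == 1 and e[1] == "knows"]

# Stage 2: head vertices of those edges, kept only if older than 30.
older_than_30 = [head for _tail, _label, head in knows_edges
                 if AGE.get(head, 0) > 30]

# Stage 3: project the name property of what remains.
names = [NAME[v] for v in older_than_30]

print(len(knows_edges), older_than_30, names)  # 2 [4] ['josh']
```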

To conclude, let's do a more complicated graph traversal that uses backtracking and an in-line regular expression.

```text
gremlin> v.outE{it.label=='knows'}.inV{it.age > 21}.name._{it.matches('jo.{2}|JO.{2}')}.back(3).age
==>32
```

With the root vertex being vertex 1, this path expression returns the age of those vertices that vertex 1 knows, that are older than 21, and whose names are 4 characters long and start with 'jo' or 'JO'. While contrived, it demonstrates using closures to call functions on properties as well as backtracking to a previously visited vertex.

This expression does the same thing without backtracking. Both are provided in order to demonstrate the many ways in which to express the same thing.

```text
gremlin> v.outE{it.label=='knows'}.inV{it.age > 21 && it.name.matches('jo.{2}|JO.{2}')}.age
==>32
```
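
Groovy strings are Java strings, and Java's @String.matches@ succeeds only when the entire string matches the pattern. The closest Python analogue is @re.fullmatch@; the sketch below uses it to show which names the pattern accepts. The helper name @matches@ and the test strings other than "josh" are illustrative.

```python
import re

PATTERN = re.compile(r"jo.{2}|JO.{2}")


def matches(name: str) -> bool:
    """Whole-string match, analogous to Java's String.matches."""
    return PATTERN.fullmatch(name) is not None


print(matches("josh"))    # True
print(matches("JOSH"))    # True
print(matches("joshua"))  # False: six characters, so the whole string does not match
print(matches("Josh"))    # False: mixed case matches neither alternative
```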
94 changes: 94 additions & 0 deletions doc/wiki/Building-Gremlin-from-Source.textile
@@ -0,0 +1,94 @@
Gremlin releases come in steps (see [[Release Notes]]). In order to be up to date with the latest and greatest Gremlin functionality and fixes, you can download the raw source and build it locally on your machine. This section of the documentation discusses how to build Gremlin from its source code.

# "Downloading the Gremlin Source Code":#download
** "Downloading using Git":#git
** "Downloading using ZIP or TAR":#ziptar
# "Building Gremlin with Maven":#maven
# "Running Gremlin":#run

h2(#download). Downloading the Gremlin Source Code

In order to build the latest version of Gremlin, you must download the source code to your local machine. "GitHub":http://github.com/ (the Gremlin source code repository host) provides two means of doing this: through the "Git":http://git-scm.com/ protocol or by downloading a ZIP or TAR archive of the source. The two methods are discussed in the following sub-sections.

h3(#git). Downloading using Git

In order to get the latest source, you must have Git installed on your computer. Once you have Git installed, simply execute the following command in the directory where you want the Gremlin code to reside.

bc. marko$ git clone http://github.com/tinkerpop/gremlin.git
Initialized empty Git repository in /tmp/gremlin/.git/

When this process completes, the source code is in the directory @gremlin/@. If you have previously cloned the Gremlin source using Git, then use @git pull@ to grab the latest changes.

h3(#ziptar). Downloading using ZIP or TAR

On the "master source page":http://github.com/tinkerpop/gremlin/tree/master, you can download the source code as a "ZIP":http://en.wikipedia.org/wiki/ZIP_%28file_format%29 archive or as a "TAR":http://en.wikipedia.org/wiki/Tar_%28file_format%29 archive. Once you have downloaded an archive, unpack it. Below is an example of unpacking the ZIP archive version of the Gremlin code base.

```text
marko$ unzip tinkerpop-gremlin-238e90b.zip
Archive: tinkerpop-gremlin-238e90b.zip
238e90bfb52be23dc4abf53344a9f8fa260d9299
creating: tinkerpop-gremlin-238e90b/
inflating: tinkerpop-gremlin-238e90b/README.textile
creating: tinkerpop-gremlin-238e90b/
creating: tinkerpop-gremlin-238e90b/doc/
creating: tinkerpop-gremlin-238e90b/doc/images/
inflating: tinkerpop-gremlin-238e90b/doc/images/co-followed_by-example.jpg
extracting: tinkerpop-gremlin-238e90b/doc/images/co-followed_by.graffle
extracting: tinkerpop-gremlin-238e90b/doc/images/dbpedia-logo.png
extracting: tinkerpop-gremlin-238e90b/doc/images/grammar-example-1.graffle
inflating: tinkerpop-gremlin-238e90b/doc/images/grammar-example-1.jpg
inflating: tinkerpop-gremlin-238e90b/doc/images/grammar-map-example-1.jpg
...
```

The source code will be in the newly created directory.

h2(#maven). Building Gremlin with Maven

Gremlin uses "Maven":http://maven.apache.org/ as its build manager. Before building the Gremlin source code, you must have Maven installed on your local machine. Once you have Maven installed, you can build the Gremlin source code by executing the command @mvn clean install@.

```text
marko$ mvn clean install
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Building Gremlin: A Graph Traversal Language
[INFO] task-segment: [clean, install]
[INFO] ------------------------------------------------------------------------
[INFO] [clean:clean {execution: default-clean}]
[INFO] Deleting directory /Users/marko/software/gremlin/target
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Compiling 146 source files to /Users/marko/software/gremlin/target/classes
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /Users/marko/software/gremlin/src/test/resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] Compiling 90 source files to /Users/marko/software/gremlin/target/test-classes
[INFO] [surefire:test {execution: default-test}]
[INFO] Surefire report directory: /Users/marko/software/gremlin/target/surefire-reports
-------------------------------------------------------
T E S T S
-------------------------------------------------------
...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 51 seconds
[INFO] Finished at: Fri Aug 13 13:20:36 MDT 2010
[INFO] Final Memory: 44M/99M
[INFO] ------------------------------------------------------------------------
```

h2(#run). Running Gremlin

Once Gremlin is built, you will have created a @target/gremlin-xx-standalone@ directory. The standalone directory has all the jar dependencies included. There are two shell scripts, one for Unix-based systems (@gremlin.sh@) and one for Windows systems (@gremlin.bat@). Finally, the Gremlin "JavaDoc":http://en.wikipedia.org/wiki/Javadoc can be found in @target/site/apidocs/@ if you execute the @mvn site@ command.

```text
marko$ ./gremlin.sh
\,,,/
(o o)
-----oOOo-(_)-oOOo-----
gremlin>
```
65 changes: 65 additions & 0 deletions doc/wiki/Counting-Objects.textile
@@ -0,0 +1,65 @@
The examples below use the toy graph diagrammed in [[Defining a Property Graph]], which is loaded as follows:

```text
gremlin> g = TinkerGraphFactory.createTinkerGraph()
==>tinkergraph[vertices:6 edges:6]
```

h2. Basic Object Counting

Given a path expression, how many objects are touched at the end of that expression? This problem is easily solved using @Pipe.count()@.

```text
gremlin> g.V.count()
==>6
gremlin> g.v(1).outE.count()
==>3
```

Here is another problem: How many objects are touched midway through an expression? This problem is solved using @foreach{}@. The example below increments the counter @c@ for all objects that pass through the first @outE@ step. Thus, while the end of this expression yields only 2 objects, in the middle of the expression (after @outE@), there are 3.

```text
gremlin> c = 0
==>0
gremlin> g.v(1).outE.foreach{c++}.inV.outE.inV
==>v[5]
==>v[3]
gremlin> c
==>3
```
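
Counting in the middle of a pipeline amounts to inserting a pass-through stage that has a side effect. The sketch below models this in plain Python with generators; it is not the Gremlin API, and @tap@, @out_e@, @in_v@, and @counter@ are hypothetical names.

```python
# Edges as (tail, label, head) triples, transcribed from the toy graph.
EDGES = [(1, "knows", 2), (1, "created", 3), (1, "knows", 4),
         (4, "created", 5), (4, "created", 3), (6, "created", 3)]


def out_e(vertices):
    """Vertex -> each of its outgoing edges."""
    for v in vertices:
        for edge in EDGES:
            if edge[0] == v:
                yield edge


def in_v(edges):
    """Edge -> the vertex at its head."""
    for _tail, _label, head in edges:
        yield head


def tap(items, counter):
    """Pass every item through unchanged while incrementing a counter."""
    for item in items:
        counter["n"] += 1
        yield item


counter = {"n": 0}
result = list(in_v(out_e(in_v(tap(out_e([1]), counter)))))
print(result, counter["n"])  # [5, 3] 3
```

The counter is only updated as items are pulled through the pipeline, so it is read after the result has been fully consumed.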

h2. Using @groupCount@ Effectively

In many situations, it is desirable to know how many times a particular object has been traversed. This calculation is made easy with @groupCount@. The generic use of @groupCount@ maintains a @Map<Object,Long>@ that keeps track of how many times each element has been traversed.

The first example demonstrates the use of basic counter updating. Every time an object goes through @groupCount@ its @Map@ counter is updated by @1@.

```text
gremlin> m = [:]
gremlin> g.V.outE.inV.groupCount(m)
==>v[2]
==>v[3]
==>v[4]
==>v[3]
==>v[5]
==>v[3]
gremlin> m
==>v[2]=1
==>v[3]=3
==>v[4]=1
==>v[5]=1
```
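
The same tally can be reproduced with a counting map in plain Python. This is a sketch of the idea rather than the @groupCount@ implementation; @EDGES@ and @m@ are hypothetical names, and the edge triples are transcribed by hand from the toy graph.

```python
from collections import Counter

# Edges as (tail, label, head) triples, transcribed from the toy graph.
EDGES = [(1, "knows", 2), (1, "created", 3), (1, "knows", 4),
         (4, "created", 5), (4, "created", 3), (6, "created", 3)]

m = Counter()
for v in range(1, 7):                    # every vertex, as with g.V
    for tail, _label, head in EDGES:     # each outgoing edge, then its head
        if tail == v:
            m[head] += 1                 # one increment per traversal of `head`

print(dict(sorted(m.items())))  # {2: 1, 3: 3, 4: 1, 5: 1}
```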

Next, it is possible to dynamically alter the counting method by providing a closure to @groupCount@. This is useful for simulating energy diffusion over the graph, where the energy decays as more steps are taken. Every time an object passes through @groupCount@, the provided closure is called with @it@ set to the current value of the object in the @Map@, and the return value of the closure becomes the new value in the @Map@. Note that @m@ is defined as @[:].withDefault{0}@ so that a @null@ check is not required (a handy trick).

```text
gremlin> m = [:].withDefault{0}
gremlin> g.v(1).outE.inV.groupCount(m){it+1.0}.outE.inV.groupCount(m){it+0.5}
==>v[5]
==>v[3]
gremlin> m
==>v[2]=1.0
==>v[3]=1.5
==>v[4]=1.0
==>v[5]=0.5
```
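
The decaying update can be modelled with a map that defaults to zero, which plays the role of @withDefault{0}@. The block below is a plain-Python sketch and not the Gremlin API; @hop@ and @m@ are hypothetical names, and the increments 1.0 and 0.5 are those used in the example above.

```python
from collections import defaultdict

# Edges as (tail, label, head) triples, transcribed from the toy graph.
EDGES = [(1, "knows", 2), (1, "created", 3), (1, "knows", 4),
         (4, "created", 5), (4, "created", 3), (6, "created", 3)]


def hop(vertices):
    """One outE.inV hop: every vertex reachable over a single outgoing edge."""
    return [head for v in vertices for tail, _label, head in EDGES if tail == v]


m = defaultdict(float)   # missing keys start at 0.0, so no None check is needed

first = hop([1])
for v in first:
    m[v] += 1.0          # full energy after one step

second = hop(first)
for v in second:
    m[v] += 0.5          # decayed energy after two steps

print(dict(sorted(m.items())))  # {2: 1.0, 3: 1.5, 4: 1.0, 5: 0.5}
```

Vertex 3 is reached on both the first and the second hop, which is why it accumulates 1.5.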
29 changes: 29 additions & 0 deletions doc/wiki/Defining-a-More-Complex-Property-Graph.textile
@@ -0,0 +1,29 @@
The documentation up to this point has been using examples from a simple toy graph of 6 vertices and 6 edges. For this section, a more complicated graph structure is used in the examples. A clipped representation (i.e. with low-weight edges removed) of this graph is diagrammed below. This graph is a representation of the American band, the "Grateful Dead":http://en.wikipedia.org/wiki/Grateful_Dead.

!https://github.com/tinkerpop/gremlin/raw/master/doc/images/graph-example-2.jpg!

More information about this data set can be found in the following article.

Rodriguez, M.A., Gintautas, V., Pepe, A., "A Grateful Dead Analysis: The Relationship Between Concert and Listening Behavior":http://arxiv.org/abs/0807.2466, First Monday, volume 14, number 1, University of Illinois at Chicago Library, January 2009.

```java
g = new TinkerGraph()
GraphMLReader.inputGraph(g, new FileInputStream('data/graph-example-2.xml'))
```

In the above Grateful Dead graph, there are vertices and there are edges. The vertices are broken into two sets: songs (e.g. "Dark Star":http://en.wikipedia.org/wiki/Dark_Star_%28song%29, "China Cat Sunflower":http://en.wikipedia.org/wiki/China_Cat_Sunflower) and artists (e.g. "Jerry Garcia":http://en.wikipedia.org/wiki/Jerry_Garcia, "Robert Hunter":http://en.wikipedia.org/wiki/Robert_Hunter_%28lyricist%29). The following itemization describes the properties associated with vertices and edges.

# vertices
** song vertices
**** type (string): always 'song' for song vertices.
**** name (string): the name of the song.
**** performances (integer): the number of times the song was played in concert.
**** song_type (string): whether the song is a 'cover' song or an 'original'.
** artist vertices
**** type (string): always 'artist' for artist vertices.
**** name (string): the name of the artist.
# edges
** followed_by (song -> song): if the tail song was followed by the head song in concert.
**** weight (integer): the number of times these two songs were paired in concert.
** sung_by (song -> artist): if the tail song was primarily sung by the head artist.
** written_by (song -> artist): if the tail song was written by the head artist.
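
The itemization above can be read as a small typed schema. The sketch below writes it down as Python dataclasses purely for illustration; the class names are hypothetical, the graph itself stores untyped key/value properties, and the example values are placeholders that are not taken from the data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    name: str
    performances: int      # number of times the song was played in concert
    song_type: str         # 'cover' or 'original'
    type: str = "song"     # always 'song' for song vertices


@dataclass(frozen=True)
class Artist:
    name: str
    type: str = "artist"   # always 'artist' for artist vertices


@dataclass(frozen=True)
class FollowedBy:
    tail: Song             # the song played first
    head: Song             # the song that followed it
    weight: int            # number of times the two songs were paired in concert


# Placeholder values only; they are not taken from the data set.
first = Song(name="placeholder-a", performances=0, song_type="original")
second = Song(name="placeholder-b", performances=0, song_type="cover")
edge = FollowedBy(tail=first, head=second, weight=0)
print(edge.tail.type, edge.head.song_type)  # song cover
```

Only @followed_by@ carries a property in the description above, so @sung_by@ and @written_by@ would need no fields beyond their two endpoints.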
