No description or website provided.
HTML Python Scala Java Shell
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


Importing Chicago Crime Dataset into Neo4j

The following are instructions for importing the open Chicago crime data set into Neo4j.

export CSV_FILE=Crimes_-_2001_to_present.csv
tar -xvf spark-1.3.0-bin-hadoop1.tgz
  • Generate the CSV files that Neo4j import will use:


You should see the following at the beginning of the output:

$ ./
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using /Users/markneedham/projects/neo4j-spark-chicago/Crimes_-_2001_to_present.csv

The following files will be generated:

$ ls -alh tmp/*.csv
-rwxrwxrwx  1 markneedham  wheel   3.0K 14 Apr 06:48 /tmp/beats.csv
-rwxrwxrwx  1 markneedham  wheel   217M 14 Apr 06:48 /tmp/crimes.csv
-rwxrwxrwx  1 markneedham  wheel    84M 14 Apr 06:49 /tmp/crimesBeats.csv
-rwxrwxrwx  1 markneedham  wheel   120M 14 Apr 06:49 /tmp/crimesPrimaryTypes.csv
-rwxrwxrwx  1 markneedham  wheel   912B 14 Apr 06:48 /tmp/primaryTypes.csv

Now let’s clean up the CSV files to get rid of empty columns:

tar -xvf neo4j-enterprise-2.2.3-unix.tar.gz
  • Create a Neo4j graph from the CSV files:

./ [/path/to/csv/files] [/path/to/neo4j]


Using a different version of Spark

If you want to use a different version of Spark you’ll need to update the appropriate references in and build.sbt

I can’t generate the CSV files

If you forget to set the CSV_FILE environment variable or set it to a non-existent file you’ll see the following error message:

$ ./
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Exception in thread "main" java.lang.RuntimeException: Cannot find CSV file [null]

Set the CSV_FILE environment variable and you’re good to go:

export CSV_FILE=Crimes_-_2001_to_present.csv

I made some changes to the project and want to build it

If you haven’t made any changes to the source code of the project there’s no need to build it - a JAR is checked in and referenced in You can, however, rebuild the JAR with the following command:

sbt clean package

I can’t build this project

sbt doesn’t seem to honour java_home so if you’re seeing the following error:

2015-04-14 13:57:50,116 INFO  [main] spark.SparkContext (Logging.scala:logInfo(59)) - Job finished: saveAsTextFile at GenerateCSVFiles.scala:51, took 8.292283862 s
Exception in thread "main" java.lang.UnsupportedClassVersionError: MyFileUtil : Unsupported major.minor version 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(

You should set the following environment variable:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/

And then retry:

PATH=$JAVA_HOME/bin:$PATH sbt clean package