Export feature classes from ArcMap to Hadoop, run MapReduce jobs, and import the results back into ArcMap as feature classes.
A typical use case: a GeoEventProcessor streams data points into HDFS or S3. A geo-data scientist working in ArcMap has a set of polygons and needs to aggregate that streaming data over them. From ArcMap, they can launch a Hadoop cluster, export the polygons, and run a MapReduce job that takes the streaming data as input for spatial analysis. The result is joined back to the input polygons for symbolization and visualization.
GIS Tools for Hadoop
You must first git clone and compile the Esri Geometry API.
$ git clone https://github.com/Esri/geometry-api-java.git
$ cd geometry-api-java
$ mvn install
Compiling and packaging
Make sure to install arcobjects.jar in your local Maven repository. You can typically find the jar in C:\Program Files (x86)\ArcGIS\Desktop10.1\java\lib.
$ mvn install:install-file -Dfile=arcobjects.jar -DgroupId=com.esri -DartifactId=arcobjects -Dversion=10.1 -Dpackaging=jar -DgeneratePom=true
Clone this project and package it:
$ mvn clean package
Installing the extension in ArcMap
Copy the MRToolbox-1.1-SNAPSHOT.jar file and the libs folder from the target folder into the C:\Program Files (x86)\ArcGIS\Desktop10.1\java\lib\ext folder.
Before starting ArcMap, you have to adjust the ArcGIS JVM heap values. Run JavaConfigTool, located in C:\Program Files (x86)\ArcGIS\Desktop10.1\bin, as administrator.
Check this out to see how to add a Toolbox and a Tool to ArcMap.
Start ArcMap. Create a toolbox named MRToolbox, and add to it the ExportToHDFSTool and the JobRunnerTool. You should have something like the following:
The ExportToHDFSTool GP tool exports a feature class from ArcMap to a Hadoop file system path in Esri JSON format.
Here is sample content for a hadoop.properties file:
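To give a feel for the exported format, here is a minimal sketch of what one polygon feature might look like in Esri JSON (an "attributes" object plus a "geometry" object with "rings" and a "spatialReference"). The class name EsriJsonSketch, the helper toEsriJson, the "name" attribute, and the sample square are all hypothetical; the exact fields the tool writes depend on the feature class being exported.

```java
import java.util.Locale;

public class EsriJsonSketch {
    /**
     * Formats a single ring of [x, y] vertices as an Esri JSON polygon feature.
     * Illustration only: the name is not JSON-escaped and only one ring is supported.
     */
    static String toEsriJson(String name, double[][] ring, int wkid) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"attributes\":{\"name\":\"").append(name).append("\"},");
        sb.append("\"geometry\":{\"rings\":[[");
        for (int i = 0; i < ring.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(String.format(Locale.ROOT, "[%.1f,%.1f]", ring[i][0], ring[i][1]));
        }
        sb.append("]],\"spatialReference\":{\"wkid\":").append(wkid).append("}}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // A closed, clockwise square in WGS84 (wkid 4326); hypothetical sample data.
        double[][] square = {{0, 0}, {0, 10}, {10, 10}, {10, 0}, {0, 0}};
        System.out.println(toEsriJson("zone-a", square, 4326));
    }
}
```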
hadoop.job.ugi=root,root
fs.default.name=hdfs\://ec2-xx-xx-xx-xx.compute-1.amazonaws.com\:8020/
mapred.job.tracker=ec2-xx-xx-xx-xx.compute-1.amazonaws.com\:8021
hadoop.socks.server=localhost\:6666
fs.s3n.awsAccessKeyId=my-access-key-id
fs.s3n.awsSecretAccessKey=my-secret-access-key
fs.s3.awsAccessKeyId=my-access-key-id
fs.s3.awsSecretAccessKey=my-secret-access-key
hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.SocksSocketFactory
dfs.client.use.legacy.blockreader=true
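The backslashes before the colons are the standard Java properties-file escape, so a value such as hdfs\://... is read back as hdfs://.... The sketch below shows this with java.util.Properties; it assumes the file is parsed that way, and the class name HadoopPropertiesSketch and the helper parse are hypothetical, not part of the toolbox.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class HadoopPropertiesSketch {
    /** Parses properties-file text; "\:" escapes are decoded to plain ":". */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Two entries taken from the sample above (Java source doubles each backslash).
        String sample = String.join("\n",
            "fs.default.name=hdfs\\://ec2-xx-xx-xx-xx.compute-1.amazonaws.com\\:8020/",
            "hadoop.socks.server=localhost\\:6666");
        Properties props = parse(sample);
        System.out.println(props.getProperty("fs.default.name"));
        System.out.println(props.getProperty("hadoop.socks.server"));
    }
}
```

Unescaped colons in the value would also load correctly; the escapes are simply what java.util.Properties writes when it stores a file.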
The JobRunnerTool GP tool runs a MapReduce job that performs a spatial join between a very large set of points and a set of exported polygons. The result of the job is a table in ArcMap in which each row has two fields: the first is the polygon name identifier, and the second is the number of points that fall in that polygon.
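The aggregation behind that result table can be sketched in plain Java, without Hadoop or the Esri Geometry API. This is not the job's actual code: it swaps in a simple ray-casting point-in-polygon test and an in-memory map, and every name in it (PointInPolygonCountSketch, contains, countPoints, the sample polygons) is hypothetical. It only illustrates the "polygon name, point count" output described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PointInPolygonCountSketch {
    /** Ray-casting test: true if (x, y) lies inside the ring of [x, y] vertices. */
    static boolean contains(double[][] ring, double x, double y) {
        boolean inside = false;
        for (int i = 0, j = ring.length - 1; i < ring.length; j = i++) {
            double xi = ring[i][0], yi = ring[i][1];
            double xj = ring[j][0], yj = ring[j][1];
            // Does a horizontal ray going right from (x, y) cross edge j -> i?
            boolean crosses = (yi > y) != (yj > y)
                && x < (xj - xi) * (y - yi) / (yj - yi) + xi;
            if (crosses) inside = !inside;
        }
        return inside;
    }

    /** Returns polygon name -> number of points inside it (each point counted once). */
    static Map<String, Integer> countPoints(Map<String, double[][]> polygons, double[][] points) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : polygons.keySet()) counts.put(name, 0);
        for (double[] p : points) {
            for (Map.Entry<String, double[][]> e : polygons.entrySet()) {
                if (contains(e.getValue(), p[0], p[1])) {
                    // Conceptually the map step emits (name, 1); merge sums like a reducer.
                    counts.merge(e.getKey(), 1, Integer::sum);
                    break; // stop at the first matching polygon
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, double[][]> polygons = new LinkedHashMap<>();
        polygons.put("west", new double[][] {{0, 0}, {0, 10}, {10, 10}, {10, 0}});
        polygons.put("east", new double[][] {{10, 0}, {10, 10}, {20, 10}, {20, 0}});
        double[][] points = {{1, 1}, {5, 5}, {15, 5}, {30, 30}};
        System.out.println(countPoints(polygons, points)); // prints {west=2, east=1}
    }
}
```

A real job over a very large point set would distribute the points across mappers and would likely index the polygons rather than scan them linearly for every point.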