
MIPr - MapReduce Image Processing framework for Hadoop

MIPr provides the ability to process images in Hadoop.

MIPr includes:

  • Writable Wrappers for images
  • InputFormat and OutputFormat for images
  • Several Jobs for image processing
  • OpenCV and OpenIMAJ support



Prerequisites

  • Java 7 (preferably Oracle)
  • Maven 3.2.5


Building

  1. Clone the repository with the MIPr sources

    git clone

  2. Build the package with Apache Maven

    To build the full package with OpenIMAJ and OpenCV support, run

    mvn package

    Note that the full package is larger than any separately built package.

    To build a separate package, run

    mvn package -pl [desired_package] -am

    Where desired_package is one of the following:

    • core_package
    • includes_OpenCV (includes core with OpenCV support)
    • includes_OpenIMAJ (includes core with OpenIMAJ support)
  3. Maven builds the jar file ...-jar-with-dependencies.jar and places it in the target folder.


Running

  1. Copy image files to HDFS:

    $ hadoop fs -copyFromLocal local_image_folder hdfs_image_folder

  2. Run the test MIPr job, which converts color images to grayscale:

    $ hadoop jar mipr-core-0.1-jar-with-dependencies.jar experiments.Img2Gray hdfs_image_folder hdfs_output_folder

  3. Copy processed images back from HDFS to the local filesystem:

    $ hadoop fs -copyToLocal hdfs_output_folder local_output_folder

  4. Check that images were converted correctly.
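
The last step can be automated with a small check. The sketch below is not part of MIPr; the class name GrayscaleCheck is hypothetical, and it assumes the output images are in a format that the JDK's ImageIO can read. It treats an image as grayscale when every pixel has equal red, green, and blue components.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of MIPr: verifies converted images locally.
public class GrayscaleCheck {

    /** Returns true when every pixel has equal red, green and blue components. */
    public static boolean isGrayscale(BufferedImage image) {
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r != g || g != b) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: GrayscaleCheck <local_output_folder>");
            return;
        }
        File[] files = new File(args[0]).listFiles();
        if (files == null) {
            System.err.println("Not a directory: " + args[0]);
            return;
        }
        for (File file : files) {
            if (!file.isFile()) {
                continue; // skip sub-directories
            }
            BufferedImage image = ImageIO.read(file); // null when no reader matches the file
            if (image == null) {
                continue; // skip non-image files in the output folder
            }
            String verdict = isGrayscale(image) ? "grayscale" : "still has color";
            System.out.println(file.getName() + ": " + verdict);
        }
    }
}
```

Run it against local_output_folder after the copy in step 3; any file reported as "still has color" was not converted.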

Creating your own Hadoop job

To process images in your own way, you need to create one class. As an example, let's create a job that converts color images to grayscale using OpenCV. MIPr already ships this class; it is located in includes_OpenCV\src\main\java\experiments\Img2Gray_opencv.

  1. Create a public class that extends the Configured superclass and implements the Tool interface.

    public class Img2Gray_opencv extends Configured implements Tool {
        public static void main(String[] args) throws Exception {
            // ToolRunner handles generic Hadoop options, then calls run()
            int res = ToolRunner.run(new Img2Gray_opencv(), args);
            System.exit(res);
        }

  2. Create a run method inside your class and configure the job for the library you will use.

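    public int run(String[] args) throws Exception {
        String input = args[0];   // HDFS folder with the source images
        String output = args[1];  // HDFS folder for the processed images
        Job job = MiprMain.getOpenCVJobTemplate();
        // Apply the configuration calls described below (e.g. job.setMapperClass) here
        Path outputPath = new Path(output);
        FileInputFormat.setInputPaths(job, input);
        FileOutputFormat.setOutputPath(job, outputPath);
        return job.waitForCompletion(true) ? 0 : 1;
    }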

    The most important configuration calls are:

    • job.setInputFormatClass([InputFormat].class)

      Where [InputFormat] is one of the following:

      • Java 2D


      • OpenIMAJ


      • OpenCV



    • job.setOutputFormatClass([OutputFormat].class)

      Where [OutputFormat] is chosen in the same way as [InputFormat], matching the library you use.

    • job.setMapperClass([MapperClass].class)

      Where [MapperClass] is your Mapper implementation, which contains the map method.

    • job.setOutputKeyClass(NullWritable.class)

      In most image processing jobs a key class is not necessary. You can leave the key empty by using Hadoop's special NullWritable class, which carries no data.

    • job.setOutputValueClass([Value].class)

      [Value] depends on which library you are going to use.

      • Java 2D


      • OpenIMAJ


      • OpenCV: MatImageWritable


  3. Create a Mapper class. It should extend the OpenCVMapper superclass so that the OpenCV library can be used in parallel mode. The map method contains the image processing algorithm.

        public static class Img2Gray_opencvMapper extends OpenCVMapper<NullWritable, MatImageWritable, NullWritable, MatImageWritable> {
            protected void map(NullWritable key, MatImageWritable value, Context context) throws IOException, InterruptedException {
                Mat image = value.getImage();
                Mat result;
                if (image.type() == CvType.CV_8UC3) {
                    // 8-bit, 3-channel image: cvtColor allocates the grayscale output
                    result = new Mat();
                    Imgproc.cvtColor(image, result, Imgproc.COLOR_RGB2GRAY);
                } else {
                    // Any other image type is passed through unchanged
                    result = image;
                }
                context.write(NullWritable.get(), new MatImageWritable(result, value.getFileName(), value.getFormat()));
            }
        }

  4. Return to the running section and build the package including your own Hadoop job.
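
Before packaging a job, it can help to try the per-image logic on a local machine without Hadoop or OpenCV. The sketch below is an illustration only, not MIPr code: the class name GrayscaleSketch is hypothetical, and it uses plain Java 2D from the JDK instead of OpenCV. It applies the common luminance weights (0.299, 0.587, 0.114) to each pixel, which is the same weighting OpenCV documents for its RGB to gray conversion.

```java
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

// Hypothetical local sketch, not part of MIPr: grayscale conversion with plain Java 2D.
public class GrayscaleSketch {

    /** Converts an image to 8-bit single-channel grayscale using luminance weights. */
    public static BufferedImage toGray(BufferedImage src) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        WritableRaster raster = dst.getRaster();
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Weighted sum of the color channels, rounded to the nearest gray level
                int luma = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                // Write the gray level directly into the single raster band
                raster.setSample(x, y, 0, luma);
            }
        }
        return dst;
    }
}
```

Writing to the raster directly, rather than calling setRGB on the gray image, stores the computed gray level as-is and avoids any color space conversion by Java 2D.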