
# Storage in a distributed environment

## Option 1:

Hadoop can read and write binary files, so practically anything that can be converted into bytes (images, videos, etc.) can be stored in HDFS. For this, Hadoop provides SequenceFiles: a SequenceFile is a flat file consisting of binary key/value pairs.

The SequenceFile class provides Writer, Reader, and Sorter classes for writing, reading, and sorting, respectively. To use this approach, the image or video file must first be converted into a SequenceFile and then stored in HDFS.

Here is a small piece of code that takes an image file and converts it into a SequenceFile, using the file name as the key and the image content as the value:

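```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImageToSeq {
    public static void main(String[] args) throws Exception {
        // Load the cluster configuration (these paths assume a Hadoop 1.0.4 install).
        Configuration confHadoop = new Configuration();
        confHadoop.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/core-site.xml"));
        confHadoop.addResource(new Path("/hadoop/projects/hadoop-1.0.4/conf/hdfs-site.xml"));
        FileSystem fs = FileSystem.get(confHadoop);

        Path inPath = new Path("/mapin/1.png");   // source image already in HDFS
        Path outPath = new Path("/mapin/1.seq");  // SequenceFile to create
        FSDataInputStream in = null;
        SequenceFile.Writer writer = null;
        try {
            // Size the buffer from the file length (assumes the image is < 2 GB);
            // available() does not reliably report the whole file size.
            byte[] buffer = new byte[(int) fs.getFileStatus(inPath).getLen()];
            in = fs.open(inPath);
            in.readFully(buffer); // a single read() may return before the buffer is full

            // Key = file name (Text), value = raw image bytes (BytesWritable).
            writer = SequenceFile.createWriter(fs, confHadoop, outPath, Text.class, BytesWritable.class);
            writer.append(new Text(inPath.getName()), new BytesWritable(buffer));
        } catch (Exception e) {
            System.err.println("Exception MESSAGES = " + e.getMessage());
        } finally {
            // Close both streams so the SequenceFile is fully flushed to HDFS.
            IOUtils.closeStream(in);
            IOUtils.closeStream(writer);
        }
    }
}
```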
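To make the key/value idea concrete without a Hadoop cluster, here is a toy, JDK-only sketch that packs several (file name, bytes) records into one flat binary stream and reads them back. It is only an illustration under that simplification: the class and method names are hypothetical, and real SequenceFiles also carry a header, sync markers, and optional compression, so this is not Hadoop's on-disk format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy illustration of the SequenceFile idea (hypothetical, NOT Hadoop's format):
 * many (file name, bytes) records packed into one flat binary stream.
 */
public class ToyKeyValueFile {

    // Each record is written as: [key length][key UTF-8 bytes][value length][value bytes].
    public static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
                out.writeInt(key.length);
                out.write(key);
                out.writeInt(e.getValue().length);
                out.write(e.getValue());
            }
        }
        return bytes.toByteArray();
    }

    // Reads records back in order until the stream is exhausted.
    public static Map<String, byte[]> unpack(byte[] data) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            while (true) {
                int keyLen;
                try {
                    keyLen = in.readInt();
                } catch (EOFException end) {
                    break; // no more records
                }
                byte[] key = new byte[keyLen];
                in.readFully(key);
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                files.put(new String(key, StandardCharsets.UTF_8), value);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> images = new LinkedHashMap<>();
        images.put("1.png", new byte[] {(byte) 0x89, 'P', 'N', 'G'});
        images.put("2.png", new byte[] {1, 2, 3});
        byte[] packed = pack(images);
        Map<String, byte[]> restored = unpack(packed);
        System.out.println(restored.keySet() + " in " + packed.length + " bytes");
    }
}
```

Running main prints `[1.png, 2.png] in 33 bytes`: each record costs 8 bytes of length prefixes plus its key and value. Packing many small images into one large file this way is the same motivation behind SequenceFiles, since HDFS is designed for a smaller number of large files.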

## Option 2:

If the intention is just to store the files as they are, the put command is enough:

```bash
bin/hadoop fs -put /src_image_file /dst_image_file
```

# How to Analyze Video Data Using Hadoop

![logo](https://static1.tothenew.com/blog/wp-content/uploads/2016/11/hipi.png)

HIPI (Hadoop Image Processing Interface) is a library for Hadoop's MapReduce framework that provides an API for performing image processing tasks in a distributed computing environment.

## Operations Performed During Video Analytics Using Hadoop

![hipi](https://static1.tothenew.com/blog/wp-content/uploads/2016/11/blog-on-video-analytics-1-.png)

1) Conversion of Video into Frames: Several tools can transcode digital video into frames, such as JCodec and Xuggler. JCodec is an open-source library for video codecs and formats, implemented in pure Java.

The following code extracts a single frame (number 785) from a video and saves it as a PNG image:

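```java
// JCodec 0.1.x API; newer 0.2.x releases replace this call with
// FrameGrab.getFrameFromFile(...) followed by AWTUtil.toBufferedImage(...).
int frameNo = 785;
BufferedImage frame1 = FrameGrab.getFrame(new File("video.mp4"), frameNo);
ImageIO.write(frame1, "png", new File("abc.png"));
```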

2) Put Frames in HDFS: Uploading every frame individually with the put command produces a very large number of small files, which HDFS handles poorly. Instead, convert the frames into a stream of bytes and store that in HDFS; because Hadoop can read and write binary files, anything that can be converted into bytes can be stored this way.
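A minimal, JDK-only sketch of the byte-conversion step: a frame held as a java.awt.image.BufferedImage (the type returned by the JCodec call above) is encoded to PNG in memory. The class and method names are hypothetical; the resulting byte[] is the kind of value the earlier ImageToSeq example wraps in a BytesWritable.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class FrameToBytes {

    // Encodes a frame as PNG into memory (hypothetical helper). The returned
    // byte[] could be wrapped in a BytesWritable or written to an HDFS stream.
    public static byte[] toPngBytes(BufferedImage frame) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(frame, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a decoded video frame: a small blank RGB image.
        BufferedImage frame = new BufferedImage(64, 48, BufferedImage.TYPE_INT_RGB);
        byte[] png = toPngBytes(frame);
        System.out.println("Encoded frame: " + png.length + " bytes");
    }
}
```

Encoding in memory avoids writing a temporary local file per frame; the bytes can go straight into a container such as a SequenceFile.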

3) Store Images in a HIPI ImageBundle: After the video has been transcoded into frames, the images are combined into a single large file so that they can be managed and analyzed easily. Each image is added to the HIPI ImageBundle with the add image method, so an ImageBundle can be thought of as a collection of images. Each mapper generates a HIPI ImageBundle, and the reducer merges all bundles into a single large bundle. Once the images are stored this way, MapReduce jobs can run on these image bundles through the HIPI framework for image analysis.
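The bundling and merging step can be sketched in plain Java. This is a hypothetical model, not HIPI's API: a bundle is one contiguous data buffer plus an index of (name, offset, length) entries, each "mapper" fills one bundle, and the "reducer" appends every image into a single bundle, recomputing offsets relative to the merged data.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of image bundling (not HIPI's API): each bundle is one
 * data blob plus an index of (name, offset, length) entries.
 */
public class BundleSketch {

    static final class Entry {
        final String name;
        final int offset;
        final int length;

        Entry(String name, int offset, int length) {
            this.name = name;
            this.offset = offset;
            this.length = length;
        }
    }

    static final class Bundle {
        final ByteArrayOutputStream data = new ByteArrayOutputStream();
        final List<Entry> index = new ArrayList<>();

        // Appends one image: records where its bytes start, then copies them in.
        void addImage(String name, byte[] bytes) {
            index.add(new Entry(name, data.size(), bytes.length));
            data.write(bytes, 0, bytes.length);
        }

        // Copies the i-th image back out using its index entry.
        byte[] getImage(int i) {
            Entry e = index.get(i);
            byte[] out = new byte[e.length];
            System.arraycopy(data.toByteArray(), e.offset, out, 0, e.length);
            return out;
        }
    }

    // "Reducer": merges bundles by re-adding each image in order, so every
    // offset is recomputed relative to the merged data buffer.
    static Bundle merge(List<Bundle> bundles) {
        Bundle merged = new Bundle();
        for (Bundle b : bundles) {
            for (int i = 0; i < b.index.size(); i++) {
                merged.addImage(b.index.get(i).name, b.getImage(i));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two "mappers", each producing a bundle from its own input split.
        Bundle fromMapper1 = new Bundle();
        fromMapper1.addImage("frame_0001.png", new byte[] {1, 2, 3});
        fromMapper1.addImage("frame_0002.png", new byte[] {4, 5});
        Bundle fromMapper2 = new Bundle();
        fromMapper2.addImage("frame_0003.png", new byte[] {6});

        Bundle merged = merge(List.of(fromMapper1, fromMapper2));
        System.out.println(merged.index.size() + " images, " + merged.data.size() + " bytes");
    }
}
```

Running main prints `3 images, 6 bytes`. The index is the design point: a reader can jump straight to any image by its offset instead of scanning the whole data blob.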

4) Analysis of Frames with the HIPI Framework: HIPI is an image processing library designed to process large numbers of images with the Hadoop MapReduce parallel programming framework. It supports efficient, high-throughput image processing with MapReduce-style parallel programs, typically executed on a cluster, and provides a way to store a large collection of images on the Hadoop Distributed File System (HDFS) and make them available for efficient distributed processing.
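As an example of the per-image work a map task might perform once frames are available, the sketch below computes a frame's mean brightness (luma) using only the JDK. The metric, the ITU-R BT.601 weights, and the class name are assumptions for illustration; this does not use HIPI's API.

```java
import java.awt.image.BufferedImage;

/**
 * Hypothetical per-frame computation (not HIPI's API): mean luma of a frame,
 * using the ITU-R BT.601 weights 0.299 R + 0.587 G + 0.114 B.
 */
public class FrameBrightness {

    public static double meanLuma(BufferedImage img) {
        double sum = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);   // packed ARGB pixel
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                sum += 0.299 * r + 0.587 * g + 0.114 * b;
            }
        }
        return sum / ((double) img.getWidth() * img.getHeight());
    }

    public static void main(String[] args) {
        // Synthetic 2x1 frame: one white pixel and one black pixel.
        BufferedImage frame = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        frame.setRGB(0, 0, 0xFFFFFF);
        frame.setRGB(1, 0, 0x000000);
        System.out.printf("mean luma = %.1f%n", meanLuma(frame));
    }
}
```

For the synthetic frame the mean is about 127.5. In a MapReduce job, a map task could emit such a value per frame, and a reducer could aggregate the values per video.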
