Simple Single Project YARN Application :: Learn how to build a simple Spring YARN application

tags projects
hadoop
yarn
boot
spring-hadoop

This guide walks you through the process of creating a Spring Hadoop YARN application.

What you’ll build

You’ll build a simple Hadoop YARN application with Spring Hadoop and Spring Boot. Other examples may use a multi-project structure, but that is not required: this sample uses a single project and produces a single jar file.

Set up the project

build.gradle

link:initial/build.gradle[]

settings.gradle

link:initial/settings.gradle[]

The Gradle build file above produces a single jar that contains the classes for all three roles (client, appmaster and container). Spring Boot’s Gradle plugin then repackages this jar to make it executable.

Create an Application

Here you create the Application and HelloPojo classes.

src/main/java/hello/Application.java

link:complete/src/main/java/hello/Application.java[]

In the above Application, notice how we added the @ComponentScan annotation at the main class level and the @YarnComponent annotation on the inner HelloPojo class.

HelloPojo is a simple POJO in the sense that it doesn’t extend any Spring YARN base classes. In this class we did the following:

  • We added a class level @YarnComponent annotation.

  • We added a method level @OnContainerStart annotation.

  • We used @Autowired to inject Hadoop’s Configuration class.

  • We added a class level @Profile annotation.

@YarnComponent is a stereotype annotation that also acts as a Spring @Component, so a class carrying it automatically becomes a candidate for @YarnComponent functionality. We use @Profile so that the bean is created only when the container profile is active. The @ComponentScan annotation on the Application class then instructs the context to create beans automatically by classpath scanning.

Within this class, the @OnContainerStart annotation marks a public method with a void return type and no arguments as the entry point for application code that needs to be executed on Hadoop.
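The idea of an annotation-marked entry point can be illustrated with plain Java reflection. The sketch below is not how Spring YARN is implemented; it only shows the general pattern of finding and invoking a public, void, no-argument method on a POJO. The @EntryPoint annotation, the runEntryPoints helper and the class names are hypothetical stand-ins introduced for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class EntryPointSketch {

    /** Hypothetical marker annotation, standing in for something like @OnContainerStart. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface EntryPoint {}

    /** A plain POJO: it extends no framework base class. */
    public static class HelloPojo {
        public final List<String> log = new ArrayList<>();

        @EntryPoint
        public void publicVoidNoArgsMethod() {
            log.add("Hello from HelloPojo");
        }

        // Not annotated, so the runner below never calls it.
        public void helper() {
            log.add("helper should not run");
        }
    }

    /**
     * Invokes every public, void, no-argument method annotated with
     * @EntryPoint and returns the names of the methods that were invoked.
     */
    static List<String> runEntryPoints(Object bean) {
        List<String> invoked = new ArrayList<>();
        for (Method method : bean.getClass().getMethods()) {
            boolean eligible = method.isAnnotationPresent(EntryPoint.class)
                    && Modifier.isPublic(method.getModifiers())
                    && method.getReturnType() == void.class
                    && method.getParameterCount() == 0;
            if (!eligible) {
                continue;
            }
            try {
                method.invoke(bean);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Entry point failed: " + method, e);
            }
            invoked.add(method.getName());
        }
        return invoked;
    }

    public static void main(String[] args) {
        HelloPojo pojo = new HelloPojo();
        System.out.println(runEntryPoints(pojo) + " -> " + pojo.log);
    }
}
```

In the real application the framework performs this discovery for you; the POJO only needs to carry the annotations.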

To demonstrate that this class has some real functionality, we use Spring Hadoop’s FsShell to list entries from the root of the HDFS file system. This requires Hadoop’s Configuration, which is already prepared for you, so you can rely on autowiring to access it.

The main() method uses Spring Boot’s SpringApplication.run() method to launch the application. What happens next depends on the configuration and on which role is detected: YarnClient, YarnAppmaster or YarnContainer.

Create an Application Configuration

Create a new YAML configuration file for the gs-yarn-basic-single-app project.

src/main/resources/application.yml

link:complete/src/main/resources/application.yml[]
Note
Pay attention to the YAML file format, which expects correct indentation and no tab characters.

The final part of your application is its runtime configuration, which glues all the components together so that they can be executed as a Spring YARN application. This configuration acts as the source for Spring Boot’s @ConfigurationProperties and contains the relevant properties that cannot be auto-discovered or that an end user may need to override.

This way you can define your own defaults for your environment. Because these @ConfigurationProperties are resolved at runtime by Spring Boot, you can easily override them with command-line options, environment variables or additional configuration property files.
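The override behaviour can be pictured as a lookup across layered property sources, where the first source that defines a key wins. The following is a minimal sketch of that idea in plain Java, not Spring Boot’s actual resolution code. It models only two layers (command-line arguments over file defaults), and the property key spring.hadoop.fsUri, its values and the helper names are assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class PropertyOverrideSketch {

    /** Turns "--key=value" arguments into a map; anything else is ignored. */
    static Map<String, String> parseCommandLine(String... args) {
        Map<String, String> parsed = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                parsed.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return parsed;
    }

    /**
     * Returns the value from the first source that defines the key.
     * Sources are ordered from highest to lowest precedence.
     */
    static Optional<String> resolve(String key, List<Map<String, String>> sources) {
        for (Map<String, String> source : sources) {
            if (source.containsKey(key)) {
                return Optional.of(source.get(key));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed defaults, standing in for values read from application.yml.
        Map<String, String> fileDefaults =
                Map.of("spring.hadoop.fsUri", "hdfs://localhost:8020");
        Map<String, String> commandLine = parseCommandLine(args);

        // Command-line values take precedence over the file defaults.
        String fsUri = resolve("spring.hadoop.fsUri", List.of(commandLine, fileDefaults))
                .orElse("<unset>");
        System.out.println("spring.hadoop.fsUri = " + fsUri);
    }
}
```

Running the sketch with no arguments prints the file default; passing an argument of the form --spring.hadoop.fsUri=... replaces it, which mirrors how a command-line option overrides a value from the configuration file.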

Run the Application

Now that you’ve successfully compiled and packaged your application, it’s time for the fun part: executing it on Hadoop YARN.

To accomplish this, simply run your executable jar from the project’s root directory.

$ java -jar target/gs-yarn-basic-single-0.1.0.jar

To find Hadoop’s application logs, run a simple find within the Hadoop cluster’s configured userlogs directory.

$ find hadoop/logs/userlogs/ | grep std
hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000001/Appmaster.stdout
hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000001/Appmaster.stderr
hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000002/Container.stdout
hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000002/Container.stderr

Grep the logging output of the HelloPojo class.

$ grep HelloPojo hadoop/logs/userlogs/application_1395578417086_0001/container_1395578417086_0001_01_000002/Container.stdout
[2014-03-23 12:42:05.763] boot - 17064  INFO [main] --- HelloPojo: Hello from HelloPojo
[2014-03-23 12:42:05.763] boot - 17064  INFO [main] --- HelloPojo: About to list from hdfs root content
[2014-03-23 12:42:06.745] boot - 17064  INFO [main] --- HelloPojo: FileStatus{path=hdfs://localhost:8020/; isDirectory=true; modification_time=1395397562421; access_time=0; owner=root; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
[2014-03-23 12:42:06.746] boot - 17064  INFO [main] --- HelloPojo: FileStatus{path=hdfs://localhost:8020/app; isDirectory=true; modification_time=1395501405412; access_time=0; owner=hadoop; group=supergroup; permission=rwxr-xr-x; isSymlink=false}
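
If you prefer to do the same search from Java instead of find and grep, the following sketch walks a log directory and prints the matching lines from every file ending in .stdout. It assumes the hadoop/logs/userlogs location used in the commands above; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class UserLogGrepSketch {

    /** Returns every line containing the marker from files ending in ".stdout" under root. */
    static List<String> grepStdout(Path root, String marker) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".stdout"))
                    .sorted() // stable order across runs
                    .flatMap(p -> readLines(p).stream())
                    .filter(line -> line.contains(marker))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a file as UTF-8 lines, converting I/O failures to unchecked exceptions. */
    private static List<String> readLines(Path file) {
        try {
            return Files.readAllLines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Defaults to the userlogs directory used in this guide; pass another path to override.
        Path root = Path.of(args.length > 0 ? args[0] : "hadoop/logs/userlogs");
        grepStdout(root, "HelloPojo").forEach(System.out::println);
    }
}
```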

Summary

Congratulations! You’ve just developed a Spring YARN application!