Jeroen van Dijk edited this page Jun 14, 2013 · 1 revision

General FAQs

Amazon Elastic MapReduce (EMR)

Q: What are common errors on EMR?

A: These are the errors you are most likely to encounter:

  • ClassNotFoundException caused by dashes (“-”) in the namespaces or function names that make up the name of a Hadoop main class. Advice: don’t use dashes in those names.
  • Classpath collisions with libraries that ship with the Hadoop distribution. See the classpath precedence question under Hadoop Configuration.
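
The dash problem is most likely a naming mismatch: the Clojure compiler replaces dashes with underscores in the class names it generates, so a main class requested under its dashed name cannot be found. The Java sketch below illustrates only that one substitution. The class `MungeSketch`, the method `toClassName`, and the namespace `my-job.core` are hypothetical names chosen for illustration, and Clojure's real name munging handles more characters than this.

```java
// Minimal sketch: how a dashed Clojure namespace maps to a JVM class name.
// Assumption: only the dash-to-underscore rule is modelled here.
class MungeSketch {

    // Returns the name with every dash replaced by an underscore.
    static String toClassName(String clojureName) {
        return clojureName.replace('-', '_');
    }

    public static void main(String[] args) {
        String namespace = "my-job.core"; // hypothetical namespace
        // Prints: my-job.core -> my_job.core
        System.out.println(namespace + " -> " + toClassName(namespace));
    }
}
```

Asking Hadoop for `my-job.core` as the main class would then fail, because the compiled class on the classpath is named `my_job.core`.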

Q: Why does my job fail when running on EMR, but not locally?

A: The EMR environment differs from a local run through Leiningen. To debug the failure, work through the following steps:

  • Is the error in the list of common errors above?
  • Does your job run locally on the same version of Hadoop that EMR is using? See How to run job locally?
  • Does the error occur again when you re-run the job? If not, wait until you see a pattern.
  • Are you using spot instances? If so, have the instances been killed?
  • If none of the above helps, ask the mailing list.

Q: How do I deploy to EMR?

A: Lemur is a tool built to make launching Hadoop jobs on EMR easy.

Hadoop Configuration

Q: How do I make sure my libraries are loaded before the libraries of the Hadoop distribution?

A: Certain Hadoop versions let you control the classpath “precedence” through configuration options:

Hadoop version(s)      Configuration option
0.20.203 – 0.20.205    mapreduce.user.classpath.first=true
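
When checking whether a precedence setting took effect, it can help to see which jar a class was actually loaded from. The sketch below uses only standard JDK calls (`Class.getProtectionDomain` and `CodeSource.getLocation`). The class `ClassOrigin` and the helper `originOf` are hypothetical names, and running the check inside a Hadoop task is an assumption about how you might use it, not something this FAQ prescribes.

```java
import java.security.CodeSource;

// Minimal sketch: report the jar or directory a class was loaded from.
class ClassOrigin {

    // Returns the code source location of the class, or a placeholder
    // when the JVM does not expose one (e.g. for core JDK classes).
    static String originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; defaults to a core JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " loaded from " + originOf(Class.forName(name)));
    }
}
```

Run it with the fully qualified name of a class that exists both in your job jar and in the Hadoop distribution; the printed location shows which copy was picked up.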