Cascalog and Hadoop Security

Stefan Hübner edited this page Jun 26, 2012 · 1 revision

If your cluster has Hadoop Security enabled, your queries may fail with an exception like this one:

org.apache.hadoop.ipc.RemoteException: token (...) can't be found in cache

This exception fails the second step of any multi-step Cascalog (or, for that matter, Cascading) query. The reason is that the delegation token is cancelled as soon as the first step succeeds, so the following steps can no longer use it.

A solution to this is to configure JobConf with mapreduce.job.complete.cancel.delegation.tokens set to false, like so:

    (with-job-conf {"mapreduce.job.complete.cancel.delegation.tokens" false}
      (?- (stdout) my-query)) ; my-query is a placeholder for your own query

Alternatively, add the same entry to your job-conf.clj.
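A job-conf.clj carrying this setting might look like the sketch below. It assumes the file sits on the classpath and contains a single map literal, which Cascalog merges into the JobConf of every query; any other entries you already have stay alongside it.

```clojure
;; job-conf.clj (assumed to be on the classpath, e.g. under resources/)
;; Keeps delegation tokens alive after each job completes, so later
;; steps of a multi-step query can still use them.
{"mapreduce.job.complete.cancel.delegation.tokens" false}
```

Putting the setting here applies it to all queries in the project, whereas with-job-conf limits it to the wrapped forms.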

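Also, if you schedule your Cascalog jobs via Oozie, look into HADOOP_TOKEN_FILE_LOCATION and mapreduce.job.credentials.binary. Oozie hands the delegation tokens to the launched process in a file whose path is stored in the HADOOP_TOKEN_FILE_LOCATION environment variable, and setting mapreduce.job.credentials.binary to that path in your JobConf makes those tokens available to the jobs your query submits.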
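For the Oozie case, one way to wire the two settings together is sketched here. It assumes Oozie exports HADOOP_TOKEN_FILE_LOCATION to the process that runs your query; `token-file-conf` is a hypothetical helper, not part of Cascalog, and `my-query` stands in for your own query.

```clojure
(defn token-file-conf
  "Hypothetical helper: builds a job-conf map from an environment map.
  Returns {} when HADOOP_TOKEN_FILE_LOCATION is absent, so the same code
  also runs outside of Oozie."
  [env]
  (if-let [token-file (get env "HADOOP_TOKEN_FILE_LOCATION")]
    {"mapreduce.job.credentials.binary" token-file}
    {}))

(comment
  ;; Usage sketch; assumes cascalog.api is referred in this namespace
  ;; and my-query is defined elsewhere.
  (with-job-conf (token-file-conf (System/getenv))
    (?- (stdout) my-query)))
```

Taking the environment as an argument keeps the helper easy to try at the REPL with a plain map instead of the real process environment.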