Spark cannot determine task id #152

Closed
rbraley opened this Issue Feb 25, 2014 · 2 comments

rbraley commented Feb 25, 2014

Using http://typesafe.artifactoryonline.com/typesafe/sonatype-snapshots/org/elasticsearch/elasticsearch-hadoop/1.3.0.BUILD-SNAPSHOT/elasticsearch-hadoop-1.3.0.BUILD-20140224.171205-318.jar
I get:
org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker)

The code is the same as shown in #148.
This is related to #151 (Spark integration).
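
For context, the property dump in the log below contains no task id entry, which would explain why the `Assert.notNull` call in `HeartBeat.<init>` fails. The following is only a minimal sketch of that kind of lookup, not the connector's actual code. The class `TaskIdLookup`, its methods, and the choice of the keys `mapred.task.id` and `mapreduce.task.attempt.id` are assumptions made for illustration; the sample attempt id is made up.

```java
import java.util.Properties;

// Hypothetical illustration: this is NOT the elasticsearch-hadoop implementation.
public class TaskIdLookup {

    // Assumed keys: Hadoop sets a task attempt id under one of these names
    // when a task runs inside a MapReduce task tracker.
    static final String[] TASK_ID_KEYS = {"mapred.task.id", "mapreduce.task.attempt.id"};

    /** Returns the first non-empty task id found under the assumed keys, or null. */
    public static String findTaskId(Properties props) {
        for (String key : TASK_ID_KEYS) {
            String value = props.getProperty(key);
            if (value != null && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    /** Fails the same way the log shows when no task id is present. */
    public static String requireTaskId(Properties props) {
        String id = findTaskId(props);
        if (id == null) {
            throw new IllegalArgumentException(
                "Unable to determine task id - please report your distro/setting through the issue tracker");
        }
        return id;
    }

    public static void main(String[] args) {
        // Properties resembling the Spark run: job-level settings only, no task id.
        Properties sparkLike = new Properties();
        sparkLike.setProperty("mapred.job.tracker", "local");
        sparkLike.setProperty("es.resource", "demo/demo");
        try {
            requireTaskId(sparkLike);
        } catch (IllegalArgumentException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }

        // Properties resembling a MapReduce task, with a made-up attempt id.
        Properties mrLike = new Properties();
        mrLike.setProperty("mapred.task.id", "attempt_000000000000_0001_m_000000_0");
        System.out.println("lookup succeeded: " + requireTaskId(mrLike));
    }
}
```

If the connector does rely on such a property, a Spark `HadoopRDD` task would need either that property supplied or a fallback on the connector side; which of the two applies here is for the maintainers to confirm.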

$ sbt run
[info] Loading project definition from /Users/rbraley/IdeaProjects/elasticsearch-spark-example/project
[info] Set current project to elasticsearch-spark-example (in build file:/Users/rbraley/IdeaProjects/elasticsearch-spark-example/)
[info] Running io.traintracks.elasticsearch.spark.example.SimpleApp
14/02/25 12:00:39 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/02/25 12:00:39 INFO Remoting: Starting remoting
14/02/25 12:00:39 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.2.88:50171]
14/02/25 12:00:39 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.2.88:50171]
14/02/25 12:00:39 INFO spark.SparkEnv: Registering BlockManagerMaster
14/02/25 12:00:39 INFO storage.DiskBlockManager: Created local directory at /var/folders/rn/p2d7mh016b34qvm47jybmg380000gn/T/spark-local-20140225120039-5926
14/02/25 12:00:39 INFO storage.MemoryStore: MemoryStore started with capacity 890.9 MB.
14/02/25 12:00:39 INFO network.ConnectionManager: Bound socket to port 50172 with id = ConnectionManagerId(192.168.2.88,50172)
14/02/25 12:00:40 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/02/25 12:00:40 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager 192.168.2.88:50172 with 890.9 MB RAM
14/02/25 12:00:40 INFO storage.BlockManagerMaster: Registered BlockManager
14/02/25 12:00:40 INFO spark.HttpServer: Starting HTTP Server
14/02/25 12:00:40 INFO server.Server: jetty-7.6.8.v20121106
14/02/25 12:00:40 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50173
14/02/25 12:00:40 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.2.88:50173
14/02/25 12:00:40 INFO spark.SparkEnv: Registering MapOutputTracker
14/02/25 12:00:40 INFO spark.HttpFileServer: HTTP File server directory is /var/folders/rn/p2d7mh016b34qvm47jybmg380000gn/T/spark-d9105b10-cb2a-4acc-9e43-58c947b7f5e5
14/02/25 12:00:40 INFO spark.HttpServer: Starting HTTP Server
14/02/25 12:00:40 INFO server.Server: jetty-7.6.8.v20121106
14/02/25 12:00:40 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50174
14/02/25 12:00:40 INFO server.Server: jetty-7.6.8.v20121106
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
14/02/25 12:00:40 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
14/02/25 12:00:40 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/02/25 12:00:40 INFO ui.SparkUI: Started Spark Web UI at http://192.168.2.88:4040
2014-02-25 12:00:40.450 java[974:6b0f] Unable to load realm info from SCDynamicStore
14/02/25 12:00:40 INFO storage.MemoryStore: ensureFreeSpace(32969) called with curMem=0, maxMem=934163251
14/02/25 12:00:40 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 32.2 KB, free 890.9 MB)
14/02/25 12:00:40 INFO mr.EsInputFormat: Discovered mapping {demo=[test=STRING]} for [demo/demo]
14/02/25 12:00:40 INFO mr.EsInputFormat: Created [5] shard-splits
14/02/25 12:00:40 INFO spark.SparkContext: Starting job: foreach at SimpleApp.scala:18
14/02/25 12:00:40 INFO scheduler.DAGScheduler: Got job 0 (foreach at SimpleApp.scala:18) with 5 output partitions (allowLocal=false)
14/02/25 12:00:40 INFO scheduler.DAGScheduler: Final stage: Stage 0 (foreach at SimpleApp.scala:18)
14/02/25 12:00:40 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/02/25 12:00:40 INFO scheduler.DAGScheduler: Missing parents: List()
14/02/25 12:00:40 INFO scheduler.DAGScheduler: Submitting Stage 0 (HadoopRDD[0] at hadoopRDD at SimpleApp.scala:16), which has no missing parents
14/02/25 12:00:41 INFO scheduler.DAGScheduler: Submitting 5 missing tasks from Stage 0 (HadoopRDD[0] at hadoopRDD at SimpleApp.scala:16)
14/02/25 12:00:41 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 5 tasks
14/02/25 12:00:41 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
14/02/25 12:00:41 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 2714 bytes in 7 ms
14/02/25 12:00:41 INFO executor.Executor: Running task ID 0
14/02/25 12:00:41 INFO storage.BlockManager: Found block broadcast_0 locally
14/02/25 12:00:41 INFO rdd.HadoopRDD: Input split: ShardInputSplit [node=[RYrePPv6SpGV-NOehFoEuw/Natchios, Elektra|127.0.0.1:9200],shard=0]
14/02/25 12:00:41 ERROR mr.EsInputFormat: Cannot determine task id - current properties are {fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem, mapred.task.cache.levels=2, hadoop.tmp.dir=/tmp/hadoop-${user.name}, hadoop.native.lib=true, map.sort.class=org.apache.hadoop.util.QuickSort, es.internal.mr.target.resource=demo/demo, ipc.client.idlethreshold=4000, mapred.system.dir=${hadoop.tmp.dir}/mapred/system, mapred.job.tracker.persist.jobstatus.hours=0, io.skip.checksum.errors=false, fs.default.name=file:///, mapred.cluster.reduce.memory.mb=-1, mapred.child.tmp=./tmp, fs.har.impl.disable.cache=true, es.internal.es.version=0.90.10, mapred.skip.reduce.max.skip.groups=0, mapred.heartbeats.in.second=100, mapred.tasktracker.dns.nameserver=default, io.sort.factor=10, mapred.task.timeout=600000, mapred.max.tracker.failures=4, hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.StandardSocketFactory, fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem, mapred.job.tracker.jobhistory.lru.cache.size=5, mapred.skip.map.auto.incr.proc.count=true, mapreduce.job.complete.cancel.delegation.tokens=true, io.mapfile.bloom.size=1048576, mapreduce.reduce.shuffle.connect.timeout=180000, mapred.jobtracker.blacklist.fault-timeout-window=180, tasktracker.http.threads=40, mapred.job.shuffle.merge.percent=0.66, fs.ftp.impl=org.apache.hadoop.fs.ftp.FTPFileSystem, io.bytes.per.checksum=512, mapred.output.compress=false, mapred.combine.recordsBeforeProgress=10000, mapred.healthChecker.script.timeout=600000, topology.node.switch.mapping.impl=org.apache.hadoop.net.ScriptBasedMapping, mapred.reduce.slowstart.completed.maps=0.05, mapred.reduce.max.attempts=4, es.ser.reader.value.class=org.elasticsearch.hadoop.mr.WritableValueReader, fs.ramfs.impl=org.apache.hadoop.fs.InMemoryFileSystem, mapred.skip.map.max.skip.records=0, mapred.cluster.map.memory.mb=-1, hadoop.security.group.mapping=org.apache.hadoop.security.ShellBasedUnixGroupsMapping, 
mapred.job.tracker.persist.jobstatus.dir=/jobtracker/jobsInfo, fs.s3.buffer.dir=${hadoop.tmp.dir}/s3, job.end.retry.attempts=0, fs.file.impl=org.apache.hadoop.fs.LocalFileSystem, mapred.local.dir.minspacestart=0, mapred.output.compression.type=RECORD, topology.script.number.args=100, io.mapfile.bloom.error.rate=0.005, mapred.cluster.max.reduce.memory.mb=-1, mapred.max.tracker.blacklists=4, mapred.task.profile.maps=0-2, mapred.userlog.retain.hours=24, mapred.job.tracker.persist.jobstatus.active=false, hadoop.security.authorization=false, local.cache.size=10737418240, mapred.map.tasks=2, mapred.min.split.size=0, mapred.child.java.opts=-Xmx200m, mapreduce.job.counters.limit=120, mapred.job.queue.name=default, mapred.job.tracker.retiredjobs.cache.size=1000, ipc.server.listen.queue.size=128, mapred.inmem.merge.threshold=1000, job.end.retry.interval=30000, mapreduce.tasktracker.outofband.heartbeat.damper=1000000, mapred.skip.attempts.to.start.skipping=2, fs.checkpoint.dir=${hadoop.tmp.dir}/dfs/namesecondary, mapred.reduce.tasks=1, mapred.merge.recordsBeforeProgress=10000, mapred.userlog.limit.kb=0, mapred.job.reduce.memory.mb=-1, webinterface.private.actions=false, hadoop.security.token.service.use_ip=true, io.sort.spill.percent=0.80, mapred.job.shuffle.input.buffer.percent=0.70, mapred.map.tasks.speculative.execution=true, hadoop.util.hash.type=murmur, mapred.map.max.attempts=4, mapreduce.job.acl-view-job= , mapred.job.tracker.handler.count=10, mapreduce.reduce.shuffle.read.timeout=180000, mapred.tasktracker.expiry.interval=600000, mapred.jobtracker.job.history.block.size=3145728, mapred.jobtracker.maxtasks.per.job=-1, keep.failed.task.files=false, ipc.client.tcpnodelay=false, mapred.task.profile.reduces=0-2, mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, io.map.index.skip=0, ipc.server.tcpnodelay=false, mapred.jobtracker.blacklist.fault-bucket-width=15, es.resource=demo/demo, mapred.job.map.memory.mb=-1, hadoop.logfile.size=10000000, 
mapred.reduce.tasks.speculative.execution=true, mapreduce.tasktracker.outofband.heartbeat=false, mapreduce.reduce.input.limit=-1, hadoop.security.authentication=simple, fs.checkpoint.period=3600, mapred.job.reuse.jvm.num.tasks=1, mapred.jobtracker.completeuserjobs.maximum=100, mapred.task.tracker.task-controller=org.apache.hadoop.mapred.DefaultTaskController, fs.s3.maxRetries=4, mapred.cluster.max.map.memory.mb=-1, mapreduce.reduce.shuffle.maxfetchfailures=10, mapreduce.job.acl-modify-job= , fs.hftp.impl=org.apache.hadoop.hdfs.HftpFileSystem, mapred.local.dir=${hadoop.tmp.dir}/mapred/local, fs.s3.sleepTimeSeconds=10, fs.trash.interval=0, mapred.submit.replication=10, fs.har.impl=org.apache.hadoop.fs.HarFileSystem, mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, mapred.tasktracker.dns.interface=default, mapred.job.tracker=local, io.seqfile.sorter.recordlimit=1000000, mapred.line.input.format.linespermap=1, mapred.jobtracker.taskScheduler=org.apache.hadoop.mapred.JobQueueTaskScheduler, fs.webhdfs.impl=org.apache.hadoop.hdfs.web.WebHdfsFileSystem, mapred.local.dir.minspacekill=0, io.sort.record.percent=0.05, fs.kfs.impl=org.apache.hadoop.fs.kfs.KosmosFileSystem, mapred.temp.dir=${hadoop.tmp.dir}/mapred/temp, mapred.tasktracker.reduce.tasks.maximum=2, fs.checkpoint.edits.dir=${fs.checkpoint.dir}, mapred.tasktracker.tasks.sleeptime-before-sigkill=5000, mapred.job.reduce.input.buffer.percent=0.0, mapred.tasktracker.indexcache.mb=10, es.internal.hosts=, mapreduce.job.split.metainfo.maxsize=10000000, hadoop.logfile.count=10, mapred.skip.reduce.auto.incr.proc.count=true, io.seqfile.compress.blocksize=1000000, fs.s3.block.size=67108864, mapred.tasktracker.taskmemorymanager.monitoring-interval=5000, mapred.acls.enabled=false, mapred.queue.default.state=RUNNING, mapreduce.jobtracker.staging.root.dir=${hadoop.tmp.dir}/mapred/staging, mapred.queue.names=default, fs.hsftp.impl=org.apache.hadoop.hdfs.HsftpFileSystem, 
mapred.task.tracker.http.address=0.0.0.0:50060, mapred.reduce.parallel.copies=5, io.seqfile.lazydecompress=true, io.sort.mb=100, ipc.client.connection.maxidletime=10000, mapred.task.tracker.report.address=127.0.0.1:0, mapred.compress.map.output=false, hadoop.security.uid.cache.secs=14400, mapred.healthChecker.interval=60000, ipc.client.kill.max=10, ipc.client.connect.max.retries=10, fs.s3.impl=org.apache.hadoop.fs.s3.S3FileSystem, mapred.user.jobconf.limit=5242880, mapred.job.tracker.http.address=0.0.0.0:50030, io.file.buffer.size=4096, mapred.jobtracker.restart.recover=false, io.serializations=org.apache.hadoop.io.serializer.WritableSerialization, mapred.task.profile=false, jobclient.output.filter=FAILED, mapred.tasktracker.map.tasks.maximum=2, io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec, fs.checkpoint.size=67108864}
14/02/25 12:00:41 ERROR executor.Executor: Exception in task ID 0
java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker
    at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:38)
    at org.elasticsearch.hadoop.mr.HeartBeat.<init>(HeartBeat.java:57)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.init(EsInputFormat.java:204)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.<init>(EsInputFormat.java:167)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.<init>(EsInputFormat.java:328)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:449)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:66)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/02/25 12:00:41 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)
14/02/25 12:00:41 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 2714 bytes in 0 ms
14/02/25 12:00:41 INFO executor.Executor: Running task ID 1
14/02/25 12:00:41 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
14/02/25 12:00:41 INFO storage.BlockManager: Found block broadcast_0 locally
14/02/25 12:00:41 INFO rdd.HadoopRDD: Input split: ShardInputSplit [node=[RYrePPv6SpGV-NOehFoEuw/Natchios, Elektra|127.0.0.1:9200],shard=1]
14/02/25 12:00:41 ERROR mr.EsInputFormat: Cannot determine task id - current properties are {fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem, mapred.task.cache.levels=2, hadoop.tmp.dir=/tmp/hadoop-${user.name}, hadoop.native.lib=true, map.sort.class=org.apache.hadoop.util.QuickSort, es.internal.mr.target.resource=demo/demo, ipc.client.idlethreshold=4000, mapred.system.dir=${hadoop.tmp.dir}/mapred/system, mapred.job.tracker.persist.jobstatus.hours=0, io.skip.checksum.errors=false, fs.default.name=file:///, mapred.cluster.reduce.memory.mb=-1, mapred.child.tmp=./tmp, fs.har.impl.disable.cache=true, es.internal.es.version=0.90.10, mapred.skip.reduce.max.skip.groups=0, mapred.heartbeats.in.second=100, mapred.tasktracker.dns.nameserver=default, io.sort.factor=10, mapred.task.timeout=600000, mapred.max.tracker.failures=4, hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.StandardSocketFactory, fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem, mapred.job.tracker.jobhistory.lru.cache.size=5, mapred.skip.map.auto.incr.proc.count=true, mapreduce.job.complete.cancel.delegation.tokens=true, io.mapfile.bloom.size=1048576, mapreduce.reduce.shuffle.connect.timeout=180000, mapred.jobtracker.blacklist.fault-timeout-window=180, tasktracker.http.threads=40, mapred.job.shuffle.merge.percent=0.66, fs.ftp.impl=org.apache.hadoop.fs.ftp.FTPFileSystem, io.bytes.per.checksum=512, mapred.output.compress=false, mapred.combine.recordsBeforeProgress=10000, mapred.healthChecker.script.timeout=600000, topology.node.switch.mapping.impl=org.apache.hadoop.net.ScriptBasedMapping, mapred.reduce.slowstart.completed.maps=0.05, mapred.reduce.max.attempts=4, es.ser.reader.value.class=org.elasticsearch.hadoop.mr.WritableValueReader, fs.ramfs.impl=org.apache.hadoop.fs.InMemoryFileSystem, mapred.skip.map.max.skip.records=0, mapred.cluster.map.memory.mb=-1, hadoop.security.group.mapping=org.apache.hadoop.security.ShellBasedUnixGroupsMapping, 
mapred.job.tracker.persist.jobstatus.dir=/jobtracker/jobsInfo, fs.s3.buffer.dir=${hadoop.tmp.dir}/s3, job.end.retry.attempts=0, fs.file.impl=org.apache.hadoop.fs.LocalFileSystem, mapred.local.dir.minspacestart=0, mapred.output.compression.type=RECORD, topology.script.number.args=100, io.mapfile.bloom.error.rate=0.005, mapred.cluster.max.reduce.memory.mb=-1, mapred.max.tracker.blacklists=4, mapred.task.profile.maps=0-2, mapred.userlog.retain.hours=24, mapred.job.tracker.persist.jobstatus.active=false, hadoop.security.authorization=false, local.cache.size=10737418240, mapred.map.tasks=2, mapred.min.split.size=0, mapred.child.java.opts=-Xmx200m, mapreduce.job.counters.limit=120, mapred.job.queue.name=default, mapred.job.tracker.retiredjobs.cache.size=1000, ipc.server.listen.queue.size=128, mapred.inmem.merge.threshold=1000, job.end.retry.interval=30000, mapreduce.tasktracker.outofband.heartbeat.damper=1000000, mapred.skip.attempts.to.start.skipping=2, fs.checkpoint.dir=${hadoop.tmp.dir}/dfs/namesecondary, mapred.reduce.tasks=1, mapred.merge.recordsBeforeProgress=10000, mapred.userlog.limit.kb=0, mapred.job.reduce.memory.mb=-1, webinterface.private.actions=false, hadoop.security.token.service.use_ip=true, io.sort.spill.percent=0.80, mapred.job.shuffle.input.buffer.percent=0.70, mapred.map.tasks.speculative.execution=true, hadoop.util.hash.type=murmur, mapred.map.max.attempts=4, mapreduce.job.acl-view-job= , mapred.job.tracker.handler.count=10, mapreduce.reduce.shuffle.read.timeout=180000, mapred.tasktracker.expiry.interval=600000, mapred.jobtracker.job.history.block.size=3145728, mapred.jobtracker.maxtasks.per.job=-1, keep.failed.task.files=false, ipc.client.tcpnodelay=false, mapred.task.profile.reduces=0-2, mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, io.map.index.skip=0, ipc.server.tcpnodelay=false, mapred.jobtracker.blacklist.fault-bucket-width=15, es.resource=demo/demo, mapred.job.map.memory.mb=-1, hadoop.logfile.size=10000000, 
mapred.reduce.tasks.speculative.execution=true, mapreduce.tasktracker.outofband.heartbeat=false, mapreduce.reduce.input.limit=-1, hadoop.security.authentication=simple, fs.checkpoint.period=3600, mapred.job.reuse.jvm.num.tasks=1, mapred.jobtracker.completeuserjobs.maximum=100, mapred.task.tracker.task-controller=org.apache.hadoop.mapred.DefaultTaskController, fs.s3.maxRetries=4, mapred.cluster.max.map.memory.mb=-1, mapreduce.reduce.shuffle.maxfetchfailures=10, mapreduce.job.acl-modify-job= , fs.hftp.impl=org.apache.hadoop.hdfs.HftpFileSystem, mapred.local.dir=${hadoop.tmp.dir}/mapred/local, fs.s3.sleepTimeSeconds=10, fs.trash.interval=0, mapred.submit.replication=10, fs.har.impl=org.apache.hadoop.fs.HarFileSystem, mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, mapred.tasktracker.dns.interface=default, mapred.job.tracker=local, io.seqfile.sorter.recordlimit=1000000, mapred.line.input.format.linespermap=1, mapred.jobtracker.taskScheduler=org.apache.hadoop.mapred.JobQueueTaskScheduler, fs.webhdfs.impl=org.apache.hadoop.hdfs.web.WebHdfsFileSystem, mapred.local.dir.minspacekill=0, io.sort.record.percent=0.05, fs.kfs.impl=org.apache.hadoop.fs.kfs.KosmosFileSystem, mapred.temp.dir=${hadoop.tmp.dir}/mapred/temp, mapred.tasktracker.reduce.tasks.maximum=2, fs.checkpoint.edits.dir=${fs.checkpoint.dir}, mapred.tasktracker.tasks.sleeptime-before-sigkill=5000, mapred.job.reduce.input.buffer.percent=0.0, mapred.tasktracker.indexcache.mb=10, es.internal.hosts=, mapreduce.job.split.metainfo.maxsize=10000000, hadoop.logfile.count=10, mapred.skip.reduce.auto.incr.proc.count=true, io.seqfile.compress.blocksize=1000000, fs.s3.block.size=67108864, mapred.tasktracker.taskmemorymanager.monitoring-interval=5000, mapred.acls.enabled=false, mapred.queue.default.state=RUNNING, mapreduce.jobtracker.staging.root.dir=${hadoop.tmp.dir}/mapred/staging, mapred.queue.names=default, fs.hsftp.impl=org.apache.hadoop.hdfs.HsftpFileSystem, 
mapred.task.tracker.http.address=0.0.0.0:50060, mapred.reduce.parallel.copies=5, io.seqfile.lazydecompress=true, io.sort.mb=100, ipc.client.connection.maxidletime=10000, mapred.task.tracker.report.address=127.0.0.1:0, mapred.compress.map.output=false, hadoop.security.uid.cache.secs=14400, mapred.healthChecker.interval=60000, ipc.client.kill.max=10, ipc.client.connect.max.retries=10, fs.s3.impl=org.apache.hadoop.fs.s3.S3FileSystem, mapred.user.jobconf.limit=5242880, mapred.job.tracker.http.address=0.0.0.0:50030, io.file.buffer.size=4096, mapred.jobtracker.restart.recover=false, io.serializations=org.apache.hadoop.io.serializer.WritableSerialization, mapred.task.profile=false, jobclient.output.filter=FAILED, mapred.tasktracker.map.tasks.maximum=2, io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec, fs.checkpoint.size=67108864}
14/02/25 12:00:41 ERROR executor.Executor: Exception in task ID 1
java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker
    at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:38)
    at org.elasticsearch.hadoop.mr.HeartBeat.<init>(HeartBeat.java:57)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.init(EsInputFormat.java:204)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.<init>(EsInputFormat.java:167)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.<init>(EsInputFormat.java:328)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:449)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:66)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/02/25 12:00:41 WARN scheduler.TaskSetManager: Loss was due to java.lang.IllegalArgumentException
java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker
    at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:38)
    at org.elasticsearch.hadoop.mr.HeartBeat.<init>(HeartBeat.java:57)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.init(EsInputFormat.java:204)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.<init>(EsInputFormat.java:167)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.<init>(EsInputFormat.java:328)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:449)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:66)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/02/25 12:00:41 ERROR scheduler.TaskSetManager: Task 0.0:0 failed 1 times; aborting job
14/02/25 12:00:41 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 0.0 from pool
14/02/25 12:00:41 INFO scheduler.TaskSchedulerImpl: Ignoring update with state RUNNING from TID 1 because its task set is gone
14/02/25 12:00:41 INFO scheduler.TaskSchedulerImpl: Ignoring update with state FAILED from TID 1 because its task set is gone
14/02/25 12:00:41 INFO scheduler.DAGScheduler: Failed to run foreach at SimpleApp.scala:18
[error] (run-main) org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker)
org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[trace] Stack trace suppressed: run last compile:run for the full output.
14/02/25 12:00:41 INFO network.ConnectionManager: Selector thread was interrupted!
java.lang.RuntimeException: Nonzero exit code: 1
    at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 3 s, completed Feb 25, 2014 12:00:41 PM
rbraley at Ryans-MacBook-Pro in ~/IdeaProjects/elasticsearch-spark-example on master*
$ sbt package
[info] Loading project definition from /Users/rbraley/IdeaProjects/elasticsearch-spark-example/project
[info] Set current project to elasticsearch-spark-example (in build file:/Users/rbraley/IdeaProjects/elasticsearch-spark-example/)
[success] Total time: 0 s, completed Feb 25, 2014 12:01:11 PM
rbraley at Ryans-MacBook-Pro in ~/IdeaProjects/elasticsearch-spark-example on master*
$ sbt run
[info] Loading project definition from /Users/rbraley/IdeaProjects/elasticsearch-spark-example/project
[info] Set current project to elasticsearch-spark-example (in build file:/Users/rbraley/IdeaProjects/elasticsearch-spark-example/)
[info] Running io.traintracks.elasticsearch.spark.example.SimpleApp
14/02/25 12:01:17 INFO slf4j.Slf4jLogger: Slf4jLogger started
14/02/25 12:01:17 INFO Remoting: Starting remoting
14/02/25 12:01:17 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@192.168.2.88:50178]
14/02/25 12:01:17 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@192.168.2.88:50178]
14/02/25 12:01:17 INFO spark.SparkEnv: Registering BlockManagerMaster
14/02/25 12:01:17 INFO storage.DiskBlockManager: Created local directory at /var/folders/rn/p2d7mh016b34qvm47jybmg380000gn/T/spark-local-20140225120117-e711
14/02/25 12:01:17 INFO storage.MemoryStore: MemoryStore started with capacity 890.9 MB.
14/02/25 12:01:17 INFO network.ConnectionManager: Bound socket to port 50179 with id = ConnectionManagerId(192.168.2.88,50179)
14/02/25 12:01:17 INFO storage.BlockManagerMaster: Trying to register BlockManager
14/02/25 12:01:17 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager 192.168.2.88:50179 with 890.9 MB RAM
14/02/25 12:01:17 INFO storage.BlockManagerMaster: Registered BlockManager
14/02/25 12:01:17 INFO spark.HttpServer: Starting HTTP Server
14/02/25 12:01:18 INFO server.Server: jetty-7.6.8.v20121106
14/02/25 12:01:18 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50180
14/02/25 12:01:18 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.2.88:50180
14/02/25 12:01:18 INFO spark.SparkEnv: Registering MapOutputTracker
14/02/25 12:01:18 INFO spark.HttpFileServer: HTTP File server directory is /var/folders/rn/p2d7mh016b34qvm47jybmg380000gn/T/spark-f407ba7c-5455-476d-8cfe-250296e45668
14/02/25 12:01:18 INFO spark.HttpServer: Starting HTTP Server
14/02/25 12:01:18 INFO server.Server: jetty-7.6.8.v20121106
14/02/25 12:01:18 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50181
14/02/25 12:01:18 INFO server.Server: jetty-7.6.8.v20121106
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
14/02/25 12:01:18 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
14/02/25 12:01:18 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
14/02/25 12:01:18 INFO ui.SparkUI: Started Spark Web UI at http://192.168.2.88:4040
2014-02-25 12:01:18.391 java[1012:690f] Unable to load realm info from SCDynamicStore
14/02/25 12:01:18 INFO storage.MemoryStore: ensureFreeSpace(32969) called with curMem=0, maxMem=934163251
14/02/25 12:01:18 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 32.2 KB, free 890.9 MB)
14/02/25 12:01:18 INFO mr.EsInputFormat: Discovered mapping {demo=[test=STRING]} for [demo/demo]
14/02/25 12:01:18 INFO mr.EsInputFormat: Created [5] shard-splits
14/02/25 12:01:18 INFO spark.SparkContext: Starting job: foreach at SimpleApp.scala:18
14/02/25 12:01:18 INFO scheduler.DAGScheduler: Got job 0 (foreach at SimpleApp.scala:18) with 5 output partitions (allowLocal=false)
14/02/25 12:01:18 INFO scheduler.DAGScheduler: Final stage: Stage 0 (foreach at SimpleApp.scala:18)
14/02/25 12:01:18 INFO scheduler.DAGScheduler: Parents of final stage: List()
14/02/25 12:01:18 INFO scheduler.DAGScheduler: Missing parents: List()
14/02/25 12:01:18 INFO scheduler.DAGScheduler: Submitting Stage 0 (HadoopRDD[0] at hadoopRDD at SimpleApp.scala:16), which has no missing parents
14/02/25 12:01:18 INFO scheduler.DAGScheduler: Submitting 5 missing tasks from Stage 0 (HadoopRDD[0] at hadoopRDD at SimpleApp.scala:16)
14/02/25 12:01:18 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 5 tasks
14/02/25 12:01:18 INFO scheduler.TaskSetManager: Starting task 0.0:0 as TID 0 on executor localhost: localhost (PROCESS_LOCAL)
14/02/25 12:01:18 INFO scheduler.TaskSetManager: Serialized task 0.0:0 as 2714 bytes in 7 ms
14/02/25 12:01:18 INFO executor.Executor: Running task ID 0
14/02/25 12:01:18 INFO storage.BlockManager: Found block broadcast_0 locally
14/02/25 12:01:18 INFO rdd.HadoopRDD: Input split: ShardInputSplit [node=[RYrePPv6SpGV-NOehFoEuw/Natchios, Elektra|127.0.0.1:9200],shard=0]
14/02/25 12:01:19 ERROR mr.EsInputFormat: Cannot determine task id - current properties are {fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem, mapred.task.cache.levels=2, hadoop.tmp.dir=/tmp/hadoop-${user.name}, hadoop.native.lib=true, map.sort.class=org.apache.hadoop.util.QuickSort, es.internal.mr.target.resource=demo/demo, ipc.client.idlethreshold=4000, mapred.system.dir=${hadoop.tmp.dir}/mapred/system, mapred.job.tracker.persist.jobstatus.hours=0, io.skip.checksum.errors=false, fs.default.name=file:///, mapred.cluster.reduce.memory.mb=-1, mapred.child.tmp=./tmp, fs.har.impl.disable.cache=true, es.internal.es.version=0.90.10, mapred.skip.reduce.max.skip.groups=0, mapred.heartbeats.in.second=100, mapred.tasktracker.dns.nameserver=default, io.sort.factor=10, mapred.task.timeout=600000, mapred.max.tracker.failures=4, hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.StandardSocketFactory, fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem, mapred.job.tracker.jobhistory.lru.cache.size=5, mapred.skip.map.auto.incr.proc.count=true, mapreduce.job.complete.cancel.delegation.tokens=true, io.mapfile.bloom.size=1048576, mapreduce.reduce.shuffle.connect.timeout=180000, mapred.jobtracker.blacklist.fault-timeout-window=180, tasktracker.http.threads=40, mapred.job.shuffle.merge.percent=0.66, fs.ftp.impl=org.apache.hadoop.fs.ftp.FTPFileSystem, io.bytes.per.checksum=512, mapred.output.compress=false, mapred.combine.recordsBeforeProgress=10000, mapred.healthChecker.script.timeout=600000, topology.node.switch.mapping.impl=org.apache.hadoop.net.ScriptBasedMapping, mapred.reduce.slowstart.completed.maps=0.05, mapred.reduce.max.attempts=4, es.ser.reader.value.class=org.elasticsearch.hadoop.mr.WritableValueReader, fs.ramfs.impl=org.apache.hadoop.fs.InMemoryFileSystem, mapred.skip.map.max.skip.records=0, mapred.cluster.map.memory.mb=-1, hadoop.security.group.mapping=org.apache.hadoop.security.ShellBasedUnixGroupsMapping, 
mapred.job.tracker.persist.jobstatus.dir=/jobtracker/jobsInfo, fs.s3.buffer.dir=${hadoop.tmp.dir}/s3, job.end.retry.attempts=0, fs.file.impl=org.apache.hadoop.fs.LocalFileSystem, mapred.local.dir.minspacestart=0, mapred.output.compression.type=RECORD, topology.script.number.args=100, io.mapfile.bloom.error.rate=0.005, mapred.cluster.max.reduce.memory.mb=-1, mapred.max.tracker.blacklists=4, mapred.task.profile.maps=0-2, mapred.userlog.retain.hours=24, mapred.job.tracker.persist.jobstatus.active=false, hadoop.security.authorization=false, local.cache.size=10737418240, mapred.map.tasks=2, mapred.min.split.size=0, mapred.child.java.opts=-Xmx200m, mapreduce.job.counters.limit=120, mapred.job.queue.name=default, mapred.job.tracker.retiredjobs.cache.size=1000, ipc.server.listen.queue.size=128, mapred.inmem.merge.threshold=1000, job.end.retry.interval=30000, mapreduce.tasktracker.outofband.heartbeat.damper=1000000, mapred.skip.attempts.to.start.skipping=2, fs.checkpoint.dir=${hadoop.tmp.dir}/dfs/namesecondary, mapred.reduce.tasks=1, mapred.merge.recordsBeforeProgress=10000, mapred.userlog.limit.kb=0, mapred.job.reduce.memory.mb=-1, webinterface.private.actions=false, hadoop.security.token.service.use_ip=true, io.sort.spill.percent=0.80, mapred.job.shuffle.input.buffer.percent=0.70, mapred.map.tasks.speculative.execution=true, hadoop.util.hash.type=murmur, mapred.map.max.attempts=4, mapreduce.job.acl-view-job= , mapred.job.tracker.handler.count=10, mapreduce.reduce.shuffle.read.timeout=180000, mapred.tasktracker.expiry.interval=600000, mapred.jobtracker.job.history.block.size=3145728, mapred.jobtracker.maxtasks.per.job=-1, keep.failed.task.files=false, ipc.client.tcpnodelay=false, mapred.task.profile.reduces=0-2, mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, io.map.index.skip=0, ipc.server.tcpnodelay=false, mapred.jobtracker.blacklist.fault-bucket-width=15, es.resource=demo/demo, mapred.job.map.memory.mb=-1, hadoop.logfile.size=10000000, 
mapred.reduce.tasks.speculative.execution=true, mapreduce.tasktracker.outofband.heartbeat=false, mapreduce.reduce.input.limit=-1, hadoop.security.authentication=simple, fs.checkpoint.period=3600, mapred.job.reuse.jvm.num.tasks=1, mapred.jobtracker.completeuserjobs.maximum=100, mapred.task.tracker.task-controller=org.apache.hadoop.mapred.DefaultTaskController, fs.s3.maxRetries=4, mapred.cluster.max.map.memory.mb=-1, mapreduce.reduce.shuffle.maxfetchfailures=10, mapreduce.job.acl-modify-job= , fs.hftp.impl=org.apache.hadoop.hdfs.HftpFileSystem, mapred.local.dir=${hadoop.tmp.dir}/mapred/local, fs.s3.sleepTimeSeconds=10, fs.trash.interval=0, mapred.submit.replication=10, fs.har.impl=org.apache.hadoop.fs.HarFileSystem, mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, mapred.tasktracker.dns.interface=default, mapred.job.tracker=local, io.seqfile.sorter.recordlimit=1000000, mapred.line.input.format.linespermap=1, mapred.jobtracker.taskScheduler=org.apache.hadoop.mapred.JobQueueTaskScheduler, fs.webhdfs.impl=org.apache.hadoop.hdfs.web.WebHdfsFileSystem, mapred.local.dir.minspacekill=0, io.sort.record.percent=0.05, fs.kfs.impl=org.apache.hadoop.fs.kfs.KosmosFileSystem, mapred.temp.dir=${hadoop.tmp.dir}/mapred/temp, mapred.tasktracker.reduce.tasks.maximum=2, fs.checkpoint.edits.dir=${fs.checkpoint.dir}, mapred.tasktracker.tasks.sleeptime-before-sigkill=5000, mapred.job.reduce.input.buffer.percent=0.0, mapred.tasktracker.indexcache.mb=10, es.internal.hosts=, mapreduce.job.split.metainfo.maxsize=10000000, hadoop.logfile.count=10, mapred.skip.reduce.auto.incr.proc.count=true, io.seqfile.compress.blocksize=1000000, fs.s3.block.size=67108864, mapred.tasktracker.taskmemorymanager.monitoring-interval=5000, mapred.acls.enabled=false, mapred.queue.default.state=RUNNING, mapreduce.jobtracker.staging.root.dir=${hadoop.tmp.dir}/mapred/staging, mapred.queue.names=default, fs.hsftp.impl=org.apache.hadoop.hdfs.HsftpFileSystem, 
mapred.task.tracker.http.address=0.0.0.0:50060, mapred.reduce.parallel.copies=5, io.seqfile.lazydecompress=true, io.sort.mb=100, ipc.client.connection.maxidletime=10000, mapred.task.tracker.report.address=127.0.0.1:0, mapred.compress.map.output=false, hadoop.security.uid.cache.secs=14400, mapred.healthChecker.interval=60000, ipc.client.kill.max=10, ipc.client.connect.max.retries=10, fs.s3.impl=org.apache.hadoop.fs.s3.S3FileSystem, mapred.user.jobconf.limit=5242880, mapred.job.tracker.http.address=0.0.0.0:50030, io.file.buffer.size=4096, mapred.jobtracker.restart.recover=false, io.serializations=org.apache.hadoop.io.serializer.WritableSerialization, mapred.task.profile=false, jobclient.output.filter=FAILED, mapred.tasktracker.map.tasks.maximum=2, io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec, fs.checkpoint.size=67108864}
14/02/25 12:01:19 ERROR executor.Executor: Exception in task ID 0
java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker
    at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:38)
    at org.elasticsearch.hadoop.mr.HeartBeat.<init>(HeartBeat.java:57)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.init(EsInputFormat.java:204)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.<init>(EsInputFormat.java:167)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.<init>(EsInputFormat.java:328)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:449)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:66)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/02/25 12:01:19 INFO scheduler.TaskSetManager: Starting task 0.0:1 as TID 1 on executor localhost: localhost (PROCESS_LOCAL)
14/02/25 12:01:19 INFO scheduler.TaskSetManager: Serialized task 0.0:1 as 2714 bytes in 0 ms
14/02/25 12:01:19 INFO executor.Executor: Running task ID 1
14/02/25 12:01:19 WARN scheduler.TaskSetManager: Lost TID 0 (task 0.0:0)
14/02/25 12:01:19 INFO storage.BlockManager: Found block broadcast_0 locally
14/02/25 12:01:19 INFO rdd.HadoopRDD: Input split: ShardInputSplit [node=[RYrePPv6SpGV-NOehFoEuw/Natchios, Elektra|127.0.0.1:9200],shard=1]
14/02/25 12:01:19 ERROR mr.EsInputFormat: Cannot determine task id - current properties are {fs.s3n.impl=org.apache.hadoop.fs.s3native.NativeS3FileSystem, mapred.task.cache.levels=2, hadoop.tmp.dir=/tmp/hadoop-${user.name}, hadoop.native.lib=true, map.sort.class=org.apache.hadoop.util.QuickSort, es.internal.mr.target.resource=demo/demo, ipc.client.idlethreshold=4000, mapred.system.dir=${hadoop.tmp.dir}/mapred/system, mapred.job.tracker.persist.jobstatus.hours=0, io.skip.checksum.errors=false, fs.default.name=file:///, mapred.cluster.reduce.memory.mb=-1, mapred.child.tmp=./tmp, fs.har.impl.disable.cache=true, es.internal.es.version=0.90.10, mapred.skip.reduce.max.skip.groups=0, mapred.heartbeats.in.second=100, mapred.tasktracker.dns.nameserver=default, io.sort.factor=10, mapred.task.timeout=600000, mapred.max.tracker.failures=4, hadoop.rpc.socket.factory.class.default=org.apache.hadoop.net.StandardSocketFactory, fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem, mapred.job.tracker.jobhistory.lru.cache.size=5, mapred.skip.map.auto.incr.proc.count=true, mapreduce.job.complete.cancel.delegation.tokens=true, io.mapfile.bloom.size=1048576, mapreduce.reduce.shuffle.connect.timeout=180000, mapred.jobtracker.blacklist.fault-timeout-window=180, tasktracker.http.threads=40, mapred.job.shuffle.merge.percent=0.66, fs.ftp.impl=org.apache.hadoop.fs.ftp.FTPFileSystem, io.bytes.per.checksum=512, mapred.output.compress=false, mapred.combine.recordsBeforeProgress=10000, mapred.healthChecker.script.timeout=600000, topology.node.switch.mapping.impl=org.apache.hadoop.net.ScriptBasedMapping, mapred.reduce.slowstart.completed.maps=0.05, mapred.reduce.max.attempts=4, es.ser.reader.value.class=org.elasticsearch.hadoop.mr.WritableValueReader, fs.ramfs.impl=org.apache.hadoop.fs.InMemoryFileSystem, mapred.skip.map.max.skip.records=0, mapred.cluster.map.memory.mb=-1, hadoop.security.group.mapping=org.apache.hadoop.security.ShellBasedUnixGroupsMapping, 
mapred.job.tracker.persist.jobstatus.dir=/jobtracker/jobsInfo, fs.s3.buffer.dir=${hadoop.tmp.dir}/s3, job.end.retry.attempts=0, fs.file.impl=org.apache.hadoop.fs.LocalFileSystem, mapred.local.dir.minspacestart=0, mapred.output.compression.type=RECORD, topology.script.number.args=100, io.mapfile.bloom.error.rate=0.005, mapred.cluster.max.reduce.memory.mb=-1, mapred.max.tracker.blacklists=4, mapred.task.profile.maps=0-2, mapred.userlog.retain.hours=24, mapred.job.tracker.persist.jobstatus.active=false, hadoop.security.authorization=false, local.cache.size=10737418240, mapred.map.tasks=2, mapred.min.split.size=0, mapred.child.java.opts=-Xmx200m, mapreduce.job.counters.limit=120, mapred.job.queue.name=default, mapred.job.tracker.retiredjobs.cache.size=1000, ipc.server.listen.queue.size=128, mapred.inmem.merge.threshold=1000, job.end.retry.interval=30000, mapreduce.tasktracker.outofband.heartbeat.damper=1000000, mapred.skip.attempts.to.start.skipping=2, fs.checkpoint.dir=${hadoop.tmp.dir}/dfs/namesecondary, mapred.reduce.tasks=1, mapred.merge.recordsBeforeProgress=10000, mapred.userlog.limit.kb=0, mapred.job.reduce.memory.mb=-1, webinterface.private.actions=false, hadoop.security.token.service.use_ip=true, io.sort.spill.percent=0.80, mapred.job.shuffle.input.buffer.percent=0.70, mapred.map.tasks.speculative.execution=true, hadoop.util.hash.type=murmur, mapred.map.max.attempts=4, mapreduce.job.acl-view-job= , mapred.job.tracker.handler.count=10, mapreduce.reduce.shuffle.read.timeout=180000, mapred.tasktracker.expiry.interval=600000, mapred.jobtracker.job.history.block.size=3145728, mapred.jobtracker.maxtasks.per.job=-1, keep.failed.task.files=false, ipc.client.tcpnodelay=false, mapred.task.profile.reduces=0-2, mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, io.map.index.skip=0, ipc.server.tcpnodelay=false, mapred.jobtracker.blacklist.fault-bucket-width=15, es.resource=demo/demo, mapred.job.map.memory.mb=-1, hadoop.logfile.size=10000000, 
mapred.reduce.tasks.speculative.execution=true, mapreduce.tasktracker.outofband.heartbeat=false, mapreduce.reduce.input.limit=-1, hadoop.security.authentication=simple, fs.checkpoint.period=3600, mapred.job.reuse.jvm.num.tasks=1, mapred.jobtracker.completeuserjobs.maximum=100, mapred.task.tracker.task-controller=org.apache.hadoop.mapred.DefaultTaskController, fs.s3.maxRetries=4, mapred.cluster.max.map.memory.mb=-1, mapreduce.reduce.shuffle.maxfetchfailures=10, mapreduce.job.acl-modify-job= , fs.hftp.impl=org.apache.hadoop.hdfs.HftpFileSystem, mapred.local.dir=${hadoop.tmp.dir}/mapred/local, fs.s3.sleepTimeSeconds=10, fs.trash.interval=0, mapred.submit.replication=10, fs.har.impl=org.apache.hadoop.fs.HarFileSystem, mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, mapred.tasktracker.dns.interface=default, mapred.job.tracker=local, io.seqfile.sorter.recordlimit=1000000, mapred.line.input.format.linespermap=1, mapred.jobtracker.taskScheduler=org.apache.hadoop.mapred.JobQueueTaskScheduler, fs.webhdfs.impl=org.apache.hadoop.hdfs.web.WebHdfsFileSystem, mapred.local.dir.minspacekill=0, io.sort.record.percent=0.05, fs.kfs.impl=org.apache.hadoop.fs.kfs.KosmosFileSystem, mapred.temp.dir=${hadoop.tmp.dir}/mapred/temp, mapred.tasktracker.reduce.tasks.maximum=2, fs.checkpoint.edits.dir=${fs.checkpoint.dir}, mapred.tasktracker.tasks.sleeptime-before-sigkill=5000, mapred.job.reduce.input.buffer.percent=0.0, mapred.tasktracker.indexcache.mb=10, es.internal.hosts=, mapreduce.job.split.metainfo.maxsize=10000000, hadoop.logfile.count=10, mapred.skip.reduce.auto.incr.proc.count=true, io.seqfile.compress.blocksize=1000000, fs.s3.block.size=67108864, mapred.tasktracker.taskmemorymanager.monitoring-interval=5000, mapred.acls.enabled=false, mapred.queue.default.state=RUNNING, mapreduce.jobtracker.staging.root.dir=${hadoop.tmp.dir}/mapred/staging, mapred.queue.names=default, fs.hsftp.impl=org.apache.hadoop.hdfs.HsftpFileSystem, 
mapred.task.tracker.http.address=0.0.0.0:50060, mapred.reduce.parallel.copies=5, io.seqfile.lazydecompress=true, io.sort.mb=100, ipc.client.connection.maxidletime=10000, mapred.task.tracker.report.address=127.0.0.1:0, mapred.compress.map.output=false, hadoop.security.uid.cache.secs=14400, mapred.healthChecker.interval=60000, ipc.client.kill.max=10, ipc.client.connect.max.retries=10, fs.s3.impl=org.apache.hadoop.fs.s3.S3FileSystem, mapred.user.jobconf.limit=5242880, mapred.job.tracker.http.address=0.0.0.0:50030, io.file.buffer.size=4096, mapred.jobtracker.restart.recover=false, io.serializations=org.apache.hadoop.io.serializer.WritableSerialization, mapred.task.profile=false, jobclient.output.filter=FAILED, mapred.tasktracker.map.tasks.maximum=2, io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec, fs.checkpoint.size=67108864}
14/02/25 12:01:19 ERROR executor.Executor: Exception in task ID 1
java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker
    at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:38)
    at org.elasticsearch.hadoop.mr.HeartBeat.<init>(HeartBeat.java:57)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.init(EsInputFormat.java:204)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.<init>(EsInputFormat.java:167)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.<init>(EsInputFormat.java:328)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:449)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:66)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/02/25 12:01:19 WARN scheduler.TaskSetManager: Loss was due to java.lang.IllegalArgumentException
java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker
    at org.elasticsearch.hadoop.util.Assert.notNull(Assert.java:38)
    at org.elasticsearch.hadoop.mr.HeartBeat.<init>(HeartBeat.java:57)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.init(EsInputFormat.java:204)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.<init>(EsInputFormat.java:167)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.<init>(EsInputFormat.java:328)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:449)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:66)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:109)
    at org.apache.spark.scheduler.Task.run(Task.scala:53)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
14/02/25 12:01:19 ERROR scheduler.TaskSetManager: Task 0.0:0 failed 1 times; aborting job
14/02/25 12:01:19 INFO scheduler.TaskSchedulerImpl: Remove TaskSet 0.0 from pool
14/02/25 12:01:19 INFO scheduler.TaskSchedulerImpl: Ignoring update with state RUNNING from TID 1 because its task set is gone
14/02/25 12:01:19 INFO scheduler.TaskSchedulerImpl: Ignoring update with state FAILED from TID 1 because its task set is gone
14/02/25 12:01:19 INFO scheduler.DAGScheduler: Failed to run foreach at SimpleApp.scala:18
[error] (run-main) org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker)
org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 1 times (most recent failure: Exception failure: java.lang.IllegalArgumentException: Unable to determine task id - please report your distro/setting through the issue tracker)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
[trace] Stack trace suppressed: run last compile:run for the full output.
14/02/25 12:01:19 INFO network.ConnectionManager: Selector thread was interrupted!
java.lang.RuntimeException: Nonzero exit code: 1
    at scala.sys.package$.error(package.scala:27)
[trace] Stack trace suppressed: run last compile:run for the full output.
[error] (compile:run) Nonzero exit code: 1
[error] Total time: 3 s, completed Feb 25, 2014 12:01:19 PM
@costin

Member

costin commented Feb 25, 2014

The nightly build didn't have the latest changes (hence the suggestion in the previous issues to use master instead). I've pushed a new nightly build to Sonatype (elasticsearch-hadoop-1.3.0.BUILD-20140225.065840-319). Can you please check it out?
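
For anyone following along with sbt, a minimal sketch of how the refreshed snapshot might be pulled in. The group, artifact, and version come from the jar path quoted at the top of this issue. The resolver name and URL are assumptions and not something stated in this thread, so adjust them to wherever the snapshot is actually hosted.

```scala
// build.sbt (sketch only; the resolver name and URL below are assumptions)
resolvers += "Sonatype OSS Snapshots" at "https://oss.sonatype.org/content/repositories/snapshots"

// Coordinates taken from the snapshot jar path in the issue description.
// A -SNAPSHOT version is a changing dependency, so sbt should re-resolve it
// to the most recently published build on update.
libraryDependencies += "org.elasticsearch" % "elasticsearch-hadoop" % "1.3.0.BUILD-SNAPSHOT"
```

If a previously resolved snapshot is cached locally, running `sbt update` (or clearing the cached artifact) may be needed before the newer build is picked up.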

@rbraley

Author

rbraley commented Feb 25, 2014

Checked it out and it works! Nice
