Spark history logs fetching issue in HA Cluster #123

vinaygulani1 · 2016-08-02T02:00:27Z

DrElephant is not able to fetch the Spark history logs in a YARN HA cluster when namenode_addresses is set. Below are the configs:

<params>
<event_log_size_limit_in_mb>100</event_log_size_limit_in_mb>
<event_log_dir>/user/spark/jobhistory</event_log_dir>
<spark_log_ext>_1</spark_log_ext>
<!-- The values specified in namenode_addresses will be used for obtaining Spark logs; the cluster configuration will be ignored. -->
<namenode_addresses>hahdfs1.hostname:50070, hahdfs2.hostname:50070</namenode_addresses>
</params>
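For context, here is a minimal Java sketch of the selection step the config above relies on. It splits `namenode_addresses` on commas, trims each entry, and returns the first NameNode that a probe reports as active. Everything in it is hypothetical: `NameNodeSelector`, `parseAddresses`, `selectActive`, and the injected `isActive` probe are illustrative names, not Dr. Elephant's API. A real probe might, for example, read the `State` attribute that each NameNode exposes at `/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus`. Whether the actual fetcher trims the space after the comma is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch: pick the active NameNode from a comma-separated
// namenode_addresses value. Not Dr. Elephant's actual implementation.
final class NameNodeSelector {

    // Split "host1:50070, host2:50070" into clean host:port entries.
    // Trimming removes the leading space the second entry would otherwise keep.
    static List<String> parseAddresses(String raw) {
        List<String> result = new ArrayList<>();
        if (raw == null) {
            return result;
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // Return the first address the probe reports as active, if any.
    // The probe is injected (assumption) so the logic can be exercised offline.
    static Optional<String> selectActive(String raw, Predicate<String> isActive) {
        for (String address : parseAddresses(raw)) {
            if (isActive.test(address)) {
                return Optional.of(address);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String raw = "hahdfs1.hostname:50070, hahdfs2.hostname:50070";
        // Pretend hahdfs2 is currently the active NameNode (demo assumption).
        Optional<String> active = selectActive(raw, a -> a.startsWith("hahdfs2."));
        System.out.println(active.orElse("no active NameNode found"));
    }
}
```

Injecting the probe keeps the parsing and selection logic testable without a live cluster.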

However, it works over WebHDFS if I point event_log_dir directly at the currently active NameNode. Below are the configs:
<params>
<event_log_size_limit_in_mb>100</event_log_size_limit_in_mb>
<event_log_dir>webhdfs://hahdfs1.hostname.net:50070/user/spark/jobhistory</event_log_dir>
<event_log_dir>/user/spark/jobhistory</event_log_dir>
<spark_log_ext>_1</spark_log_ext>
</params>
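The path in the error log is event_log_dir joined with the application ID and spark_log_ext. A minimal sketch of that resolution, assuming that a full `webhdfs://` URI is used as-is while a bare path is resolved against the selected NameNode over WebHDFS; `EventLogPath`, `resolveDir`, and `logFile` are hypothetical names, not Dr. Elephant code.

```java
import java.net.URI;

// Hypothetical sketch of how a Spark event log location could be resolved.
final class EventLogPath {

    // A value with a scheme (e.g. webhdfs://host:port/dir) is used as-is;
    // a bare path is resolved against the chosen NameNode (assumption).
    static URI resolveDir(String eventLogDir, String nameNodeAddress) {
        URI dir = URI.create(eventLogDir);
        if (dir.getScheme() != null) {
            return dir;
        }
        return URI.create("webhdfs://" + nameNodeAddress + eventLogDir);
    }

    // Log file = directory + application ID + spark_log_ext,
    // matching the shape of the path in the error log.
    static URI logFile(URI dir, String appId, String logExt) {
        String base = dir.toString();
        String sep = base.endsWith("/") ? "" : "/";
        return URI.create(base + sep + appId + logExt);
    }

    public static void main(String[] args) {
        String appId = "application_1460147926973_0091";
        URI direct = resolveDir("webhdfs://hahdfs1.hostname.net:50070/user/spark/jobhistory", "unused:50070");
        System.out.println(logFile(direct, appId, "_1"));
        URI relative = resolveDir("/user/spark/jobhistory", "hahdfs1.hostname:50070");
        System.out.println(logFile(relative, appId, "_1"));
    }
}
```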

Error logs:
08-01-2016 21:45:13 ERROR [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : java.security.PrivilegedActionException: java.io.FileNotFoundException: File does not exist: /user/spark/jobhistory/application_1460147926973_0091_1
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:356)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1636)
at com.linkedin.drelephant.security.HadoopSecurity.doAs(HadoopSecurity.java:99)
at org.apache.spark.deploy.history.SparkFSFetcher.fetchData(SparkFSFetcher.scala:189)
at org.apache.spark.deploy.history.SparkFSFetcher.fetchData(SparkFSFetcher.scala:55)
at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:231)
at com.linkedin.drelephant.ElephantRunner$ExecutorThread.run(ElephantRunner.java:181)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: File does not exist: /user/spark/jobhistory/application_1460147926973_0091_1
at sun.reflect.GeneratedConstructorAccessor25.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:385)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$600(WebHdfsFileSystem.java:91)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.shouldRetry(WebHdfsFileSystem.java:656)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:622)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.run(WebHdfsFileSystem.java:483)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getHdfsFileStatus(WebHdfsFileSystem.java:838)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getFileStatus(WebHdfsFileSystem.java:853)
at org.apache.spark.deploy.history.SparkFSFetcher.org$apache$spark$deploy$history$SparkFSFetcher$$shouldThrottle(SparkFSFetcher.scala:324)
at org.apache.spark.deploy.history.SparkFSFetcher$$anon$1.run(SparkFSFetcher.scala:242)
at org.apache.spark.deploy.history.SparkFSFetcher$$anon$1.run(SparkFSFetcher.scala:189)
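To narrow this down by hand, one option is to ask each NameNode directly for the file status over the WebHDFS REST API. That is the `GETFILESTATUS` operation behind the `WebHdfsFileSystem.getFileStatus` call in the trace. This is a hedged Java sketch: `WebHdfsCheck` is a hypothetical helper, and it assumes simple (non-Kerberos) authentication, since a secured cluster would need SPNEGO or a delegation token.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for checking the failing path on each NameNode by hand.
final class WebHdfsCheck {

    // WebHDFS REST URL for a GETFILESTATUS call on the given HDFS path.
    static String fileStatusUrl(String nameNodeAddress, String hdfsPath) {
        String path = hdfsPath.startsWith("/") ? hdfsPath : "/" + hdfsPath;
        return "http://" + nameNodeAddress + "/webhdfs/v1" + path + "?op=GETFILESTATUS";
    }

    // Returns the HTTP status code: 200 means the status was returned,
    // 404 is how WebHDFS normally reports a missing file. A standby NameNode
    // typically rejects the read with an error rather than a 404 (assumption).
    static int probe(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String path = "/user/spark/jobhistory/application_1460147926973_0091_1";
        for (String nn : new String[] {"hahdfs1.hostname:50070", "hahdfs2.hostname:50070"}) {
            String url = fileStatusUrl(nn, path);
            System.out.println(url + " -> " + probe(url));
        }
    }
}
```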


akshayrai · 2016-08-16T16:23:20Z

Tracked in the mailing list:

https://groups.google.com/forum/?utm_medium=email&utm_source=footer#!topic/dr-elephant-users/YeDh47u2kvU

akshayrai added the help wanted label Aug 9, 2016

akshayrai closed this as completed Aug 16, 2016
