Spark jobs not showing up on Dr Elephant UI #456
There is a logs directory one level above your dr.elephant folder that I didn't see you list: $DR_ELEPHANT_DIR/../logs/elephant/dr_elephant.log
Thanks @ColinArmstrong for the response. I did check, and here is the log. This time I reran another Spark job on the cluster and noticed that the Dr. Elephant UI says it is a Hadoop job and doesn't identify it as a Spark job. The dr_elephant.log file does not give me any error messages. When I filter the jobs on the UI by job type "Spark", it returns no results. Thanks.
Is HTTPS enabled on YARN? If HTTPS is not enabled, use the steps below to get it working.
This should get Dr. Elephant working against Spark jobs.
To find dr_elephant.log, use $ locate dr_elephant.log. In my case, to start getting Spark jobs I had to add the following to app-conf/FetcherConf.xml (our Spark event log dir is configured to match) and comment out these lines:
More info at #206
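For reference, the FetcherConf.xml change being discussed usually amounts to enabling the SparkFetcher. A sketch of that block is below; the parameter values are the ones quoted later in this thread, not a verified fix for every setup:

```xml
<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
  <params>
    <!-- Fetch event logs via the Spark history server REST API -->
    <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
    <!-- Process the fetched event logs on the Dr. Elephant host -->
    <should_process_logs_locally>true</should_process_logs_locally>
  </params>
</fetcher>
```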
@shahrukhkhan489 and @lubomir-angelov I tried making the suggested changes.
Please correct me if I am wrong.
I tried with and without adding the HDFS path for the event logs; both of them did not work. Here is the error message I got from the logs:
Appreciate your help with this.
It looks like your spark history server is not responding.
I think you need a patched version of SHS to get Spark2 jobs registered.
#327
On Mon, Nov 26, 2018, 21:41 Kartik wrote:
@shahrukhkhan489 and @lubomir-angelov
thanks for the response.
I tried making the suggested changes.
1. Inject exports of SPARK_HOME and SPARK_CONF_DIR in the ./bin/start.sh file. I hope you meant the following:
export SPARK_HOME=/usr/lib/spark
export SPARK_CONF_DIR=/etc/spark/conf
Please correct me if I am wrong.
2. Make sure you have the Spark client installed as a component if you are using a vendor-specific distribution.
We have spark client bootstrapped with EMR
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.1.1
/_/
Using Python version 2.7.12 (default, Sep 1 2016 22:14:00)
SparkSession available as 'spark'.
>>>
3. Updated the Spark fetcher configuration to the following:
<fetcher>
<applicationtype>spark</applicationtype>
<classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
<params>
<use_rest_for_eventlogs>true</use_rest_for_eventlogs>
<should_process_logs_locally>true</should_process_logs_locally>
</params>
</fetcher>
I tried with and without adding the HDFS path for the event logs; both of them did not work.
Here is the error message I got from the logs:
11-26-2018 19:24:35 INFO [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : Analyzing SPARK application_1520505558307_35023
11-26-2018 19:24:35 INFO [ForkJoinPool-1-worker-9] com.linkedin.drelephant.spark.fetchers.SparkRestClient : calling REST API at http://hostname:18080/api/v1/applications/application_1520505558307_35027
11-26-2018 19:24:35 INFO [dr-el-executor-thread-2] com.linkedin.drelephant.spark.fetchers.SparkFetcher : Fetching data for application_1520505558307_35023
11-26-2018 19:24:35 INFO [ForkJoinPool-1-worker-5] com.linkedin.drelephant.spark.fetchers.SparkRestClient : calling REST API at http://hostname:18080/api/v1/applications/application_1520505558307_35023
11-26-2018 19:24:35 ERROR [ForkJoinPool-1-worker-9] com.linkedin.drelephant.spark.fetchers.SparkRestClient : error reading applicationInfo http://hostname:18080/api/v1/applications/application_1520505558307_35027. Exception Message = HTTP 404 Not Found
11-26-2018 19:24:35 WARN [dr-el-executor-thread-1] com.linkedin.drelephant.spark.fetchers.SparkFetcher : Failed fetching data for application_1520505558307_35027. I will retry after some time! Exception Message is: HTTP 404 Not Found
Appreciate your help with this.
@kartiknooli The 404 error indicates that your logs have been rolled over. This might not be the case for all Spark applications.
Try opening the same link in a browser; you will see the same result: http://hostname:18080/api/v1/applications/application_1520505558307_35027
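To reproduce that check programmatically, the REST link is just the history server's base address plus the application id. A minimal sketch follows; the hostname, port, and application id are placeholders taken from the log above, not values to copy verbatim:

```python
def shs_application_url(host, port, app_id):
    """Build the Spark history server REST URL for one application.

    An HTTP 404 from this endpoint means the history server no longer
    has the event log for that application (e.g. it was rolled over
    or cleaned up).
    """
    return "http://{}:{}/api/v1/applications/{}".format(host, port, app_id)

print(shs_application_url("hostname", 18080, "application_1520505558307_35027"))
# -> http://hostname:18080/api/v1/applications/application_1520505558307_35027
```

Fetching that URL with curl or a browser and getting a 404 confirms the event log is gone rather than Dr. Elephant misbehaving.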
Spark jobs don't show up on the dr-elephant web UI for me either. I am on dr-elephant version 2.1.7, Hadoop 3.0.0, Spark 1.6, with <event_log_size_limit_in_mb>100</event_log_size_limit_in_mb>
Hello, I am having a similar issue to a lot of others mentioned, but none of those tickets helped me resolve it. My Spark jobs won't show up on the Dr. Elephant UI; I can only see MapReduce jobs. I went through this thread but could not figure out where to find the Dr. Elephant logs for the Spark jobs. I am on EMR with Hadoop 2.7.3 and Spark 2.1.1. All the configs mentioned above exist on my cluster. I can see the running Spark job on the ResourceManager UI, as well as on the Spark history server once it's completed.
spark.yarn.historyServer.address ip-10-XX-XX-X.ec2.internal:18080
spark.eventLog.dir hdfs:///var/log/spark/apps
Here is what my dr-elephant folder looks like:
drwxr-xr-x 2 ec2-user ec2-user 4096 Oct 24 16:29 app-conf
drwxr-xr-x 2 ec2-user ec2-user 4096 Oct 17 22:29 bin
drwxr-xr-x 3 ec2-user ec2-user 4096 Oct 17 22:29 conf
-rwxr-xr-x 1 ec2-user ec2-user 1199 Oct 24 16:30 dr.log
drwxr-xr-x 2 ec2-user ec2-user 16384 Oct 17 22:29 lib
drwxr-xr-x 2 ec2-user ec2-user 4096 Oct 24 16:31 logs
-rwxr-xr-x 1 ec2-user ec2-user 2925 Oct 17 22:26 README.md
-rw-r--r-- 1 root root 5 Oct 24 16:30 RUNNING_PID
drwxr-xr-x 3 ec2-user ec2-user 4096 Oct 17 22:29 scripts
drwxr-xr-x 3 ec2-user ec2-user 4096 Oct 17 22:29 share
echo $SPARK_HOME
/usr/lib/spark
echo $SPARK_CONF_DIR
/usr/lib/spark/conf
Am I missing something here? Please help.
thanks,
Kartik.