
Spark jobs not showing up on Dr Elephant UI #456

Open

kartiknooli opened this issue Oct 24, 2018 · 8 comments

Comments

@kartiknooli

Hello, I am having an issue similar to what several others have reported, but none of those tickets helped me resolve it. My Spark jobs won't show up on the Dr. Elephant UI; I can only see MapReduce jobs. I went through this thread but could not figure out where to find the Dr. Elephant logs for the Spark jobs. I am on EMR with Hadoop 2.7.3 and Spark 2.1.1. All the configs mentioned above exist in my cluster, and I can see a running Spark job on the ResourceManager UI as well as in the Spark history server once it completes.

spark.yarn.historyServer.address ip-10-XX-XX-X.ec2.internal:18080
spark.eventLog.dir hdfs:///var/log/spark/apps
Here is what my Dr. Elephant folder looks like:
drwxr-xr-x 2 ec2-user ec2-user 4096 Oct 24 16:29 app-conf
drwxr-xr-x 2 ec2-user ec2-user 4096 Oct 17 22:29 bin
drwxr-xr-x 3 ec2-user ec2-user 4096 Oct 17 22:29 conf
-rwxr-xr-x 1 ec2-user ec2-user 1199 Oct 24 16:30 dr.log
drwxr-xr-x 2 ec2-user ec2-user 16384 Oct 17 22:29 lib
drwxr-xr-x 2 ec2-user ec2-user 4096 Oct 24 16:31 logs
-rwxr-xr-x 1 ec2-user ec2-user 2925 Oct 17 22:26 README.md
-rw-r--r-- 1 root root 5 Oct 24 16:30 RUNNING_PID
drwxr-xr-x 3 ec2-user ec2-user 4096 Oct 17 22:29 scripts
drwxr-xr-x 3 ec2-user ec2-user 4096 Oct 17 22:29 share
echo $SPARK_HOME
/usr/lib/spark

echo $SPARK_CONF_DIR
/usr/lib/spark/conf
Am I missing something here? Please help.

thanks,
Kartik.

@ColinArmstrong

There is a logs directory one level above your dr-elephant folder that I didn't see in your listing:

$DR_ELEPHANT_DIR/../logs/elephant/dr_elephant.log
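If the file is there, a quick way to watch what the Spark fetcher is doing is to tail it and filter for Spark activity (a minimal sketch, assuming $DR_ELEPHANT_DIR points at your dr-elephant install as above):

# Follow the Dr. Elephant log and keep only Spark-related lines.
tail -f "$DR_ELEPHANT_DIR/../logs/elephant/dr_elephant.log" | grep -i spark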

@kartiknooli (Author)

Thanks @ColinArmstrong for the response. I checked the log, and this time I reran another Spark job on the cluster and noticed that the Dr. Elephant UI says it is a Hadoop job and doesn't identify it as a Spark job. The dr_elephant.log file does not give me any error messages. Is my understanding of how Dr. Elephant displays Spark jobs on the UI incorrect?

When I filter the jobs on the UI by Job Type "Spark", it returns no results.

thanks,
Kartik.

@shahrukhkhan489 (Contributor)

Is HTTPS enabled on YARN? If HTTPS is not enabled, then use the steps below to get it working:

  1. Inject exports of SPARK_HOME and SPARK_CONF_DIR into the ./bin/start.sh file (see the sketch after this list).

  2. Make sure you have the Spark client installed as a component if you are using a vendor-specific distribution.

  3. Update the Spark fetcher configuration to com.linkedin.drelephant.spark.fetchers.SparkFetcher in the conf file app-conf/FetcherConf.xml. By default it is commented out.

This should get Dr. Elephant working against Spark Jobs.
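For step 1, a minimal sketch of what the injected exports might look like. The paths are assumptions; use whatever echo $SPARK_HOME and echo $SPARK_CONF_DIR report on your own cluster:

# Added near the top of ./bin/start.sh. Paths are examples for a
# typical EMR node; adjust them to your own Spark install.
export SPARK_HOME=/usr/lib/spark
export SPARK_CONF_DIR=/etc/spark/conf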

@lubomir-angelov commented Nov 22, 2018

@kartiknooli

To find dr_elephant.log, run: locate dr_elephant.log

In my case, to start getting Spark jobs I had to add the following to app-conf/FetcherConf.xml:

<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
  <params>
    <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
    <should_process_logs_locally>true</should_process_logs_locally>
    <event_log_dir>webhdfs:///spark-history</event_log_dir>
  </params>
</fetcher>

Our Spark event log dir is configured as hdfs:///spark-history, so we added <event_log_dir>webhdfs:///spark-history</event_log_dir>.

And comment out these lines:

<applicationtype>spark</applicationtype>
<classname>com.linkedin.drelephant.spark.fetchers.FSFetcher</classname>

More info at #206
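Since the fetcher in this setup reads event logs over WebHDFS, it can help to first confirm the directory is reachable through the WebHDFS REST endpoint (a sketch; the namenode host and port 50070, the Hadoop 2.x default HTTP port, are assumptions for your cluster):

# List the Spark event log directory over WebHDFS; a JSON FileStatuses
# response means the path the fetcher will read actually resolves.
curl "http://namenode:50070/webhdfs/v1/spark-history?op=LISTSTATUS"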

@kartiknooli (Author)

@shahrukhkhan489 and @lubomir-angelov
thanks for the response.

I tried making the suggested changes.

  1. Inject exports of SPARK_HOME and SPARK_CONF_DIR into the ./bin/start.sh file. I hope you meant the following:
export SPARK_HOME=/usr/lib/spark
export SPARK_CONF_DIR=/etc/spark/conf

Please correct me if I am wrong.

  2. Make sure you have the Spark client installed as a component if you are using a vendor-specific distribution.
    We have the Spark client bootstrapped with EMR:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.1
      /_/

Using Python version 2.7.12 (default, Sep  1 2016 22:14:00)
SparkSession available as 'spark'.
>>>
  3. Updated the Spark fetcher configuration to the following:
<fetcher>
    <applicationtype>spark</applicationtype>
    <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
    <params>
      <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
      <should_process_logs_locally>true</should_process_logs_locally>
    </params>
  </fetcher>

I tried both with and without adding the HDFS path for the event logs; neither worked.

Here is the error message I got from the logs:

11-26-2018 19:24:35 INFO  [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : Analyzing SPARK application_1520505558307_35023
11-26-2018 19:24:35 INFO  [ForkJoinPool-1-worker-9] com.linkedin.drelephant.spark.fetchers.SparkRestClient : calling REST API at http://hostname:18080/api/v1/applications/application_1520505558307_35027
11-26-2018 19:24:35 INFO  [dr-el-executor-thread-2] com.linkedin.drelephant.spark.fetchers.SparkFetcher : Fetching data for application_1520505558307_35023
11-26-2018 19:24:35 INFO  [ForkJoinPool-1-worker-5] com.linkedin.drelephant.spark.fetchers.SparkRestClient : calling REST API at http://hostname:18080/api/v1/applications/application_1520505558307_35023
11-26-2018 19:24:35 ERROR [ForkJoinPool-1-worker-9] com.linkedin.drelephant.spark.fetchers.SparkRestClient : error reading applicationInfo http:hostname:18080/api/v1/applications/application_1520505558307_35027. Exception Message = HTTP 404 Not Found
11-26-2018 19:24:35 WARN  [dr-el-executor-thread-1] com.linkedin.drelephant.spark.fetchers.SparkFetcher : Failed fetching data for application_1520505558307_35027. I will retry after some time! Exception Message is: HTTP 404 Not Found

Appreciate your help with this.

@lubomir-angelov commented Nov 26, 2018 via email

@shahrukhkhan489 (Contributor) commented Nov 27, 2018

@kartiknooli The 404 error indicates that the logs for that application have been rolled over. This may not be the case for all of your Spark applications.

error reading applicationInfo http:hostname:18080/api/v1/applications/application_1520505558307_35027. Exception Message = HTTP 404 Not Found

Try opening the same link in a browser; you will see the same error: http://hostname:18080/api/v1/applications/application_1520505558307_35027
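To check what the history server still holds, you can also query its REST API directly (a sketch; "hostname" is your Spark history server, as in the log above):

# List all applications the history server still has event logs for;
# if the failing id is missing here, its log has been rolled over.
curl "http://hostname:18080/api/v1/applications"

# Query the failing application directly; expect the same HTTP 404.
curl -i "http://hostname:18080/api/v1/applications/application_1520505558307_35027"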

@fusonghe commented Jan 5, 2019

Spark jobs don't show up on the dr-elephant web UI for me either. I am on dr-elephant version 2.1.7, Hadoop 3.0.0, Spark 1.6. In app-conf/FetcherConf.xml I have:

<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>org.apache.spark.deploy.history.SparkFSFetcher</classname>
  <params>
    <event_log_size_limit_in_mb>100</event_log_size_limit_in_mb>
    <event_log_dir>/spark2-history</event_log_dir>
    <spark_log_ext>.snappy</spark_log_ext>
  </params>
</fetcher>

@shahrukhkhan489
