
UPDATE: Spark 2.x support in Dr. Elephant #327

Open
akshayrai opened this issue Feb 1, 2018 · 23 comments

Comments

@akshayrai (Contributor) commented Feb 1, 2018

Hi everyone,

Sharing some updates from LinkedIn.

We are successfully analyzing all Spark 2.x jobs using Dr. Elephant.

In the last couple of months, we have been trying hard to enable Spark support in Dr. Elephant, but we faced several blockers due to stability issues in the Spark History Server (SHS). We were finally able to make it work at LinkedIn after deploying a custom version of SHS, which mostly builds on top of the pre-existing effort in the open source Spark branch to improve SHS. On top of that, we added a few patches to Spark that add more metrics, which we will be contributing to open source soon.

So, essentially, if you want Dr. Elephant to analyze Spark 2.x jobs, then you need this custom setup of Spark History Server, at least until all the work is part of an official release. Most of the open source SHS improvement work was done in SPARK-18085 (closed recently). From the Dr. Elephant perspective, all the related work is already part of the open source repository (master branch).
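For context, the REST-based fetching described here reads from the standard Spark monitoring API exposed by the SHS. A minimal sketch of the endpoints involved; the host, port, and application id below are hypothetical placeholders, not values from this thread:

```shell
# Hypothetical SHS location and application id; substitute your own.
SHS="http://shs-host.example.com:18080"
APP="application_1517000000000_0001"

# Endpoints the REST-based fetcher reads (standard Spark monitoring API):
echo "$SHS/api/v1/applications"                  # list of applications
echo "$SHS/api/v1/applications/$APP/executors"   # per-executor metrics
echo "$SHS/api/v1/applications/$APP/stages"      # per-stage metrics
```

To query a live history server, fetch any of these URLs with e.g. `curl -s`.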

@edwinalu /@skakker will be sharing more details on the Spark SHS setup soon.

Regards,
Akshay

@shkhrgpt (Contributor) commented Feb 1, 2018

Thanks, @akshayrai, for sharing this information. I am a little confused by some of it, so I would like to clarify a few things:

  1. According to the SPARK-18085 JIRA ticket, it seems that the improved SHS will be part of Spark 2.3. Does this mean Dr. Elephant will not support Spark 2.0, 2.1, and 2.2?

  2. Currently, Dr. Elephant does not compile against Spark 2.x. Are we going to fix this, or will we use the old methodology of compiling against Spark 1.4 and running on Spark 2.3?

  3. If I am correct, SPARK-18085 does not add all of the new metrics that some of the recent PRs are using. Does that mean that Spark 2.3 would not have those metrics?

@rajeshcode commented:

So we are still getting data from the Spark History Server through the SparkRestClient and analyzing it, right? And the focus so far has not been on fixing the compile-time errors with Spark 2.x, correct?

@edwinalu commented Feb 1, 2018

The new executor memory metrics have been proposed in SPARK-23206.

akshayrai reopened this Feb 2, 2018
@akshayrai (Contributor, Author) commented:

Hi @shkhrgpt, please find answers in-line below.

  1. According to the SPARK-18085 JIRA ticket, it seems that the improved SHS will be part of Spark 2.3. Does this mean Dr. Elephant will not support Spark 2.0, 2.1, and 2.2?

Dr. Elephant only needs a stable version of the Spark History Server. So if we can set up the improved SHS against older versions of Spark, then Dr. Elephant could work. This hasn't been tested, though. At LinkedIn, we have set up the improved SHS to run with Spark 2.1. @edwinalu can correct me if I am wrong.

  2. Currently, Dr. Elephant does not compile against Spark 2.x. Are we going to fix this, or will we use the old methodology of compiling against Spark 1.4 and running on Spark 2.3?

This needs to be fixed. For now, we are compiling against Spark 1.x and running on Spark 2.x.

  3. If I am correct, SPARK-18085 does not add all of the new metrics that some of the recent PRs are using. Does that mean that Spark 2.3 would not have those metrics?

The new metrics do not block the Dr. Elephant/Spark setup; they are only required by some of the heuristics. But, yes, if the metrics are not part of Spark 2.3, then some heuristics would not make sense. I am not sure how much effort it would be to back-port them to older Spark versions.
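To make the "compile against Spark 1.x, run on Spark 2.x" methodology concrete, here is a minimal sketch assuming the stock compile.sh build flow and its compile.conf properties; the exact version numbers below are illustrative examples, not values confirmed in this thread:

```shell
# Pin the build to Hadoop/Spark 1.x versions in compile.conf (example values).
cat > compile.conf <<'EOF'
hadoop_version=2.7.3
spark_version=1.6.3
EOF

# Then build the distribution against those versions:
#   ./compile.sh compile.conf
# and point the deployed Dr. Elephant at a Spark 2.x history server.
grep spark_version compile.conf   # prints spark_version=1.6.3
```

The key point is that only the fetcher's REST client talks to the 2.x history server at runtime, so the compile-time Spark dependency can stay at 1.x.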

@akshayrai (Contributor, Author) commented:

@rajeshcode, yes, we are using the REST-based SparkFetcher. For your second question, refer to my answer to Shekhar above (point 2).

@edwinalu commented Feb 2, 2018

@akshayrai is correct. At LinkedIn, we are running Spark 2.3 History Server, which processes Spark applications running Spark 2.1 and 1.6. We have not tried with 1.4.

@andrijaperovic commented Feb 20, 2018

Is it possible to document your steps for the custom SHS configuration running on Spark 2.x in the setup article? Or is it mostly covered in the SPARK-18085 JIRA you mentioned?

It doesn't look like Cloudera is distributing Spark 2.3 (which includes https://issues.apache.org/jira/browse/SPARK-18085) yet; have you done the upgrade by hand?
Additionally, if we want to try this out, should we be building against the customSHSwork branch instead of master?

@rajeshcode commented:

@akshayrai
Will just upgrading our SHS to Spark 2.3 be enough to analyze Spark 2.x jobs?
Do we need to apply any patch or update to Dr. Elephant for this? I didn't find anything recently updated in the master branch related to it.

@akshayrai (Contributor, Author) commented:

@skakker, as discussed earlier, can you please update the Dr. Elephant release notes page, document the setup instructions for Spark 2.x support, and update this thread? Please ensure you address all the above questions in the document.

@hereTac commented Mar 5, 2018

With the Spark 2.3 release, has any work on Spark 2.x support begun, or is there a plan? Please let us know; Spark 2.x support would be very helpful. (We are really hoping to analyze Spark 2.x jobs.)

@skakker (Contributor) commented Mar 6, 2018

To clarify and answer most of the questions:

There are two types of changes we have made internally to SHS:

  1. Stability changes: SPARK-23608 (fixes the Jetty issue) and SPARK-23607 (performance improvements).
  2. New metrics (SPARK-23206), used by the new Spark heuristics in Dr. Elephant.

If you wish to use the code from the master branch of Dr. Elephant, you can do so with your existing SHS as well; the only issue is that it may run into Jetty problems as load on the history server increases. We have sent out a PR fixing the Jetty issue, which can be viewed in the SPARK-23608 ticket. It hasn't been merged yet, but you can apply that patch on top of your SHS 2.x.

The new heuristics changes are in the "customSHS" branch of Dr. Elephant. If you wish to use these heuristics, you will have to wait for the SPARK-23206 changes to be merged into SHS.
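For anyone unfamiliar with the "apply that patch on top of your SHS 2.x" step, the mechanic is simply git apply against a Spark source checkout before rebuilding. A self-contained toy demonstration of that mechanic; the file and diff below are synthetic stand-ins, and in practice you would download the actual PR diff linked from SPARK-23608 and apply it to your Spark tree:

```shell
# Synthetic stand-ins for a Spark source file and the PR diff.
printf 'old line\n' > Server.scala
cat > fix.diff <<'EOF'
--- a/Server.scala
+++ b/Server.scala
@@ -1 +1 @@
-old line
+new line
EOF

# Dry-run first, then apply; git apply also works outside a git repository.
git apply --check fix.diff && git apply fix.diff
cat Server.scala   # now contains the patched line
```

After patching a real Spark checkout this way, you would rebuild the history server module and redeploy it.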

@mateo41 commented Apr 20, 2018

Has anyone tried using Dr. Elephant on a version of AWS EMR with Spark 2.3.0?

@achyuthsamudrala commented:

What does compiling against an older version and running against a newer version mean? That's not going to analyze the logs correctly anyway. I keep running into issues where some event logs (which are part of Spark 2.x) are not being parsed, unless I am doing something incorrectly.

@jiajie999 commented:

@mateo41 It is not working on EMR with Spark 2.3.0; Hive/Tez jobs work fine.

@ritika11 commented Jul 4, 2018

Hi

Do we have any additional CPU/executor metrics for Spark jobs?

@dmateusp commented Aug 23, 2018

I would be interested in getting this to work on EMR with Spark 2.3.0. What issues are you facing, @jiajie999?

Out of the box I ran into

08-23-2018 16:07:34 ERROR [ForkJoinPool-1-worker-7] com.linkedin.drelephant.spark.fetchers.SparkRestClient : error reading failedTasks http://XXXX.ec2.internal:XXXX/api/v1/applications/application_XXXX/stages/failedTasks
javax.ws.rs.NotFoundException: HTTP 404 Not Found

@ankurchourasiya commented Nov 17, 2018

I would be interested in getting that to work on EMR 2.3.0, what issues are you guys facing ? @jiajie999

Out of the box I ran into

08-23-2018 16:07:34 ERROR [ForkJoinPool-1-worker-7] com.linkedin.drelephant.spark.fetchers.SparkRestClient : error reading failedTasks http://XXXX.ec2.internal:XXXX/api/v1/applications/application_XXXX/stages/failedTasks
javax.ws.rs.NotFoundException: HTTP 404 Not Found

This is a fetcher issue. Please update the Spark fetcher in app-conf/FetcherConf.xml as below:

<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
  <params>
    <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
    <should_process_logs_locally>true</should_process_logs_locally>
  </params>
</fetcher>

@orbiran88 commented Dec 12, 2018

@akshayrai Are there any updates on compiling against Spark 2.x? Thanks!

@t2hw commented Jan 10, 2019

A lengthy discussion, so the conclusion so far is "nothing is documented, and nobody who can wants to"?

So, any updates? Thanks a thousand times.

@mareksimunek commented:

Also, I can't see any Spark metrics in the Dr. Elephant UI (SHS version 2.3.0).

I got
com.linkedin.drelephant.spark.SparkMetricsAggregator : applicationDurationMillis is negative. Skipping Metrics Aggregation:-1548370957424
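One hedged reading of that log line: the skipped value looks like minus an epoch timestamp in milliseconds, which is what you would get if the history server reports an end time of 0 (for example, an in-progress or incompletely parsed application) and the aggregator computes end minus start. A small sketch of the arithmetic, using the number from the log above as the assumed start time:

```shell
# If endTime is missing (0), endTime - startTime = -startTime,
# i.e. a large negative number equal to minus the start epoch millis.
start_ms=1548370957424   # assumed application start time (epoch millis)
end_ms=0                 # end time absent from the event data
duration=$((end_ms - start_ms))
echo "$duration"         # -1548370957424, matching the skipped value logged
```

If this reading is right, the fix is on the event-data side (the application's end event never reached the SHS), not in the aggregator itself.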

@helenfeng737 commented:

Any updates on this?

@ligao101 commented:

Any updates on this? With Spark 2.4 out and Spark 3.0 upcoming, what is the current progress on supporting newer Spark versions?

@ashwin-flip commented:

Hi @shkhrgpt, please find answers in-line below.

  1. According to SPARK-18085 JIRA ticket, it seems like that improved SHS will be part of Spark 2.3. Does this mean Dr. Elephant will not support Spark 2.0, 2.1, and 2.2?

Dr. Elephant only needs a stable version of Spark History Server. So if we can setup the improved SHS against the older versions of Spark, then Dr. Elephant could work. This hasn't been tested though. At Linkedin, we have setup the improved SHS to run with Spark 2.1. @edwinalu can correct me if I am wrong.

  2. Currently, Dr. Elephant does not compile against Spark 2.X. Are we going to fix this or we will use the old methodology of compiling against Spark 1.4 and run for Spark 2.3?

This needs to be fixed. For now, we are compiling against Spark 1.x and run on Spark 2.x

Hi, I am trying to build for Spark 2.4, but since it does not compile, I tried the default Spark 1.4 with the Scala version set to 2.11 in build.sbt. The compilation is still failing. Is there any workaround for this?

oleksiilopasov pushed a commit to oleksiilopasov/dr-elephant that referenced this issue Mar 26, 2020