From 9686d5ba3be613deebad69e2a574e9c76e3e267b Mon Sep 17 00:00:00 2001
From: Kartik Nagappa
Date: Tue, 7 Mar 2023 16:30:21 -0800
Subject: [PATCH] new EMR console

- Updated content with new EMR console screenshots
- Kept the old EMR console screenshots as well
- Restructured and made minor edits for readability / flow
---
 README.md | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index 09facf5..efcd35d 100644
--- a/README.md
+++ b/README.md
@@ -35,26 +35,40 @@ The script relies on AWS CLI to retreive the data.
 ```

 `` is the cluster id that you are interested in parsing. The cluster id is prefixed with 'j-'.
+`` represents [the region the cluster ran in](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions).

-
+New EMR console | Old EMR Console
+:-------------------------:|:-------------------------:
+ |

-`` represents [the region the cluster ran in](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-available-regions). The script doesn't rely on the region configured in AWS config to align with the region the cluster actually ran in. (e.g. us-east-1)
+## Step 2: Retrieve EMR Spark logs and upload into Autotuner

-## Step 2: Retrieve EMR Spark logs and upload into Autotuner step #2
+1. Go to the EMR console in AWS, and find the cluster that ran the job you are interested in optimizing. Click on the cluster name to view details of the cluster.

-1. Assure that you have spark.eventLog.enabled set to true for any jobs you are interested in optimizing.
+2. Verify that you have `spark.eventLog.enabled` set to true for any jobs you are interested in optimizing. The Sync Autotuner needs a Spark event log from a job run in order to provide optimized cluster configurations for the job.

-2. Go to the EMR console in AWS, and find the cluster that ran the job you are interested in optimizing. Click on the cluster name to view details of the cluster.
-
+New EMR console | Old EMR Console
+:-------------------------:|:-------------------------:
+ |

-3. Once you are in the cluster information page, click on the “Application user interfaces” tab, and click on “Spark history server” (in red below) under “Persistent application user interfaces.”
-
+3. If `spark.eventLog.dir` is set and specifies an S3 location then download the Spark event log from the specified S3 location. Skip to Step 7.

+4. If `spark.eventLog.dir` is **not set**, follow the steps below to download the Spark event log from the Spark history server.

-4. A new tab should open up with the Spark history server. It may take a minute to load. Click the download button under the event log column to download the Spark event log. Upload this log into the Autotuner in step #2.
+5. Once you are in the cluster information page, click on the “Application user interfaces” tab, and click on “Spark history server” (in red below) under “Persistent application user interfaces.”
+
+New EMR console | Old EMR Console
+:-------------------------:|:-------------------------:
+ |
+
+6. A new tab should open up with the Spark history server. It may take a minute to load. Click the download button under the event log column to download the Spark event log.

+7. Upload the Spark event log into the Autotuner.
+
+
+

 # Databricks Tools

@@ -118,4 +132,4 @@ Instructions for finding a cluster-id through the Databricks console can be foun
 ],
 "total_count": 22
 }
-```
\ No newline at end of file
+```