
Tuning #430

Closed · wants to merge 22 commits

Conversation

pralabhkumar (Contributor) commented Aug 31, 2018

This pull request implements the Unified Architecture. It is basically an effort to combine heuristic-based tuning and optimization-based tuning. It also refactors TuneIn so that it is easily extensible. For further details, please see the design doc below:

Unified Architecture: https://docs.google.com/document/d/1U7s1NDYujsG5cnvX39KrB0vrtVJUtIBx524BAyqGFcw/edit

(class diagram image)

Integration test cases:
https://docs.google.com/spreadsheets/d/1Z4YRtMkhPcSil3JU68tjMpyrNAjTM1B-b8umQ6Ev_y4/edit#gid=0

Unit test cases have been written to cover the new functionality.

mkumar1984 (Contributor) commented Oct 3, 2018

I have done an initial review. One general comment: we need to look at logging in detail and think about an overall logging strategy. For each manager and each execution, we need to decide what should be logged at info level and what at debug level. I know most of these logging statements were already there, but since the number of calls to the tuning APIs is increasing because of the change in default behavior, logging will become much more important from a performance and debugging perspective.
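As an illustration of the kind of strategy meant here, a minimal sketch (the class name and messages are hypothetical, not from this PR): per-execution detail goes to debug behind a guard, while manager-level milestones go to info.

    import org.apache.log4j.Logger;

    public class SomeTuningManager {
      private static final Logger logger = Logger.getLogger(SomeTuningManager.class);

      public void execute() {
        // Per-execution detail stays at debug, guarded so the message string
        // is never built when debug logging is off.
        if (logger.isDebugEnabled()) {
          logger.debug("Generating parameters for job execution id=" + 42);
        }
        // Manager-level milestones stay at info.
        logger.info("Parameter generation pass finished");
      }
    }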

    long driverMemory =
        (long) (suggestedDriverMemory + (roundedContainerSize - suggestedDriverMemory - memoryOverhead - FileUtils.ONE_MB) * 0.9);

    return driverMemory;
  }
pralabhkumar (Contributor, Author) commented:

Can we combine these two methods, getRoundedExecutorMemory and getRoundedDriverMemory?

Contributor replied:

I think that's fine as is. The two methods take different parameters; the logic is currently similar, but it may diverge going forward.
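For illustration only, a minimal sketch of what a combined helper could look like, based solely on the fragment quoted above; the method name, parameter list, and the shared 0.9 headroom factor are assumptions:

    import org.apache.commons.io.FileUtils;

    // Hypothetical merge of getRoundedExecutorMemory and getRoundedDriverMemory:
    // each caller passes its own suggested memory and rounded container size,
    // and the headroom calculation from the quoted fragment is shared.
    final class MemoryRounding {
      static long getRoundedMemory(long suggestedMemory, long roundedContainerSize, long memoryOverhead) {
        return (long) (suggestedMemory
            + (roundedContainerSize - suggestedMemory - memoryOverhead - FileUtils.ONE_MB) * 0.9);
      }
    }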

* threshold of max executor core then good, otherwise reduce the core and check memory
* requirement.
*/
private void suggestExecutorMemoryCore() {
pralabhkumar (Contributor, Author) commented:

I think, for the next version, we should also consider adding the number of partitions/tasks as parameters when coming up with the final executor memory.

Contributor replied:

Definitely.
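A rough sketch of the loop that javadoc describes, for readers following along; every name and number below is hypothetical, and the real implementation is in the PR diff:

    // Start from the maximum executor core count; if the memory required at
    // that count fits within the container threshold, keep it, otherwise drop
    // a core and re-check the memory requirement.
    final class ExecutorSuggestionSketch {
      private static final long CONTAINER_LIMIT_MB = 8 * 1024;  // assumed container cap
      private static final long MEMORY_PER_CORE_MB = 2 * 1024;  // assumed per-core need

      static int suggestExecutorCores(int maxExecutorCores) {
        for (int cores = maxExecutorCores; cores > 1; cores--) {
          if (cores * MEMORY_PER_CORE_MB <= CONTAINER_LIMIT_MB) {
            return cores;
          }
        }
        return 1;
      }
    }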

mkumar1984 (Contributor) commented:

LGTM. Pending changes can be taken care of in the next PR.

varunsaxena pushed a commit that referenced this pull request Oct 17, 2018
Squashed commit of the following:

    * Made fitness calculation, tuning type specific
    * Changed Boolean to boolean
    * Removing spark heuristics specific changes to separate commit
    * Removing spark heuristics specific changes to separate commit
    * Handling scenario where no param generator implementation exists for the algorithm type
    * Documentation/Javadoc added
    * Updated HBT parameter generation for Spark
    * Changing log status to debug
    * Code changes to handle multiple jobs in Spark
    * Changing spark input configuration for version 1 api
    * Adding Spark Job Specific Configuration Input Output
    * Changed logic to have algorithm selection based on version
    * Changed logging from info to debug, optimization and indentation changes
    * Unit test cases for ParamGenerator HBT. Documentation added for entity
    * Unit test cases for Fitness Manager for HBT
    * Added unit test cases, changes specific to IPSO
    * Renamed TuningType to ParamGenerator, generalized ParameterGenerateManagerOBT, minor changes in naming variables/methods
    * New Spark HBT algorithm for specific suggestion
    * Changed to get tuning type from DB instead of Azkaban; changed to work for HBT to OBT toggle; added unit test cases; enabled debug log for debugging
    * Added documentation; added is_suggested_param_set; added logic to disable tuning for HBT
    * IPSO configuration changes for Unified Architecture
    * Unified Architecture changes
varunsaxena (Contributor) left a comment:

What's the code coverage for the newly added code? As this is not master, Travis CI won't run for this PR. Probably run the quality tools locally and share the results.

varunsaxena pushed a commit that referenced this pull request Oct 29, 2018
Squashed commit of the following: (identical to the commit list in the Oct 17 commit above)

(cherry picked from commit 6241016)
pralabhkumar (Contributor, Author) commented Feb 19, 2019

This pull request has been cherry-picked and merged to customSHS.

fusonghe commented May 7, 2019

Dr-elephant: how do I configure the Azkaban scheduler, or configure the default?

fusonghe commented May 7, 2019

@mkumar1984 Dr-elephant: how do I configure the Azkaban scheduler, or configure the default?

fusonghe commented May 9, 2019

(screenshot attached)

Dr-elephant: how do I configure the scheduler or the default? @mkumar1984 @pralabhkumar

ShubhamGupta29 (Contributor) commented:

@fusonghe you can have a look at the sample app-conf/SchedulerConf.xml. Just in case, here is a sample scheduler configuration as it would appear in SchedulerConf.xml:

    <scheduler>
      <name>azkaban</name>
      <classname>com.linkedin.drelephant.schedulers.AzkabanScheduler</classname>
      <params>
        <exception_enabled>true</exception_enabled>
        <workflow_client>com.linkedin.drelephant.clients.azkaban.AzkabanWorkflowClient</workflow_client>
        <username>sample_username</username>
        <password>sample_password</password>
      </params>
    </scheduler>
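If I'm reading the sample file right, this <scheduler> block sits inside the root <schedulers> element of app-conf/SchedulerConf.xml, alongside any other scheduler entries.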

fusonghe commented:

(screenshot attached)

Dr-elephant: how do I configure Compare and Flow History? The configured Azkaban scheduler is also not displayed. @mkumar1984 @ShubhamGupta29

ShubhamGupta29 (Contributor) commented:

@fusonghe can you create an Issue for this? This PR is not the right place to discuss it.

fusonghe commented:

(screenshot attached)
@pralabhkumar

pralabhkumar (Contributor, Author) commented Jul 8, 2019

This pull request has been cherry-picked and merged to customSHS (commit eeec95d). Hence closing this.

varunsaxena self-assigned this Jul 15, 2019
varunsaxena (Contributor) left a comment:

What is the code coverage for the newly added code? As Travis CI won't run for this branch, we can probably run the quality tools locally and share the results.

Resolving this, as coverage was added later, after code coverage was integrated into Travis CI.

 */
public abstract class AbstractBaselineManager implements Manager {
  protected final String BASELINE_EXECUTION_COUNT = "baseline.execution.count";
  protected Integer NUM_JOBS_FOR_BASELINE_DEFAULT = 30;
Contributor commented:

Use int instead of Integer
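That is, something like this sketch of the suggested change:

    // A boxed Integer invites accidental null unboxing; a primitive is enough here.
    protected int NUM_JOBS_FOR_BASELINE_DEFAULT = 30;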

@@ -0,0 +1,174 @@
package com.linkedin.drelephant.tuning;
Contributor commented:

Include a file header. This comment applies to other files as well.

ignoreExecutionWaitInterval =
Utils.getNonNegativeLong(configuration, IGNORE_EXECUTION_WAIT_INTERVAL, 2 * 60 * AutoTuner.ONE_MIN);

// #executions after which tuning will stop even if parameters don't converge
Contributor commented:

On what basis were 39 and 18 decided? Also add constants for them.
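For example, a sketch of that suggestion with hypothetical constant names (what 39 and 18 actually represent is in the diff, not quoted here):

    // Name the magic numbers so their basis is documented in one place.
    private static final int MAX_TUNING_EXECUTIONS = 39;
    private static final int MIN_TUNING_EXECUTIONS = 18;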
