This repository has been archived by the owner on Mar 12, 2018. It is now read-only.

Interfaces for Dynamic Optimization #31

Closed
johnyangk opened this issue Dec 31, 2016 · 11 comments

Comments

@johnyangk
Contributor

Currently, we assume that the compiler optimization happens just once, before the job commences.

Let's allow the optimization to happen multiple times, at runtime. We need to carefully think about how the interfaces between different components in the system should change.

The execution flow might look like this: The engine feeds runtime metrics into the compiler optimizer, which outputs a new IR for the compiler backend. The compiler backend then manipulates the JobDAG, with which the engine resumes execution.
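To make this concrete, here is a rough sketch of what the interfaces in this feedback loop could look like. All of the type and method names below are hypothetical, just to illustrate the flow, not actual code:

```java
// A minimal sketch of the runtime re-optimization loop described above.
// Every name here (IRDag, JobDAG, RuntimeMetrics, Optimizer, Backend)
// is hypothetical and only illustrates the flow.
final class IRDag {}          // compiler-side intermediate representation
final class JobDAG {}         // engine-facing execution plan
final class RuntimeMetrics {} // stats the engine collects while running

interface Optimizer {
  // Rewrites the IR using metrics fed back by the engine at runtime.
  IRDag optimize(IRDag current, RuntimeMetrics metrics);
}

interface Backend {
  // Lowers the (re-)optimized IR into a JobDAG with which the engine resumes.
  JobDAG lower(IRDag ir);
}
```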

@johnyangk
Contributor Author

OK, so I talked with Eunji (our new lab member) and got some great insights on this.

In traditional compilers, JIT compilation is usually used for code blocks that are executed multiple times (think loops). The compiler generates a complete execution plan prior to execution, and then, as the iterations go on, re-writes the plan based on the obtained stats.

In contrast, the DAGs we execute do not contain cycles, and runtime optimization (e.g., dynamic partitioning) of an operator usually depends on the results of previous operators. This lets us get away with a simpler design where the compiler generates the execution plan piece by piece, without having to re-write the plan across multiple layers (IR/runtime-logical/runtime-physical).

For example, let's say we're applying a dynamic partitioning optimization to a MapReduce application. First, the compiler generates a partial execution plan for the Map and dynamically added KeyHistogram operators. After that, it can generate the rest of the execution plan for Reduce. We do not need to do any re-writes during the process.
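For illustration, the runtime-dependent piece of the plan can be as small as choosing the Reduce partition count from the collected histogram. This is a hypothetical helper with a made-up threshold parameter, not project code:

```java
import java.util.Map;

// A toy sketch of the decision the KeyHistogram operator enables: once Map
// has run, the key-frequency histogram determines how Reduce is partitioned.
final class PartitionPlanner {
  // Pick enough partitions so that, assuming an even spread, no partition
  // holds more than maxRecordsPerPartition records.
  static int choosePartitionCount(final Map<String, Long> keyHistogram,
                                  final long maxRecordsPerPartition) {
    final long total =
        keyHistogram.values().stream().mapToLong(Long::longValue).sum();
    return (int) Math.max(1, (total + maxRecordsPerPartition - 1)
        / maxRecordsPerPartition);
  }
}
```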

I know Optimus does re-writing, but my feeling is that this has more to do with the system's legacy code than with the actual requirements.

Of course, we might want to do a re-write while executing an operator (e.g., change the number of reducers while executing the reducer), especially for streaming applications. So I guess we need the re-writing mechanism after all? Then we really need to put in the effort to make re-writing across multiple layers easy. What are your thoughts?

@johnyangk
Contributor Author

Hmm... I guess we need the re-writing mechanism after all. Even the Beam applications that we have now (MLR, ALS) have implicit conditional cycles/loops. It'd be great if we could explicitly express the cycles/loops in our IR and apply traditional compiler techniques to them.

@bgchun

bgchun commented Mar 30, 2017

How do we handle cycles/loops currently?
It'd be nice to have cycles/loops in our IR.

@johnyangk
Contributor Author

Currently, we do not have cycles/loops in our IR. I'll file an issue for it.
Note that Beam also does not have cycles/loops in its language, so I suppose we need to identify them ourselves in our Beam frontend.

@bgchun

bgchun commented Mar 30, 2017

Then, how do we handle cycles (iterations) now?

@johnyangk
Contributor Author

Right now, we just have a long DAG with duplicate sets of operators.
Because Beam does not have the concept of a conditional loop, we use a fixed number of iterations for ALS and MLR.
For example, if we have 2 operators in a loop and there are 10 iterations, we simply have a long chain of 20 operators.
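A toy illustration of this unrolling (the operator names are made up):

```java
import java.util.ArrayList;
import java.util.List;

// A fixed-iteration loop body is duplicated once per iteration,
// yielding one long chain of operators.
final class UnrollExample {
  static List<String> unroll(final List<String> body, final int iterations) {
    final List<String> chain = new ArrayList<>();
    for (int i = 0; i < iterations; i++) {
      for (final String op : body) {
        chain.add(op + "#" + i); // the copy of this operator for iteration i
      }
    }
    return chain;
  }

  public static void main(final String[] args) {
    // 2 operators in the loop body, 10 iterations -> a chain of 20 operators.
    System.out.println(unroll(List.of("ComputeGradient", "UpdateModel"), 10).size());
  }
}
```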

@bgchun

bgchun commented Mar 30, 2017

@johnyangk Do you plan to add a loop to the current IR? This could be a good topic to discuss in our meeting.

@johnyangk
Contributor Author

Sure, this is a good discussion topic. If we have concrete optimization techniques to use for loops, then we can make this a high-priority task, since this is something the Optimus paper did not have. However, if we don't, then I think it'd be better to postpone this task.

@bgchun

bgchun commented Mar 31, 2017

If we decide to add a loop, there are two potential approaches (the first one is sketched after this list):

  • Add a high-level loop construct
  • Add a low-level jump/condition construct to create a loop
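For the first option, the construct might wrap a sub-DAG together with a termination condition, rather than exposing raw jump/condition vertices. A rough sketch with hypothetical names:

```java
import java.util.function.Predicate;

// Sketch of a high-level loop construct: the loop body is a nested sub-DAG,
// and a predicate over the loop state decides whether to iterate again.
// All names are hypothetical.
final class SubDag {} // placeholder for a nested DAG of operators

final class LoopVertex<S> {
  final SubDag body;             // operators repeated each iteration
  final Predicate<S> continueIf; // checked on the loop state after each pass
  final int maxIterations;       // safety bound if convergence never triggers

  LoopVertex(final SubDag body, final Predicate<S> continueIf,
             final int maxIterations) {
    this.body = body;
    this.continueIf = continueIf;
    this.maxIterations = maxIterations;
  }
}
```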

johnyangk pushed a commit that referenced this issue Apr 6, 2017
This PR:

- receives optimization policies as a parameter
  - it can be either "pado", "disaggregation", or "runtime_opt" (runtime_opt to be supported through #31)
- receives the parameter as an argument following the program class name argument
- adds tests that run different policies

Resolves #52
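For context, a hypothetical sketch of how a launcher might read the policy argument that follows the program class name; the actual argument handling in the PR may differ:

```java
// Hypothetical launcher argument handling; not the PR's actual code.
public final class PolicyArgExample {
  public static void main(final String[] args) {
    final String programClassName = args[0];
    final String policy = args.length > 1 ? args[1] : "pado"; // assumed default
    switch (policy) {
      case "pado":
      case "disaggregation":
      case "runtime_opt": // to be supported through #31
        System.out.println("Running " + programClassName + " with " + policy);
        break;
      default:
        throw new IllegalArgumentException("Unknown policy: " + policy);
    }
  }
}
```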
@wonook wonook self-assigned this Apr 10, 2017
@wonook
Contributor

wonook commented Apr 10, 2017

Loops will be handled with #121

@johnyangk johnyangk removed the major label Apr 15, 2017
@wonook wonook changed the title from "Interfaces for Runtime Optimization" to "Interfaces for Dynamic Optimization" Jun 11, 2017
@jooykim jooykim added this to the Sailfish on Vortex milestone Jun 28, 2017
@wonook
Contributor

wonook commented Aug 29, 2017

See issues marked with DynOpt

@wonook wonook closed this as completed Aug 29, 2017
wonook added a commit that referenced this issue Feb 28, 2018 (same description as the Apr 6, 2017 commit above)
wonook added a commit that referenced this issue Feb 28, 2018 (same description as the Apr 6, 2017 commit above)