Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: redesign of BasePea and BaseRuntime #1539

Merged
merged 65 commits into from Jan 1, 2021
Merged

Conversation

hanxiao
Copy link
Member

@hanxiao hanxiao commented Dec 24, 2020

Big important ticket, during the holidays so I will start simple.

TL;DR

The current implementation of Pea, Runtime, and Pod introduces the following structure.

Pod 
 | 
 -  LocalRuntime
     |
     - Pea
 - ContainerRuntime (No pea)
 - AsyncRuntime (No Pea)
 - RemoteRuntime (No Pea)

Roughly speaking, this PR will "switch" the semantics of current Pea & Runtime, and change it into:

Pod 
 | 
 -  Pea
     |
     - ZEDRuntime (ZMQlet+Executor+Driver)
     - ContainerRuntime
     - AsyncRuntime (for gRPC)
     - RemoteRuntime

Current Problems

The philosophy of separating Runtime Manager and Runtime is a good direction. However, the realization of this philosophy has the following problems:

  • It weakens the design of Pea, breaking the semantic relation of Pod<> Pea. It injects a Runtime in-between: now the Pod is maintaining multiple Runtime instead of Peas. Such big change with little notice & adaption on other modules/blog/docs/Jina 101 is not acceptable.
  • When looking at the runtimes module implementation, it is half-done.
    • Many of the methods are legacy methods & docstrings from BasePea, making people question about the design phiolophy (which has nothing wrong).
    • All Runtime classes do not leverage inherit methods and each seems to have its own way of managing the lifecycle.

All in all, the current implemenation in runtimes module gives me a strong impression that it is a not complete work. It does not exhaustively abstract the philosophy of separating Runtime Manager and Runtime. Therefore, changes must be done before 1.0, asap.

This PR

  1. I will change the semantic back to Pod manages Pea, and Pea manages Runtime.
  2. Runtime is a member of Pea, managed by the Pea. It is defined as "a procedure that blocks the main process once running, therefore must be put into a separated thread/process. Any program/library/package/module that blocks the main
    process, can be formulated into a :class:BaseRuntime class and then be used in :class:BasePea." See jina/peapods/runtimes/base.py for details and how now BasePea is managing Runtime.
  3. For the sake of clarity, I'm removing all pea & runtime modules in this PR atm. These modules have to be heavily refactored anyway.
  4. There will be only one class of Pea, but Runtime has many subclasses, e.g. for remote, async, container, etc.

Progress

  • Restructure the semantics of Pea and Runtime
  • Design BaseRuntime and four essential interfaces.
  • Design Pea to use BaseRuntime
  • Refactor Pod to use Pea
  • Fix & refactor remote, container, async features in old Pea/Runtime
  • Refactor parser to separate Pea & Runtime argument
  • All tests

ETA

Extremely important refactoring. By the end of this year.

@hanxiao hanxiao requested a review from a team as a code owner December 24, 2020 15:31
@jina-bot jina-bot added size/XL area/core This issue/PR affects the core codebase area/helper This issue/PR affects the helper functionality area/network This issue/PR affects network functionality area/testing This issue/PR affects testing component/peapod labels Dec 24, 2020
@JoanFM
Copy link
Member

JoanFM commented Dec 24, 2020

I disagree that it confuses the user in the definition of Pods and Peas.

I think current design clarifies better what IS a Pea and what is NOT a Pea. It is more important to have a Pea properly context managed by a Runtime than the fact that a Pod manages a Pea (which is False in the direction of this PR).

Inheriting from Pea things that do not have to know anything about Executors or ZmqLets confuses even more the concepts of Pods and Peas.

Current design has a better separation of concerns clarifying better the different ways of parallelizing Peas (local, remote and in docker) and the actual work of a Pea.

Runtime has very different work because they do very different things. Inheriting should be made when possible but it is not a benefit to try to fit very different logics into the same structure using inheritance as a way trying to recycle code at all costs even when readbility is compromised.

@hanxiao
Copy link
Member Author

hanxiao commented Dec 24, 2020

It is more important to have a Pea properly context managed by a Runtime than the fact that a Pod manages a Pea

To understand this PR, the old Runtime is the new Pea in this PR.

@JoanFM
Copy link
Member

JoanFM commented Dec 24, 2020

I would suggest tob create a new class to encapsulate the (ZMQLet + Executor) in a context manager in the way BasePea is designed right now such that ZEDRuntime can ensure its proper context management.

@JoanFM
Copy link
Member

JoanFM commented Dec 24, 2020

It is more important to have a Pea properly context managed by a Runtime than the fact that a Pod manages a Pea

To understand this PR, the old Runtime is the new Pea in this PR.

and the old Pea is the new Runtime?


"""

def serve_forever(self):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change this to run_forever

@hanxiao
Copy link
Member Author

hanxiao commented Dec 24, 2020

It is more important to have a Pea properly context managed by a Runtime than the fact that a Pod manages a Pea

To understand this PR, the old Runtime is the new Pea in this PR.

and the old Pea is the new Runtime?

yes, added this raw concept to PR's body

@github-actions
Copy link

github-actions bot commented Dec 24, 2020

Latency summary

Current PR yields:

  • 😶 index QPS at 1964, delta to last 3 avg.: +0%
  • 😶 query QPS at 33, delta to last 3 avg.: -3%

Breakdown

Version Index QPS Query QPS
current 1964 33
0.8.16 1934 34
0.8.15 1982 34
0.8.14 1963 34

Backed by latency-tracking. Further commits will update this comment.

self._set_envs()

try:
self.runtime.setup()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if possible, using Runtime as context manager will be cleaner and more robust than a complicated set of Exceptions.

I also think that the different Runtimes are so diferent that I think is better to have different run methods for each. It is easier to read specialized methods than fitting 3 things in the same template.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, RuntimeTerminated is only thrown by a local Runtime. Each of them will have their own nuances and I think is better to have it prepared to be of different nature, so that they can evolve in different directions.

Copy link
Member Author

@hanxiao hanxiao Dec 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of them will have their own nuances

That's the part I don't think the current implementation is clear about that. And therefore, the BaseRuntime will capture this pattern, once for all.

Note throwing exception RuntimeTerminated is not defined as the way to end from run_forever, see BaseRuntime docstring: calling cancel will can end a forever loop peacefully. (ofcouse the Runtime designer must implement it, e.g. like in GatewayPea how I use async FIFO task end forever loop). Comparing to throwing any exception (not only RuntimeTerminated), it is more gentle and graceful. But throwing exception (not only RuntimeTerminated) allows one to jump out from multiple nested complicated loops immediately, which can be used as the last resort.

"""
try:
try:
self._executor = BaseExecutor.load_config(self.args.uses,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a big benefit of the latest refactor is to have executor context managed by the Pea which is a very robust design. Can we keep it this way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any plan to benefit of this? that would give a lot of robustness in ensuring the proper closure of Executors. Instead of trusting that the developer will safely capture every potential problem

Copy link
Member Author

@hanxiao hanxiao Jan 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executor context managed , ... , is there any plan to benefit of this?

There is no magic on context manager. It relies on handwritten code such as .start(), .close. Previously one of the big headache is you never now when the context manager is ending its context, in main process/subprocess, in side forever loop or outside forever loop. It is unrealistic to just believe with ... will magically handle everything.

BasePea is designed to be Robust on handling All exceptions raised from Runtime at Any time. This is implemented by abstracting the four-function-interface in BaseRuntime, and then having clear & explicit try...except block to handle. Making sure that teardown is always properly called no matter what.

So it's not part of the PR, as I see it is a sugary syntax which only weakly benefits people who work on Runtime but no others. Further improvement on using with in separate PR is welcome but please first understand all corner cases described in the BasePea:start()

@codecov
Copy link

codecov bot commented Jan 1, 2021

Codecov Report

Merging #1539 (0965a20) into master (4a2c744) will increase coverage by 0.22%.
The diff coverage is 83.78%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1539      +/-   ##
==========================================
+ Coverage   84.35%   84.58%   +0.22%     
==========================================
  Files         108      128      +20     
  Lines        6367     6643     +276     
==========================================
+ Hits         5371     5619     +248     
- Misses        996     1024      +28     
Impacted Files Coverage Δ
jina/clients/helper.py 94.00% <ø> (+15.31%) ⬆️
jina/drivers/predict.py 89.83% <0.00%> (ø)
jina/parsers/check.py 0.00% <0.00%> (ø)
jina/parsers/export_api.py 0.00% <0.00%> (ø)
jina/parsers/hub/new.py 0.00% <0.00%> (ø)
jina/parsers/logger.py 0.00% <0.00%> (ø)
jina/peapods/runtimes/asyncio/grpc/async_call.py 97.91% <ø> (ø)
jina/peapods/runtimes/ssh/__init__.py 36.84% <36.84%> (ø)
jina/jaml/parsers/flow/legacy.py 50.00% <37.50%> (-2.64%) ⬇️
jina/peapods/runtimes/zmq/base.py 50.00% <50.00%> (ø)
... and 92 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33dc267...0965a20. Read the comment docs.

@jina-bot jina-bot added the area/helloworld This issue/PR affects the helloworld label Jan 1, 2021
self.zmqlet.print_stats()
self.logger.error(f'{ex!r} during {self.runtime.setup!r}', exc_info=self.args.show_exc_info)
else:
self.is_ready.set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this else construction isnt it a bit weird?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn’t raised by the code being protected by the try ... except statement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the big benefits of the four-function-design of BaseRuntime is its decoupling the progress-status of the Runtime: for example, you never have to inject is_ready from Pea into some Runtime.func() just to let Pea to know Runtime is ready.

Therefore, try except at BasePea must catch exception in a precise way. It must see the four functions from Runtime as the atomic function and catch each of them precisely.

except Exception as ex:
self.logger.error(f'{ex!r} during {self.runtime.teardown!r}', exc_info=self.args.show_exc_info)
finally:
_finally()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finally is not a good method name

def request(self) -> 'Request':
"""Get the current request body inside the protobuf message"""
return self._request
def run(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having a specializrd Pea to handle ZEQRuntime makes the Runtime of the main part of the logic much clearer and robust. instead of adapting so many different runtimes to be handled in the same exact way.

Copy link
Member Author

@hanxiao hanxiao Jan 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the big headache is the explosive polymorphism on Pod, Pea and Runtime. We have too many Peas and Pods and then Runtimes. In this PR I killed all subclasses of Pod and Pea, they are now "sealed", only Runtime allows polymorphism.

Again, I believe the four-function-design in the BaseRuntime and its relationship with Pea is much clearer than before.

@hanxiao hanxiao merged commit 9b81559 into master Jan 1, 2021
@hanxiao hanxiao deleted the refactor-runtime-ref2 branch January 1, 2021 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cli This issue/PR affects the command line interface area/core This issue/PR affects the core codebase area/helloworld This issue/PR affects the helloworld area/helper This issue/PR affects the helper functionality area/network This issue/PR affects network functionality area/testing This issue/PR affects testing component/client component/docker component/driver component/executor component/flow component/jaml component/logging component/peapod executor/meta size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants