refactor: redesign of BasePea and BaseRuntime #1539

hanxiao · 2020-12-24T15:31:59Z

Big important ticket, during the holidays so I will start simple.

TL;DR

The current implementation of Pea, Runtime, and Pod introduces the following structure.

Pod 
 | 
 -  LocalRuntime
     |
     - Pea
 - ContainerRuntime (No pea)
 - AsyncRuntime (No Pea)
 - RemoteRuntime (No Pea)

Roughly speaking, this PR will "switch" the semantics of current Pea & Runtime, and change it into:

Pod 
 | 
 -  Pea
     |
     - ZEDRuntime (ZMQlet+Executor+Driver)
     - ContainerRuntime
     - AsyncRuntime (for gRPC)
     - RemoteRuntime

Current Problems

The philosophy of separating Runtime Manager and Runtime is a good direction. However, the realization of this philosophy has the following problems:

It weakens the design of Pea by breaking the semantic relation between Pod and Pea. It injects a Runtime in between: the Pod now maintains multiple Runtimes instead of Peas. Such a big change, made with little notice and without adapting other modules, the blog, the docs, and Jina 101, is not acceptable.
The implementation of the runtimes module looks half-done.
- Many of the methods and docstrings are legacy code from BasePea, which makes people question the design philosophy (which itself is sound).
- The Runtime classes do not leverage inherited methods, and each seems to manage its lifecycle in its own way.

All in all, the current implementation of the runtimes module gives me a strong impression that it is incomplete work. It does not exhaustively abstract the philosophy of separating Runtime Manager and Runtime. Therefore, changes must be made before 1.0, as soon as possible.

This PR

I will change the semantics back to: Pod manages Pea, and Pea manages Runtime.
Runtime is a member of Pea, managed by the Pea. It is defined as "a procedure that blocks the main process once running, and therefore must be put into a separate thread/process. Any program/library/package/module that blocks the main process can be formulated as a :class:BaseRuntime class and then be used in :class:BasePea." See jina/peapods/runtimes/base.py for details and for how BasePea now manages Runtime.
For the sake of clarity, I'm removing all pea & runtime modules in this PR for now. These modules have to be heavily refactored anyway.
There will be only one class of Pea, but Runtime has many subclasses, e.g. for remote, async, container, etc.
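
The relationship described above can be sketched roughly as follows. This is a minimal illustration and not the code in jina/peapods/runtimes/base.py: the four method names (setup, run_forever, cancel, teardown) are the ones mentioned in this thread, while `SleepRuntime`, this `Pea` class, and their attributes are hypothetical.

```python
import threading
from abc import ABC, abstractmethod


class BaseRuntime(ABC):
    """Hypothetical four-function interface for a blocking procedure."""

    def setup(self):
        """Prepare resources before the blocking loop starts."""

    @abstractmethod
    def run_forever(self):
        """Block until cancel() is called."""

    @abstractmethod
    def cancel(self):
        """Ask run_forever() to return."""

    def teardown(self):
        """Release resources after the loop has ended."""


class SleepRuntime(BaseRuntime):
    """Toy runtime: blocks on an event instead of serving requests."""

    def __init__(self):
        self._stop_event = threading.Event()
        self.events = []

    def setup(self):
        self.events.append('setup')

    def run_forever(self):
        self.events.append('run_forever')
        self._stop_event.wait()  # blocks the calling thread

    def cancel(self):
        self._stop_event.set()

    def teardown(self):
        self.events.append('teardown')


class Pea(threading.Thread):
    """One Pea class; behaviour varies only through the runtime it owns."""

    def __init__(self, runtime):
        super().__init__(daemon=True)
        self.runtime = runtime
        self.is_ready = threading.Event()

    def run(self):
        self.runtime.setup()
        self.is_ready.set()
        try:
            self.runtime.run_forever()
        finally:
            self.runtime.teardown()

    def close(self):
        self.runtime.cancel()
        self.join()


pea = Pea(SleepRuntime())
pea.start()
pea.is_ready.wait()
pea.close()
print(pea.runtime.events)  # ['setup', 'run_forever', 'teardown']
```

The blocking call lives entirely inside the runtime, so the thread that created the Pea is never blocked by it. Swapping `SleepRuntime` for another subclass would not require a new Pea class.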

Progress

Restructure the semantics of Pea and Runtime
Design BaseRuntime and four essential interfaces.
Design Pea to use BaseRuntime
Refactor Pod to use Pea
Fix & refactor remote, container, async features in old Pea/Runtime
Refactor parser to separate Pea & Runtime argument
All tests

ETA

Extremely important refactoring. By the end of this year.

JoanFM · 2020-12-24T15:57:20Z

I disagree that it confuses the user in the definition of Pods and Peas.

I think the current design clarifies better what IS a Pea and what is NOT a Pea. It is more important to have a Pea properly context managed by a Runtime than the fact that a Pod manages a Pea (which is false in the direction of this PR).

Making things that do not have to know anything about Executors or ZmqLets inherit from Pea confuses the concepts of Pods and Peas even more.

The current design has a better separation of concerns: it clarifies the different ways of parallelizing Peas (local, remote, and in Docker) and the actual work of a Pea.

Runtimes work very differently because they do very different things. Inheritance should be used where possible, but there is no benefit in forcing very different logic into the same structure, using inheritance to recycle code at all costs even when readability is compromised.

hanxiao · 2020-12-24T16:01:14Z

> It is more important to have a Pea properly context managed by a Runtime than the fact that a Pod manages a Pea

To understand this PR, the old Runtime is the new Pea in this PR.

JoanFM · 2020-12-24T16:02:15Z

I would suggest creating a new class to encapsulate the (ZMQLet + Executor) in a context manager, in the way BasePea is designed right now, so that ZEDRuntime can ensure its proper context management.

JoanFM · 2020-12-24T16:02:47Z

> > It is more important to have a Pea properly context managed by a Runtime than the fact that a Pod manages a Pea
>
> To understand this PR, the old Runtime is the new Pea in this PR.

and the old Pea is the new Runtime?

hanxiao · 2020-12-24T16:06:37Z

jina/peapods/runtimes/base.py

+
+     """
+
+    def serve_forever(self):


change this to run_forever

hanxiao · 2020-12-24T16:11:47Z

> > > It is more important to have a Pea properly context managed by a Runtime than the fact that a Pod manages a Pea
> >
> > To understand this PR, the old Runtime is the new Pea in this PR.
>
> and the old Pea is the new Runtime?

yes, added this raw concept to PR's body

github-actions · 2020-12-24T21:57:57Z

Latency summary

Current PR yields:

😶 index QPS at 1964, delta to last 3 avg.: +0%
😶 query QPS at 33, delta to last 3 avg.: -3%

Breakdown

Version	Index QPS	Query QPS
current	1964	33
`0.8.16`	1934	34
`0.8.15`	1982	34
`0.8.14`	1963	34

Backed by latency-tracking. Further commits will update this comment.

JoanFM · 2020-12-25T08:47:42Z

jina/peapods/peas/base.py

+        self._set_envs()
+
+        try:
+            self.runtime.setup()


If possible, using Runtime as a context manager would be cleaner and more robust than a complicated set of exceptions.

I also think the different Runtimes are so different that it is better to have a separate run method for each. Specialized methods are easier to read than three things fitted into the same template.

For instance, RuntimeTerminated is only thrown by a local Runtime. Each of them will have its own nuances, and I think it is better to prepare for them being of a different nature, so that they can evolve in different directions.

> Each of them will have their own nuances

That is the part I don't think the current implementation is clear about. Therefore, BaseRuntime will capture this pattern once and for all.

Note that throwing RuntimeTerminated is not defined as the way to end run_forever. See the BaseRuntime docstring: calling cancel can end a forever loop peacefully (of course, the Runtime designer must implement it, e.g. as in GatewayPea, where I use an async FIFO task to end the forever loop). Compared to throwing an exception (not only RuntimeTerminated), this is more gentle and graceful. But throwing an exception (again, not only RuntimeTerminated) allows one to jump out of multiple nested, complicated loops immediately, so it can be used as a last resort.
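
The two ways of leaving a forever loop contrasted above can be sketched as follows. This is only an illustration under assumptions: `PollingRuntime` and `run_until_terminated` are hypothetical, and `RuntimeTerminated` is redefined locally as a stand-in rather than imported from Jina.

```python
import threading


class RuntimeTerminated(Exception):
    """Stand-in for the exception named in this thread (hypothetical definition)."""


class PollingRuntime:
    """Hypothetical runtime whose loop can be ended by cancel()."""

    def __init__(self):
        self._cancelled = threading.Event()
        self.finished_normally = False

    def run_forever(self):
        # Graceful path: the loop re-checks a flag and returns normally.
        while not self._cancelled.wait(timeout=0.01):
            pass  # one unit of work would go here
        self.finished_normally = True

    def cancel(self):
        self._cancelled.set()


def run_until_terminated(limit):
    """Last-resort path: one raise leaves both nested loops at once."""
    steps = 0
    try:
        while True:
            for _ in range(10):
                steps += 1
                if steps == limit:
                    raise RuntimeTerminated
    except RuntimeTerminated:
        return steps


runtime = PollingRuntime()
worker = threading.Thread(target=runtime.run_forever)
worker.start()
runtime.cancel()
worker.join(timeout=5)
print(runtime.finished_normally, run_until_terminated(7))  # True 7
```

With cancel, the code after the loop in run_forever still runs. With the exception, control jumps straight to the handler, skipping whatever remained in the inner and outer loops.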

JoanFM · 2020-12-27T20:01:18Z

jina/peapods/runtimes/zed.py

+        """
+        try:
+            try:
+                self._executor = BaseExecutor.load_config(self.args.uses,


A big benefit of the latest refactor is having the executor context-managed by the Pea, which is a very robust design. Can we keep it this way?

Is there any plan to benefit from this? It would give a lot of robustness in ensuring the proper closure of Executors, instead of trusting that the developer will safely capture every potential problem.

> executor context managed , ... , is there any plan to benefit of this?

There is no magic in a context manager. It relies on handwritten code such as .start() and .close(). Previously, one of the big headaches was that you never knew when the context manager was ending its context: in the main process or a subprocess, inside the forever loop or outside it. It is unrealistic to believe that with ... will magically handle everything.

BasePea is designed to be robust in handling all exceptions raised from Runtime at any time. This is implemented by abstracting the four-function interface in BaseRuntime and then using clear and explicit try...except blocks to handle them, making sure that teardown is always properly called no matter what.

So it is not part of this PR, as I see it as syntactic sugar that only weakly benefits people who work on Runtime but no others. Further improvement using with in a separate PR is welcome, but please first understand all the corner cases described in BasePea:start().
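
The explicit per-stage handling described above might look roughly like the sketch below. It is not the actual BasePea code: `RecordingRuntime` and `run_pea` are hypothetical, and the choice to call teardown only after a successful setup is an assumption made for this illustration.

```python
import threading


class RecordingRuntime:
    """Hypothetical runtime that can be told to fail in one stage."""

    def __init__(self, fail_in=None):
        self.fail_in = fail_in
        self.calls = []

    def _stage(self, name):
        self.calls.append(name)
        if self.fail_in == name:
            raise RuntimeError(f'failure injected in {name}')

    def setup(self):
        self._stage('setup')

    def run_forever(self):
        self._stage('run_forever')  # returns at once; a real runtime would block

    def teardown(self):
        self._stage('teardown')


def run_pea(runtime):
    """Treat each runtime function as one atomic, separately guarded step."""
    errors = []
    is_shutdown = threading.Event()
    try:
        runtime.setup()
    except Exception as ex:
        errors.append(('setup', ex))
    else:
        try:
            runtime.run_forever()
        except Exception as ex:
            errors.append(('run_forever', ex))
        try:
            runtime.teardown()  # reached however run_forever ended
        except Exception as ex:
            errors.append(('teardown', ex))
    finally:
        is_shutdown.set()
    return [stage for stage, _ in errors], is_shutdown.is_set()


for fail_in in (None, 'setup', 'run_forever', 'teardown'):
    runtime = RecordingRuntime(fail_in)
    print(fail_in, run_pea(runtime), runtime.calls)
```

Because every stage has its own handler, the log can say which of the functions failed, and a failure inside run_forever does not prevent teardown from running.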

codecov · 2021-01-01T10:42:58Z

Codecov Report

Merging #1539 (0965a20) into master (4a2c744) will increase coverage by 0.22%.
The diff coverage is 83.78%.

@@            Coverage Diff             @@
##           master    #1539      +/-   ##
==========================================
+ Coverage   84.35%   84.58%   +0.22%     
==========================================
  Files         108      128      +20     
  Lines        6367     6643     +276     
==========================================
+ Hits         5371     5619     +248     
- Misses        996     1024      +28

Impacted Files	Coverage Δ
jina/clients/helper.py	`94.00% <ø> (+15.31%)`	⬆️
jina/drivers/predict.py	`89.83% <0.00%> (ø)`
jina/parsers/check.py	`0.00% <0.00%> (ø)`
jina/parsers/export_api.py	`0.00% <0.00%> (ø)`
jina/parsers/hub/new.py	`0.00% <0.00%> (ø)`
jina/parsers/logger.py	`0.00% <0.00%> (ø)`
jina/peapods/runtimes/asyncio/grpc/async_call.py	`97.91% <ø> (ø)`
jina/peapods/runtimes/ssh/__init__.py	`36.84% <36.84%> (ø)`
jina/jaml/parsers/flow/legacy.py	`50.00% <37.50%> (-2.64%)`	⬇️
jina/peapods/runtimes/zmq/base.py	`50.00% <50.00%> (ø)`
... and 92 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33dc267...0965a20.

JoanFM · 2021-01-01T11:09:18Z

jina/peapods/peas/__init__.py

-                self.zmqlet.print_stats()
+            self.logger.error(f'{ex!r} during {self.runtime.setup!r}', exc_info=self.args.show_exc_info)
+        else:
+            self.is_ready.set()


Isn't this else construction a bit weird?

The use of the else clause is better than adding additional code to the try clause because it avoids accidentally catching an exception that wasn’t raised by the code being protected by the try ... except statement.

One of the big benefits of the four-function design of BaseRuntime is that it decouples the progress status of the Runtime: for example, you never have to inject is_ready from Pea into some Runtime.func() just to let Pea know that the Runtime is ready.

Therefore, the try...except in BasePea must catch exceptions precisely. It must treat the four functions of Runtime as atomic functions and catch each of them precisely.
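
The point about the else clause can be shown with a small comparison. The function names below are hypothetical and chosen only for this illustration; the behaviour of try ... except ... else itself is standard Python.

```python
def start_with_else(setup, on_ready):
    try:
        setup()
    except ValueError as ex:
        return f'setup failed: {ex}'
    else:
        on_ready()  # an error here propagates; it is not reported as a setup failure
    return 'ready'


def start_without_else(setup, on_ready):
    try:
        setup()
        on_ready()  # an error here is caught below and mislabelled
    except ValueError as ex:
        return f'setup failed: {ex}'
    return 'ready'


def ok_setup():
    pass


def broken_on_ready():
    raise ValueError('bug in readiness hook')


print(start_without_else(ok_setup, broken_on_ready))
try:
    start_with_else(ok_setup, broken_on_ready)
except ValueError as ex:
    print(f'propagated: {ex}')
```

In the version without else, a bug in the readiness code is reported as if setup had failed. With else, only setup is protected by the handler, so the unrelated error surfaces where it belongs.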

JoanFM · 2021-01-01T11:09:49Z

jina/peapods/peas/__init__.py

+            except Exception as ex:
+                self.logger.error(f'{ex!r} during {self.runtime.teardown!r}', exc_info=self.args.show_exc_info)
+        finally:
+            _finally()


finally is not a good method name

JoanFM · 2021-01-01T11:11:28Z

jina/peapods/peas/__init__.py

-    def request(self) -> 'Request':
-        """Get the current request body inside the protobuf message"""
-        return self._request
+    def run(self):


Having a specialized Pea to handle ZEDRuntime makes the Runtime for the main part of the logic much clearer and more robust, instead of adapting so many different runtimes to be handled in exactly the same way.

One of the big headaches is the explosive polymorphism of Pod, Pea, and Runtime. We have too many Peas and Pods, and then Runtimes. In this PR I killed all subclasses of Pod and Pea; they are now "sealed", and only Runtime allows polymorphism.

Again, I believe the four-function design in BaseRuntime and its relationship with Pea is much clearer than before.
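
As a rough illustration of "sealed" Pea with polymorphic Runtime: the thread does not say whether or how sealing is enforced in code, so the `__init_subclass__` check below is purely an assumption of this sketch, and the class bodies are placeholders that only borrow names from the PR description.

```python
class BaseRuntime:
    """Open for extension: new behaviour is added as a Runtime subclass."""

    name = 'base'


class ZEDRuntime(BaseRuntime):
    name = 'zed'


class ContainerRuntime(BaseRuntime):
    name = 'container'


class Pea:
    """Sealed in this sketch: any attempt to subclass raises TypeError."""

    def __init_subclass__(cls, **kwargs):
        raise TypeError('Pea is sealed; subclass BaseRuntime instead')

    def __init__(self, runtime):
        self.runtime = runtime

    def describe(self):
        return f'Pea({self.runtime.name})'


def try_subclass():
    try:
        class GatewayPea(Pea):  # rejected at class-creation time
            pass
    except TypeError as ex:
        return str(ex)
    return 'subclass allowed'


print(Pea(ZEDRuntime()).describe(), Pea(ContainerRuntime()).describe())
print(try_subclass())
```

The same Pea class serves both runtimes, which is the trade-off argued for above: variation is confined to one class hierarchy instead of three.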

hanxiao added 2 commits December 24, 2020 14:33

refactor: runtime step 1

8a64a1f

refactor: add BaseRuntime and BasePea

9d1e06a

hanxiao requested a review from a team as a code owner December 24, 2020 15:31

hanxiao requested review from CatStark and rutujasurve94 December 24, 2020 15:31

jina-bot added size/XL area/core This issue/PR affects the core codebase area/helper This issue/PR affects the helper functionality area/network This issue/PR affects network functionality area/testing This issue/PR affects testing component/peapod labels Dec 24, 2020

hanxiao added 3 commits December 24, 2020 22:45

refactor: add BaseRuntime and BasePea

1323e4d

docs: fix doc website layout

1b8b9f5

docs: fix doc website layout

afb620f

jina-bot added the component/driver label Dec 24, 2020

JoanFM reviewed Dec 25, 2020

hanxiao added 2 commits December 25, 2020 21:08

docs: fix doc website layout

3e0ea98

docs: fix doc website layout

dde6237

JoanFM requested changes Dec 27, 2020

hanxiao and others added 4 commits December 27, 2020 22:48

docs: fix doc website layout

bc6acf6

docs: fix doc website layout

056af1e

fix(jaml): fix 1545

b2b3ed0

fix(flow): fix flow cls registery

3a805cb

hanxiao modified the milestones: v0.9 New Features, v0.9 Breaking Changes Dec 30, 2020

hanxiao added 14 commits December 31, 2020 15:25

tests: fix unit test

92f2fc1

tests: fix unit test

81f730d

tests: fix unit test

53257eb

tests: fix unit test

d6bdef4

tests: fix unit test

5ce262b

tests: fix unit test

4cc4106

tests: fix unit test

71bc785

tests: fix unit test

341b1fa

tests: fix unit test

7c35259

tests: fix unit test

dd040ff

tests: fix unit test

8e48d4e

tests: fix unit test

d2ef93e

tests: fix unit test

6904cc9

tests: fix unit test

ec1f26b

hanxiao mentioned this pull request Jan 1, 2021

PR#1361 test is misleading, reverting #1575

Closed

hanxiao added 2 commits January 1, 2021 11:02

tests: fix unit test

e150898

tests: fix unit test

a0170fe

tests: fix unit test

2cb724d

jina-bot added the area/helloworld This issue/PR affects the helloworld label Jan 1, 2021

JoanFM requested changes Jan 1, 2021

hanxiao added 3 commits January 1, 2021 13:08

Merge branch 'master' into refactor-runtime-ref2

1927738

tests: fix unit test

5059fd9

tests: fix unit test

0965a20

hanxiao merged commit 9b81559 into master Jan 1, 2021

hanxiao deleted the refactor-runtime-ref2 branch January 1, 2021 14:07

deepankarm mentioned this pull request Jan 1, 2021

Adapt jinad post runtime refactoring jina-ai/jinad#77

Closed
