polling seems not to be working with jinad #1815

Closed
JoanFM opened this issue Jan 29, 2021 · 10 comments · Fixed by #1857
Labels
type/bug Something isn't working

Comments

JoanFM commented Jan 29, 2021

Describe the bug
In the same Pod, there seems to be only one Pea receiving all the load.

JoanFM added the type/bug and priority/critical labels Jan 29, 2021
hanxiao commented Jan 29, 2021

Can't reproduce with the following steps:

  1. run jinad
  2. run the following code
import numpy as np

from jina import Flow

# parallel=3 starts three Peas on the remote jinad; requests should be polled across them
with Flow().add(host='localhost:8000', parallel=3) as f:
    f.index(np.random.random([100000, 10]))
  3. wait for jinad's log; at termination you can see that the Peas have each received ~333 requests on average, which ≈ 100,000 / 100 (request_size) / 3
👻         DAEMON@23375[I]:127.0.0.1:64806 is disconnected
         pod0/3@23491[I]:recv ControlRequest  from ctl▸pod0/3/ZEDRuntime▸⚐
         pod0/3@23491[I]:#sent: 670 #recv: 335 sent_size: 4.0 MB recv_size: 3.9 MB
         pod0/3@23491[I]:no update since 2021-01-29 19:51:50, will not save. If you really want to save it, call "touch()" before "save()" to force saving
         pod0/3@23375[S]:terminated
👻       PeaStore@23375[S]:445d5011-a452-4d78-a9b6-889dee224370 is released from the store.
👻         DAEMON@23375[I]:127.0.0.1:64784 is disconnected
         pod0/2@23489[I]:recv ControlRequest  from ctl▸pod0/2/ZEDRuntime▸⚐
         pod0/2@23489[I]:#sent: 670 #recv: 335 sent_size: 4.0 MB recv_size: 3.9 MB
         pod0/2@23489[I]:no update since 2021-01-29 19:51:49, will not save. If you really want to save it, call "touch()" before "save()" to force saving
         pod0/2@23375[S]:terminated
👻       PeaStore@23375[S]:f7982669-54ab-4518-81b7-1472ded8f191 is released from the store.
👻         DAEMON@23375[I]:127.0.0.1:64762 is disconnected
         pod0/1@23487[I]:recv ControlRequest  from ctl▸pod0/1/ZEDRuntime▸⚐
         pod0/1@23487[I]:#sent: 666 #recv: 333 sent_size: 4.0 MB recv_size: 3.9 MB
         pod0/1@23487[I]:no update since 2021-01-29 19:51:49, will not save. If you really want to save it, call "touch()" before "save()" to force saving
         pod0/1@23375[S]:terminated

JoanFM commented Jan 30, 2021

Did you see them actually receiving IndexRequests? I do not remember what average results were being printed, but we did not see them receiving any data.

hanxiao commented Feb 1, 2021

Yes, they do. You can replicate my steps on your laptop and check it out. To understand the bug I need a reproducible example.

JoanFM commented Feb 2, 2021

It does not seem to be a problem

JoanFM closed this as completed Feb 2, 2021
JoanFM commented Feb 3, 2021

The problem has been seen again!

JoanFM reopened this Feb 3, 2021
JoanFM removed the priority/critical label Feb 3, 2021
JoanFM commented Feb 3, 2021

It seems that when the request arrived, no IDLE Pea existed.

Would it make sense, or is it possible, to randomize which Pea receives the request if none is idle? It seems to always be the first Pea.
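For illustration only, a minimal sketch of the idea being suggested here, not Jina's actual scheduler (pick_pea and its arguments are hypothetical names): prefer an IDLE Pea, and fall back to a random Pea instead of always the first one.

import random

# hypothetical helper, not Jina code: choose which Pea gets the next request
def pick_pea(peas, idle_peas):
    """peas: list of Pea ids; idle_peas: set of ids currently marked IDLE."""
    candidates = [p for p in peas if p in idle_peas]
    if candidates:
        return random.choice(candidates)
    # no Pea is idle: pick one at random instead of defaulting to peas[0]
    return random.choice(peas)

# e.g. pick_pea(['pea-1', 'pea-2', 'pea-3'], set()) no longer always returns 'pea-1'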

hanxiao commented Feb 3, 2021

I don't understand what the problem is. What is the code that can reproduce the problem?

JoanFM commented Feb 3, 2021

> I don't understand what the problem is. What is the code that can reproduce the problem?

I think it is not a problem per se. It is not easy to reproduce, but we were seeing a case where scheduling: load_balance and polling: any were leading to the data always being sent to the same shard.

But it could be due to the fact that by the time the next request arrived, no Pea was IDLE. I am not sure if this is a known behavior.

We did not manage to reproduce it consistently, but we see it often in our tests on AWS.
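
For context, a minimal sketch of the configuration being described, assuming the Flow.add API of that Jina release accepts polling and scheduling keyword arguments (treat the exact argument names and accepted values as assumptions):

import numpy as np

from jina import Flow

# assumed kwargs: polling='any' sends each request to a single Pea of the shard,
# scheduling='load_balance' is meant to dispatch to an IDLE Pea when one exists
with Flow().add(host='localhost:8000', parallel=3,
                polling='any', scheduling='load_balance') as f:
    f.index(np.random.random([100000, 10]))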

JoanFM commented Feb 3, 2021

Will try to debug further

JoanFM commented Feb 4, 2021

https://stackoverflow.com/questions/52278364/is-id-returns-the-actual-memory-address-in-cpython

The problem is that we are relying on id() to establish the identity of a ZMQlet, which can collide when working across different processes.
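
As a standalone illustration (not Jina code) of why id() is not a safe cross-process identity: in CPython it is just the object's memory address, so two distinct objects living in different processes can easily report the same value.

import multiprocessing


def report_id(q):
    # each child creates its own object; since id() is the memory address in
    # CPython, addresses from separate address spaces can coincide
    obj = object()
    q.put((multiprocessing.current_process().name, id(obj)))


if __name__ == '__main__':
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=report_id, args=(q,), name=f'proc-{i}')
             for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # the reported id() values frequently collide across processes
    print([q.get() for _ in procs])

A process-independent identifier, for example a uuid.uuid4() assigned at construction time, avoids this kind of collision.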

JoanFM linked a pull request Feb 4, 2021 that will close this issue