New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Python] Add support for multimachine operators. #151
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start! Needs a bit of cleanup but otherwise looks good. I will try this out to see if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting close, but could you make the following change to improve clarity?
Thinking about this further, it seems a bit counter-intuitive to store both the address and a port offset in node_to_address
. I think it would be better to just store the node -> address mapping and directly compute the port for each node in run_async
.
This would also eliminate the need for processes_per_address
in connect
.
python/erdos/__init__.py
Outdated
if len(node_to_address) == 0: | ||
processes_per_address[config._address] = 1 | ||
node_to_address[0] = (config._address, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why this is needed?
python/erdos/__init__.py
Outdated
if len(node_to_address) == 0: | ||
processes_per_address[config._address] = 1 | ||
node_to_address[0] = (config._address, 1) | ||
processes_per_address[config._address] = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you wrap the assignment in parenthesis rather than using \
?
E.g.
processes_per_address[config._address] = (
processes_per_address.get(config._address, 0) + 1)
python/erdos/__init__.py
Outdated
global node_to_address | ||
addresses = list(node_to_address.values()) | ||
processes_per_address = {a: addresses.count(a) for a in set(addresses)} | ||
if len(node_to_address) == 0: | ||
processes_per_address[config._address] = 1 | ||
node_to_address[0] = (config._address, 1) | ||
processes_per_address[config._address] = \ | ||
processes_per_address.get(config._address, 0) + 1 | ||
node_to_address[node_id] = (config._address, | ||
processes_per_address[config._address]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this entire section -- I think it could be done in a clearer way. See my review comment.
Closing because too stale. |
Adds support for multimachine operators. Operators can now be configured with IP addresses and ports for the machines on which they should run.
Changes
OperatorConfig
class in operator.py now takes in an optionaladdr
attribute that defines the address of the machine on which the operator should run.connect
method andrun_async
method in __ init __.py now keep track of the addresses on which any operators run via two new global variables, node_addrs and num_procs. They use these addresses to set up the networking to connect these operators.