Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Python] Add support for multimachine operators. #151

Closed
wants to merge 6 commits into from

Conversation

shreyask23
Copy link

Adds support for multimachine operators. Operators can now be configured with IP addresses and ports for the machines on which they should run.

Changes

  • The OperatorConfig class in operator.py now takes in an optional addr attribute that defines the address of the machine on which the operator should run.
  • The connect method and run_async method in __ init __.py now keep track of the addresses on which any operators run via two new global variables, node_addrs and num_procs. They use these addresses to set up the networking to connect these operators.
  • The file multimachine_pipeline.py provides an example use case for multimachine operators.

Copy link
Member

@pschafhalter pschafhalter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start! Needs a bit of cleanup but otherwise looks good. I will try this out to see if it works.

python/erdos/operator.py Outdated Show resolved Hide resolved
python/erdos/operator.py Outdated Show resolved Hide resolved
python/erdos/__init__.py Outdated Show resolved Hide resolved
python/erdos/__init__.py Outdated Show resolved Hide resolved
python/erdos/__init__.py Outdated Show resolved Hide resolved
python/erdos/__init__.py Outdated Show resolved Hide resolved
python/erdos/__init__.py Outdated Show resolved Hide resolved
python/erdos/__init__.py Outdated Show resolved Hide resolved
Copy link
Member

@pschafhalter pschafhalter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting close, but could you make the following change to improve clarity?

Thinking about this further, it seems a bit counter-intuitive to store both the address and a port offset in node_to_address. I think it would be better to just store the node -> address mapping and directly compute the port for each node in run_async.
This would also eliminate the need for processes_per_address in connect.

Comment on lines 91 to 93
if len(node_to_address) == 0:
processes_per_address[config._address] = 1
node_to_address[0] = (config._address, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why this is needed?

if len(node_to_address) == 0:
processes_per_address[config._address] = 1
node_to_address[0] = (config._address, 1)
processes_per_address[config._address] = \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you wrap the assignment in parenthesis rather than using \?

E.g.

processes_per_address[config._address] = (
            processes_per_address.get(config._address, 0) + 1)

Comment on lines 88 to 97
global node_to_address
addresses = list(node_to_address.values())
processes_per_address = {a: addresses.count(a) for a in set(addresses)}
if len(node_to_address) == 0:
processes_per_address[config._address] = 1
node_to_address[0] = (config._address, 1)
processes_per_address[config._address] = \
processes_per_address.get(config._address, 0) + 1
node_to_address[node_id] = (config._address,
processes_per_address[config._address])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this entire section -- I think it could be done in a clearer way. See my review comment.

@sukritkalra
Copy link
Collaborator

Closing because too stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants