Modifying shared_data leads to unexpected behaviors #119

Closed
Weiming-Hu opened this issue Apr 24, 2020 · 4 comments

@Weiming-Hu
Contributor

Hi team, here is yet another confusion that I'm having. It might be a very simple one but I couldn't find much information on using the shared_data attribute of an app manager, although I'm aware of a quick tutorial here.

The following code will walk you through reproducing the issue in an interactive python session.

(venv) geogadmins-Air:year_3 wuh20$ python
Python 3.7.5 (default, Nov  1 2019, 02:16:32) 
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from radical.entk import AppManager
>>> app = AppManager(hostname = "two.radical-project.org", port = "33239")
EnTK session: re.session.geogadmins-Air.wuh20.018376.0001
Creating AppManager
Setting up RabbitMQ system                                 ok
                                                                              ok
>>> print(app.shared_data) # This is expected because it should start being empty
[]
>>> app.shared_data = ["my_shared_configuration.cfg"] # Include a single file manually
>>> print(app.shared_data) # Strange! Why is it not included? But actually this works when I submit my job as it is.
[]
>>> app.shared_data.extend(["config1.cfg", "config2.cfg"]) # If I want to include more files in this way, these files are not actually included as shared data when I try to submit the job as it is.
>>> print(app.shared_data) # Although now it is printing but the first file is missing.
['config1.cfg', 'config2.cfg']
>>> app.shared_data = ["my_shared_configuration.cfg"] # And it seems I can't change it.
>>> print(app.shared_data)
['config1.cfg', 'config2.cfg']

Because I have two sets of files to include, one is the single shared file that is the same for all tasks and the other is a set of files that are specific to each task, I ended up doing this in my code.

While I don't think there is anything wrong with this, I just found this to be a little bit unexpected. I hope you can help me clear some confusion.
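One way to handle the two sets of files, going only by the session above (a direct assignment takes effect at submission, while `extend` on the returned list does not), is to build the complete list first and assign it once. This is a minimal sketch and not necessarily what the linked code does; `build_shared_data` is a hypothetical helper, not part of radical.entk.

```python
def build_shared_data(common_files, per_task_files):
    """Combine both sets into one list, keeping order and dropping duplicates.

    Hypothetical helper for illustration; it is not a radical.entk function.
    """
    combined = []
    for name in list(common_files) + list(per_task_files):
        if name not in combined:
            combined.append(name)
    return combined


files = build_shared_data(
    ["my_shared_configuration.cfg"],
    ["config1.cfg", "config2.cfg"],
)
print(files)

# With a real AppManager instance `app`, the list would then be assigned once,
# instead of mutating whatever the getter returns:
# app.shared_data = files
```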

Much appreciated. Thank you

@andre-merzky

This behavior is an unfortunate side effect of how the RE API is implemented: the attribute setters are hooked into function calls that set internal state, and that state does not always (as in this case) reflect the attribute as it was set. We agree that this is confusing and has unexpected side effects.
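To make that mechanism concrete, here is a toy sketch of a property whose setter writes to one piece of internal state while the getter returns another. It assumes nothing about the real radical.entk code: `ToyManager`, `_visible`, `_staged`, and `files_to_stage` are all hypothetical names chosen only to mirror the session in the original report.

```python
class ToyManager:
    """Hypothetical stand-in for illustration; NOT the radical.entk AppManager."""

    def __init__(self):
        self._visible = []  # the list the getter hands back
        self._staged = []   # the state a submit step would actually read

    @property
    def shared_data(self):
        return self._visible

    @shared_data.setter
    def shared_data(self, files):
        # The setter behaves like a function call: it updates other internal
        # state and never touches the list returned by the getter.
        self._staged = list(files)

    def files_to_stage(self):
        return list(self._staged)


mgr = ToyManager()
mgr.shared_data = ["my_shared_configuration.cfg"]
print(mgr.shared_data)       # [] : the getter's list was never updated

mgr.shared_data.extend(["config1.cfg", "config2.cfg"])
print(mgr.shared_data)       # ['config1.cfg', 'config2.cfg']

print(mgr.files_to_stage())  # ['my_shared_configuration.cfg']
```

In this toy model, assignment changes what would be staged but not what is printed, and `extend` changes what is printed but not what would be staged, which matches the symptoms reported above.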

Changing the API has significant inertia for us, so there is no quick solution forthcoming. We will (a) better document this behavior, and (b) take this into account when redesigning the API for later major release cycles.

Thanks for letting us know about the confusion, much appreciated!

@mturilli
Contributor

As discussed with @lee212, we will open two tickets in the EnTK repo: one to improve the data staging documentation and one to start an RFC for an API iteration.

@andre-merzky andre-merzky removed their assignment May 8, 2020
@lee212

lee212 commented May 22, 2020

@Weiming-Hu
Contributor Author

Thank you very much. This is working in the devel branch of radical.entk.
