Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix / Enhancement array initialization different devices #678

Merged
merged 27 commits into from Oct 26, 2020

Conversation

mtar
Copy link
Collaborator

@mtar mtar commented Sep 25, 2020

Description

Refines the same device check on array initialization and also fixes the calling functions that it will not be called any longer. Kept as an assertion instead.

Lowers coverage

Issue/s resolved: #675

Changes proposed:

  • Refine the device check in DNDarray.__init__ and change it into an assertion
  • get right device in factories.array
    • remove the device id in devices.cpu.torch_device to be akin to str(torch.device)
    • add "cuda" in devices.__device_mapping allowing to pass torch.device.type to devices.sanitize.device
  • various changes on functions creating instances of DNDarray directly

Type of change

Bug fix

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

@mtar mtar changed the title add missing device changes for gpu Add missing device parameters Sep 25, 2020
@codecov
Copy link

codecov bot commented Sep 25, 2020

Codecov Report

Merging #678 into release/0.5.x will decrease coverage by 0.02%.
The diff coverage is 92.30%.

Impacted file tree graph

@@                Coverage Diff                @@
##           release/0.5.x     #678      +/-   ##
=================================================
- Coverage          97.46%   97.43%   -0.03%     
=================================================
  Files                 87       87              
  Lines              17101    17110       +9     
=================================================
+ Hits               16667    16671       +4     
- Misses               434      439       +5     
Impacted Files Coverage Δ
heat/core/devices.py 85.00% <50.00%> (-2.18%) ⬇️
heat/core/factories.py 99.05% <77.77%> (-0.95%) ⬇️
heat/core/_operations.py 93.61% <100.00%> (ø)
heat/core/dndarray.py 96.43% <100.00%> (-0.28%) ⬇️
heat/core/io.py 90.32% <100.00%> (ø)
heat/core/logical.py 100.00% <100.00%> (ø)
heat/core/manipulations.py 99.22% <100.00%> (ø)
heat/core/random.py 100.00% <100.00%> (ø)
heat/core/tests/test_exponential.py 100.00% <100.00%> (ø)
heat/core/tests/test_manipulations.py 99.92% <100.00%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 88f9157...27d7134. Read the comment docs.

@mtar mtar changed the base branch from master to release/0.5.x September 28, 2020 10:11
Bugfix remove implicit tests when  wrong device is given to DNDarray.
@mtar mtar marked this pull request as ready for review September 28, 2020 11:58
@mtar mtar changed the title Add missing device parameters Bug fix / Enhancement array initialization different devices Sep 28, 2020
Copy link
Contributor

@ClaudiaComito ClaudiaComito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, thanks a lot for this. Was this all the device inconsistencies? I'm surprised we haven't got more.

I only have a few comments.

heat/core/factories.py Show resolved Hide resolved
heat/core/factories.py Show resolved Hide resolved

# change device if it do not match
if str(obj.device) != device.torch_device:
obj = obj.to(device.torch_device)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the implicit data transfer ok here? 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's analogous to dtype in factories.array(). The user explicitly wants to use a specific device here.

Copy link
Collaborator Author

@mtar mtar Oct 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On internal functions, however, it still can obfuscate some wrong devices if the device parameter is overused. I add a warning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yes I can see why we want to transfer data here. On internal functions, we have to keep an eye on it during review and testing I guess.

CHANGELOG.md Outdated
@@ -1,3 +1,7 @@
# v0.5.1

- [#678](https://github.com/helmholtz-analytics/heat/pull/678) Bugfix: Many implicit array transfers on DNDarray initialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more descriptive? If I get it right, we are avoiding implicit array transfers or replacing them with ...

(Apologies for this comment, having spent so much time trying to figure out what our Pending additions for 0.5.0 meant, I'm a bit traumatized 😅 )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible change for clarity

Bugfix: Internal functions now use explicit device parametes for DNDarray and torch.Tensor initializations.

Copy link
Contributor

@ClaudiaComito ClaudiaComito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this, thanks Michael!

@mtar
Copy link
Collaborator Author

mtar commented Oct 7, 2020

I'm fine with this, thanks Michael!

But I'm not satisfied anymore 😃
After adding a warning, I found out that TestGaussianNB::test_fit_iris is causing a device transfer.

@mtar
Copy link
Collaborator Author

mtar commented Oct 8, 2020

run tests

Comment on lines 328 to 334
assert str(obj.device) == device.torch_device

if str(obj.device) != device.torch_device:
warnings.warn(
"Array 'obj' is not on device '{}'. It will be copied to it.".format(device),
UserWarning,
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I keep the assertion or the warning here?

Copy link
Collaborator Author

@mtar mtar Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted the assertion as there are some rare cases where it is allowed. However the unittest results must be reviewed for possible wrong devices.

CHANGELOG.md Outdated
@@ -1,3 +1,7 @@
# v0.5.1

- [#678](https://github.com/helmholtz-analytics/heat/pull/678) Bugfix: Many implicit array transfers on DNDarray initialization
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible change for clarity

Bugfix: Internal functions now use explicit device parametes for DNDarray and torch.Tensor initializations.

@mtar mtar merged commit 9c90cb4 into release/0.5.x Oct 26, 2020
@mtar mtar deleted the bug/675-initdevice branch October 26, 2020 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[GPU] Frequent device tests on dndarray initialization
3 participants