
Enhancement/select device #488

Merged
Merged 59 commits into master from enhancement/select-device on Jun 16, 2020

Conversation

@mtar (Collaborator) commented on Feb 17, 2020:

Description

This PR changes the names used for device selection in the tests. The device-selection 'header' that each test module previously carried is now centralized in BasicTest.

Changes proposed:

  • rename environment variable: DEVICE -> HEAT_TEST_USE_DEVICE
  • new value 'lcpu': global device set to gpu, local device to cpu
  • rename variable: device -> torch_device
  • setUpClass function in BasicTest
  • all unit test cases inherit from BasicTest (see the sketch after this list)
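A rough sketch of what the device selection amounts to, based on the variable names and values listed above. The helper name _select_test_devices is illustrative only and not the actual BasicTest code; the gpu/lcpu/lgpu branches assume a CUDA device is present.

import os
import heat as ht
import torch

def _select_test_devices():
    # Map HEAT_TEST_USE_DEVICE onto a (global, local) device pair.
    env = os.environ.get("HEAT_TEST_USE_DEVICE", "cpu")
    if env == "cpu":
        ht.use_device("cpu")
        return ht.cpu, ht.cpu
    if env == "gpu":
        ht.use_device("gpu")
        torch.cuda.set_device(torch.device(ht.gpu.torch_device))
        return ht.gpu, ht.gpu
    if env == "lcpu":  # global device gpu, local device cpu
        ht.use_device("gpu")
        torch.cuda.set_device(torch.device(ht.gpu.torch_device))
        return ht.gpu, ht.cpu
    if env == "lgpu":  # global device cpu, local device gpu
        ht.use_device("cpu")
        torch.cuda.set_device(torch.device(ht.gpu.torch_device))
        return ht.cpu, ht.gpu
    raise ValueError("unknown HEAT_TEST_USE_DEVICE value: " + env)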

Type of change

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

Yes, all test functions.

@codecov bot commented on Feb 17, 2020:

Codecov Report

Merging #488 into master will increase coverage by 0.93%.
The diff coverage is 98.18%.


@@            Coverage Diff             @@
##           master     #488      +/-   ##
==========================================
+ Coverage   96.48%   97.41%   +0.93%     
==========================================
  Files          75       75              
  Lines       15276    15019     -257     
==========================================
- Hits        14739    14631     -108     
+ Misses        537      388     -149     
Impacted Files Coverage Δ
heat/core/tests/test_suites/basic_test.py 91.34% <57.14%> (-8.66%) ⬇️
heat/core/linalg/tests/test_qr.py 71.42% <66.66%> (+2.42%) ⬆️
heat/core/tests/test_devices.py 58.18% <79.41%> (-6.64%) ⬇️
heat/core/tests/test_io.py 93.05% <92.45%> (+1.50%) ⬆️
heat/cluster/tests/test_kmeans.py 100.00% <100.00%> (+10.41%) ⬆️
heat/cluster/tests/test_spectral.py 100.00% <100.00%> (+10.63%) ⬆️
heat/core/linalg/tests/test_basics.py 100.00% <100.00%> (+0.61%) ⬆️
heat/core/linalg/tests/test_solver.py 100.00% <100.00%> (+15.15%) ⬆️
heat/core/operations.py 92.73% <100.00%> (ø)
heat/core/tests/test_arithmetics.py 99.60% <100.00%> (+1.72%) ⬆️
... and 26 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fc77d90...01eccf1.

@mtar added the testing label (Implementation of tests, or test-related issues) on Feb 17, 2020
@coquelin77 (Member) left a comment:

can you please also revert back to ht_device instead of heat_device? looking through all of these files is painful when the refactor touches so many lines. Also, the change feels arbitrary.

CHANGELOG.md (outdated)
@@ -15,6 +15,7 @@
- [#470](https://github.com/helmholtz-analytics/heat/pull/470) Enhancement: Accelerate distance calculations in kmeans clustering by introduction of new module spatial.distance
- [#478](https://github.com/helmholtz-analytics/heat/pull/478) `ht.array` now typecasts the local torch tensors if the torch tensors given are not the torch version of the specified dtype + unit test updates
- [#479](https://github.com/helmholtz-analytics/heat/pull/479) Completion of spatial.distance module to support 2D input arrays of different splittings (None or 0) and different datatypes, also if second input argument is None
- [#488](https://github.com/helmholtz-analytics/heat/pull/488) Enhancement: test device selection
Member:

Can you expand on this? If someone were to read this, they likely wouldn't understand what it means. It's okay to use two sentences here.

ht.use_device("cpu")
torch.cuda.set_device(torch.device(ht.gpu.torch_device))
torch_device = ht.gpu.torch_device
heat_device = ht.gpu
Member:

Can we make this a function instead? and then put it in the device.py file within core?

Collaborator (Author):

I don't think that it should be a library function. It is only needed in the tests.

Member:

I understand, but it goes against the programming model that we have everywhere else. And although this is only used in the unit tests at the moment, it may be useful as a tool for other developers if it is a library function.
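A hypothetical shape such a library helper could take in the device file mentioned above. The name use_device_everywhere and its signature are made up for illustration and are not part of the PR:

import torch
import heat as ht

def use_device_everywhere(device_name="cpu"):
    # Illustrative only: set heat's global default device and, when a GPU is
    # requested and available, also point torch.cuda at the matching device.
    ht.use_device(device_name)
    if device_name == "gpu" and torch.cuda.is_available():
        torch.cuda.set_device(torch.device(ht.gpu.torch_device))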

@mtar (Collaborator, Author) commented on Mar 23, 2020:

> can you please also revert back to ht_device instead of heat_device? looking through all of these files is painful when the refactor touches so many lines. Also, the change feels arbitrary.

ht_device is also an arbitrary name 🤷‍♂️
I changed it because I use torch_device now, but if you are fond of it I will revert it back.

When I think about it, a heat device is a heater 😀

@coquelin77 (Member) replied:
> can you please also revert back to ht_device instead of heat_device? looking through all of these files is painful when the refactor touches so many lines. Also, the change feels arbitrary.
>
> ht_device is also an arbitrary name 🤷‍♂️
> I changed it because I use torch_device now, but if you are fond of it I will revert it back.
>
> When I think about it, a heat device is a heater 😀

Honestly, I like ht_device because we typically have import heat as ht, but yes, it is 100% arbitrary.

device = ht.gpu.torch_device
ht_device = ht.gpu
ht.torch.cuda.set_device(device)
from heat.core.tests.test_suites.basic_test import BasicTest
Member:

Should probably be called TestCase as well

Collaborator (Author):

Should it be imported as TestCase or should the name be changed in general?

Member:

The name should be changed in general in my opinion

class TestKMeans(unittest.TestCase):
class TestKMeans(BasicTest):
@classmethod
def setUpClass(cls):
Member:

Does this need to be explicitly called here? Should be automatically done due to inheritance

Collaborator (Author):

The ones without already existing setup functions can be removed. Good.
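To illustrate the inheritance point being discussed (the class names here are hypothetical, not from the PR): a test class that defines no setUpClass simply inherits the one from BasicTest, while a class that does define one has to chain up explicitly.

from heat.core.tests.test_suites.basic_test import BasicTest

class TestWithoutOwnSetup(BasicTest):
    pass  # inherits setUpClass from BasicTest, no explicit call needed

class TestWithOwnSetup(BasicTest):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()  # keep the device selection from the base class
        cls.extra_fixture = [1, 2, 3]  # additional per-class test data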

ht.use_device("gpu")
torch.cuda.set_device(torch.device(ht.gpu.torch_device))
torch_device = ht.cpu.torch_device
ht_device = ht.cpu
Member:

The set_device here is sufficient. I do not think that the explicit device passing, i.e. device=self.ht_device is necessary in the test cases anymore.

Collaborator (Author):

I use ht_device for the special cases lcpu and lgpu, where a tensor is created on a device different from the standard one.

Collaborator (Author):

not passing the device directly does not work for me

Member:

do we really want to test this?

Collaborator (Author):

A user may still want to put a specific tensor on the CPU.
It tests that a function returns a tensor of the input's device type and not of the use_device one.

@mtar (Collaborator, Author) commented on Apr 30, 2020:

But the opposite case in 'lgpu' is okay? It is setting gpu specifically while use_device is cpu?

Member:

That is also not okay; there should be only GPU or CPU for now.

Collaborator (Author):

Then why do we even offer the device parameter on tensor creation? 😕

Member:

Unless we are not meaning the same thing, allocating individual tensors on a different device should very much be allowed. For the test cases it is generally

a) not necessary to do it in this way,
b) should only be tested in a limited number of test cases in the factories function.

What I am getting at here is that you want to have different devices on different MPI ranks. This would go against the BSP model. If you want to write test cases that explicitly set the device on factory creation, that is possible without the introduction of lcpu, lgpu. If you want to test whether you can allocate a DNDarray on a device different from the global one, there is no need for lcpu and lgpu, but rather something along the lines of

if ht.device == ht.cpu and torch.cuda.is_available():
    self.other_device = ht.gpu
else:
    self.other_device = ht.cpu
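A sketch of how such an other_device attribute could then be used in a test, assuming the check above is run in setUpClass. The class and test names are hypothetical, ht.get_device() is assumed to return the globally selected device, and the final assertion assumes Device objects support equality comparison:

import torch
import heat as ht
from heat.core.tests.test_suites.basic_test import BasicTest

class TestOtherDevice(BasicTest):  # hypothetical test class, illustration only
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # pick whichever device is not the globally selected one
        if ht.get_device() == ht.cpu and torch.cuda.is_available():
            cls.other_device = ht.gpu
        else:
            cls.other_device = ht.cpu

    def test_explicit_device(self):
        # an explicitly requested device should win over the global one
        a = ht.zeros((3, 3), device=self.other_device)
        self.assertEqual(a.device, self.other_device)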

Member:

As you are setting the torch device here as well, we can also remove all the self.device.torch_device calls from the test cases, right?

heat/core/tests/test_suites/basic_test.py (resolved)
@@ -1,4 +1,5 @@
from unittest import TestCase
import os
Member:

This folder still needs an __init__.py.

Collaborator (Author):

An __init__.py is no longer needed for Python 3.3+ (implicit namespace packages).

Collaborator (Author):

Or what do you mean?

Member:

You are right, it is not needed anymore. However, we have __init__.py files in all the other folders. I think we should be consistent here.

return TestCase.__device

@classmethod
def setUpClass(cls):
Member:

The docstring needs to be updated here with a Raises section.

@mtar (Collaborator, Author) commented on Jun 16, 2020:

  1. Raises added.
  2. I found that there is already an __init__.py in the test_suites folder.
  3. PyTorch doesn't have an option to change the default device; it is always the CPU. torch.cuda.set_device() only selects which GPU is used when no device ID is given (see the short example after this list).
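A minimal sketch of the behaviour described in point 3, assuming a machine with at least one CUDA device; the tensor names are illustrative:

import torch

x = torch.empty(3)                  # allocated on the CPU; there is no global default-device switch
torch.cuda.set_device(0)            # only selects which GPU a plain "cuda" device refers to
y = torch.empty(3, device="cuda")   # now lands on cuda:0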

Member:

  1. Can you import the test cases there as well?
  2. Understood.


@Markus-Goetz (Member) commented:
Bump: there are merge conflicts.

@Markus-Goetz merged commit 42bf442 into master on Jun 16, 2020
@Markus-Goetz deleted the enhancement/select-device branch on June 16, 2020 at 13:50