Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All devices are equal! #196

Merged
merged 15 commits into from Dec 16, 2020
Merged

All devices are equal! #196

merged 15 commits into from Dec 16, 2020

Conversation

Liamdoult
Copy link
Contributor

@Liamdoult Liamdoult commented Dec 14, 2020

This PR aims to resolve 4 issues:

  1. No tests for ANE
  2. Device to device conversion
  3. Failure to run GPU Tests
  4. Failing GPU tests

Sorry, but this will be long.

1. No tests for ANE

Test calls have been added for the ANE device which will envoke ANE tensors when running tests. Same as GPU

@unittest.skipUnless(ANE, "Requires ANE")
class TestNNANE(TestNN):
  device=DeviceTypes.ANE

2. Device to device conversion

To be able to support ANE in tests, it is required that ANE support the same conversion as CPU->GPU and GPU->CPU.
I have created a universal any to any converter. This function accepts any of the accepted datatypes (GPUBuffer, ane.tensor and any np.ndarray compatible data) and returns the converted data.

@staticmethod
def data_to_device(data, device=DeviceTypes.CPU):

3. Failure to run GPU Tests

CI tests have not been run since 1a1c63a. This appears to have been caused by a missing six dependency that was thrown inside the try except for importing pyopencl, therefore defaulting GPU=False

4. Failing GPU tests

GPU tests are failing. Don't think I caused it as it is accuracy asserts being raised, but not 100%:
i.e.

>         np.testing.assert_allclose(t.grad, tt.grad.cpu().data, atol=grad_atol, rtol=grad_rtol)
E         AssertionError: 
E         Not equal to tolerance rtol=1e-06, atol=1e-06
E         
E         Mismatched elements: 2925 / 2925 (100%)

Potentially a change since OpenCL tests last passed.

ANE, CPU and OCL all now support all tests.

However tests are not currently passing on GPU and I cannot test on CPU.

Failing GPU test are not an issue caused by this update. Tests have not
been passing due to a missing "six" required installation.

OpenCL Tests have not been run since commit: 1a1c63a

devices have 3 types and are handle by a new DeviceTypes enum. (The goal
is to revert to Tensor.<type>, but this current setup allows for keyword
argument defaults: `device=DeviceType.CPU`)

All references to Tensor.GPU/CPU/ANE as been converted to the
corresponding `DeviceTypes` enum.

Refactor of the conversion code to allow for any device to any device
conversion.
@Liamdoult
Copy link
Contributor Author

Liamdoult commented Dec 14, 2020

Before ready to review:

  • Passing GPU tests
  • Clean code
  • GPU, CPU copy profiling still works
  • ANE Profiling added
  • Reduce lines

Move six into gpu required installs. Remove six from standard
installation.
@Liamdoult Liamdoult mentioned this pull request Dec 14, 2020
extra/training.py Outdated Show resolved Hide resolved
setup.py Show resolved Hide resolved
tinygrad/tensor.py Outdated Show resolved Hide resolved
tinygrad/tensor.py Outdated Show resolved Hide resolved
@geohot
Copy link
Collaborator

geohot commented Dec 14, 2020

Very nice, much needed. Let's clean it up

@Liamdoult
Copy link
Contributor Author

Liamdoult commented Dec 15, 2020

Important API change:

.cuda -> .gpu
.cuda_ -> .gpu_
.gpu -> .is_gpu

This now matches Device.GPU naming.

I do recommend that in the future this is changed to ocl and OCL for opencl as GPU can fallback to CPU using the opencl standard.

Tensor now also supports:

.cpu
.cpu_
.is_cpu
.ane
.ane_
.is_ane

@Liamdoult
Copy link
Contributor Author

@geohot Can't test ANE personally.

@Liamdoult Liamdoult marked this pull request as ready for review December 15, 2020 14:15
@geohot
Copy link
Collaborator

geohot commented Dec 16, 2020

Congrats, nice PR!

@geohot geohot merged commit bcf1518 into tinygrad:master Dec 16, 2020
@Liamdoult Liamdoult deleted the all-devices-are-equal branch December 16, 2020 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants