
Removed DeviceReset from the module.

After getting a couple of reports of difficulties with DeviceReset
functionality, I decided that it is not actually a useful function
for a simple set of bindings. It should certainly appear in the full
driver API wrapper, but it serves only to complicate matters in the
context of these bindings.
run4flat committed Jul 6, 2011
1 parent 1c65f6e commit efe7457a3d0023f7250b8e00f1a11e7355d73e25
Showing with 28 additions and 38 deletions.
  1. +12 −21 lib/CUDA/Minimal.pm
  2. +15 −6 lib/CUDA/Minimal.xs
  3. +1 −11 t/z_kernel_invocations.t
lib/CUDA/Minimal.pm
@@ -14,7 +14,7 @@ require Exporter;
our @ISA = qw(Exporter);
our %EXPORT_TAGS = (
- 'error' => [qw(ThereAreCudaErrors GetLastError PeekAtLastError DeviceReset)],
+ 'error' => [qw(ThereAreCudaErrors GetLastError PeekAtLastError)],
'memory' => [qw(Free Malloc MallocFrom Transfer)],
'util' => [qw(SetSize Sizeof)],
'sync' => [qw(ThreadSynchronize)],
@@ -862,21 +862,6 @@ sub ThereAreCudaErrors () {
return PeekAtLastError() ne 'no error';
}
-=head2 DeviceReset
-
-As described in L</Unspecified launch failure>, run time errors in your kernel
-will cause all future kernel launches to fail, as well. The only method of which
-I am aware for recovering from this is to reset your device by calling
-C<DeviceReset>.
-
-However, resetting your device is not as simple as you might hope: it also
-invalidates all your device pointers. The upshot is that if a kernel launch
-fails, you can only proceed by starting from scratch or by copying the data
-currently on your device back to the CPU (and back to the GPU after the
-DeviceReset).
-
-=cut
-
require XSLoader;
XSLoader::load('CUDA::Minimal', $VERSION);
@@ -1306,18 +1291,24 @@ __END__
=head1 Unspecified launch failure
-working here
-
Normally CUDA's error status is reset to C<cudaSuccess> after calling
C<cudaGetLastError>, which happens when any of the functions in CUDA::Minimal
croak, or when you manually call L</GetLastError>. With one exception, later
checks for CUDA errors should be ok unless B<they> actually had trouble. The
exception is the C<unspecified launch failure>, which will cause all further
kernel launches to fail with C<unspecified launch failure>. You can still copy
memory to and from the device, but kernel launches will fail. The only way to
-recover from this problem without completely quitting the program is to call
-L</DeviceReset>. However, that will also invalidate your device pointers. In
-other words, recovery from a failed kernel launch is very messy.
+recover from this problem is to completely close the program.
+
+The CUDA Toolkit for versions beyond 4.1 provides a function called
+C<cudaDeviceReset>, which lets you reset the device without completely quitting
+the program. Because CUDA::Minimal is meant to be a set of simple and incomplete
+bindings, C<CUDA::Minimal> does not provide access to this function. If you find
+that you need this function, you can write your own bindings using L<Inline::C>
+or incorporate such bindings into your own XS code. Note that calling
+C<cudaDeviceReset> also invalidates your device pointers, so that you must copy
+data off the device before resetting it. Put simply, recovery from a failed
+kernel launch is very messy.
The best solution to this, in my opinion, is to make sure you have rock-solid
input validation before invoking kernels. If your kernels only know how to
lib/CUDA/Minimal.xs
@@ -4,12 +4,6 @@
#include "ppport.h"
-#include <cuda.h>
-
-#ifndef CUDA_VERSION
-#define CUDA_VERSION 0
-#endif
-
MODULE = CUDA::Minimal PACKAGE = CUDA::Minimal
void
@@ -152,6 +146,20 @@ PeekAtLastError()
OUTPUT:
RETVAL
+
+// Thanks to Kartik for the compiler-directive work-around code. I am removing
+// the DeviceReset bindings for now because they are only in the latest toolkit
+// (as of July 2011), and not appropriate for this module. However, conditional
+// bindings like these should show up in the driver wrapper, whenever that
+// appears.
+
+/*
+#include <cuda.h>
+
+#ifndef CUDA_VERSION
+#define CUDA_VERSION 0
+#endif
+
SV *
DeviceReset()
CODE:
@@ -165,3 +173,4 @@ DeviceReset()
OUTPUT:
RETVAL
+*/
t/z_kernel_invocations.t
@@ -1,4 +1,4 @@
-use Test::More tests => 28;
+use Test::More tests => 26;
# This file starts with z_ to ensure that it runs last.
@@ -86,13 +86,3 @@ ok(ThereAreCudaErrors, "Good kernels invoked after a failed kernel launch also f
# Check the return value of GetLastError:
CUDA::Minimal::Tests::succeed_test();
like(GetLastError, qr/unspecified/, 'Further kernel invocations return an unspecified launch failure');
-
-# See if a device reset allows for later kernel launches:
-DeviceReset;
-CUDA::Minimal::Tests::succeed_test();
-ok(!ThereAreCudaErrors, 'Kernel invocations after DeviceReset succeed');
-
-# Check if the original device-allocated memory is still good.
-CUDA::Minimal::Tests::cuda_multiply_by_constant($dev_ptr, $N_elements, 4);
-ThreadSynchronize;
-ok(ThereAreCudaErrors, 'Device resets invalidate previously allocated memory');
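
The compiler-directive work-around mentioned in the XS comment can be sketched in plain C. This is only an illustration of the idea, not the code that was commented out: the threshold of 4000 (the value cuda.h defines for toolkit 4.0), the macro names MIN_VERSION_FOR_RESET and HAVE_DEVICE_RESET, and both helper functions are assumptions made for this example. The `__has_include` guard is also an addition so that the sketch compiles on machines without the CUDA headers.

```c
/* Pull in cuda.h only when the toolkit headers are installed.
 * __has_include is a GCC/Clang (and C23) feature; on compilers without it
 * this whole block is skipped and CUDA_VERSION stays undefined. */
#if defined(__has_include)
#  if __has_include(<cuda.h>)
#    include <cuda.h>
#  endif
#endif

/* Same fallback as in the diff: treat a missing toolkit as version 0. */
#ifndef CUDA_VERSION
#define CUDA_VERSION 0
#endif

/* ASSUMPTION: 4000 stands for toolkit 4.0. Adjust this to whichever
 * toolkit release first ships the function you want to bind. */
#define MIN_VERSION_FOR_RESET 4000

#if CUDA_VERSION >= MIN_VERSION_FOR_RESET
#define HAVE_DEVICE_RESET 1
#else
#define HAVE_DEVICE_RESET 0
#endif

/* Hypothetical helper: reports at run time what was decided at compile time. */
int have_device_reset(void)
{
    return HAVE_DEVICE_RESET;
}

/* Hypothetical helper: a human-readable form of the same decision. */
const char *device_reset_status(void)
{
    return HAVE_DEVICE_RESET
        ? "toolkit is new enough; reset bindings can be compiled in"
        : "toolkit too old or missing; reset bindings are compiled out";
}
```

In a real binding, the call to the reset function would sit inside the `#if CUDA_VERSION >= ...` branch, with a croak or an error string in the `#else` branch, so that the module still builds against older toolkits.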
