New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCM/CUDA: cudaFree() hooks #1906
Conversation
Test FAILed. |
Test FAILed. |
config/m4/cuda.m4
Outdated
@@ -1 +1,2 @@ | |||
AM_CONDITIONAL([HAVE_CUDA], [true]) | |||
AC_DEFINE([HAVE_CUDA], 1, [cuda enable]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this PR really depends on the PR which adds configuration flags from cuda, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, will resolve this once config flag is in first
src/ucm/cuda/install.c
Outdated
static int ucm_cudamem_installed = 0; | ||
static pthread_mutex_t install_mutex = PTHREAD_MUTEX_INITIALIZER; | ||
ucm_reloc_patch_t *patch; | ||
ucs_status_t status = UCS_OK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setting status=UCS_OK should be moved to line 58
src/ucm/cuda/replace.c
Outdated
* the event handler, and if event handler returns error code - calls the original | ||
* function. | ||
*/ | ||
#define UCM_DEFINE_CUDA_FUNC(_name, _rettype, _fail_val, ...) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it possible to reuse the existing mmap() macros or extend them if needed?
no copy-paste please..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they are almost same, we can extend if we move macros to some common header file. is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved macros to commons header
@@ -334,6 +337,25 @@ void *ucm_sbrk(intptr_t increment) | |||
return event.sbrk.result; | |||
} | |||
|
|||
#if HAVE_CUDA | |||
cudaError_t ucm_cudaFree(void *addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe this should be somewhere in src/ucm/cuda/install.c?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think i have to keep it here, otherwise have to expose ucm_event_enter()/leave()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
can you please also add a unit test for this? |
also, the commit title should start with "UCM/CUDA: ..." |
2cfddd3
to
1d991c2
Compare
Test PASSed. |
Test PASSed. |
src/ucm/util/reloc.h
Outdated
ucm_trace("%s()", __FUNCTION__); \ | ||
\ | ||
if (ucs_unlikely(orig_func_ptr == NULL)) { \ | ||
pthread_mutex_lock(&ucm_##_category##_get_orig_lock); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need a separate lock for cuda?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really. but not sure if I can unify lock for both mmap and cuda
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think that if the lock will not be split, there would not be a need to define separate macros for cuda and mmap
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unified lock and used same macro for cuda and mmap
src/ucm/cuda/install.c
Outdated
|
||
|
||
static ucm_reloc_patch_t ucm_cudamem_symbol_patches[] = { | ||
{"cudaFree", ucm_override_cudaFree}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why need an array? we have only one function to patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to able to add new funcs in future if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we can add the array in the future when it would be needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
1d991c2
to
0064411
Compare
Test PASSed. |
Test FAILed. |
bot:mlx:retest |
Test FAILed. |
bot:mlx:retest |
Test FAILed. |
@@ -334,6 +337,25 @@ void *ucm_sbrk(intptr_t increment) | |||
return event.sbrk.result; | |||
} | |||
|
|||
#if HAVE_CUDA | |||
cudaError_t ucm_cudaFree(void *addr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
src/ucm/util/reloc.h
Outdated
@@ -46,4 +47,74 @@ ucs_status_t ucm_reloc_modify(ucm_reloc_patch_t* patch); | |||
void* ucm_reloc_get_orig(const char *symbol, void *replacement); | |||
|
|||
|
|||
extern pthread_mutex_t ucm_reloc_get_orig_lock; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to create new file: replace,h, replace.c
"reloc" is handling changing program relocation tables
these are macros to generate replacement function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved headers to new file utils/replace.h.
Also unified {cuda,mmap}/replace.c to util/replace.c. is this ok?
Test PASSed. |
957e2e3
to
3061074
Compare
Test PASSed. |
Test PASSed. |
Test FAILed. |
Test FAILed. |
Test FAILed. |
bot:mlx:retest |
Test PASSed. |
@bureddy code looks good, however need to add a unit test for this. see #1906 (comment) |
Test PASSed. |
Test FAILed. |
Test FAILed. |
06e634d
to
7939dd3
Compare
Test FAILed. |
Test FAILed. |
7939dd3
to
ea631cc
Compare
Test PASSed. |
Test FAILed. |
bot:mlx:retest |
Test FAILed. |
ea631cc
to
e7263d0
Compare
Test PASSed. |
contrib/test_jenkins.sh
Outdated
@@ -518,6 +518,14 @@ run_coverity() { | |||
# Run the test suite (gtest) | |||
# | |||
run_gtest() { | |||
|
|||
#load cuda modules if available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation (use tabs)
contrib/test_jenkins.sh
Outdated
|
||
#load cuda modules if available | ||
if module_load dev/cuda; then | ||
if module_load dev/gdrcopy; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the purpose of using 'if' ? can just do module_load dev/cuda || true
test/gtest/ucm/cuda_hooks.cc
Outdated
} | ||
|
||
/* Install memory hooks */ | ||
result = ucm_set_event_handler(UCM_EVENT_VM_UNMAPPED,0, cuda_mem_event_callback, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after UCM_EVENT_VM_UNMAPPED,
|
||
ret = cudaFree(ptr); | ||
EXPECT_EQ(ret, cudaSuccess); | ||
EXPECT_EQ(ptr, free_ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to reset free_ptr to NULL before every cudaMalloc
Test FAILed. |
e7263d0
to
f9daad5
Compare
Test PASSed. |
Test PASSed. |
@yosefe updated pr with review fixes |
test/gtest/ucm/cuda_hooks.cc
Outdated
EXPECT_EQ(ret, cudaSuccess); | ||
EXPECT_EQ(ptr1, free_ptr); | ||
|
||
ret = cudaFree(ptr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to set free_ptr = NULL here as well..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add it . But i think it should be fine becuase free_ptr = ptr1 here, which is different from ptr.
f9daad5
to
8b20824
Compare
Test PASSed. |
Test FAILed. |
bot:bgate:retest |
Test PASSed. |
No description provided.