UCM/CUDA: cudaFree() hooks #1906

Merged
merged 3 commits into from Nov 1, 2017

Conversation

bureddy
Contributor

@bureddy bureddy commented Oct 11, 2017

No description provided.

@swx-jenkins1

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2791/ for details.

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4796/ for details (Mellanox internal link).

@@ -1 +1,2 @@
AM_CONDITIONAL([HAVE_CUDA], [true])
AC_DEFINE([HAVE_CUDA], 1, [cuda enable])
Contributor

looks like this PR really depends on the PR which adds configuration flags from cuda, right?

Contributor Author

yes, will resolve this once config flag is in first

static int ucm_cudamem_installed = 0;
static pthread_mutex_t install_mutex = PTHREAD_MUTEX_INITIALIZER;
ucm_reloc_patch_t *patch;
ucs_status_t status = UCS_OK;
Contributor

setting status=UCS_OK should be moved to line 58

* the event handler, and if event handler returns error code - calls the original
* function.
*/
#define UCM_DEFINE_CUDA_FUNC(_name, _rettype, _fail_val, ...) \
Contributor

isn't it possible to reuse the existing mmap() macros or extend them if needed?
no copy-paste please..

Contributor Author

they are almost the same; we can extend them if we move the macros to a common header file. is that ok?

Contributor

yes

Contributor Author

moved macros to a common header
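
For readers following this thread, the pattern described in the macro's comment (call the event handler first, and call the original function only if the handler returns the failure value) can be sketched as below. This is a minimal illustration, not the macro from the PR: `DEFINE_REPLACE_FUNC`, the handlers, and `fake_orig_free` are all hypothetical names, and the hooked function is assumed to take a single `void *` argument.

```c
#include <stddef.h>

/* All names below are hypothetical; none are taken from the UCX sources. */

int orig_calls = 0;   /* counts calls that reached the stand-in "original" */

/* Stand-in for the real function that the hook replaces. */
static int fake_orig_free(void *addr) { (void)addr; orig_calls++; return 0; }

/* Two stand-in event handlers: one succeeds, one reports failure (-1). */
static int handler_ok(void *addr)   { (void)addr; return 0; }
static int handler_fail(void *addr) { (void)addr; return -1; }

/*
 * Generate a replacement function: call the event handler first and, only if
 * it returns _fail_val, fall back to the original function.
 */
#define DEFINE_REPLACE_FUNC(_name, _rettype, _fail_val, _handler, _orig) \
    _rettype _name(void *addr) \
    { \
        _rettype res = (_handler)(addr); \
        if (res == (_fail_val)) { \
            res = (_orig)(addr); \
        } \
        return res; \
    }

DEFINE_REPLACE_FUNC(hook_handled,  int, -1, handler_ok,   fake_orig_free)
DEFINE_REPLACE_FUNC(hook_fallback, int, -1, handler_fail, fake_orig_free)
```

Calling `hook_handled(NULL)` leaves `orig_calls` untouched, while `hook_fallback(NULL)` increments it once. Because the generator only needs the name, return type, and failure value, one macro in a shared header can plausibly serve both the mmap and the CUDA hooks, which is the direction the reviewers agreed on.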

@@ -334,6 +337,25 @@ void *ucm_sbrk(intptr_t increment)
return event.sbrk.result;
}

#if HAVE_CUDA
cudaError_t ucm_cudaFree(void *addr)
Contributor

maybe this should be somewhere in src/ucm/cuda/install.c?

Contributor Author

I think I have to keep it here; otherwise I would have to expose ucm_event_enter()/leave()?

Contributor

ok

@yosefe
Contributor

yosefe commented Oct 13, 2017

can you please also add a unit test for this?

@yosefe yosefe added the Feature New feature label Oct 13, 2017
@yosefe
Contributor

yosefe commented Oct 13, 2017

also, the commit title should start with "UCM/CUDA: ..."

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2841/ for details.

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4848/ for details (Mellanox internal link).

ucm_trace("%s()", __FUNCTION__); \
\
if (ucs_unlikely(orig_func_ptr == NULL)) { \
pthread_mutex_lock(&ucm_##_category##_get_orig_lock); \
Contributor

do we really need a separate lock for cuda?

Contributor Author

not really. but not sure if I can unify lock for both mmap and cuda

Contributor

I think that if the lock is not split, there is no need to define separate macros for cuda and mmap

Contributor Author

unified lock and used same macro for cuda and mmap
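
A minimal sketch of what a single shared lock for resolving original functions might look like follows. It is an illustration under stated assumptions, not the code from this PR: `lookup_symbol` is a hypothetical stand-in for the real symbol resolution, the `fake_*` functions replace the real `munmap` and `cudaFree`, and, unlike the macro quoted above (which checks the pointer before locking), this version always takes the lock to keep the example simple.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical; none are taken from the UCX sources. */
typedef int (*generic_fn_t)(void *);

/* One lock shared by every hooked function, whatever its category. */
static pthread_mutex_t get_orig_lock = PTHREAD_MUTEX_INITIALIZER;

int lookup_count = 0;   /* how many times symbol resolution actually ran */

static int fake_munmap(void *addr)    { (void)addr; return 1; }
static int fake_cuda_free(void *addr) { (void)addr; return 2; }

/* Stand-in for real symbol resolution. */
static generic_fn_t lookup_symbol(const char *name)
{
    lookup_count++;
    if (strcmp(name, "munmap") == 0) {
        return fake_munmap;
    }
    if (strcmp(name, "cudaFree") == 0) {
        return fake_cuda_free;
    }
    return NULL;
}

/* Resolve the original function once and cache it, under the shared lock. */
static generic_fn_t get_orig(generic_fn_t *cache, const char *name)
{
    generic_fn_t fn;

    pthread_mutex_lock(&get_orig_lock);
    if (*cache == NULL) {
        *cache = lookup_symbol(name);
    }
    fn = *cache;
    pthread_mutex_unlock(&get_orig_lock);
    return fn;
}

static generic_fn_t orig_munmap    = NULL;
static generic_fn_t orig_cuda_free = NULL;

int call_orig_munmap(void *addr)
{
    return get_orig(&orig_munmap, "munmap")(addr);
}

int call_orig_cuda_free(void *addr)
{
    return get_orig(&orig_cuda_free, "cudaFree")(addr);
}
```

Since the lock only guards a one-time pointer lookup, sharing it between categories costs little, and it removes the `_category` parameter that forced separate macros.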



static ucm_reloc_patch_t ucm_cudamem_symbol_patches[] = {
{"cudaFree", ucm_override_cudaFree},
Contributor

why do we need an array? we have only one function to patch

Contributor Author

just to be able to add new funcs in the future if needed

Contributor

then we can add the array in the future when it would be needed

Contributor Author

removed

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2848/ for details.

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4857/ for details (Mellanox internal link).

@yosefe
Contributor

yosefe commented Oct 20, 2017

bot:mlx:retest

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4879/ for details (Mellanox internal link).

@bureddy
Contributor Author

bureddy commented Oct 20, 2017

bot:mlx:retest

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4880/ for details (Mellanox internal link).

@@ -334,6 +337,25 @@ void *ucm_sbrk(intptr_t increment)
return event.sbrk.result;
}

#if HAVE_CUDA
cudaError_t ucm_cudaFree(void *addr)
Contributor

ok

@@ -46,4 +47,74 @@ ucs_status_t ucm_reloc_modify(ucm_reloc_patch_t* patch);
void* ucm_reloc_get_orig(const char *symbol, void *replacement);


extern pthread_mutex_t ucm_reloc_get_orig_lock;
Contributor

need to create new files: replace.h, replace.c
"reloc" handles changing the program relocation tables;
these are macros to generate replacement functions

Contributor Author

moved headers to new file util/replace.h.
Also unified {cuda,mmap}/replace.c into util/replace.c. is this ok?

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2872/ for details.

@bureddy bureddy force-pushed the cuda-mem-hooks branch 2 times, most recently from 957e2e3 to 3061074 Compare October 23, 2017 21:50
@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2873/ for details.

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2874/ for details.

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4897/ for details (Mellanox internal link).

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4899/ for details (Mellanox internal link).

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4900/ for details (Mellanox internal link).

@yosefe
Contributor

yosefe commented Oct 26, 2017

bot:mlx:retest

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4935/ for details (Mellanox internal link).

@yosefe
Contributor

yosefe commented Oct 29, 2017

@bureddy code looks good, however need to add a unit test for this. see #1906 (comment)

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2929/ for details.

@swx-jenkins1

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2930/ for details.

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4974/ for details (Mellanox internal link).

@swx-jenkins1

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2931/ for details.

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4975/ for details (Mellanox internal link).

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2932/ for details.

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4976/ for details (Mellanox internal link).

@bureddy
Contributor Author

bureddy commented Oct 31, 2017

bot:mlx:retest

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4977/ for details (Mellanox internal link).

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2933/ for details.

@@ -518,6 +518,14 @@ run_coverity() {
# Run the test suite (gtest)
#
run_gtest() {

#load cuda modules if available
Contributor

indentation (use tabs)


#load cuda modules if available
if module_load dev/cuda; then
if module_load dev/gdrcopy; then
Contributor

what's the purpose of using 'if'? you can just do module_load dev/cuda || true

}

/* Install memory hooks */
result = ucm_set_event_handler(UCM_EVENT_VM_UNMAPPED,0, cuda_mem_event_callback,
Contributor

space after UCM_EVENT_VM_UNMAPPED,


ret = cudaFree(ptr);
EXPECT_EQ(ret, cudaSuccess);
EXPECT_EQ(ptr, free_ptr);
Contributor

need to reset free_ptr to NULL before every cudaMalloc

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4978/ for details (Mellanox internal link).

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2937/ for details.

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4982/ for details (Mellanox internal link).

@bureddy
Contributor Author

bureddy commented Oct 31, 2017

@yosefe updated pr with review fixes

EXPECT_EQ(ret, cudaSuccess);
EXPECT_EQ(ptr1, free_ptr);

ret = cudaFree(ptr);
Contributor

you need to set free_ptr = NULL here as well..

Contributor Author

Will add it. But I think it should be fine because free_ptr = ptr1 here, which is different from ptr.
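
The reviewer's concern can be illustrated with a small mock, sketched here under assumptions: `fake_free`, `on_unmap`, and `hook_enabled` are hypothetical stand-ins for `cudaFree`, the registered event callback, and the hook's installation state. If the hook silently stops firing and the freed address happens to equal a stale `free_ptr`, the check passes even though no event was delivered; resetting `free_ptr` to NULL first makes that failure visible.

```c
#include <stddef.h>

/* All names below are hypothetical stand-ins for the test in this PR. */

void *free_ptr    = NULL;   /* last address reported by the event callback */
int  hook_enabled = 1;      /* 0 simulates a hook that no longer fires */

/* Stand-in for the event callback registered by the test. */
static void on_unmap(void *addr) { free_ptr = addr; }

/* Stand-in for cudaFree(): reports the address only while the hook works. */
int fake_free(void *addr)
{
    if (hook_enabled) {
        on_unmap(addr);
    }
    return 0;
}

/* Check without resetting: a stale free_ptr can make this pass by accident. */
int check_without_reset(void *addr)
{
    fake_free(addr);
    return free_ptr == addr;
}

/* Check with reset: passes only if the callback ran for this very call. */
int check_with_reset(void *addr)
{
    free_ptr = NULL;
    fake_free(addr);
    return free_ptr == addr;
}
```

As the author notes, a stale value that differs from the pointer being freed would still make the check fail, so the reset matters mainly when addresses can coincide, for example when the allocator reuses a just-freed address.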

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4985/ for details (Mellanox internal link).

@swx-jenkins1

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2940/ for details.

@bureddy
Contributor Author

bureddy commented Oct 31, 2017

bot:bgate:retest

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2943/ for details.

@yosefe yosefe merged commit 173338c into openucx:master Nov 1, 2017
@bureddy bureddy deleted the cuda-mem-hooks branch November 3, 2017 20:31
Labels
Feature New feature

4 participants