Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update IPC code to remove Cython #148

Merged
merged 10 commits into from
Jan 9, 2019
Merged

Update IPC code to remove Cython #148

merged 10 commits into from
Jan 9, 2019

Conversation

randyzwitch
Copy link
Contributor

@randyzwitch randyzwitch commented Jan 7, 2019

Goal of this PR is to remove the Cython build process completely, which should improve compatibility across package install tools.

WIP as I'm still trying to figure out if we can release shared memory; original attempt to call shmdt during load_buffer() killed the kernel, suggesting that pyarrow is still using that memory at least up to _load_schema if not all the way through _load_data().

Consider passing ptr value out of load_buffer() function, then calling shmdt twice after data has been made into pandas dataframe (once for each original handle passed from Thrift)

See if this can be pared down far enough to not need shm at all
Code works locally, removing the need for Cython to do the IPC. After 
this commit, will go back and refactor
@andrewseidl
Copy link
Contributor

Haven't looked too deeply at the new code yet, but CPU-only select_ipc finally works for me with a simple pip install, no conda needed at all. Tested with 3.7 (after tweaking pyarrow to a supported version) and 2.7.

Thanks!

@randyzwitch
Copy link
Contributor Author

Awesome @andrewseidl! Should be ready for a full review tomorrow.

Work towards #46 #31. After bringing shared memory into Python context, 
call shmdt to detach memory after df built. If this call doesn't fix 
believed memory leak, then either its not a Python issue or not 
something we can fix
Before trying to do IPC, check if libc is found
This needs validation @andrewseidl (along with the CPU deallocation). 
The GDF examples work on my local machine, so feels like it's either 
correct or a no-op
@randyzwitch
Copy link
Contributor Author

When this passes CI, it's ready for review.

Looking for someone who understands shared memory better than I do to validate that shmdt conceptually is doing what I expect, which is to detach the shared memory from the Python process. My understanding is that shmat brings the shared memory into the Python namespace, and my expectation is that calling shmdt, and calling the deallocate_* methods from Thrift will release the memory from Python and release the reference to it from OmniSci.

I call both of these memory-freeing methods after a pandas dataframe is created, so that the pandas df result will then be managed by the Python garbage collector, and no other IPC memory references should exist.

Using the Python debugger, shmdt returns the proper code from C for the given pointer, so the code is technically correct, I just need conceptual validation.

@randyzwitch randyzwitch changed the title [WIP] Update IPC code to remove Cython Update IPC code to remove Cython Jan 8, 2019
@randyzwitch randyzwitch mentioned this pull request Jan 8, 2019
@randyzwitch randyzwitch merged commit 983383b into master Jan 9, 2019
@randyzwitch randyzwitch deleted the IPCupdate branch January 9, 2019 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants