New PIR implementations #57

Merged: ryscheng merged 87 commits into master on May 14, 2017

Conversation

@ryscheng (Member) commented May 5, 2017

Added:

  • New Shard interface in libpir
  • ShardCPU and ShardCL implementations of this Shard interface
  • 3 different OpenCL kernel implementations for ShardCL
  • Correctness tests for all of them

I have yet to benchmark all 3 OpenCL kernels, but it's a pretty big PR, so it'd be nice to get a review earlier rather than later.

NOTE: the ShardCUDA implementation is tracked in the ryscheng-cuda branch.
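For context, a minimal sketch of the shape the new interface takes; the method names and signatures below are illustrative assumptions for discussion, not the actual libpir API:

```go
// Package pir is used here only to make the sketch self-contained.
package pir

// Shard is a rough sketch of the abstraction under review: one static
// partition of the database that can answer PIR read requests.
// GetShardSize and Read are assumed names, not the real signatures.
type Shard interface {
	// GetShardSize reports how many bytes this shard holds.
	GetShardSize() int
	// Read answers a batch of PIR requests. reqs packs one bit-vector
	// per request; reqLength is the byte length of each vector.
	Read(reqs []byte, reqLength int) ([]byte, error)
}
```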

@ryscheng (Member Author) commented May 8, 2017

Added the CUDA implementation

@willscott (Member) commented:

pir/context_cl.go:44 - should this be a separate utility function rather than comment?
pir/context_cl.go:66 - can we flag which device to use in cases where there are multiple?
pir/context_cl.go:178 - why private method banner with no methods? get rid of it. more generally this banner should be implicit based on method names with capital / lowercase initial letters.
pir/cuda_modules/Makefile:4 - what is compute_35? why that? will it ever change?
pir/cuda_modules/pir.ptx:1 - this is generated code. shouldn't it be in gitignore?
pir/kernel_cl.go:21 - no. you can have a template file that you process, but having c code as string within go is not a reasonable way to structure this. put the kernels in c files.
pir/shard.go:12 - GetName is not really part of the interface. Implementations can override 'String()' to get a pretty-printing format, but this isn't a functional part of what we're asking a shard to be.
pir/shard_cl.go:186 - this seems like it's going to fully serialize kernel execution, which will leave the pipeline completely empty at the point Finish returns, meaning there are periods where the GPU has no work to do. ideally the next compute call has been enqueued before the previous one finishes (see the sketch after this list).
pir/shard_cl.go:198 - remove comment
pir/shard_test.go:111 - cpu tests should be in a separate file from the common helper functions
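A hedged sketch of the overlap asked for in the pir/shard_cl.go:186 comment. The event, enqueueKernel, and waitFor names are hypothetical stand-ins for the real OpenCL event/enqueue/wait calls, not the actual ShardCL code:

```go
package pircl

// event, enqueueKernel, and waitFor are hypothetical stand-ins for
// the OpenCL event, kernel-enqueue, and wait-for-events primitives.
type event chan struct{}

func enqueueKernel(batch []byte) event {
	done := make(event)
	go func() { _ = batch; close(done) }() // pretend the GPU runs here
	return done
}

func waitFor(e event) { <-e }

// readAll enqueues the next kernel before blocking on the previous
// one, so the device queue is never fully drained between batches.
func readAll(batches [][]byte) {
	var prev event
	for _, batch := range batches {
		cur := enqueueKernel(batch) // keep work queued behind prev
		if prev != nil {
			waitFor(prev) // block only on the batch before this one
		}
		prev = cur
	}
	if prev != nil {
		waitFor(prev) // drain the final in-flight kernel
	}
}
```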

High level comments:

  • The primary difference between the PIR interface you're picking for Shard and what's in the previous libPIR is that the previous libPIR would allocate the database memory for the writer to write into (getDB, setDB). Are we okay relaxing that restriction?

  • the different systems are a bit intertwined. can we factor these into pir/cpu, pir/cuda, and pir/cl, with the common interface left in pir/?

@ryscheng (Member Author) commented May 9, 2017

Okay, I've started working through your comments. Just for posterity's sake, I ran the benchmarks with OpenCL targeting the CPU instead, just to see how it performs.
High level: the best ShardCPU variant (v1, ~8.2 s/op) outperforms the best ShardCL kernel (v0, ~22.1 s/op) when everything runs on the CPU:
BenchmarkShardCLReadv0-8 1 22108349995 ns/op
BenchmarkShardCLReadv1-8 1 49419166788 ns/op
BenchmarkShardCLReadv2-8 1 54085086233 ns/op
BenchmarkShardCLReadv3-8 1 27393025933 ns/op
BenchmarkShardCPUReadv0-8 1 61838823013 ns/op
BenchmarkShardCPUReadv1-8 1 8249138781 ns/op

@ryscheng (Member Author) commented May 10, 2017

Re: pir/cuda_modules
Specifying the architecture determines which subset of instructions you have access to. compute_35 corresponds to compute capability 3.5, which is when they added atomic instructions. We don't need any of the newer instructions yet, so I think we are okay here as long as your GPU has compute capability 3.5 or newer. See here for a more comprehensive list:
https://developer.nvidia.com/cuda-gpus

Re: committing the ptx file, I figured it was useful because:

  1. it's just intermediate representation, so it's architecture-independent and diffs properly
  2. it's pretty short
  3. it means you don't need nvcc installed to run the repo after cloning it

Re: Shard interface
Personally, I think having Shard represent an immutable static partition of the database is a better abstraction, and easier to test/debug, than a server that flips between 2 working copies of memory. If the concern is that you want to recycle existing memory allocations, we can still do that performance optimization (if it's necessary), but the Shard interface can stay agnostic to it.
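To make the comparison concrete, a rough sketch of the two styles; the names are illustrative, not the actual libPIR or talek APIs:

```go
package pir

// MutablePIR sketches the previous style: the server allocates the
// database buffer and hands it to the writer to mutate in place.
type MutablePIR interface {
	GetDB() []byte
	SetDB(db []byte)
}

// staticShard sketches the proposed style: constructed once over an
// immutable partition. Recycling allocations can still happen behind
// NewShard without the interface knowing about it.
type staticShard struct {
	data []byte
}

// NewShard wraps a static partition of the database.
func NewShard(data []byte) *staticShard {
	return &staticShard{data: data}
}
```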

Just 3 remaining changes to go:

  • pir/context_cl.go:44 - should this be a separate utility function rather than comment?
  • pir/context_cl.go:66 - can we flag which device to use in cases where there are multiple?
  • pir/kernel_cl.go:21 - no. you can have a template file that you process, but having c code as string within go is not a reasonable way to structure this. put the kernels in c files.

@ryscheng (Member Author) commented:

The plan was to split kernel_cl.go into separate C files and read them in dynamically, but the problem is that the cgo interface we are using for OpenCL requires a pointer to the source code, and I spent all morning trying to generate a data structure that contains no Go pointers. See here for why:
golang/go#12416
Otherwise you get the following error:
"panic: runtime error: cgo argument has Go pointer to Go pointer [recovered]"
as seen in this commit: e853b1b

I'm officially giving up. I know kernel_cl.go is ugly, but from what I can tell, every consumer of this 'cl' package does the same thing.
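For anyone hitting this later, here is the constraint (golang/go#12416) in miniature. kernelSource and sourceForCL are made-up names for illustration, not the actual kernel_cl.go code:

```go
package pircl

/*
#include <stdlib.h>
*/
import "C"
import "unsafe"

// kernelSource stands in for the OpenCL C text kept inside the Go file.
var kernelSource = "__kernel void pir() {}"

// sourceForCL builds a C char** that contains no Go pointers, which is
// what program-from-source style CL calls need. Passing a pointer into
// a Go slice of Go strings instead trips the cgo checker with
// "cgo argument has Go pointer to Go pointer", because the slice's
// backing array contains Go string headers.
func sourceForCL() (**C.char, func()) {
	// C.CString copies the Go string into C-allocated memory.
	cstr := C.CString(kernelSource)
	// The array holding that pointer must also live in C memory.
	arr := (**C.char)(C.malloc(C.size_t(unsafe.Sizeof(uintptr(0)))))
	*arr = cstr
	// The caller frees both allocations once the program is built.
	return arr, func() {
		C.free(unsafe.Pointer(cstr))
		C.free(unsafe.Pointer(arr))
	}
}
```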

Also, with respect to specifying devices: in the future, my plan is to enumerate all GPUs and create one context across all of them. See this issue:
#58

In summary, could you take a look at the changes since the last review?

@@ -0,0 +1 @@
package pircl
@willscott (Member) commented on the diff:

stub?

@ryscheng (Member Author) replied:
With our build flags, the OpenCL and CUDA implementations are excluded from build/test on Travis. However, the test step and goveralls still look for a test report from these folders (which otherwise appear empty). See
https://travis-ci.org/privacylab/talek/jobs/230595872

I added doc.go and doc_test.go to keep the tests happy.
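For reference, doc.go is just the one-line package clause shown in the diff above, and doc_test.go is a similar no-op. A sketch of what such a stub test might look like (the test name is an assumption):

```go
package pircl

import "testing"

// TestDoc is a no-op that gives the coverage tooling a test report for
// this package even when the real OpenCL tests are excluded by build
// flags.
func TestDoc(t *testing.T) {}
```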

@willscott (Member) commented:

I think the doc files are there to make godocs show up? Are the tests needed as well?

I guess the only other comment is that it would be great to put in the pretty simple shim between this shard interface and the PIR interface currently used by server code. Then the end-to-end consistency test can verify correctness against these backends rather than doing that at the same time as changing server logic.
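Something like the following adapter is presumably what's meant. PIRServer and its Read signature are guesses at the server-side interface, not the actual talek API, and Shard repeats the earlier sketch so the block stands alone:

```go
package pir

// PIRServer is a guess at the interface the server code consumes.
type PIRServer interface {
	Read(req []byte) ([]byte, error)
}

// Shard matches the interface sketch near the top of this thread;
// repeated here only so this block compiles on its own.
type Shard interface {
	GetShardSize() int
	Read(reqs []byte, reqLength int) ([]byte, error)
}

// shardAdapter exposes a Shard through the server-facing interface, so
// the end-to-end consistency test can exercise any backend.
type shardAdapter struct {
	shard Shard
}

func (a *shardAdapter) Read(req []byte) ([]byte, error) {
	// A single-request batch: reqLength is the request's own length.
	return a.shard.Read(req, len(req))
}
```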

@ryscheng (Member Author) commented:

Resolved the conflict with master

@willscott (Member) commented:

When I try to use pircpu, its dependencies don't seem to have ever been installed:
https://travis-ci.org/privacylab/talek/jobs/231947473

(If I run go get in that directory, I can compile locally.)

That seems like it would be a problem more generally.

@ryscheng merged commit 07c8e32 into master on May 14, 2017