
OpenCL counterpart of cuDNN #34

Open · dagamayank opened this issue May 25, 2016 · 53 comments

@dagamayank commented May 25, 2016

I came across your post on the TensorFlow thread saying that you are developing an OpenCL counterpart to cuDNN. I would like to help/contribute to that project. Let me know where and how I can help. I have extensive OpenCL programming experience and am currently focused on ML activities at AMD.

@naibaf7 (Owner) commented May 25, 2016

@dagamayank
Thank you, help is very welcome, especially from AMD :)
To start, you can have a look at how the kernels are generated and the public interface of the cuDNN replacement:
https://github.com/naibaf7/caffe/blob/master/src/caffe/greentea/libdnn.cpp
https://github.com/naibaf7/caffe/blob/master/include/caffe/greentea/libdnn.hpp

I can also provide you with example kernel strings if you don't want to look at that part of the code and are only interested in helping optimize the kernels for AMD GPUs, which would also be very welcome.

@bhack commented May 25, 2016

@naibaf7 Have you seen the latest updates on the TensorFlow thread?

@naibaf7 (Owner) commented May 25, 2016

@bhack
Yes, why? :)

@dagamayank (Author)

@naibaf7
Kernel strings would be great to have. It would also be great if you could provide some steps on how to get started.

@naibaf7 (Owner) commented May 26, 2016

@dagamayank
Ok, the easiest way to get started is to compile Caffe with USE_LIBDNN enabled in Makefile.config (https://github.com/naibaf7/caffe/blob/master/Makefile.config.example#L15).
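In Makefile.config this should just mean uncommenting/setting the flag (shown here following Caffe's usual Makefile.config syntax; check the linked example file for the exact line):

USE_LIBDNN := 1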
Then, if you want a kernel string to examine for optimization purposes, uncomment the std::cout line at the end of this block:

  ss << generate_bw_defs();
  ss << generate_bw_kernels("conv_backward");
  ss << generate_wg_defs();
  ss << generate_wg_kernels("conv_weights");

  // Write complete kernel string
  kernel_ = ss.str();

  // std::cout << kernel_ << std::endl;
}

(it's line https://github.com/naibaf7/caffe/blob/master/src/caffe/greentea/libdnn.cpp#L1588)

This will print the kernel string to std::cout so you can examine it, for example in AMD's CodeXL. Every kernel string consists of 3 main kernels: conv_forward, conv_backward and conv_weights. For conv_backward and conv_weights, there are 2 different algorithms each that can be selected:

typedef enum {
  // Stack the batch update into one GEMM block
  // (deterministic, 1 kernel call)
  // Serializes the batch and may therefore underuse
  // the GPU's compute units.
  LIBDNN_CONVOLUTION_WG_ALGO_DIRECT        = 0,
  // Use multiple GEMM blocks in parallel and update weights atomically
  // (non-deterministic, 1 kernel call, not supported on all devices)
  // Parallelizes the batch and therefore has higher GPU utilization.
  LIBDNN_CONVOLUTION_WG_ALGO_ATOMIC        = 1,
  // Use multiple GEMM blocks and an intermediate buffer
  // to reduce weight updates
  // (deterministic, >= 2 kernel calls)
  // Parallelizes the batch and therefore has higher GPU utilization.
  // NOT IMPLEMENTED YET
  LIBDNN_CONVOLUTION_WG_ALGO_REDUCTION     = 2
} libdnnConvolutionWeightAlgo_t;

typedef enum {
  // Transform data before GEMM (load, im2col, gemm, store)
  // This method is suitable for convolutions with similar
  // spatial input == output sizes, but can become inefficient
  // if input >> output (with large strides and kernels).
  LIBDNN_CONVOLUTION_BW_ALGO_IM2COL        = 0,
  // Transform data after GEMM (load, gemm, col2im, store)
  // Sometimes faster than im2col method, but uses
  // atomic operations and is not deterministic.
  LIBDNN_CONVOLUTION_BW_ALGO_COL2IM_ATOMIC = 1
} libdnnConvolutionBackwardAlgo_t;

Which algorithm is used can be changed here:
https://github.com/naibaf7/caffe/blob/master/src/caffe/layers/libdnn_conv_layer.cpp#L63
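For illustration, selecting the atomic variants could look roughly like the sketch below. This is only a guess at the configuration code: the LibDNNConvConfig struct and its wgalgo/bwalgo field names are assumptions, so check the linked libdnn_conv_layer.cpp for the real code; only the enum values are taken from above.

  // Sketch only: struct and field names are assumed, not verified.
  LibDNNConvConfig config;
  config.wgalgo = LIBDNN_CONVOLUTION_WG_ALGO_ATOMIC;        // weight gradient algorithm
  config.bwalgo = LIBDNN_CONVOLUTION_BW_ALGO_COL2IM_ATOMIC; // backward (data) algorithm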

Finally, you need to run a network in order to instantiate the layers and get some kernel strings. The recommended starting point for that is using the following command:

./build/tools/caffe time -model models/bvlc_alexnet/benchmark64.prototxt -gpu=0 -iterations=5

Together with the instructions above, you can dump the kernel strings to a text file and look for optimization opportunities there. Note that every convolution layer gets its own set of kernels, so the above command will produce many different ones.
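For example, assuming the std::cout line above has been uncommented, the kernel strings can be captured with plain output redirection (the file name is just an example):

./build/tools/caffe time -model models/bvlc_alexnet/benchmark64.prototxt -gpu=0 -iterations=5 > kernel_dump.txt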

@dagamayank (Author)

@naibaf7
Thanks a lot for these instructions. I will give them a try and report back.

@dagamayank (Author)

I get test failures when running "make runtest" on the code in the master branch of your repo. Is this expected? Two of the failures are from LibDNN. My development environment is an AMD W9100 on Ubuntu 14.04.

[----------] Global test environment tear-down
[==========] 2028 tests from 274 test cases ran. (3614992 ms total)
[ PASSED ] 2013 tests.
[ FAILED ] 15 tests, listed below:
[ FAILED ] NetTest/0.TestSharedWeightsUpdate, where TypeParam = caffe::CPUDevice
[ FAILED ] LibDNNComparativeTest/0.TestBackward, where TypeParam = float
[ FAILED ] LibDNNComparativeTest/1.TestBackward, where TypeParam = double
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestSimpleConvolution_Spatial3x3, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestSimpleConvolution_Spatial11x11x1x2_caffenet_Conv1, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestSimpleConvolution_Spatial3x3x1_caffenet_Conv4, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestGradient_Spatial, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestSimpleConvolution_Spatial3x3x1_caffenet_Conv3, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestSimpleConvolution_Spatial, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestSimpleConvolution_Spatial3x3x2_caffenet_Conv5, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestSimpleConvolution_Spatial5x5x1x2_caffenet_Conv2, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.Test1x1Convolution_Spatial, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.Test1x1Gradient_Spatial, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestSimpleConvolution_Spatial3x3xPad1, where TypeParam = caffe::GPUDevice
[ FAILED ] ConvolutionLayerTest_Spatial/1.TestSimpleConvolution_Spatial5x5, where TypeParam = caffe::GPUDevice

@naibaf7 (Owner) commented May 31, 2016

@dagamayank
TestSharedWeightsUpdate seems to fail by being off by a small margin. This is weird but can be ignored and is not relevant for this implementation.

The _Spatial failures are from Intel's convolution implementation. I think the fix here is to use the latest ViennaCL development branch: https://github.com/viennacl/viennacl-dev instead of what Ubuntu supplies.

As for LibDNN, this test should definitely not fail. Here it would be helpful to get the failure message from the runtest itself (i.e. where the LibDNN test aborted). You can run it in isolation with:
./build/test/test_all.testbin --gtest_filter=*LibDNN*Comparative*Backward* 0

@dagamayank (Author) commented Jun 1, 2016

@naibaf7
Well, I do not fully understand the output; there are a bunch of lines with values, but the last few lines are:
Error count: 134841/159600
Difference: 3.17333e+06 (value: 2.30564e+06 vs 2.2954e+06)
src/caffe/test/test_libdnn_conv.cpp:1064: Failure
Value of: false
Expected: failure
Which is: true
[ FAILED ] LibDNNComparativeTest/1.TestBackward, where TypeParam = double (11638 ms)
[----------] 1 test from LibDNNComparativeTest/1 (11638 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 2 test cases ran. (37154 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 2 tests, listed below:
[ FAILED ] LibDNNComparativeTest/0.TestBackward, where TypeParam = float
[ FAILED ] LibDNNComparativeTest/1.TestBackward, where TypeParam = double

@naibaf7 (Owner) commented Jun 1, 2016

@dagamayank
I just verified on my W9100 that the backward pass is fine. What driver are you using? I'm using 15.302 (Crimson Edition 15.12 Linux 64 bit).
I had problems with the old FirePro driver, so I switched to the Radeon driver.

Do you have any other OpenCL device to check if the backward pass passes the test?

@dagamayank (Author)

@naibaf7
Yes, it is probably the old FirePro driver. If it works on your end with the newer driver, I think we can call it a non-issue for now.

I am going through the kernels right now. Can you explain the reasoning behind the seemingly random values in the #defines? It will take some time for me to understand what you are doing there.

@naibaf7 (Owner) commented Jun 1, 2016

@dagamayank
The defines declare constants for the kernel, such as padding (v_p), stride (v_s), dilation (v_d) and image sizes (v_imsi, v_imso) in each dimension. Other defines configure the GEMM core (such as TSK, TSM, TSN, WPTM, WPTN, ...).

I put these values into defines rather than directly into the kernel string for better readability of the kernel itself (i.e. it is easier to see where a constant is used and why).
As for documentation, all the values are explained in:
https://github.com/naibaf7/caffe/blob/master/src/caffe/greentea/libdnn.cpp
(look for add_def, which is the C++ method I use for declaring new kernel #defines).
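To give a feel for what this looks like, the top of a generated kernel string is a block of such defines. The following is purely illustrative (made-up values in the naming scheme described above, not an actual LibDNN dump):

// Illustrative only: invented values, not an actual LibDNN dump.
#define v_p_0 1      // padding in dimension 0
#define v_s_0 1      // stride in dimension 0
#define v_d_0 1      // dilation in dimension 0
#define v_imsi_0 27  // input feature map size in dimension 0
#define v_imso_0 27  // output feature map size in dimension 0
#define TSK 8        // GEMM tile size in the K dimension
#define TSM 64       // GEMM tile size in the M dimension
#define TSN 64       // GEMM tile size in the N dimension
#define WPTM 4       // work (outputs) per thread in M
#define WPTN 4       // work (outputs) per thread in N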

@dagamayank (Author)

@naibaf7

Are you using autotuning to generate the values of those constants? In other words, will the constants be the same for different kernels and for different networks?

@naibaf7 (Owner) commented Jun 2, 2016

@dagamayank
Some of the values can be autotuned (such as WPTM, WPTN), others are defined by the convolution settings (such as v_p, v_s, v_d). However, the autotuner can't store the tuning results yet, so that part is experimental.
That means values such as WPTM and WPTN will be the same for every kernel/network at the moment, while v_p, v_s, v_d depend on what kind of convolution you choose (3x3 unpadded, 11x11 with stride, etc.). The image input/output sizes (v_imsi, v_imso) obviously depend on how big the image/feature maps in the network are.

I hope that helps.

@naibaf7 (Owner) commented Jun 3, 2016

@dagamayank
Have you made any progress on this or is something too complicated?

@dagamayank (Author)

@naibaf7
I did not get a chance to work on it yet. I am fighting some internal fires right now, but I will get to it soon. Auto-generated kernels are not the simplest ones to understand :)

@naibaf7 (Owner) commented Jun 3, 2016

@dagamayank
I understand. I will work on the project this weekend and hopefully have some improvements by Monday.
One interesting thing I found is that I'm better off targeting TLP instead of ILP on the AMD W9100, i.e. taking care not to use too many VGPRs on the AMD card (to get >= 4 waves in flight). On the nVidia card (GTX 980) it was better to push for high ILP (use more #pragma unroll) and relax on occupancy/TLP.
I would be interested in your opinion on this, and whether I am right with these assumptions...

Using vectors of size 4 and 16x16 thread blocks (64x64xTSK shared memory tiling) seems to work best on both cards so far though.
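For concreteness, the shape being described (a 16x16 work-group, float4 vectorization, tiles staged in local memory) looks roughly like the illustrative OpenCL fragment below. It only shows the tile staging, not LibDNN's actual GEMM code, and all names and sizes are made up for the example:

// Illustrative only: a 16x16 work-group staging float4 tiles in local memory.
__kernel __attribute__((reqd_work_group_size(16, 16, 1)))
void tile_stage_example(__global const float4* src,
                        __global float4* dst,
                        const int width4) {        // row pitch in float4 units
  __local float4 tile[16][16];                     // one float4 per work-item
  const int lx = get_local_id(0);
  const int ly = get_local_id(1);
  const int gx = get_global_id(0);
  const int gy = get_global_id(1);
  tile[ly][lx] = src[gy * width4 + gx];            // vectorized (float4) load
  barrier(CLK_LOCAL_MEM_FENCE);                    // tile now visible to the whole group
  // ... the register-level GEMM work on the tile would go here ...
  dst[gy * width4 + gx] = tile[ly][lx];
}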

@dagamayank (Author)

@naibaf7
In my experience, using fewer registers is generally the better choice on AMD GPUs. It allows better occupancy and also lets the compiler generate better code.

One question I had: do I have to run the entire AlexNet, or can I just run the first convolution layer using cifar10? What kind of performance are you seeing right now?

@naibaf7 (Owner) commented Jun 3, 2016

@dagamayank
You can remove the layers after the 1st convolution in the prototxt file, or start with any other convolution as long as you have the input data defined & connected correctly.
However, the first convolution is usually not the most interesting, as it has only a few input feature maps.
Performance-wise, on the AlexNet forward pass I see these numbers (batch size 64):
(These are all untuned in default configuration, so there should be plenty of headroom)

  • GTX 980 cuDNN forward: 34ms
  • GTX 980 libDNN forward (CUDA): 70ms
  • GTX 980 libDNN forward (OpenCL): 90ms
  • W9100 libDNN forward (OpenCL): 100ms (although here you may see 130ms on the code that you have, I improved the memory access pattern since then. I get this performance at 5 waves in flight.).
  • GTX 980 cuBLAS forward: 110ms
  • GTX 980 clBLAS forward: 184ms
  • W9100 clBLAS forward: 275ms

The clBLAS forward performance in particular is extremely poor, which was my main motivation to create LibDNN. At this stage, LibDNN beats the cuBLAS-based implementations. The goal is to get within 70-80% of cuDNN.

@naibaf7 (Owner) commented Jun 25, 2016

@dagamayank
LibDNN is now available as a standalone library:
https://github.com/naibaf7/libdnn

@zazd commented Jul 2, 2016

@naibaf7 I am very interested in LibDNN; it looks quite capable. Since I am not familiar with OpenCL, I have only glanced over LibDNN, and it seems it also uses matrix multiplication. If possible, could you tell me whether it works on the same principle as cuDNN, or point me to references such as a paper or documentation? Thank you.

@naibaf7 (Owner) commented Jul 2, 2016

@zazd Yes, it uses a local-memory and register-level GEMM.
It is similar to cuDNN; you can read up more here: https://arxiv.org/pdf/1410.0759.pdf
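For readers unfamiliar with the approach, the basic idea behind GEMM-based convolution (as in the cuDNN paper linked above) is to unfold input patches into a matrix (im2col) and multiply it by the reshaped filter matrix. A minimal single-channel CPU sketch of that idea, purely illustrative and not LibDNN code:

#include <vector>

// Illustrative only: naive single-channel im2col + GEMM convolution
// (stride 1, no padding) to show the principle; not LibDNN's kernels.
std::vector<float> conv_im2col_gemm(const std::vector<float>& image, int H, int W,
                                    const std::vector<float>& filter, int K) {
  const int OH = H - K + 1;
  const int OW = W - K + 1;
  // im2col: every output position becomes a column of K*K input values.
  std::vector<float> cols(static_cast<size_t>(K) * K * OH * OW);
  for (int oy = 0; oy < OH; ++oy)
    for (int ox = 0; ox < OW; ++ox)
      for (int ky = 0; ky < K; ++ky)
        for (int kx = 0; kx < K; ++kx)
          cols[(static_cast<size_t>(ky) * K + kx) * OH * OW + oy * OW + ox] =
              image[static_cast<size_t>(oy + ky) * W + (ox + kx)];
  // GEMM: the (1 x K*K) filter row times the (K*K x OH*OW) column matrix.
  std::vector<float> out(static_cast<size_t>(OH) * OW, 0.0f);
  for (int k = 0; k < K * K; ++k)
    for (int o = 0; o < OH * OW; ++o)
      out[o] += filter[k] * cols[static_cast<size_t>(k) * OH * OW + o];
  return out;
}

The GPU kernels do the equivalent transformation implicitly inside the kernel (the "load, im2col, gemm, store" pipeline mentioned earlier), tiled through local memory and registers, rather than materializing the full column matrix like this sketch does.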

@naibaf7 (Owner) commented Oct 19, 2016

@bhack @gstoner
Good news for the RX 480: the performance issues and thermal envelope crashes have been completely fixed since the Linux 4.8 AMDGPU drivers.
It is now possible to use the RX 480 for deep learning without limitations on any Linux :)

With LibDNN on both the GTX 1080 and the RX 480, the RX 480 performs exactly half as fast as the GTX 1080, just as expected.

@bhack commented Oct 19, 2016

Do you have v2 kernels?

@naibaf7 (Owner) commented Oct 19, 2016

@bhack
I have not ported them to the external library yet...
I'm quite busy with a new project at the moment regarding sparse RNNs. :)
Let me know if you need something, though. This was just a heads-up because the RX 480 did not work well at all for the past 3 months.

@bhack commented Oct 19, 2016

@naibaf7 It is hard to talk about this topic... We are actually the only ones that use libdnn as an upstream :wink:. It would be nice if Caffe could use libdnn as an upstream naturally, instead of keeping libdnn downstream. /cc @edgarriba

@naibaf7 (Owner) commented Oct 19, 2016

@bhack
Yeah, last week Codeplay's CEO contacted me regarding some OpenCL TensorFlow work. If he expresses interest as well, I will definitely re-focus on the standalone libdnn. But I haven't heard back (yet).

@bhack commented Oct 19, 2016

I think @hughperkins could also be interested in the standalone upstream.

@dagamayank (Author)

@naibaf7 do you have Winograd kernels in libDNN?

@naibaf7 (Owner) commented Oct 19, 2016

@dagamayank No, not yet...

@bhack commented Oct 19, 2016

It could be interesting if @dicecco1 would contribute upstream to the standalone libdnn.

@dicecco1

I'd be interested in being involved in this, though the way OpenCL is used with FPGAs has some differences/conflicts with the way Greentea is currently set up.

Currently, kernel compile times are on the order of hours for FPGA implementations, so they use offline compilation and program the FPGA with the binary (this still takes on the order of 300-400 ms), which means there has to be little or no reprogramming between kernels.

@bhack commented Oct 19, 2016

So it is practically impossible to have an autotuning approach like libdnn's, right?

@edgarriba commented Oct 19, 2016

Apart from that, I think it's quite straightforward to provide a couple of interfaces for offline building and for importing built binaries. Is that right, @naibaf7?

@dicecco1

Yeah, essentially for FPGA implementations you need to decide more on an architecture (since on FPGAs you're configuring circuits rather than processing instructions), and it is usually best to have something that is either general (e.g. can handle different sizes/strides) or very specific to a model (e.g. tuned to be very high performance for AlexNet). Autotuning for different layers would fit more into the model-specific approach to FPGA implementations, but this would still be offline.
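As background for the offline-compilation workflow described above, standard OpenCL already allows exporting a built program and re-importing it later via clGetProgramInfo and clCreateProgramWithBinary. A minimal sketch for a single device (ctx, dev and an already built prog are assumed to exist; error handling omitted):

#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

// Save the binary of an already built program (e.g. an FPGA bitstream container).
void save_binary(cl_program prog, const char* path) {
  size_t size = 0;
  clGetProgramInfo(prog, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, NULL);
  unsigned char* bin = (unsigned char*)malloc(size);
  unsigned char* bins[1] = { bin };
  clGetProgramInfo(prog, CL_PROGRAM_BINARIES, sizeof(bins), bins, NULL);
  FILE* f = fopen(path, "wb");
  fwrite(bin, 1, size, f);
  fclose(f);
  free(bin);
}

// Re-create a program from a previously saved binary instead of source.
cl_program load_binary(cl_context ctx, cl_device_id dev, const char* path) {
  FILE* f = fopen(path, "rb");
  fseek(f, 0, SEEK_END);
  size_t size = (size_t)ftell(f);
  fseek(f, 0, SEEK_SET);
  unsigned char* bin = (unsigned char*)malloc(size);
  size_t nread = fread(bin, 1, size, f);
  (void)nread;
  fclose(f);
  const unsigned char* bins[1] = { bin };
  cl_int status, err;
  cl_program prog = clCreateProgramWithBinary(ctx, 1, &dev, &size, bins,
                                              &status, &err);
  clBuildProgram(prog, 1, &dev, "", NULL, NULL);  // finalize for the device
  free(bin);
  return prog;
}

An autotuner could still fit into this setting by generating and building candidate kernels offline and caching their binaries, which matches the caching and surrogate-platform tuning idea discussed later in the thread.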

@bhack commented Oct 19, 2016

@dicecco1 I have not checked your paper in detail, but could your Winograd kernel also be ported to GPU/CPU, or would it need to be heavily re-engineered?

@dicecco1

The Winograd kernel would need to be heavily re-engineered for CPU/GPU implementations.

@bhack commented Oct 19, 2016

I don't know if @keryell is also interested in @dicecco1's kernels.

@bhack commented Oct 19, 2016

For everyone in the thread: I'm talking about https://github.com/dicecco1/fpga_caffe

@naibaf7 (Owner) commented Oct 19, 2016

There certainly are ways to either cache the kernels or tune them on a surrogate platform.
The key here would be to know the FPGA's details and make educated guesses about the performance instead of tuning directly on the FPGA.

@naibaf7 (Owner) commented Oct 19, 2016

@bhack @dicecco1
The issue of having to massively re-engineer Winograd kernels to fit new platforms has been noticed by the developers of NEON/Nervanasys as well as by @hughperkins. There are good reasons Nervanasys has built specific compilers for Maxwell/Pascal.
The architectural differences are even bigger when going to AMD: VGPR usage has to be kept in check, and the constant buffers/local memory have to be optimized differently. Local memory is bigger on Maxwell/Pascal than on Polaris/Hawaii, and the cache system works completely differently (AMD has 64 KB constant buffers, nVidia uses a read-through/write-through configurable caching system).

@bhack commented Oct 24, 2016

@naibaf7 Can you let us know if you get feedback from others interested in v3 kernels and in using the standalone libdnn as an upstream?

@naibaf7 (Owner) commented Oct 24, 2016

@bhack
Yes. Still waiting on feedback here :)

@hughperkins

Observation: I'm still waiting on an example of calling libdnn from C++ :-)

@bhack commented Oct 24, 2016

You can see an example, with the tuning part commented out, at https://github.com/tiny-dnn/tiny-dnn/blob/master/tiny_dnn/core/kernels/conv2d_op_libdnn.h

@bhack commented Oct 29, 2016

@naibaf7 OK, please give us an update when you can, because the standalone version is quite on hold.

@naibaf7 (Owner) commented Oct 30, 2016

@bhack
Yes, unfortunately, since I'm working hard on my semester project (sparse repeated-pattern recurrent neural networks); my university does not give me credit for my work on Caffe :)
The current timeline is as follows:

  • Non-atomic backward kernel for pooling by the beginning of December.
  • Updated standalone LibDNN by the end of December, which will include V2 (convolution kernels) and V3+V4 (pooling kernels).

@naibaf7 (Owner) commented Dec 5, 2016

Status update: the non-atomic backward kernels for pooling are finished, and the library is unit tested & verified with style-transfer and MNIST examples.
Next step: standalone LibDNN update by the end of December (at the latest).

@bhack commented Dec 5, 2016

Latest? Is it the end of the project?

@naibaf7 (Owner) commented Dec 5, 2016

No, that is just the latest point in time by which I expect to be done with this step :)
It could be anywhere from 2 to 4 weeks until LibDNN is on the newest kernel versions and gets pooling support.

@bhack commented Dec 5, 2016

Ok ;)

@naibaf7 (Owner) commented Dec 5, 2016

After that I don't know what the next optimization is going to be. Either mobile ARM chips with integrated GPUs or AMD's Vega and FP16, depending on what I can get my hands on first.
