
Add CUDA Kernel for TreeSearch Ngram Blocking #4633

Merged: 39 commits merged into main from pearlli-ngram-blocking-kernel on Jul 5, 2022

Conversation

@pearlli98 (Contributor) commented Jun 27, 2022

Major changes

  1. Add a CUDA kernel for running ngram blocking (both self-blocking and context-blocking). It can be activated with the flag --gpu-beam-blocking True; the functionality is wrapped in the class NGramRepeatBlockFunction (a reference sketch of the blocking logic follows this list).
  2. All subclasses of TreeSearch have a new boolean attribute gpu_beam_blocking that is used in _block_ngrams().
  3. Use tensors instead of lists for the attributes self.partial_hyps and self.context of TorchGeneratorAgent.
  4. Add unit tests for GPU ngram blocking.
  5. Add ninja and protobuf to the dependency list to enable building CUDA extensions.
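
For reference, below is a minimal pure-PyTorch sketch of the blocking rule the CUDA kernel parallelizes. The names (block_ngrams_reference, hyps) are illustrative only, not the actual ParlAI implementation:

import torch

def block_ngrams_reference(logprobs, hyps, no_repeat_ngram_size, context=None):
    """
    logprobs: (beam_size, vocab_size) scores for the next token.
    hyps:     (beam_size, step) tokens generated so far.
    context:  optional 1-D tensor of context tokens; if given, block against it
              (context-blocking), otherwise against each hypothesis (self-blocking).
    """
    beam_size, step = hyps.shape
    n = no_repeat_ngram_size
    if step < n - 1:
        return logprobs
    for beam_id in range(beam_size):
        # the last n-1 generated tokens form the prefix of a would-be repeated ngram
        prefix = hyps[beam_id, step - (n - 1):].tolist()
        source = context.tolist() if context is not None else hyps[beam_id].tolist()
        for i in range(len(source) - n + 1):
            if source[i:i + n - 1] == prefix:
                # ban the token that would complete the repeated ngram
                logprobs[beam_id, source[i + n - 1]] = float('-inf')
    return logprobs

The CUDA kernel performs the same check in parallel across beams rather than in a Python loop.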

Testing steps

GPU tests (with CUDA enabled):

  1. pytest tests/test_transformers.py -k test_beamsearch_blocking_gpu
  2. pytest tests/test_transformers.py -k test_beamsearch_contextblocking_gpu

CPU tests:

  1. pytest tests/test_transformers.py -k test_beamsearch_blocking_cpu
  2. pytest tests/test_transformers.py -k test_beamsearch_contextblocking_cpu

Other information

  1. To run an interactive model with GPU beam blocking:
    parlai interactive --model-file "zoo:tutorial_transformer_generator/model" --gpu-beam-blocking True

Evaluation
We evaluated on the convai2 teacher task in 3 settings: (1) code on main, (2) new code on CPU, (3) new code with the GPU kernel. Results are shown below.

  • Correctness: utterances from all 3 settings are identical.
  • Runtime: we see a ~10% improvement with the new GPU kernel.
                      main       gpu kernel   cpu
  average of 10 runs  697.817s   620.084s     689.101s
  change vs. main     --         -11.14%      -1.25%

@@ -1466,7 +1503,7 @@ def _block_block_list(self, logprobs: torch.Tensor) -> torch.Tensor:
logprobs[beam_id][ngram[-1]] = neginf(logprobs.dtype)
return logprobs

def advance(self, logprobs):
def advance(self, logprobs, step):
Contributor:

I think an easy workaround here to avoid breaking current tests is to give step a default value (e.g. step=0) when it is not given, since advance is also used in non-GPU beam blocking settings.
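
A sketch of that suggestion (not the merged code; nothing else about the signature is assumed to change):

class TreeSearch:
    def advance(self, logprobs, step=0):
        # existing non-GPU callers that never pass step keep working via the default
        ...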

@dexterju27 (Contributor) left a comment:

The test failure seems to be caused by
NameError: name 'NGramRepeatBlock' is not defined
which indicates the CUDA cpp binding is not handled properly by the CI.

@@ -956,6 +965,7 @@ def _treesearch_factory(self, device, verbose=False):
)
elif method == 'beam':
return BeamSearch(
self.opt['gpu_beam_blocking'],
Contributor:

You have already set self.gpu_beam_blocking; why not use it here?

@@ -1443,15 +1459,36 @@ def _block_ngrams(
Source text to grab ngrams from. If None, it uses the current
hypothesis (i.e. self-blocking).
"""
context = None
if self.gpu_beam_blocking:
    if if_context_blocking:
Contributor:

Let's make it if self.gpu_beam_blocking and if_context_blocking?
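
A sketch of the combined check (illustrative only; it assumes the inner branch simply selects the stored context tensor):

def choose_context(gpu_beam_blocking, if_context_blocking, stored_context):
    context = None
    if gpu_beam_blocking and if_context_blocking:
        context = stored_context
    return context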

@pearlli98 pearlli98 changed the title [WIP] Add CUDA Kernel for Beam Blocking [WIP] Add CUDA Kernel for TreeSearch Ngram Blocking Jun 28, 2022
@klshuster (Contributor):

is this ready for review? or still WIP?

@pearlli98 pearlli98 changed the title [WIP] Add CUDA Kernel for TreeSearch Ngram Blocking Add CUDA Kernel for TreeSearch Ngram Blocking Jun 30, 2022
@pearlli98 (Contributor, Author):

> is this ready for review? or still WIP?

this is ready for review.

@klshuster (Contributor) left a comment:

I think this looks great! I'm approving but it looks like we still have cleaninstall failures; any chance you could try fixing those?

@@ -1367,11 +1391,14 @@ def set_context(self: TSType, context: torch.LongTensor) -> TSType:
a LongTensor representing the input context; used for context
ngram blocking, if supplied
"""
self.context = context.tolist()
self.context = torch.Tensor(context.tolist()).long()
Contributor:

What is the default dtype here? Curious why the .long() cast is needed.

Contributor Author:

Outdated change; reverted to the code on main.
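
For context on the dtype question: torch.Tensor constructs a tensor of the default floating-point dtype, which is why the old diff needed an explicit cast. A quick illustration (not ParlAI code):

import torch

ids = [3, 7, 7, 42]
print(torch.Tensor(ids).dtype)         # torch.float32 (default dtype)
print(torch.Tensor(ids).long().dtype)  # torch.int64, hence the .long() cast
print(torch.LongTensor(ids).dtype)     # torch.int64 directly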

Contributor Author:

The cleaninstall failures seem to have disappeared without me changing anything.

step,
beam_size,
no_repeat_ngram_size,
if_context_blocking=False,
Contributor:

Is this parameter necessary? I.e., can't we assume that if context is passed, we block on it? Or does this make the downstream logic easier to handle in the kernel?

Contributor Author:

This makes the downstream kernel logic easier. Kernel params can't be None, so I initialize a placeholder tensor as the empty context and need a bool to tell whether we are doing self-blocking or context-blocking.
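
A minimal sketch of that placeholder-plus-flag pattern (an illustrative helper, not the actual call site; the kernel's real argument order is not shown here):

import torch

def make_context_args(context=None, device='cpu'):
    """Return (context_tensor, if_context_blocking) for a kernel that cannot take None."""
    if context is None:
        # empty placeholder stands in for "no context", i.e. self-blocking
        return torch.zeros(0, dtype=torch.long, device=device), False
    return context, True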

@@ -1725,7 +1792,8 @@ def select_paths(self, logprobs, prior_scores, current_length) -> _PathSelection
voc_size = logprobs.size(-1)

# get the backtracking hypothesis id as a multiple of full voc_sizes
hyp_ids = best_idxs // voc_size
# hyp_ids = best_idxs // voc_size
Contributor:

Nit: feel free to delete the commented-out line. Also, thank you for changing this (I've seen this warning several times).
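
The warning in question is PyTorch's deprecation of // (floor division) on tensors. One common replacement, shown here as an assumption since the new line isn't visible in this excerpt, is an explicit rounding mode:

import torch

best_idxs = torch.tensor([0, 7, 15, 22])
voc_size = 8
hyp_ids = torch.div(best_idxs, voc_size, rounding_mode='trunc')  # tensor([0, 0, 1, 2])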

Contributor Author:

deleted

setup.py Outdated
@@ -9,6 +9,8 @@

from setuptools import setup, find_packages

# from torch.utils.cpp_extension import BuildExtension, CUDAExtension
Contributor:

Nit: can you please delete these commented-out lines?

Contributor Author:

deleted

@@ -0,0 +1,50 @@
/*
Contributor:

Is our code freshly rewritten, or did we get it from fastseq? If it's the latter, we need to maintain the original copyright headers (in addition to our own). Both are MIT so it's no problem, but we need to include Copyright (c) Microsoft, etc.

Contributor Author:

Thanks for pointing this out! We definitely referenced their code, but I've also made some significant changes. I think we should add the copyright here? Can you let me know what the right move is? I'm not super familiar with this.

Contributor Author:

Would something like this work?

/*
Copyright (c) Facebook, Inc. and its affiliates.
Copyright (c) Microsoft Corporation.
This source code is licensed under the MIT license found in the
LICENSE file in the root directory of this source tree.
*/

Contributor:

As we discussed in our meeting, you can cite the original file, say we adapted it, and maintain the Microsoft copyright header.

@stephenroller (Contributor) left a comment:

wow great progress!

@@ -0,0 +1,112 @@
/*
Copyright (c) Facebook, Inc. and its affiliates.
This source code is licensed under the MIT license found in the
@dexterju27 (Contributor), Jul 5, 2022:

And what @stephenroller mentioned about licensing should apply to this file as well, I suppose?

Contributor Author:

yes, edited this as well.

int vocab_size,
int no_repeat_ngram_size,
bool if_context_blocking) {
auto row = blockIdx.x;
Contributor:

Could you add some comments on what row and col mean here?


// final thread writes the end of previous ngram array to tokens_shm
if (col == blockDim.x - 1) {
for (int i=1; i<no_repeat_ngram_size; i++){
Contributor:

Nit: spacing; you could apply a C++ formatter to this file.

@@ -422,7 +422,59 @@ def test_beamsearch_blocking(self):
assert '34 34' not in text

@pytest.mark.nofbcode
def test_beamsearch_contextblocking(self):
@testing_utils.skipUnlessGPU
Contributor:

Thanks for adding all these tests

@pearlli98 pearlli98 closed this Jul 5, 2022
@pearlli98 pearlli98 reopened this Jul 5, 2022
@pearlli98 pearlli98 marked this pull request as ready for review July 5, 2022 19:19
@pearlli98 pearlli98 merged commit dff9aab into main Jul 5, 2022
@pearlli98 pearlli98 deleted the pearlli-ngram-blocking-kernel branch July 5, 2022 19:53