Batched model #3

conordaly0 · 2020-10-30T15:58:39Z

Hi folks,

I've added changes to support batched data through the model. I haven't updated the Tokenizer or the sampling methods -- if you think these need to be updated I'll address this in another submission.

One thing to consider is the shape of inputs to the model. For batchless inputs we used the convention:

X = (1) x (numSubwords)

In this submission, I've inserted a batch dimension on the outside:

X = (1) x (numSubwords) x (numObs)

This has the irritation of the leading singleton dimension, though it fits nicely with our CBT formatting labels. We could also consider:

X = (numSubwords) x (numObs)

Here we just squeeze out the singleton dimension, but note that this format breaks the conventions we used for the batchless model. Or:

X = (numObs) x (numSubwords)

Having leading numObs is perhaps more consistent with the format used by S&ML, but internally we will be permuting the activations to have numObs be the trailing dimension.
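To make the three candidate layouts concrete, here is a small sketch that builds each one from the same token indices. The sizes and variable names are made up for illustration and do not come from the repository.

```matlab
% Hypothetical sizes, chosen only for illustration.
numSubwords = 5;
numObs = 3;

% Token indices stored one observation per row: (numObs) x (numSubwords).
XRows = reshape(1:numObs*numSubwords, numObs, numSubwords);

% Layout used in this submission: (1) x (numSubwords) x (numObs).
XBatch = permute(XRows, [3 2 1]);

% Squeezed alternative: (numSubwords) x (numObs).
XSqueezed = permute(XRows, [2 1]);

% A batchless input is the numObs == 1 case of the first layout.
XSingle = XBatch(:, :, 1);

disp(size(XBatch))     % 1 5 3
disp(size(XSqueezed))  % 5 3
disp(size(XSingle))    % 1 5
```

With the first layout, code written for the batchless (1) x (numSubwords) convention sees the same first two dimensions, which is the compatibility argument made above.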

bwdGitHub · 2020-10-30T16:08:45Z

The test failure looks like a tolerance issue, I can take a look and push a fix onto this branch.

bwdGitHub · 2020-10-30T16:20:04Z

On formats, the 1xTxB makes some sense to me, although dlarray would flip that to "CBT" if we were using formatted data and formatted operations, so I wonder if that would cause pain if we ever swap to a formatted version.
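The reordering mentioned here can be seen with a formatted dlarray. This is a minimal sketch that assumes Deep Learning Toolbox is available; the sizes are hypothetical.

```matlab
% Requires Deep Learning Toolbox. Sizes are made up for illustration.
X = zeros(1, 5, 3);          % 1 x T x B, unformatted
dlX = dlarray(X, 'CTB');     % label the dimensions in the order given

% dlarray stores formatted data with the batch dimension before the time
% dimension, so the underlying array is permuted on construction.
disp(dims(dlX))   % CBT
disp(size(dlX))   % 1 3 5
```

Any code that indexes the third dimension as the batch would therefore need to change if the model moved to formatted data.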

bwdGitHub · 2020-10-30T16:32:11Z

+transformer/+layer/attention.m

 if ~isempty(past)
-    PK = past(:,:,:,1);
-    PV = past(:,:,:,2);
+    PK = permute(past(:,:,:,1,:), [1 2 3 5 4]);


Might be worth adding a comment reminding what the current dimension format is.

conordaly0 · 2020-11-03

Yes, good idea. I'll add a comment.

bwdGitHub · 2020-10-30T16:37:14Z

test/gpt2/tmodel.m

-        function canUseModel(test)
-            inputs = test.prepareInputs();
-            test.verifyWarningFree(@() test.model(inputs{:}));
+        function canUseModel(test, InputData)


I'd like to see a new test that checks each batched operation matches calling the operation on each observation in turn, i.e. f([x1,x2]) = [f(x1),f(x2)] in some sense.

Doing that at the model level will be most efficient, but it'd be good practice to do that at the unit level too. I can take a look at this too, but I think I'd rather see it before merging into master.

conordaly0 · 2020-11-03

Absolutely. I was using such a test locally to confirm my changes. I'll add something to tmodel.
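The f([x1,x2]) = [f(x1),f(x2)] check described above could look roughly like the following. The operation f is a stand-in (a softmax over the first dimension), not a function from the repository, and the sizes are hypothetical.

```matlab
% Stand-in for a batched operation: softmax along the first dimension.
% This is a hypothetical example, not a function from the repository.
f = @(X) exp(X - max(X, [], 1)) ./ sum(exp(X - max(X, [], 1)), 1);

numFeatures = 4;
numSubwords = 5;
numObs = 3;
X = rand(numFeatures, numSubwords, numObs);

% Evaluate the whole batch at once.
YBatch = f(X);

% Evaluate each observation separately and stack along dimension 3.
YLoop = zeros(size(X));
for i = 1:numObs
    YLoop(:, :, i) = f(X(:, :, i));
end

% Compare with a tolerance, since batched and looped floating-point
% results need not match bit for bit in general.
maxErr = max(abs(YBatch(:) - YLoop(:)));
assert(maxErr < 1e-12)
```

The same pattern applies at the model level by replacing f with the model call and stacking along whichever dimension holds the observations.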

conordaly0 · 2020-11-03T09:59:35Z

> The test failure looks like a tolerance issue, I can take a look and push a fix onto this branch.

I noticed that too. I could always add a tolerance to the comparison for now?

bwdGitHub · 2020-11-03T10:04:13Z

> > The test failure looks like a tolerance issue, I can take a look and push a fix onto this branch.
>
> I noticed that too. I could always add a tolerance to the comparison for now?

Sure, that's all I was planning to do.

(huh, github makes it hard to reply to comments)
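For reference, adding a tolerance to a verifyEqual comparison might look like this. The values and the tolerance are invented for illustration; the actual test and the tolerance chosen in the fix are not shown in this thread.

```matlab
% Interactive test case, so the snippet can run outside a test class.
test = matlab.unittest.TestCase.forInteractiveUse;

expected = single(1);
actual = expected + 10*eps('single');   % small, made-up rounding difference

% The tolerance is applied only to values of its own data type,
% so it is cast to single to match the data.
test.verifyEqual(actual, expected, 'AbsTol', single(1e-5));
```

A relative tolerance ('RelTol') is the usual alternative when the magnitude of the outputs varies a lot.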

conordaly0 · 2020-11-03T11:26:43Z

> On formats, the 1xTxB makes some sense to me, although dlarray would flip that to "CBT" if we were using formatted data and formatted operations, so I wonder if that would cause pain if we ever swap to a formatted version.

I think it would be OK because on construction we can always specify the labels in the most convenient order. Then we would need to make sure the internal operations behave themselves with respect to formatting labels, which I don't think would be too much of a pain.

bwdGitHub · 2020-11-03T18:03:45Z

I think the last CI job stalled for some reason, I've set it to re-run. Downloading the weights did take a long time when I did it manually recently. Hopefully doesn't become a frequent problem.

adulai

(still getting used to GitHub code review tools) Looks good to me, I have added some comments about past/present.

adulai · 2020-11-03T16:29:07Z

+transformer/+layer/attention.m

 if ~isempty(past)
-    PK = past(:,:,:,1);
-    PV = past(:,:,:,2);
+    PK = permute(past(:,:,:,1,:), [1 2 3 5 4]);
+    PV = permute(past(:,:,:,2,:), [1 2 3 5 4]);
    K = cat(2,PK,K);
    V = cat(2,PV,V);
 end

 % Set present. Note that this is done differently from the original
 % implementation which sets the value of present before the previous if
 % statement.
-present = cat(4,K,V);
+present = cat(5,K,V);
+present = permute(present, [1 2 3 5 4]);


Can we change the layout of past/present? I would prefer the observation dimension to be the 4th dimension. If we did it that way, I think we could get rid of a lot of the calls to permute. Then the code would look something like this:

 if ~isempty(past)
     PK = past(:,:,:,:,1);
     PV = past(:,:,:,:,2);
     K = cat(2,PK,K);
     V = cat(2,PV,V);
 end

 % Set present. Note that this is done differently from the original
 % implementation which sets the value of present before the previous if
 % statement.
 present = cat(5,K,V);

conordaly0 · 2020-11-04

I think what you propose is a better format. My concern was the backwards incompatibility of past/presents associated with the old version. However, since it's a simple fix to update the format, we should move ahead with the most appropriate format.

I'll make this change and update the request.
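As a quick sanity check on the two layouts discussed in this thread, the sketch below stores the same keys and values both ways and reads them back. The sizes are hypothetical, and only the positions of the observation and key/value dimensions are taken from the diff above.

```matlab
% Hypothetical sizes; only the position of each dimension matters here.
sz = [4 6 2 3];            % K and V are 4-D, observations in dimension 4
KPast = rand(sz);
VPast = rand(sz);

% Layout in the first version of this request: key/value index in
% dimension 4, observations in dimension 5, so reading needs a permute.
pastOld = permute(cat(5, KPast, VPast), [1 2 3 5 4]);
PKOld = permute(pastOld(:, :, :, 1, :), [1 2 3 5 4]);

% Proposed layout: observations in dimension 4, key/value index in
% dimension 5, so plain indexing is enough.
pastNew = cat(5, KPast, VPast);
PKNew = pastNew(:, :, :, :, 1);
PVNew = pastNew(:, :, :, :, 2);

assert(isequal(PKOld, KPast))
assert(isequal(PKNew, KPast))
assert(isequal(PVNew, VPast))
```

An old-format past can be converted to the proposed format with a single permute(pastOld, [1 2 3 5 4]), which is the "simple fix" for backwards compatibility.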

Batched model

5aa071c

conordaly0 requested review from adulai and bwdGitHub October 30, 2020 15:58

bwdGitHub reviewed Oct 30, 2020

More comments in attention and batch model test

e33baf2

bwdGitHub approved these changes Nov 3, 2020

adulai reviewed Nov 3, 2020

conordaly0 added 3 commits November 5, 2020 09:10

More convenient pasts layout

9d354fd

Fix tblock

0de614f

Extract data for verifyEqual

2ed3b1c

conordaly0 merged commit 59abf30 into master Nov 5, 2020

conordaly0 deleted the batchModel branch November 5, 2020 15:00
