Conversation

@conordaly0
Member

Hi folks,

I've added changes to support batched data through the model. I haven't updated the Tokenizer or the sampling methods -- if you think these need to be updated I'll address this in another submission.

One thing to consider is the shape of inputs to the model. For batchless inputs we used the convention:

X = (1) x (numSubwords)

In this submission, I've inserted a batch dimension on the outside

X = (1) x (numSubwords) x (numObs)

This has the irritation of the leading singleton dimension, though it fits nicely with our CBT formatting labels. We could also consider:

  1. X = (numSubwords) x (numObs)

Here we just squeeze out the singleton dimension, but note that this format breaks the convention we used for the batchless model.

  2. X = (numObs) x (numSubwords)

Having leading numObs is perhaps more consistent with the format used by S&ML, but internally we will be permuting the activations to have numObs be the trailing dimension.
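For concreteness, a minimal sketch of the three candidate layouts (all sizes are hypothetical, and `X` is just a placeholder array):

```matlab
% Hypothetical sizes for illustration only.
numSubwords = 5;   % subword tokens per observation
numObs      = 3;   % observations in the batch

% Layout used in this submission: leading singleton, batch trailing.
X = zeros(1, numSubwords, numObs);   % (1) x (numSubwords) x (numObs)

% Alternative 1: squeeze out the singleton dimension.
X1 = squeeze(X);                     % (numSubwords) x (numObs)

% Alternative 2: leading numObs, closer to the S&ML convention; internally
% the activations would be permuted so numObs is trailing again.
X2 = permute(X1, [2 1]);             % (numObs) x (numSubwords)
```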

@bwdGitHub
Collaborator

The test failure looks like a tolerance issue, I can take a look and push a fix onto this branch.

@bwdGitHub
Collaborator

On formats, the 1xTxB makes some sense to me, although dlarray would flip that to "CBT" if we were using formatted data and formatted operations, so I wonder if that would cause pain if we ever swap to a formatted version.

  if ~isempty(past)
-     PK = past(:,:,:,1);
-     PV = past(:,:,:,2);
+     PK = permute(past(:,:,:,1,:), [1 2 3 5 4]);
Collaborator

Might be worth adding a comment reminding what the current dimension format is.

Member Author

Yes good idea -- I'll add a comment

- function canUseModel(test)
-     inputs = test.prepareInputs();
-     test.verifyWarningFree(@() test.model(inputs{:}));
+ function canUseModel(test, InputData)
Collaborator

I'd like to see a new test that checks each batched operation matches calling the operation on each observation in turn, i.e. f([x1,x2]) = [f(x1),f(x2)] in some sense.

Doing that at the model level will be most efficient, but it'd be good practice to do that at the unit level too. I can take a look at this too, but I think I'd rather see it before merging into master.
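As a sketch, assuming a hypothetical `prepareBatchedInput` helper and the input convention above, such a test might look like:

```matlab
function batchMatchesPerObservation(test)
    % Verify f([x1,x2]) == [f(x1),f(x2)]: the batched forward pass
    % should match running each observation through on its own.
    % prepareBatchedInput and the model call signature are assumptions.
    X = test.prepareBatchedInput();   % (1) x (numSubwords) x (numObs)
    Y = test.model(X);
    for i = 1:size(X,3)
        Yi = test.model(X(:,:,i));
        test.verifyEqual(Y(:,:,i), Yi, "AbsTol", 1e-6);
    end
end
```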

Member Author

Absolutely. I was using such a test locally to confirm my changes. I'll add something to tmodel.

@conordaly0
Member Author

> The test failure looks like a tolerance issue, I can take a look and push a fix onto this branch.

I noticed that too. I could always add a tolerance to the comparison for now?

@bwdGitHub
Collaborator

> The test failure looks like a tolerance issue, I can take a look and push a fix onto this branch.
>
> I noticed that too. I could always add a tolerance to the comparison for now?

Sure, that's all I was planning to do.

(huh, GitHub makes it hard to reply to comments)

@conordaly0
Member Author

> On formats, the 1xTxB makes some sense to me, although dlarray would flip that to "CBT" if we were using formatted data and formatted operations, so I wonder if that would cause pain if we ever swap to a formatted version.

I think it would be OK because on construction we can always specify the labels in the most convenient order. Then we would need to make sure the internal operations behave themselves with respect to formatting labels, which I don't think would be too much of a pain.
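For example (a sketch; the label string and sizes are assumptions), labels can be attached in whichever order matches the stored data, and dlarray tracks the canonical ordering for us:

```matlab
numSubwords = 5; numObs = 3;           % hypothetical sizes
X = zeros(1, numSubwords, numObs);     % (1) x (numSubwords) x (numObs)

% Label the stored dimensions as channel, time, batch; dlarray
% internally reorders formatted data into its canonical "CBT" order.
dlX = dlarray(X, "CTB");
dims(dlX)                              % 'CBT'
```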

@bwdGitHub
Collaborator

I think the last CI job stalled for some reason, so I've set it to re-run. Downloading the weights took a long time when I did it manually recently; hopefully it doesn't become a frequent problem.

Collaborator

@adulai left a comment

(still getting used to GitHub code review tools) Looks good to me, I have added some comments about past/present.

Comment on lines 65 to 76
  if ~isempty(past)
-     PK = past(:,:,:,1);
-     PV = past(:,:,:,2);
+     PK = permute(past(:,:,:,1,:), [1 2 3 5 4]);
+     PV = permute(past(:,:,:,2,:), [1 2 3 5 4]);
      K = cat(2,PK,K);
      V = cat(2,PV,V);
  end

  % Set present. Note that this is done differently from the original
  % implementation which sets the value of present before the previous if
  % statement.
- present = cat(4,K,V);
+ present = cat(5,K,V);
+ present = permute(present, [1 2 3 5 4]);
Collaborator

Can we change the layout of past/present? I would prefer the observation dimension to be the fourth dimension; that way we could get rid of a lot of the calls to permute. Then the code would look something like this:

if ~isempty(past)
    PK = past(:,:,:,:,1);
    PV = past(:,:,:,:,2);
    K = cat(2,PK,K);
    V = cat(2,PV,V);
end

% Set present. Note that this is done differently from the original
% implementation which sets the value of present before the previous if
% statement.
present = cat(5,K,V);
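A quick shape sketch of the proposed layout (all sizes hypothetical): with numObs in dimension 4 and the K/V split trailing in dimension 5, slicing past and stacking present need no permute:

```matlab
% Hypothetical sizes for illustration.
headSize = 64; seqLen = 10; numHeads = 12; numObs = 3;

K = zeros(headSize, seqLen, numHeads, numObs);
V = zeros(headSize, seqLen, numHeads, numObs);

% K/V stacked on the trailing (5th) dimension; numObs stays in dim 4,
% so past(:,:,:,:,1) and past(:,:,:,:,2) slice K and V directly.
present = cat(5, K, V);
size(present)   % [64 10 12 3 2]
```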

Member Author

I think what you propose is a better format. My concern was backwards incompatibility with the past/present values produced by the old version; however, since it's a simple fix to update the format, we should move ahead with the most appropriate one.

I'll make this change and update the request.

@conordaly0 conordaly0 merged commit 59abf30 into master Nov 5, 2020
@conordaly0 conordaly0 deleted the batchModel branch November 5, 2020 15:00