-
Notifications
You must be signed in to change notification settings - Fork 62
Batched model #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The test failure looks like a tolerance issue, I can take a look and push a fix onto this branch. |
|
On formats, the 1xTxB makes some sense to me, although |
+transformer/+layer/attention.m
Outdated
| if ~isempty(past) | ||
| PK = past(:,:,:,1); | ||
| PV = past(:,:,:,2); | ||
| PK = permute(past(:,:,:,1,:), [1 2 3 5 4]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a comment reminding what the current dimension format is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good idea -- I'll add a comment
| function canUseModel(test) | ||
| inputs = test.prepareInputs(); | ||
| test.verifyWarningFree(@() test.model(inputs{:})); | ||
| function canUseModel(test, InputData) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to see a new test that checks each batched operation matches calling the operation on each observation in turn, i.e. f([x1,x2]) = [f(x1),f(x2)] in some sense.
Doing that at the model level will be most efficient, but it'd be good practice to do that at the unit level too. I can take a look at this too, but I think I'd rather see it before merging into master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. I was using such a test locally to confirm my changes. I'll add something to tmodel.
I noticed that too. I could always add a tolerance to the comparison for now? |
Sure, that's all I was planning to do. (huh, github makes it hard to reply to comments) |
I think it would be OK because on construction we can always specify the labels in the most convenient order. Then we would need t make sure the internal operations behave themselves with respect to formatting labels, which I don't think would be too much of a pain. |
|
I think the last CI job stalled for some reason, I've set it to re-run. Downloading the weights did take a long time when I did it manually recently. Hopefully doesn't become a frequent problem. |
adulai
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(still getting used to GitHub code review tools) Looks good to me, I have added some comments about past/present.
+transformer/+layer/attention.m
Outdated
| if ~isempty(past) | ||
| PK = past(:,:,:,1); | ||
| PV = past(:,:,:,2); | ||
| PK = permute(past(:,:,:,1,:), [1 2 3 5 4]); | ||
| PV = permute(past(:,:,:,2,:), [1 2 3 5 4]); | ||
| K = cat(2,PK,K); | ||
| V = cat(2,PV,V); | ||
| end | ||
|
|
||
| % Set present. Note that this is done differently from the original | ||
| % implementation which sets the value of present before the previous if | ||
| % statement. | ||
| present = cat(4,K,V); | ||
| present = cat(5,K,V); | ||
| present = permute(present, [1 2 3 5 4]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change the layout of past/present? I would prefer it if the observation dimension is the 4-th dimension, I think if we did it that way, we could get rid of a lot of the calls to permute. Then the code would look something like this:
if ~isempty(past)
PK = past(:,:,:,:,1);
PV = past(:,:,:,:,2);
K = cat(2,PK,K);
V = cat(2,PV,V);
end
% Set present. Note that this is done differently from the original
% implementation which sets the value of present before the previous if
% statement.
present = cat(5,K,V);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you propose is a better format. My concern was the backwards incompatibility of past/presents associated with the old version, however, since it's a simple fix to update the format we should move ahead with the most appropriate format.
I'll make this change and update the request.
Hi folks,
I've added changes to support batched data through the model. I haven't updated the Tokenizer or the sampling methods -- if you think these need to be updated I'll address this in another submission.
One thing to consider is the shape of inputs to the model. For batchless inputs we used the convention:
X = (1) x (numSubwords)In this submission, I've inserted a batch dimension on the outside
X = (1) x (numSubwords) x (numObs)This has the irritation of the leading singleton dimension, though it fits nicely with our CBT formatting labels. We could also consider:
X = (numSubwords) x (numObs)X = (numObs) x (numSubwords)