|
19 | 19 | "It's a Python based scientific computing package targeted at two sets of audiences:\n", |
20 | 20 | "\n", |
21 | 21 | "- A replacement for numpy to use the power of GPUs\n", |
22 | | - "- a deep learning research platform that provides maximum flexibility and speed" |
| 22 | + "- a deep learning research platform that provides maximum flexibility and speed\n", |
| 23 | + "\n", |
| 24 | + "**If you want to complete the full tutorial, including training a neural network for image classification, you have to install the `torchvision` package.**" |
23 | 25 | ] |
24 | 26 | }, |
25 | 27 | { |
|
88 | 90 | "x.size()" |
89 | 91 | ] |
90 | 92 | }, |
| 93 | + { |
| 94 | + "cell_type": "markdown", |
| 95 | + "metadata": {}, |
| 96 | + "source": [ |
| 97 | + "*NOTE: `torch.Size` is in fact a tuple, so it supports the same operations*" |
| 98 | + ] |
| 99 | + }, |
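For instance, since `torch.Size` is a tuple, it can be indexed and unpacked like one. A minimal sketch (the variable names are illustrative):

```python
import torch

x = torch.randn(4, 5)
s = x.size()              # torch.Size([4, 5])
rows, cols = s            # unpacks like an ordinary tuple
print(s[0], rows * cols)  # 4 20
```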
91 | 100 | { |
92 | 101 | "cell_type": "code", |
93 | 102 | "execution_count": null, |
|
293 | 302 | "## Autograd: automatic differentiation\n", |
294 | 303 | "\n", |
295 | 304 | "The `autograd` package provides automatic differentiation for all operations on Tensors. \n", |
296 | | - "It is a define-by-run framework, which means that your backprop is defined by how your code is run. \n", |
| 305 | + "It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different. \n", |
297 | 306 | "\n", |
298 | 307 | "Let us see this in more simple terms with some examples.\n", |
299 | 308 | "\n", |
300 | 309 | "`autograd.Variable` is the central class of the package. \n", |
301 | | - "It wraps a Tensor, and afterwards you can run tensor operations on it, and finally call `.backward()`\n", |
| 310 | + "It wraps a Tensor, and supports nearly all of operations defined on it. Once you finish your computation you can call `.backward()` and have all the gradients computed automatically.\n", |
302 | 311 | "\n", |
303 | | - "You can access the raw tensor through the `.data` attribute, and after computing the backward pass, a gradient w.r.t. this variable is accumulated into `.grad` attribute.\n", |
| 312 | + "You can access the raw tensor through the `.data` attribute, while the gradient w.r.t. this variable is accumulated into `.grad`.\n", |
304 | 313 | "\n", |
305 | 314 | "\n", |
306 | 315 | "\n", |
307 | 316 | "There's one more class which is very important for autograd implementation - a `Function`. \n", |
308 | 317 | "\n", |
309 | | - "`Variable` and `Function` are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each variable has a `.creator` attribute that references a `Function` that has created the `Variable` (except for Variables created by the user - these have `creator=None`).\n", |
| 318 | + "`Variable` and `Function` are interconnected and build up an acyclic graph that encodes a complete history of computation. Each variable has a `.creator` attribute that references the `Function` that created the `Variable` (except for Variables created by the user - their `creator is None`).\n", |
310 | 319 | "\n", |
311 | 320 | "If you want to compute the derivatives, you can call `.backward()` on a `Variable`. \n", |
312 | | - "If `Variable` is a scalar (i.e. it holds a one element tensor), you don't need to specify any arguments to `backward()`, however if it has more elements, you need to specify a `grad_output` argument that is a tensor of matching shape.\n" |
| 321 | + "If `Variable` is a scalar (i.e. it holds a one element data), you don't need to specify any arguments to `backward()`, however if it has more elements, you need to specify a `grad_output` argument that is a tensor of matching shape.\n" |
313 | 322 | ] |
314 | 323 | }, |
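To make the `Variable`/`Function` relationship above concrete, here is a minimal sketch using the pre-0.4 `Variable` API that this notebook targets (the variable names are illustrative):

```python
import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)  # user-created, so x.creator is None
y = x + 2                   # created by an operation, so y.creator references a Function
z = (y * y).mean()          # a scalar Variable (holds a single element)
z.backward()                # no arguments needed, because z is a scalar
print(x.grad)               # the gradient of z w.r.t. x, accumulated into .grad
```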
315 | 324 | { |
|
523 | 532 | "outputs": [], |
524 | 533 | "source": [ |
525 | 534 | "import torch.nn as nn\n", |
| 535 | + "import torch.nn.functional as F\n", |
| 536 | + "# Some more python helpers\n", |
| 537 | + "import functools\n", |
| 538 | + "import operator\n", |
526 | 539 | "\n", |
527 | 540 | "class Net(nn.Container):\n", |
528 | 541 | " def __init__(self):\n", |
529 | 542 | " super(Net, self).__init__()\n", |
530 | 543 | " self.conv1 = nn.Conv2d(1, 6, 5) # 1 input image channel, 6 output channels, 5x5 square convolution kernel\n", |
531 | | - " self.pool = nn.MaxPool2d(2,2) # A max-pooling operation that looks at 2x2 windows and finds the max.\n", |
532 | 544 | " self.conv2 = nn.Conv2d(6, 16, 5)\n", |
533 | 545 | " self.fc1 = nn.Linear(16*5*5, 120) # an affine operation: y = Wx + b\n", |
534 | 546 | " self.fc2 = nn.Linear(120, 84)\n", |
535 | 547 | " self.fc3 = nn.Linear(84, 10)\n", |
536 | | - " self.relu = nn.ReLU()\n", |
537 | 548 | "\n", |
538 | 549 | " def forward(self, x):\n", |
539 | | - " x = self.pool(self.relu(self.conv1(x)))\n", |
540 | | - " x = self.pool(self.relu(self.conv2(x)))\n", |
541 | | - " x = x.view(-1, 16*5*5)\n", |
542 | | - " x = self.relu(self.fc1(x))\n", |
543 | | - " x = self.relu(self.fc2(x))\n", |
| 550 | + " x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2)) # Max pooling over a (2, 2) window\n", |
| 551 | + " x = F.max_pool2d(F.relu(self.conv2(x)), 2) # If the size is a square you can only specify a single number\n", |
| 552 | + " x = x.view(-1, self.num_flat_features(x))\n", |
| 553 | + " x = F.relu(self.fc1(x))\n", |
| 554 | + " x = F.relu(self.fc2(x))\n", |
544 | 555 | " x = self.fc3(x)\n", |
545 | 556 | " return x\n", |
| 557 | + " \n", |
| 558 | + " def num_flat_features(self, x):\n", |
| 559 | + " return functools.reduce(operator.mul, x.size()[1:])\n", |
546 | 560 | "\n", |
547 | 561 | "net = Net()\n", |
548 | 562 | "net" |
|
610 | 624 | "source": [ |
611 | 625 | "> #### NOTE: `torch.nn` only supports mini-batches\n", |
612 | 626 | "The entire `torch.nn` package only supports inputs that are a mini-batch of samples, and not a single sample. \n", |
613 | | - "For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width` \n", |
614 | | - "*This is done to simplify developer code and eliminate bugs*" |
| 627 | + "For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width`.\n", |
| 628 | + "\n", |
| 629 | + "> *If you have a single sample, just use `input.unsqueeze(0)` to add a fake batch dimension.*" |
615 | 630 | ] |
616 | 631 | }, |
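A minimal illustration of the fake batch dimension (the shapes here are just an example):

```python
import torch

single = torch.randn(1, 32, 32)   # nChannels x Height x Width: one sample
batch = single.unsqueeze(0)       # 1 x nChannels x Height x Width: a mini-batch of one
print(batch.size())               # torch.Size([1, 1, 32, 32])
```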
617 | 632 | { |
618 | 633 | "cell_type": "markdown", |
619 | 634 | "metadata": {}, |
620 | 635 | "source": [ |
621 | | - "##### Review of what you learnt so far:\n", |
| 636 | + "### Recap of all the classes you've seen so far:\n", |
| 637 | + "\n", |
| 638 | + "* `torch.Tensor` - A **multi-dimensional array**.\n", |
| 639 | + "* `autograd.Variable` - **Wraps a Tensor and records the history of operations** applied to it. Has the same API as a `Tensor`, with some additions like `backward()`. Also **holds the gradient** w.r.t. the tensor.\n", |
| 640 | + "* `nn.Module` - Neural network module. **Convenient way of encapsulating parameters**, with helpers for moving them to GPU, exporting, loading, etc.\n", |
| 641 | + "* `nn.Container` - `Module` that is a **container for other Modules**.\n", |
| 642 | + "* `nn.Parameter` - A kind of Variable that is **automatically registered as a parameter when assigned as an attribute to a `Module`**.\n", |
| 643 | + "* `autograd.Function` - Implements **forward and backward definitions of an autograd operation**. Every `Variable` operation creates at least a single `Function` node that connects to the functions that created the `Variable` and **encodes its history**.\n", |
| 644 | + "\n", |
| 645 | + "##### At this point, we covered:\n", |
622 | 646 | "- Defining a neural network\n", |
623 | 647 | "- Processing inputs and calling backward.\n", |
624 | 648 | "\n", |
|
670 | 694 | " -> loss\n", |
671 | 695 | "```\n", |
672 | 696 | "\n", |
673 | | - "So, when we call `loss.backward()`, the whole graph is differentiated w.r.t. the loss, and all Variables in the graph will have their `.grad` Tensor accumulated with the gradient.\n", |
| 697 | + "So, when we call `loss.backward()`, the whole graph is differentiated w.r.t. the loss, and all Variables in the graph will have their `.grad` Variable accumulated with the gradient.\n", |
674 | 698 | " " |
675 | 699 | ] |
676 | 700 | }, |
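A minimal sketch of that accumulation, assuming `net` and `loss` are defined as earlier in the notebook:

```python
net.zero_grad()              # zeroes the gradient buffers of all parameters
print(net.conv1.bias.grad)   # nothing accumulated yet
loss.backward()              # differentiates the graph w.r.t. the loss
print(net.conv1.bias.grad)   # now holds the gradient of the loss w.r.t. conv1's bias
```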
|
727 | 751 | "```python\n", |
728 | 752 | "learning_rate = 0.01\n", |
729 | 753 | "for f in net.parameters():\n", |
730 | | - " f.data.sub_(f.grad * learning_rate)\n", |
| 754 | + " f.data.sub_(f.grad.data * learning_rate)\n", |
731 | 755 | "```\n", |
732 | 756 | "\n", |
733 | 757 | "However, as you use neural networks, you want to use various different update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.\n", |
|
822 | 846 | "transform=transforms.Compose([transforms.ToTensor(),\n", |
823 | 847 | " transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),\n", |
824 | 848 | " ])\n", |
825 | | - "trainset = torchvision.datasets.CIFAR10(root='/Users/soumith/code/pytorch-vision/test/cifar', \n", |
826 | | - " train=True, download=True, transform=transform)\n", |
| 849 | + "trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)\n", |
827 | 850 | "trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, \n", |
828 | 851 | " shuffle=True, num_workers=2)\n", |
829 | 852 | "\n", |
830 | | - "testset = torchvision.datasets.CIFAR10(root='/Users/soumith/code/pytorch-vision/test/cifar', \n", |
831 | | - " train=False, download=True, transform=transform)\n", |
| 853 | + "testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)\n", |
832 | 854 | "testloader = torch.utils.data.DataLoader(testset, batch_size=4, \n", |
833 | 855 | " shuffle=False, num_workers=2)\n", |
834 | 856 | "classes = ('plane', 'car', 'bird', 'cat',\n", |
|
1163 | 1185 | "metadata": {}, |
1164 | 1186 | "source": [ |
1165 | 1187 | "#### Training on the GPU\n", |
1166 | | - "The idea is pretty simple. \n", |
1167 | | - "Just like how you transfer a Tensor on to the GPU, you transfer the neural net onto the GPU." |
| 1188 | + "Just like how you transfer a Tensor on to the GPU, you transfer the neural net onto the GPU.\n", |
| 1189 | + "This will recursively go over all modules and convert their parameters and buffers to CUDA tensors." |
1168 | 1190 | ] |
1169 | 1191 | }, |
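A minimal sketch, assuming a CUDA-capable GPU and the `net`/`trainloader` defined earlier:

```python
import torch
from torch.autograd import Variable

if torch.cuda.is_available():
    net.cuda()  # recursively converts all parameters and buffers to CUDA tensors

    # inside the training loop, the inputs and targets must be sent to the GPU as well:
    inputs, labels = next(iter(trainloader))
    inputs, labels = Variable(inputs.cuda()), Variable(labels.cuda())
```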
1170 | 1192 | { |
|
1207 | 1229 | "- [More tutorials](https://github.com/pytorch/tutorials)\n", |
1208 | 1230 | "- [Chat with other users on Slack](pytorch.slack.com/messages/beginner/)" |
1209 | 1231 | ] |
1210 | | - }, |
1211 | | - { |
1212 | | - "cell_type": "code", |
1213 | | - "execution_count": null, |
1214 | | - "metadata": { |
1215 | | - "collapsed": true |
1216 | | - }, |
1217 | | - "outputs": [], |
1218 | | - "source": [] |
1219 | 1232 | } |
1220 | 1233 | ], |
1221 | 1234 | "metadata": { |
1222 | 1235 | "kernelspec": { |
1223 | | - "display_name": "Python 2", |
| 1236 | + "display_name": "Python 3", |
1224 | 1237 | "language": "python", |
1225 | | - "name": "python2" |
| 1238 | + "name": "python3" |
1226 | 1239 | }, |
1227 | 1240 | "language_info": { |
1228 | 1241 | "codemirror_mode": { |
1229 | 1242 | "name": "ipython", |
1230 | | - "version": 2 |
| 1243 | + "version": 3 |
1231 | 1244 | }, |
1232 | 1245 | "file_extension": ".py", |
1233 | 1246 | "mimetype": "text/x-python", |
1234 | 1247 | "name": "python", |
1235 | 1248 | "nbconvert_exporter": "python", |
1236 | | - "pygments_lexer": "ipython2", |
1237 | | - "version": "2.7.12" |
| 1249 | + "pygments_lexer": "ipython3", |
| 1250 | + "version": "3.5.2" |
1238 | 1251 | } |
1239 | 1252 | }, |
1240 | 1253 | "nbformat": 4, |
|