
In [29]:
import torch

In [30]:

torch.ones(size=(2, 1))  # Unlike in other frameworks, the shape argument is named "size" rather than "shape".


tensor([[1.],
        [1.]])

In [31]:
torch.zeros(size=(2, 1))


tensor([[0.],
        [0.]])

In [32]:
torch.tensor([1, 2, 3], dtype=torch.float32)  # Unlike in other frameworks, you cannot pass dtype="float32" as a string. The dtype argument must be a torch dtype instance.

tensor([1., 2., 3.])
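When a dtype arrives as a string (for instance from a configuration file), one possible workaround is to look up the attribute of the same name on the torch module. This is only a sketch: the variable name dtype_name and the idea of reading it from a config are assumptions made for illustration.

```python
import torch

dtype_name = "float32"              # hypothetical value, e.g. read from a config file
dtype = getattr(torch, dtype_name)  # look up the torch dtype object with that name

x = torch.tensor([1, 2, 3], dtype=dtype)
print(x.dtype)
```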

In [33]:
torch.normal(  # Equivalent to tf.random.normal(shape=(3, 1), mean=0., stddev=1.)
    mean=torch.zeros(size=(3, 1)),
    std=torch.ones(size=(3, 1)))

tensor([[-0.4551],
        [-1.2391],
        [-0.3496]])

In [34]:
torch.rand(3, 1)  # Equivalent to tf.random.uniform(shape=(3, 1), minval=0., maxval=1.)

tensor([[0.6937],
        [0.8126],
        [0.6752]])
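For reference, the same kinds of random tensors can be drawn with slightly more compact calls. The sketch below assumes scalar mean and standard deviation; the seed value is arbitrary and only there to make the draws repeatable.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the draws repeatable

# Scalar mean and std combined with an explicit size argument.
normal_a = torch.normal(mean=0.0, std=1.0, size=(3, 1))
# Shorthand for a standard normal distribution.
normal_b = torch.randn(3, 1)
# Uniform samples in the interval [0, 1).
uniform = torch.rand(3, 1)

print(normal_a.shape, normal_b.shape, uniform.shape)
```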


Like NumPy arrays, but unlike TensorFlow tensors, PyTorch tensors are assignable.

In [35]:
x = torch.zeros(size=(2, 1))

In [36]:
x[0, 0] = 1.
x

tensor([[1.],
        [0.]])
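Assignment is not limited to single elements. As a small illustration (the shapes and values here are arbitrary), slices can be assigned and updated in place too:

```python
import torch

x = torch.zeros(size=(2, 3))
x[0, 0] = 1.0    # assign a single element
x[1, :] = 5.0    # assign a whole row through a slice
x[:, 2] += 2.0   # in-place arithmetic on a column

print(x)
```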

While you can just use a regular torch.Tensor to store the trainable state of a model,
PyTorch does provide a specialized tensor subclass for that purpose, the
torch.nn.parameter.Parameter class. Compared to a regular tensor, it provides semantic
clarity – if you see a Parameter, you’ll know it’s a piece of trainable state, whereas a
Tensor could be anything. As a result, it enables PyTorch to automatically track and
retrieve the Parameters you assign to PyTorch models – similar to what Keras does with
Keras Variable instances.

In [37]:
x = torch.zeros(size=(2, 1))
p = torch.nn.parameter.Parameter(data=x)  # A Parameter can only be created using a torch.Tensor value; NumPy arrays are not allowed.
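To make the automatic tracking concrete, here is a minimal sketch. The Holder class and its attribute names are hypothetical and exist only for this illustration: a Parameter assigned to a module is reported by named_parameters(), whereas a plain tensor attribute is not.

```python
import torch

class Holder(torch.nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # Registered automatically because it is a Parameter.
        self.weight = torch.nn.parameter.Parameter(torch.zeros(size=(2, 1)))
        # A plain tensor attribute is not tracked as trainable state.
        self.scratch = torch.zeros(size=(2, 1))

holder = Holder()
names = [name for name, _ in holder.named_parameters()]
print(names)
print(holder.weight.requires_grad)  # Parameters require gradients by default
```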

In [38]:
a = torch.ones((2, 2))
b = torch.square(a)  # Take the square, same as np.square
c = torch.sqrt(a)  # Take the square root, same as np.sqrt
d = b + c  # Add two tensors (element-wise)
e = torch.matmul(a, b)  # Take the product of two tensors (see chapter 2), same as np.matmul
f = torch.cat((a, b), axis=0)  # Concatenate a and b along axis 0, same as np.concatenate
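To check what these operations produce, the sketch below repeats them and prints a few results. It uses dim=0, the native PyTorch name for the concatenation axis; everything else follows the cell above.

```python
import torch

a = torch.ones((2, 2))
b = torch.square(a)
c = torch.sqrt(a)
d = b + c
e = torch.matmul(a, b)
f = torch.cat((a, b), dim=0)

print(d)        # every entry is 1 + 1
print(e)        # every entry sums two products of ones
print(f.shape)  # the rows of a stacked on top of the rows of b
```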

Dense layer

In [39]:
def dense(inputs, W, b):
  return torch.relu(torch.matmul(inputs, W) + b)  # relu is available as torch.relu; there is no torch.nn.relu function
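A quick usage sketch for the dense layer, written with torch.relu. The input, weight, and bias values are made up so that the result is easy to verify by hand.

```python
import torch

def dense(inputs, W, b):
    # Affine transform followed by a ReLU activation.
    return torch.relu(torch.matmul(inputs, W) + b)

inputs = torch.tensor([[3.0, 1.0]])           # one sample with two features
W = torch.tensor([[1.0, -1.0], [1.0, 1.0]])   # two inputs mapped to two units
b = torch.zeros(2)

out = dense(inputs, W, b)
print(out)  # pre-activations are [4, -2]; ReLU clips the negative one to 0
```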


### COMPUTING GRADIENTS WITH PYTORCH

In [57]:
input_var = torch.tensor(3.0, requires_grad=True)  # To compute gradients with respect to a tensor, it must be created with requires_grad=True.
result = torch.square(input_var)
result.backward()  # Calling backward() populates the "grad" attribute on all tensors created with requires_grad=True.
gradient = input_var.grad
gradient


tensor(6.)

If you call backward() multiple times in a row, the .grad attribute will ''accumulate''
gradients: each new call will sum the new gradient with the preexisting one. For instance,
in the code below, input_var.grad is not the gradient of square(input_var) with respect
to input_var, rather it is the sum of that gradient and the previously computed gradient –
hence its value has doubled since our last code snippet.

If you do not reset the gradient, it keeps growing by 6 with each call.

In [58]:
result = torch.square(input_var)
result.backward()
input_var.grad

tensor(12.)

In [59]:
result = torch.square(input_var)
result.backward()
input_var.grad

tensor(18.)

In [60]:
result = torch.square(input_var)
result.backward()
input_var.grad

tensor(24.)

In order to reset gradients, you can just set .grad to None:

Once the reset code has run, the gradient goes back to its initial value.

In [55]:
input_var.grad = None

In [49]:
result = torch.square(input_var)
result.backward()  # Calling backward() populates the "grad" attribute on all tensors created with requires_grad=True.
gradient = input_var.grad
gradient

tensor(6.)
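The two behaviors can be compared side by side. This sketch reuses the same input value of 3.0 and records the gradient after each backward() call, first without resetting and then with a reset after every call.

```python
import torch

input_var = torch.tensor(3.0, requires_grad=True)

# Without a reset, each backward() call adds 2 * 3.0 = 6.0 to .grad.
accumulated = []
for _ in range(3):
    torch.square(input_var).backward()
    accumulated.append(input_var.grad.item())

# With a reset after each call, .grad always holds a single gradient.
input_var.grad = None
fresh = []
for _ in range(3):
    torch.square(input_var).backward()
    fresh.append(input_var.grad.item())
    input_var.grad = None

print(accumulated)
print(fresh)
```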

# An end-to-end example: a linear classifier in pure PyTorch

Note that W and b are created with requires_grad=True, so that gradients can be computed with respect to them.

In [16]:
input_dim = 2
output_dim = 1
W = torch.rand(input_dim, output_dim, requires_grad=True)
b = torch.zeros(output_dim, requires_grad=True)

This is our model

In [17]:
def model(inputs, W, b):
  return torch.matmul(inputs, W) + b

Here is our loss function. We just switch from tf.square to torch.square and from
tf.reduce_mean to torch.mean.

In [18]:
def mean_squared_error(targets, predictions):
  per_sample_losses = torch.square(targets - predictions)
  return torch.mean(per_sample_losses)

In [20]:
learning_rate = 0.1
def training_step(inputs, targets, W, b):
  predictions = model(inputs, W, b)  # Forward pass (model takes W and b explicitly)
  loss = mean_squared_error(targets, predictions)
  loss.backward()  # Compute gradients
  grad_loss_wrt_W, grad_loss_wrt_b = W.grad, b.grad  # Retrieve gradients
  with torch.no_grad():  # Update weights inside a no_grad scope
    W -= grad_loss_wrt_W * learning_rate
    b -= grad_loss_wrt_b * learning_rate
  W.grad = None  # Reset gradients
  b.grad = None
  return loss

Now for the training step. Here’s how it works:
1. loss.backward() runs backpropagation starting from the loss output
node, and populates the tensor.grad attribute on all tensors that were
involved in the computation of loss. tensor.grad represents the
gradient of the loss with regard to that tensor.
2. We use the .grad attribute to recover the gradients of the loss with
regard to W and b.
3. We update W and b using those gradients. Because these updates are not
intended to be part of the backwards pass, we do them inside a
torch.no_grad() scope, which skips gradient computation for everything
inside it.
4. We reset the contents of the .grad property of our W and b parameters,
by setting it to None. If we didn’t do this, gradient values would accumulate
across multiple calls to training_step(), resulting in invalid values.
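The steps above can be exercised end to end. The following sketch is not the original notebook's data: it assumes a made-up toy dataset whose targets follow a known linear rule, an arbitrary seed, and 200 iterations, purely to show the loss going down.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability

# Hypothetical toy data: targets follow a known linear rule.
inputs = torch.rand(64, 2)
targets = inputs @ torch.tensor([[2.0], [-1.0]]) + 0.5

W = torch.rand(2, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
learning_rate = 0.1

def model(inputs, W, b):
    return torch.matmul(inputs, W) + b

def mean_squared_error(targets, predictions):
    return torch.mean(torch.square(targets - predictions))

def training_step(inputs, targets, W, b):
    predictions = model(inputs, W, b)                 # forward pass
    loss = mean_squared_error(targets, predictions)
    loss.backward()                                   # populate W.grad and b.grad
    with torch.no_grad():                             # updates are not tracked
        W -= W.grad * learning_rate
        b -= b.grad * learning_rate
    W.grad = None                                     # reset for the next step
    b.grad = None
    return loss

losses = [training_step(inputs, targets, W, b).item() for _ in range(200)]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```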

## PACKAGING STATE AND COMPUTATION WITH MODULES

PyTorch also has a higher-level, object-oriented API for performing backpropagation, which
requires relying on two new classes: the torch.nn.Module class, as well as an optimizer
class from the torch.optim module, such as torch.optim.SGD (the equivalent of
keras.optimizers.SGD).
The general idea is to define a subclass of torch.nn.Module, which will:
1. Hold some Parameters, to store state variables. Those are defined in the
__init__() method.
2. Implement the forward pass computation in the forward() method.

In [21]:
class LinearModel(torch.nn.Module):
  def __init__(self):
    super().__init__()
    self.W = torch.nn.Parameter(torch.rand(input_dim, output_dim))
    self.b = torch.nn.Parameter(torch.zeros(output_dim))
  def forward(self, inputs):
    return torch.matmul(inputs, self.W) + self.b


In [22]:
model = LinearModel()

When using an instance of torch.nn.Module, rather than calling the forward() method
directly, you’d use __call__() (that is, you call the instance like a function), which
redirects to forward() but adds a few framework hooks to it.
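One way to observe the difference is to register a forward hook, which only fires when the module is invoked through __call__(). This is a sketch: the calls list and the default constructor arguments are assumptions added for illustration.

```python
import torch

class LinearModel(torch.nn.Module):
    def __init__(self, input_dim=2, output_dim=1):
        super().__init__()
        self.W = torch.nn.Parameter(torch.rand(input_dim, output_dim))
        self.b = torch.nn.Parameter(torch.zeros(output_dim))

    def forward(self, inputs):
        return torch.matmul(inputs, self.W) + self.b

model = LinearModel()
calls = []  # hypothetical log of output shapes seen by the hook

# A forward hook runs after forward() whenever the module itself is called.
handle = model.register_forward_hook(
    lambda module, args, output: calls.append(tuple(output.shape))
)

inputs = torch.tensor([[1., 2.], [3., 4.]])
model(inputs)          # goes through __call__(), so the hook fires
model.forward(inputs)  # bypasses __call__(), so the hook does not fire
handle.remove()

print(calls)
```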

In [23]:
inputs = torch.tensor([[1., 2.], [3., 4.]])
output = model(inputs)


In [24]:
output

tensor([[0.8981],
        [1.8143]], grad_fn=<AddBackward0>)

Now, let’s get our hands on a PyTorch optimizer. To instantiate it, you will need to provide
the list of parameters that the optimizer is intended to update. You can retrieve it from our
Module instance via .parameters().

In [25]:
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # Set the learning rate explicitly instead of relying on the default.
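To see what the optimizer actually received, you can inspect its param_groups attribute. The sketch below redefines a LinearModel with default sizes so that it runs on its own; the learning rate of 0.1 is an arbitrary choice for the example.

```python
import torch

class LinearModel(torch.nn.Module):
    def __init__(self, input_dim=2, output_dim=1):
        super().__init__()
        self.W = torch.nn.Parameter(torch.rand(input_dim, output_dim))
        self.b = torch.nn.Parameter(torch.zeros(output_dim))

    def forward(self, inputs):
        return torch.matmul(inputs, self.W) + self.b

model = LinearModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # lr=0.1 is arbitrary

# The optimizer stores its parameters and hyperparameters in param_groups.
group = optimizer.param_groups[0]
shapes = [tuple(p.shape) for p in group["params"]]
print(shapes, group["lr"])
```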


Using our Module instance and the PyTorch SGD optimizer, we can run a simplified training step:

In [26]:
def training_step(inputs, targets):
  predictions = model(inputs)
  loss = mean_squared_error(targets, predictions)
  loss.backward()
  optimizer.step()
  model.zero_grad()
  return loss
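Putting the pieces together, a full loop with the Module and the optimizer might look like the sketch below. The toy dataset, the seed, the learning rate of 0.1, and the 200 iterations are all assumptions chosen for illustration and are not taken from the notebook.

```python
import torch

torch.manual_seed(0)  # arbitrary seed for repeatability

# Hypothetical toy data: targets follow a known linear rule.
inputs = torch.rand(64, 2)
targets = inputs @ torch.tensor([[2.0], [-1.0]]) + 0.5

class LinearModel(torch.nn.Module):
    def __init__(self, input_dim=2, output_dim=1):
        super().__init__()
        self.W = torch.nn.Parameter(torch.rand(input_dim, output_dim))
        self.b = torch.nn.Parameter(torch.zeros(output_dim))

    def forward(self, inputs):
        return torch.matmul(inputs, self.W) + self.b

def mean_squared_error(targets, predictions):
    return torch.mean(torch.square(targets - predictions))

model = LinearModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def training_step(inputs, targets):
    predictions = model(inputs)
    loss = mean_squared_error(targets, predictions)
    loss.backward()    # compute gradients for all Parameters
    optimizer.step()   # apply the SGD update
    model.zero_grad()  # clear gradients before the next step
    return loss

losses = [training_step(inputs, targets).item() for _ in range(200)]
print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```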

## MAKING PYTORCH MODULES FAST USING COMPILATION


One last thing. Similarly to how TensorFlow lets you compile functions for better
performance, PyTorch lets you compile functions or even Module instances, via the
torch.compile() utility. This API leverages PyTorch’s very own compiler, named Dynamo.
Let’s try it on our linear regression Module:

In [27]:
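compiled_model = torch.compile(model)  # torch.compile() returns the compiled module; model.compile() compiles in place and returns None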

The resulting object is intended to work identically to the original – except the forward and
backward pass should run faster.
You can also leverage torch.compile() as a function decorator:

In [28]:
@torch.compile
def dense(inputs, W, b):
  return torch.relu(torch.matmul(inputs, W) + b)  # relu is available as torch.relu; there is no torch.nn.relu function