<a href="https://colab.research.google.com/github/kweenkeen/bds-files/blob/master/PyTorchPractice1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tensor

## Warm-up: numpy

Before introducing PyTorch, we will first implement the network using numpy.

Numpy provides an n-dimensional array object, and many functions for manipulating these arrays. Numpy is a generic framework for scientific computing; it does not know anything about computation graphs, or deep learning, or gradients. However, we can easily use numpy to fit a two-layer network to random data by manually implementing the forward and backward passes through the network using numpy operations.
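The hand-written backward pass in the cell below follows from the chain rule. As a sketch, write $X$ for the input batch, $W_1, W_2$ for the two weight matrices, and $Y$ for the targets. These symbols are chosen here for illustration; the code calls them `x`, `w1`, `w2`, and `y`. The loss is assumed to be the summed squared error.

$$
H = X W_1, \qquad A = \max(H, 0), \qquad \hat{Y} = A W_2, \qquad L = \sum_{i,j} \left(\hat{Y}_{ij} - Y_{ij}\right)^2
$$

$$
\frac{\partial L}{\partial \hat{Y}} = 2\,(\hat{Y} - Y), \qquad
\frac{\partial L}{\partial W_2} = A^\top \frac{\partial L}{\partial \hat{Y}}, \qquad
\frac{\partial L}{\partial A} = \frac{\partial L}{\partial \hat{Y}}\, W_2^\top
$$

$$
\frac{\partial L}{\partial H} = \frac{\partial L}{\partial A} \odot \mathbf{1}[H \ge 0], \qquad
\frac{\partial L}{\partial W_1} = X^\top \frac{\partial L}{\partial H}
$$

Here $\odot$ is the elementwise product and $\mathbf{1}[H \ge 0]$ is the mask that the code applies by zeroing the gradient wherever `h < 0`.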

In [3]:
import numpy as np

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Randomly initialize weights
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
  # Forward pass: compute predicted y
  h = x.dot(w1)
  h_relu = np.maximum(h, 0)
  y_pred = h_relu.dot(w2)

  # Compute and print the scalar loss (sum of squared errors)
  loss = np.square(y_pred - y).sum()
  print(t, loss)

  # Backprop to compute gradients of w1 and w2 with respect to loss
  grad_y_pred = 2.0 * (y_pred - y)
  grad_w2 = h_relu.T.dot(grad_y_pred)
  grad_h_relu = grad_y_pred.dot(w2.T)
  grad_h = grad_h_relu.copy()
  grad_h[h < 0] = 0
  grad_w1 = x.T.dot(grad_h)

  # Update weights
  w1 -= learning_rate * grad_w1
  w2 -= learning_rate * grad_w2
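
A manual backward pass is easy to get subtly wrong, so it can help to compare one gradient entry against a finite-difference estimate. The following is a minimal sketch, not part of the original notebook: the helper `loss_and_grads`, the small array sizes, and the step `eps` are all assumptions made for illustration, and the loss is taken to be the summed squared error.

```python
import numpy as np

def loss_and_grads(x, y, w1, w2):
    """Forward and backward pass of the two-layer ReLU network (hypothetical helper)."""
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)
    loss = np.square(y_pred - y).sum()

    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h = grad_y_pred.dot(w2.T)
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)
    return loss, grad_w1, grad_w2

# Small sizes (chosen for illustration) keep the check cheap.
rng = np.random.default_rng(0)
x, y = rng.standard_normal((4, 5)), rng.standard_normal((4, 2))
w1, w2 = rng.standard_normal((5, 3)), rng.standard_normal((3, 2))

loss, grad_w1, grad_w2 = loss_and_grads(x, y, w1, w2)

# Central difference on a single entry of w1.
eps = 1e-6
i, j = 1, 2
w1_plus, w1_minus = w1.copy(), w1.copy()
w1_plus[i, j] += eps
w1_minus[i, j] -= eps
numeric = (loss_and_grads(x, y, w1_plus, w2)[0]
           - loss_and_grads(x, y, w1_minus, w2)[0]) / (2 * eps)

# The two numbers should agree to several decimal places.
print(numeric, grad_w1[i, j])
```

If the two printed values differ noticeably, the analytic gradient (or the loss it is supposed to differentiate) is the first place to look.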


## PyTorch: Tensors

A **Tensor** is conceptually identical to a numpy array: a Tensor is an n-dimensional array, and PyTorch provides many functions for operating on these Tensors. Behind the scenes, Tensors can keep track of a computational graph and gradients, but they're also useful as a generic tool for scientific computing.
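
To make the correspondence with numpy concrete, here is a small sketch that performs the same matrix product in both libraries. The array `a` and its shape are arbitrary choices for illustration.

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)

# torch.from_numpy wraps the numpy buffer; the Tensor shares memory with `a`.
t = torch.from_numpy(a)

# The same matrix product, once with numpy and once with PyTorch.
np_result = a.dot(a.T)
torch_result = t.mm(t.t())

# Convert back to numpy to compare the two results.
print(np.allclose(np_result, torch_result.numpy()))
```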

Unlike numpy, PyTorch Tensors can use GPUs to accelerate their numeric computations. To run a Tensor on a GPU, create it on a CUDA device (or move it there), for example by passing a `device` argument as in the code below.
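
One way to place Tensors on a GPU is sketched below. It picks a CUDA device only when one is available, so the same code should also run on a CPU-only machine. The tensor names and shapes are placeholders chosen for illustration.

```python
import torch

# Fall back to the CPU when no CUDA device is visible.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device...
a = torch.randn(3, 4, device=device, dtype=torch.float)

# ...or move an existing CPU tensor there with .to().
b = torch.ones(4, 2).to(device)

# Both operands of an operation must live on the same device.
c = a.mm(b)
print(c.device, c.shape)
```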

Here we use PyTorch Tensors to fit a two-layer network to random data. As in the numpy example above, we need to manually implement the forward and backward passes through the network:



In [1]:
# -*- coding: utf-8 -*-

!nvcc --version

import torch
dtype = torch.float
# Use the GPU when one is available; otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input and output data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Randomly initialize weights
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predicted y
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)
    
    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
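
Since Tensors can also track gradients behind the scenes, the manual backward pass can be cross-checked against PyTorch's own gradient computation. This is a minimal sketch rather than part of the original cell: the tiny sizes, the `double` dtype, and the use of `requires_grad=True` with `loss.backward()` are choices made here for illustration.

```python
import torch

torch.manual_seed(0)
# Small sizes, chosen for illustration only.
N, D_in, H, D_out = 4, 5, 3, 2
x = torch.randn(N, D_in, dtype=torch.double)
y = torch.randn(N, D_out, dtype=torch.double)
w1 = torch.randn(D_in, H, dtype=torch.double, requires_grad=True)
w2 = torch.randn(H, D_out, dtype=torch.double, requires_grad=True)

# Forward pass, recorded by autograd because w1 and w2 require gradients.
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()

# Manual backward pass, written as in the training loop, with tracking disabled.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h = grad_y_pred.mm(w2.t())
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

# Both comparisons should report agreement.
print(torch.allclose(grad_w1, w1.grad), torch.allclose(grad_w2, w2.grad))
```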

